Diffstat (limited to 'source/luametatex/source/libraries/mimalloc/readme.md')
-rw-r--r-- | source/luametatex/source/libraries/mimalloc/readme.md | 61 |
1 files changed, 33 insertions, 28 deletions
diff --git a/source/luametatex/source/libraries/mimalloc/readme.md b/source/luametatex/source/libraries/mimalloc/readme.md
index 588630992..10195b026 100644
--- a/source/luametatex/source/libraries/mimalloc/readme.md
+++ b/source/luametatex/source/libraries/mimalloc/readme.md
@@ -12,8 +12,8 @@ is a general purpose allocator with excellent [performance](#performance) charac
 Initially developed by Daan Leijen for the run-time systems of the [Koka](https://koka-lang.github.io) and [Lean](https://github.com/leanprover/lean) languages.
-Latest release tag: `v2.0.7` (2022-11-03).
-Latest stable tag: `v1.7.7` (2022-11-03).
+Latest release tag: `v2.0.9` (2022-12-23).
+Latest stable tag: `v1.7.9` (2022-12-23).
 mimalloc is a drop-in replacement for `malloc` and can be used in other programs without code changes, for example, on dynamically linked ELF-based systems (Linux, BSD, etc.) you can use it as:
@@ -27,6 +27,8 @@ It also has an easy way to override the default allocator in [Windows](#override
 to integrate and adapt in other projects. For runtime systems it provides hooks for a monotonic _heartbeat_ and deferred freeing (for bounded worst-case times with reference counting).
+ Partly due to its simplicity, mimalloc has been ported to many systems (Windows, macOS,
+ Linux, WASM, various BSD's, Haiku, MUSL, etc) and has excellent support for dynamic overriding.
 - __free list sharding__: instead of one big free list (per size class) we have many smaller lists per "mimalloc page" which reduces fragmentation and increases locality --
@@ -36,13 +38,13 @@ It also has an easy way to override the default allocator in [Windows](#override
 per mimalloc page, but for each page we have multiple free lists. In particular, there is one list for thread-local `free` operations, and another one for concurrent `free` operations. Free-ing from another thread can now be a single CAS without needing
- sophisticated coordination between threads. Since there will be
+ sophisticated coordination between threads. Since there will be
 thousands of separate free lists, contention is naturally distributed over the heap, and the chance of contending on a single location will be low -- this is quite similar to randomized algorithms like skip lists where adding a random oracle removes the need for a more complex algorithm.
 - __eager page reset__: when a "page" becomes empty (with increased chance
- due to free list sharding) the memory is marked to the OS as unused ("reset" or "purged")
+ due to free list sharding) the memory is marked to the OS as unused (reset or decommitted)
 reducing (real) memory pressure and fragmentation, especially in long running programs.
 - __secure__: _mimalloc_ can be built in secure mode, adding guard pages,
@@ -50,20 +52,19 @@ It also has an easy way to override the default allocator in [Windows](#override
 heap vulnerabilities. The performance penalty is usually around 10% on average over our benchmarks.
 - __first-class heaps__: efficiently create and use multiple heaps to allocate across different regions.
- A heap can be destroyed at once instead of deallocating each object separately.
+ A heap can be destroyed at once instead of deallocating each object separately.
 - __bounded__: it does not suffer from _blowup_ \[1\], has bounded worst-case allocation
- times (_wcat_), bounded space overhead (~0.2% meta-data, with low internal fragmentation),
- and has no internal points of contention using only atomic operations.
+ times (_wcat_) (upto OS primitives), bounded space overhead (~0.2% meta-data, with low
+ internal fragmentation), and has no internal points of contention using only atomic operations.
 - __fast__: In our benchmarks (see [below](#performance)), _mimalloc_ outperforms other leading allocators (_jemalloc_, _tcmalloc_, _Hoard_, etc),
- and often uses less memory. A nice property
- is that it does consistently well over a wide range of benchmarks. There is also good huge OS page
- support for larger server programs.
+ and often uses less memory. A nice property is that it does consistently well over a wide range
+ of benchmarks. There is also good huge OS page support for larger server programs.
 The [documentation](https://microsoft.github.io/mimalloc) gives a full overview of the API.
-You can read more on the design of _mimalloc_ in the [technical report](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action) which also has detailed benchmark results.
+You can read more on the design of _mimalloc_ in the [technical report](https://www.microsoft.com/en-us/research/publication/mimalloc-free-list-sharding-in-action) which also has detailed benchmark results.
-Enjoy!
+Enjoy!
 ### Branches
@@ -77,8 +78,13 @@ Note: the `v2.x` version has a new algorithm for managing internal mimalloc page
 and fragmentation compared to mimalloc `v1.x` (especially for large workloads). Should otherwise have similar performance (see [below](#performance)); please report if you observe any significant performance regression.
+* 2022-12-23, `v1.7.9`, `v2.0.9`: Supports building with asan and improved [Valgrind](#valgrind) support.
+ Support abitrary large alignments (in particular for `std::pmr` pools).
+ Added C++ STL allocators attached to a specific heap (thanks @vmarkovtsev).
+ Heap walks now visit all object (including huge objects). Support Windows nano server containers (by Johannes Schindelin,@dscho). Various small bug fixes.
+
 * 2022-11-03, `v1.7.7`, `v2.0.7`: Initial support for [Valgrind](#valgrind) for leak testing and heap block overflow detection. Initial
- support for attaching heaps to a speficic memory area (only in v2). Fix `realloc` behavior for zero size blocks, remove restriction to integral multiple of the alignment in `alloc_align`, improved aligned allocation performance, reduced contention with many threads on few processors (thank you @dposluns!), vs2022 support, support `pkg-config`, .
+ support for attaching heaps to a specific memory area (only in v2). Fix `realloc` behavior for zero size blocks, remove restriction to integral multiple of the alignment in `alloc_align`, improved aligned allocation performance, reduced contention with many threads on few processors (thank you @dposluns!), vs2022 support, support `pkg-config`, .
 * 2022-04-14, `v1.7.6`, `v2.0.6`: fix fallback path for aligned OS allocation on Windows, improve Windows aligned allocation even when compiling with older SDK's, fix dynamic overriding on macOS Monterey, fix MSVC C++ dynamic overriding, fix
@@ -87,7 +93,7 @@ Note: the `v2.x` version has a new algorithm for managing internal mimalloc page
 * 2022-02-14, `v1.7.5`, `v2.0.5` (alpha): fix malloc override on Windows 11, fix compilation with musl, potentially reduced
- committed memory, add `bin/minject` for Windows,
+ committed memory, add `bin/minject` for Windows,
 improved wasm support, faster aligned allocation, various small fixes.
@@ -99,9 +105,9 @@ Note: the `v2.x` version has a new algorithm for managing internal mimalloc page
 thread_id on Android, prefer 2-6TiB area for aligned allocation to work better on pre-windows 8, various small fixes.
 * 2021-04-06, `v1.7.1`, `v2.0.1` (beta): fix bug in arena allocation for huge pages, improved aslr on large allocations, initial M1 support (still experimental).
-
+
 * 2021-01-31, `v2.0.0`: beta release 2.0: new slice algorithm for managing internal mimalloc pages.
-
+
 * 2021-01-31, `v1.7.0`: stable release 1.7: support explicit user provided memory regions, more precise statistics, improve macOS overriding, initial support for Apple M1, improved DragonFly support, faster memcpy on Windows, various small fixes.
@@ -115,9 +121,9 @@ Special thanks to:
 memory model bugs using the [genMC] model checker.
 * Weipeng Liu (@pongba), Zhuowei Li, Junhua Wang, and Jakub Szymanski, for their early support of mimalloc and deployment at large scale services, leading to many improvements in the mimalloc algorithms for large workloads.
-* Jason Gibson (@jasongibson) for exhaustive testing on large scale workloads and server environments, and finding complex bugs
+* Jason Gibson (@jasongibson) for exhaustive testing on large scale workloads and server environments, and finding complex bugs
 in (early versions of) `mimalloc`.
-* Manuel Pöter (@mpoeter) and Sam Gross(@colesbury) for finding an ABA concurrency issue in abandoned segment reclamation. Sam also created the [no GIL](https://github.com/colesbury/nogil) Python fork which
+* Manuel Pöter (@mpoeter) and Sam Gross(@colesbury) for finding an ABA concurrency issue in abandoned segment reclamation. Sam also created the [no GIL](https://github.com/colesbury/nogil) Python fork which
 uses mimalloc internally.
@@ -304,8 +310,8 @@ or via environment variables:
 of a thread to not allocate in the huge OS pages; this prevents threads that are short lived and allocate just a little to take up space in the huge OS page area (which cannot be reset). The huge pages are usually allocated evenly among NUMA nodes.
- We can use `MIMALLOC_RESERVE_HUGE_OS_PAGES_AT=N` where `N` is the numa node (starting at 0) to allocate all
- the huge pages at a specific numa node instead.
+ We can use `MIMALLOC_RESERVE_HUGE_OS_PAGES_AT=N` where `N` is the numa node (starting at 0) to allocate all
+ the huge pages at a specific numa node instead.
 Use caution when using `fork` in combination with either large or huge OS pages: on a fork, the OS uses copy-on-write for all pages in the original process including the huge OS pages. When any memory is now written in that area, the
@@ -342,24 +348,24 @@ When _mimalloc_ is built using debug mode, various checks are done at runtime to
 ## Valgrind
-Generally, we recommend using the standard allocator with the amazing [Valgrind] tool (and
-also for other address sanitizers).
-However, it is possible to build mimalloc with Valgrind support. This has a small performance
-overhead but does allow detecting memory leaks and byte-precise buffer overflows directly on final
+Generally, we recommend using the standard allocator with the amazing [Valgrind] tool (and
+also for other address sanitizers).
+However, it is possible to build mimalloc with Valgrind support. This has a small performance
+overhead but does allow detecting memory leaks and byte-precise buffer overflows directly on final
 executables. To build with valgrind support, use the `MI_VALGRIND=ON` cmake option: ``` > cmake ../.. -DMI_VALGRIND=ON ```
-This can also be combined with secure mode or debug mode.
+This can also be combined with secure mode or debug mode.
 You can then run your programs directly under valgrind: ``` > valgrind <myprogram> ```
-If you rely on overriding `malloc`/`free` by mimalloc (instead of using the `mi_malloc`/`mi_free` API directly),
+If you rely on overriding `malloc`/`free` by mimalloc (instead of using the `mi_malloc`/`mi_free` API directly),
 you also need to tell `valgrind` to not intercept those calls itself, and use: ```
@@ -573,7 +579,7 @@ The _alloc-test_, by
 [OLogN Technologies AG](http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/), is a very allocation intensive benchmark doing millions of allocations in various size classes. The test is scaled such that when an allocator performs almost identically on _alloc-test1_ as _alloc-testN_ it
-means that it scales linearly.
+means that it scales linearly.
 The _sh6bench_ and _sh8bench_ benchmarks are developed by [MicroQuill](http://www.microquill.com/) as part of SmartHeap.
@@ -754,4 +760,3 @@ free list encoding](https://github.com/microsoft/mimalloc/blob/783e3377f79ee82af
 * 2019-10-07, `v1.1.0`: stable release 1.1.
 * 2019-09-01, `v1.0.8`: pre-release 8: more robust windows dynamic overriding, initial huge page support.
 * 2019-08-10, `v1.0.6`: pre-release 6: various performance improvements.
-