docs: add missing MALLOC_MMAP_THRESHOLD_ docs

psychedelicious 2024-03-25 18:08:55 +11:00
parent 6d366fb519
commit c77eff8500
2 changed files with 44 additions and 1 deletions

docs/features/CONFIGURATION.md

@@ -190,5 +190,48 @@ The `log_format` option provides several alternative formats:
- `syslog` - the log level and error message only, allowing the syslog system to attach the time and date
- `legacy` - a format similar to the one used by the legacy 2.3 InvokeAI releases.
### Model Cache
#### `glibc` Memory Allocator Fragmentation
Python (and PyTorch) relies on the memory allocator from the C Standard Library (`libc`). On Linux, with the GNU C Standard Library implementation (`glibc`), our memory access patterns have been observed to cause severe memory fragmentation. This fragmentation results in large amounts of memory that has been freed but can't be released back to the OS. Loading models from disk and moving them between CPU/CUDA seem to be the operations that contribute most to the fragmentation. This memory fragmentation issue can result in OOM crashes during frequent model switching, even if `max_cache_size` is set to a reasonable value (e.g. an OOM crash with `max_cache_size=16` on a system with 32GB of RAM).
This problem may also exist on other OSes and with other `libc` implementations, but at the time of writing it has only been investigated on Linux with `glibc`.
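One rough way to check whether you are affected is to watch the resident memory of the InvokeAI process while repeatedly switching models. The `pgrep` pattern and polling interval below are only illustrative:
```bash
# Watch resident (VmRSS) and peak (VmHWM) memory of the running InvokeAI process.
# If VmRSS keeps climbing across model switches even though the model cache is
# bounded, allocator fragmentation is a likely culprit.
PID=$(pgrep -f invokeai | head -n 1)   # adjust the pattern to match your process
watch -n 5 "grep -E 'VmRSS|VmHWM' /proc/${PID}/status"
```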
To better understand how the `glibc` memory allocator works, see these references:
- Basics: <https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html>
- Details: <https://sourceware.org/glibc/wiki/MallocInternals>
Note the difference between memory allocated as chunks in an arena and memory allocated with `mmap`. Under `glibc`'s default configuration, most model tensors get allocated as chunks in an arena, making them vulnerable to fragmentation.
##### Workaround
We can work around this memory fragmentation issue by setting the following env var:
```bash
# Force blocks >1MB to be allocated with `mmap` so that they are released to the system immediately when they are freed.
MALLOC_MMAP_THRESHOLD_=1048576
```
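If you start InvokeAI without the launcher script, you can export the variable yourself before launching. The launch command below is only a placeholder for however you normally start the server:
```bash
# Apply the workaround for the current shell session, then start InvokeAI.
export MALLOC_MMAP_THRESHOLD_=1048576  # 1MB
invokeai-web  # placeholder - replace with the command you normally use to launch InvokeAI
```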
If you use the `invoke.sh` launcher script, you do not need to set this env var, as we set it to `1048576` for you.
##### Manual Configuration
In case the default value causes performance issues, you can pass `--malloc_threshold` to `invoke.sh` (a sketch of how the flag maps onto the env var follows the list):
- Set the env var to a specific value: `./invoke.sh --malloc_threshold=0 # release _all_ blocks asap` or `./invoke.sh --malloc_threshold=16777216 # raise the limit to 16MB`
- Unset the env var and let `glibc` manage the threshold dynamically (this may bring back the fragmentation-driven apparent memory leak): `./invoke.sh --malloc_threshold=unset`
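For reference, here is a minimal sketch (not the actual `invoke.sh` code) of how such a flag can be translated into the env var:
```bash
# Illustrative only - this does not reproduce the real invoke.sh argument parsing.
case "$1" in
  --malloc_threshold=unset)
    unset MALLOC_MMAP_THRESHOLD_ ;;             # let the allocator manage the threshold itself
  --malloc_threshold=*)
    export MALLOC_MMAP_THRESHOLD_="${1#*=}" ;;  # use the value supplied on the command line
  *)
    export MALLOC_MMAP_THRESHOLD_=1048576 ;;    # default: 1MB
esac
```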
##### Supplementary Light Reading
See the following references for more information about the `malloc` tunable parameters:
- <https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html>
- <https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html>
- <https://man7.org/linux/man-pages/man3/mallopt.3.html>
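On newer `glibc` versions the same threshold can also be set through the tunables interface described in the second link above. Assuming your `glibc` supports `GLIBC_TUNABLES`, the following is equivalent to setting `MALLOC_MMAP_THRESHOLD_`:
```bash
# Equivalent configuration via glibc tunables (requires a glibc new enough to support GLIBC_TUNABLES).
export GLIBC_TUNABLES=glibc.malloc.mmap_threshold=1048576
```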
The model cache emits debug logs that provide visibility into the state of the `libc` memory allocator. See the `LibcUtil` class for more info on how these `libc` malloc stats are collected.
[basic guide to yaml files]: https://circleci.com/blog/what-is-yaml-a-beginner-s-guide/
[Model Marketplace API Keys]: #model-marketplace-api-keys

invoke.sh

@@ -46,7 +46,7 @@ if [ "$(uname -s)" == "Darwin" ]; then
export PYTORCH_ENABLE_MPS_FALLBACK=1
fi
-# Avoid glibc memory fragmentation. See #6007, #4784 and invokeai/backend/model_management/README.md for details.
+# Avoid glibc memory fragmentation. See #6007, #4784 and docs/features/CONFIGURATION.md for details.
# Some systems may need this to be set to a different value, so we may override this via command-line argument below.
export MALLOC_MMAP_THRESHOLD_=1048576 # 1MB