From d64e17e04350484633920320fc77a51e035d3032 Mon Sep 17 00:00:00 2001
From: Ryan Dick
Date: Tue, 3 Oct 2023 13:36:46 -0400
Subject: [PATCH] Add README with info about glib memory fragmentation caused by the model cache.

---
 invokeai/backend/model_management/README.md | 27 +++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 invokeai/backend/model_management/README.md

diff --git a/invokeai/backend/model_management/README.md b/invokeai/backend/model_management/README.md
new file mode 100644
index 0000000000..0d94f39642
--- /dev/null
+++ b/invokeai/backend/model_management/README.md
@@ -0,0 +1,27 @@
+# Model Cache
+
+## `glibc` Memory Allocator Fragmentation
+
+Python (and PyTorch) relies on the memory allocator from the C Standard Library (`libc`). On Linux, with the GNU C Standard Library implementation (`glibc`), our memory access patterns have been observed to cause severe memory fragmentation. This fragmentation results in large amounts of memory that has been freed but can't be released back to the OS. Loading models from disk and moving them between CPU/CUDA seem to be the operations that contribute most to the fragmentation. This memory fragmentation issue can result in OOM crashes during frequent model switching, even if `max_cache_size` is set to a reasonable value (e.g. an OOM crash with `max_cache_size=16` on a system with 32GB of RAM).
+
+This problem may also exist on other OSes and other `libc` implementations, but at the time of writing it has only been investigated on Linux with `glibc`.
+
+To better understand how the `glibc` memory allocator works, see these references:
+- Basics: https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html
+- Details: https://sourceware.org/glibc/wiki/MallocInternals
+
+Note the differences between memory allocated as chunks in an arena vs. memory allocated with `mmap`. Under `glibc`'s default configuration, most model tensors get allocated as chunks in an arena, making them vulnerable to fragmentation.
+
+We can work around this memory fragmentation issue by setting the following env var:
+
+```bash
+# Force blocks >1MB to be allocated with `mmap` so that they are released to the system immediately when they are freed.
+MALLOC_MMAP_THRESHOLD_=1048576
+```
+
+See the following references for more information about the `malloc` tunable parameters:
+- https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html
+- https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html
+- https://man7.org/linux/man-pages/man3/mallopt.3.html
+
+The model cache emits debug logs that provide visibility into the state of the `libc` memory allocator. See the `LibcUtil` class for more info on how these `libc` malloc stats are collected.
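
For readers who want to inspect the allocator state themselves, the following is a minimal sketch of how `glibc` malloc stats could be read from Python with `ctypes`. It is not taken from this patch and is not the `LibcUtil` API: the names `MallInfo2` and `read_malloc_stats` are hypothetical, and the sketch assumes `mallinfo2()` is available (glibc 2.33 or newer). On other platforms it returns `None`.

```python
import ctypes
import ctypes.util


class MallInfo2(ctypes.Structure):
    """Hypothetical mirror of glibc's `struct mallinfo2` (all fields are size_t)."""

    _fields_ = [
        ("arena", ctypes.c_size_t),     # bytes obtained from the OS without mmap
        ("ordblks", ctypes.c_size_t),   # number of ordinary free blocks
        ("smblks", ctypes.c_size_t),    # number of free fastbin blocks
        ("hblks", ctypes.c_size_t),     # number of mmap-allocated blocks
        ("hblkhd", ctypes.c_size_t),    # bytes in mmap-allocated blocks
        ("usmblks", ctypes.c_size_t),   # unused
        ("fsmblks", ctypes.c_size_t),   # bytes in free fastbin blocks
        ("uordblks", ctypes.c_size_t),  # bytes used by in-use allocations
        ("fordblks", ctypes.c_size_t),  # bytes in free blocks
        ("keepcost", ctypes.c_size_t),  # releasable bytes at the top of the heap
    ]


def read_malloc_stats():
    """Return glibc malloc stats as a dict, or None if mallinfo2() is unavailable."""
    try:
        libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
        mallinfo2 = libc.mallinfo2  # AttributeError if the symbol is missing
    except (OSError, AttributeError):
        return None
    mallinfo2.argtypes = []
    mallinfo2.restype = MallInfo2  # the struct is returned by value
    info = mallinfo2()
    return {name: getattr(info, name) for name, _ in MallInfo2._fields_}


if __name__ == "__main__":
    stats = read_malloc_stats()
    if stats is None:
        print("mallinfo2() is not available on this platform")
    else:
        for name, value in stats.items():
            print(f"{name}: {value}")
```

As a rough heuristic, a `fordblks` value that stays large relative to `arena` after models have been unloaded suggests freed memory is being retained in arenas rather than returned to the OS.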