mirror of https://github.com/invoke-ai/InvokeAI synced 2024-08-30 20:32:17 +00:00

psychedelicious f19c6069a9 fix(backend): handle systems with `glibc` < 2.33 `mallinfo2` is not available on `glibc` < 2.33. On these systems, we successfully load the library but get an `AttributeError` on attempting to access `mallinfo2`. I'm not sure if the old `mallinfo` will work, and not sure how to install it safely to test, so for now we just handle the `AttributeError`. This means the enhanced memory snapshot logic will be skipped for these systems, which isn't a big deal.		2023-10-14 09:00:11 -04:00
models	Add support for T2I-Adapter in node workflows (#4612 )	2023-10-05 16:29:16 +11:00
__init__.py	isort wip 3	2023-09-12 13:01:58 -04:00
convert_ckpt_to_diffusers.py	enable v_prediction for sd-1 models	2023-09-24 12:22:29 -04:00
libc_util.py	(minor) clean up typos.	2023-10-03 15:00:03 -04:00
lora.py	isort wip 3	2023-09-12 13:01:58 -04:00
memory_snapshot.py	fix(backend): handle systems with `glibc` < 2.33	2023-10-14 09:00:11 -04:00
model_cache.py	Demote model cache logs from warning to debug based on the conversation here: https://discord.com/channels/1020123559063990373/1049495067846524939/1161647290189090816	2023-10-11 12:02:46 -04:00
model_load_optimizations.py	Fix bug in skip_torch_weight_init() where the original behavior of torch.nn.Conv*d modules wasn't being restored correctly.	2023-10-10 10:05:50 -04:00
model_manager.py	Initial (barely) working version of IP-Adapter model management.	2023-09-13 08:27:24 -04:00
model_merge.py	isort wip 3	2023-09-12 13:01:58 -04:00
model_probe.py	Add support for T2I-Adapter in node workflows (#4612 )	2023-10-05 16:29:16 +11:00
model_search.py	fix probing for ip_adapter folders	2023-09-23 22:32:03 -04:00
README.md	Add README with info about glib memory fragmentation caused by the model cache.	2023-10-03 14:25:34 -04:00
seamless.py	chore: seamless print statement cleanup	2023-08-29 13:09:30 +12:00
util.py	Format by black	2023-08-11 03:20:56 +03:00

README.md

Model Cache

`glibc` Memory Allocator Fragmentation

Python (and PyTorch) rely on the memory allocator from the C Standard Library (libc). On Linux, with the GNU C Standard Library implementation (glibc), our memory access patterns have been observed to cause severe memory fragmentation. This fragmentation leaves large amounts of memory that have been freed but cannot be released back to the OS. Loading models from disk and moving them between CPU and CUDA appear to be the operations that contribute most to the fragmentation. The fragmentation can cause OOM crashes during frequent model switching, even if max_cache_size is set to a reasonable value (e.g. an OOM crash with max_cache_size=16 on a system with 32GB of RAM).

This problem may also exist on other OSes and with other libc implementations, but at the time of writing it has only been investigated on Linux with glibc.

To better understand how the glibc memory allocator works, see these references:

Note the differences between memory allocated as chunks in an arena and memory allocated with mmap. Under glibc's default configuration, most model tensors are allocated as chunks in an arena, which makes them vulnerable to fragmentation.

We can work around this memory fragmentation issue by setting the following env var:

# Force blocks >1MB to be allocated with `mmap` so that they are released to the system immediately when they are freed.
MALLOC_MMAP_THRESHOLD_=1048576
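
glibc reads this variable when the process starts, so it has to be present in the environment before Python is launched; changing `os.environ` inside an already-running interpreter is not expected to affect that process's allocator. As a minimal sketch (the helper names `env_with_mmap_threshold` and `run_with_mmap_threshold` are hypothetical and not part of this repository), a launcher script could start a child process with the variable set:

```python
import os
import subprocess
import sys

# 1 MiB, the same value as in the env var example above.
MMAP_THRESHOLD_BYTES = 1024 * 1024


def env_with_mmap_threshold(threshold_bytes=MMAP_THRESHOLD_BYTES, base_env=None):
    """Return a copy of the environment with MALLOC_MMAP_THRESHOLD_ set.

    Hypothetical helper: copies the given environment (or os.environ) so the
    caller's own environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MALLOC_MMAP_THRESHOLD_"] = str(threshold_bytes)
    return env


def run_with_mmap_threshold(args, threshold_bytes=MMAP_THRESHOLD_BYTES):
    """Run a command as a child process that starts with the variable set."""
    return subprocess.run(
        args,
        env=env_with_mmap_threshold(threshold_bytes),
        check=True,
        capture_output=True,
        text=True,
    )
```

For example, `run_with_mmap_threshold([sys.executable, "-c", "import os; print(os.environ['MALLOC_MMAP_THRESHOLD_'])"])` runs a child interpreter that sees the variable. In practice the same effect is usually achieved by exporting the variable in the shell or service definition that launches the application.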

See the following references for more information about the malloc tunable parameters:

The model cache emits debug logs that provide visibility into the state of the libc memory allocator. See the LibcUtil class for more info on how these libc malloc stats are collected.
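
To illustrate the kind of data involved, here is a rough sketch of reading allocator statistics through glibc's `mallinfo2` with `ctypes`. It is not the LibcUtil implementation; the names `Mallinfo2` and `read_mallinfo2` are hypothetical, and the library name `libc.so.6` assumes glibc on Linux. As the commit note for memory_snapshot.py describes, `mallinfo2` is not available on `glibc` < 2.33, so the sketch returns `None` in that case instead of failing:

```python
import ctypes
from typing import Optional


class Mallinfo2(ctypes.Structure):
    """Mirror of glibc's `struct mallinfo2` (all fields are size_t)."""

    _fields_ = [
        ("arena", ctypes.c_size_t),     # non-mmapped space allocated (bytes)
        ("ordblks", ctypes.c_size_t),   # number of free chunks
        ("smblks", ctypes.c_size_t),    # number of free fastbin blocks
        ("hblks", ctypes.c_size_t),     # number of mmapped regions
        ("hblkhd", ctypes.c_size_t),    # space allocated in mmapped regions (bytes)
        ("usmblks", ctypes.c_size_t),   # unused in current glibc
        ("fsmblks", ctypes.c_size_t),   # space in freed fastbin blocks (bytes)
        ("uordblks", ctypes.c_size_t),  # total allocated space (bytes)
        ("fordblks", ctypes.c_size_t),  # total free space (bytes)
        ("keepcost", ctypes.c_size_t),  # top-most, releasable space (bytes)
    ]


def read_mallinfo2() -> Optional[Mallinfo2]:
    """Return current malloc stats, or None if mallinfo2 is unavailable."""
    try:
        # Assumption: glibc on Linux. Other platforms raise OSError here.
        libc = ctypes.CDLL("libc.so.6")
        # On glibc < 2.33 the library loads but this lookup raises AttributeError.
        mallinfo2 = libc.mallinfo2
    except (OSError, AttributeError):
        return None
    mallinfo2.argtypes = []
    mallinfo2.restype = Mallinfo2
    return mallinfo2()
```

A large `fordblks` value relative to `uordblks` is one sign of the fragmentation described above: memory that has been freed but is still held in the arenas rather than returned to the OS.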
