616 Commits

SHA1 Message Date
d7ab464176 Offload the current model when locking if it is already partially loaded and we have insufficient VRAM. 2025-01-07 02:53:44 +00:00
5b42b7bd45 Add a utility to help with determining the working memory required for expensive operations. 2025-01-07 01:20:15 +00:00
b343f81644 Use torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved() to be more conservative in setting dynamic VRAM cache limits. 2025-01-07 01:20:15 +00:00
fc4a22fe78 Allow expensive operations to request more working memory. 2025-01-07 01:20:13 +00:00
a167632f09 Calculate model cache size limits dynamically based on the available RAM / VRAM. 2025-01-07 01:14:20 +00:00
6a9de1fcf3 Change definition of VRAM in use for the ModelCache from sum of model weights to the total torch.cuda.memory_allocated(). 2025-01-07 00:31:53 +00:00
e5180c4e6b Add get_effective_device(...) utility to aid in determining the effective device of models that are partially loaded. 2025-01-07 00:31:00 +00:00
1b7bb70bde Improve handling of cases when application code modifies the size of a model after registering it with the model cache. 2025-01-07 00:31:00 +00:00
7127040c3a Remove unused function set_nested_attr(...). 2025-01-07 00:31:00 +00:00
ceb2498a67 Add log prefix to model cache logs. 2025-01-07 00:31:00 +00:00
d0bfa019be Add 'enable_partial_loading' config flag. 2025-01-07 00:31:00 +00:00
535e45cedf First pass at adding partial loading support to the ModelCache. 2025-01-07 00:30:58 +00:00
c579a218ef Allow models to be locked in VRAM, even if they have been dropped from the RAM cache (related: https://github.com/invoke-ai/InvokeAI/issues/7513). 2025-01-06 23:02:52 +00:00
8b4b0ff0cf Fix bug in CustomConv1d and CustomConv2d patch calculations. 2024-12-29 19:10:19 +00:00
a8bef59699 First pass at making custom layer patches work with weights streamed from the CPU to the GPU. 2024-12-29 17:01:37 +00:00
6d49ee839c Switch the LayerPatcher to use 'custom modules' to manage layer patching. 2024-12-29 01:18:30 +00:00
0525f967c2 Fix the _autocast_forward_with_patches() function for CustomConv1d and CustomConv2d. 2024-12-29 00:22:37 +00:00
2855bb6b41 Update BaseLayerPatch.get_parameters(...) to accept a dict of orig_parameters rather than orig_module. This will enable compatibility between patching and cpu->gpu streaming. 2024-12-28 21:12:53 +00:00
20acfc9a00 Raise in CustomEmbedding and CustomGroupNorm if a patch is applied. 2024-12-28 20:49:17 +00:00
918f541af8 Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer. 2024-12-28 20:44:48 +00:00
93e76b61d6 Add CustomFluxRMSNorm layer. 2024-12-28 20:33:38 +00:00
f692e217ea Add patch support to CustomConv1d and CustomConv2d (no unit tests yet). 2024-12-27 22:23:17 +00:00
f2981979f9 Get custom layer patches working with all quantized linear layer types. 2024-12-27 22:00:22 +00:00
ef970a1cdc Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it. 2024-12-27 21:00:47 +00:00
e24e386a27 Add support for patches to CustomModuleMixin and add a single unit test (more to come). 2024-12-27 18:57:13 +00:00
b06d61e3c0 Improve custom layer wrap/unwrap logic. 2024-12-27 16:29:48 +00:00
7d6ab0ceb2 Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead.) 2024-12-26 20:08:30 +00:00
987c9ae076 Move custom autocast modules to separate files in a custom_modules/ directory. 2024-12-24 22:21:31 +00:00
7214d4969b Workaround a weird quirk of QuantState.to() and add a unit test to exercise it. 2024-12-24 14:32:11 +00:00
f8a6accf8a Fix bitsandbytes imports to avoid ImportErrors on MacOS. 2024-12-24 14:32:11 +00:00
f8ab414f99 Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded. 2024-12-24 14:32:11 +00:00
c6795a1b47 Make CachedModelWithPartialLoad work with models that have non-persistent buffers. 2024-12-24 14:32:11 +00:00
0a8fc74ae9 Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules. 2024-12-24 14:32:11 +00:00
dc54e8763b Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers. 2024-12-24 14:32:11 +00:00
1b56020876 Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers. 2024-12-24 14:32:11 +00:00
fe0ef2c27c Add torch module autocast utilities. 2024-12-24 14:32:11 +00:00
55b13c1da3 (minor) Add TODO comment regarding the location of get_model_cache_key(). 2024-12-24 14:23:19 +00:00
7dc3e0fdbe Get rid of ModelLocker. It was an unnecessary layer of indirection. 2024-12-24 14:23:18 +00:00
a39bcf7e85 Move lock(...) and unlock(...) logic from ModelLocker to the ModelCache and make a bunch of ModelCache properties/methods private. 2024-12-24 14:23:18 +00:00
a7c72992a6 Pull get_model_cache_key(...) out of ModelCache. The ModelCache should not be concerned with implementation details like the submodel_type. 2024-12-24 14:23:18 +00:00
d30a9ced38 Rename model_cache_default.py -> model_cache.py. 2024-12-24 14:23:18 +00:00
e0bfa6157b Remove ModelCacheBase. 2024-12-24 14:23:18 +00:00
83ea6420e2 Move CacheStats to its own file. 2024-12-24 14:23:18 +00:00
ce11a1952e Move CacheRecord out to its own file. 2024-12-24 14:23:18 +00:00
e48dee4c4a Rip out ModelLockerBase. 2024-12-24 14:23:18 +00:00
0c2f96daf1 add probe for ControlLoRA x diffusers 2024-12-17 14:01:41 -05:00
c9b2cce627 Add diffusers config object for control loras 2024-12-17 14:01:41 -05:00
401fb392b8 add FLUX control loras to starter models 2024-12-17 09:29:21 -05:00
7fad4c9491 Rename LoRAModelRaw to ModelPatchRaw. 2024-12-17 13:20:19 +00:00
41664f88db Rename backend/patches/conversions/ to backend/patches/lora_conversions/ 2024-12-17 13:20:19 +00:00
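
Several of the commits above describe switching the dynamic VRAM cache limit to be derived from torch.cuda.memory_allocated() rather than torch.cuda.memory_reserved(), so the limit reflects tensors actually in use instead of the allocator's reserved pool. The following is a minimal sketch of that idea, not InvokeAI's actual ModelCache code; the function name and the fixed headroom value are illustrative assumptions.

```python
import torch

def dynamic_vram_cache_limit(device: torch.device, headroom_bytes: int = 2 * 2**30) -> int:
    """Return a conservative byte budget for cached model weights on `device`.

    Uses torch.cuda.memory_allocated() (bytes held by live tensors) rather than
    torch.cuda.memory_reserved() (allocator pool size), so the budget shrinks as
    soon as other allocations appear. A fixed headroom is reserved for the
    working memory of expensive operations.
    """
    if device.type != "cuda":
        return 0
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)
    return max(0, total - allocated - headroom_bytes)
```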
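The custom-module commits (CustomInvokeLinear8bitLt, CustomInvokeLinearNF4, CachedModelWithPartialLoad, the autocast utilities) revolve around streaming weights from the CPU to the GPU at forward time so that partially loaded models can still run. The sketch below shows only the general technique; StreamingLinear is a hypothetical name, and it deliberately omits the quantized-layer and patching support the real custom modules provide.

```python
import torch

class StreamingLinear(torch.nn.Linear):
    """Linear layer whose weights may live on the CPU; they are copied to the
    input's device for the duration of each forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.weight.to(x.device, non_blocking=True)
        bias = self.bias.to(x.device, non_blocking=True) if self.bias is not None else None
        return torch.nn.functional.linear(x, weight, bias)

# Usage: keep the layer's parameters on the CPU and feed it a CUDA tensor; the
# weights are transferred per call rather than resident in VRAM.
# layer = StreamingLinear(4096, 4096)
# y = layer(torch.randn(1, 4096, device="cuda"))
```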