52a8ad1c18  2025-04-10 09:53:03 +10:00  chore: rename model.size to model.file_size to disambiguate from RAM size or pixel size
98260a8efc  2025-04-10 09:53:03 +10:00  test: add size field to test model configs
182580ff69  2025-03-26 12:55:10 +11:00  Imports
8e9d5c1187  2025-03-26 12:30:31 +11:00  Ruff formatting
99aac5870e  2025-03-26 12:27:00 +11:00  Remove star imports
5357d6e08e  2025-01-28 14:51:35 +00:00  Rename ConcatenatedLoRALayer to MergedLayerPatch, and other minor cleanup.
28514ba59a  2025-01-28 14:51:35 +00:00  Update ConcatenatedLoRALayer to work with all sub-layer types.
e2f05d0800  2025-01-22 09:20:40 +11:00  Add unit tests for LoKR patch layers. The new tests trigger a bug when LoKR layers are applied to BnB-quantized layers (also impacts several other LoRA variant types).
36a3869af0  2025-01-16 15:35:25 +00:00  Add keep_ram_copy_of_weights config option.
c76d08d1fd  2025-01-16 15:08:23 +00:00  Add keep_ram_copy option to CachedModelOnlyFullLoad.
04087c38ce  2025-01-16 14:51:44 +00:00  Add keep_ram_copy option to CachedModelWithPartialLoad.
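The keep_ram_copy behaviour described in the three commits above can be sketched roughly as follows. This is a minimal illustration with hypothetical names (CachedWeights, load_to_device, offload), not the actual CachedModelWithPartialLoad API:

```python
class CachedWeights:
    """Sketch of a keep_ram_copy option: retain the CPU weights after loading
    to the GPU so that offloading is a cheap drop of the device copy instead
    of a device-to-host copy. Illustrative names only."""

    def __init__(self, weights: dict, keep_ram_copy: bool = True):
        self._cpu_weights = weights      # canonical CPU-resident copy
        self._device_weights = None      # populated on load
        self._keep_ram_copy = keep_ram_copy

    def load_to_device(self, copy_fn):
        # copy_fn stands in for tensor.to(device); any callable works here.
        self._device_weights = {k: copy_fn(v) for k, v in self._cpu_weights.items()}
        if not self._keep_ram_copy:
            # Trade RAM savings for an extra copy-back on offload.
            self._cpu_weights = None

    def offload(self, copy_back_fn):
        if self._keep_ram_copy:
            # The CPU copy was kept, so offload is just dropping the device copy.
            self._device_weights = None
        else:
            self._cpu_weights = {k: copy_back_fn(v) for k, v in self._device_weights.items()}
            self._device_weights = None
```

The trade-off is RAM usage against offload latency: keeping the copy roughly doubles host memory for the model but makes eviction from the device nearly free.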
974b4671b1  2025-01-07 16:45:29 +00:00  Deprecate the ram and vram configs to make the migration to dynamic memory limits smoother for users who had previously overridden these values.
d7ab464176  2025-01-07 02:53:44 +00:00  Offload the current model when locking if it is already partially loaded and we have insufficient VRAM.
5eafe1ec7a  2025-01-07 01:20:15 +00:00  Fix ModelCache execution device selection in unit tests.
a167632f09  2025-01-07 01:14:20 +00:00  Calculate model cache size limits dynamically based on the available RAM / VRAM.
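Dynamic sizing of the kind described in a167632f09 can be sketched as a pure function. This is a hypothetical helper, not the project's implementation; the fractions are assumed defaults, and in practice the inputs would come from sources such as psutil.virtual_memory().available and torch.cuda.mem_get_info():

```python
def dynamic_cache_limits(
    available_ram: int,
    free_vram: int,
    ram_fraction: float = 0.5,
    vram_fraction: float = 0.9,
) -> tuple[int, int]:
    """Derive cache size limits (in bytes) from the memory that is free right
    now, rather than from fixed, user-configured 'ram'/'vram' values."""
    return int(available_ram * ram_fraction), int(free_vram * vram_fraction)
```

Because the limits are recomputed from live measurements, the cache adapts to other processes' memory pressure instead of relying on a value the user set once.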
402dd840a1  2025-01-07 00:31:00 +00:00  Add seed to flaky unit test.
d0bfa019be  2025-01-07 00:31:00 +00:00  Add 'enable_partial_loading' config flag.
535e45cedf  2025-01-07 00:30:58 +00:00  First pass at adding partial loading support to the ModelCache.
9a0a226ce1  2024-12-30 10:41:48 -05:00  Fix bitsandbytes imports in unit tests on macOS.
52fc5a64d4  2024-12-29 17:14:55 +00:00  Add a unit test for a LoRA patch applied to a quantized linear layer with weights streamed from CPU to GPU.
a8bef59699  2024-12-29 17:01:37 +00:00  First pass at making custom layer patches work with weights streamed from the CPU to the GPU.
6d49ee839c  2024-12-29 01:18:30 +00:00  Switch the LayerPatcher to use 'custom modules' to manage layer patching.
918f541af8  2024-12-28 20:44:48 +00:00  Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer.
93e76b61d6  2024-12-28 20:33:38 +00:00  Add CustomFluxRMSNorm layer.
f2981979f9  2024-12-27 22:00:22 +00:00  Get custom layer patches working with all quantized linear layer types.
ef970a1cdc  2024-12-27 21:00:47 +00:00  Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it.
5ee7405f97  2024-12-27 19:47:21 +00:00  Add more unit tests for custom module LoRA patching: multiple LoRAs and ConcatenatedLoRALayers.
e24e386a27  2024-12-27 18:57:13 +00:00  Add support for patches to CustomModuleMixin and add a single unit test (more to come).
b06d61e3c0  2024-12-27 16:29:48 +00:00  Improve custom layer wrap/unwrap logic.
7d6ab0ceb2  2024-12-26 20:08:30 +00:00  Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead).
9692a36dd6  2024-12-26 19:41:25 +00:00  Use a fixture to parameterize tests in test_all_custom_modules.py so that a fresh instance of the layer under test is initialized for each test.
b0b699a01f  2024-12-26 18:45:56 +00:00  Add unit test to verify that isinstance(...) behaves as expected with custom module types.
a8b2c4c3d2  2024-12-26 18:33:46 +00:00  Add inference tests for all custom module types (i.e. to test autocasting from CPU to device).
03944191db  2024-12-24 22:29:11 +00:00  Split test_autocast_modules.py into separate test files to mirror the source file structure.
987c9ae076  2024-12-24 22:21:31 +00:00  Move custom autocast modules to separate files in a custom_modules/ directory.
0fc538734b  2024-12-24 14:32:11 +00:00  Skip flaky test when running on GitHub Actions, and further reduce peak unit test memory.
7214d4969b  2024-12-24 14:32:11 +00:00  Work around a quirk of QuantState.to() and add a unit test to exercise it.
a83a999b79  2024-12-24 14:32:11 +00:00  Reduce peak memory used for unit tests.
f8a6accf8a  2024-12-24 14:32:11 +00:00  Fix bitsandbytes imports to avoid ImportErrors on macOS.
f8ab414f99  2024-12-24 14:32:11 +00:00  Add CachedModelOnlyFullLoad to mirror CachedModelWithPartialLoad for models that cannot or should not be partially loaded.
c6795a1b47  2024-12-24 14:32:11 +00:00  Make CachedModelWithPartialLoad work with models that have non-persistent buffers.
0a8fc74ae9  2024-12-24 14:32:11 +00:00  Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules.
dc54e8763b  2024-12-24 14:32:11 +00:00  Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers.
1b56020876  2024-12-24 14:32:11 +00:00  Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers.
97d56f7dc9  2024-12-24 14:32:11 +00:00  Add torch module autocast unit test for GGUF-quantized models.
fe0ef2c27c  2024-12-24 14:32:11 +00:00  Add torch module autocast utilities.
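The streaming idea behind the autocast modules in the commits above can be sketched as a torch layer that copies its CPU-resident weights to the input's device at forward time. This is a minimal illustration; AutocastLinear is a hypothetical name, not the project's actual custom module, and the real utilities also handle quantized layer types:

```python
import torch

class AutocastLinear(torch.nn.Linear):
    """Linear layer that streams its weights to the active device at forward
    time. The parameters stay resident on the CPU; each forward() copies them
    to whatever device the input tensor lives on. Illustrative sketch only."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Copy weights to the input's device on the fly (a no-op on CPU-only
        # runs). This trades per-forward transfer cost for lower VRAM residency.
        weight = self.weight.to(x.device, non_blocking=True)
        bias = self.bias.to(x.device, non_blocking=True) if self.bias is not None else None
        return torch.nn.functional.linear(x, weight, bias)
```

This per-forward transfer is the runtime overhead that the enabling/disabling flag mentioned in commit 7d6ab0ceb2 exists to avoid when a model is fully loaded on the device.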
d30a9ced38  2024-12-24 14:23:18 +00:00  Rename model_cache_default.py -> model_cache.py.
e0bfa6157b  2024-12-24 14:23:18 +00:00  Remove ModelCacheBase.
fef26a5f2f  2024-09-15 04:39:56 +03:00  Consolidate all LoRA patching logic in the LoRAPatcher.
92b8477299  2024-09-15 04:39:56 +03:00  Fixup FLUX LoRA unit tests.