Commit Graph

24 Commits

Author SHA1 Message Date
Ryan Dick
29fe1533f2 Fix bug in InvokeLinear8bitLt that was causing old state information to persist after loading from a state dict. This manifested as state tensors being left on the GPU even when a model had been offloaded to the CPU cache. 2024-08-29 19:08:18 +00:00
Brandon Rising
65bb46bcca Rename params for flux and flux vae, add comments explaining use of the config_path in model config 2024-08-26 20:17:50 -04:00
Ryan Dick
635d2f480d ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
56b9906e2e Setup scaffolding for in progress images and add ability to cancel the flux node 2024-08-26 20:17:50 -04:00
Ryan Dick
dff4a88baa Move quantization scripts to a scripts/ subdir. 2024-08-26 20:17:50 -04:00
Ryan Dick
a21f6c4964 Update docs for T5 quantization script. 2024-08-26 20:17:50 -04:00
Ryan Dick
97562504b7 Remove all references to optimum-quanto and downgrade diffusers. 2024-08-26 20:17:50 -04:00
Ryan Dick
b9dd354e2b Fixes to the T5XXL quantization script. 2024-08-26 20:17:50 -04:00
Ryan Dick
33c2fbd201 Add script for quantizing a T5 model. 2024-08-26 20:17:50 -04:00
Ryan Dick
b66f19d4d1 Add docs to the quantization scripts. 2024-08-26 20:17:50 -04:00
Ryan Dick
4105a78b83 Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint. 2024-08-26 20:17:50 -04:00
Ryan Dick
19a68afb3a Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM. 2024-08-26 20:17:50 -04:00
Ryan Dick
cfac7c8189 Move requantize.py to the quatnization/ dir. 2024-08-26 20:17:50 -04:00
Ryan Dick
ac96f187bd Remove duplicate log_time(...) function. 2024-08-26 20:17:50 -04:00
Brandon Rising
57168d719b Fix styling/lint 2024-08-26 20:17:50 -04:00
Brandon Rising
4bd7fda694 Install sub directories with folders correctly, ensure consistent dtype of tensors in flux pipeline and vae 2024-08-26 20:17:50 -04:00
Brandon Rising
2d9042fb93 Run Ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
9ed53af520 Run Ruff 2024-08-26 20:17:50 -04:00
Brandon Rising
56fda669fd Manage quantization of models within the loader 2024-08-26 20:17:50 -04:00
Ryan Dick
1fa6bddc89 WIP on moving from diffusers to FLUX 2024-08-26 20:17:50 -04:00
Ryan Dick
d3a5ca5247 More improvements for LLM.int8() - not fully tested. 2024-08-26 20:17:50 -04:00
Ryan Dick
f01f56a98e LLM.int8() quantization is working, but still some rough edges to solve. 2024-08-26 20:17:50 -04:00
Ryan Dick
99b0f79784 Clean up NF4 implementation. 2024-08-26 20:17:50 -04:00
Ryan Dick
eeabb7ebe5 Make quantized loading fast for both T5XXL and FLUX transformer. 2024-08-26 20:17:50 -04:00