20 Commits

Author SHA1 Message Date
Ryan Dick
70def37280 Move quantization scripts to a scripts/ subdir. 2024-08-23 18:08:37 +00:00
Ryan Dick
8af3c72de7 Update docs for T5 quantization script. 2024-08-23 18:07:14 +00:00
Ryan Dick
6405214940 Remove all references to optimum-quanto and downgrade diffusers. 2024-08-23 18:04:17 +00:00
Ryan Dick
86e49c423c Fixes to the T5XXL quantization script. 2024-08-23 18:03:23 +00:00
Ryan Dick
6d838fa997 Add script for quantizing a T5 model. 2024-08-23 18:03:23 +00:00
Ryan Dick
42bbab74b3 Add docs to the quantization scripts. 2024-08-21 19:08:28 +00:00
Ryan Dick
203542c7a8 Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint. 2024-08-21 19:08:16 +00:00
Ryan Dick
7f62033f1f Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM. 2024-08-21 19:08:00 +00:00
Ryan Dick
e41025ddc7 Move requantize.py to the quantization/ dir. 2024-08-21 18:21:44 +00:00
Ryan Dick
d11dc6ddd0 Remove duplicate log_time(...) function. 2024-08-21 18:10:24 +00:00
Brandon Rising
dd24f83d43 Fix styling/lint 2024-08-21 09:10:22 -04:00
Brandon Rising
115f350f6f Install sub directories with folders correctly, ensure consistent dtype of tensors in flux pipeline and vae 2024-08-21 09:09:39 -04:00
Brandon Rising
46b6314482 Run Ruff 2024-08-21 09:06:38 -04:00
Brandon Rising
46d5107ff1 Run Ruff 2024-08-21 09:06:38 -04:00
Brandon Rising
6ea1278d22 Manage quantization of models within the loader 2024-08-21 09:06:34 -04:00
Ryan Dick
d7a39a4d67 WIP on moving from diffusers to FLUX 2024-08-21 08:59:19 -04:00
Ryan Dick
3e8a550fab More improvements for LLM.int8() - not fully tested. 2024-08-21 08:59:19 -04:00
Ryan Dick
0e96794c6e LLM.int8() quantization is working, but still some rough edges to solve. 2024-08-21 08:59:19 -04:00
Ryan Dick
23a7328a66 Clean up NF4 implementation. 2024-08-21 08:59:19 -04:00
Ryan Dick
cdd47b657b Make quantized loading fast for both T5XXL and FLUX transformer. 2024-08-21 08:59:19 -04:00