Commit Graph

20 Commits

SHA1 Message Date
70def37280 Move quantization scripts to a scripts/ subdir. 2024-08-23 18:08:37 +00:00
8af3c72de7 Update docs for T5 quantization script. 2024-08-23 18:07:14 +00:00
6405214940 Remove all references to optimum-quanto and downgrade diffusers. 2024-08-23 18:04:17 +00:00
86e49c423c Fixes to the T5XXL quantization script. 2024-08-23 18:03:23 +00:00
6d838fa997 Add script for quantizing a T5 model. 2024-08-23 18:03:23 +00:00
42bbab74b3 Add docs to the quantization scripts. 2024-08-21 19:08:28 +00:00
203542c7a8 Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint. 2024-08-21 19:08:16 +00:00
7f62033f1f Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM. 2024-08-21 19:08:00 +00:00
e41025ddc7 Move requantize.py to the quantization/ dir. 2024-08-21 18:21:44 +00:00
d11dc6ddd0 Remove duplicate log_time(...) function. 2024-08-21 18:10:24 +00:00
dd24f83d43 Fix styling/lint 2024-08-21 09:10:22 -04:00
115f350f6f Install subdirectories with folders correctly; ensure consistent dtype of tensors in FLUX pipeline and VAE 2024-08-21 09:09:39 -04:00
46b6314482 Run Ruff 2024-08-21 09:06:38 -04:00
46d5107ff1 Run Ruff 2024-08-21 09:06:38 -04:00
6ea1278d22 Manage quantization of models within the loader 2024-08-21 09:06:34 -04:00
d7a39a4d67 WIP on moving from diffusers to FLUX 2024-08-21 08:59:19 -04:00
3e8a550fab More improvements for LLM.int8() - not fully tested. 2024-08-21 08:59:19 -04:00
0e96794c6e LLM.int8() quantization is working, but still some rough edges to solve. 2024-08-21 08:59:19 -04:00
23a7328a66 Clean up NF4 implementation. 2024-08-21 08:59:19 -04:00
cdd47b657b Make quantized loading fast for both T5XXL and FLUX transformer. 2024-08-21 08:59:19 -04:00
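
Several of the commits above reference LLM.int8() and NF4 quantization of the T5-XXL text encoder and the FLUX transformer. As a rough illustration only — not the repository's actual scripts — here is a minimal sketch of quantized loading of a T5 encoder with bitsandbytes via Hugging Face transformers; the checkpoint ID and config values are assumptions:

```python
# Minimal sketch (not this repo's scripts): quantized loading of a T5 encoder
# with bitsandbytes via Hugging Face transformers. The model ID and settings
# below are illustrative assumptions.
from transformers import BitsAndBytesConfig, T5EncoderModel

# LLM.int8(): 8-bit weights with fp16 handling of outlier channels.
int8_config = BitsAndBytesConfig(load_in_8bit=True)

# NF4: 4-bit NormalFloat quantization.
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

model = T5EncoderModel.from_pretrained(
    "google/t5-v1_1-xxl",             # assumed checkpoint; the commits say "T5XXL"
    quantization_config=int8_config,  # or nf4_config
    device_map="auto",
)
```

The tradeoff the commit history hints at: LLM.int8() preserves accuracy by keeping outlier activations in higher precision, while NF4 roughly halves memory again at some quality cost.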