Commit Graph

1643 Commits

Author SHA1 Message Date
Lincoln Stein
97a7f51721
don't use cpu state_dict for model unpatching when executing on cpu (#6631)
Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-07-18 15:34:01 -04:00
Ryan Dick
f866b49255 Add some ESRGAN and SwinIR upscale models to the starter models list. 2024-07-16 15:55:10 -04:00
Ryan Dick
6b0ca88177
Merge branch 'main' into ryan/spandrel-upscale-tiling 2024-07-16 15:40:14 -04:00
Ryan Dick
81991e072b Merge branch 'main' into ryan/spandrel-upscale 2024-07-16 15:14:08 -04:00
psychedelicious
38343917f8 fix(backend): revert non-blocking device transfer
In #6490 we enabled non-blocking torch device transfers throughout the model manager's memory management code. When using this torch feature, torch attempts to wait until the tensor transfer has completed before allowing any access to the tensor. Theoretically, that should make this a safe feature to use.

This provides a small performance improvement but causes race conditions in some situations. Specific platforms/systems are affected, and complicated data dependencies can make this unsafe.

- Intermittent black images on MPS devices - reported on discord and #6545, fixed with special handling in #6549.
- Intermittent OOMs and black images on a P4000 GPU on Windows - reported in #6613, fixed in this commit.

On my system, I haven't experience any issues with generation, but targeted testing of non-blocking ops did expose a race condition when moving tensors from CUDA to CPU.

One workaround is to use torch streams with manual sync points. Our application logic is complicated enough that this would be a lot of work and feels ripe for edge cases and missed spots.

Much safer is to fully revert non-locking - which is what this change does.
2024-07-16 08:59:42 +10:00
psychedelicious
28e79c4c5e chore: ruff
Looks like an upstream change to ruff resulted in this file being a violation.
2024-07-15 14:05:04 +10:00
Ryan Dick
ab775726b7 Add tiling support to the SpoandrelImageToImage node. 2024-07-10 14:25:19 -04:00
Ryan Dick
650902dc29 Fix broken unit test caused by non-existent model path. 2024-07-10 13:59:17 -04:00
Ryan Dick
7b5d4935b4 Merge branch 'main' into ryan/spandrel-upscale 2024-07-09 13:47:11 -04:00
Ryan Dick
af63c538ed Demote error log to warning to models treated as having size 0. 2024-07-09 08:35:43 -04:00
Ryan Dick
0ce6ec634d Do not assign the result of SpandrelImageToImageModel.load_from_file(...) during probe to ensure that the model is immediately gc'd. 2024-07-05 14:05:12 -04:00
Ryan Dick
35f8781ea2 Fix static type errors with SCHEDULER_NAME_VALUES. And, avoid bi-directional cross-directory imports, which contribute to circular import issues. 2024-07-05 07:38:35 -07:00
Ryan Dick
36202d6d25 Delete unused duplicate libc_util.py file. The active version is at invokeai/backend/model_manager/libc_util.py. 2024-07-04 10:30:40 -04:00
Ryan Dick
1d449097cc Apply ruff rule to disallow all relative imports. 2024-07-04 09:35:37 -04:00
Ryan Dick
9da5925287 Add ruff rule to disallow relative parent imports. 2024-07-04 09:35:37 -04:00
Ryan Dick
414750a45d Update calc_model_size_by_data(...) to handle all expected model types, and to log an error if an unexpected model type is received. 2024-07-04 09:08:25 -04:00
Ryan Dick
a405f14ea2 Fix SpandrelImageToImageModel size calculation for the model cache. 2024-07-03 16:38:16 -04:00
Ryan Dick
114320ee69 (minor) typo 2024-07-03 16:28:21 -04:00
Ryan Dick
6161aa73af Move pil_to_tensor() and tensor_to_pil() utilities to the SpandrelImageToImage class. 2024-07-03 16:28:21 -04:00
Ryan Dick
1ab20f43c8 Tidy spandrel model probe logic, and document the reasons behind the current implementation. 2024-07-03 16:28:21 -04:00
Ryan Dick
29c8ddfb88 WIP - A bunch of boilerplate to support Spandrel Image-to-Image models throughout the model manager and the frontend. 2024-07-03 16:28:21 -04:00
Ryan Dick
95079dc7d4 Use a ModelIdentifierField to identify the spandrel model in the UpscaleSpandrelInvocation. 2024-07-03 16:28:21 -04:00
Ryan Dick
2a1514272f Set the dtype correctly for SpandrelImageToImageModels when they are loaded. 2024-07-03 16:28:21 -04:00
Ryan Dick
59ce9cf41c WIP - Begin to integrate SpandreImageToImageModel type into the model manager. 2024-07-03 16:28:21 -04:00
Ryan Dick
e6abea7bc5 (minor) Remove redundant else clause on a for-loop with no break statement. 2024-07-03 16:28:21 -04:00
Ryan Dick
c335f92345 (minor) simplify startswith(...) syntax. 2024-07-03 16:28:21 -04:00
Ryan Dick
e4813f800a Update calc_model_size_by_data(...) to handle all expected model types, and to log an error if an unexpected model type is received. 2024-07-02 21:51:45 -04:00
Ryan Dick
3752509066 Expose the VAE tile_size on the VAE encode and decode invocations. 2024-07-02 09:07:03 -04:00
Ryan Dick
79640ba14e Add context manager for overriding VAE tiling params. 2024-07-02 09:07:03 -04:00
Kent Keirsey
5df2a79549 Update starter models 2024-06-28 17:49:45 +10:00
Kent Keirsey
10b9088312 update controlnet starter models 2024-06-28 17:49:45 +10:00
Lincoln Stein
3e0fb45dd7
Load single-file checkpoints directly without conversion (#6510)
* use model_class.load_singlefile() instead of converting; works, but performance is poor

* adjust the convert api - not right just yet

* working, needs sql migrator update

* rename migration_11 before conflict merge with main

* Update invokeai/backend/model_manager/load/model_loaders/stable_diffusion.py

Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>

* Update invokeai/backend/model_manager/load/model_loaders/stable_diffusion.py

Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>

* implement lightweight version-by-version config migration

* simplified config schema migration code

* associate sdxl config with sdxl VAEs

* remove use of original_config_file in load_single_file()

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: Ryan Dick <ryanjdick3@gmail.com>
2024-06-27 17:31:28 -04:00
Ryan Dick
14775cc9c4 ruff format 2024-06-27 09:45:13 -04:00
psychedelicious
c7562dd6c0
fix(backend): mps should not use non_blocking
We can get black outputs when moving tensors from CPU to MPS. It appears MPS to CPU is fine. See:
- https://github.com/pytorch/pytorch/issues/107455
- https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/28

Changes:
- Add properties for each device on `TorchDevice` as a convenience.
- Add `get_non_blocking` static method on `TorchDevice`. This utility takes a torch device and returns the flag to be used for non_blocking when moving a tensor to the device provided.
- Update model patching and caching APIs to use this new utility.

Fixes: #6545
2024-06-27 19:15:23 +10:00
Ryan Dick
9a3b8c6fcb Fix handling of init_timestep in StableDiffusionGeneratorPipeline and improve its documentation. 2024-06-26 12:51:51 -04:00
Ryan Dick
bd74b84cc5 Revert "Remove the redundant init_timestep parameter that was being passed around. It is simply the first element of the timesteps array."
This reverts commit fa40061eca.
2024-06-26 12:51:51 -04:00
Brandon Rising
dc23bebebf Run ruff 2024-06-26 21:46:59 +10:00
Kent Keirsey
38b6f90c02 Update prevention exception message 2024-06-26 21:46:59 +10:00
Ryan Dick
cd9dfefe3c Fix inpainting mask shape assertions. 2024-06-25 11:31:52 -07:00
Ryan Dick
e1af78c702 Make the tile_overlap input to MultiDiffusion *strictly* control the amount of overlap rather than being a lower bound. 2024-06-25 11:31:52 -07:00
Ryan Dick
c5588e1ff7 Add TODO comment explaining why some schedulers do not interact well with MultiDiffusion. 2024-06-25 11:31:52 -07:00
Ryan Dick
07ac292680 Consolidate _region_step() function - the separation wasn't really adding any value. 2024-06-25 11:31:52 -07:00
Ryan Dick
7c032ea604 (minor) Fix some documentation typos. 2024-06-25 11:31:52 -07:00
Ryan Dick
fa40061eca Remove the redundant init_timestep parameter that was being passed around. It is simply the first element of the timesteps array. 2024-06-25 11:31:52 -07:00
Ryan Dick
25067e4f0d Delete rough notes. 2024-06-25 11:31:52 -07:00
Ryan Dick
fb0aaa3e6d Fix advanced scheduler behaviour in MultiDiffusionPipeline. 2024-06-25 11:31:52 -07:00
Ryan Dick
c22526b9d0 Fix handling of stateful schedulers in MultiDiffusionPipeline. 2024-06-25 11:31:52 -07:00
Ryan Dick
c881882f73 Connect TiledMultiDiffusionDenoiseLatents to the MultiDiffusionPipeline backend. 2024-06-25 11:31:52 -07:00
Ryan Dick
36473fc52a Remove regional conditioning logic from MultiDiffusionPipeline - it is not yet supported. 2024-06-25 11:31:52 -07:00
Ryan Dick
b9964ecc4a Initial (untested) implementation of MultiDiffusionPipeline. 2024-06-25 11:31:52 -07:00