Commit Graph

139 Commits

Author SHA1 Message Date
38343917f8 fix(backend): revert non-blocking device transfer
In #6490 we enabled non-blocking torch device transfers throughout the model manager's memory management code. When using this torch feature, torch attempts to wait until the tensor transfer has completed before allowing any access to the tensor. Theoretically, that should make this a safe feature to use.

This provides a small performance improvement but causes race conditions in some situations. Specific platforms/systems are affected, and complicated data dependencies can make this unsafe.

- Intermittent black images on MPS devices - reported on discord and #6545, fixed with special handling in #6549.
- Intermittent OOMs and black images on a P4000 GPU on Windows - reported in #6613, fixed in this commit.

On my system, I haven't experience any issues with generation, but targeted testing of non-blocking ops did expose a race condition when moving tensors from CUDA to CPU.

One workaround is to use torch streams with manual sync points. Our application logic is complicated enough that this would be a lot of work and feels ripe for edge cases and missed spots.

Much safer is to fully revert non-locking - which is what this change does.
2024-07-16 08:59:42 +10:00
1d449097cc Apply ruff rule to disallow all relative imports. 2024-07-04 09:35:37 -04:00
c7562dd6c0 fix(backend): mps should not use non_blocking
We can get black outputs when moving tensors from CPU to MPS. It appears MPS to CPU is fine. See:
- https://github.com/pytorch/pytorch/issues/107455
- https://discuss.pytorch.org/t/should-we-set-non-blocking-to-true/38234/28

Changes:
- Add properties for each device on `TorchDevice` as a convenience.
- Add `get_non_blocking` static method on `TorchDevice`. This utility takes a torch device and returns the flag to be used for non_blocking when moving a tensor to the device provided.
- Update model patching and caching APIs to use this new utility.

Fixes: #6545
2024-06-27 19:15:23 +10:00
8e47e005a7 Tidy SilenceWarnings context manager:
- Fix type errors
- Enable SilenceWarnings to be used as both a context manager and a decorator
- Remove duplicate implementation
- Check the initial verbosity on __enter__() rather than __init__()
2024-06-18 15:06:22 -04:00
57c831442e fix safe_filename() on windows 2024-04-28 14:42:40 -04:00
a26667d3ca make download and convert cache keys safe for filename length 2024-04-28 12:24:36 -04:00
bb04f496e0 Merge branch 'main' into lstein/feat/simple-mm2-api 2024-04-28 11:33:26 -04:00
d72f272f16 Address change requests in first round of PR reviews.
Pending:

- Move model install calls into model manager and create passthrus in invocation_context.
- Consider splitting load_model_from_url() into a call to get the path and a call to load the path.
2024-04-24 23:53:30 -04:00
2b9f06dc4c Re-enable app shutdown actions (#6244)
* closes #6242

* only override sigINT during slow model scanning

* fix ruff formatting

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-19 06:45:42 -04:00
e93f4d632d [util] Add generic torch device class (#6174)
* introduce new abstraction layer for GPU devices

* add unit test for device abstraction

* fix ruff

* convert TorchDeviceSelect into a stateless class

* move logic to select context-specific execution device into context API

* add mock hardware environments to pytest

* remove dangling mocker fixture

* fix unit test for running on non-CUDA systems

* remove unimplemented get_execution_device() call

* remove autocast precision

* Multiple changes:

1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to
   context.models.get_execution_device().
2. Rename TorchDeviceSelect to TorchDevice
3. Added back the legacy public API defined in `invocation_api`, including
   choose_precision().
4. Added a config file migration script to accommodate removal of precision=autocast.

* add deprecation warnings to choose_torch_device() and choose_precision()

* fix test crash

* remove app_config argument from choose_torch_device() and choose_torch_dtype()

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-15 13:12:49 +00:00
182810337c Add utility to_standard_float_mask(...) to convert various mask formats to a standardized format. 2024-04-09 08:12:12 -04:00
9ab6655491 feat(backend): clean up choose_precision
- Allow user-defined precision on MPS.
- Use more explicit logic to handle all possible cases.
- Add comments.
- Remove the app_config args (they were effectively unused, just get the config using the singleton getter util)
2024-04-07 09:41:05 -04:00
3681e34d5a Use defaults for db_dir and outdir since config no longer writes defaults to invokeai.yaml 2024-03-28 22:39:48 -04:00
a397fdbd25 chore: ruff 2024-03-27 08:16:27 -04:00
a291a42abc feat: display torch device on startup
This functionality disappeared at some point.
2024-03-27 08:16:27 -04:00
b378cfcb46 cleanup: remove unused scripts, cruft
App runs & tests pass.
2024-03-20 15:05:25 +11:00
d871fca643 partially address --root CLI argument handling
- fix places where `get_config()` is being called at import time rather
  than at run time.

- add regression test for import time get_config() calling.
2024-03-19 09:24:28 +11:00
60492500db chore: ruff 2024-03-19 09:24:28 +11:00
897fe497dc fix(config): use new get_config across the app, use correct settings 2024-03-19 09:24:28 +11:00
7b1f9409bc fix(config): drop nonexistent config.use_cpu setting 2024-03-19 09:24:28 +11:00
afd9ae7712 tidy(mm): remove convenience methods from high level model manager service
These were added as a hold-me-over for the nodes API changes, no longer needed. A followup commit will fix the nodes API to not rely on these.
2024-03-07 10:56:59 +11:00
dd9daf8efb chore: ruff 2024-03-01 10:42:33 +11:00
5a3195f757 final tidying before marking PR as ready for review
- Replace AnyModelLoader with ModelLoaderRegistry
- Fix type check errors in multiple files
- Remove apparently unneeded `get_model_config_enum()` method from model manager
- Remove last vestiges of old model manager
- Updated tests and documentation

resolve conflict with seamless.py
2024-03-01 10:42:33 +11:00
5d612ec095 Tidy names and locations of modules
- Rename old "model_management" directory to "model_management_OLD" in order to catch
  dangling references to original model manager.
- Caught and fixed most dangling references (still checking)
- Rename lora, textual_inversion and model_patcher modules
- Introduce a RawModel base class to simplfy the Union returned by the
  model loaders.
- Tidy up the model manager 2-related tests. Add useful fixtures, and
  a finalizer to the queue and installer fixtures that will stop the
  services and release threads.
2024-03-01 10:42:33 +11:00
db340bc253 fix invokeai_configure script to work with new mm; rename CLIs 2024-03-01 10:42:33 +11:00
78ef946e01 BREAKING CHANGES: invocations now require model key, not base/type/name
- Implement new model loader and modify invocations and embeddings

- Finish implementation loaders for all models currently supported by
  InvokeAI.

- Move lora, textual_inversion, and model patching support into
  backend/embeddings.

- Restore support for model cache statistics collection (a little ugly,
  needs work).

- Fixed up invocations that load and patch models.

- Move seamless and silencewarnings utils into better location
2024-03-01 10:42:33 +11:00
5745ce9c7d Multiple refinements on loaders:
- Cache stat collection enabled.
- Implemented ONNX loading.
- Add ability to specify the repo version variant in installer CLI.
- If caller asks for a repo version that doesn't exist, will fall back
  to empty version rather than raising an error.
2024-03-01 10:42:33 +11:00
67eb715093 loaders for main, controlnet, ip-adapter, clipvision and t2i 2024-03-01 10:42:33 +11:00
8ba5360269 model loading and conversion implemented for vaes 2024-03-01 10:42:33 +11:00
0f8af643d1 chore(backend): rename ModelInfo -> LoadedModelInfo
We have two different classes named `ModelInfo` which might need to be used by API consumers. We need to export both but have to deal with this naming collision.

The `ModelInfo` I've renamed here is the one that is returned when a model is loaded. It's the object least likely to be used by API consumers.
2024-03-01 10:42:33 +11:00
1057314508 Fix ruff? 2024-02-01 20:40:28 -05:00
413fe566b8 Fix imports 2024-02-01 20:40:28 -05:00
c9b5f06c42 Update diffusers + hotfix 2024-02-01 20:40:28 -05:00
a769f93be0 Remove unnecessary change 2024-01-31 07:16:14 -06:00
14efc95707 Allow passing of a civit api key 2024-01-31 07:16:14 -06:00
74e644c4ba Allow bfloat16 to be configurable in invoke.yaml (#5469)
* feat: allow bfloat16 to be configurable in invoke.yaml

* fix: `torch_dtype()` util

- Use `choose_precision` to get the precision string
- Do not reference deprecated `config.full_precision` flat (why does this still exist?), if a user had this enabled it would override their actual precision setting and potentially cause a lot of confusion.

---------

Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
2024-01-12 18:40:37 +00:00
6460dcc7e0 use torch.bfloat16 on cuda systems 2024-01-04 23:25:52 -05:00
2d11d97dad remove MacOS Sonoma check in devices.py (#5312)
* remove MacOS Sonoma check in devices.py

As of pytorch 2.1.0, float16 works with our MPS fixes on Sonoma, so the check is no longer needed.

* remove unused platform import
2023-12-22 00:42:47 +00:00
3bfaee9c57 Merge branch 'main' into refactor/model-manager-3 2023-12-04 22:51:45 -05:00
3c7d1fcd32 clean up get_logger() call 2023-12-04 22:41:59 -05:00
2d2ef5d72c ensure that setting loglevel on one logger doesn't change others 2023-12-02 11:48:51 -05:00
ecd3dcd5df Merge branch 'main' into refactor/model-manager-3 2023-11-27 22:15:51 -05:00
ae82df0fda fix a bunch of type mismatches in the logging module 2023-11-28 09:38:35 +11:00
8ef596eac7 further changes for ruff 2023-11-26 17:13:31 -05:00
8695ad6f59 all features implemented, docs updated, ready for review 2023-11-26 13:18:21 -05:00
19baea1883 all backend features in place; config scanning is failing on controlnet 2023-11-24 19:37:46 -05:00
4465f97cdf Merge branch 'main' into refactor/model-manager-2 2023-11-14 07:51:57 +11:00
04d8f2dfea fix(backend): fix controlnet zip len
Do not use `strict=True` when scaling controlnet conditioning.

When using `guess_mode` (e.g. `more_control` or `more_prompt`), `down_block_res_samples` and `scales` are zipped.

These two objects are of different lengths, so using zip's strict mode raises an error.

In testing, `len(scales) === len(down_block_res_samples) + 1`.

It appears this behaviour is intentional, as the final "extra" item in `scales` is used immediately afterwards.
2023-11-13 15:45:03 +11:00
8afe517204 add note about discriminated union and Body() issue; blackified 2023-11-12 16:50:05 -05:00
2b36565e9e awkward workaround for double-Annotated in model_record route 2023-11-10 21:32:44 -05:00