InvokeAI/invokeai/backend/util
psychedelicious 38343917f8 fix(backend): revert non-blocking device transfer
In #6490 we enabled non-blocking torch device transfers throughout the model manager's memory management code. When using this torch feature, torch attempts to wait until the tensor transfer has completed before allowing any access to the tensor. Theoretically, that should make this a safe feature to use.

This provides a small performance improvement but causes race conditions in some situations. Only certain platforms and systems are affected, and complicated data dependencies can make non-blocking transfers unsafe.

- Intermittent black images on MPS devices - reported on Discord and in #6545, fixed with special handling in #6549.
- Intermittent OOMs and black images on a P4000 GPU on Windows - reported in #6613, fixed in this commit.

On my system, I haven't experienced any issues with generation, but targeted testing of non-blocking ops did expose a race condition when moving tensors from CUDA to CPU.
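
The CUDA-to-CPU case can be illustrated with a minimal sketch. The helper name `copy_to_cpu` is hypothetical and not part of InvokeAI; the sketch assumes PyTorch is installed and simply copies synchronously when the source tensor is not on a CUDA device.

```python
import torch


def copy_to_cpu(t: torch.Tensor, non_blocking: bool = False) -> torch.Tensor:
    """Hypothetical helper (not InvokeAI code): copy a tensor to host memory."""
    out = t.to("cpu", non_blocking=non_blocking)
    if non_blocking and t.is_cuda:
        # A non-blocking device-to-host copy can return before the data has
        # landed in host memory. Reading `out` without this sync is the race.
        torch.cuda.synchronize()
    return out


src = torch.arange(4, dtype=torch.float32)
if torch.cuda.is_available():
    src = src.cuda()

host = copy_to_cpu(src, non_blocking=True)
print(host.tolist())
```

Passing `non_blocking=False` (the default here) avoids the explicit synchronization entirely, at the cost of the small performance gain.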

One workaround is to use torch streams with manual sync points. Our application logic is complicated enough that this would be a lot of work and feels ripe for edge cases and missed spots.
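
For illustration only, a stream-based transfer with a manual sync point might look like the sketch below. `offload_with_stream` is a hypothetical name, not InvokeAI code, and the sketch assumes PyTorch is installed. Every call site that moves tensors would need this kind of care.

```python
import torch


def offload_with_stream(t: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch: async device-to-host copy on a side stream, then sync."""
    if not t.is_cuda:
        # Nothing asynchronous to coordinate for a tensor that is already on the host.
        return t.to("cpu")

    copy_stream = torch.cuda.Stream()
    # The side stream must wait for any queued work that produces `t`.
    copy_stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(copy_stream):
        out = t.to("cpu", non_blocking=True)
    # Tell the caching allocator that `t` is in use on the side stream,
    # so its memory is not reused before the copy completes.
    t.record_stream(copy_stream)
    # Manual sync point: block the host until the copy has finished.
    copy_stream.synchronize()
    return out


x = torch.arange(3)
if torch.cuda.is_available():
    x = x.cuda()
result = offload_with_stream(x)
print(result.tolist())
```

Forgetting either the `wait_stream` call or the final `synchronize` reintroduces the race, which is the kind of missed spot the paragraph above warns about.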

Much safer is to fully revert non-blocking transfers - which is what this change does.
2024-07-16 08:59:42 +10:00
..
__init__.py Apply ruff rule to disallow all relative imports. 2024-07-04 09:35:37 -04:00
attention.py chore: ruff 2024-03-01 10:42:33 +11:00
catch_sigint.py Re-enable app shutdown actions (#6244) 2024-04-19 06:45:42 -04:00
db_maintenance.py Use defaults for db_dir and outdir since config no longer writes defaults to invokeai.yaml 2024-03-28 22:39:48 -04:00
devices.py fix(backend): revert non-blocking device transfer 2024-07-16 08:59:42 +10:00
hotfixes.py Fix ruff? 2024-02-01 20:40:28 -05:00
logging.py partially address --root CLI argument handling 2024-03-19 09:24:28 +11:00
mask.py Add utility to_standard_float_mask(...) to convert various mask formats to a standardized format. 2024-04-09 08:12:12 -04:00
mps_fixes.py add note about discriminated union and Body() issue; blackified 2023-11-12 16:50:05 -05:00
silence_warnings.py Tidy SilenceWarnings context manager: 2024-06-18 15:06:22 -04:00
test_utils.py tidy(mm): remove convenience methods from high level model manager service 2024-03-07 10:56:59 +11:00
util.py Tidy SilenceWarnings context manager: 2024-06-18 15:06:22 -04:00