Commit Graph

911 Commits

Author SHA1 Message Date
Lincoln Stein
84f5cbdd97 make choose_torch_dtype() usable outside an invocation context 2024-04-16 19:19:19 -04:00
Lincoln Stein
edac01d4fb reverse stupid hack 2024-04-16 18:13:59 -04:00
Lincoln Stein
eaadc55c7d make pause/resume work in multithreaded environment 2024-04-16 16:55:56 -04:00
Lincoln Stein
89f8326c0b Merge branch 'lstein/feat/multi-gpu' of github.com:invoke-ai/InvokeAI into lstein/feat/multi-gpu 2024-04-16 16:27:08 -04:00
Lincoln Stein
99558de178 device selection calls go through TorchDevice 2024-04-16 16:26:58 -04:00
Lincoln Stein
77130f108d
Merge branch 'main' into lstein/feat/multi-gpu 2024-04-16 16:14:27 -04:00
Lincoln Stein
371f5bc782 simplify logic for retrieving execution devices 2024-04-16 15:52:03 -04:00
Lincoln Stein
fb9b7fb63a make object_serializer._new_name() thread-safe; add max_threads config 2024-04-16 15:23:49 -04:00
Lincoln Stein
bd833900a3 add tid to cache name to avoid non-safe uuid4 on windows 2024-04-16 15:02:06 -04:00
Lincoln Stein
fce6b3e44c maybe solve race issue 2024-04-16 13:09:26 +10:00
Lincoln Stein
a84f3058e2 revert object_serializer_forward_cache.py 2024-04-15 22:28:48 -04:00
Lincoln Stein
f7436f3bae fixup config_default; patch TorchDevice to work dynamically 2024-04-15 22:15:50 -04:00
Lincoln Stein
7dd93cb810 fix merge issues; likely nonfunctional 2024-04-15 21:16:21 -04:00
Lincoln Stein
e93f4d632d
[util] Add generic torch device class (#6174)
* introduce new abstraction layer for GPU devices

* add unit test for device abstraction

* fix ruff

* convert TorchDeviceSelect into a stateless class

* move logic to select context-specific execution device into context API

* add mock hardware environments to pytest

* remove dangling mocker fixture

* fix unit test for running on non-CUDA systems

* remove unimplemented get_execution_device() call

* remove autocast precision

* Multiple changes:

1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to
   context.models.get_execution_device().
2. Rename TorchDeviceSelect to TorchDevice
3. Added back the legacy public API defined in `invocation_api`, including
   choose_precision().
4. Added a config file migration script to accommodate removal of precision=autocast.

* add deprecation warnings to choose_torch_device() and choose_precision()

* fix test crash

* remove app_config argument from choose_torch_device() and choose_torch_dtype()

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-15 13:12:49 +00:00
psychedelicious
b18442ded4 fix(queue): poll queue on finished queue item
When a queue item is finished (completed, canceled, failed), immediately poll the queue for the next queue item.

Closes #6189
2024-04-12 07:31:47 +10:00
Lincoln Stein
dedf0c6ffa fix ruff issues 2024-04-12 07:19:16 +10:00
Lincoln Stein
579082ac10 [mm] clear the cache entry for a model that got an OOM during loading 2024-04-12 07:19:16 +10:00
fieldOfView
dca30d5462 (feat) add a method to get the path of an image from the invocation context
Fixes #6175
2024-04-08 18:42:55 +10:00
Lincoln Stein
812f10730f
adjust free vram calculation for models that will be removed by lazy offloading (#6150)
Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-04 22:51:12 -04:00
psychedelicious
8c15d14099 fix: use locale encoding
We have had a few bugs with v4 related to file encodings, especially on Windows.

Windows uses its own character encodings instead of `utf-8`, often `cp1252`. Some characters cannot be decoded using `utf-8`, causing `UnicodeDecodeError`.

There are a couple places where this can cause problems:
- In the installer bootstrap, we install or upgrade `pip` and decode the result, using `subprocess`.

  The input to this includes the user's home dir. In #6105, the user had one of the problematic characters in their username. `subprocess` attempts and fails to decode the username, which crashes the installer.

  To fix this, we need to use `locale.getpreferredencoding()` when executing the command.
- Similarly, in the model install service and config class, we attempt to load a yaml config file. If a problematic character is in the path to the file (which often includes the user's home dir), we can get the same error.

  One example is  #6129 in which the models.yaml migration fails.

  To fix this, we need to open the file with `locale.getpreferredencoding()`.
2024-04-04 15:30:47 +11:00
psychedelicious
9c51abb46e fix(config): get root from venv
This logic was a bit wonky. It only selected the `venv` parent if there was already an `invokeai.yaml` file in it. Removed this constraint.
2024-04-04 10:54:23 +11:00
psychedelicious
7ff2371c07 fix(mm): do not rename model file if model record is renamed
Renaming the model file to the model name introduces unnecessary contraints on model names.

For example, a model name can technically be any length, but a model _filename_ cannot be too long.

There are also constraints on valid characters for filenames which shouldn't be applied to model record names.

I believe the old behaviour is a holdover from the old system.
2024-04-04 07:17:38 +11:00
psychedelicious
e655399324 fix(config): handle windows paths in invokeai.yaml migration for legacy_conf_dir
The logic incorrectly set the `legacy_conf_dir` on windows, where the slashes go the other direction. Handle this case and update tests to catch it.
2024-04-02 08:06:59 -04:00
psychedelicious
f75de8a35c feat(db): add migration 9 - empty session queue
Empties the session queue. This is done to prevent any lingering session queue items from causing pydantic errors due to changed schemas.
2024-04-02 13:25:14 +11:00
Lincoln Stein
9adb15f86c working but filled with debug statements 2024-04-01 18:44:24 -04:00
psychedelicious
4049217728 feat(db): back up database before running migrations
Just in case.
2024-04-02 09:10:53 +11:00
Lincoln Stein
3d69372785 implement session-level reservation of gpus 2024-04-01 16:01:43 -04:00
Lincoln Stein
9df0980c46 parallel processing working on single-GPU, not tested on multi 2024-04-01 00:07:47 -04:00
psychedelicious
f83edcf990 feat(nodes): simplify processor loop with an early continue
Prefer an early return/continue to reduce the indentation of the processor loop. Easier to read.

There are other ways to improve its structure but at first glance, they seem to involve changing the logic in scarier ways.
2024-04-01 08:39:25 +11:00
psychedelicious
a6dd50aeaf fix(nodes): 100% cpu usage when processor paused
Should be waiting on the resume event instead of checking it in a loop
2024-04-01 08:39:25 +11:00
Lincoln Stein
cef51ad80d Merge branch 'psyche/fix/nodes/processor-cpu-usage' into lstein/feat/multi-gpu 2024-03-31 17:05:23 -04:00
Lincoln Stein
83356ec74c fix merge conflicts 2024-03-31 17:04:57 -04:00
psychedelicious
32d3e4dc5c feat(nodes): simplify processor loop with an early continue
Prefer an early return/continue to reduce the indentation of the processor loop. Easier to read.

There are other ways to improve its structure but at first glance, they seem to involve changing the logic in scarier ways.
2024-04-01 07:55:42 +11:00
Lincoln Stein
a1dcab9c38 remove references to vram_cache in tests 2024-03-31 16:52:01 -04:00
psychedelicious
bd9b00a6bf fix(nodes): 100% cpu usage when processor paused
Should be waiting on the resume event instead of checking it in a loop
2024-04-01 07:45:36 +11:00
Lincoln Stein
eaa2c68693 remove vram_cache and don't move VRAM models back into CPU 2024-03-31 16:37:13 -04:00
Lincoln Stein
1badf0f32f refactor if/else logic slightly 2024-03-31 12:42:39 -04:00
Lincoln Stein
3c9c58e0fa fix 100% CPU load in session_processor_default._process() 2024-03-31 12:42:39 -04:00
psychedelicious
9a1b35fa37 fix(queue): pause & resume
This must not have been tested after the processors were unified. Needed to shift the logic around so the resume event is handled correctly. Clear and easy fix.
2024-03-30 08:25:33 -04:00
Lincoln Stein
5be69f191d remove debug statement 2024-03-29 17:37:04 -04:00
Lincoln Stein
0ac1c0f339 use is_relative_to() rather than relying on string matching to determine relative directory positioning 2024-03-29 10:56:06 -04:00
Lincoln Stein
c308654442 migrate legacy conf files that were incorrectly relative to root 2024-03-29 10:56:06 -04:00
psychedelicious
b0ffe36d21 feat(mm): update v3 models.yaml migration logic to handle relative paths for legacy config files 2024-03-29 10:56:06 -04:00
psychedelicious
6b3fdb8a93 fix(mm): handle relative model paths in _register_orphaned_models 2024-03-29 10:56:06 -04:00
psychedelicious
7639e05dd2 feat(mm): add migration for RC users to migrate their dbs 2024-03-29 10:56:06 -04:00
psychedelicious
6d261a5a13 fix(mm): handle relative conversion config paths
I have tested main, controlnet and vae checkpoint conversions.
2024-03-29 10:56:06 -04:00
psychedelicious
c5d1bd1360 feat(mm): use relative paths for invoke-managed models
We switched all model paths to be absolute in #5900. In hindsight, this is a mistake, because it makes the `models_dir` non-portable.

This change reverts to the previous model pathing:
- Invoke-managed models (in the `models_dir`) are stored with relative paths
- Non-invoke-managed models (outside the `models_dir`, i.e. in-place installed models) still have absolute paths.

## Why absolute paths make things non-portable

Let's say my `models_dir` is `/media/rhino/invokeai/models/`. In the DB, all model paths will be absolute children of this path, like this:

- `/media/rhino/invokeai/models/sd-1/main/model1.ckpt`

I want to change my `models_dir` to `/home/bat/invokeai/models/`. I update my `invokeai.yaml` file and physically move the files to that directory.

On startup, the app checks for missing models. Because all of my model paths were absolute, they now point to a nonexistent path. All models are broken.

There are a couple options to recover from this situation, neither of which are reasonable:

1. The user must manually update every model's path. Unacceptable UX.
2. On startup, we check for missing models. For each missing model, we compare its path with the last-known models dir. If there is a match, we replace that portion of the path with the new models dir. Then we re-check to see if the path exists. If it does, we update the models DB entry. Brittle and requires a new DB entry for last-known models dir.

It's better to use relative paths for Invoke-managed models.
2024-03-29 10:56:06 -04:00
Lincoln Stein
3409711ed3 close #6080 2024-03-28 22:51:45 -04:00
brandonrising
43bcedee10 Run ruff 2024-03-29 08:45:34 +11:00
brandonrising
98cc9b963c Only cancel session processor if current generating queue item is cancelled 2024-03-29 08:45:34 +11:00