Commit Graph

1026 Commits

Author SHA1 Message Date
Lincoln Stein
ead1748c54 issue a download progress event when install download starts 2024-05-28 19:30:42 -04:00
psychedelicious
21aa42627b feat(events): add dynamic invocation & result validators
This is required to get these event fields to deserialize correctly. If omitted, pydantic uses `BaseInvocation`/`BaseInvocationOutput`, which is not correct.

This is similar to the workaround in the `Graph` and `GraphExecutionState` classes where we need to fanagle pydantic with manual validation handling.
2024-05-28 05:11:37 -07:00
psychedelicious
a4f88ff834 feat(events): add __event_name__ as ClassVar to EventBase
This improves types for event consumers that need to access the event name.
2024-05-28 05:11:37 -07:00
Lincoln Stein
cd12ca6e85 add migration_11; fix typo 2024-05-27 22:40:01 -04:00
Lincoln Stein
34e1eb19f9 merge with main and resolve conflicts 2024-05-27 22:20:34 -04:00
psychedelicious
b50133d5e1 feat(events): register event schemas
This allows for events to be dispatched using dicts as payloads, and have the dicts validated as pydantic schemas.
2024-05-27 11:13:47 +10:00
psychedelicious
bbb90ff949 feat(events): restore whole invocation to event payloads
Removing this is a breaking API change - some consumers of the events need the whole invocation. Didn't realize that until now.
2024-05-27 10:17:02 +10:00
psychedelicious
9d9801b2c2 feat(events): stronger generic typing for event registration 2024-05-27 10:17:02 +10:00
psychedelicious
dfad37a262 docs: update comments & docstrings 2024-05-27 09:06:02 +10:00
psychedelicious
084cf26ed6 refactor: remove all session events
There's no longer any need for session-scoped events now that we have the session queue. Session started/completed/canceled map 1-to-1 to queue item status events, but queue item status events also have an event for failed state.

We can simplify queue and processor handling substantially by removing session events and instead using queue item events.

- Remove the session-scoped events entirely.
- Remove all event handling from session queue. The processor still needs to respond to some events from the queue: `QueueClearedEvent`, `BatchEnqueuedEvent` and `QueueItemStatusChangedEvent`.
- Pass an `is_canceled` callback to the invocation context instead of the cancel event
- Update processor logic to ensure the local instance of the current queue item is synced with the instance in the database. This prevents race conditions and ensures lifecycle callback do not get stale callbacks.
- Update docstrings and comments
- Add `complete_queue_item` method to session queue service as an explicit way to mark a queue item as successfully completed. Previously, the queue listened for session complete events to do this.

Closes #6442
2024-05-27 09:06:02 +10:00
psychedelicious
368127bd25 feat(events): register_events supports single event 2024-05-27 09:06:02 +10:00
psychedelicious
c0aabcd8ea tidy(events): use tuple index access for event payloads 2024-05-27 09:06:02 +10:00
psychedelicious
ed6c716ddc fix(mm): emit correct event when model load complete 2024-05-27 09:06:02 +10:00
psychedelicious
575943d0ad fix(processor): move session started event to session runner 2024-05-27 09:06:02 +10:00
psychedelicious
25d1d2b591 tidy(processor): use separate handlers for each event type
Just a bit clearer without needing `isinstance` checks.
2024-05-27 09:06:02 +10:00
psychedelicious
64d553f72c feat(events): restore temp handling of user/project 2024-05-27 09:06:02 +10:00
psychedelicious
a9f773c03c fix(mm): port changes into new model_install_common file
Some subtle changes happened between this PR's last update and now. Bring them into the file.
2024-05-27 09:06:02 +10:00
psychedelicious
0f733c42fc fix(events): fix denoise progress percentage
- Restore calculation of step percentage but in the backend instead of client
- Simplify signatures for denoise progress event callbacks
- Clean up `step_callback.py` (types, do not recreate constant matrix on every step, formatting)
2024-05-27 09:06:02 +10:00
psychedelicious
d97186dfc8 feat(events): remove payload registry, add method to get event classes
We don't need to use the payload schema registry. All our events are dispatched as pydantic models, which are already validated on instantiation.

We do want to add all events to the OpenAPI schema, and we referred to the payload schema registry for this. To get all events, add a simple helper to EventBase. This is functionally identical to using the schema registry.
2024-05-27 09:06:02 +10:00
psychedelicious
88a2340b95 feat(events): use builder pattern for download events 2024-05-27 09:06:02 +10:00
psychedelicious
567b87cc50 docs(events): update event docstrings 2024-05-27 09:06:02 +10:00
psychedelicious
655f62008f fix(mm): check for presence of invoker before emitting model load event
The model loader emits events. During testing, it doesn't have access to a fully-mocked events service, so the test fails when attempting to call a nonexistent method. There was a check for this previously, but I accidentally removed it. Restored.
2024-05-27 09:06:02 +10:00
psychedelicious
bf03127c69 fix(events): add missing __event_name__ to EventBase 2024-05-27 09:06:02 +10:00
psychedelicious
2dc752ea83 feat(events): simplify event classes
- Remove ABCs, they do not work well with pydantic
- Remove the event type classvar - unused
- Remove clever logic to require an event name - we already get validation for this during schema registration.
- Rename event bases to all end in "Base"
2024-05-27 09:06:02 +10:00
psychedelicious
8d79ce94aa feat(ui): update UI to use new events
- Use OpenAPI schema for event payload types
- Update all event listeners
- Add missing events / remove old nonexistent events
2024-05-27 09:06:02 +10:00
psychedelicious
9bd78823a3 refactor(events): use pydantic schemas for events
Our events handling and implementation has a couple pain points:
- Adding or removing data from event payloads requires changes wherever the events are dispatched from.
- We have no type safety for events and need to rely on string matching and dict access when interacting with events.
- Frontend types for socket events must be manually typed. This has caused several bugs.

`fastapi-events` has a neat feature where you can create a pydantic model as an event payload, give it an `__event_name__` attr, and then dispatch the model directly.

This allows us to eliminate a layer of indirection and some unpleasant complexity:
- Event handler callbacks get type hints for their event payloads, and can use `isinstance` on them if needed.
- Event payload construction is now the responsibility of the event itself (a pydantic model), not the service. Every event model has a `build` class method, encapsulating this logic. The build methods are provided as few args as possible. For example, `InvocationStartedEvent.build()` gets the invocation instance and queue item, and can choose the data it wants to include in the event payload.
- Frontend event types may be autogenerated from the OpenAPI schema. We use the payload registry feature of `fastapi-events` to collect all payload models into one place, making it trivial to keep our schema and frontend types in sync.

This commit moves the backend over to this improved event handling setup.
2024-05-27 09:06:02 +10:00
psychedelicious
50dd569411 fix(processor): race condition that could result in node errors not getting reported
I had set the cancel event at some point during troubleshooting an unrelated issue. It seemed logical that it should be set there, and didn't seem to break anything. However, this is not correct.

The cancel event should not be set in response to a queue status change event. Doing so can cause a race condition when nodes are executed very quickly.

It's possible that a previously-executed session's queue item status change event is handled after the next session starts executing. The cancel event is set and the session runner sees it aborting the session run early.

In hindsight, it doesn't make sense to set the cancel event here either. It should be set in response to user action, e.g. the user cancelled the session or cleared the queue (which implicitly cancels the current session). These events actually trigger the queue item status changed event, so if we set the cancel event here, we'd be setting it twice per cancellation.
2024-05-24 20:02:24 +10:00
psychedelicious
9c926f249f feat(processor): add debug log stmts to session running callbacks 2024-05-24 20:02:24 +10:00
psychedelicious
80faeac913 fix(processor): fix race condition related to clearing the queue 2024-05-24 20:02:24 +10:00
psychedelicious
9117db2673 tidy(queue): delete unused delete_queue_item method 2024-05-24 20:02:24 +10:00
psychedelicious
4a48aa98a4 chore: ruff 2024-05-24 20:02:24 +10:00
psychedelicious
e365d35c93 docs(processor): update docstrings, comments 2024-05-24 20:02:24 +10:00
psychedelicious
2dd3a85ade feat(processor): update enriched errors & fail_queue_item() 2024-05-24 20:02:24 +10:00
psychedelicious
a8492bd7e4 feat(events): add enriched errors to events 2024-05-24 20:02:24 +10:00
psychedelicious
25954ea750 feat(queue): session queue error handling
- Add handling for new error columns `error_type`, `error_message`, `error_traceback`.
- Update queue item model to include the new data. The `error_traceback` field has an alias of `error` for backwards compatibility.
- Add `fail_queue_item` method. This was previously handled by `cancel_queue_item`. Splitting this functionality makes failing a queue item a bit more explicit. We also don't need to handle multiple optional error args.
-
2024-05-24 20:02:24 +10:00
psychedelicious
887b73aece feat(db): add error_type, error_message, rename error -> error_traceback to session_queue table 2024-05-24 20:02:24 +10:00
psychedelicious
3c41c67d13 fix(processor): restore missing update of session 2024-05-24 20:02:24 +10:00
psychedelicious
6c79be7dc3 chore: ruff 2024-05-24 20:02:24 +10:00
psychedelicious
097619ef51 feat(processor): get user/project from queue item w/ fallback 2024-05-24 20:02:24 +10:00
psychedelicious
a1f7a9cd6f fix(app): fix logging of error classes instead of class names 2024-05-24 20:02:24 +10:00
psychedelicious
25b9c19eed feat(app): handle preparation errors as node errors
We were not handling node preparation errors as node errors before. Here's the explanation, copied from a comment that is no longer required:

---

TODO(psyche): Sessions only support errors on nodes, not on the session itself. When an error occurs outside
node execution, it bubbles up to the processor where it is treated as a queue item error.

Nodes are pydantic models. When we prepare a node in `session.next()`, we set its inputs. This can cause a
pydantic validation error. For example, consider a resize image node which has a constraint on its `width`
input field - it must be greater than zero. During preparation, if the width is set to zero, pydantic will
raise a validation error.

When this happens, it breaks the flow before `invocation` is set. We can't set an error on the invocation
because we didn't get far enough to get it - we don't know its id. Hence, we just set it as a queue item error.

---

This change wraps the node preparation step with exception handling. A new `NodeInputError` exception is raised when there is a validation error. This error has the node (in the state it was in just prior to the error) and an identifier of the input that failed.

This allows us to mark the node that failed preparation as errored, correctly making such errors _node_ errors and not _processor_ errors. It's much easier to diagnose these situations. The error messages look like this:

> Node b5ac87c6-0678-4b8c-96b9-d215aee12175 has invalid incoming input for height

Some of the exception handling logic is cleaned up.
2024-05-24 20:02:24 +10:00
psychedelicious
cc2d877699 docs(app): explain why errors are handled poorly 2024-05-24 20:02:24 +10:00
psychedelicious
be82404759 tidy(app): "outputs" -> "output" 2024-05-24 20:02:24 +10:00
psychedelicious
33f9fe2c86 tidy(app): rearrange proccessor 2024-05-24 20:02:24 +10:00
psychedelicious
1d973f92ff feat(app): support multiple processor lifecycle callbacks 2024-05-24 20:02:24 +10:00
psychedelicious
7f70cde038 feat(app): make things in session runner private 2024-05-24 20:02:24 +10:00
psychedelicious
47722528a3 feat(app): iterate on processor split 2
- Use protocol to define callbacks, this allows them to have kwargs
- Shuffle the profiler around a bit
- Move `thread_limit` and `polling_interval` to `__init__`; `start` is called programmatically and will never get these args in practice
2024-05-24 20:02:24 +10:00
psychedelicious
be41c84305 feat(app): iterate on processor split
- Add `OnNodeError` and `OnNonFatalProcessorError` callbacks
- Move all session/node callbacks to `SessionRunner` - this ensures we dump perf stats before resetting them and generally makes sense to me
- Remove `complete` event from `SessionRunner`, it's essentially the same as `OnAfterRunSession`
- Remove extraneous `next_invocation` block, which would treat a processor error as a node error
- Simplify loops
- Add some callbacks for testing, to be removed before merge
2024-05-24 20:02:24 +10:00
brandonrising
82b4298b03 Fix next node calling logic 2024-05-24 20:02:24 +10:00
brandonrising
fa6c7badd6 Run ruff 2024-05-24 20:02:24 +10:00
brandonrising
45d2504c1e Break apart session processor and the running of each session into separate classes 2024-05-24 20:02:24 +10:00
psychedelicious
93e4c3dbc2 feat(app): update queue item's session on session completion
The session is never updated in the queue after it is first enqueued. As a result, the queue detail view in the frontend never never updates and the session itself doesn't show outputs, execution graph, etc.

We need a new method on the queue service to update a queue item's session, then call it before updating the queue item's status.

Queue item status may be updated via a session-type event _or_ queue-type event. Adding the updated session to all these events is a hairy - simpler to just update the session before we do anything that could trigger a queue item status change event:
- Before calling `emit_session_complete` in the processor (handles session error, completed and cancel events and the corresponding queue events)
- Before calling `cancel_queue_item` in the processor (handles another way queue items can be canceled, outside the session execution loop)

When serializing the session, both in the new service method and the `get_queue_item` endpoint, we need to use `exclude_none=True` to prevent unexpected validation errors.
2024-05-24 08:59:49 +10:00
Lincoln Stein
987ee704a1
Merge branch 'main' into lstein/feat/simple-mm2-api 2024-05-17 22:54:03 -04:00
Lincoln Stein
d968c6f379 refactor multifile download code 2024-05-17 22:29:19 -04:00
psychedelicious
17e1fc5254 chore(app): ruff 2024-05-18 09:21:45 +10:00
maryhipp
84e031edc2 add nulable project also 2024-05-18 09:21:45 +10:00
maryhipp
b6b7e737e0 ruff 2024-05-18 09:21:45 +10:00
maryhipp
5f3e7afd45 add nullable user to invocation error events 2024-05-18 09:21:45 +10:00
psychedelicious
b0cfca9d24 fix(app): pass image metadata as stringified json 2024-05-18 09:04:37 +10:00
psychedelicious
985ef89825 fix(app): type annotations in images service 2024-05-18 09:04:37 +10:00
psychedelicious
5928ade5fd feat(app): simplified create image API
Graph, metadata and workflow all take stringified JSON only. This makes the API consistent and means we don't need to do a round-trip of pydantic parsing when handling this data.

It also prevents a failure mode where an uploaded image's metadata, workflow or graph are old and don't match the current schema.

As before, the frontend does strict validation and parsing when loading these values.
2024-05-18 09:04:37 +10:00
psychedelicious
93ebc175c6 fix(app): retain graph in metadata when uploading images 2024-05-18 09:04:37 +10:00
psychedelicious
922716d2ab feat(ui): store graph in image metadata
The previous super-minimal implementation had a major issue - the saved workflow didn't take into account batched field values. When generating with multiple iterations or dynamic prompts, the same workflow with the first prompt, seed, etc was stored in each image.

As a result, when the batch results in multiple queue items, only one of the images has the correct workflow - the others are mismatched.

To work around this, we can store the _graph_ in the image metadata (alongside the workflow, if generated via workflow editor). When loading a workflow from an image, we can choose to load the workflow or the graph, preferring the workflow.

Internally, we need to update images router image-saving services. The changes are minimal.

To avoid pydantic errors deserializing the graph, when we extract it from the image, we will leave it as stringified JSON and let the frontend's more sophisticated and flexible parsing handle it. The worklow is also changed to just return stringified JSON, so the API is consistent.
2024-05-18 09:04:37 +10:00
Lincoln Stein
2dae5eb7ad more refactoring; HF subfolders not working 2024-05-16 22:26:18 -04:00
Lincoln Stein
911a24479b add tests for model install file size reporting 2024-05-16 07:18:33 -04:00
Lincoln Stein
f29c406fed refactor model_install to work with refactored download queue 2024-05-13 22:49:15 -04:00
Lincoln Stein
287c679f7b clean up type checking for single file and multifile download job callbacks 2024-05-13 18:31:40 -04:00
Lincoln Stein
0bf14c2830 add multifile_download() method to download service 2024-05-12 20:14:00 -06:00
Lincoln Stein
b48d4a049d bad implementation of diffusers folder download 2024-05-08 21:21:01 -07:00
Lincoln Stein
f211c95dbc move access token regex matching into download queue 2024-05-05 21:00:31 -04:00
Lincoln Stein
49c84cd423
Merge branch 'main' into lstein/feat/simple-mm2-api 2024-04-30 18:13:42 -04:00
psychedelicious
d861bc690e feat(mm): handle PC_PATH_MAX on external drives on macOS
`PC_PATH_MAX` doesn't exist for (some?) external drives on macOS. We need error handling when retrieving this value.

Also added error handling for `PC_NAME_MAX` just in case. This does work for me for external drives on macOS, though.

Closes #6277
2024-04-30 07:57:03 -04:00
Lincoln Stein
7c39929758 support VRAM caching of dict models that lack to() 2024-04-28 13:41:06 -04:00
Lincoln Stein
bb04f496e0 Merge branch 'main' into lstein/feat/simple-mm2-api 2024-04-28 11:33:26 -04:00
Lincoln Stein
70903ef057 refactor load_ckpt_from_url() 2024-04-28 11:33:23 -04:00
Lincoln Stein
d72f272f16 Address change requests in first round of PR reviews.
Pending:

- Move model install calls into model manager and create passthrus in invocation_context.
- Consider splitting load_model_from_url() into a call to get the path and a call to load the path.
2024-04-24 23:53:30 -04:00
psychedelicious
2cee436ecf tidy(app): remove unused class 2024-04-23 17:12:14 +10:00
psychedelicious
e6386d969f fix(app): only clear tempdirs if ephemeral and before creating tempdir
Also, this needs to happen in init, else it deletes the temp dir created in init
2024-04-23 17:12:14 +10:00
Lincoln Stein
53808149fb moved cleanup routine into object_serializer_disk.py 2024-04-23 17:12:14 +10:00
Lincoln Stein
2b9f06dc4c
Re-enable app shutdown actions (#6244)
* closes #6242

* only override sigINT during slow model scanning

* fix ruff formatting

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-19 06:45:42 -04:00
Lincoln Stein
34cdfc61ab
Merge branch 'main' into lstein/feat/simple-mm2-api 2024-04-17 17:18:13 -04:00
Lincoln Stein
fce6b3e44c maybe solve race issue 2024-04-16 13:09:26 +10:00
Lincoln Stein
470a39935c fix merge conflicts with main 2024-04-15 09:24:57 -04:00
Lincoln Stein
e93f4d632d
[util] Add generic torch device class (#6174)
* introduce new abstraction layer for GPU devices

* add unit test for device abstraction

* fix ruff

* convert TorchDeviceSelect into a stateless class

* move logic to select context-specific execution device into context API

* add mock hardware environments to pytest

* remove dangling mocker fixture

* fix unit test for running on non-CUDA systems

* remove unimplemented get_execution_device() call

* remove autocast precision

* Multiple changes:

1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to
   context.models.get_execution_device().
2. Rename TorchDeviceSelect to TorchDevice
3. Added back the legacy public API defined in `invocation_api`, including
   choose_precision().
4. Added a config file migration script to accommodate removal of precision=autocast.

* add deprecation warnings to choose_torch_device() and choose_precision()

* fix test crash

* remove app_config argument from choose_torch_device() and choose_torch_dtype()

---------

Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-15 13:12:49 +00:00
Lincoln Stein
3ddd7ced49 change names of convert and download caches and add migration script 2024-04-14 15:57:33 -04:00
Lincoln Stein
41b909cbe3 port dw_openpose, depth_anything, and lama processors to new model download scheme 2024-04-14 15:57:03 -04:00
Lincoln Stein
3a26c7bb9e fix merge conflicts 2024-04-12 00:58:11 -04:00
Lincoln Stein
df5ebdbc4f add invocation_context.load_ckpt_from_url() method 2024-04-12 00:55:21 -04:00
Lincoln Stein
af1b57a01f add simplified model manager install API to InvocationContext 2024-04-11 21:46:00 -04:00
psychedelicious
b18442ded4 fix(queue): poll queue on finished queue item
When a queue item is finished (completed, canceled, failed), immediately poll the queue for the next queue item.

Closes #6189
2024-04-12 07:31:47 +10:00
Lincoln Stein
dedf0c6ffa fix ruff issues 2024-04-12 07:19:16 +10:00
Lincoln Stein
579082ac10 [mm] clear the cache entry for a model that got an OOM during loading 2024-04-12 07:19:16 +10:00
fieldOfView
dca30d5462 (feat) add a method to get the path of an image from the invocation context
Fixes #6175
2024-04-08 18:42:55 +10:00
Lincoln Stein
812f10730f
adjust free vram calculation for models that will be removed by lazy offloading (#6150)
Co-authored-by: Lincoln Stein <lstein@gmail.com>
2024-04-04 22:51:12 -04:00
psychedelicious
8c15d14099 fix: use locale encoding
We have had a few bugs with v4 related to file encodings, especially on Windows.

Windows uses its own character encodings instead of `utf-8`, often `cp1252`. Some characters cannot be decoded using `utf-8`, causing `UnicodeDecodeError`.

There are a couple places where this can cause problems:
- In the installer bootstrap, we install or upgrade `pip` and decode the result, using `subprocess`.

  The input to this includes the user's home dir. In #6105, the user had one of the problematic characters in their username. `subprocess` attempts and fails to decode the username, which crashes the installer.

  To fix this, we need to use `locale.getpreferredencoding()` when executing the command.
- Similarly, in the model install service and config class, we attempt to load a yaml config file. If a problematic character is in the path to the file (which often includes the user's home dir), we can get the same error.

  One example is  #6129 in which the models.yaml migration fails.

  To fix this, we need to open the file with `locale.getpreferredencoding()`.
2024-04-04 15:30:47 +11:00
Lincoln Stein
9cc1f20ad5 add simplified model manager install API to InvocationContext 2024-04-03 23:26:48 -04:00
psychedelicious
9c51abb46e fix(config): get root from venv
This logic was a bit wonky. It only selected the `venv` parent if there was already an `invokeai.yaml` file in it. Removed this constraint.
2024-04-04 10:54:23 +11:00
psychedelicious
7ff2371c07 fix(mm): do not rename model file if model record is renamed
Renaming the model file to the model name introduces unnecessary contraints on model names.

For example, a model name can technically be any length, but a model _filename_ cannot be too long.

There are also constraints on valid characters for filenames which shouldn't be applied to model record names.

I believe the old behaviour is a holdover from the old system.
2024-04-04 07:17:38 +11:00
psychedelicious
e655399324 fix(config): handle windows paths in invokeai.yaml migration for legacy_conf_dir
The logic incorrectly set the `legacy_conf_dir` on windows, where the slashes go the other direction. Handle this case and update tests to catch it.
2024-04-02 08:06:59 -04:00
psychedelicious
f75de8a35c feat(db): add migration 9 - empty session queue
Empties the session queue. This is done to prevent any lingering session queue items from causing pydantic errors due to changed schemas.
2024-04-02 13:25:14 +11:00