* introduce new abstraction layer for GPU devices
* add unit test for device abstraction
* fix ruff
* convert TorchDeviceSelect into a stateless class
* move logic to select context-specific execution device into context API
* add mock hardware environments to pytest
* remove dangling mocker fixture
* fix unit test for running on non-CUDA systems
* remove unimplemented get_execution_device() call
* remove autocast precision
* Multiple changes:
1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to
context.models.get_execution_device().
2. Rename TorchDeviceSelect to TorchDevice
3. Added back the legacy public API defined in `invocation_api`, including
choose_precision().
4. Added a config file migration script to accommodate removal of precision=autocast.
* add deprecation warnings to choose_torch_device() and choose_precision()
* fix test crash
* remove app_config argument from choose_torch_device() and choose_torch_dtype()
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
When using refiner with a mask (i.e. inpainting), we don't have noise provided as an input to the node.
This situation uniquely hits a code path that wasn't reviewed when gradient denoising was implemented.
That code path does two things wrong:
- It lerp'd the input latents. This was fixed in 5a1f4cb1ce.
- It added noise to the latents an extra time. This is fixed in this change.
We don't need to add noise in `latents_from_embeddings` because we do it just a lines later in `AddsMaskGuidance`.
- Remove the extraneous call to `add_noise`
- Make `seed` a required arg. We never call the function without seed anyways. If we refactor this in the future, it will be clearer that we need to look at how seed is handled.
- Move the call to create the noise to a deeper conditional, just before we call `AddsMaskGuidance`. The created noise tensor is now only used in that function, no need to create it every time.
Note: Whether or not having both noise and latents as inputs on the node is correct is a separate conversation. This change just fixes the issue with the current setup.
The seamless logic errors when a second GPU is selected. I don't understand why, but a workaround is to skip the model patching when there there are no seamless axes specified.
This is also just a good practice regardless - don't patch the model unless we need to. Probably a negligible perf impact.
Closes#6010
- Replace AnyModelLoader with ModelLoaderRegistry
- Fix type check errors in multiple files
- Remove apparently unneeded `get_model_config_enum()` method from model manager
- Remove last vestiges of old model manager
- Updated tests and documentation
resolve conflict with seamless.py
- Replace legacy model manager service with the v2 manager.
- Update invocations to use new load interface.
- Fixed many but not all type checking errors in the invocations. Most
were unrelated to model manager
- Updated routes. All the new routes live under the route tag
`model_manager_v2`. To avoid confusion with the old routes,
they have the URL prefix `/api/v2/models`. The old routes
have been de-registered.
- Added a pytest for the loader.
- Updated documentation in contributing/MODEL_MANAGER.md
- Implement new model loader and modify invocations and embeddings
- Finish implementation loaders for all models currently supported by
InvokeAI.
- Move lora, textual_inversion, and model patching support into
backend/embeddings.
- Restore support for model cache statistics collection (a little ugly,
needs work).
- Fixed up invocations that load and patch models.
- Move seamless and silencewarnings utils into better location
If the user specifies `torch-sdp` as the attention type in `config.yaml`, we can go ahead and use it (if available) rather than always throwing an exception.
Also, the PREVIOUS commit (@8d3885d, which was already pushed to github repo) was wrongly commented, but too late to fix without a force push or other mucking that I'm reluctant to do. That commit is actually the one that has all the changes to diffusers_pipeline.py to use additional arg down_intrablock_additional_residuals (introduced in diffusers PR https://github.com/huggingface/diffusers/pull/5362) to detangle T2I-Adapter from ControlNet inputs to main UNet.
Upgrade pydantic and fastapi to latest.
- pydantic~=2.4.2
- fastapi~=103.2
- fastapi-events~=0.9.1
**Big Changes**
There are a number of logic changes needed to support pydantic v2. Most changes are very simple, like using the new methods to serialized and deserialize models, but there are a few more complex changes.
**Invocations**
The biggest change relates to invocation creation, instantiation and validation.
Because pydantic v2 moves all validation logic into the rust pydantic-core, we may no longer directly stick our fingers into the validation pie.
Previously, we (ab)used models and fields to allow invocation fields to be optional at instantiation, but required when `invoke()` is called. We directly manipulated the fields and invocation models when calling `invoke()`.
With pydantic v2, this is much more involved. Changes to the python wrapper do not propagate down to the rust validation logic - you have to rebuild the model. This causes problem with concurrent access to the invocation classes and is not a free operation.
This logic has been totally refactored and we do not need to change the model any more. The details are in `baseinvocation.py`, in the `InputField` function and `BaseInvocation.invoke_internal()` method.
In the end, this implementation is cleaner.
**Invocation Fields**
In pydantic v2, you can no longer directly add or remove fields from a model.
Previously, we did this to add the `type` field to invocations.
**Invocation Decorators**
With pydantic v2, we instead use the imperative `create_model()` API to create a new model with the additional field. This is done in `baseinvocation.py` in the `invocation()` wrapper.
A similar technique is used for `invocation_output()`.
**Minor Changes**
There are a number of minor changes around the pydantic v2 models API.
**Protected `model_` Namespace**
All models' pydantic-provided methods and attributes are prefixed with `model_` and this is considered a protected namespace. This causes some conflict, because "model" means something to us, and we have a ton of pydantic models with attributes starting with "model_".
Forunately, there are no direct conflicts. However, in any pydantic model where we define an attribute or method that starts with "model_", we must tell set the protected namespaces to an empty tuple.
```py
class IPAdapterModelField(BaseModel):
model_name: str = Field(description="Name of the IP-Adapter model")
base_model: BaseModelType = Field(description="Base model")
model_config = ConfigDict(protected_namespaces=())
```
**Model Serialization**
Pydantic models no longer have `Model.dict()` or `Model.json()`.
Instead, we use `Model.model_dump()` or `Model.model_dump_json()`.
**Model Deserialization**
Pydantic models no longer have `Model.parse_obj()` or `Model.parse_raw()`, and there are no `parse_raw_as()` or `parse_obj_as()` functions.
Instead, you need to create a `TypeAdapter` object to parse python objects or JSON into a model.
```py
adapter_graph = TypeAdapter(Graph)
deserialized_graph_from_json = adapter_graph.validate_json(graph_json)
deserialized_graph_from_dict = adapter_graph.validate_python(graph_dict)
```
**Field Customisation**
Pydantic `Field`s no longer accept arbitrary args.
Now, you must put all additional arbitrary args in a `json_schema_extra` arg on the field.
**Schema Customisation**
FastAPI and pydantic schema generation now follows the OpenAPI version 3.1 spec.
This necessitates two changes:
- Our schema customization logic has been revised
- Schema parsing to build node templates has been revised
The specific aren't important, but this does present additional surface area for bugs.
**Performance Improvements**
Pydantic v2 is a full rewrite with a rust backend. This offers a substantial performance improvement (pydantic claims 5x to 50x depending on the task). We'll notice this the most during serialization and deserialization of sessions/graphs, which happens very very often - a couple times per node.
I haven't done any benchmarks, but anecdotally, graph execution is much faster. Also, very larges graphs - like with massive iterators - are much, much faster.