Some tech debt related to dynamic pydantic schemas for invocations became problematic. Including the invocations and results in the event schemas was breaking pydantic's handling of ref schemas. I don't really understand why - I think it's a pydantic bug in a remote edge case that we are hitting.
After many failed attempts I landed on this implementation, which is actually much tidier than what was in there before.
- Create pydantic-enabled types for `AnyInvocation` and `AnyInvocationOutput` and use these in place of the janky dynamic unions. Actually, they are kinda the same, but better encapsulated. Use these in `Graph`, `GraphExecutionState`, `InvocationEventBase` and `InvocationCompleteEvent`.
- Revise the custom openapi function to work with the new models.
- Split out the custom openapi function to a separate file. Add a `post_transform` callback so consumers can customize the output schema.
- Update makefile scripts.
* avoid copying model back from cuda to cpu
* handle models that don't have state dicts
* add assertions that models need a `device()` method
* do not rely on torch.nn.Module having the device() method
* apply all patches after model is on the execution device
* fix model patching in latents too
* log patched tokenizer
* closes#6375
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
These simplify loading multiple LoRAs. Instead of requiring chained lora loader nodes, configure each LoRA (model & weight) with a selector, collect them, then send the collection to the collection loader to apply all of the LoRAs to the UNet/CLIP models.
The collection loaders accept a single lora or collection of loras.
There were some invalid constraints with the processors - minimum of 0 for resolution or multiple of 64 for resolution.
Made minimum 1px and no multiple ofs.
* introduce new abstraction layer for GPU devices
* add unit test for device abstraction
* fix ruff
* convert TorchDeviceSelect into a stateless class
* move logic to select context-specific execution device into context API
* add mock hardware environments to pytest
* remove dangling mocker fixture
* fix unit test for running on non-CUDA systems
* remove unimplemented get_execution_device() call
* remove autocast precision
* Multiple changes:
1. Remove TorchDeviceSelect.get_execution_device(), as well as calls to
context.models.get_execution_device().
2. Rename TorchDeviceSelect to TorchDevice
3. Added back the legacy public API defined in `invocation_api`, including
choose_precision().
4. Added a config file migration script to accommodate removal of precision=autocast.
* add deprecation warnings to choose_torch_device() and choose_precision()
* fix test crash
* remove app_config argument from choose_torch_device() and choose_torch_dtype()
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
`LatentsField` objects have an optional `seed` field. This should only be populated when the latents are noise, generated from a seed.
`DenoiseLatentsInvocation` needs a seed value for scheduler initialization. It's used in a few places, and there is some logic for determining the seed to use with a series of fallbacks:
- Use the seed from the noise (a `LatentsField` object)
- Use the seed from the latents (a `LatentsField` object - normally it won't have a seed)
- Use `0` as a final fallback
In `DenoisLatentsInvocation`, we set the seed in the `LatentsOutput`, even though the output latents are not noise.
This is normally fine, but when we use refiner, we re-use the those same latents for the refiner denoise. This causes that characteristic same-seed-fried look on the refiner pass.
Simple fix - do not set the field in the output latents.