* new OffloadingDevice loads one model at a time, on demand
* fixup! new OffloadingDevice loads one model at a time, on demand
* fix(prompt_to_embeddings): call the text encoder directly instead of its forward method
allowing any associated hooks to run with it.
* more attempts to get things on the right device from the offloader
* more attempts to get things on the right device from the offloader
* make offloading methods an explicit part of the pipeline interface
* inlining some calls where device is only used once
* ensure model group is ready after pipeline.to is called
* fixup! Strategize slicing based on free [V]RAM (#2572)
* doc(offloading): docstrings for offloading.ModelGroup
* doc(offloading): docstrings for offloading-related pipeline methods
* refactor(offloading): s/SimpleModelGroup/FullyLoadedModelGroup
* refactor(offloading): s/HotSeatModelGroup/LazilyLoadedModelGroup
to frame it is the same terms as "FullyLoadedModelGroup"
---------
Co-authored-by: Damian Stewart <null@damianstewart.com>
Tensors with diffusers no longer have to be multiples of 8. This broke Perlin noise generation. We now generate noise for the next largest multiple of 8 and return a cropped result. Fixes#2674.
`generator` now asks `InvokeAIDiffuserComponent` to do postprocessing work on latents after every step. Thresholding - now implemented as replacing latents outside of the threshold with random noise - is called at this point. This postprocessing step is also where we can hook up symmetry and other image latent manipulations in the future.
Note: code at this layer doesn't need to worry about MPS as relevant torch functions are wrapped and made MPS-safe by `generator.py`.
Strategize slicing based on free [V]RAM when not using xformers. Free [V]RAM is evaluated at every generation. When there's enough memory, the entire generation occurs without slicing. If there is not enough free memory, we use diffusers' sliced attention.
- implement the following pattern for finding data files under both
regular and editable install conditions:
import invokeai.foo.bar as bar
path = bar.__path__[0]
- this *seems* to work reliably with Python 3.9. Testing on 3.10 needs
to be performed.
1) Downgrade numpy to avoid dependency conflict with numba
2) Move all non ldm/invoke files into `invokeai`. This includes assets, backend, frontend, and configs.
3) Fix up way that the backend finds the frontend and the generator finds the NSFW caution.png icon.
* Update --hires_fix
Change `--hires_fix` to calculate initial width and height based on the model's resolution (if available) and with a minimum size.
This commit suppresses a few irrelevant warning messages that the
diffusers module produces:
1. The warning that turning off the NSFW detector makes you an
irresponsible person.
2. Warnings about running fp16 models stored in CPU (we are not running
them in CPU, just caching them in CPU RAM)