Merge branch 'main' into dev/installer

This commit is contained in:
Lincoln Stein
2023-02-01 17:50:22 -05:00
committed by GitHub
27 changed files with 593 additions and 214 deletions

View File

@ -52,12 +52,17 @@ introduces several changes you should know about.
path: models/diffusers/hakurei-haifu-diffusion-1.4
```
2. The format of the models directory has changed to mimic the
HuggingFace cache directory. By default, diffusers models are
now automatically downloaded and retrieved from the directory
`ROOTDIR/models/diffusers`, while other models are stored in
the directory `ROOTDIR/models/hub`. This organization is the
same as that used by HuggingFace for its cache management.
2. In order of precedence, InvokeAI will now use HF_HOME, then
XDG_CACHE_HOME, then finally default to `ROOTDIR/models` to
store HuggingFace diffusers models.
Consequently, the format of the models directory has changed to
mimic the HuggingFace cache directory. When HF_HOME and XDG_HOME
are not set, diffusers models are now automatically downloaded
and retrieved from the directory `ROOTDIR/models/diffusers`,
while other models are stored in the directory
`ROOTDIR/models/hub`. This organization is the same as that used
by HuggingFace for its cache management.
This allows you to share diffusers and ckpt model files easily with
other machine learning applications that use the HuggingFace
@ -66,7 +71,13 @@ introduces several changes you should know about.
cache models in. To tell InvokeAI to use the standard HuggingFace
cache directory, you would set HF_HOME like this (Linux/Mac):
`export HF_HOME=~/.cache/hugging_face`
`export HF_HOME=~/.cache/huggingface`
Both HuggingFace and InvokeAI will fall back to the XDG_CACHE_HOME
environment variable if HF_HOME is not set; this path
takes precedence over `ROOTDIR/models` to allow for the same sharing
with other machine learning applications that use HuggingFace
libraries.
3. If you upgrade to InvokeAI 2.3.* from an earlier version, there
will be a one-time migration from the old models directory format

View File

@ -239,28 +239,24 @@ Generate an image with a given prompt, record the seed of the image, and then
use the `prompt2prompt` syntax to substitute words in the original prompt for
words in a new prompt. This works for `img2img` as well.
- `a ("fluffy cat").swap("smiling dog") eating a hotdog`.
- quotes optional: `a (fluffy cat).swap(smiling dog) eating a hotdog`.
- for single word substitutions parentheses are also optional:
`a cat.swap(dog) eating a hotdog`.
- Supports options `s_start`, `s_end`, `t_start`, `t_end` (each 0-1) loosely
corresponding to bloc97's `prompt_edit_spatial_start/_end` and
`prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand.
- Example usage:`a (cat).swap(dog, s_end=0.3) eating a hotdog` - the `s_end`
argument means that the "spatial" (self-attention) edit will stop having any
effect after 30% (=0.3) of the steps have been done, leaving Stable
Diffusion with 70% of the steps where it is free to decide for itself how to
reshape the cat-form into a dog form.
- The numbers represent a percentage through the step sequence where the edits
should happen. 0 means the start (noisy starting image), 1 is the end (final
image).
- For img2img, the step sequence does not start at 0 but instead at
(1-strength) - so if strength is 0.7, s_start and s_end must both be
greater than 0.3 (1-0.7) to have any effect.
- Convenience option `shape_freedom` (0-1) to specify how much "freedom" Stable
Diffusion should have to change the shape of the subject being swapped.
- `a (cat).swap(dog, shape_freedom=0.5) eating a hotdog`.
For example, consider the prompt `a cat.swap(dog) playing with a ball in the forest`. Normally, because of the word words interact with each other when doing a stable diffusion image generation, these two prompts would generate different compositions:
- `a cat playing with a ball in the forest`
- `a dog playing with a ball in the forest`
| `a cat playing with a ball in the forest` | `a dog playing with a ball in the forest` |
| --- | --- |
| img | img |
- For multiple word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
- To swap a comma, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to bloc97's `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand. `t_start` and `t_end` are used to control on which steps cross-attention control should run. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
- For example, if doing a diffusion with 10 steps for the prompt is `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will be run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but the pixels that represent `dog` will be locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
- Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest` with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt was being used instead. The final 3 steps will just run `a cat playing with a ball in the forest`.
> For img2img, the step sequence does not start at 0 but instead at `(1.0-strength)` - so if the img2img `strength` is `0.7`, `t_start` and `t_end` must both be greater than `0.3` (`1.0-0.7`) to have any effect.
Prompt2prompt `.swap()` is not compatible with xformers, which will be temporarily disabled when doing a `.swap()` - so you should expect to use more VRAM and run slower that with xformers enabled.
The `prompt2prompt` code is based off
[bloc97's colab](https://github.com/bloc97/CrossAttentionControl).