Merge branch 'main' into dev/installer

2024-08-30 20:32:17 +00:00 · 2023-02-01 17:50:22 -05:00
parent 11ac50a6ea 31146eb797
commit 8ce9f07223
27 changed files with 593 additions and 214 deletions
--- a/docs/CHANGELOG.md
+++ b/docs/CHANGELOG.md
@@ -52,12 +52,17 @@ introduces several changes you should know about.
  path: models/diffusers/hakurei-haifu-diffusion-1.4
  ```

-2. The format of the models directory has changed to mimic the
-   HuggingFace cache directory. By default, diffusers models are
-   now automatically downloaded and retrieved from the directory
-   `ROOTDIR/models/diffusers`, while other models are stored in
-   the directory `ROOTDIR/models/hub`. This organization is the
-   same as that used by HuggingFace for its cache management.
+2. In order of precedence, InvokeAI will now use `HF_HOME`, then
+   `XDG_CACHE_HOME`, and finally default to `ROOTDIR/models` to
+   store HuggingFace diffusers models.
+
+   Consequently, the format of the models directory has changed to
+   mimic the HuggingFace cache directory. When `HF_HOME` and
+   `XDG_CACHE_HOME` are not set, diffusers models are automatically
+   downloaded to and retrieved from the directory
+   `ROOTDIR/models/diffusers`, while other models are stored in the
+   directory `ROOTDIR/models/hub`. This organization is the same as
+   that used by HuggingFace for its cache management.

   This allows you to share diffusers and ckpt model files easily with
   other machine learning applications that use the HuggingFace
@@ -66,7 +71,13 @@ introduces several changes you should know about.
   cache models in. To tell InvokeAI to use the standard HuggingFace
   cache directory, you would set HF_HOME like this (Linux/Mac):

-   `export HF_HOME=~/.cache/hugging_face`
+   `export HF_HOME=~/.cache/huggingface`
+
+   Both HuggingFace and InvokeAI fall back to the `XDG_CACHE_HOME`
+   environment variable if `HF_HOME` is not set; this location
+   takes precedence over `ROOTDIR/models`, so models can still be
+   shared with other machine learning applications that use
+   HuggingFace libraries.

 3. If you upgrade to InvokeAI 2.3.* from an earlier version, there
   will be a one-time migration from the old models directory format
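The cache lookup order described in the changelog above can be sketched as follows. This is a minimal illustration, not InvokeAI's actual code: the helper name `resolve_model_cache_root` is hypothetical, treating an empty variable as unset is an assumption, and appending a `huggingface` subfolder to `XDG_CACHE_HOME` is an assumption borrowed from the convention the `huggingface_hub` library uses.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_model_cache_root(root_dir: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the base directory for cached HuggingFace models.

    Hypothetical helper illustrating the documented precedence:
    HF_HOME, then XDG_CACHE_HOME, then ROOTDIR/models.
    """
    env = os.environ if env is None else env

    # 1. An explicit HF_HOME wins (empty strings are treated as unset: an assumption).
    hf_home = env.get("HF_HOME")
    if hf_home:
        return Path(hf_home).expanduser()

    # 2. Otherwise fall back to XDG_CACHE_HOME.
    #    Assumption: use a "huggingface" subfolder, as huggingface_hub does.
    xdg_cache = env.get("XDG_CACHE_HOME")
    if xdg_cache:
        return Path(xdg_cache).expanduser() / "huggingface"

    # 3. Default: keep models inside the InvokeAI root directory.
    #    Per the changelog, diffusers go to models/diffusers, others to models/hub.
    return Path(root_dir) / "models"
```

For example, `resolve_model_cache_root("/opt/invokeai", env={})` returns `/opt/invokeai/models`, while setting `HF_HOME` in `env` overrides both other locations. Passing `env` explicitly keeps the helper easy to test without touching the real process environment.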
--- a/docs/features/PROMPTS.md
+++ b/docs/features/PROMPTS.md
@@ -239,28 +239,24 @@ Generate an image with a given prompt, record the seed of the image, and then
 use the `prompt2prompt` syntax to substitute words in the original prompt for
 words in a new prompt. This works for `img2img` as well.

- `a ("fluffy cat").swap("smiling dog") eating a hotdog`.
-    - quotes optional: `a (fluffy cat).swap(smiling dog) eating a hotdog`.
-    - for single word substitutions parentheses are also optional:
-      `a cat.swap(dog) eating a hotdog`.
- Supports options `s_start`, `s_end`, `t_start`, `t_end` (each 0-1) loosely
-  corresponding to bloc97's `prompt_edit_spatial_start/_end` and
-  `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
-  intuitively understand.
-    - Example usage:`a (cat).swap(dog, s_end=0.3) eating a hotdog` - the `s_end`
-      argument means that the "spatial" (self-attention) edit will stop having any
-      effect after 30% (=0.3) of the steps have been done, leaving Stable
-      Diffusion with 70% of the steps where it is free to decide for itself how to
-      reshape the cat-form into a dog form.
-    - The numbers represent a percentage through the step sequence where the edits
-      should happen. 0 means the start (noisy starting image), 1 is the end (final
-      image).
-        - For img2img, the step sequence does not start at 0 but instead at
-          (1-strength) - so if strength is 0.7, s_start and s_end must both be
-          greater than 0.3 (1-0.7) to have any effect.
- Convenience option `shape_freedom` (0-1) to specify how much "freedom" Stable
-  Diffusion should have to change the shape of the subject being swapped.
-    - `a (cat).swap(dog, shape_freedom=0.5) eating a hotdog`.
+For example, consider the prompt `a cat.swap(dog) playing with a ball in the forest`. Normally, because the words in a prompt interact with each other during Stable Diffusion image generation, these two prompts would generate different compositions:
+  - `a cat playing with a ball in the forest`
+  - `a dog playing with a ball in the forest`
+
+| `a cat playing with a ball in the forest` | `a dog playing with a ball in the forest` |
+| --- | --- |
+| img | img |
+
+
+- For multiple-word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
+- To include a comma in the swapped text, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
+- `.swap()` supports the options `t_start` and `t_end` (each 0-1), loosely corresponding to bloc97's `prompt_edit_tokens_start/_end` but with the math swapped to make them easier to
+  understand intuitively. `t_start` and `t_end` control which steps cross-attention control runs on. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
+    - For example, if running a 10-step diffusion with the prompt `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but with the pixels that represent `dog` locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
+    - Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest`, with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt had been used instead. The final 3 steps will simply run as `a cat playing with a ball in the forest`.
+    > For img2img, the step sequence does not start at 0 but instead at `(1.0-strength)` - so if the img2img `strength` is `0.7`, `t_start` and `t_end` must both be greater than `0.3` (`1.0-0.7`) to have any effect.
+
+Prompt2prompt `.swap()` is not compatible with xformers, which will be temporarily disabled when doing a `.swap()`, so you should expect higher VRAM usage and slower generation than with xformers enabled.

 The `prompt2prompt` code is based off
 [bloc97's colab](https://github.com/bloc97/CrossAttentionControl).
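To make the `t_start`/`t_end` arithmetic in the `.swap()` documentation concrete, here is a small sketch that maps the two fractions onto step indices for a txt2img run. It is not taken from the InvokeAI or bloc97 code: the helper names `cross_attention_schedule` and `describe` are hypothetical, and rounding `fraction * num_steps` to the nearest step (with the window covering steps `start` through `end - 1`) is an assumed boundary convention. It reproduces the 10-step examples in the documentation but may differ from the real implementation at other values; img2img offsets are not modelled.

```python
from typing import List


def cross_attention_schedule(num_steps: int, t_start: float = 0.0, t_end: float = 1.0) -> List[bool]:
    """For each denoising step, report whether .swap() cross-attention control is active.

    Hypothetical helper: t_start/t_end are fractions (0-1) of the step sequence.
    Assumption: fractions are rounded to whole steps and control covers
    steps start_step .. end_step - 1.
    """
    if not 0.0 <= t_start <= t_end <= 1.0:
        raise ValueError("expected 0 <= t_start <= t_end <= 1")
    start_step = round(t_start * num_steps)
    end_step = round(t_end * num_steps)
    return [start_step <= i < end_step for i in range(num_steps)]


def describe(num_steps: int, t_start: float, t_end: float,
             original: str = "cat", edited: str = "dog") -> List[str]:
    """Label each step with the prompt it effectively runs (hypothetical helper).

    Outside the window the original prompt is used; inside it, the edited
    prompt runs with its attention locked to the original prompt's layout.
    """
    active = cross_attention_schedule(num_steps, t_start, t_end)
    return [f"{edited} (locked to {original})" if on else original for on in active]
```

For example, `cross_attention_schedule(10, t_start=0.3)` marks steps 3 through 9 (7 of the 10 steps) as controlled, and `describe(10, 0.0, 0.7)` ends with three plain `cat` steps, matching the two examples in the documentation.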