`. For example, you can rewrite the
previous command to look like this:
-~~~~
+```bash
invoke> !fix ./outputs/000002.seed.png -embiggen_tiles 1 2 3
-~~~~
+```
A new file named `000002.seed.fixed.png` will be created in the output directory. Note that
the `!fix` command does not replace the original file, unlike the behavior at generate time.
You do not need to provide the prompt, and `!fix` automatically selects a good strength for
embiggen-ing.
+!!! note
-**Note**
-Because the same prompt is used on all the tiled images, and the model
-doesn't have the context of anything outside the tile being run - it
-can end up creating repeated pattern (also called 'motifs') across all
-the tiles based on that prompt. The best way to combat this is
-lowering the `--strength` (`-f`) to stay more true to the init image,
-and increasing the number of steps so there is more compute-time to
-create the detail. Anecdotally `--strength` 0.35-0.45 works pretty
-well on most things. It may also work great in some examples even with
-the `--strength` set high for patterns, landscapes, or subjects that
-are more abstract. Because this is (relatively) fast, you can also
-preserve the best parts from each.
+ Because the same prompt is used on all the tiled images, and the model
+ doesn't have the context of anything outside the tile being run - it
+ can end up creating repeated pattern (also called 'motifs') across all
+ the tiles based on that prompt. The best way to combat this is
+ lowering the `--strength` (`-f`) to stay more true to the init image,
+ and increasing the number of steps so there is more compute-time to
+ create the detail. Anecdotally `--strength` 0.35-0.45 works pretty
+ well on most things. It may also work great in some examples even with
+ the `--strength` set high for patterns, landscapes, or subjects that
+ are more abstract. Because this is (relatively) fast, you can also
+ preserve the best parts from each.
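+
+For example (purely illustrative: `!fix` normally chooses the embiggen strength for you, and this assumes it will honor explicit `-f` and `-s` overrides), you could re-run the fix with a lower strength and a higher step count:
+
+```bash
+invoke> !fix ./outputs/000002.seed.png -embiggen_tiles 1 2 3 -f 0.40 -s 100
+```
+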
Author: [Travco](https://github.com/travco)
diff --git a/docs/features/IMG2IMG.md b/docs/features/IMG2IMG.md
index 769e3b546a..a540ff5cc9 100644
--- a/docs/features/IMG2IMG.md
+++ b/docs/features/IMG2IMG.md
@@ -2,7 +2,9 @@
title: Image-to-Image
---
-# :material-image-multiple: **IMG2IMG**
+# :material-image-multiple: Image-to-Image
+
+## `img2img`
This script also provides an `img2img` feature that lets you seed your creations with an initial
drawing or photo. This is a really cool feature that tells stable diffusion to build the prompt on
@@ -15,13 +17,17 @@ tree on a hill with a river, nature photograph, national geographic -I./test-pic
This will take the original image shown here:
+
-
+
+
and generate a new image based on it as shown here:
+
+
-The `--init_img (-I)` option gives the path to the seed picture. `--strength (-f)` controls how much
+The `--init_img` (`-I`) option gives the path to the seed picture. `--strength` (`-f`) controls how much
the original will be modified, ranging from `0.0` (keep the original intact), to `1.0` (ignore the
original completely). The default is `0.75`, and ranges from `0.25-0.90` give interesting results.
Other relevant options include `-C` (classification free guidance scale), and `-s` (steps). Unlike `txt2img`,
@@ -37,18 +43,21 @@ a very different image:
`photograph of a tree on a hill with a river`
+
+
-(When designing prompts, think about how the images scraped from the internet were captioned. Very few photographs will
-be labeled "photograph" or "photorealistic." They will, however, be captioned with the publication, photographer, camera
-model, or film settings.)
+!!! tip
+
+ When designing prompts, think about how the images scraped from the internet were captioned. Very few photographs will
+ be labeled "photograph" or "photorealistic." They will, however, be captioned with the publication, photographer, camera
+ model, or film settings.
If the initial image contains transparent regions, then Stable Diffusion will only draw within the
-transparent regions, a process called "inpainting". However, for this to work correctly, the color
+transparent regions, a process called [`inpainting`](./INPAINTING.md#creating-transparent-regions-for-inpainting). However, for this to work correctly, the color
-information underneath the transparent needs to be preserved, not erased.
+information underneath the transparent regions needs to be preserved, not erased.
-More details can be found here:
-[Creating Transparent Images For Inpainting](./INPAINTING.md#creating-transparent-regions-for-inpainting)
+!!! warning
**IMPORTANT ISSUE** `img2img` does not work properly on initial images smaller than 512x512. Please scale your
image to at least 512x512 before using it. Larger images are not a problem, but may run out of VRAM on your
@@ -60,9 +69,9 @@ tree on a hill with a river, national geographic -I./test-pictures/big-sketch.pn
## How does it actually work, though?
-The main difference between `img2img` and `prompt2img` is the starting point. While `prompt2img` always starts with pure
-gaussian noise and progressively refines it over the requested number of steps, `img2img` skips some of these earlier steps
-(how many it skips is indirectly controlled by the `--strength` parameter), and uses instead your initial image mixed with gaussian noise as the starting image.
+The main difference between `img2img` and `prompt2img` is the starting point. While `prompt2img` always starts with pure
+gaussian noise and progressively refines it over the requested number of steps, `img2img` skips some of these earlier steps
+(how many it skips is indirectly controlled by the `--strength` parameter), and uses instead your initial image mixed with gaussian noise as the starting image.
**Let's start** by thinking about vanilla `prompt2img`, just generating an image from a prompt. If the step count is 10, then the "latent space" (Stable Diffusion's internal representation of the image) for the prompt "fire" with seed `1592514025` develops something like this:
@@ -70,25 +79,33 @@ gaussian noise and progressively refines it over the requested number of steps,
invoke> "fire" -s10 -W384 -H384 -S1592514025
```
+
![latent steps](../assets/img2img/000019.steps.png)
+
-Put simply: starting from a frame of fuzz/static, SD finds details in each frame that it thinks look like "fire" and brings them a little bit more into focus, gradually scrubbing out the fuzz until a clear image remains.
+Put simply: starting from a frame of fuzz/static, SD finds details in each frame that it thinks look like "fire" and brings them a little bit more into focus, gradually scrubbing out the fuzz until a clear image remains.
-**When you use `img2img`** some of the earlier steps are cut, and instead an initial image of your choice is used. But because of how the maths behind Stable Diffusion works, this image needs to be mixed with just the right amount of noise (fuzz/static) for where it is being inserted. This is where the strength parameter comes in. Depending on the set strength, your image will be inserted into the sequence at the appropriate point, with just the right amount of noise.
+**When you use `img2img`** some of the earlier steps are cut, and instead an initial image of your choice is used. But because of how the maths behind Stable Diffusion works, this image needs to be mixed with just the right amount of noise (fuzz/static) for where it is being inserted. This is where the strength parameter comes in. Depending on the set strength, your image will be inserted into the sequence at the appropriate point, with just the right amount of noise.
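+
+As a rough rule of thumb (this just restates the numbers in the table and examples below; it is not the exact formula the code uses), the number of denoising steps that actually run is about the strength multiplied by the requested step count:
+
+```bash
+# illustrative arithmetic only; these are the same flags used elsewhere on this page
+# -s10 -f 0.7  ->  roughly 0.7 x 10 = 7 steps run from the noised-up init image
+# -s10 -f 0.4  ->  roughly 0.4 x 10 = 4 steps run
+# to get ~20 "real" steps at -f 0.4, request -s50 (because 20 / 0.4 = 50)
+```
+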
### A concrete example
-Say I want SD to draw a fire based on this hand-drawn image:
+I want SD to draw a fire based on this hand-drawn image:
+
![drawing of a fireplace](../assets/img2img/fire-drawing.png)
+
Let's only do 10 steps, to make it easier to see what's happening. If strength is `0.7`, this is what the internal steps the algorithm has to take will look like:
-![](../assets/img2img/000032.steps.gravity.png)
+
+![gravity32](../assets/img2img/000032.steps.gravity.png)
+
With strength `0.4`, the steps look more like this:
-![](../assets/img2img/000030.steps.gravity.png)
+
+![gravity30](../assets/img2img/000030.steps.gravity.png)
+
Notice how much more fuzzy the starting image is for strength `0.7` compared to `0.4`, and notice also how much longer the sequence is with `0.7`:
@@ -97,11 +114,12 @@ Notice how much more fuzzy the starting image is for strength `0.7` compared to
| initial image that SD sees | ![](../assets/img2img/000032.step-0.png) | ![](../assets/img2img/000030.step-0.png) |
-| steps argument to `invoke>` | `-S10` | `-S10` |
+| steps argument to `invoke>` | `-s10` | `-s10` |
| steps actually taken | 7 | 4 |
-| latent space at each step | ![](../assets/img2img/000032.steps.gravity.png) | ![](../assets/img2img/000030.steps.gravity.png) |
-| output | ![](../assets/img2img/000032.1592514025.png) | ![](../assets/img2img/000030.1592514025.png) |
+| latent space at each step | ![gravity32](../assets/img2img/000032.steps.gravity.png) | ![gravity30](../assets/img2img/000030.steps.gravity.png) |
+| output | ![000032.1592514025](../assets/img2img/000032.1592514025.png) | ![000030.1592514025](../assets/img2img/000030.1592514025.png) |
-Both of the outputs look kind of like what I was thinking of. With the strength higher, my input becomes more vague, *and* Stable Diffusion has more steps to refine its output. But it's not really making what I want, which is a picture of cheery open fire. With the strength lower, my input is more clear, *but* Stable Diffusion has less chance to refine itself, so the result ends up inheriting all the problems of my bad drawing.
+Both of the outputs look kind of like what I was thinking of. With the strength higher, my input becomes more vague, *and* Stable Diffusion has more steps to refine its output. But it's not really making what I want, which is a picture of a cheery open fire. With the strength lower, my input is more clear, *but* Stable Diffusion has less chance to refine itself, so the result ends up inheriting all the problems of my bad drawing.
+If you want to try this out yourself, all of these are using a seed of `1592514025` with a width/height of `384`, step count `10`, the default sampler (`k_lms`), and the single-word prompt `"fire"`:
-If you want to try this out yourself, all of these are using a seed of `1592514025` with a width/height of `384`, step count `10`, the default sampler (`k_lms`), and the single-word prompt `fire`:
@@ -121,33 +139,39 @@ Here's strength `0.4` (note step count `50`, which is `20 ÷ 0.4` to make sure S
invoke> "fire" -s50 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.4
```
-![](../assets/img2img/000035.1592514025.png)
+
+![000035.1592514025](../assets/img2img/000035.1592514025.png)
+
-and strength `0.7` (note step count `30`, which is roughly `20 ÷ 0.7` to make sure SD does `20` steps from my image):
+and here is strength `0.7` (note step count `30`, which is roughly `20 ÷ 0.7` to make sure SD does `20` steps from my image):
```commandline
invoke> "fire" -s30 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.7
```
-![](../assets/img2img/000046.1592514025.png)
+
+![000046.1592514025](../assets/img2img/000046.1592514025.png)
+
-In both cases the image is nice and clean and "finished", but because at strength `0.7` Stable Diffusion has been give so much more freedom to improve on my badly-drawn flames, they've come out looking much better. You can really see the difference when looking at the latent steps. There's more noise on the first image with strength `0.7`:
+In both cases the image is nice and clean and "finished", but because at strength `0.7` Stable Diffusion has been given so much more freedom to improve on my badly-drawn flames, they've come out looking much better. You can really see the difference when looking at the latent steps. There's more noise on the first image with strength `0.7`:
-![](../assets/img2img/000046.steps.gravity.png)
+![gravity46](../assets/img2img/000046.steps.gravity.png)
than there is for strength `0.4`:
-![](../assets/img2img/000035.steps.gravity.png)
+![gravity35](../assets/img2img/000035.steps.gravity.png)
-and that extra noise gives the algorithm more choices when it is evaluating how to denoise any particular pixel in the image.
+and that extra noise gives the algorithm more choices when it is evaluating how to denoise any particular pixel in the image.
Unfortunately, it seems that `img2img` is very sensitive to the step count. Here's strength `0.7` with a step count of `29` (SD did 19 steps from my image):
-![](../assets/img2img/000045.1592514025.png)
+
+![000045.1592514025](../assets/img2img/000045.1592514025.png)
+
By comparing the latents we can sort of see that something got interpreted differently enough on the third or fourth step to lead to a rather different interpretation of the flames.
-![](../assets/img2img/000046.steps.gravity.png)
-![](../assets/img2img/000045.steps.gravity.png)
+![gravity46](../assets/img2img/000046.steps.gravity.png)
+![gravity45](../assets/img2img/000045.steps.gravity.png)
-This is the result of a difference in the de-noising "schedule" - basically the noise has to be cleaned by a certain degree each step or the model won't "converge" on the image properly (see https://huggingface.co/blog/stable_diffusion for more about that). A different step count means a different schedule, which means things get interpreted slightly differently at every step.
+This is the result of a difference in the de-noising "schedule" - basically the noise has to be cleaned by a certain degree each step or the model won't "converge" on the image properly (see [stable diffusion blog](https://huggingface.co/blog/stable_diffusion) for more about that). A different step count means a different schedule, which means things get interpreted slightly differently at every step.
diff --git a/docs/features/INPAINTING.md b/docs/features/INPAINTING.md
index 40b01ae13c..c488c72d16 100644
--- a/docs/features/INPAINTING.md
+++ b/docs/features/INPAINTING.md
@@ -6,21 +6,29 @@ title: Inpainting
## **Creating Transparent Regions for Inpainting**
-Inpainting is really cool. To do it, you start with an initial image and use a photoeditor to make
-one or more regions transparent (i.e. they have a "hole" in them). You then provide the path to this
-image at the invoke> command line using the `-I` switch. Stable Diffusion will only paint within the
-transparent region.
+Inpainting is really cool. To do it, you start with an initial image
+and use a photoeditor to make one or more regions transparent
+(i.e. they have a "hole" in them). You then provide the path to this
+image at the invoke> command line using the `-I` switch. Stable
+Diffusion will only paint within the transparent region.
-There's a catch. In the current implementation, you have to prepare the initial image correctly so
-that the underlying colors are preserved under the transparent area. Many imaging editing
-applications will by default erase the color information under the transparent pixels and replace
-them with white or black, which will lead to suboptimal inpainting. You also must take care to
-export the PNG file in such a way that the color information is preserved.
+There's a catch. In the current implementation, you have to prepare
+the initial image correctly so that the underlying colors are
+preserved under the transparent area. Many image editing
+applications will by default erase the color information under the
+transparent pixels and replace them with white or black, which will
+lead to suboptimal inpainting. It often helps to apply incomplete
+transparency, such as any value between 1% and 99%.
-If your photoeditor is erasing the underlying color information, `invoke.py` will give you a big fat
-warning. If you can't find a way to coax your photoeditor to retain color values under transparent
-areas, then you can combine the `-I` and `-M` switches to provide both the original unedited image
-and the masked (partially transparent) image:
+You also must take care to export the PNG file in such a way that the
+color information is preserved. There is often an option in the export
+dialog that lets you specify this.
+
+If your photoeditor is erasing the underlying color information,
+`invoke.py` will give you a big fat warning. If you can't find a way to
+coax your photoeditor to retain color values under transparent areas,
+then you can combine the `-I` and `-M` switches to provide both the
+original unedited image and the masked (partially transparent) image:
```bash
invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent.png
@@ -28,6 +36,26 @@ invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent
We are hoping to get rid of the need for this workaround in an upcoming release.
+### Inpainting is not changing the masked region enough!
+
+One of the things to understand about how inpainting works is that it
+is equivalent to running img2img on just the masked (transparent)
+area. img2img builds on top of the existing image data, and therefore
+will attempt to preserve colors, shapes and textures to the best of
+its ability. Unfortunately this means that if you want to make a
+dramatic change in the inpainted region, for example replacing a red
+wall with a blue one, the algorithm will fight you.
+
+You have a couple of options. The first is to increase the values of
+the requested steps (`-sXXX`), strength (`-f0.XX`), and/or
+condition-free guidance (`-CXX.X`). If this is not working for you, a
+more extreme step is to provide the `--inpaint_replace 0.X` (`-r0.X`)
+option. This value ranges from 0.0 to 1.0. The higher it is, the less
+attention the algorithm will pay to the data underneath the masked
+region. At high values this will enable you to replace colored regions
+entirely, but beware that the masked region may not blend in with the
+surrounding unmasked regions as well.
+
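+For instance, a hypothetical invocation (the prompt, image paths and values here are only illustrative) that leans heavily on `--inpaint_replace` might look like:
+
+```bash
+invoke> "a plain blue wall" -I./images/room.png -M./images/room-mask.png -s80 -f0.9 -C15 -r0.9
+```
+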
---
## Recipe for GIMP
@@ -44,33 +72,34 @@ We are hoping to get rid of the need for this workaround in an upcoming release.
8. In the export dialogue, Make sure the "Save colour values from
transparent pixels" checkbox is selected.
+---
## Recipe for Adobe Photoshop
1. Open image in Photoshop
-![step1](../assets/step1.png)
+ ![step1](../assets/step1.png)
2. Use any of the selection tools (Marquee, Lasso, or Wand) to select the area you desire to inpaint.
-![step2](../assets/step2.png)
+ ![step2](../assets/step2.png)
3. Because we'll be applying a mask over the area we want to preserve, you should now select the inverse by using the ++shift+ctrl+i++ shortcut, or right clicking and using the "Select Inverse" option.
4. You'll now create a mask by selecting the image layer, and Masking the selection. Make sure that you don't delete any of the underlying image, or your inpainting results will be dramatically impacted.
-![step4](../assets/step4.png)
+ ![step4](../assets/step4.png)
5. Make sure to hide any background layers that are present. You should see the mask applied to your image layer, and the image on your canvas should display the checkered background.
-![step5](../assets/step5.png)
+ ![step5](../assets/step5.png)
-6. Save the image as a transparent PNG by using the "Save a Copy" option in the File menu, or using the Alt + Ctrl + S keyboard shortcut
+6. Save the image as a transparent PNG by using `File`-->`Save a Copy` from the menu bar, or by using the keyboard shortcut ++alt+ctrl+s++
-![step6](../assets/step6.png)
+ ![step6](../assets/step6.png)
7. After following the inpainting instructions above (either through the CLI or the Web UI), marvel at your newfound ability to selectively invoke. Lookin' good!
-![step7](../assets/step7.png)
+ ![step7](../assets/step7.png)
8. In the export dialogue, Make sure the "Save colour values from transparent pixels" checkbox is selected.
diff --git a/docs/features/OTHER.md b/docs/features/OTHER.md
index 820b46c32d..b05e3ea36e 100644
--- a/docs/features/OTHER.md
+++ b/docs/features/OTHER.md
@@ -6,15 +6,13 @@ title: Others
## **Google Colab**
-Stable Diffusion AI Notebook:
-
Open and follow instructions to use an isolated environment running
-Dream.
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg){ align="right" }](https://colab.research.google.com/github/lstein/stable-diffusion/blob/main/notebooks/Stable_Diffusion_AI_Notebook.ipynb)
-Output Example: ![Colab Notebook](../assets/colab_notebook.png)
+Open and follow instructions to use an isolated environment running Dream.
+
+Output Example:
+
+![Colab Notebook](../assets/colab_notebook.png)
---
@@ -33,12 +31,12 @@ invoke> "pond garden with lotus by claude monet" --seamless -s100 -n4
## **Shortcuts: Reusing Seeds**
Since it is so common to reuse seeds while refining a prompt, there is now a shortcut as of version
-1.11. Provide a `**-S**` (or `**--seed**`) switch of `-1` to use the seed of the most recent image
-generated. If you produced multiple images with the `**-n**` switch, then you can go back further
-using -2, -3, etc. up to the first image generated by the previous command. Sorry, but you can't go
+1.11. Provide a `-S` (or `--seed`) switch of `-1` to use the seed of the most recent image
+generated. If you produced multiple images with the `-n` switch, then you can go back further
+using `-2`, `-3`, etc. up to the first image generated by the previous command. Sorry, but you can't go
back further than one command.
-Here's an example of using this to do a quick refinement. It also illustrates using the new `**-G**`
+Here's an example of using this to do a quick refinement. It also illustrates using the new `-G`
switch to turn on upscaling and face enhancement (see previous section):
```bash
@@ -58,7 +56,7 @@ outputs/img-samples/000040.3498014304.png: "a cute child playing hopscotch" -G1.
## **Weighted Prompts**
You may weight different sections of the prompt to tell the sampler to attach different levels of
-priority to them, by adding `:(number)` to the end of the section you wish to up- or downweight. For
+priority to them, by adding `:<number>` to the end of the section you wish to up- or downweight. For
example consider this prompt:
```bash
@@ -71,24 +69,30 @@ combination of integers and floating point numbers, and they do not need to add
---
-## Thresholding and Perlin Noise Initialization Options
+## **Thresholding and Perlin Noise Initialization Options**
Two new options are the thresholding (`--threshold`) and the perlin noise initialization (`--perlin`) options. Thresholding limits the range of the latent values during optimization, which helps combat oversaturation with higher CFG scale values. Perlin noise initialization starts with a percentage (a value ranging from 0 to 1) of perlin noise mixed into the initial noise. Both features allow for more variations and options in the course of generating images.
-For better intuition into what these options do in practice, [here is a graphic demonstrating them both](static/truncation_comparison.jpg) in use. In generating this graphic, perlin noise at initialization was programmatically varied going across on the diagram by values 0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0; and the threshold was varied going down from
+For better intuition into what these options do in practice:
+
+![here is a graphic demonstrating them both](../assets/truncation_comparison.jpg)
+
+In generating this graphic, perlin noise at initialization was programmatically varied going across on the diagram by values 0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0; and the threshold was varied going down from
0, 1, 2, 3, 4, 5, 10, 20, 100. The other options are fixed, so the initial prompt is as follows (no thresholding or perlin noise):
-```
- a portrait of a beautiful young lady -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 0 --perlin 0
+```bash
+invoke> "a portrait of a beautiful young lady" -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 0 --perlin 0
```
Here's an example of another prompt used when setting the threshold to 5 and perlin noise to 0.2:
-```
- a portrait of a beautiful young lady -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 5 --perlin 0.2
+```bash
+invoke> "a portrait of a beautiful young lady" -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 5 --perlin 0.2
```
-Note: currently the thresholding feature is only implemented for the k-diffusion style samplers, and empirically appears to work best with `k_euler_a` and `k_dpm_2_a`. Using 0 disables thresholding. Using 0 for perlin noise disables using perlin noise for initialization. Finally, using 1 for perlin noise uses only perlin noise for initialization.
+!!! note
+
+ Currently the thresholding feature is only implemented for the k-diffusion style samplers, and empirically appears to work best with `k_euler_a` and `k_dpm_2_a`. Using 0 disables thresholding. Using 0 for perlin noise disables perlin noise initialization, and using 1 uses only perlin noise for initialization.
---
@@ -120,7 +124,7 @@ internet. In the following runs, it will load up the cached versions of the requ
`.cache` directory of the system.
```bash
-(ldm) ~/stable-diffusion$ python3 ./scripts/preload_models.py
+(invokeai) ~/stable-diffusion$ python3 ./scripts/preload_models.py
preloading bert tokenizer...
Downloading: 100%|██████████████████████████████████| 28.0/28.0 [00:00<00:00, 49.3kB/s]
Downloading: 100%|██████████████████████████████████| 226k/226k [00:00<00:00, 2.79MB/s]
diff --git a/docs/features/OUTPAINTING.md b/docs/features/OUTPAINTING.md
index 7d54f1bfc9..1f1e1dbdfa 100644
--- a/docs/features/OUTPAINTING.md
+++ b/docs/features/OUTPAINTING.md
@@ -25,14 +25,16 @@ implementations.
Consider this image:
+
![curly_woman](../assets/outpainting/curly.png)
+
Pretty nice, but it's annoying that the top of her head is cut
off. She's also a bit off center. Let's fix that!
-~~~~
+```bash
invoke> !fix images/curly.png --outcrop top 64 right 64
-~~~~
+```
This is saying to apply the `outcrop` extension by extending the top
of the image by 64 pixels, and the right of the image by the same
@@ -42,7 +44,9 @@ specify any number of pixels to extend. You can also abbreviate
The result looks like this:
+
![curly_woman_outcrop](../assets/outpainting/curly-outcrop.png)
+
The new image is actually slightly larger than the original (576x576,
because 64 pixels were added to the top and right sides.)
@@ -66,33 +70,36 @@ The `outpaint` extension does the same thing, but with subtle
differences. Starting with the same image, here is how we would add an
additional 64 pixels to the top of the image:
-~~~
+```bash
invoke> !fix images/curly.png --out_direction top 64
-~~~
+```
-(you can abbreviate ``--out_direction` as `-D`.
+(You can abbreviate `--out_direction` as `-D`.)
The result is shown here:
+
![curly_woman_outpaint](../assets/outpainting/curly-outpaint.png)
+
Although the effect is similar, there are significant differences from
outcropping:
-1. You can only specify one direction to extend at a time.
-2. The image is **not** resized. Instead, the image is shifted by the specified
+- You can only specify one direction to extend at a time.
+- The image is **not** resized. Instead, the image is shifted by the specified
number of pixels. If you look carefully, you'll see that less of the lady's
torso is visible in the image.
-3. Because the image dimensions remain the same, there's no rounding
+- Because the image dimensions remain the same, there's no rounding
to multiples of 64.
-4. Attempting to outpaint larger areas will frequently give rise to ugly
+- Attempting to outpaint larger areas will frequently give rise to ugly
ghosting effects.
-5. For best results, try increasing the step number.
-6. If you don't specify a pixel value in -D, it will default to half
+- For best results, try increasing the step number.
+- If you don't specify a pixel value in `-D`, it will default to half
of the whole image, which is likely not what you want.
-Neither `outpaint` nor `outcrop` are perfect, but we continue to tune
-and improve them. If one doesn't work, try the other. You may also
-wish to experiment with other `img2img` arguments, such as `-C`, `-f`
-and `-s`.
+!!! tip
+ Neither `outpaint` nor `outcrop` are perfect, but we continue to tune
+ and improve them. If one doesn't work, try the other. You may also
+ wish to experiment with other `img2img` arguments, such as `-C`, `-f`
+ and `-s`.
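+
+For example, a hypothetical experiment along those lines (the values are arbitrary starting points, and this assumes `!fix` passes the extra switches through to the generator):
+
+```bash
+invoke> !fix images/curly.png --out_direction top 64 -s 100 -C 15
+```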
diff --git a/docs/features/POSTPROCESS.md b/docs/features/POSTPROCESS.md
index 148f52fb1e..29c63ffd14 100644
--- a/docs/features/POSTPROCESS.md
+++ b/docs/features/POSTPROCESS.md
@@ -1,8 +1,9 @@
-
---
title: Postprocessing
---
+# :material-image-edit: Postprocessing
+
## Intro
This extension provides the ability to restore faces and upscale
@@ -33,13 +34,13 @@ work. These are loaded when you run `scripts/preload_models.py`. If
-GFPAN is failing with an error, please run the following from the
+GFPGAN is failing with an error, please run the following from the
InvokeAI directory:
-~~~~
+```bash
python scripts/preload_models.py
-~~~~
+```
If you do not run this script in advance, the GFPGAN module will attempt
-to download the models files the first time you try to perform facial
-reconstruction.
+to download the model files the first time you try to perform facial
+reconstruction.
Alternatively, if you have GFPGAN installed elsewhere, or if you are
using an earlier version of this package which asked you to install
@@ -88,13 +89,13 @@ too.
### Example Usage
```bash
-invoke> superman dancing with a panda bear -U 2 0.6 -G 0.4
+invoke> "superman dancing with a panda bear" -U 2 0.6 -G 0.4
```
This also works with img2img:
```bash
-invoke> a man wearing a pineapple hat -I path/to/your/file.png -U 2 0.5 -G 0.6
+invoke> "a man wearing a pineapple hat" -I path/to/your/file.png -U 2 0.5 -G 0.6
```
!!! note
@@ -122,20 +123,20 @@ In order to setup CodeFormer to work, you need to download the models
like with GFPGAN. You can do this either by running
`preload_models.py` or by manually downloading the [model
file](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer.pth)
-and saving it to `ldm/restoration/codeformer/weights` folder.
+and saving it to `ldm/invoke/restoration/codeformer/weights` folder.
You can use `-ft` prompt argument to swap between CodeFormer and the
default GFPGAN. The above mentioned `-G` prompt argument will allow
you to control the strength of the restoration effect.
-### Usage:
+### Usage
The following command will perform face restoration with CodeFormer instead of
the default gfpgan.
` -G 0.8 -ft codeformer`
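+
+For example, a full command might look like this (the prompt is just an illustration):
+
+```bash
+invoke> "portrait of a weathered sea captain" -G 0.8 -ft codeformer
+```
+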
-### Other Options:
+### Other Options
- `-cf` - cf or CodeFormer Fidelity takes values between `0` and `1`. 0 produces
high quality results but low accuracy and 1 produces lower quality results but
@@ -161,7 +162,7 @@ previously-generated file. Just use the syntax `!fix path/to/file.png
2X for a file named `./outputs/img-samples/000044.2945021133.png`,
just run:
-```
+```bash
invoke> !fix ./outputs/img-samples/000044.2945021133.png -G 0.8 -U 2
```
@@ -169,7 +170,7 @@ A new file named `000044.2945021133.fixed.png` will be created in the output
directory. Note that the `!fix` command does not replace the original file,
unlike the behavior at generate time.
-### Disabling:
+### Disabling
If, for some reason, you do not wish to load the GFPGAN and/or ESRGAN libraries,
you can disable them on the invoke.py command line with the `--no_restore` and
diff --git a/docs/features/PROMPTS.md b/docs/features/PROMPTS.md
index 361a2fd5bc..b5ef26858b 100644
--- a/docs/features/PROMPTS.md
+++ b/docs/features/PROMPTS.md
@@ -1,8 +1,8 @@
---
-title: Prompting Features
+title: Prompting-Features
---
-# :octicons-command-palette-24: Prompting Features
+# :octicons-command-palette-24: Prompting-Features
## **Reading Prompts from a File**
@@ -19,14 +19,15 @@ innovative packaging for a squid's dinner -S137038382
Then pass this file's name to `invoke.py` when you invoke it:
```bash
-(ldm) ~/stable-diffusion$ python3 scripts/invoke.py --from_file "path/to/prompts.txt"
+(invokeai) ~/stable-diffusion$ python3 scripts/invoke.py --from_file "path/to/prompts.txt"
```
You may read a series of prompts from standard input by providing a filename of `-`:
```bash
-(ldm) ~/stable-diffusion$ echo "a beautiful day" | python3 scripts/invoke.py --from_file -
+(invokeai) ~/stable-diffusion$ echo "a beautiful day" | python3 scripts/invoke.py --from_file -
```
+
---
## **Negative and Unconditioned Prompts**
@@ -34,7 +35,7 @@ You may read a series of prompts from standard input by providing a filename of
Any words between a pair of square brackets will instruct Stable
Diffusion to attempt to ban the concept from the generated image.
-```bash
+```text
this is a test prompt [not really] to make you understand [cool] how this works.
```
@@ -46,25 +47,33 @@ original prompt:
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
+
![step1](../assets/negative_prompt_walkthru/step1.png)
+
That image has a woman, so if we want the horse without a rider, we can influence the image not to have a woman by putting [woman] in the prompt, like this:
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman]" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
+
![step2](../assets/negative_prompt_walkthru/step2.png)
+
That's nice - but say we also don't want the image to be quite so blue. We can add "blue" to the list of negative prompts, so it's now [woman blue]:
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue]" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
+
![step3](../assets/negative_prompt_walkthru/step3.png)
+
Getting close - but there's no sense in having a saddle when our horse doesn't have a rider, so we'll add one more negative prompt: [woman blue saddle].
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue saddle]" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
+
![step4](../assets/negative_prompt_walkthru/step4.png)
+
!!! notes "Notes about this feature:"
@@ -101,44 +110,58 @@ illustrate, here are three images generated using various combinations
of blend weights. As usual, unless you fix the seed, the prompts will give you
different results each time you run them.
+---
+
+
### "blue sphere, red cube, hybrid"
+
This example doesn't use melding at all and represents the default way
of mixing concepts.
-
+
+![blue-sphere-red-cube-hybrid](../assets/prompt-blending/blue-sphere-red-cube-hybrid.png)
+
It's interesting to see how the AI expressed the concept of "cube" as
the four quadrants of the enclosing frame. If you look closely, there
is depth there, so the enclosing frame is actually a cube.
+
### "blue sphere:0.25 red cube:0.75 hybrid"
-
+![blue-sphere-25-red-cube-75](../assets/prompt-blending/blue-sphere-0.25-red-cube-0.75-hybrid.png)
+
Now that's interesting. We get neither a blue sphere nor a red cube,
but a red sphere embedded in a brick wall, which represents a melding
of concepts within the AI's "latent space" of semantic
representations. Where is Ludwig Wittgenstein when you need him?
+
### "blue sphere:0.75 red cube:0.25 hybrid"
-
+![blue-sphere-75-red-cube-25](../assets/prompt-blending/blue-sphere-0.75-red-cube-0.25-hybrid.png)
+
Definitely more blue-spherey. The cube is gone entirely, but it's
really cool abstract art.
+
### "blue sphere:0.5 red cube:0.5 hybrid"
-
+![blue-sphere-5-red-cube-5-hybrid](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5-hybrid.png)
+
Whoa...! I see blue and red, but no spheres or cubes. Is the word
"hybrid" summoning up the concept of some sort of scifi creature?
Let's find out.
+
### "blue sphere:0.5 red cube:0.5"
-
+![blue-sphere-5-red-cube-5](../assets/prompt-blending/blue-sphere-0.5-red-cube-0.5.png)
+
Indeed, removing the word "hybrid" produces an image that is more like
what we'd expect.
@@ -146,4 +169,3 @@ what we'd expect.
In conclusion, prompt blending is great for exploring creative space,
but can be difficult to direct. A forthcoming release of InvokeAI will
feature more deterministic prompt weighting.
-
diff --git a/docs/features/TEXTUAL_INVERSION.md b/docs/features/TEXTUAL_INVERSION.md
index 15ac0891ad..7ce0f41c5a 100644
--- a/docs/features/TEXTUAL_INVERSION.md
+++ b/docs/features/TEXTUAL_INVERSION.md
@@ -1,8 +1,8 @@
---
-title: TEXTUAL_INVERSION
+title: Textual-Inversion
---
-# :material-file-document-plus-outline: TEXTUAL_INVERSION
+# :material-file-document: Textual Inversion
## **Personalizing Text-to-Image Generation**
@@ -23,13 +23,13 @@ As the default backend is not available on Windows, if you're using that
platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND` to `gloo`
```bash
-python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \
- --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
- -t \
- -n my_cat \
- --gpus 0 \
- --data_root D:/textual-inversion/my_cat \
- --init_word 'cat'
+python3 ./main.py -t \
+ --base ./configs/stable-diffusion/v1-finetune.yaml \
+ --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
+ -n my_cat \
+ --gpus 0 \
+ --data_root D:/textual-inversion/my_cat \
+ --init_word 'cat'
```
During the training process, files will be created in
@@ -59,7 +59,8 @@ Once the model is trained, specify the trained .pt or .bin file when starting
invoke using
```bash
-python3 ./scripts/invoke.py --embedding_path /path/to/embedding.pt
+python3 ./scripts/invoke.py \
+ --embedding_path /path/to/embedding.pt
```
Then, to utilize your subject at the invoke prompt
@@ -80,9 +81,9 @@ LDM checkpoints using:
```bash
python3 ./scripts/merge_embeddings.py \
- --manager_ckpts /path/to/first/embedding.pt \
- [