Merge branch 'development' into Improved-fetch-and-option-to-replay-commands-from-file

This commit is contained in:
ArDiouscuros
2022-10-08 13:26:22 +02:00
committed by GitHub
239 changed files with 8262 additions and 3944 deletions

View File

@ -146,6 +146,7 @@ Here are the dream> commands that apply to txt2img:
| --cfg_scale <float>| -C<float> | 7.5 | How hard to try to match the prompt to the generated image; any number greater than 1.0 works, but the useful range is roughly 5.0 to 20.0 |
| --seed <int> | -S<int> | None | Set the random seed for the next series of images. This can be used to recreate an image generated previously.|
| --sampler <sampler>| -A<sampler>| k_lms | Sampler to use. Use -h to get list of available samplers. |
| --hires_fix | | | Larger images often have duplication artefacts. This option suppresses duplicates by generating the image at low res, and then using img2img to increase the resolution |
| --grid | -g | False | Turn on grid mode to return a single image combining all the images generated by this prompt |
| --individual | -i | True | Turn off grid mode (deprecated; leave off --grid instead) |
| --outdir <path> | -o<path> | outputs/img_samples | Temporarily change the location of these images |
@ -249,9 +250,9 @@ generated image and either loads them into the command line
(Linux|Mac), or prints them out in a comment for copy-and-paste
(Windows). You may provide either the name of a file in the current
output directory, or a full file path.
Given a wildcard path to a folder with image png files, the
command will retrieve the dream commands used to generate the images
and save them to a file commands.txt for further processing.
Specify the path to a folder with image png files, and the wildcard *.png,
to retrieve the dream commands used to generate the images
and save them to a file commands.txt for further processing.
The name of the saved file can be set as the second argument to !fetch.
~~~
@ -299,10 +300,25 @@ dream> !20
dream> watercolor of beautiful woman sitting under tree wearing broad hat and flowing garment -v0.2 -n6 -S2878767194
~~~
## !search <search string>
This is similar to !history but it only returns lines that contain
`search string`. For example:
~~~
dream> !search surreal
[21] surrealist painting of beautiful woman sitting under tree wearing broad hat and flowing garment -v0.2 -n6 -S2878767194
~~~
## !clear
This clears the search history from memory and disk. Be advised that
this operation is irreversible and does not issue any warnings!
# Command-line editing and completion
If you are on a Macintosh or Linux machine, the command-line offers
convenient history tracking, editing, and command completion.
The command-line offers convenient history tracking, editing, and
command completion.
- To scroll through previous commands and potentially edit/reuse them, use the up and down cursor keys.
- To edit the current command, use the left and right cursor keys to position the cursor, and then backspace, delete or insert characters.
@ -312,7 +328,8 @@ convenient history tracking, editing, and command completion.
- To paste a cut section back in, position the cursor where you want to paste, and type CTRL-Y
Windows users can get similar, but more limited, functionality if they
launch dream.py with the "winpty" program:
launch dream.py with the "winpty" program and have the `pyreadline3`
library installed:
~~~
> winpty python scripts\dream.py

View File

@ -9,7 +9,7 @@ drawing or photo. This is a really cool feature that tells stable diffusion to b
top of the image you provide, preserving the original's basic shape and layout. To use it, provide
the `--init_img` option as shown here:
```bash
```commandline
dream> "waterfall and rainbow" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```
@ -26,5 +26,99 @@ If the initial image contains transparent regions, then Stable Diffusion will on
transparent regions, a process called "inpainting". However, for this to work correctly, the color
information underneath the transparent regions needs to be preserved, not erased.
More Details can be found here:
More details can be found here:
[Creating Transparent Images For Inpainting](./INPAINTING.md#creating-transparent-regions-for-inpainting)
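As a quick sanity check before inpainting, you can inspect whether a PNG still carries color information underneath its transparent region. Here is a small, illustrative sketch of such a check (it assumes Pillow and NumPy are installed; it is not part of InvokeAI, and the file path is just the example image from above):

```python
# Illustrative sketch only - not part of InvokeAI.
from PIL import Image
import numpy as np

img = Image.open("init-images/crude_drawing.png").convert("RGBA")
rgba = np.array(img)
transparent = rgba[..., 3] == 0          # mask of fully transparent pixels

if not transparent.any():
    print("No transparent pixels found; nothing will be inpainted.")
elif rgba[..., :3][transparent].any():
    print("Color data under the transparent region looks preserved.")
else:
    print("Transparent region is solid black; the color data was probably erased.")
```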
## How does it actually work, though?
The main difference between `img2img` and `prompt2img` is the starting point. While `prompt2img` always starts with pure
gaussian noise and progressively refines it over the requested number of steps, `img2img` skips some of these earlier steps
(how many it skips is indirectly controlled by the `--strength` parameter), and instead uses your initial image, mixed with gaussian noise, as the starting image.
**Let's start** by thinking about vanilla `prompt2img`, just generating an image from a prompt. If the step count is 10, then the "latent space" (Stable Diffusion's internal representation of the image) for the prompt "fire" with seed `1592514025` develops something like this:
```commandline
dream> "fire" -s10 -W384 -H384 -S1592514025
```
![latent steps](../assets/img2img/000019.steps.png)
Put simply: starting from a frame of fuzz/static, SD finds details in each frame that it thinks look like "fire" and brings them a little bit more into focus, gradually scrubbing out the fuzz until a clear image remains.
**When you use `img2img`** some of the earlier steps are cut, and instead an initial image of your choice is used. But because of how the maths behind Stable Diffusion works, this image needs to be mixed with just the right amount of noise (fuzz/static) for where it is being inserted. This is where the strength parameter comes in. Depending on the set strength, your image will be inserted into the sequence at the appropriate point, with just the right amount of noise.
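To make that arithmetic concrete, here is a rough Python sketch of the idea (an illustration only, not the actual code any of the samplers use; `sigmas` stands in for the sampler's noise schedule): strength decides how many of the requested steps are actually run, and the init latent is noised up to the level the schedule expects at the point where it is inserted.

```python
import torch

def img2img_start(requested_steps: int, strength: float,
                  init_latent: torch.Tensor, sigmas: torch.Tensor):
    """Illustrative sketch only - not the exact sampler implementation."""
    # How many of the requested steps actually run on top of the init image.
    steps_taken = int(requested_steps * strength)   # e.g. 10 * 0.7 -> 7 steps
    start_step = requested_steps - steps_taken      # the earlier steps are skipped
    # Drop the init latent in at that point, mixed with gaussian noise scaled
    # to the noise level the schedule expects there (higher strength = noisier).
    noised = init_latent + torch.randn_like(init_latent) * sigmas[start_step]
    return start_step, noised
```

With 10 requested steps this gives the 7 and 4 "steps actually taken" shown in the table below.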
### A concrete example
Say I want SD to draw a fire based on this hand-drawn image:
![drawing of a fireplace](../assets/img2img/fire-drawing.png)
Let's only do 10 steps, to make it easier to see what's happening. If strength is `0.7`, the internal steps the algorithm takes look like this:
![](../assets/img2img/000032.steps.gravity.png)
With strength `0.4`, the steps look more like this:
![](../assets/img2img/000030.steps.gravity.png)
Notice how much more fuzzy the starting image is for strength `0.7` compared to `0.4`, and notice also how much longer the sequence is with `0.7`:
| | strength = 0.7 | strength = 0.4 |
| -- | -- | -- |
| initial image that SD sees | ![](../assets/img2img/000032.step-0.png) | ![](../assets/img2img/000030.step-0.png) |
| steps argument to `dream>` | `-s10` | `-s10` |
| steps actually taken | 7 | 4 |
| latent space at each step | ![](../assets/img2img/000032.steps.gravity.png) | ![](../assets/img2img/000030.steps.gravity.png) |
| output | ![](../assets/img2img/000032.1592514025.png) | ![](../assets/img2img/000030.1592514025.png) |
Both of the outputs look kind of like what I was thinking of. With the strength higher, my input becomes more vague, *and* Stable Diffusion has more steps to refine its output. But it's not really making what I want, which is a picture of a cheery open fire. With the strength lower, my input is clearer, *but* Stable Diffusion has less chance to refine itself, so the result ends up inheriting all the problems of my bad drawing.
If you want to try this out yourself, all of these are using a seed of `1592514025` with a width/height of `384`, step count `10`, the default sampler (`k_lms`), and the single-word prompt `fire`:
```commandline
dream> "fire" -s10 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png --strength 0.7
```
The code for rendering intermediates is on my (damian0815's) branch [document-img2img](https://github.com/damian0815/InvokeAI/tree/document-img2img) - run `dream.py` and check your `outputs/img-samples/intermediates` folder while generating an image.
### Compensating for the reduced step count
After putting this guide together I was curious to see what the difference would be if I increased the step count to compensate, so that SD could have the same number of steps to develop the image regardless of the strength. So I ran the generation again using the same seed, but this time adapting the step count to give each generation 20 steps.
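In other words, to get a fixed number of steps from the initial image you divide the desired step count by the strength and round up. A tiny, illustrative sketch of that calculation:

```python
import math

def compensated_step_count(desired_steps: int, strength: float) -> int:
    """Pick the -s value so roughly `desired_steps` run from the init image."""
    return math.ceil(desired_steps / strength)

print(compensated_step_count(20, 0.4))   # 50 -> use -s50
print(compensated_step_count(20, 0.7))   # 29 -> the guide rounds this to -s30
```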
Here's strength `0.4` (note step count `50`, which is `20 ÷ 0.4` to make sure SD does `20` steps from my image):
```commandline
dream> "fire" -s50 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.4
```
![](../assets/img2img/000035.1592514025.png)
and strength `0.7` (note step count `30`, which is roughly `20 ÷ 0.7` to make sure SD does `20` steps from my image):
```commandline
dream> "fire" -s30 -W384 -H384 -S1592514025 -I /tmp/fire-drawing.png -f 0.7
```
![](../assets/img2img/000046.1592514025.png)
In both cases the image is nice and clean and "finished", but because at strength `0.7` Stable Diffusion has been given so much more freedom to improve on my badly-drawn flames, they've come out looking much better. You can really see the difference when looking at the latent steps. There's more noise on the first image with strength `0.7`:
![](../assets/img2img/000046.steps.gravity.png)
than there is for strength `0.4`:
![](../assets/img2img/000035.steps.gravity.png)
and that extra noise gives the algorithm more choices when it is evaluating how to denoise any particular pixel in the image.
Unfortunately, it seems that `img2img` is very sensitive to the step count. Here's strength `0.7` with a step count of `29` (SD did 19 steps from my image):
![](../assets/img2img/000045.1592514025.png)
By comparing the latents we can sort of see that something got interpreted differently enough on the third or fourth step to lead to a rather different interpretation of the flames.
![](../assets/img2img/000046.steps.gravity.png)
![](../assets/img2img/000045.steps.gravity.png)
This is the result of a difference in the de-noising "schedule" - basically the noise has to be cleaned by a certain degree each step or the model won't "converge" on the image properly (see https://huggingface.co/blog/stable_diffusion for more about that). A different step count means a different schedule, which means things get interpreted slightly differently at every step.

View File

@ -55,6 +55,43 @@ outputs/img-samples/000040.3498014304.png: "a cute child playing hopscotch" -G1.
---
## **Weighted Prompts**
You may weight different sections of the prompt to tell the sampler to attach different levels of
priority to them, by adding `:(number)` to the end of the section you wish to up- or downweight. For
example consider this prompt:
```bash
tabby cat:0.25 white duck:0.75 hybrid
```
This will tell the sampler to invest 25% of its effort on the tabby cat aspect of the image and 75%
on the white duck aspect (surprisingly, this example actually works). The prompt weights can use any
combination of integers and floating point numbers, and they do not need to add up to 1.
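To see what the syntax implies, here is a simplified sketch of how a weighted prompt could be split into `(text, weight)` pairs. It is an illustration only; the real parser handles more cases than this:

```python
import re

def split_weighted_subprompts(prompt: str, default_weight: float = 1.0):
    """Illustrative only: 'tabby cat:0.25 white duck:0.75 hybrid'
    -> [('tabby cat', 0.25), ('white duck', 0.75), ('hybrid', 1.0)]"""
    pairs = []
    # Each piece is either "some text:<number>" or trailing text with no weight.
    for match in re.finditer(r"(.+?):\s*([-+]?\d*\.?\d+)\s*|(.+)$", prompt):
        if match.group(3) is not None:
            pairs.append((match.group(3).strip(), default_weight))
        else:
            pairs.append((match.group(1).strip(), float(match.group(2))))
    return pairs

print(split_weighted_subprompts("tabby cat:0.25 white duck:0.75 hybrid"))
```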
---
## Thresholding and Perlin Noise Initialization Options
Two new options are the thresholding (`--threshold`) and the perlin noise initialization (`--perlin`) options. Thresholding limits the range of the latent values during optimization, which helps combat oversaturation with higher CFG scale values. Perlin noise initialization starts with a percentage (a value ranging from 0 to 1) of perlin noise mixed into the initial noise. Both features allow for more variations and options in the course of generating images.
For better intuition into what these options do in practice, [here is a graphic demonstrating them both](static/truncation_comparison.jpg) in use. In generating this graphic, the perlin noise at initialization was programmatically varied across the diagram through the values 0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0, and the threshold was varied down the diagram through the values
0, 1, 2, 3, 4, 5, 10, 20, 100. The other options are fixed, so the initial prompt is as follows (no thresholding or perlin noise):
```
a portrait of a beautiful young lady -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 0 --perlin 0
```
Here's an example of another prompt used when setting the threshold to 5 and perlin noise to 0.2:
```
a portrait of a beautiful young lady -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 5 --perlin 0.2
```
Note: currently the thresholding feature is only implemented for the k-diffusion style samplers, and empirically appears to work best with `k_euler_a` and `k_dpm_2_a`. Using 0 disables thresholding. Using 0 for perlin noise disables using perlin noise for initialization. Finally, using 1 for perlin noise uses only perlin noise for initialization.
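Conceptually, both options are small modifications to the latents and the initial noise. The following is an illustrative sketch of the idea only, not the project's actual implementation:

```python
import torch

def apply_threshold(latents: torch.Tensor, threshold: float) -> torch.Tensor:
    """Illustrative: limit latent values to [-threshold, threshold]; 0 disables."""
    if threshold == 0:
        return latents
    return latents.clamp(-threshold, threshold)

def mix_perlin(gaussian_noise: torch.Tensor, perlin_noise: torch.Tensor,
               perlin: float) -> torch.Tensor:
    """Illustrative: blend a fraction of perlin noise into the initial noise.
    perlin=0 -> pure gaussian, perlin=1 -> pure perlin."""
    return (1.0 - perlin) * gaussian_noise + perlin * perlin_noise
```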
---
## **Simplified API**
For programmers who wish to incorporate stable-diffusion into other products, this repository

View File

@ -4,75 +4,95 @@ title: Outpainting
# :octicons-paintbrush-16: Outpainting
## Continuous outpainting
## Outpainting and outcropping
This extension uses the inpainting code to extend an existing image to
any direction of "top", "right", "bottom" or "left". To use it you
need to provide an initial image with -I and an extension direction
with -D (direction). When extending using outpainting a higher img2img
strength value of 0.83 is the default.
Outpainting is a process by which the AI generates parts of the image
that are outside its original frame. It can be used to fix up images
in which the subject is off center, or when some detail (often the top
of someone's head!) is cut off.
The code is not foolproof. Sometimes it will do a good job extending
the image, and other times it will generate ghost images and other
artifacts. In addition, the code works best on images that were
generated by dream.py, because it will be able to recover the original
prompt that generated the file and "understand" what you are trying to
achieve.
InvokeAI supports two versions of outpainting, one called "outpaint"
and the other "outcrop." They work slightly differently and each has
its advantages and drawbacks.
### Basic Usage
### Outcrop
To illustrate, consider this image generated with the prompt "fantasy
portrait of elven princess." It's nice, but rather annoying that the
top of the head has been cropped off.
The `outcrop` extension allows you to extend the image in 64 pixel
increments in any dimension. You can apply the module to any image
previously-generated by InvokeAI. Note that it will **not** work with
arbitrary photographs or Stable Diffusion images created by other
implementations.
![elven_princess](../assets/outpainting/elven_princess.png)
Consider this image:
We can fix that using the `!fix` command!
![curly_woman](../assets/outpainting/curly.png)
Pretty nice, but it's annoying that the top of her head is cut
off. She's also a bit off center. Let's fix that!
~~~~
dream> !fix my_images/elven_princess.png -D top 50
dream> !fix images/curly.png --outcrop top 64 right 64
~~~~
This is telling dream.py to open up a rectangle 50 pixels high at the
top of the image and outpaint into it. The result is:
This is saying to apply the `outcrop` extension by extending the top
of the image by 64 pixels, and the right of the image by the same
amount. You can use any combination of top|left|right|bottom, and
specify any number of pixels to extend. You can also abbreviate
`--outcrop` to `-c`.
![elven_princess.fixed](../assets/outpainting/elven_princess.outpainted.png)
The result looks like this:
Voila! You can similarly specify `bottom`, `left` or `right` to
outpaint into these margins.
![curly_woman_outcrop](../assets/outpainting/curly-outcrop.png)
There are some limitations to be aware of:
The new image is actually slightly larger than the original (576x576,
because 64 pixels were added to the top and right sides).
1. You cannot change the size of the image rectangle. In the example,
notice that the whole image is shifted downwards by 50 pixels, rather
than the top being extended upwards.
A number of caveats:
1. Although you can specify any pixel values, they will be rounded up
to the nearest multiple of 64 (see the sketch after this list). Smaller
values are better. Larger extensions are more likely to generate
artefacts. However, if you wish you can run the !fix command repeatedly
to cautiously expand the image.
2. The extension is stochastic, meaning that each time you run it
you'll get a slightly different result. You can run it repeatedly
until you get an image you like. Unfortunately `!fix` does not
currently respect the `-n` (`--iterations`) argument.
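The rounding described above is plain ceiling-to-the-next-multiple-of-64 arithmetic; assuming the original was the default 512x512 image, adding 64 pixels to the top and right gives the 576x576 result mentioned earlier. Here is a rough, illustrative sketch of that calculation (not the extension's actual code):

~~~python
import math

def outcropped_size(width: int, height: int, extend: dict) -> tuple[int, int]:
    """Illustrative only: round each requested extension up to a multiple of 64."""
    def rounded(pixels: int) -> int:
        return math.ceil(pixels / 64) * 64

    new_width = width + rounded(extend.get("left", 0)) + rounded(extend.get("right", 0))
    new_height = height + rounded(extend.get("top", 0)) + rounded(extend.get("bottom", 0))
    return new_width, new_height

print(outcropped_size(512, 512, {"top": 64, "right": 64}))   # (576, 576)
~~~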
## Outpaint
The `outpaint` extension does the same thing, but with subtle
differences. Starting with the same image, here is how we would add an
additional 64 pixels to the top of the image:
~~~
dream> !fix images/curly.png --out_direction top 64
~~~
(You can abbreviate `--out_direction` as `-D`.)
The result is shown here:
![curly_woman_outpaint](../assets/outpainting/curly-outpaint.png)
Although the effect is similar, there are significant differences from
outcropping:
1. You can only specify one direction to extend at a time.
2. The image is **not** resized. Instead, the image is shifted by the specified
number of pixels. If you look carefully, you'll see that less of the lady's
torso is visible in the image.
3. Because the image dimensions remain the same, there's no rounding
to multiples of 64.
4. Attempting to outpaint larger areas will frequently give rise to ugly
ghosting effects.
5. For best results, try increasing the step number.
6. If you don't specify a pixel value in -D, it will default to half
of the whole image, which is likely not what you want.
You can do more with `!fix`, including upscaling and facial
reconstruction of previously-generated images. See
[Fixing Previously-Generated Images](./UPSCALE.md#fixing-previously-generated-images) for the details.
### Advanced Usage
For more control over the outpainting process, you can provide the
`-D` option at image generation time. This allows you to apply all the
controls, including the ability to resize the image and apply face-fixing
and upscaling. For example:
~~~~
dream> man with cat on shoulder -I./images/man.png -D bottom 100 -W960 -H960 -fit
~~~~
Or even shorter, since the prompt is read from the metadata of the old image:
~~~~
dream> -I./images/man.png -D bottom 100 -W960 -H960 -fit -U2 -G1
~~~~
Neither `outpaint` nor `outcrop` are perfect, but we continue to tune
and improve them. If one doesn't work, try the other. You may also
wish to experiment with other `img2img` arguments, such as `-C`, `-f`
and `-s`.

View File

@ -1,14 +1,18 @@
---
title: Upscale
title: Postprocessing
---
## Intro
The script provides the ability to restore faces and upscale. You can apply
these operations at the time you generate the images, or at any time to a
previously-generated PNG file, using the
[!fix](#fixing-previously-generated-images) command.
This extension provides the ability to restore faces and upscale
images.
Face restoration and upscaling can be applied at the time you generate
the images, or at any later time against a previously-generated PNG
file, using the [!fix](#fixing-previously-generated-images)
command. [Outpainting and outcropping](OUTPAINTING.md) can only be
applied after the fact.
## Face Fixing
@ -31,7 +35,7 @@ into **src/gfpgan/experiments/pretrained_models**. On Mac and Linux systems,
here's how you'd do it using **wget**:
```bash
wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth src/gfpgan/experiments/pretrained_models/
wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth -P src/gfpgan/experiments/pretrained_models/
```
Make sure that you're in the InvokeAI directory when you do this.
@ -158,9 +162,9 @@ situations when there is very little facial data to work with.
## Fixing Previously-Generated Images
It is easy to apply face restoration and/or upscaling to any
previously-generated file. Just use the syntax
`!fix path/to/file.png <options>`. For example, to apply GFPGAN at strength 0.8
and upscale 2X for a file named `./outputs/img-samples/000044.2945021133.png`,
just run:
```

View File

@ -1,12 +1,15 @@
---
title: Barebones Web Server
title: InvokeAI Web UI & Server
---
# :material-web: Barebones Web Server
# :material-web: InvokeAI Web Server
As of version 1.10, this distribution comes with a bare bones web server (see
screenshot). To use it, run the `dream.py` script by adding the `--web`
option.
As of version 2.0, this distribution's web server has been updated to include
an all-new UI, with optimizations to improve common workflows for image generation.
## Getting Started & Initialization Commands
To start the web server, run the `dream.py` script by adding the `--web` parameter.
```bash
(ldm) ~/stable-diffusion$ python3 scripts/dream.py --web
@ -15,7 +18,58 @@ option.
You can then connect to the server by pointing your web browser at
http://localhost:9090, or to the network name or IP address of the server.
Kudos to [Tesseract Cat](https://github.com/TesseractCat) for contributing this
code, and to [dagf2101](https://github.com/dagf2101) for refining it.
### Additional Options
`--web_develop` - Starts the web server in development mode.
`--web_verbose` - Enables verbose logging
`--cors [CORS ...]` - Additional allowed origins, comma-separated
`--host HOST` - Web server: Host or IP to listen on. Set to 0.0.0.0 to
accept traffic from other devices on your network.
`--port PORT` - Web server: Port to listen on
`--gui` - Start the InvokeAI GUI. This is the "desktop mode" version of the web app; it uses Flask
to create a desktop-app experience of the web server.
## Web Specific Features
The web interface offers an incredibly easy-to-use experience for interacting with the InvokeAI toolkit.
For detailed guidance on individual features, see the Feature-specific help documents available in this directory.
Note that the latest functionality available in the CLI may not always be available in the Web interface.
### Dark Mode & Light Mode
The InvokeAI interface is available in a nano-carbon black & purple Dark Mode, and a "burn your eyes out Nosferatu" Light Mode. These can be toggled by clicking the Sun/Moon icons at the top right of the interface.
![InvokeAI Web Server - Dark Mode](../assets/invoke_web_dark.png)
![InvokeAI Web Server - Light Mode](../assets/invoke_web_light.png)
### Invocation Toolbar
The left side of the InvokeAI interface is available for customizing the prompt and the settings used for invoking your new image. Typing your prompt into the open text field and clicking the Invoke button will produce the image based on the settings configured in the toolbar.
See below for additional documentation related to each feature:
- [Core Prompt Settings](./CLI.md)
- [Variations](./VARIATIONS.md)
- [Upscaling](./UPSCALE.md)
- [Image to Image](./IMG2IMG.md)
- [Inpainting](./INPAINTING.md)
- [Other](./OTHER.md)
### Invocation Gallery
On load, the gallery displays all previously generated files from the currently selected --outdir (or the default outputs folder). As new invocations are generated, these will be dynamically added to the gallery and can be previewed by selecting them. Each image also has a simple set of actions (Delete, Use Seed, Use All Parameters, etc.) that can be accessed by hovering over the image.
### Image Workspace
When an image from the Invocation Gallery is selected, or is generated, the image will be displayed within the center of the interface. A quickbar of common image interactions is displayed along the top of the image, including:
- Use image in the `Image to Image` workflow
- Initialize Face Restoration on the selected file
- Initialize Upscaling on the selected file
- View File metadata and details
- Delete the file
## Acknowledgements
A huge shout-out to the core team working to make this vision a reality, including [psychedelicious](https://github.com/psychedelicious), [Kyle0654](https://github.com/Kyle0654) and [blessedcoolant](https://github.com/blessedcoolant). [hipsterusername](https://github.com/hipsterusername) was the team's unofficial cheerleader and added tooltips/docs.
![Dream Web Server](../assets/dream_web_server.png)