Merge branch 'development' of github.com:lstein/stable-diffusion into asymmetric-tiling

2024-08-30 20:32:17 +00:00 · 2022-10-18 13:34:10 -04:00
parent d6195522aa 71c3835f3e
commit 9d19213b8a
21 changed files with 321 additions and 50 deletions
--- a/docs/features/CLI.md
+++ b/docs/features/CLI.md
@@ -85,6 +85,7 @@ overridden on a per-prompt basis (see [List of prompt arguments](#list-of-prompt
 | `--from_file <path>`                      |                                           | `None`                                         | Read list of prompts from a file. Use `-` to read from standard input                                |
 | `--model <modelname>`                     |                                           | `stable-diffusion-1.4`                         | Loads model specified in configs/models.yaml. Currently one of "stable-diffusion-1.4" or "laion400m" |
 | `--full_precision`                        | `-F`                                      | `False`                                        | Run in slower full-precision mode. Needed for Macintosh M1/M2 hardware and some older video cards.   |
+| `--png_compression <0-9>`                 | `-z<0-9>`                                 | `6`                                            | Select level of compression for output files, from 0 (no compression) to 9 (max compression)         |
 | `--web`                                   |                                           | `False`                                        | Start in web server mode                                                                             |
 | `--host <ip addr>`                        |                                           | `localhost`                                    | Which network interface web server should listen on. Set to 0.0.0.0 to listen on any.                |
 | `--port <port>`                           |                                           | `9090`                                         | Which port web server should listen for requests on.                                                 |
@@ -153,6 +154,7 @@ Here are the invoke> command that apply to txt2img:
 | --seed <int>       | -S<int>   | None                | Set the random seed for the next series of images. This can be used to recreate an image generated previously.|
 | --sampler <sampler>| -A<sampler>| k_lms              | Sampler to use. Use -h to get list of available samplers. |
 | --hires_fix        |           |                     | Larger images often have duplication artefacts. This option suppresses duplicates by generating the image at low res, and then using img2img to increase the resolution |
+| --png_compression <0-9> | -z<0-9> |  6           | Select level of compression for output files, from 0 (no compression) to 9 (max compression)         |
 | --grid             | -g        | False               | Turn on grid mode to return a single image combining all the images generated by this prompt |
 | --individual       | -i        | True                | Turn off grid mode (deprecated; leave off --grid instead) |
 | --outdir <path>    |  -o<path> | outputs/img_samples  | Temporarily change the location of these images |
@@ -211,11 +213,35 @@ accepts additional options:
    [Inpainting](./INPAINTING.md) for details.

 inpainting accepts all the arguments used for txt2img and img2img, as
-well as the --mask (-M) argument:
+well as the --init_mask (-M) and --text_mask (-tm) arguments:

 | Argument <img width="100" align="right"/> |  Shortcut  |  Default            |  Description |
 |--------------------|------------|---------------------|--------------|
 | `--init_mask <path>` | `-M<path>`   | `None`                |Path to an image the same size as the initial_image, with areas for inpainting made transparent.|
+| `--text_mask <prompt> [<float>]` | `-tm <prompt> [<float>]` | `None`  | Create a mask from a text prompt describing part of the image|
+
+`--text_mask` (short form `-tm`) is a way to generate a mask using a
+text description of the part of the image to replace. For example, if
+you have an image of a breakfast plate with a bagel, toast and
+scrambled eggs, you can selectively mask the bagel and replace it with
+a piece of cake this way:
+
+~~~
+invoke> a piece of cake -I /path/to/breakfast.png -tm bagel
+~~~
+
+The algorithm uses <a
+href="https://github.com/timojl/clipseg">clipseg</a> to classify
+different regions of the image. The classifier outputs a confidence
+score for each region it identifies. Generally, regions that score
+above 0.5 are reliable, but if you are getting too much or too little
+masking, you can lower the threshold (to mask more) or raise it
+(to mask less). In this example, passing `-tm` a higher value
+insists on a more stringent classification.
+
+~~~
+invoke> a piece of cake -I /path/to/breakfast.png -tm bagel 0.6
+~~~
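The effect of the threshold can be illustrated with a minimal sketch. This is not the clipseg or InvokeAI code: the function names and the hand-written confidence map are hypothetical, and the choice of `>=` for the comparison and of 0.5 as the default are assumptions made for illustration only.

```python
# Illustrative sketch only. A tiny hand-written grid stands in for the
# per-region confidence scores a classifier might produce.

def mask_from_confidence(confidence, threshold=0.5):
    """Return a binary mask: True where the score meets the threshold.

    The 0.5 default and the >= comparison are assumptions, not the
    documented behaviour of any real implementation.
    """
    return [[score >= threshold for score in row] for row in confidence]


def masked_cell_count(mask):
    """Count how many cells of the mask are selected."""
    return sum(cell for row in mask for cell in row)


confidence = [
    [0.10, 0.55, 0.58],
    [0.20, 0.65, 0.90],
    [0.05, 0.45, 0.70],
]

loose = mask_from_confidence(confidence, 0.4)    # lower threshold, more mask
default = mask_from_confidence(confidence)       # assumed default of 0.5
strict = mask_from_confidence(confidence, 0.6)   # higher threshold, less mask

for name, mask in (("loose", loose), ("default", default), ("strict", strict)):
    print(name, masked_cell_count(mask))
```

Raising the threshold can only shrink the selected area, which is why a value that is too high may leave nothing masked at all.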

 # Other Commands

--- a/docs/features/INPAINTING.md
+++ b/docs/features/INPAINTING.md
@@ -34,7 +34,46 @@ original unedited image and the masked (partially transparent) image:
 invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent.png
 ```

-We are hoping to get rid of the need for this workaround in an upcoming release.
+## **Masking using Text**
+
+You can also create a mask using a text prompt to select the part of
+the image you want to alter, using the <a
+href="https://github.com/timojl/clipseg">clipseg</a> algorithm. This
+works on any image, not just ones generated by InvokeAI.
+
+The `--text_mask` (short form `-tm`) option takes up to two arguments. The
+first argument is a text description of the part of the image you wish
+to mask (paint over). If the text description contains a space, you must
+surround it with quotation marks. The optional second argument is the
+minimum threshold for the mask classifier's confidence score, described
+in more detail below.
+
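A rough sketch of why the quotation marks matter, using only Python's standard `argparse` and `shlex` modules. This is not InvokeAI's actual parser: `read_text_mask` and `ASSUMED_DEFAULT_THRESHOLD` are hypothetical names, and the fallback of 0.5 when no threshold is given is an assumption.

```python
import argparse
import shlex

# Hypothetical fallback; the docs only say scores above 0.5 are usually reliable.
ASSUMED_DEFAULT_THRESHOLD = 0.5


def read_text_mask(command):
    """Return (description, threshold) from a command line containing -tm."""
    parser = argparse.ArgumentParser(add_help=False)
    # nargs="+" collects one or more tokens following -tm.
    parser.add_argument("-tm", "--text_mask", nargs="+")
    # shlex.split honours quotes, so "orange slice" stays a single token.
    # Everything this sketch does not know about (prompt, -I, ...) is ignored.
    opts, _unused = parser.parse_known_args(shlex.split(command))
    values = opts.text_mask
    if values is None or len(values) > 2:
        raise ValueError("expected -tm <prompt> [<float>]")
    if len(values) == 2:
        threshold = float(values[1])  # raises ValueError if not a number
    else:
        threshold = ASSUMED_DEFAULT_THRESHOLD
    return values[0], threshold


print(read_text_mask('a baseball -I /path/to/still_life.png -tm orange'))
print(read_text_mask('a baseball -I /path/to/still_life.png -tm "orange slice" 0.6'))
```

Without the quotes, `-tm orange slice` would be split into two tokens, and the second one (`slice`) would be taken for the threshold and rejected because it is not a number.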
+To see how this works in practice, here's an image of a still life
+painting that I got off the web.
+
+<img src="../assets/still-life-scaled.jpg">
+
+You can selectively mask out the
+orange and replace it with a baseball in this way:
+
+~~~
+invoke> a baseball -I /path/to/still_life.png -tm orange
+~~~
+
+<img src="../assets/still-life-inpainted.png">
+
+The clipseg classifier produces a confidence score for each region it
+identifies. Generally, regions that score above 0.5 are reliable, but
+if you are getting too much or too little masking, you can lower the
+threshold (to mask more) or raise it (to mask less). In this
+example, passing `-tm` a higher value insists on a tighter
+mask. However, if you make it too high, the orange may not be picked
+up at all!
+
+~~~
+invoke> a baseball -I /path/to/still_life.png -tm orange 0.6
+~~~
+

 ### Inpainting is not changing the masked region enough!