Merge branch 'development' of github.com:lstein/stable-diffusion into asymmetric-tiling

This commit is contained in:
Carson Katri
2022-10-18 13:34:10 -04:00
21 changed files with 321 additions and 50 deletions

View File

@ -85,6 +85,7 @@ overridden on a per-prompt basis (see [List of prompt arguments](#list-of-prompt
| `--from_file <path>` | | `None` | Read list of prompts from a file. Use `-` to read from standard input |
| `--model <modelname>` | | `stable-diffusion-1.4` | Loads model specified in configs/models.yaml. Currently one of "stable-diffusion-1.4" or "laion400m" |
| `--full_precision` | `-F` | `False` | Run in slower full-precision mode. Needed for Macintosh M1/M2 hardware and some older video cards. |
| `--png_compression <0-9>` | `-z<0-9>` | 6 | Select level of compression for output files, from 0 (no compression) to 9 (max compression) |
| `--web` | | `False` | Start in web server mode |
| `--host <ip addr>` | | `localhost` | Which network interface web server should listen on. Set to 0.0.0.0 to listen on any. |
| `--port <port>` | | `9090` | Which port web server should listen for requests on. |
@ -153,6 +154,7 @@ Here are the invoke> command that apply to txt2img:
| --seed <int> | -S<int> | None | Set the random seed for the next series of images. This can be used to recreate an image generated previously.|
| --sampler <sampler>| -A<sampler>| k_lms | Sampler to use. Use -h to get list of available samplers. |
| --hires_fix | | | Larger images often have duplication artefacts. This option suppresses duplicates by generating the image at low res, and then using img2img to increase the resolution |
| --png_compression <0-9> | -z<0-9> | 6 | Select level of compression for output files, from 0 (no compression) to 9 (max compression) |
| --grid | -g | False | Turn on grid mode to return a single image combining all the images generated by this prompt |
| --individual | -i | True | Turn off grid mode (deprecated; leave off --grid instead) |
| --outdir <path> | -o<path> | outputs/img_samples | Temporarily change the location of these images |
@ -211,11 +213,35 @@ accepts additional options:
[Inpainting](./INPAINTING.md) for details.
inpainting accepts all the arguments used for txt2img and img2img, as
well as the --mask (-M) argument:
well as the --mask (-M) and --text_mask (-tm) arguments:
| Argument <img width="100" align="right"/> | Shortcut | Default | Description |
|--------------------|------------|---------------------|--------------|
| `--init_mask <path>` | `-M<path>` | `None` |Path to an image the same size as the initial_image, with areas for inpainting made transparent.|
| `--text_mask <prompt> [<float>]` | `-tm <prompt> [<float>]` | <none> | Create a mask from a text prompt describing part of the image|
`--text_mask` (short form `-tm`) is a way to generate a mask using a
text description of the part of the image to replace. For example, if
you have an image of a breakfast plate with a bagel, toast and
scrambled eggs, you can selectively mask the bagel and replace it with
a piece of cake this way:
~~~
invoke> a piece of cake -I /path/to/breakfast.png -tm bagel
~~~
The algorithm uses <a
href="https://github.com/timojl/clipseg">clipseg</a> to classify
different regions of the image. The classifier puts out a confidence
score for each region it identifies. Generally regions that score
above 0.5 are reliable, but if you are getting too much or too little
masking you can adjust the threshold down (to get more mask), or up
(to get less). In this example, by passing `-tm` a higher value, we
are insisting on a more stringent classification.
~~~
invoke> a piece of cake -I /path/to/breakfast.png -tm bagel 0.6
~~~
# Other Commands

View File

@ -34,7 +34,46 @@ original unedited image and the masked (partially transparent) image:
invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent.png
```
We are hoping to get rid of the need for this workaround in an upcoming release.
## **Masking using Text**
You can also create a mask using a text prompt to select the part of
the image you want to alter, using the <a
href="https://github.com/timojl/clipseg">clipseg</a> algorithm. This
works on any image, not just ones generated by InvokeAI.
The `--text_mask` (short form `-tm`) option takes two arguments. The
first argument is a text description of the part of the image you wish
to mask (paint over). If the text description contains a space, you must
surround it with quotation marks. The optional second argument is the
minimum threshold for the mask classifier's confidence score, described
in more detail below.
To see how this works in practice, here's an image of a still life
painting that I got off the web.
<img src="../assets/still-life-scaled.jpg">
You can selectively mask out the
orange and replace it with a baseball in this way:
~~~
invoke> a baseball -I /path/to/still_life.png -tm orange
~~~
<img src="../assets/still-life-inpainted.png">
The clipseg classifier produces a confidence score for each region it
identifies. Generally regions that score above 0.5 are reliable, but
if you are getting too much or too little masking you can adjust the
threshold down (to get more mask), or up (to get less). In this
example, by passing `-tm` a higher value, we are insisting on a tigher
mask. However, if you make it too high, the orange may not be picked
up at all!
~~~
invoke> a baseball -I /path/to/breakfast.png -tm orange 0.6
~~~
### Inpainting is not changing the masked region enough!