add clipseg support for creating inpaint masks from text

On the command line, the new option is --text_mask or -tm. Example:

```
invoke> a baseball -I /path/to/still_life.png -tm orange
```

This will find the orange fruit in the still life painting and replace it with an image of a baseball.
2024-08-30 20:32:17 +00:00 · 2022-10-16 23:30:24 -04:00
parent 32122e0312
commit 20551857da
9 changed files with 155 additions and 22 deletions
--- a/docs/features/CLI.md
+++ b/docs/features/CLI.md
@@ -154,7 +154,7 @@ Here are the invoke> command that apply to txt2img:
 | --seed <int>       | -S<int>   | None                | Set the random seed for the next series of images. This can be used to recreate an image generated previously.|
 | --sampler <sampler>| -A<sampler>| k_lms              | Sampler to use. Use -h to get list of available samplers. |
 | --hires_fix        |           |                     | Larger images often have duplication artefacts. This option suppresses duplicates by generating the image at low res, and then using img2img to increase the resolution |
-| `--png_compression <0-9>` | `-z<0-9>` |  6           | Select level of compression for output files, from 0 (no compression) to 9 (max compression)         |
+| --png_compression <0-9> | -z<0-9> |  6           | Select level of compression for output files, from 0 (no compression) to 9 (max compression)         |
 | --grid             | -g        | False               | Turn on grid mode to return a single image combining all the images generated by this prompt |
 | --individual       | -i        | True                | Turn off grid mode (deprecated; leave off --grid instead) |
 | --outdir <path>    |  -o<path> | outputs/img_samples  | Temporarily change the location of these images |
@@ -212,11 +212,35 @@ accepts additional options:
    [Inpainting](./INPAINTING.md) for details.

 inpainting accepts all the arguments used for txt2img and img2img, as
-well as the --mask (-M) argument:
+well as the --init_mask (-M) and --text_mask (-tm) arguments:

 | Argument <img width="100" align="right"/> |  Shortcut  |  Default            |  Description |
 |--------------------|------------|---------------------|--------------|
 | `--init_mask <path>` | `-M<path>`   | `None`                |Path to an image the same size as the initial_image, with areas for inpainting made transparent.|
+| `--text_mask <prompt> [<float>]` | `-tm <prompt> [<float>]` | `None` | Create a mask from a text prompt describing the part of the image to replace. The optional float sets the minimum confidence threshold.|
+
+`--text_mask` (short form `-tm`) is a way to generate a mask using a
+text description of the part of the image to replace. For example, if
+you have an image of a breakfast plate with a bagel, toast and
+scrambled eggs, you can selectively mask the bagel and replace it with
+a piece of cake this way:
+
+~~~
+invoke> a piece of cake -I /path/to/breakfast.png -tm bagel
+~~~
+
+The algorithm uses <a
+href="https://github.com/timojl/clipseg">clipseg</a> to classify
+different regions of the image. The classifier outputs a confidence
+score for each region it identifies. Generally, regions that score
+above 0.5 are reliable, but if you are getting too much or too little
+masking you can adjust the threshold down (to get more mask) or up
+(to get less). In the following example, passing `-tm` a higher value
+insists on a more stringent classification.
+
+~~~
+invoke> a piece of cake -I /path/to/breakfast.png -tm bagel 0.6
+~~~

 # Other Commands

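Review note on the threshold behaviour documented in the hunk above: the following is a minimal sketch, not the code from this commit or from clipseg. It assumes the classifier's output can be treated as a 2D grid of confidence scores between 0 and 1, and that 0.5 is the cutoff when no float is given to `-tm`. The function names and the toy values are hypothetical.

```python
# Hypothetical sketch of turning confidence scores into an inpainting mask.
# Nothing here is InvokeAI's or clipseg's real API; names are illustrative.
from typing import List


def mask_from_confidence(
    confidence: List[List[float]], threshold: float = 0.5
) -> List[List[bool]]:
    """Mark a cell for inpainting when its score exceeds the threshold."""
    return [[score > threshold for score in row] for row in confidence]


def masked_fraction(mask: List[List[bool]]) -> float:
    """Fraction of cells selected by the mask."""
    total = sum(len(row) for row in mask)
    selected = sum(cell for row in mask for cell in row)
    return selected / total


# Made-up 2x3 confidence grid standing in for classifier output.
toy_scores = [
    [0.90, 0.55, 0.20],
    [0.70, 0.45, 0.10],
]

default_mask = mask_from_confidence(toy_scores)      # assumed default of 0.5
strict_mask = mask_from_confidence(toy_scores, 0.6)  # analogous to `-tm bagel 0.6`
```

With these toy values the stricter threshold selects fewer cells than the default, which matches the documented rule: raise the threshold to get less mask, lower it to get more.
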
--- a/docs/features/INPAINTING.md
+++ b/docs/features/INPAINTING.md
@@ -34,7 +34,46 @@ original unedited image and the masked (partially transparent) image:
 invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent.png
 ```

-We are hoping to get rid of the need for this workaround in an upcoming release.
+## **Masking using Text**
+
+You can also create a mask using a text prompt to select the part of
+the image you want to alter, using the <a
+href="https://github.com/timojl/clipseg">clipseg</a> algorithm. This
+works on any image, not just ones generated by InvokeAI.
+
+The `--text_mask` (short form `-tm`) option takes two arguments. The
+first argument is a text description of the part of the image you wish
+to mask (paint over). If the text description contains a space, you must
+surround it with quotation marks. The optional second argument is the
+minimum threshold for the mask classifier's confidence score, described
+in more detail below.
+
+To see how this works in practice, here's an image of a still life
+painting that I got off the web.
+
+<img src="../assets/still-life-scaled.jpg">
+
+You can selectively mask out the
+orange and replace it with a baseball in this way:
+
+~~~
+invoke> a baseball -I /path/to/still_life.png -tm orange
+~~~
+
+<img src="../assets/still-life-inpainted.png">
+
+The clipseg classifier produces a confidence score for each region it
+identifies. Generally, regions that score above 0.5 are reliable, but
+if you are getting too much or too little masking you can adjust the
+threshold down (to get more mask) or up (to get less). In the
+following example, passing `-tm` a higher value insists on a tighter
+mask. However, if you set it too high, the orange may not be picked
+up at all!
+
+~~~
+invoke> a baseball -I /path/to/still_life.png -tm orange 0.6
+~~~
+

 ### Inpainting is not changing the masked region enough!