cleanup inpainting and img2img

- add a `--inpaint_replace` option that fills masked regions with latent noise. This allows radical changes to inpainted regions at the cost of losing context.
- fix up readline, arg processing and metadata writing to accommodate this change
- fixed bug in storage and retrieval of variations, discovered incidentally during testing
- update documentation
2025-07-25 21:05:37 +00:00 · 2022-10-02 16:37:36 -04:00
parent f6bc13736a
commit 6f93dc7712
6 changed files with 91 additions and 20 deletions
--- a/docs/features/INPAINTING.md
+++ b/docs/features/INPAINTING.md
@ -6,21 +6,29 @@ title: Inpainting

 ## **Creating Transparent Regions for Inpainting**

-Inpainting is really cool. To do it, you start with an initial image and use a photoeditor to make
-one or more regions transparent (i.e. they have a "hole" in them). You then provide the path to this
-image at the invoke> command line using the `-I` switch. Stable Diffusion will only paint within the
-transparent region.
+Inpainting is really cool. To do it, you start with an initial image
+and use a photo editor to make one or more regions transparent
+(i.e. they have a "hole" in them). You then provide the path to this
+image at the invoke> command line using the `-I` switch. Stable
+Diffusion will only paint within the transparent region.

-There's a catch. In the current implementation, you have to prepare the initial image correctly so
-that the underlying colors are preserved under the transparent area. Many imaging editing
-applications will by default erase the color information under the transparent pixels and replace
-them with white or black, which will lead to suboptimal inpainting. You also must take care to
-export the PNG file in such a way that the color information is preserved.
+There's a catch. In the current implementation, you have to prepare
+the initial image correctly so that the underlying colors are
+preserved under the transparent area. Many image editing
+applications will by default erase the color information under the
+transparent pixels and replace them with white or black, which will
+lead to suboptimal inpainting. It often helps to apply incomplete
+transparency, such as any value between 1% and 99%.

-If your photoeditor is erasing the underlying color information, `invoke.py` will give you a big fat
-warning. If you can't find a way to coax your photoeditor to retain color values under transparent
-areas, then you can combine the `-I` and `-M` switches to provide both the original unedited image
-and the masked (partially transparent) image:
+You also must take care to export the PNG file in such a way that the
+color information is preserved. There is often an option in the export
+dialog that lets you specify this.
+
+If your photo editor is erasing the underlying color information,
+`invoke.py` will give you a big fat warning. If you can't find a way to
+coax your photo editor to retain color values under transparent areas,
+then you can combine the `-I` and `-M` switches to provide both the
+original unedited image and the masked (partially transparent) image:

 ```bash
 invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent.png
@ -28,6 +36,26 @@ invoke> "man with cat on shoulder" -I./images/man.png -M./images/man-transparent

 We are hoping to get rid of the need for this workaround in an upcoming release.

+### Inpainting is not changing the masked region enough!
+
+One of the things to understand about how inpainting works is that it
+is equivalent to running img2img on just the masked (transparent)
+area. img2img builds on top of the existing image data, and therefore
+will attempt to preserve colors, shapes and textures to the best of
+its ability. Unfortunately this means that if you want to make a
+dramatic change in the inpainted region, for example replacing a red
+wall with a blue one, the algorithm will fight you.
+
+You have a couple of options. The first is to increase the values of
+the requested steps (`-sXXX`), strength (`-f0.XX`), and/or
+classifier-free guidance (`-CXX.X`). If this is not working for you, a
+more extreme step is to provide the `--inpaint_replace` option with a
+value from 0.0 (a gentle merge) to 1.0 (replace entirely). At 1.0 the
+algorithm ignores the data underneath the masked region and treats
+this area like a blank canvas. This lets you replace colored regions
+entirely, but beware that the masked region will not blend in with the
+surrounding unmasked regions as well.
+
 ---

 ## Recipe for GIMP
--- a/ldm/generate.py
+++ b/ldm/generate.py
@ -260,6 +260,8 @@ class Generate:
            codeformer_fidelity = None,
            save_original    = False,
            upscale          = None,
+            # this is specific to inpainting and causes more extreme inpainting
+            inpaint_replace  = 0.0,
            # Set this True to handle KeyboardInterrupt internally
            catch_interrupts = False,
            hires_fix        = False,
@ -358,6 +360,7 @@ class Generate:
                f'variation weights must be in [0.0, 1.0]: got {[weight for _, weight in with_variations]}'

        width, height, _ = self._resolution_check(width, height, log=True)
+        assert 0.0 <= inpaint_replace <= 1.0, 'inpaint_replace must be between 0.0 and 1.0'

        if sampler_name and (sampler_name != self.sampler_name):
            self.sampler_name = sampler_name
@ -385,6 +388,8 @@ class Generate:
                height,
                fit=fit,
            )
+
+            # TODO: Hacky selection of operation to perform. Needs to be refactored.
            if (init_image is not None) and (mask_image is not None):
                generator = self._make_inpaint()
            elif (embiggen != None or embiggen_tiles != None):
@ -399,6 +404,7 @@ class Generate:
            generator.set_variation(
                self.seed, variation_amount, with_variations
            )
+
            results = generator.generate(
                prompt,
                iterations=iterations,
@ -420,6 +426,7 @@ class Generate:
                perlin=perlin,
                embiggen=embiggen,
                embiggen_tiles=embiggen_tiles,
+                inpaint_replace=inpaint_replace,
            )

            if init_color:
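
The range check in this hunk uses `assert`, which Python skips when run with `-O`. A minimal sketch of a check that raises instead is shown here; the helper name `check_inpaint_replace` is hypothetical and is not part of this commit:

```python
def check_inpaint_replace(value: float) -> float:
    """Return value unchanged if it lies in [0.0, 1.0]; raise otherwise.

    Hypothetical helper for illustration only; the commit itself uses an assert.
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError(
            f'inpaint_replace must be between 0.0 and 1.0, got {value}'
        )
    return value
```

Whether a hard error or an assert is the better fit depends on how the caller is expected to recover, so treat this as one option rather than a recommendation.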
--- a/ldm/invoke/args.py
+++ b/ldm/invoke/args.py
@ -239,6 +239,8 @@ class Args(object):
                switches.append(f'--init_color {a["init_color"]}')
            if a['strength'] and a['strength']>0:
                switches.append(f'-f {a["strength"]}')
+            if a['inpaint_replace']:
+                switches.append(f'--inpaint_replace {a["inpaint_replace"]}')
        else:
            switches.append(f'-A {a["sampler_name"]}')

@ -266,11 +268,12 @@ class Args(object):
        # outpainting parameters
        if a['out_direction']:
            switches.append(f'-D {" ".join([str(u) for u in a["out_direction"]])}')
+
        # LS: slight semantic drift which needs addressing in the future:
        # 1. Variations come out of the stored metadata as a packed string with the keyword "variations"
        # 2. However, they come out of the CLI (and probably web) with the keyword "with_variations" and
        #    in broken-out form. Variation (1) should be changed to comply with (2)
-        if a['with_variations']:
+        if a['with_variations'] and len(a['with_variations'])>0:
            formatted_variations = ','.join(f'{seed}:{weight}' for seed, weight in (a["with_variations"]))
            switches.append(f'-V {formatted_variations}')
        if 'variations' in a and len(a['variations'])>0:
@ -694,6 +697,13 @@ class Args(object):
            metavar=('direction','pixels'),
            help='Outcrop the image with one or more direction/pixel pairs: -c top 64 bottom 128 left 64 right 64',
        )
+        img2img_group.add_argument(
+            '-r',
+            '--inpaint_replace',
+            type=float,
+            default=0.0,
+            help='when inpainting, adjust how aggressively to replace the part of the picture under the mask, from 0.0 (a gentle merge) to 1.0 (replace entirely)',
+        )
        postprocessing_group.add_argument(
            '-ft',
            '--facetool',
@ -800,7 +810,8 @@ def metadata_dumps(opt,

    # remove any image keys not mentioned in RFC #266
    rfc266_img_fields = ['type','postprocessing','sampler','prompt','seed','variations','steps',
-                         'cfg_scale','threshold','perlin','step_number','width','height','extra','strength']
+                         'cfg_scale','threshold','perlin','step_number','width','height','extra','strength',
+                         'init_img','init_mask']

    rfc_dict ={}

@ -821,11 +832,15 @@ def metadata_dumps(opt,
    # 'variations' should always exist and be an array, empty or consisting of {'seed': seed, 'weight': weight} pairs
    rfc_dict['variations'] = [{'seed':x[0],'weight':x[1]} for x in opt.with_variations] if opt.with_variations else []

+    # if variations are present then we need to replace 'seed' with 'first_seed'
+    if hasattr(opt,'first_seed'):
+        rfc_dict['seed'] = opt.first_seed
+
    if opt.init_img:
-        rfc_dict['type']           = 'img2img'
-        rfc_dict['strength_steps'] = rfc_dict.pop('strength')
-        rfc_dict['orig_hash']      = calculate_init_img_hash(opt.init_img)
-        rfc_dict['sampler']        = 'ddim'  # TODO: FIX ME WHEN IMG2IMG SUPPORTS ALL SAMPLERS
+        rfc_dict['type']            = 'img2img'
+        rfc_dict['strength_steps']  = rfc_dict.pop('strength')
+        rfc_dict['orig_hash']       = calculate_init_img_hash(opt.init_img)
+        rfc_dict['inpaint_replace'] = opt.inpaint_replace
    else:
        rfc_dict['type']  = 'txt2img'
        rfc_dict.pop('strength')
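
Because `--inpaint_replace` takes a float, a command string rebuilt from stored options needs the value as well as the flag if it is to be replayed. The sketch below illustrates that round trip with a stand-in parser; `build_parser` and `rebuild_switches` are hypothetical names, and the real `Args` class defines many more options:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Stand-in parser holding only the option added in this commit.
    parser = argparse.ArgumentParser(prog='invoke', add_help=False)
    parser.add_argument('-r', '--inpaint_replace', type=float, default=0.0)
    return parser


def rebuild_switches(opts: dict) -> list:
    # Emit the value together with the flag so the command can be replayed.
    switches = []
    if opts.get('inpaint_replace'):
        switches.append(f'--inpaint_replace {opts["inpaint_replace"]}')
    return switches


args = build_parser().parse_args(['-r', '0.75'])
switches = rebuild_switches(vars(args))
# Feed the rebuilt switches back through the parser to confirm the round trip.
replayed = build_parser().parse_args(' '.join(switches).split())
```

A value of 0.0 is falsy, so the default produces no switch at all, which keeps rebuilt commands short when the option was not used.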
--- a/ldm/invoke/generator/base.py
+++ b/ldm/invoke/generator/base.py
@ -5,6 +5,7 @@ including img2img, txt2img, and inpaint
 import torch
 import numpy as  np
 import random
+import os
 from tqdm import tqdm, trange
 from PIL               import Image
 from einops import rearrange, repeat
@ -168,3 +169,14 @@ class Generator():

        return v2

+    # this is a handy routine for debugging use. Given a generated sample,
+    # convert it into a PNG image and store it at the indicated path
+    def save_sample(self, sample, filepath):
+        image = self.sample_to_image(sample)
+        dirname = os.path.dirname(filepath) or '.'
+        if not os.path.exists(dirname):
+            print(f'** creating directory {dirname}')
+            os.makedirs(dirname, exist_ok=True)
+        image.save(filepath,'PNG')
+
+        
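
The directory handling in `save_sample` can be tried in isolation. This is a minimal sketch that leaves out the image conversion; `ensure_parent_dir` is a hypothetical name and the paths are made up for illustration:

```python
import os
import tempfile


def ensure_parent_dir(filepath: str) -> str:
    """Create the directory that will hold filepath and return its name."""
    # os.path.dirname('sample.png') is '', so fall back to the current directory.
    dirname = os.path.dirname(filepath) or '.'
    # exist_ok=True makes a separate os.path.exists() test unnecessary.
    os.makedirs(dirname, exist_ok=True)
    return dirname


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'debug', 'step_10', 'sample.png')
    created = ensure_parent_dir(target)
    created_exists = os.path.isdir(created)
```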
--- a/ldm/invoke/generator/inpaint.py
+++ b/ldm/invoke/generator/inpaint.py
@ -18,7 +18,7 @@ class Inpaint(Img2Img):
    @torch.no_grad()
    def get_make_image(self,prompt,sampler,steps,cfg_scale,ddim_eta,
                       conditioning,init_image,mask_image,strength,
-                       step_callback=None,**kwargs):
+                       step_callback=None,inpaint_replace=0.0,**kwargs):
        """
        Returns a function returning an image derived from the prompt and
        the initial image + mask.  Return value depends on the seed at
@ -58,6 +58,14 @@ class Inpaint(Img2Img):
                noise=x_T
            )

+            # to replace masked area with latent noise, weighted by inpaint_replace strength
+            if inpaint_replace > 0.0:
+                print(f'>> inpaint will replace what was under the mask with a strength of {inpaint_replace}')
+                l_noise = self.get_noise(kwargs['width'],kwargs['height'])
+                inverted_mask = 1.0-mask_image  # there will be 1s where the mask is
+                masked_region = (1.0-inpaint_replace) * inverted_mask * z_enc + inpaint_replace * inverted_mask * l_noise
+                z_enc   = z_enc * mask_image + masked_region
+
            # decode it
            samples = sampler.decode(
                z_enc,
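
The blend in this hunk can be checked by hand on a few numbers. The sketch below applies the same arithmetic to flat Python lists instead of latent tensors. It assumes, based on the comment in the diff, that the mask holds 0.0 where the image is to be repainted and 1.0 where it is kept; the function name and sample values are made up for illustration:

```python
def blend_masked_latents(z_enc, noise, mask, inpaint_replace):
    """Mix noise into the masked region of a flat list of latent values.

    mask is assumed to be 0.0 where the image is to be repainted and 1.0
    where it must be kept, following the comment in the diff.
    """
    blended = []
    for z, n, m in zip(z_enc, noise, mask):
        inverted = 1.0 - m  # 1.0 inside the region to repaint
        masked_region = ((1.0 - inpaint_replace) * inverted * z
                         + inpaint_replace * inverted * n)
        # Kept pixels pass through; masked pixels get the weighted mix.
        blended.append(z * m + masked_region)
    return blended


z_enc = [2.0, 2.0, 2.0, 2.0]
noise = [10.0, 10.0, 10.0, 10.0]
mask = [1.0, 1.0, 0.0, 0.0]  # the last two values are under the mask

untouched = blend_masked_latents(z_enc, noise, mask, 0.0)  # [2.0, 2.0, 2.0, 2.0]
half = blend_masked_latents(z_enc, noise, mask, 0.5)       # [2.0, 2.0, 6.0, 6.0]
replaced = blend_masked_latents(z_enc, noise, mask, 1.0)   # [2.0, 2.0, 10.0, 10.0]
```

Values outside the mask are never altered, and inside the mask the result moves linearly from the encoded image to pure noise as `inpaint_replace` goes from 0.0 to 1.0.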
--- a/ldm/invoke/readline.py
+++ b/ldm/invoke/readline.py
@ -52,6 +52,7 @@ COMMANDS = (
    '--skip_normalize','-x',
    '--log_tokenization','-t',
    '--hires_fix',
+    '--inpaint_replace','-r',
    '!fix','!fetch','!history','!search','!clear',
    '!models','!switch','!import_model','!edit_model'
    )