add auto-creation of mask for inpainting (#438)

* now use a single init image for both image and mask

* turn on debugging for now to write out mask and image

* add back -M option as a fallback
Lincoln Stein 2022-09-08 07:34:03 -04:00 committed by GitHub
parent a69ca31f34
commit 7996a30e3a
5 changed files with 191 additions and 96 deletions

README.md

@@ -22,22 +22,24 @@ text-to-image generator. This fork supports:
    generating images in your browser.
 3. Support for img2img in which you provide a seed image to guide the
-   image creation. (inpainting & masking coming soon)
+   image creation
-4. A notebook for running the code on Google Colab.
+4. Preliminary inpainting support.
-5. Upscaling and face fixing using the optional ESRGAN and GFPGAN
+5. A notebook for running the code on Google Colab.
+6. Upscaling and face fixing using the optional ESRGAN and GFPGAN
    packages.
-6. Weighted subprompts for prompt tuning.
+7. Weighted subprompts for prompt tuning.
-7. [Image variations](VARIATIONS.md) which allow you to systematically
+8. [Image variations](VARIATIONS.md) which allow you to systematically
    generate variations of an image you like and combine two or more
    images together to combine the best features of both.
-8. Textual inversion for customization of the prompt language and images.
+9. Textual inversion for customization of the prompt language and images.
-8. ...and more!
+10. ...and more!
 
 This fork is rapidly evolving, so use the Issues panel to report bugs
 and make feature requests, and check back periodically for
@@ -75,9 +77,10 @@ log file of image names and prompts to the selected output directory.
 In addition, as of version 1.02, it also writes the prompt into the PNG
 file's metadata where it can be retrieved using scripts/images2prompt.py
 
-The script is confirmed to work on Linux and Windows systems. It should
-work on MacOSX as well, but this is not confirmed. Note that this script
-runs from the command-line (CMD or Terminal window), and does not have a GUI.
+The script is confirmed to work on Linux, Windows and Mac
+systems. Note that this script runs from the command-line or can be used
+as a Web application. The Web GUI is currently rudimentary, but a much
+better replacement is on its way.
 
 ```
 (ldm) ~/stable-diffusion$ python3 ./scripts/dream.py
@@ -97,7 +100,7 @@ dream> "there's a fly in my soup" -n6 -g
 dream> q
 # this shows how to retrieve the prompt stored in the saved image's metadata
-(ldm) ~/stable-diffusion$ python3 ./scripts/images2prompt.py outputs/img_samples/*.png
+(ldm) ~/stable-diffusion$ python ./scripts/images2prompt.py outputs/img_samples/*.png
 00009.png: "ashley judd riding a camel" -s150 -S 416354203
 00010.png: "ashley judd riding a camel" -s150 -S 1362479620
 00011.png: "there's a fly in my soup" -n6 -g -S 2685670268
@@ -118,29 +121,68 @@ The script itself also recognizes a series of command-line switches
 that will change important global defaults, such as the directory for
 image outputs and the location of the model weight files.
 
+## Hardware Requirements
+
+You will need one of:
+
+1. An NVIDIA-based graphics card with 8 GB or more of VRAM memory*.
+2. An Apple computer with an M1 chip.**
+3. At least 12 GB of main memory RAM.
+4. At least 6 GB of free disk space for the machine learning model,
+   python, and all its dependencies.
+
+* If you have an NVIDIA 10xx series card (e.g. the 1080ti), please
+run the dream script in full-precision mode as shown below.
+
+** Similarly, specify full-precision mode on Apple M1 hardware.
+
+To run in full-precision mode, start dream.py with the
+--full_precision flag:
+
+~~~~
+(ldm) ~/stable-diffusion$ python scripts/dream.py --full_precision
+~~~~
+
 ## Image-to-Image
 
-This script also provides an img2img feature that lets you seed your
-creations with a drawing or photo. This is a really cool feature that tells
-stable diffusion to build the prompt on top of the image you provide, preserving
-the original's basic shape and layout. To use it, provide the --init_img
-option as shown here:
+This script also provides an img2img feature that lets you seed your
+creations with an initial drawing or photo. This is a really cool
+feature that tells stable diffusion to build the prompt on top of the
+image you provide, preserving the original's basic shape and
+layout. To use it, provide the --init_img option as shown here:
 
 ```
 dream> "waterfall and rainbow" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
 ```
 
-The --init_img (-I) option gives the path to the seed picture. --strength (-f) controls how much
-the original will be modified, ranging from 0.0 (keep the original intact), to 1.0 (ignore the original
-completely). The default is 0.75, and ranges from 0.25-0.75 give interesting results.
+The --init_img (-I) option gives the path to the seed
+picture. --strength (-f) controls how much the original will be
+modified, ranging from 0.0 (keep the original intact), to 1.0 (ignore
+the original completely). The default is 0.75, and ranges from
+0.25-0.75 give interesting results.
 
-You may also pass a -v<count> option to generate count variants on the original image. This is done by
-passing the first generated image back into img2img the requested number of times. It generates interesting
+You may also pass a -v<count> option to generate count variants on the
+original image. This is done by passing the first generated image back
+into img2img the requested number of times. It generates interesting
 variants.
 
+If the initial image contains transparent regions, then Stable
+Diffusion will only draw within the transparent regions, a process
+called "inpainting". However, for this to work correctly, the color
+information underneath the transparent regions needs to be preserved,
+not erased. See [Creating Transparent Regions for
+Inpainting](#creating-transparent-regions-for-inpainting) for details.
+
 ## Seamless Tiling
 
-The seamless tiling mode causes generated images to seamlessly tile with itself. To use it, add the --seamless option when starting the script which will result in all generated images to tile, or for each dream> prompt as shown here:
+The seamless tiling mode causes generated images to seamlessly tile
+with themselves. To use it, add the --seamless option when starting
+the script, which will cause all generated images to tile, or add it
+to individual dream> prompts as shown here:
 
 ```
 dream> "pond garden with lotus by claude monet" --seamless -s100 -n4
 ```
@@ -774,6 +816,49 @@ of branch>
 You will need to go through the install procedure again, but it should
 be fast because all the dependencies are already loaded.
 
+# Creating Transparent Regions for Inpainting
+
+Inpainting is really cool. To do it, you start with an initial image
+and use a photo editor to make one or more regions transparent
+(i.e. they have a "hole" in them). You then provide the path to this
+image at the dream> command line using the -I switch. Stable Diffusion
+will only paint within the transparent region.
+
+There's a catch. In the current implementation, you have to prepare
+the initial image correctly so that the underlying colors are
+preserved under the transparent area. Many image editing applications
+will by default erase the color information under the transparent
+pixels and replace it with white or black, which will lead to
+suboptimal inpainting. You also must take care to export the PNG file
+in such a way that the color information is preserved.
+
+If your photo editor is erasing the underlying color information,
+dream.py will give you a big fat warning. If you can't find a way to
+coax your photo editor to retain color values under transparent areas,
+then you can combine the -I and -M switches to provide both the
+original unedited image and the masked (partially transparent) image:
+
+~~~~
+dream> man with cat on shoulder -I./images/man.png -M./images/man-transparent.png
+~~~~
+
+We are hoping to get rid of the need for this workaround in an
+upcoming release.
+
+## Recipe for GIMP
+
+GIMP is a popular Linux photo editing tool. A scripted alternative to
+this recipe is sketched after the file listing below.
+
+1. Open image in GIMP.
+2. Layer->Transparency->Add Alpha Channel
+3. Use the lasso tool to select the region to mask.
+4. Choose Select -> Float to create a floating selection.
+5. Open the Layers toolbar (^L) and select "Floating Selection".
+6. Set opacity to 0%.
+7. Export as PNG.
+8. In the export dialogue, make sure the "Save colour values from
+   transparent pixels" checkbox is selected.
 
 # Contributing
 
 Anyone who wishes to contribute to this project, whether
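
The GIMP recipe in the README hunk above can also be scripted. Here is a minimal Pillow sketch (not part of this commit; the file names and region coordinates are illustrative) that punches a transparent "hole" in an image while leaving the RGB values beneath it intact, which is exactly what the inpainting code expects:

```python
# Make a rectangular region transparent while preserving the colors
# beneath it, so dream.py can derive the inpainting mask from alpha.
from PIL import Image

img = Image.open('man.png').convert('RGBA')
pixels = img.load()

for y in range(100, 200):           # illustrative region
    for x in range(150, 300):
        r, g, b, _ = pixels[x, y]
        pixels[x, y] = (r, g, b, 0)  # zero out alpha only; RGB is kept

# PNG preserves the RGB values of fully transparent pixels
img.save('man-transparent.png')
```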


@@ -16,7 +16,7 @@ class Inpaint(Img2Img):
     @torch.no_grad()
     def get_make_image(self,prompt,sampler,steps,cfg_scale,ddim_eta,
-                       conditioning,init_image,init_mask,strength,
+                       conditioning,init_image,mask_image,strength,
                        step_callback=None,**kwargs):
         """
         Returns a function returning an image derived from the prompt and
@@ -24,8 +24,8 @@ class Inpaint(Img2Img):
         the time you call it. kwargs are 'init_latent' and 'strength'
         """
-        init_mask = init_mask[0][0].unsqueeze(0).repeat(4,1,1).unsqueeze(0)
-        init_mask = repeat(init_mask, '1 ... -> b ...', b=1)
+        mask_image = mask_image[0][0].unsqueeze(0).repeat(4,1,1).unsqueeze(0)
+        mask_image = repeat(mask_image, '1 ... -> b ...', b=1)
 
         # PLMS sampler not supported yet, so ignore previous sampler
         if not isinstance(sampler,DDIMSampler):
@@ -66,7 +66,7 @@ class Inpaint(Img2Img):
             img_callback = step_callback,
             unconditional_guidance_scale = cfg_scale,
             unconditional_conditioning = uc,
-            mask = init_mask,
+            mask = mask_image,
             init_latent = self.init_latent
         )
         return self.sample_to_image(samples)
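
For readers following the tensor shapes, here is a standalone walk-through of what the reshape in the hunk above does (sizes are illustrative: a 512x512 image becomes 64x64 after the 8x latent downsampling; the real mask tensor comes from _create_init_mask):

```python
import torch
from einops import repeat

mask_image = torch.ones(1, 3, 64, 64)   # mask tensor as produced upstream

m = mask_image[0][0]                     # (64, 64): first channel only
m = m.unsqueeze(0).repeat(4, 1, 1)       # (4, 64, 64): one copy per latent channel
m = m.unsqueeze(0)                       # (1, 4, 64, 64)
m = repeat(m, '1 ... -> b ...', b=1)     # add the batch dimension

assert m.shape == (1, 4, 64, 64)         # matches the sampler's latent layout
```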


@@ -61,14 +61,10 @@ class PromptFormatter:
         switches.append(f'-A{opt.sampler_name or t2i.sampler_name}')
         # to do: put model name into the t2i object
         # switches.append(f'--model{t2i.model_name}')
-        if opt.invert_mask:
-            switches.append(f'--invert_mask')
         if opt.seamless or t2i.seamless:
             switches.append(f'--seamless')
         if opt.init_img:
             switches.append(f'-I{opt.init_img}')
-        if opt.mask:
-            switches.append(f'-M{opt.mask}')
         if opt.fit:
             switches.append(f'--fit')
         if opt.strength and opt.init_img is not None:
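
The switch list built here is what lets a saved image be regenerated from its PNG metadata. A toy sketch of the pattern (SimpleNamespace stands in for the real opt object; the -f mapping for strength is an assumption based on the README, not shown in this hunk):

```python
from types import SimpleNamespace

opt = SimpleNamespace(seamless=True, init_img='./man.png', fit=False, strength=0.75)

switches = []
if opt.seamless:
    switches.append('--seamless')
if opt.init_img:
    switches.append(f'-I{opt.init_img}')
if opt.fit:
    switches.append('--fit')
if opt.strength and opt.init_img is not None:
    switches.append(f'-f{opt.strength}')   # assumed switch name

print(' '.join(switches))   # --seamless -I./man.png -f0.75
```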


@@ -193,10 +193,9 @@ class Generate:
             log_tokenization = False,
             with_variations  = None,
             variation_amount = 0.0,
-            # these are specific to img2img
+            # these are specific to img2img and inpaint
             init_img         = None,
-            mask             = None,
-            invert_mask      = False,
+            init_mask        = None,
             fit              = False,
             strength         = None,
             # these are specific to GFPGAN/ESRGAN
@@ -217,8 +216,6 @@ class Generate:
            cfg_scale        // how strongly the prompt influences the image (7.5) (must be >1)
            seamless         // whether the generated image should tile
            init_img         // path to an initial image
-           mask             // path to an initial image mask for inpainting
-           invert_mask      // paint over opaque areas, retain transparent areas
            strength         // strength for noising/unnoising init_img. 0.0 preserves image exactly, 1.0 replaces it completely
            gfpgan_strength  // strength for GFPGAN. 0.0 preserves image exactly, 1.0 replaces it completely
            ddim_eta         // image randomness (eta=0.0 means the same seed always produces the same image)
@@ -293,7 +290,7 @@ class Generate:
         results = list()
         init_image = None
-        init_mask_image = None
+        mask_image = None
 
         try:
             uc, c = get_uc_and_c(
@@ -302,22 +299,14 @@
                 log_tokens=self.log_tokenization
             )
 
-            if mask and not init_img:
-                raise AssertionError('If mask path is provided, initial image path should be provided as well')
-            if mask and init_img:
-                init_image,size1 = self._load_img(init_img, width, height,fit=fit)
-                init_image.to(self.device)
-                init_mask_image,size2 = self._load_img_mask(mask, width, height,fit=fit, invert=invert_mask)
-                init_mask_image.to(self.device)
-                assert size1==size2,f"for inpainting, the initial image and its mask must be identical sizes, instead got {size1} vs {size2}"
-                generator = self._make_inpaint()
-            elif init_img: # little bit of repeated code here, but makes logic clearer
-                init_image,_ = self._load_img(init_img, width, height, fit=fit)
-                init_image.to(self.device)
-                generator = self._make_img2img()
+            (init_image,mask_image) = self._make_images(init_img,init_mask, width, height, fit)
+            if (init_image is not None) and (mask_image is not None):
+                generator = self._make_inpaint()
+            elif init_image is not None:
+                generator = self._make_img2img()
             else:
-                generator = self._make_txt2img()
+                generator = self._make_txt2img()
 
             generator.set_variation(self.seed, variation_amount, with_variations)
             results = generator.generate(
@@ -333,9 +322,9 @@
                 step_callback = step_callback, # called after each intermediate image is generated
                 width         = width,
                 height        = height,
-                init_image    = init_image,  # notice that init_image is different from init_img
-                init_mask     = init_mask_image,
-                strength      = strength
+                init_image    = init_image,  # notice that init_image is different from init_img
+                mask_image    = mask_image,
+                strength      = strength,
             )
 
             if upscale is not None or gfpgan_strength > 0:
@@ -352,7 +341,7 @@
             )
         except RuntimeError as e:
             print(traceback.format_exc(), file=sys.stderr)
-            print('>> Are you sure your system has an adequate GPU?')
+            print('>> Could not generate image.')
 
         toc = time.time()
         print('>> Usage stats:')
@@ -374,6 +363,31 @@
         )
         return results
 
+    def _make_images(self, img_path, mask_path, width, height, fit=False):
+        init_image = None
+        init_mask  = None
+        if not img_path:
+            return None,None
+
+        image      = self._load_img(img_path, width, height, fit=fit) # this returns an Image
+        init_image = self._create_init_image(image)                   # this returns a torch tensor
+
+        # if image has a transparent area and no mask was provided, then try to generate mask
+        if self._has_transparency(image) and not mask_path:
+            print('>> Initial image has transparent areas. Will inpaint in these regions.')
+            if self._check_for_erasure(image):
+                print(
+                    '>> WARNING: Colors underneath the transparent region seem to have been erased.\n',
+                    '>> Inpainting will be suboptimal. Please preserve the colors when making\n',
+                    '>> a transparency mask, or provide mask explicitly using --init_mask (-M).'
+                )
+            init_mask = self._create_init_mask(image)                 # this returns a torch tensor
+
+        if mask_path:
+            mask_image = self._load_img(mask_path, width, height, fit=fit) # this returns an Image
+            init_mask  = self._create_init_mask(mask_image)
+
+        return init_image,init_mask
+
     def _make_img2img(self):
         if not self.generators.get('img2img'):
             from ldm.dream.generator.img2img import Img2Img
@@ -545,8 +559,9 @@ class Generate:
     def _load_img(self, path, width, height, fit=False):
         assert os.path.exists(path), f'>> {path}: File not found'
 
-        with Image.open(path) as img:
-            image = img.convert('RGB')
+        # with Image.open(path) as img:
+        #     image = img.convert('RGBA')
+        image = Image.open(path)
         print(
             f'>> loaded input image of size {image.width}x{image.height} from {path}'
         )
@@ -554,57 +569,47 @@ class Generate:
             image = self._fit_image(image,(width,height))
         else:
             image = self._squeeze_image(image)
-        size = image.size
+        return image
+
+    def _create_init_image(self,image):
+        image = image.convert('RGB')
+        # print(
+        #     f'>> DEBUG: writing the image to img.png'
+        # )
+        # image.save('img.png')
         image = np.array(image).astype(np.float32) / 255.0
         image = image[None].transpose(0, 3, 1, 2)
         image = torch.from_numpy(image)
         image = 2.0 * image - 1.0
-        return image.to(self.device),size
-
-    def _load_img_mask(self, path, width, height, fit=False, invert=False):
-        assert os.path.exists(path), f'>> {path}: File not found'
-        image = Image.open(path)
-        print(
-            f'>> loaded input mask of size {image.width}x{image.height} from {path}'
-        )
-        if fit:
-            image = self._fit_image(image,(width,height))
-        else:
-            image = self._squeeze_image(image)
+        return image.to(self.device)
+
+    def _create_init_mask(self, image):
         # convert into a black/white mask
-        image = self._mask_to_image(image,invert)
+        image = self._image_to_mask(image)
         image = image.convert('RGB')
-        size = image.size
         # not quite sure what's going on here. It is copied from basunjindal's implementation
         # image = image.resize((64, 64), resample=Image.Resampling.LANCZOS)
         # BUG: We need to use the model's downsample factor rather than hardcoding "8"
         from ldm.dream.generator.base import downsampling
-        image = image.resize((size[0]//downsampling, size[1]//downsampling), resample=Image.Resampling.LANCZOS)
+        image = image.resize((image.width//downsampling, image.height//downsampling), resample=Image.Resampling.LANCZOS)
+        # print(
+        #     f'>> DEBUG: writing the mask to mask.png'
+        # )
+        # image.save('mask.png')
         image = np.array(image)
         image = image.astype(np.float32) / 255.0
         image = image[None].transpose(0, 3, 1, 2)
         image = torch.from_numpy(image)
-        return image.to(self.device),size
+        return image.to(self.device)
 
     # The mask is expected to have the region to be inpainted
     # with alpha transparency. It converts it into a black/white
     # image with the transparent part black.
-    def _mask_to_image(self, init_mask, invert=False) -> Image:
-        if self._has_transparency(init_mask):
-            # Obtain the mask from the transparency channel
-            mask = Image.new(mode="L", size=init_mask.size, color=255)
-            mask.putdata(init_mask.getdata(band=3))
-            if invert:
-                mask = ImageOps.invert(mask)
-            return mask
-        else:
-            print(f'>> No transparent pixels in this image. Will paint across entire image.')
-            return Image.new(mode="L", size=mask.size, color=0)
+    def _image_to_mask(self, mask_image, invert=False) -> Image:
+        # Obtain the mask from the transparency channel
+        mask = Image.new(mode="L", size=mask_image.size, color=255)
+        mask.putdata(mask_image.getdata(band=3))
+        if invert:
+            mask = ImageOps.invert(mask)
+        return mask
 
     def _has_transparency(self,image):
         if image.info.get("transparency", None) is not None:
@@ -620,6 +625,20 @@ class Generate:
             return True
         return False
 
+    def _check_for_erasure(self,image):
+        width, height = image.size
+        pixdata = image.load()
+        colored = 0
+        for y in range(height):
+            for x in range(width):
+                if pixdata[x, y][3] == 0:
+                    r, g, b, _ = pixdata[x, y]
+                    if (r, g, b) != (0, 0, 0) and \
+                       (r, g, b) != (255, 255, 255):
+                        colored += 1
+        return colored == 0
+
     def _squeeze_image(self,image):
         x,y,resize_needed = self._resolution_check(image.width,image.height)
         if resize_needed:
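
Putting the pieces of the new mask pipeline together, here is a self-contained sketch of the alpha-channel-to-latent-mask conversion performed by _image_to_mask and _create_init_mask above (Pillow >= 9.1 assumed for Image.Resampling; the downsampling factor of 8 and the file name are illustrative, not taken from the diff):

```python
import numpy as np
import torch
from PIL import Image

DOWNSAMPLING = 8   # stands in for ldm.dream.generator.base.downsampling

mask_image = Image.open('man-transparent.png').convert('RGBA')

# transparency channel -> greyscale mask (transparent = black, opaque = white)
mask = Image.new(mode='L', size=mask_image.size, color=255)
mask.putdata(mask_image.getdata(band=3))

# shrink to the latent resolution the sampler operates at
mask = mask.convert('RGB')
mask = mask.resize((mask.width // DOWNSAMPLING, mask.height // DOWNSAMPLING),
                   resample=Image.Resampling.LANCZOS)

# same tensor layout as _create_init_mask: (1, 3, h, w), values in [0, 1]
arr = np.array(mask).astype(np.float32) / 255.0
tensor = torch.from_numpy(arr[None].transpose(0, 3, 1, 2))
print(tensor.shape)   # torch.Size([1, 3, 64, 64]) for a 512x512 input
```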


@@ -564,14 +564,9 @@ def create_cmd_parser():
     )
     parser.add_argument(
         '-M',
-        '--mask',
+        '--init_mask',
         type=str,
-        help='Path to inpainting mask; transparent areas will be painted over',
-    )
-    parser.add_argument(
-        '--invert_mask',
-        action='store_true',
-        help='Invert the inpainting mask; opaque areas will be painted over',
+        help='Path to input mask for inpainting mode (supersedes width and height)',
     )
     parser.add_argument(
         '-T',
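
A quick check of how the renamed option parses (a standalone argparse mirror of the fragment above, not the full create_cmd_parser):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '-M',
    '--init_mask',
    type=str,
    help='Path to input mask for inpainting mode (supersedes width and height)',
)

opt = parser.parse_args(['-M', './images/man-transparent.png'])
print(opt.init_mask)   # ./images/man-transparent.png
```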