Merge branch 'development' into development

2024-08-30 20:32:17 +00:00 · 2022-09-12 16:34:10 -04:00 · 2022-09-12 16:34:10 -04:00 · 4a5a228fd8
commit 4a5a228fd8
parent ea60d036d1 6665f4494f
8 changed files with 582 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -27,7 +27,6 @@ report bugs and make feature requests. Be sure to use the provided
 templates. They will help aid diagnose issues faster._

 # **Table of Contents**
-
 1. [Installation](#installation)
 2. [Major Features](#features)
 3. [Changelog](#latest-changes)
@@ -86,6 +85,8 @@ To run in full-precision mode, start `dream.py` with the

 - ## [GFPGAN and Real-ESRGAN Support](docs/features/UPSCALE.md)

+- ## [Embiggen upscaling](docs/features/EMBIGGEN.md)
+
 - ## [Seamless Tiling](docs/features/OTHER.md#seamless-tiling)

 - ## [Google Colab](docs/features/OTHER.md#google-colab)
@@ -136,7 +137,7 @@ To run in full-precision mode, start `dream.py` with the
  - Works on M1 Apple hardware.
  - Multiple bug fixes.

-For older changelogs, please visit **[CHANGELOGS](docs/CHANGELOG.md)**.
+For older changelogs, please visit **[CHANGELOGS](docs/CHANGELOG.md)**. 

 # Troubleshooting

--- a/docs/features/EMBIGGEN.md
+++ b/docs/features/EMBIGGEN.md
@@ -0,0 +1,134 @@
+# **Embiggen -- upscale your images on limited memory machines**
+
+GFPGAN and Real-ESRGAN are both memory intensive. In order to avoid
+crashes and memory overloads during the Stable Diffusion process,
+these effects are applied after Stable Diffusion has completed its
+work.
+
+In single image generations, you will see the output right away but
+when you are using multiple iterations, the images will first be
+generated and then upscaled and face restored after that process is
+complete. While the image generation is taking place, you will still
+be able to preview the base images.
+
+If you wish to stop during the image generation but want to upscale or
+face restore a particular generated image, pass it again with the same
+prompt and generated seed along with the `-U` and `-G` prompt
+arguments to perform those actions.
+
+## Embiggen 
+
+If you want to generate more pixels without running out of VRAM, or
+you want to upscale with details that couldn't possibly appear
+without the context of a prompt, this is the feature to try out.
+
+Embiggen automates the process of taking an init image, upscaling it,
+cutting it into smaller tiles that slightly overlap, running all the
+tiles through img2img to refine details with respect to the prompt,
+and "stitching" the tiles back together into a cohesive image.
+
+It automatically computes how many tiles are needed, so it can be fed
+an init image of *ANY* size and perform Img2Img on it (though it only runs
+one tile at a time, which can cause problems; see the Note at the end).
+
+If you're familiar with "GoBig" (à la [progrock-stable](https://github.com/lowfuel/progrock-stable)),
+it's similar to that, except it can work up to an arbitrarily large size
+(instead of just 2x), with tile overlaps configurable as a ratio, and
+has extra logic to re-run any number of the tile sub-sections of the image
+if, for example, a small part of a huge run got messed up.
+
+**Usage**
+
+`-embiggen <scaling_factor> <esrgan_strength> <overlap_ratio OR overlap_pixels>`
+
+Takes a scaling factor relative to the size of the `--init_img` (`-I`), followed by
+ESRGAN upscaling strength (0 - 1.0), followed by minimum amount of overlap
+between tiles as a decimal ratio (0 - 1.0) *OR* a number of pixels.
+
+The scaling factor is how much larger than the `--init_img` the output
+should be. It multiplies both the x and y axes, so an output with a
+scaling factor of 3.0 has 3*3 = 9 times as many pixels and will take
+(at least) 9 times as long (see the overlap discussion for why it might
+be longer). If the `--init_img` is already the right size, use `-embiggen 1`;
+the factor can also be less than one if the `--init_img` is too big.
+
+`esrgan_strength` defaults to 0.75 and `overlap_ratio` defaults to
+0.25; both are optional.
+
+
+Unlike Img2Img, the `--width` (`-W`) and `--height` (`-H`) arguments
+do not control the size of the image as a whole, but the size of the
+tiles used to Embiggen the image.
+
+ESRGAN is used to upscale the `--init_img` prior to cutting it into
+tiles/pieces to run through img2img and then stitch back
+together. Embiggen can be run without ESRGAN; just set the strength to
+zero (e.g. `-embiggen 1.75 0`). The output of Embiggen can also be
+upscaled after it's finished (`-U`).
+
+The overlap is the minimum amount by which each tile overlaps its
+adjacent tiles, specified as either a ratio or a number of pixels. How
+much the tiles overlap determines how likely the tiling is to be noticeable:
+really small overlaps (e.g. a couple of pixels) may produce noticeable
+grid-like fuzzy distortions in the final stitched image. However, the
+overlapping space doesn't contribute to making the image bigger, so
+the larger the overlap, the more tiles (and the more time) it will
+take to finish.
+
+Because the overlapping parts of tiles don't "contribute" to
+increasing size, every tile after the first in a row or column
+effectively only covers an extra `1 - overlap_ratio` of a tile on each
+axis. If the input/`--init_img` is the same size as a tile, the ideal (for
+time) scaling factors with the default overlap (0.25) are 1.75, 2.5,
+3.25, 4.0, etc.
+
+`-embiggen_tiles <spaced list of tiles>`
+
+An advanced option, useful if you only want to alter parts of the image
+while running Embiggen. It takes a list of tile numbers to run and
+replace onto the initial image, e.g. `1 3 5`. It's useful for either
+fixing problem spots from a previous Embiggen run, or selectively
+altering the prompt for sections of an image - for creative or
+coherency reasons.
+
+Tiles are numbered starting with one, and left-to-right,
+top-to-bottom.  So, if you are generating a 3x3 tiled image, the
+middle row would be `4 5 6`.
+
+**Example Usage**
+
+Running Embiggen with 512x512 tiles on an existing image, scaling up by a factor of 2.5x;
+and doing the same again with the defaults written out (ESRGAN strength 0.75, overlap between tiles 0.25):
+
+```
+dream> a photo of a forest at sunset -s 100 -W 512 -H 512 -I outputs/forest.png -f 0.4 -embiggen 2.5
+dream> a photo of a forest at sunset -s 100 -W 512 -H 512 -I outputs/forest.png -f 0.4 -embiggen 2.5 0.75 0.25
+```
+
+If your starting image was also 512x512, this should have taken 9 tiles.
+
+If there weren't enough clouds in the sky of that forest you just made
+(and that image is about 1280 pixels (512*2.5) wide, a.k.a. three 512x512
+tiles wide with 0.25 overlaps), we can replace that top row of
+tiles:
+
+```
+dream> a photo of puffy clouds over a forest at sunset -s 100 -W 512 -H 512 -I outputs/000002.seed.png -f 0.5 -embiggen_tiles 1 2 3
+```
+
+**Note**
+
+Because the same prompt is used on all the tiled images, and the model
+doesn't have the context of anything outside the tile being run, it
+can end up creating repeated patterns (also called 'motifs') across all
+the tiles based on that prompt. The best way to combat this is
+lowering the `--strength` (`-f`) to stay more true to the init image,
+and increasing the number of steps so there is more compute-time to
+create the detail.  Anecdotally `--strength` 0.35-0.45 works pretty
+well on most things. It may also work great in some examples even with
+the `--strength` set high for patterns, landscapes, or subjects that
+are more abstract. Because this is (relatively) fast, you can also
+always create a few Embiggen'ed images and manually composite them to
+preserve the best parts from each.
+
+Author: [Travco](https://github.com/travco)
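The tile arithmetic described in EMBIGGEN.md can be checked with a small standalone sketch. It follows the `ceildiv` logic added to `ldm/dream/generator/embiggen.py` in this commit, but it is not part of the patch, and the function names (`tiles_along_axis`, `overlap_in_pixels`, `embiggen_grid`) are hypothetical, chosen only for illustration.

```python
def overlap_in_pixels(overlap: float, tile_px: int) -> int:
    """Interpret the overlap argument: below 1 it is a ratio of the tile, otherwise pixels."""
    return round(overlap * tile_px) if overlap < 1 else round(overlap)


def tiles_along_axis(target_px: int, tile_px: int, overlap_px: int) -> int:
    """Tiles needed to cover target_px when neighbours overlap by at least overlap_px."""
    if target_px <= tile_px:
        return 1
    stride = tile_px - overlap_px  # new pixels contributed by each extra tile
    if stride <= 0:
        raise ValueError("overlap must be smaller than the tile size")
    remaining = target_px - tile_px  # area left after the first tile
    return -(-remaining // stride) + 1  # integer ceiling division, plus the first tile


def embiggen_grid(init_w, init_h, scale, tile_w=512, tile_h=512, overlap=0.25):
    """Return (tiles_x, tiles_y) for an init image scaled by `scale`."""
    target_w = round(init_w * scale)
    target_h = round(init_h * scale)
    tiles_x = tiles_along_axis(target_w, tile_w, overlap_in_pixels(overlap, tile_w))
    tiles_y = tiles_along_axis(target_h, tile_h, overlap_in_pixels(overlap, tile_h))
    return tiles_x, tiles_y


if __name__ == "__main__":
    print(embiggen_grid(512, 512, 2.5))   # (3, 3), i.e. the 9 tiles from the example
    print(embiggen_grid(512, 512, 2.0))   # (3, 3), same cost as 2.5
    print(embiggen_grid(512, 512, 1.75))  # (2, 2)
```

Under these assumptions a factor of 2.0 needs the same 3x3 grid as 2.5, which is why the documentation lists 1.75, 2.5, 3.25 and 4.0 as the time-efficient factors for a tile-sized init image.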
--- a/ldm/dream/generator/embiggen.py
+++ b/ldm/dream/generator/embiggen.py
@@ -0,0 +1,403 @@
+'''
+ldm.dream.generator.embiggen descends from ldm.dream.generator
+and generates with ldm.dream.generator.img2img
+'''
+
+import torch
+import numpy as np
+from PIL import Image
+from ldm.dream.generator.base      import Generator
+from ldm.models.diffusion.ddim     import DDIMSampler
+from ldm.dream.generator.img2img   import Img2Img
+
+class Embiggen(Generator):
+    def __init__(self,model):
+        super().__init__(model)
+        self.init_latent         = None
+
+    @torch.no_grad()
+    def get_make_image(
+        self,
+        prompt,
+        sampler,
+        steps,
+        cfg_scale,
+        ddim_eta,
+        conditioning,
+        init_img,
+        strength,
+        width,
+        height,
+        embiggen,
+        embiggen_tiles,
+        step_callback=None,
+        **kwargs
+    ):
+        """
+        Returns a function returning an image derived from the prompt and multi-stage twice-baked potato layering over the img2img on the initial image
+        Return value depends on the seed at the time you call it
+        """
+        # Construct embiggen arg array, and sanity check arguments
+        if embiggen is None: # embiggen can also be called with just embiggen_tiles
+            embiggen = [1.0] # If not specified, assume no scaling
+        elif embiggen[0] < 0 :
+            embiggen[0] = 1.0
+            print('>> Embiggen scaling factor cannot be negative, fell back to the default of 1.0 !')
+        if len(embiggen) < 2:
+            embiggen.append(0.75)
+        elif embiggen[1] > 1.0 or embiggen[1] < 0 :
+            embiggen[1] = 0.75
+            print('>> Embiggen upscaling strength for ESRGAN must be between 0 and 1, fell back to the default of 0.75 !')
+        if len(embiggen) < 3:
+            embiggen.append(0.25)
+        elif embiggen[2] < 0 :
+            embiggen[2] = 0.25
+            print('>> Overlap size for Embiggen must be a positive ratio between 0 and 1 OR a number of pixels, fell back to the default of 0.25 !')
+
+        # Convert tiles from their user-friendly count-from-one to count-from-zero, because we need to do modulo math
+        # and then sort them, because... people.
+        if embiggen_tiles:
+            embiggen_tiles = list(map(lambda n: n-1, embiggen_tiles))
+            embiggen_tiles.sort()
+
+        # Prep img2img generator, since we wrap over it
+        gen_img2img = Img2Img(self.model)
+
+        # Open original init image (not a tensor) to manipulate;
+        # convert inside the context manager so the file handle is closed promptly
+
+        with Image.open(init_img) as img:
+            initsuperimage = img.convert('RGB')
+
+        # Size of the target super init image in pixels
+        initsuperwidth, initsuperheight = initsuperimage.size
+
+        # Increase by scaling factor if not already resized, using ESRGAN as able
+        if embiggen[0] != 1.0:
+            initsuperwidth = round(initsuperwidth*embiggen[0])
+            initsuperheight = round(initsuperheight*embiggen[0])
+            if embiggen[1] > 0: # No point in ESRGAN upscaling if strength is set zero
+                from ldm.gfpgan.gfpgan_tools import (
+                    real_esrgan_upscale,
+                )
+                print(f'>> ESRGAN upscaling init image prior to cutting with Embiggen with strength {embiggen[1]}')
+                if embiggen[0] > 2:
+                    initsuperimage = real_esrgan_upscale(
+                        initsuperimage,
+                        embiggen[1], # upscale strength
+                        4, # upscale scale
+                        self.seed,
+                    )
+                else:
+                    initsuperimage = real_esrgan_upscale(
+                        initsuperimage,
+                        embiggen[1], # upscale strength
+                        2, # upscale scale
+                        self.seed,
+                    )
+            # We could keep recursively re-running ESRGAN for a requested embiggen[0] larger than 4x
+            #   but from personal experience it doesn't greatly improve anything after 4x
+            # Resize to target scaling factor resolution
+            initsuperimage = initsuperimage.resize((initsuperwidth, initsuperheight), Image.Resampling.LANCZOS)
+
+        # Use width and height as tile widths and height
+        # Determine buffer size in pixels
+        if embiggen[2] < 1:
+            if embiggen[2] < 0:
+                embiggen[2] = 0
+            overlap_size_x = round(embiggen[2] * width)
+            overlap_size_y = round(embiggen[2] * height)
+        else:
+            overlap_size_x = round(embiggen[2])
+            overlap_size_y = round(embiggen[2])
+
+        # With overall image width and height known, determine how many tiles we need
+        def ceildiv(a, b):
+            return -1 * (-a // b)
+
+        # X and Y need to be determined independently (we may have savings on one based on the buffer pixel count)
+        # (initsuperwidth - width) is the area remaining to the right that we need to layer tiles to fill
+        # (width - overlap_size_x) is how much new we can fill with a single tile
+        emb_tiles_x = 1
+        emb_tiles_y = 1
+        if (initsuperwidth - width) > 0:
+            emb_tiles_x = ceildiv(initsuperwidth - width, width - overlap_size_x) + 1
+        if (initsuperheight - height) > 0:
+            emb_tiles_y = ceildiv(initsuperheight - height, height - overlap_size_y) + 1
+        # Sanity
+        assert emb_tiles_x > 1 or emb_tiles_y > 1, f'ERROR: Based on the requested dimensions of {initsuperwidth}x{initsuperheight} and tiles of {width}x{height} you don\'t need to Embiggen! Check your arguments.'
+
+        # Prep alpha layers --------------
+        # https://stackoverflow.com/questions/69321734/how-to-create-different-transparency-like-gradient-with-python-pil
+        # agradientL is Left-side transparent
+        agradientL = Image.linear_gradient('L').rotate(90).resize((overlap_size_x, height))
+        # agradientT is Top-side transparent
+        agradientT = Image.linear_gradient('L').resize((width, overlap_size_y))
+        # radial corner is the left-top corner, made full circle then cut to just the left-top quadrant
+        agradientC = Image.new('L', (256, 256))
+        for y in range(256):
+            for x in range(256):
+                #Find distance to lower right corner (numpy takes arrays)
+                distanceToLR = np.sqrt([(255 - x) ** 2 + (255 - y) ** 2])[0]
+                #Clamp values to max 255
+                if distanceToLR > 255:
+                    distanceToLR = 255
+                #Place the pixel as invert of distance     
+                agradientC.putpixel((x, y), int(255 - distanceToLR))
+
+        # Create alpha layers default fully white
+        alphaLayerL = Image.new("L", (width, height), 255)
+        alphaLayerT = Image.new("L", (width, height), 255)
+        alphaLayerLTC = Image.new("L", (width, height), 255)
+        # Paste gradients into alpha layers
+        alphaLayerL.paste(agradientL, (0, 0))
+        alphaLayerT.paste(agradientT, (0, 0))
+        alphaLayerLTC.paste(agradientL, (0, 0))
+        alphaLayerLTC.paste(agradientT, (0, 0))
+        alphaLayerLTC.paste(agradientC.resize((overlap_size_x, overlap_size_y)), (0, 0))
+
+        if embiggen_tiles:
+            # Individual unconnected sides
+            alphaLayerR = Image.new("L", (width, height), 255)
+            alphaLayerR.paste(agradientL.rotate(180), (width - overlap_size_x, 0))
+            alphaLayerB = Image.new("L", (width, height), 255)
+            alphaLayerB.paste(agradientT.rotate(180), (0, height - overlap_size_y))
+            alphaLayerTB = Image.new("L", (width, height), 255)
+            alphaLayerTB.paste(agradientT, (0, 0))
+            alphaLayerTB.paste(agradientT.rotate(180), (0, height - overlap_size_y))
+            alphaLayerLR = Image.new("L", (width, height), 255)
+            alphaLayerLR.paste(agradientL, (0, 0))
+            alphaLayerLR.paste(agradientL.rotate(180), (width - overlap_size_x, 0))
+
+            # Sides and corner Layers
+            alphaLayerRBC = Image.new("L", (width, height), 255)
+            alphaLayerRBC.paste(agradientL.rotate(180), (width - overlap_size_x, 0))
+            alphaLayerRBC.paste(agradientT.rotate(180), (0, height - overlap_size_y))
+            alphaLayerRBC.paste(agradientC.rotate(180).resize((overlap_size_x, overlap_size_y)), (width - overlap_size_x, height - overlap_size_y))
+            alphaLayerLBC = Image.new("L", (width, height), 255)
+            alphaLayerLBC.paste(agradientL, (0, 0))
+            alphaLayerLBC.paste(agradientT.rotate(180), (0, height - overlap_size_y))
+            alphaLayerLBC.paste(agradientC.rotate(90).resize((overlap_size_x, overlap_size_y)), (0, height - overlap_size_y))
+            alphaLayerRTC = Image.new("L", (width, height), 255)
+            alphaLayerRTC.paste(agradientL.rotate(180), (width - overlap_size_x, 0))
+            alphaLayerRTC.paste(agradientT, (0, 0))
+            alphaLayerRTC.paste(agradientC.rotate(270).resize((overlap_size_x, overlap_size_y)), (width - overlap_size_x, 0))
+
+            # All but X layers
+            alphaLayerABT = Image.new("L", (width, height), 255)
+            alphaLayerABT.paste(alphaLayerLBC, (0, 0))
+            alphaLayerABT.paste(agradientL.rotate(180), (width - overlap_size_x, 0))
+            alphaLayerABT.paste(agradientC.rotate(180).resize((overlap_size_x, overlap_size_y)), (width - overlap_size_x, height - overlap_size_y))
+            alphaLayerABL = Image.new("L", (width, height), 255)
+            alphaLayerABL.paste(alphaLayerRTC, (0, 0))
+            alphaLayerABL.paste(agradientT.rotate(180), (0, height - overlap_size_y))
+            alphaLayerABL.paste(agradientC.rotate(180).resize((overlap_size_x, overlap_size_y)), (width - overlap_size_x, height - overlap_size_y))
+            alphaLayerABR = Image.new("L", (width, height), 255)
+            alphaLayerABR.paste(alphaLayerLBC, (0, 0))
+            alphaLayerABR.paste(agradientT, (0, 0))
+            alphaLayerABR.paste(agradientC.resize((overlap_size_x, overlap_size_y)), (0, 0))
+            alphaLayerABB = Image.new("L", (width, height), 255)
+            alphaLayerABB.paste(alphaLayerRTC, (0, 0))
+            alphaLayerABB.paste(agradientL, (0, 0))
+            alphaLayerABB.paste(agradientC.resize((overlap_size_x, overlap_size_y)), (0, 0))
+
+            # All-around layer
+            alphaLayerAA = Image.new("L", (width, height), 255)
+            alphaLayerAA.paste(alphaLayerABT, (0, 0))
+            alphaLayerAA.paste(agradientT, (0, 0))
+            alphaLayerAA.paste(agradientC.resize((overlap_size_x, overlap_size_y)), (0, 0))
+            alphaLayerAA.paste(agradientC.rotate(270).resize((overlap_size_x, overlap_size_y)), (width - overlap_size_x, 0))
+
+        # Clean up temporary gradients
+        del agradientL
+        del agradientT
+        del agradientC
+
+        def make_image(x_T):
+            # Make main tiles -------------------------------------------------
+            if embiggen_tiles:
+                print(f'>> Making {len(embiggen_tiles)} Embiggen tiles...')
+            else:
+                print(f'>> Making {(emb_tiles_x * emb_tiles_y)} Embiggen tiles ({emb_tiles_x}x{emb_tiles_y})...')
+
+            emb_tile_store = []
+            for tile in range(emb_tiles_x * emb_tiles_y):
+                # Determine if this is a re-run and replace
+                if embiggen_tiles and tile not in embiggen_tiles:
+                    continue
+                # Get row and column entries
+                emb_row_i = tile // emb_tiles_x
+                emb_column_i = tile % emb_tiles_x
+                # Determine bounds to cut up the init image
+                # Determine upper-left point
+                if emb_column_i + 1 == emb_tiles_x:
+                    left = initsuperwidth - width
+                else:
+                    left = round(emb_column_i * (width - overlap_size_x))
+                if emb_row_i + 1 == emb_tiles_y:
+                    top = initsuperheight - height
+                else:
+                    top = round(emb_row_i * (height - overlap_size_y))
+                right = left + width
+                bottom = top + height
+                
+                # Cropped image of above dimension (does not modify the original)
+                newinitimage = initsuperimage.crop((left, top, right, bottom))
+                # DEBUG:
+                # newinitimagepath = init_img[0:-4] + f'_emb_Ti{tile}.png'
+                # newinitimage.save(newinitimagepath)
+                
+                if embiggen_tiles:
+                    print(f'Making tile #{tile + 1} ({embiggen_tiles.index(tile) + 1} of {len(embiggen_tiles)} requested)')
+                else:
+                    print(f'Starting {tile + 1} of {(emb_tiles_x * emb_tiles_y)} tiles')
+
+                # create a torch tensor from an Image
+                newinitimage = np.array(newinitimage).astype(np.float32) / 255.0
+                newinitimage = newinitimage[None].transpose(0, 3, 1, 2)
+                newinitimage = torch.from_numpy(newinitimage)
+                newinitimage = 2.0 * newinitimage - 1.0
+                newinitimage = newinitimage.to(self.model.device)
+
+                tile_results = gen_img2img.generate(
+                    prompt,
+                    iterations     = 1,
+                    seed           = self.seed,
+                    sampler        = sampler,
+                    steps          = steps,
+                    cfg_scale      = cfg_scale,
+                    conditioning   = conditioning,
+                    ddim_eta       = ddim_eta,
+                    image_callback = None,  # called only after the final image is generated
+                    step_callback  = step_callback,   # called after each intermediate image is generated
+                    width          = width,
+                    height         = height,
+                    init_img       = init_img,        # img2img doesn't need this, but it might in the future
+                    init_image     = newinitimage,    # notice that init_image is different from init_img
+                    mask_image     = None,
+                    strength       = strength,
+                )
+
+                emb_tile_store.append(tile_results[0][0])
+                # DEBUG (but, also has other uses), worth saving if you want tiles without a transparency overlap to manually composite
+                # emb_tile_store[-1].save(init_img[0:-4] + f'_emb_To{tile}.png')
+                del newinitimage
+            
+            # Sanity check we have them all
+            if len(emb_tile_store) == (emb_tiles_x * emb_tiles_y) or (embiggen_tiles and len(emb_tile_store) == len(embiggen_tiles)):
+                outputsuperimage = Image.new("RGBA", (initsuperwidth, initsuperheight))
+                if embiggen_tiles:
+                    outputsuperimage.alpha_composite(initsuperimage.convert('RGBA'), (0, 0))
+                for tile in range(emb_tiles_x * emb_tiles_y):
+                    if embiggen_tiles:
+                        if tile in embiggen_tiles:
+                            intileimage = emb_tile_store.pop(0)
+                        else:
+                            continue
+                    else:
+                        intileimage = emb_tile_store[tile]
+                    intileimage = intileimage.convert('RGBA')
+                    # Get row and column entries
+                    emb_row_i = tile // emb_tiles_x
+                    emb_column_i = tile % emb_tiles_x
+                    if emb_row_i == 0 and emb_column_i == 0 and not embiggen_tiles:
+                        left = 0
+                        top = 0
+                    else:
+                        # Determine upper-left point
+                        if emb_column_i + 1 == emb_tiles_x:
+                            left = initsuperwidth - width
+                        else:
+                            left = round(emb_column_i * (width - overlap_size_x))
+                        if emb_row_i + 1 == emb_tiles_y:
+                            top = initsuperheight - height
+                        else:
+                            top = round(emb_row_i * (height - overlap_size_y))
+                        # Handle gradients for various conditions
+                        # Handle emb_rerun case
+                        if embiggen_tiles:
+                            # top of image
+                            if emb_row_i == 0:
+                                if emb_column_i == 0:
+                                    if (tile+1) in embiggen_tiles: # Look-ahead right
+                                        if (tile+emb_tiles_x) not in embiggen_tiles: # Look-ahead down
+                                            intileimage.putalpha(alphaLayerB)
+                                        # Otherwise do nothing on this tile
+                                    elif (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down only
+                                        intileimage.putalpha(alphaLayerR)
+                                    else:
+                                        intileimage.putalpha(alphaLayerRBC)
+                                elif emb_column_i == emb_tiles_x - 1:
+                                    if (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down
+                                        intileimage.putalpha(alphaLayerL)
+                                    else:
+                                        intileimage.putalpha(alphaLayerLBC)
+                                else:
+                                    if (tile+1) in embiggen_tiles: # Look-ahead right
+                                        if (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down
+                                            intileimage.putalpha(alphaLayerL)
+                                        else:
+                                            intileimage.putalpha(alphaLayerLBC)
+                                    elif (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down only
+                                        intileimage.putalpha(alphaLayerLR)
+                                    else:
+                                        intileimage.putalpha(alphaLayerABT)
+                            # bottom of image
+                            elif emb_row_i == emb_tiles_y - 1:
+                                if emb_column_i == 0:
+                                    if (tile+1) in embiggen_tiles: # Look-ahead right
+                                        intileimage.putalpha(alphaLayerT)
+                                    else:
+                                        intileimage.putalpha(alphaLayerRTC)
+                                elif emb_column_i == emb_tiles_x - 1:
+                                    # No tiles to look ahead to
+                                    intileimage.putalpha(alphaLayerLTC)
+                                else:
+                                    if (tile+1) in embiggen_tiles: # Look-ahead right
+                                        intileimage.putalpha(alphaLayerLTC)
+                                    else:
+                                        intileimage.putalpha(alphaLayerABB)
+                            # vertical middle of image
+                            else:
+                                if emb_column_i == 0:
+                                    if (tile+1) in embiggen_tiles: # Look-ahead right
+                                        if (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down
+                                            intileimage.putalpha(alphaLayerT)
+                                        else:
+                                            intileimage.putalpha(alphaLayerTB)
+                                    elif (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down only
+                                        intileimage.putalpha(alphaLayerRTC)
+                                    else:
+                                        intileimage.putalpha(alphaLayerABL)
+                                elif emb_column_i == emb_tiles_x - 1:
+                                    if (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down
+                                        intileimage.putalpha(alphaLayerLTC)
+                                    else:
+                                        intileimage.putalpha(alphaLayerABR)
+                                else:
+                                    if (tile+1) in embiggen_tiles: # Look-ahead right
+                                        if (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down
+                                            intileimage.putalpha(alphaLayerLTC)
+                                        else:
+                                            intileimage.putalpha(alphaLayerABR)
+                                    elif (tile+emb_tiles_x) in embiggen_tiles: # Look-ahead down only
+                                        intileimage.putalpha(alphaLayerABB)
+                                    else:
+                                        intileimage.putalpha(alphaLayerAA)
+                        # Handle normal tiling case (much simpler - since we tile left to right, top to bottom)
+                        else:
+                            if emb_row_i == 0 and emb_column_i >= 1:
+                                intileimage.putalpha(alphaLayerL)
+                            elif emb_row_i >= 1 and emb_column_i == 0:
+                                intileimage.putalpha(alphaLayerT)
+                            else:
+                                intileimage.putalpha(alphaLayerLTC)
+                    # Layer tile onto final image
+                    outputsuperimage.alpha_composite(intileimage, (left, top))
+            else:
+                print(f'Error: could not find all Embiggen output tiles in memory? Something must have gone wrong with img2img generation.')
+
+            # after internal loops and patching up return Embiggen image
+            return outputsuperimage
+        # end of function declaration
+        return make_image
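The crop and paste coordinates used in `make_image` can be sketched in isolation. This is a minimal illustration assuming the same conventions as the code above (tiles numbered from one, left-to-right then top-to-bottom, with the last column and row pinned to the far edge of the image); `tile_box` is a hypothetical helper and does not exist in the patch.

```python
def tile_box(number, tiles_x, tiles_y, full_w, full_h,
             tile_w, tile_h, overlap_x, overlap_y):
    """Return (left, top, right, bottom) for a 1-based tile number."""
    index = number - 1  # user-facing numbers start at one
    if not 0 <= index < tiles_x * tiles_y:
        raise ValueError(f"tile {number} is outside a {tiles_x}x{tiles_y} grid")
    row, col = divmod(index, tiles_x)

    # The last column/row is aligned to the image edge, so it may overlap
    # its neighbour by more than the requested minimum.
    if col == tiles_x - 1:
        left = full_w - tile_w
    else:
        left = col * (tile_w - overlap_x)
    if row == tiles_y - 1:
        top = full_h - tile_h
    else:
        top = row * (tile_h - overlap_y)
    return left, top, left + tile_w, top + tile_h


if __name__ == "__main__":
    # 3x3 grid over a 1280x1280 image, 512x512 tiles, 128 px overlap
    for n in (1, 5, 9):
        print(n, tile_box(n, 3, 3, 1280, 1280, 512, 512, 128, 128))
    # Same grid over 1024x1024: the last column starts at 512 rather than 768
    print(3, tile_box(3, 3, 3, 1024, 1024, 512, 512, 128, 128))
```

Pinning the final tile to the edge keeps every tile at the full `--width` x `--height` size, at the cost of a larger overlap on the last column or row when the scaled image is not an exact multiple of the stride.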
--- a/ldm/dream/generator/img2img.py
+++ b/ldm/dream/generator/img2img.py
@@ -1,5 +1,5 @@
 '''
-ldm.dream.generator.txt2img descends from ldm.dream.generator
+ldm.dream.generator.img2img descends from ldm.dream.generator
 '''

 import torch
--- a/ldm/dream/pngwriter.py
+++ b/ldm/dream/pngwriter.py
@@ -73,6 +73,10 @@ class PromptFormatter:
            switches.append(f'-G{opt.gfpgan_strength}')
        if opt.upscale:
            switches.append(f'-U {" ".join([str(u) for u in opt.upscale])}')
+        if opt.embiggen:
+            switches.append(f'-embiggen {" ".join([str(u) for u in opt.embiggen])}')
+        if opt.embiggen_tiles:
+            switches.append(f'-embiggen_tiles {" ".join([str(u) for u in opt.embiggen_tiles])}')
        if opt.variation_amount > 0:
            switches.append(f'-v{opt.variation_amount}')
        if opt.with_variations:
--- a/ldm/generate.py
+++ b/ldm/generate.py
@@ -207,6 +207,9 @@ class Generate:
            init_mask      =    None,
            fit            =    False,
            strength       =    None,
+            # these are specific to embiggen (which also relies on img2img args)
+            embiggen       =    None,
+            embiggen_tiles =    None,
            # these are specific to GFPGAN/ESRGAN
            gfpgan_strength=    0,
            save_original  =    False,
@@ -232,6 +235,10 @@ class Generate:
           image_callback                  // a function or method that will be called each time an image is generated
           with_variations                 // a weighted list [(seed_1, weight_1), (seed_2, weight_2), ...] of variations which should be applied before doing any generation
           variation_amount                // optional 0-1 value to slerp from -S noise to random noise (allows variations on an image)
+           threshold                       // optional value to add thresholding to latent values for k-diffusion samplers (0 disables)
+           perlin                          // optional 0-1 value to add a percentage of perlin noise to the initial noise
+           embiggen                        // scale factor relative to the size of the --init_img (-I), followed by ESRGAN upscaling strength (0-1.0), followed by minimum amount of overlap between tiles as a decimal ratio (0 - 1.0) or number of pixels
+           embiggen_tiles                  // list of tile numbers to process and replace onto the image, e.g. `0 2 4`

        To use the step callback, define a function that receives two arguments:
        - Image GPU data
@ -276,6 +283,9 @@ class Generate:
        assert (
                0.0 <= variation_amount <= 1.0
        ), '-v --variation_amount must be in [0.0, 1.0]'
+        assert (
+            (embiggen is None and embiggen_tiles is None) or init_img is not None
+        ), 'Embiggen requires an init/input image to be specified'

        # check this logic - doesn't look right
        if len(with_variations) > 0 or variation_amount > 1.0:
@ -312,6 +322,8 @@ class Generate:
            
            if (init_image is not None) and (mask_image is not None):
                generator = self._make_inpaint()
+            elif embiggen is not None or embiggen_tiles is not None:
+                generator = self._make_embiggen()
            elif init_image is not None:
                generator = self._make_img2img()
            else:
@ -331,11 +343,14 @@ class Generate:
                step_callback  = step_callback,   # called after each intermediate image is generated
                width          = width,
                height         = height,
+                init_img       = init_img,        # embiggen needs to manipulate from the unmodified init_img
                init_image     = init_image,      # notice that init_image is different from init_img
                mask_image     = mask_image,
                strength       = strength,
                threshold      = threshold,
                perlin         = perlin,
+                embiggen       = embiggen,
+                embiggen_tiles = embiggen_tiles,
            )

            if upscale is not None or gfpgan_strength > 0:
@ -408,6 +423,12 @@ class Generate:
            from ldm.dream.generator.img2img import Img2Img
            self.generators['img2img'] = Img2Img(self.model)
        return self.generators['img2img']
+    
+    def _make_embiggen(self):
+        if not self.generators.get('embiggen'):
+            from ldm.dream.generator.embiggen import Embiggen
+            self.generators['embiggen'] = Embiggen(self.model)
+        return self.generators['embiggen']

    def _make_txt2img(self):
        if not self.generators.get('txt2img'):
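
The `ldm/generate.py` changes above add a precondition (Embiggen needs an init image) and a new branch in the generator selection. The sketch below restates that ordering in isolation. The function `choose_generator` is hypothetical and returns a name instead of building a generator object, and it raises `ValueError` where the source uses `assert`:

```python
def choose_generator(init_img=None, init_image=None, mask_image=None,
                     embiggen=None, embiggen_tiles=None):
    """Hypothetical stand-in for the selection logic in Generate.

    init_img is the unmodified input (as passed by the user); init_image is
    the prepared image handed to img2img. Both names follow the diff above.
    """
    wants_embiggen = embiggen is not None or embiggen_tiles is not None

    # Embiggen works on an existing image, so an init image is mandatory.
    if wants_embiggen and init_img is None:
        raise ValueError('Embiggen requires an init/input image to be specified')

    # Same branch order as the diff: inpainting is checked before Embiggen.
    if init_image is not None and mask_image is not None:
        return 'inpaint'
    if wants_embiggen:
        return 'embiggen'
    if init_image is not None:
        return 'img2img'
    return 'txt2img'


print(choose_generator(init_img='in.png', init_image='loaded', embiggen=[2.0]))
```

Note that when both a mask and Embiggen options are supplied, the inpainting branch is taken, since it is tested first.
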
--- a/requirements-mac.txt
+++ b/requirements-mac.txt
@ -11,7 +11,7 @@ opencv-python==4.6.0.66
 pillow==9.2.0
 pudb==2019.2
 torch==1.12.1
-torchvision==0.12.0
+torchvision==0.13.0
 pytorch-lightning==1.4.2
 streamlit==1.12.0
 test-tube>=0.7.5
--- a/scripts/dream.py
+++ b/scripts/dream.py
@ -631,7 +631,7 @@ def create_cmd_parser():
        nargs='+',
        default=None,
        type=float,
-        help='Scale factor (2, 4) for upscaling followed by upscaling strength (0-1.0). If strength not specified, defaults to 0.75'
+        help='Scale factor (2, 4) for upscaling final output followed by upscaling strength (0-1.0). If strength not specified, defaults to 0.75'
    )
    parser.add_argument(
        '-save_orig',
@ -639,6 +639,20 @@ def create_cmd_parser():
        action='store_true',
        help='Save original. Use it when upscaling to save both versions.',
    )
+    parser.add_argument(
+        '-embiggen',
+        nargs='+',
+        default=None,
+        type=float,
+        help='Embiggen tiled img2img for higher resolution and detail without extra VRAM usage. Takes scale factor relative to the size of the --init_img (-I), followed by ESRGAN upscaling strength (0-1.0), followed by minimum amount of overlap between tiles as a decimal ratio (0 - 1.0) or number of pixels. ESRGAN strength defaults to 0.75, and overlap defaults to 0.25. ESRGAN is used to upscale the init image prior to cutting it into tiles/pieces to run through img2img and then stitch back together.',
+    )
+    parser.add_argument(
+        '-embiggen_tiles',
+        nargs='+',
+        default=None,
+        type=int,
+        help='When Embiggen should alter only parts of the image, takes a list of tile numbers to process and replace onto the image, e.g. `1 3 5`. Useful for redoing problematic spots from a prior Embiggen run.',
+    )
    # variants is going to be superseded by a generalized "prompt-morph" function
    #    parser.add_argument('-v','--variants',type=int,help="in img2img mode, the first generated image will get passed back to img2img to generate the requested number of variants")
    parser.add_argument(