Fix the padding behavior when max-pooling regional prompt masks to mirror the downscaling behavior of SD and SDXL. Prior to this change, denoising with input latent dimensions that were not evenly divisible by 8 would raise an exception.
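
A minimal sketch of the size arithmetic behind this fix. It assumes the SD/SDXL UNet downsamples with a 3x3, stride-2 convolution with padding 1, which is the configuration diffusers' `Downsample2D` uses for these models. The helpers `conv_downsample`, `max_pool_downsample`, and `trace_sizes` are hypothetical. They only compute output sizes using PyTorch's documented shape formulas and do not call PyTorch.

```python
from typing import Callable, List


def conv_downsample(size: int) -> int:
    """Output size of a 3x3 Conv2d with stride=2, padding=1 (assumed UNet downsampler)."""
    return (size + 2 * 1 - 3) // 2 + 1


def max_pool_downsample(size: int, ceil_mode: bool) -> int:
    """Output size of max_pool2d(kernel_size=2, stride=2, padding=0)."""
    span = size - 2  # input size minus kernel size
    # ceil_mode rounds span / stride up; the default rounds it down.
    steps = -(-span // 2) if ceil_mode else span // 2
    return steps + 1


def trace_sizes(size: int, num_downsamples: int, step: Callable[[int], int]) -> List[int]:
    """Spatial size at each resolution level, starting from the latent size."""
    sizes = [size]
    for _ in range(num_downsamples):
        sizes.append(step(sizes[-1]))
    return sizes
```

For a latent side of 100 and three downsamples, the convolution gives 100, 50, 25, 13. Floor-mode pooling gives 100, 50, 25, 12, so the pooled mask no longer matches the UNet's feature map at the lowest level. With `ceil_mode=True`, pooling gives 100, 50, 25, 13 and agrees with the convolution for every size of 2 or more.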

2024-08-30 20:32:17 +00:00 · 2024-04-09 15:15:12 -04:00
parent 69f6c24f52
commit fba40eb1bd
1 changed file with 4 additions and 1 deletion
--- a/invokeai/backend/stable_diffusion/diffusion/regional_prompt_data.py
+++ b/invokeai/backend/stable_diffusion/diffusion/regional_prompt_data.py
@@ -61,9 +61,12 @@ class RegionalPromptData:
                if downscale_factor <= max_downscale_factor:
                    # We use max pooling because we downscale to a pretty low resolution, so we don't want small prompt
                    # regions to be lost entirely.
+                    #
+                    # ceil_mode=True is set to mirror the downsampling behavior of SD and SDXL.
+                    #
                    # TODO(ryand): In the future, we may want to experiment with other downsampling methods (e.g.
                    # nearest interpolation), and could potentially use a weighted mask rather than a binary mask.
-                    batch_sample_masks = F.max_pool2d(batch_sample_masks, kernel_size=2, stride=2)
+                    batch_sample_masks = F.max_pool2d(batch_sample_masks, kernel_size=2, stride=2, ceil_mode=True)
        return batch_sample_masks_by_seq_len
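
A dependency-free sketch of 2x2, stride-2 max pooling on a binary mask. It assumes the same output-size rule as `F.max_pool2d(kernel_size=2, stride=2)`: with `ceil_mode=True`, a trailing odd row or column is pooled in a partial window instead of being dropped. `max_pool_2x2` and `masks_by_seq_len` are hypothetical names. The second function only approximates the loop in the hunk above by keying each downscaled mask by its flattened size h * w.

```python
from typing import Dict, List

Mask = List[List[int]]


def max_pool_2x2(mask: Mask, ceil_mode: bool = True) -> Mask:
    """2x2, stride-2 max pooling over a binary mask (no padding)."""
    h, w = len(mask), len(mask[0])
    # ceil_mode=True keeps a partial window for an odd trailing row/column;
    # floor mode silently drops it.
    out_h = -(-h // 2) if ceil_mode else h // 2
    out_w = -(-w // 2) if ceil_mode else w // 2
    out: Mask = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Window rows 2i..2i+1 and cols 2j..2j+1, clipped to the input bounds.
            window = [
                mask[r][c]
                for r in range(2 * i, min(2 * i + 2, h))
                for c in range(2 * j, min(2 * j + 2, w))
            ]
            # Max keeps a region "on" if any pixel in its window is on.
            row.append(max(window))
        out.append(row)
    return out


def masks_by_seq_len(mask: Mask, max_downscale_factor: int) -> Dict[int, Mask]:
    """Hypothetical approximation of the loop: key each downscaled mask by h * w."""
    result: Dict[int, Mask] = {}
    downscale_factor = 1
    while downscale_factor <= max_downscale_factor:
        result[len(mask) * len(mask[0])] = mask
        downscale_factor *= 2
        if downscale_factor <= max_downscale_factor:
            mask = max_pool_2x2(mask, ceil_mode=True)
    return result
```

With a 5x5 mask whose only positive pixel is in the bottom-right corner, `ceil_mode=True` yields a 3x3 mask with that pixel preserved at `[2][2]`. Floor semantics yield a 2x2 mask of zeros, so the region is lost and the shape no longer matches the UNet's 3x3 feature map.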