Fix the padding behavior when max-pooling regional IP-Adapter masks to mirror the downscaling behavior of SD and SDXL. Prior to this change, denoising with input latent dimensions that were not evenly divisible by 8 would raise an exception.

2024-08-30 20:32:17 +00:00 · 2024-04-09 15:25:20 -04:00 · f9af32a6d1
commit f9af32a6d1
parent fba40eb1bd
1 changed file with 4 additions and 1 deletion
--- a/invokeai/backend/stable_diffusion/diffusion/regional_ip_data.py
+++ b/invokeai/backend/stable_diffusion/diffusion/regional_ip_data.py
@@ -59,8 +59,11 @@ class RegionalIPData:
            if downscale_factor <= max_downscale_factor:
                # We use max pooling because we downscale to a pretty low resolution, so we don't want small mask
                # regions to be lost entirely.
+                #
+                # ceil_mode=True is set to mirror the downsampling behavior of SD and SDXL.
+                #
                # TODO(ryand): In the future, we may want to experiment with other downsampling methods.
-                mask_tensor = torch.nn.functional.max_pool2d(mask_tensor, kernel_size=2, stride=2)
+                mask_tensor = torch.nn.functional.max_pool2d(mask_tensor, kernel_size=2, stride=2, ceil_mode=True)

        return masks_by_seq_len
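
To illustrate why the rounding mode matters, here is a minimal sketch in plain Python that tracks only the spatial size along one dimension. It assumes the UNet downsamples with a 3x3 convolution using stride 2 and padding 1 (an assumption about SD and SDXL, not something shown in this diff), and it applies the output-size formulas documented for PyTorch's `max_pool2d` and `Conv2d`. The helper names and the latent size of 100 are hypothetical.

```python
import math


def pooled_size(n: int, ceil_mode: bool) -> int:
    """Output length of max_pool2d(kernel_size=2, stride=2) along one dimension."""
    rounding = math.ceil if ceil_mode else math.floor
    return rounding((n - 2) / 2) + 1


def strided_conv_size(n: int) -> int:
    """Output length of a 3x3 conv with stride 2 and padding 1.

    Assumption: this is how the UNet halves its feature maps.
    """
    return (n + 2 * 1 - 3) // 2 + 1


def size_chain(n, step, levels=3):
    """Apply `step` repeatedly and record the size at every level."""
    sizes = [n]
    for _ in range(levels):
        sizes.append(step(sizes[-1]))
    return sizes


latent = 100  # hypothetical latent side length, not divisible by 8
unet = size_chain(latent, strided_conv_size)
floor_mask = size_chain(latent, lambda n: pooled_size(n, ceil_mode=False))
ceil_mask = size_chain(latent, lambda n: pooled_size(n, ceil_mode=True))

print("unet      :", unet)
print("floor mask:", floor_mask)
print("ceil mask :", ceil_mask)
```

Under these assumptions, the floor-mode mask sizes diverge from the UNet sizes as soon as a level has an odd length (25 becomes 12 instead of 13), while the `ceil_mode=True` sizes match at every level. That is consistent with the shape mismatch described in the commit message.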