Compare commits

...

115 Commits

Author SHA1 Message Date
cd4afa2e89 add missing import 2024-04-14 22:52:18 -04:00
d5c6d3428f add a time.sleep() to unittest 2024-04-14 22:44:25 -04:00
9a03bc69bf added a more verbose assert to catch unittest error 2024-04-14 21:53:03 -04:00
7cf788e658 Update deps to their latest versions (#6178)
* Update deps to their latest versions

* missed huggingface_hub

* bump accelerate

---------

Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2024-04-15 00:48:39 +00:00
06bc38d3f4 Remove tag excluder 2024-04-15 09:14:49 +10:00
d3b0212da5 Scope project files to src dir (enables --production) 2024-04-15 09:14:49 +10:00
c2b79ce14c Replace @knipignore with paths config 2024-04-15 09:14:49 +10:00
70185b0173 translationBot(ui): update translation (Russian)
Currently translated at 99.5% (1128 of 1133 strings)

Co-authored-by: Васянатор <ilabulanov339@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/ru/
Translation: InvokeAI/Web UI
2024-04-15 09:12:38 +10:00
a83a0c6146 translationBot(ui): update translation (Chinese (Simplified))
Currently translated at 81.5% (924 of 1133 strings)

Co-authored-by: 怀瑾 <symant233@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/zh_Hans/
Translation: InvokeAI/Web UI
2024-04-15 09:12:38 +10:00
12f41039cc translationBot(ui): update translation (Italian)
Currently translated at 98.4% (1122 of 1140 strings)

translationBot(ui): update translation (Italian)

Currently translated at 98.4% (1120 of 1138 strings)

translationBot(ui): update translation (Italian)

Currently translated at 98.4% (1115 of 1133 strings)

Co-authored-by: Riccardo Giovanetti <riccardo.giovanetti@gmail.com>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/it/
Translation: InvokeAI/Web UI
2024-04-15 09:12:38 +10:00
b3b5b7e261 Include hardcoded count of one to avoid translation issues on missing keys 2024-04-15 09:10:15 +10:00
f706a13230 Adjust gallery image length handling 2024-04-15 09:10:15 +10:00
22c6400bb8 Refactor i18n pluralization 2024-04-15 09:10:15 +10:00
1ca152f6c8 Apply eslint/prettier fixes 2024-04-15 09:10:15 +10:00
982e255878 Add dynamic label to delete button located at the top toolbar 2024-04-15 09:10:15 +10:00
7899149144 Remove unnecessary word 2024-04-15 09:10:15 +10:00
bef97b46bf Apply eslint/prettier fixes 2024-04-15 09:10:15 +10:00
cc256fee0e Modify the modal title to include selected image array length 2024-04-15 09:10:15 +10:00
ec69a58c8d Include plural variation for delete image modal title 2024-04-15 09:10:15 +10:00
ec67ba61db Pass an array of selected images instead of imageDTO 2024-04-15 09:10:15 +10:00
66126996e7 Import image selection 2024-04-15 09:10:15 +10:00
4eb66a9198 remove hires fix badge from settings when using sdxl 2024-04-15 07:57:58 +10:00
14e41a1fd9 Remove unnecessary whitespace 2024-04-15 07:54:36 +10:00
fc55522003 Import hook in the main App script 2024-04-15 07:54:36 +10:00
cd6d8ae9cc Add a hook as a singleton to update favicon and title upon queueSize change 2024-04-15 07:54:36 +10:00
2933eb594d Remove unnecessary code 2024-04-15 07:54:36 +10:00
4e08fab3f5 Apply brand red color and a black border 2024-04-15 07:54:36 +10:00
8bca7e2aa2 Apply eslint/prettier fixes 2024-04-15 07:54:36 +10:00
3706cf0ad4 Add JSDoc strings 2024-04-15 07:54:36 +10:00
a459361376 Modify the processing to consider the active queue length instead of in_progress only 2024-04-15 07:54:36 +10:00
bb330d50a6 Increase favicon alert detail size 2024-04-15 07:54:36 +10:00
102cb62960 Apply eslint/prettier fixes 2024-04-15 07:54:36 +10:00
8eeab22ecd Replace let with const 2024-04-15 07:54:36 +10:00
4343852b83 Update HTML page title and favicon upon queue item event 2024-04-15 07:54:36 +10:00
0a9bf25bff Implement updatePageTitle and updatePageFavicon methods 2024-04-15 07:54:36 +10:00
4cd09850b8 Add ID to the HTML link element 2024-04-15 07:54:36 +10:00
dbc586e0b2 Add alert variation for Invoke favicon 2024-04-15 07:54:36 +10:00
c2e3c61f28 fix recall all when loras, controls, or hrf aren't present 2024-04-14 16:49:14 +10:00
fbfa29c2ef Update GALLERY.md 2024-04-14 16:46:31 +10:00
9ee7b951eb Update GALLERY.md 2024-04-14 16:46:31 +10:00
29dd1bb35b Update GALLERY.md 2024-04-14 16:46:31 +10:00
68d8a2497e Update GALLERY.md 2024-04-14 16:46:31 +10:00
4b171fa696 Creation of GALLERY.md and related images
First draft of the walkthrough of the Gallery right-hand panel
2024-04-14 16:46:31 +10:00
d0beb45431 Create GALLERY.md 2024-04-14 16:46:31 +10:00
e724781a80 Update WEB.md
Correct stated location of Gallery panel.
2024-04-14 16:46:31 +10:00
636ece323f Update INSTALL_DEVELOPMENT.md 2024-04-14 15:24:00 +10:00
77b3281f08 prettier 2024-04-14 15:22:33 +10:00
bd7c8cd517 added info popover back to model, updated description hover to combobox only 2024-04-14 15:22:33 +10:00
489d485907 added missing description to control adapters hover 2024-04-14 15:22:33 +10:00
6eed5ad531 added button for hiding bounding box 2024-04-14 15:22:33 +10:00
24f2cde862 Remove type="submit" from all tsx files.
Fixes a problem on firefox, at least for me.
2024-04-12 09:09:32 +10:00
b18442ded4 fix(queue): poll queue on finished queue item
When a queue item is finished (completed, canceled, failed), immediately poll the queue for the next queue item.

Closes #6189
2024-04-12 07:31:47 +10:00
651c0b39b1 clear cache on all exceptions 2024-04-12 07:19:16 +10:00
46d23cd868 catch RunTimeError during model to() call rather than OutOfMemoryError 2024-04-12 07:19:16 +10:00
dedf0c6ffa fix ruff issues 2024-04-12 07:19:16 +10:00
579082ac10 [mm] clear the cache entry for a model that got an OOM during loading 2024-04-12 07:19:16 +10:00
7bc77ddb40 fix(nodes): doubly-noised latents
When using refiner with a mask (i.e. inpainting), we don't have noise provided as an input to the node.

This situation uniquely hits a code path that wasn't reviewed when gradient denoising was implemented.

That code path does two things wrong:
- It lerp'd the input latents. This was fixed in 5a1f4cb1ce.
- It added noise to the latents an extra time. This is fixed in this change.

We don't need to add noise in `latents_from_embeddings` because we do it just a few lines later in `AddsMaskGuidance`.

- Remove the extraneous call to `add_noise`
- Make `seed` a required arg. We never call the function without a seed anyway. If we refactor this in the future, it will be clearer that we need to look at how seed is handled.
- Move the call to create the noise to a deeper conditional, just before we call `AddsMaskGuidance`. The created noise tensor is now only used in that function, so there's no need to create it every time.

Note: Whether or not having both noise and latents as inputs on the node is correct is a separate conversation. This change just fixes the issue with the current setup.
2024-04-11 07:21:50 -04:00
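To make the restructuring above concrete, here is a minimal, hypothetical sketch of creating the mask noise only in the branch that actually needs it, with `seed` as a required argument. The function name and tensor handling are invented for illustration and are not the repository's code:

```python
from typing import Optional

import torch


def make_mask_noise_if_needed(mask: Optional[torch.Tensor], latents: torch.Tensor, seed: int) -> Optional[torch.Tensor]:
    """Create the noise tensor only when a mask is present, i.e. just before
    mask guidance would consume it, rather than unconditionally up front."""
    if mask is None:
        return None  # no inpainting mask, so mask guidance never runs
    generator = torch.Generator(device="cpu").manual_seed(seed)  # seed is required, no fallback
    return torch.randn(latents.shape, generator=generator, dtype=latents.dtype)
```

The point is the same as in the bullets: noise is created once, close to where `AddsMaskGuidance` uses it, and never added to the latents a second time.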
026d095afe fix(nodes): do not set seed on output latents from denoise latents
`LatentsField` objects have an optional `seed` field. This should only be populated when the latents are noise, generated from a seed.

`DenoiseLatentsInvocation` needs a seed value for scheduler initialization. It's used in a few places, and there is some logic for determining the seed to use with a series of fallbacks:
- Use the seed from the noise (a `LatentsField` object)
- Use the seed from the latents (a `LatentsField` object - normally it won't have a seed)
- Use `0` as a final fallback

In `DenoiseLatentsInvocation`, we set the seed in the `LatentsOutput`, even though the output latents are not noise.

This is normally fine, but when we use refiner, we re-use those same latents for the refiner denoise. This causes that characteristic same-seed-fried look on the refiner pass.

Simple fix - do not set the field in the output latents.
2024-04-11 07:21:50 -04:00
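A compact sketch of the fallback chain described above (illustrative only; the real logic lives in `DenoiseLatentsInvocation`, and the names here are simplified):

```python
from typing import Optional


def resolve_scheduler_seed(noise_seed: Optional[int], latents_seed: Optional[int]) -> int:
    """Pick the seed used to initialize the scheduler."""
    if noise_seed is not None:
        return noise_seed      # the noise was generated from this seed
    if latents_seed is not None:
        return latents_seed    # normally absent: input latents are usually not noise
    return 0                   # final fallback
```

The fix itself is then simply to stop tagging the output: the denoised latents are returned with `seed=None`, as the `LatentsOutput.build(..., seed=None)` change later in this compare shows.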
7e2ade50e1 fix(ui): canvas staging area & batch handling fixes
Handful of intertwined fixes.

- Create and use helper function to reset staging area.
- Clear staging area when queue items are canceled, failed, cleared, etc. Fixes a bug where the bbox ends up offset and images are put into the wrong spot.
- Fix a number of similar bugs where canvas would "forget" it had pending generations, but they continued to generate. Canvas needs to track batches that should be displayed in it using `state.canvas.batchIds`, and this was getting cleared without actually canceling those batches.
- Disable the `discard current image` button on canvas if there is only one image. Prevents accidentally canceling all canvas batches if you spam the button.
2024-04-10 21:48:34 +10:00
c0d54d5414 Revert "always enqueue with fresh bounding box"
This reverts commit fae51da278b39c61cbbea5de88661b4bc546f1ce.
2024-04-10 21:48:34 +10:00
98bfbb73ac always enqueue with fresh bounding box 2024-04-10 21:48:34 +10:00
f9af32a6d1 Fix the padding behavior when max-pooling regional IP-Adapter masks to mirror the downscaling behavior of SD and SDXL. Prior to this change, denoising with input latent dimensions that were not evenly divisible by 8 would raise an exception. 2024-04-09 16:50:43 -04:00
fba40eb1bd Fix the padding behavior when max-pooling regional prompt masks to mirror the downscaling behavior of SD and SDXL. Prior to this change, denoising with input latent dimensions that were not evenly divisible by 8 would raise an exception. 2024-04-09 16:50:43 -04:00
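Both commits describe the same idea: pad a mask before max-pooling so that dimensions not evenly divisible by 8 round up the way SD/SDXL's own downscaling does, instead of raising. A rough, self-contained sketch of that idea (names and the exact padding policy are assumptions, not the repository's implementation):

```python
import torch
import torch.nn.functional as F


def downscale_mask_like_unet(mask: torch.Tensor, factor: int = 8) -> torch.Tensor:
    """Max-pool a (1, H, W) bool mask by `factor`, padding H and W up to the next
    multiple of `factor` first so odd sizes behave like the model's ceil-style
    downscaling rather than raising a size mismatch."""
    _, h, w = mask.shape
    pad_h = (factor - h % factor) % factor
    pad_w = (factor - w % factor) % factor
    # Pad bottom/right with 0 (False) so padding can never activate a region.
    padded = F.pad(mask.unsqueeze(0).to(torch.float32), (0, pad_w, 0, pad_h), value=0.0)
    pooled = F.max_pool2d(padded, kernel_size=factor, stride=factor)
    return pooled.squeeze(0).to(torch.bool)
```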
69f6c24f52 Fix field ordering (#6186)
Changed fields to go in w/h x/y order.

## Summary

The prior ordering of height, then width, and y, then x, doesn't match
up with the expected UX. This has been changed.

## Checklist

- [X] _The PR has a short but descriptive title, suitable for a
changelog_
- [ ] _Tests added / updated (if applicable)_
- [ ] _Documentation added / updated (if applicable)_
2024-04-10 01:00:22 +05:30
80d631118d Fix field ordering
Changed fields to go in w/h x/y order.
2024-04-09 14:17:55 -05:00
0c6dd32ece (minor) Fix IP-Adapter conditional logic in CustomAttnProcessor2_0. 2024-04-09 15:06:51 -04:00
0bdbfd4d1d Add support for IP-Adapter masks. 2024-04-09 15:06:51 -04:00
2e27ed5f3d Pass IP-Adapter scales through the cross_attn_kwargs pathway, since they are the same for all attention layers. This change also helps to prepare for adding IP-Adapter region masks. 2024-04-09 15:06:51 -04:00
babdc64b17 (minor) Fix typo in IP-Adapter field description. 2024-04-09 15:06:51 -04:00
54327ec4a7 Remove documentation references to prompt-to-prompt cross-attention control. 2024-04-09 10:57:02 -04:00
4a828818da Remove support for Prompt-to-Prompt cross-attention control (aka .swap()). This feature is not widely used. It does not work with SDXL and is incompatible with IP-Adapter and regional prompting. The implementation is also intertwined with both text embedding and the UNet attention layers, resulting in a high maintenance burden. For all of these reasons, we have decided to drop support. 2024-04-09 10:57:02 -04:00
fe386252f3 Revert "feat(nodes): add prompt region from image nodes"
This reverts commit 3a531c5097.
2024-04-09 08:12:12 -04:00
182810337c Add utility to_standard_float_mask(...) to convert various mask formats to a standardized format. 2024-04-09 08:12:12 -04:00
338bf808d6 Rename MaskField to be a generic TensorField. 2024-04-09 08:12:12 -04:00
5b5a4204a1 Fix dimensions of mask produced by ExtractMasksAndPromptsInvocation. Also, added a clearer error message in case the same error is introduced in the future. 2024-04-09 08:12:12 -04:00
75ef473748 Pull the upstream changes from diffusers' AttnProcessor2_0 into CustomAttnProcessor2_0. This fixes a bug in CustomAttnProcessor2_0 that was being triggered when peft was not installed. The bug was present in a block of code that was previously copied from diffusers. The bug seems to have been introduced during diffusers' migration to PEFT for their LoRA handling. The upstream bug was fixed in 531e719163. 2024-04-09 08:12:12 -04:00
926b8d0efe feat(nodes): add prompt region from image nodes 2024-04-09 08:12:12 -04:00
9d9d1761f3 (minor) The latest ruff version has _slightly_ different formatting preferences. 2024-04-09 08:12:12 -04:00
a78df8123f Update the diffusion logic to use the new regional prompting feature. 2024-04-09 08:12:12 -04:00
7ca677578e Create a UNetAttentionPatcher for patching UNet models with CustomAttnProcessor2_0 modules. 2024-04-09 08:12:12 -04:00
31c456c1e6 Update CustomAttention to support both IP-Adapters and regional prompting. 2024-04-09 08:12:12 -04:00
2ce79b61f5 Initialize a RegionalPromptAttnProcessor2_0 class by copying AttnProcessor2_0 from diffusers. 2024-04-09 08:12:12 -04:00
109e3f0e7f Add RegionalPromptData class for managing prompt region masks. 2024-04-09 08:12:12 -04:00
dc64fec771 Add support for lists of prompt embeddings to be passed to the DenoiseLatents invocation, and add handling of the conditioning region masks in DenoiseLatents. 2024-04-09 08:12:12 -04:00
d1e45585d0 Add TextConditioningRegions to the TextConditioningData data structure. 2024-04-09 08:12:12 -04:00
aba023e0c5 Improve documentation of conditioning_data.py. 2024-04-09 08:12:12 -04:00
e354c29b52 Rename ConditioningData -> TextConditioningData. 2024-04-09 08:12:12 -04:00
a7f363e654 Split ip_adapter_conditioning out from ConditioningData. 2024-04-09 08:12:12 -04:00
9b2162e564 Remove scheduler_args from ConditioningData structure. 2024-04-09 08:12:12 -04:00
4e64b26702 Update compel nodes to accept an optional prompt mask. 2024-04-09 08:12:12 -04:00
c22d772062 Add RectangleMaskInvocation. 2024-04-09 08:12:12 -04:00
d6be7662c9 Add a MaskField primitive, and add a mask to the ConditioningField primitive type. 2024-04-09 08:12:12 -04:00
95050088d1 chore: lint fixes 2024-04-09 14:13:10 +10:00
94b5084cd5 fix: one man's max is another man's min 2024-04-09 14:13:10 +10:00
ca0d60bee6 fix: set coherence denoise to 0.2 min for refiner models 2024-04-09 14:13:10 +10:00
fd1f240853 fix: SDXL Refiner not working properly with Inpainting 2024-04-09 14:13:10 +10:00
381b41a56e fix: Update SDXL Refiner graphs to use Gradient Mask 2024-04-09 14:13:10 +10:00
b58494c420 feat(ui): add graph-to-workflow debug helper
This is intended for debug usage, so it's hidden away in the workflow library `...` menu. Hold shift to see the button for it.

- Paste a graph (from a network request, for example) and then click the convert button to convert it to a workflow.
- Disable auto layout to stack the nodes with an offset (try it out). If you change this, you must re-convert to get the changes.
- Edit the workflow JSON if you need to tweak something before loading it.
2024-04-08 20:38:04 -04:00
dca30d5462 (feat) add a method to get the path of an image from the invocation context
Fixes #6175
2024-04-08 18:42:55 +10:00
9ab6655491 feat(backend): clean up choose_precision
- Allow user-defined precision on MPS.
- Use more explicit logic to handle all possible cases.
- Add comments.
- Remove the app_config args (they were effectively unused, just get the config using the singleton getter util)
2024-04-07 09:41:05 -04:00
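The shape of the "more explicit logic" is roughly the following. This is a hedged sketch, not the actual `choose_precision` implementation, and it omits the user-configured override (which the commit now also honours on MPS):

```python
import torch


def default_precision_for(device: torch.device) -> str:
    """Pick a default precision per device type; every case is spelled out."""
    if device.type == "cuda":
        return "float16"   # half precision is the usual CUDA default
    if device.type == "mps":
        return "float16"   # Apple silicon; a user-configured precision can now override this
    return "float32"       # CPU and anything unrecognised: full precision
```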
29cfe5a274 fix(ui): handle multipleOf on number fields
This data is already in the template but it wasn't ever used.

One big place where this improves UX is the noise node. Previously, the UI let you change width and height in increments of 1, despite the template requiring a multiple of 8. It now works in multiples of 8.
2024-04-06 13:15:20 -04:00
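The actual handling is in the TypeScript number-field component, but the constraint itself is simple; as a hedged illustration in Python (names invented):

```python
def snap_to_multiple(value: float, multiple_of: int) -> int:
    """Snap a raw field value to the nearest allowed step, e.g. 8 for the noise
    node's width/height, so the UI can't submit a value the template rejects."""
    return int(round(value / multiple_of)) * multiple_of
```

For example, `snap_to_multiple(771, 8)` gives 768, satisfying a `multipleOf: 8` constraint in the template.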
2c45697f3d translationBot(ui): update translation files
Updated by "Cleanup translation files" hook in Weblate.

Co-authored-by: Hosted Weblate <hosted@weblate.org>
Translate-URL: https://hosted.weblate.org/projects/invokeai/web-ui/
Translation: InvokeAI/Web UI
2024-04-06 15:19:20 +11:00
9a0a90e2a2 chore: v4.0.4 2024-04-06 15:15:16 +11:00
69f17da1a2 fix(nodes): add WithBoard to public API 2024-04-06 15:02:28 +11:00
4d0a49298c tidy(ui): remove extraneous zod schema 2024-04-06 14:54:12 +11:00
55f7a7737a feat(ui): shift around init image recall logic
Retrieving the DTO happens as part of the metadata parsing, not recall. This way, we don't show the option to recall a nonexistent image.

This matches the flow for other metadata entities like models - we don't show the model recall button if the model isn't available.
2024-04-06 14:54:12 +11:00
adc30045a6 addressed pr feedback 2024-04-06 14:54:12 +11:00
fdd0e57976 actually use the schema 2024-04-06 14:54:12 +11:00
9ba5ec4b67 fix typo Params set set 2024-04-06 14:54:12 +11:00
8a17616bf4 recall initial image from metadata and set to image2image 2024-04-06 14:54:12 +11:00
f56b9537cd added initial image to metadata viewer 2024-04-06 14:54:12 +11:00
a95756f3ed docs: update FAQ.md (shared GPU memory) 2024-04-06 14:35:36 +11:00
4068e817d6 fix(mm): typing issues in model cache 2024-04-06 14:35:36 +11:00
a09d705e4c fix(mm): remove vram check
This check prematurely reports insufficient VRAM on Windows. See #6106 for details.
2024-04-06 14:35:36 +11:00
540d506ec9 fix: Incorrect default clip vision opt in the node 2024-04-05 15:06:33 -04:00
101 changed files with 1914 additions and 1144 deletions

[8 binary image files added (previews not shown); sizes: 23 KiB, 2.7 KiB, 30 KiB, 221 KiB, 53 KiB, 786 B, 27 KiB, 3.3 KiB]

docs/features/GALLERY.md (new file, 92 lines)
View File

@ -0,0 +1,92 @@
---
title: InvokeAI Gallery Panel
---
# :material-web: InvokeAI Gallery Panel
## Quick guided walkthrough of the Gallery Panel's features
The Gallery Panel is a fast way to review, find, and make use of images you've
generated and loaded. The Gallery is divided into Boards. The Uncategorized board is always
present but you can create your own for better organization.
![image](../assets/gallery/gallery.png)
### Board Display and Settings
At the very top of the Gallery Panel are the boards disclosure and settings buttons.
![image](../assets/gallery/top_controls.png)
The disclosure button shows the name of the currently selected board and allows you to show and hide the board thumbnails (shown in the image below).
![image](../assets/gallery/board_thumbnails.png)
The settings button opens a list of options.
![image](../assets/gallery/board_settings.png)
- ***Image Size*** this slider lets you control the size of the image previews (images of three different sizes).
- ***Auto-Switch to New Images*** if you turn this on, whenever a new image is generated, it will automatically be loaded into the current image panel on the Text to Image tab and into the result panel on the [Image to Image](IMG2IMG.md) tab. This will happen invisibly if you are on any other tab when the image is generated.
- ***Auto-Assign Board on Click*** whenever an image is generated or saved, it always gets put in a board. The board it gets put into is marked with AUTO (image of board marked). Turning on Auto-Assign Board on Click makes whichever board you last selected the destination when you click Invoke. That means you can click Invoke, select a different board, then click Invoke again, and the two images will be put in two different boards. **It's the board selected when Invoke is clicked that's used, not the board that's selected when the image finishes generating.** Turning this off enables the Auto-Add Board drop-down, which lets you set one specific board that generated images are always put into. This also enables and disables the Auto-add to this Board menu item described below.
- ***Always Show Image Size Badge*** this toggles whether to show image sizes for each image preview (show two images, one with sizes shown, one without)
Below these two buttons, you'll see the Search Boards text entry area. You use this to search for specific boards by the name of the board.
Next to it is the Add Board (+) button which lets you add new boards. Boards can be renamed by clicking on the name of the board under its thumbnail and typing in the new name.
### Board Thumbnail Menu
Each board has a context menu (ctrl+click / right-click).
![image](../assets/gallery/thumbnail_menu.png)
- ***Auto-add to this Board*** if you've disabled Auto-Assign Board on Click in the board settings, you can use this option to set this board to be where new images are put.
- ***Download Board*** this will add all the images in the board into a zip file and provide a link to it in a notification (image of notification)
- ***Delete Board*** this will delete the board
> [!CAUTION]
> This will delete all the images in the board and the board itself.
### Board Contents
Every board is organized by two tabs, Images and Assets.
![image](../assets/gallery/board_tabs.png)
Images are the Invoke-generated images that are placed into the board. Assets are images that you upload into Invoke to be used as an [Image Prompt](https://support.invoke.ai/support/solutions/articles/151000159340-using-the-image-prompt-adapter-ip-adapter-) or in the [Image to Image](IMG2IMG.md) tab.
### Image Thumbnail Menu
Every image generated by Invoke has its generation information stored as text inside the image file itself. This can be read directly by selecting the image and clicking on the Info button ![image](../assets/gallery/info_button.png) in any of the image result panels.
Each image also has a context menu (ctrl+click / right-click).
![image](../assets/gallery/image_menu.png)
The options are (items marked with an * will not work with images that lack generation information):
- ***Open in New Tab*** this will open the image alone in a new browser tab, separate from the Invoke interface.
- ***Download Image*** this will trigger your browser to download the image.
- ***Load Workflow **** this will load any workflow settings into the Workflow tab and automatically open it.
- ***Remix Image **** this will load all of the image's generation information, **excluding its Seed**, into the left-hand control panel
- ***Use Prompt **** this will load only the image's text prompts into the left-hand control panel
- ***Use Seed **** this will load only the image's Seed into the left-hand control panel
- ***Use All **** this will load all of the image's generation information into the left-hand control panel
- ***Send to Image to Image*** this will put the image into the left-hand panel in the Image to Image tab and automatically open it
- ***Send to Unified Canvas*** This will **replace whatever is already present** in the Unified Canvas tab with the image and automatically open the tab
- ***Change Board*** this will open a small window that will let you move the image to a different board. This is the same as dragging the image to that board's thumbnail.
- ***Star Image*** this will add the image to the board's list of starred images that are always kept at the top of the gallery. This is the same as clicking on the star on the top right-hand side of the image that appears when you hover over the image with the mouse
- ***Delete Image*** this will delete the image from the board
> [!CAUTION]
> This will delete the image entirely from Invoke.
## Summary
This walkthrough only covers the Gallery interface and Boards. Actually generating images is handled by [Prompts](PROMPTS.md), the [Image to Image](IMG2IMG.md) tab, and the [Unified Canvas](UNIFIED_CANVAS.md).
## Acknowledgements
A huge shout-out to the core team working to make the Web GUI a reality,
including [psychedelicious](https://github.com/psychedelicious),
[Kyle0654](https://github.com/Kyle0654) and
[blessedcoolant](https://github.com/blessedcoolant).
[hipsterusername](https://github.com/hipsterusername) was the team's unofficial
cheerleader and added tooltips/docs.

View File

@ -108,40 +108,6 @@ Can be used with .and():
Each will give you different results - try them out and see what you prefer!
### Cross-Attention Control ('prompt2prompt')
Sometimes an image you generate is almost right, and you just want to change one
detail without affecting the rest. You could use a photo editor and inpainting
to overpaint the area, but that's a pain. Here's where `prompt2prompt` comes in
handy.
Generate an image with a given prompt, record the seed of the image, and then
use the `prompt2prompt` syntax to substitute words in the original prompt for
words in a new prompt. This works for `img2img` as well.
For example, consider the prompt `a cat.swap(dog) playing with a ball in the forest`. Normally, because the words interact with each other when doing a stable diffusion image generation, these two prompts would generate different compositions:
- `a cat playing with a ball in the forest`
- `a dog playing with a ball in the forest`
| `a cat playing with a ball in the forest` | `a dog playing with a ball in the forest` |
| --- | --- |
| img | img |
- For multiple word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
- To swap a comma, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to [bloc97's](https://github.com/bloc97/CrossAttentionControl) `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand. `t_start` and `t_end` are used to control on which steps cross-attention control should run. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
- For example, if doing a diffusion with 10 steps for the prompt is `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will be run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but the pixels that represent `dog` will be locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
- Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest` with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt was being used instead. The final 3 steps will just run `a cat playing with a ball in the forest`.
> For img2img, the step sequence does not start at 0 but instead at `(1.0-strength)` - so if the img2img `strength` is `0.7`, `t_start` and `t_end` must both be greater than `0.3` (`1.0-0.7`) to have any effect.
Prompt2prompt `.swap()` is not compatible with xformers, which will be temporarily disabled when doing a `.swap()` - so you should expect to use more VRAM and run slower than with xformers enabled.
The `prompt2prompt` code is based off
[bloc97's colab](https://github.com/bloc97/CrossAttentionControl).
### Escaping parentheses and speech marks
If the model you are using has parentheses () or speech marks "" as part of its

View File

@ -54,7 +54,7 @@ main sections:
of buttons at the top lets you modify and manipulate the image in
various ways.
3. A **gallery** section on the left that contains a history of the images you
3. A **gallery** section on the right that contains a history of the images you
have generated. These images are read and written to the directory specified
in the `INVOKEAIROOT/invokeai.yaml` initialization file, usually a directory
named `outputs` in `INVOKEAIROOT`.

View File

@ -40,6 +40,25 @@ Follow the same steps to scan and import the missing models.
- Check the `ram` setting in `invokeai.yaml`. This setting tells Invoke how much of your system RAM can be used to cache models. Having this too high or too low can slow things down. That said, it's generally safest to not set this at all and instead let Invoke manage it.
- Check the `vram` setting in `invokeai.yaml`. This setting tells Invoke how much of your GPU VRAM can be used to cache models. Counter-intuitively, if this setting is too high, Invoke will need to do a lot of shuffling of models as it juggles the VRAM cache and the currently-loaded model. The default value of 0.25 generally works well for GPUs without 16GB or more of VRAM. Even on a 24GB card, the default works well.
- Check that your generations are happening on your GPU (if you have one). InvokeAI will log what is being used for generation upon startup. If your GPU isn't used, re-install to ensure the correct versions of torch get installed.
- If you are on Windows, you may have exceeded your GPU's VRAM capacity and are using slower [shared GPU memory](#shared-gpu-memory-windows). There's a guide to opt out of this behaviour in the linked FAQ entry.
## Shared GPU Memory (Windows)
!!! tip "Nvidia GPUs with driver 536.40"
This only applies to current Nvidia cards with driver 536.40 or later, released in June 2023.
When the GPU doesn't have enough VRAM for a task, Windows is able to allocate some of its CPU RAM to the GPU. This is much slower than VRAM, but it does allow the system to generate when it otherwise might not have enough VRAM.
When shared GPU memory is used, generation slows down dramatically - but at least it doesn't crash.
If you'd like to opt out of this behavior and instead get an error when you exceed your GPU's VRAM, follow [this guide from Nvidia](https://nvidia.custhelp.com/app/answers/detail/a_id/5490).
Here's how to get the python path required in the linked guide:
- Run `invoke.bat`.
- Select option 2 for developer console.
- At least one python path will be printed. Copy the path that includes your invoke installation directory (typically the first).
## Installer cannot find python (Windows)

View File

@ -23,6 +23,7 @@ If you have an interest in how InvokeAI works, or you would like to add features
1. [Fork and clone] the [InvokeAI repo].
1. Follow the [manual installation] docs to create a new virtual environment for the development install.
- Create a new folder outside the repo root for the installation and create the venv inside that folder.
- When installing the InvokeAI package, add `-e` to the command so you get an [editable install].
1. Install the [frontend dev toolchain] and do a production build of the UI as described.
1. You can now run the app as described in the [manual installation] docs.

View File

@ -5,7 +5,15 @@ from compel import Compel, ReturnedEmbeddingsType
from compel.prompt_parser import Blend, Conjunction, CrossAttentionControlSubstitute, FlattenedPrompt, Fragment
from transformers import CLIPTextModel, CLIPTextModelWithProjection, CLIPTokenizer
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, OutputField, UIComponent
from invokeai.app.invocations.fields import (
ConditioningField,
FieldDescriptions,
Input,
InputField,
OutputField,
TensorField,
UIComponent,
)
from invokeai.app.invocations.primitives import ConditioningOutput
from invokeai.app.services.shared.invocation_context import InvocationContext
from invokeai.app.util.ti_utils import generate_ti_list
@ -14,7 +22,6 @@ from invokeai.backend.model_patcher import ModelPatcher
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import (
BasicConditioningInfo,
ConditioningFieldData,
ExtraConditioningInfo,
SDXLConditioningInfo,
)
from invokeai.backend.util.devices import torch_dtype
@ -36,7 +43,7 @@ from .model import CLIPField
title="Prompt",
tags=["prompt", "compel"],
category="conditioning",
version="1.1.1",
version="1.2.0",
)
class CompelInvocation(BaseInvocation):
"""Parse prompt using compel package to conditioning."""
@ -51,6 +58,9 @@ class CompelInvocation(BaseInvocation):
description=FieldDescriptions.clip,
input=Input.Connection,
)
mask: Optional[TensorField] = InputField(
default=None, description="A mask defining the region that this conditioning prompt applies to."
)
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ConditioningOutput:
@ -98,27 +108,19 @@ class CompelInvocation(BaseInvocation):
if context.config.get().log_tokenization:
log_tokenization_for_conjunction(conjunction, tokenizer)
c, options = compel.build_conditioning_tensor_for_conjunction(conjunction)
ec = ExtraConditioningInfo(
tokens_count_including_eos_bos=get_max_token_count(tokenizer, conjunction),
cross_attention_control_args=options.get("cross_attention_control", None),
)
c, _options = compel.build_conditioning_tensor_for_conjunction(conjunction)
c = c.detach().to("cpu")
conditioning_data = ConditioningFieldData(
conditionings=[
BasicConditioningInfo(
embeds=c,
extra_conditioning=ec,
)
]
)
conditioning_data = ConditioningFieldData(conditionings=[BasicConditioningInfo(embeds=c)])
conditioning_name = context.conditioning.save(conditioning_data)
return ConditioningOutput.build(conditioning_name)
return ConditioningOutput(
conditioning=ConditioningField(
conditioning_name=conditioning_name,
mask=self.mask,
)
)
class SDXLPromptInvocationBase:
@ -132,7 +134,7 @@ class SDXLPromptInvocationBase:
get_pooled: bool,
lora_prefix: str,
zero_on_empty: bool,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[ExtraConditioningInfo]]:
) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
tokenizer_info = context.models.load(clip_field.tokenizer)
tokenizer_model = tokenizer_info.model
assert isinstance(tokenizer_model, CLIPTokenizer)
@ -159,7 +161,7 @@ class SDXLPromptInvocationBase:
)
else:
c_pooled = None
return c, c_pooled, None
return c, c_pooled
def _lora_loader() -> Iterator[Tuple[LoRAModelRaw, float]]:
for lora in clip_field.loras:
@ -204,17 +206,12 @@ class SDXLPromptInvocationBase:
log_tokenization_for_conjunction(conjunction, tokenizer)
# TODO: ask for optimizations? to not run text_encoder twice
c, options = compel.build_conditioning_tensor_for_conjunction(conjunction)
c, _options = compel.build_conditioning_tensor_for_conjunction(conjunction)
if get_pooled:
c_pooled = compel.conditioning_provider.get_pooled_embeddings([prompt])
else:
c_pooled = None
ec = ExtraConditioningInfo(
tokens_count_including_eos_bos=get_max_token_count(tokenizer, conjunction),
cross_attention_control_args=options.get("cross_attention_control", None),
)
del tokenizer
del text_encoder
del tokenizer_info
@ -224,7 +221,7 @@ class SDXLPromptInvocationBase:
if c_pooled is not None:
c_pooled = c_pooled.detach().to("cpu")
return c, c_pooled, ec
return c, c_pooled
@invocation(
@ -232,7 +229,7 @@ class SDXLPromptInvocationBase:
title="SDXL Prompt",
tags=["sdxl", "compel", "prompt"],
category="conditioning",
version="1.1.1",
version="1.2.0",
)
class SDXLCompelPromptInvocation(BaseInvocation, SDXLPromptInvocationBase):
"""Parse prompt using compel package to conditioning."""
@ -255,20 +252,19 @@ class SDXLCompelPromptInvocation(BaseInvocation, SDXLPromptInvocationBase):
target_height: int = InputField(default=1024, description="")
clip: CLIPField = InputField(description=FieldDescriptions.clip, input=Input.Connection, title="CLIP 1")
clip2: CLIPField = InputField(description=FieldDescriptions.clip, input=Input.Connection, title="CLIP 2")
mask: Optional[TensorField] = InputField(
default=None, description="A mask defining the region that this conditioning prompt applies to."
)
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ConditioningOutput:
c1, c1_pooled, ec1 = self.run_clip_compel(
context, self.clip, self.prompt, False, "lora_te1_", zero_on_empty=True
)
c1, c1_pooled = self.run_clip_compel(context, self.clip, self.prompt, False, "lora_te1_", zero_on_empty=True)
if self.style.strip() == "":
c2, c2_pooled, ec2 = self.run_clip_compel(
c2, c2_pooled = self.run_clip_compel(
context, self.clip2, self.prompt, True, "lora_te2_", zero_on_empty=True
)
else:
c2, c2_pooled, ec2 = self.run_clip_compel(
context, self.clip2, self.style, True, "lora_te2_", zero_on_empty=True
)
c2, c2_pooled = self.run_clip_compel(context, self.clip2, self.style, True, "lora_te2_", zero_on_empty=True)
original_size = (self.original_height, self.original_width)
crop_coords = (self.crop_top, self.crop_left)
@ -307,17 +303,19 @@ class SDXLCompelPromptInvocation(BaseInvocation, SDXLPromptInvocationBase):
conditioning_data = ConditioningFieldData(
conditionings=[
SDXLConditioningInfo(
embeds=torch.cat([c1, c2], dim=-1),
pooled_embeds=c2_pooled,
add_time_ids=add_time_ids,
extra_conditioning=ec1,
embeds=torch.cat([c1, c2], dim=-1), pooled_embeds=c2_pooled, add_time_ids=add_time_ids
)
]
)
conditioning_name = context.conditioning.save(conditioning_data)
return ConditioningOutput.build(conditioning_name)
return ConditioningOutput(
conditioning=ConditioningField(
conditioning_name=conditioning_name,
mask=self.mask,
)
)
@invocation(
@ -345,7 +343,7 @@ class SDXLRefinerCompelPromptInvocation(BaseInvocation, SDXLPromptInvocationBase
@torch.no_grad()
def invoke(self, context: InvocationContext) -> ConditioningOutput:
# TODO: if there will appear lora for refiner - write proper prefix
c2, c2_pooled, ec2 = self.run_clip_compel(context, self.clip2, self.style, True, "<NONE>", zero_on_empty=False)
c2, c2_pooled = self.run_clip_compel(context, self.clip2, self.style, True, "<NONE>", zero_on_empty=False)
original_size = (self.original_height, self.original_width)
crop_coords = (self.crop_top, self.crop_left)
@ -354,14 +352,7 @@ class SDXLRefinerCompelPromptInvocation(BaseInvocation, SDXLPromptInvocationBase
assert c2_pooled is not None
conditioning_data = ConditioningFieldData(
conditionings=[
SDXLConditioningInfo(
embeds=c2,
pooled_embeds=c2_pooled,
add_time_ids=add_time_ids,
extra_conditioning=ec2, # or None
)
]
conditionings=[SDXLConditioningInfo(embeds=c2, pooled_embeds=c2_pooled, add_time_ids=add_time_ids)]
)
conditioning_name = context.conditioning.save(conditioning_data)

View File

@ -203,6 +203,12 @@ class DenoiseMaskField(BaseModel):
gradient: bool = Field(default=False, description="Used for gradient inpainting")
class TensorField(BaseModel):
"""A tensor primitive field."""
tensor_name: str = Field(description="The name of a tensor.")
class LatentsField(BaseModel):
"""A latents tensor primitive field"""
@ -226,7 +232,11 @@ class ConditioningField(BaseModel):
"""A conditioning tensor primitive value"""
conditioning_name: str = Field(description="The name of conditioning tensor")
# endregion
mask: Optional[TensorField] = Field(
default=None,
description="The mask associated with this conditioning tensor. Excluded regions should be set to False, "
"included regions should be set to True.",
)
class MetadataField(RootModel[dict[str, Any]]):

View File

@ -1,11 +1,23 @@
from builtins import float
from typing import List, Literal, Union
from typing import List, Literal, Optional, Union
from pydantic import BaseModel, Field, field_validator, model_validator
from typing_extensions import Self
from invokeai.app.invocations.baseinvocation import BaseInvocation, BaseInvocationOutput, invocation, invocation_output
from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, OutputField, UIType
from invokeai.app.invocations.baseinvocation import (
BaseInvocation,
BaseInvocationOutput,
invocation,
invocation_output,
)
from invokeai.app.invocations.fields import (
FieldDescriptions,
Input,
InputField,
OutputField,
TensorField,
UIType,
)
from invokeai.app.invocations.model import ModelIdentifierField
from invokeai.app.invocations.primitives import ImageField
from invokeai.app.invocations.util import validate_begin_end_step, validate_weights
@ -23,13 +35,18 @@ class IPAdapterField(BaseModel):
image: Union[ImageField, List[ImageField]] = Field(description="The IP-Adapter image prompt(s).")
ip_adapter_model: ModelIdentifierField = Field(description="The IP-Adapter model to use.")
image_encoder_model: ModelIdentifierField = Field(description="The name of the CLIP image encoder model.")
weight: Union[float, List[float]] = Field(default=1, description="The weight given to the ControlNet")
weight: Union[float, List[float]] = Field(default=1, description="The weight given to the IP-Adapter.")
begin_step_percent: float = Field(
default=0, ge=0, le=1, description="When the IP-Adapter is first applied (% of total steps)"
)
end_step_percent: float = Field(
default=1, ge=0, le=1, description="When the IP-Adapter is last applied (% of total steps)"
)
mask: Optional[TensorField] = Field(
default=None,
description="The bool mask associated with this IP-Adapter. Excluded regions should be set to False, included "
"regions should be set to True.",
)
@field_validator("weight")
@classmethod
@ -52,7 +69,7 @@ class IPAdapterOutput(BaseInvocationOutput):
CLIP_VISION_MODEL_MAP = {"ViT-H": "ip_adapter_sd_image_encoder", "ViT-G": "ip_adapter_sdxl_image_encoder"}
@invocation("ip_adapter", title="IP-Adapter", tags=["ip_adapter", "control"], category="ip_adapter", version="1.2.2")
@invocation("ip_adapter", title="IP-Adapter", tags=["ip_adapter", "control"], category="ip_adapter", version="1.3.0")
class IPAdapterInvocation(BaseInvocation):
"""Collects IP-Adapter info to pass to other nodes."""
@ -67,7 +84,7 @@ class IPAdapterInvocation(BaseInvocation):
)
clip_vision_model: Literal["ViT-H", "ViT-G"] = InputField(
description="CLIP Vision model to use. Overrides model settings. Mandatory for checkpoint models.",
default="auto",
default="ViT-H",
ui_order=2,
)
weight: Union[float, List[float]] = InputField(
@ -79,6 +96,9 @@ class IPAdapterInvocation(BaseInvocation):
end_step_percent: float = InputField(
default=1, ge=0, le=1, description="When the IP-Adapter is last applied (% of total steps)"
)
mask: Optional[TensorField] = InputField(
default=None, description="A mask defining the region that this IP-Adapter applies to."
)
@field_validator("weight")
@classmethod
@ -112,6 +132,7 @@ class IPAdapterInvocation(BaseInvocation):
weight=self.weight,
begin_step_percent=self.begin_step_percent,
end_step_percent=self.end_step_percent,
mask=self.mask,
),
)

View File

@ -1,5 +1,5 @@
# Copyright (c) 2023 Kyle Schouviller (https://github.com/kyle0654)
import inspect
import math
from contextlib import ExitStack
from functools import singledispatchmethod
@ -9,6 +9,7 @@ import einops
import numpy as np
import numpy.typing as npt
import torch
import torchvision
import torchvision.transforms as T
from diffusers import AutoencoderKL, AutoencoderTiny
from diffusers.configuration_utils import ConfigMixin
@ -52,12 +53,20 @@ from invokeai.backend.lora import LoRAModelRaw
from invokeai.backend.model_manager import BaseModelType, LoadedModel
from invokeai.backend.model_patcher import ModelPatcher
from invokeai.backend.stable_diffusion import PipelineIntermediateState, set_seamless
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import ConditioningData, IPAdapterConditioningInfo
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import (
BasicConditioningInfo,
IPAdapterConditioningInfo,
IPAdapterData,
Range,
SDXLConditioningInfo,
TextConditioningData,
TextConditioningRegions,
)
from invokeai.backend.util.mask import to_standard_float_mask
from invokeai.backend.util.silence_warnings import SilenceWarnings
from ...backend.stable_diffusion.diffusers_pipeline import (
ControlNetData,
IPAdapterData,
StableDiffusionGeneratorPipeline,
T2IAdapterData,
image_resized_to_grid_as_tensor,
@ -275,10 +284,10 @@ def get_scheduler(
class DenoiseLatentsInvocation(BaseInvocation):
"""Denoises noisy latents to decodable images"""
positive_conditioning: ConditioningField = InputField(
positive_conditioning: Union[ConditioningField, list[ConditioningField]] = InputField(
description=FieldDescriptions.positive_cond, input=Input.Connection, ui_order=0
)
negative_conditioning: ConditioningField = InputField(
negative_conditioning: Union[ConditioningField, list[ConditioningField]] = InputField(
description=FieldDescriptions.negative_cond, input=Input.Connection, ui_order=1
)
noise: Optional[LatentsField] = InputField(
@ -356,33 +365,168 @@ class DenoiseLatentsInvocation(BaseInvocation):
raise ValueError("cfg_scale must be greater than 1")
return v
def _get_text_embeddings_and_masks(
self,
cond_list: list[ConditioningField],
context: InvocationContext,
device: torch.device,
dtype: torch.dtype,
) -> tuple[Union[list[BasicConditioningInfo], list[SDXLConditioningInfo]], list[Optional[torch.Tensor]]]:
"""Get the text embeddings and masks from the input conditioning fields."""
text_embeddings: Union[list[BasicConditioningInfo], list[SDXLConditioningInfo]] = []
text_embeddings_masks: list[Optional[torch.Tensor]] = []
for cond in cond_list:
cond_data = context.conditioning.load(cond.conditioning_name)
text_embeddings.append(cond_data.conditionings[0].to(device=device, dtype=dtype))
mask = cond.mask
if mask is not None:
mask = context.tensors.load(mask.tensor_name)
text_embeddings_masks.append(mask)
return text_embeddings, text_embeddings_masks
def _preprocess_regional_prompt_mask(
self, mask: Optional[torch.Tensor], target_height: int, target_width: int, dtype: torch.dtype
) -> torch.Tensor:
"""Preprocess a regional prompt mask to match the target height and width.
If mask is None, returns a mask of all ones with the target height and width.
If mask is not None, resizes the mask to the target height and width using 'nearest' interpolation.
Returns:
torch.Tensor: The processed mask. shape: (1, 1, target_height, target_width).
"""
if mask is None:
return torch.ones((1, 1, target_height, target_width), dtype=dtype)
mask = to_standard_float_mask(mask, out_dtype=dtype)
tf = torchvision.transforms.Resize(
(target_height, target_width), interpolation=torchvision.transforms.InterpolationMode.NEAREST
)
# Add a batch dimension to the mask, because torchvision expects shape (batch, channels, h, w).
mask = mask.unsqueeze(0) # Shape: (1, h, w) -> (1, 1, h, w)
resized_mask = tf(mask)
return resized_mask
def _concat_regional_text_embeddings(
self,
text_conditionings: Union[list[BasicConditioningInfo], list[SDXLConditioningInfo]],
masks: Optional[list[Optional[torch.Tensor]]],
latent_height: int,
latent_width: int,
dtype: torch.dtype,
) -> tuple[Union[BasicConditioningInfo, SDXLConditioningInfo], Optional[TextConditioningRegions]]:
"""Concatenate regional text embeddings into a single embedding and track the region masks accordingly."""
if masks is None:
masks = [None] * len(text_conditionings)
assert len(text_conditionings) == len(masks)
is_sdxl = type(text_conditionings[0]) is SDXLConditioningInfo
all_masks_are_none = all(mask is None for mask in masks)
text_embedding = []
pooled_embedding = None
add_time_ids = None
cur_text_embedding_len = 0
processed_masks = []
embedding_ranges = []
for prompt_idx, text_embedding_info in enumerate(text_conditionings):
mask = masks[prompt_idx]
if is_sdxl:
# We choose a random SDXLConditioningInfo's pooled_embeds and add_time_ids here, with a preference for
# prompts without a mask. We prefer prompts without a mask, because they are more likely to contain
# global prompt information. In an ideal case, there should be exactly one global prompt without a
# mask, but we don't enforce this.
# HACK(ryand): The fact that we have to choose a single pooled_embedding and add_time_ids here is a
# fundamental interface issue. The SDXL Compel nodes are not designed to be used in the way that we use
# them for regional prompting. Ideally, the DenoiseLatents invocation should accept a single
# pooled_embeds tensor and a list of standard text embeds with region masks. This change would be a
# pretty major breaking change to a popular node, so for now we use this hack.
if pooled_embedding is None or mask is None:
pooled_embedding = text_embedding_info.pooled_embeds
if add_time_ids is None or mask is None:
add_time_ids = text_embedding_info.add_time_ids
text_embedding.append(text_embedding_info.embeds)
if not all_masks_are_none:
embedding_ranges.append(
Range(
start=cur_text_embedding_len, end=cur_text_embedding_len + text_embedding_info.embeds.shape[1]
)
)
processed_masks.append(
self._preprocess_regional_prompt_mask(mask, latent_height, latent_width, dtype=dtype)
)
cur_text_embedding_len += text_embedding_info.embeds.shape[1]
text_embedding = torch.cat(text_embedding, dim=1)
assert len(text_embedding.shape) == 3 # batch_size, seq_len, token_len
regions = None
if not all_masks_are_none:
regions = TextConditioningRegions(
masks=torch.cat(processed_masks, dim=1),
ranges=embedding_ranges,
)
if is_sdxl:
return SDXLConditioningInfo(
embeds=text_embedding, pooled_embeds=pooled_embedding, add_time_ids=add_time_ids
), regions
return BasicConditioningInfo(embeds=text_embedding), regions
def get_conditioning_data(
self,
context: InvocationContext,
scheduler: Scheduler,
unet: UNet2DConditionModel,
seed: int,
) -> ConditioningData:
positive_cond_data = context.conditioning.load(self.positive_conditioning.conditioning_name)
c = positive_cond_data.conditionings[0].to(device=unet.device, dtype=unet.dtype)
latent_height: int,
latent_width: int,
) -> TextConditioningData:
# Normalize self.positive_conditioning and self.negative_conditioning to lists.
cond_list = self.positive_conditioning
if not isinstance(cond_list, list):
cond_list = [cond_list]
uncond_list = self.negative_conditioning
if not isinstance(uncond_list, list):
uncond_list = [uncond_list]
negative_cond_data = context.conditioning.load(self.negative_conditioning.conditioning_name)
uc = negative_cond_data.conditionings[0].to(device=unet.device, dtype=unet.dtype)
conditioning_data = ConditioningData(
unconditioned_embeddings=uc,
text_embeddings=c,
guidance_scale=self.cfg_scale,
guidance_rescale_multiplier=self.cfg_rescale_multiplier,
cond_text_embeddings, cond_text_embedding_masks = self._get_text_embeddings_and_masks(
cond_list, context, unet.device, unet.dtype
)
uncond_text_embeddings, uncond_text_embedding_masks = self._get_text_embeddings_and_masks(
uncond_list, context, unet.device, unet.dtype
)
conditioning_data = conditioning_data.add_scheduler_args_if_applicable( # FIXME
scheduler,
# for ddim scheduler
eta=0.0, # ddim_eta
# for ancestral and sde schedulers
# flip all bits to have noise different from initial
generator=torch.Generator(device=unet.device).manual_seed(seed ^ 0xFFFFFFFF),
cond_text_embedding, cond_regions = self._concat_regional_text_embeddings(
text_conditionings=cond_text_embeddings,
masks=cond_text_embedding_masks,
latent_height=latent_height,
latent_width=latent_width,
dtype=unet.dtype,
)
uncond_text_embedding, uncond_regions = self._concat_regional_text_embeddings(
text_conditionings=uncond_text_embeddings,
masks=uncond_text_embedding_masks,
latent_height=latent_height,
latent_width=latent_width,
dtype=unet.dtype,
)
conditioning_data = TextConditioningData(
uncond_text=uncond_text_embedding,
cond_text=cond_text_embedding,
uncond_regions=uncond_regions,
cond_regions=cond_regions,
guidance_scale=self.cfg_scale,
guidance_rescale_multiplier=self.cfg_rescale_multiplier,
)
return conditioning_data
@ -488,8 +632,10 @@ class DenoiseLatentsInvocation(BaseInvocation):
self,
context: InvocationContext,
ip_adapter: Optional[Union[IPAdapterField, list[IPAdapterField]]],
conditioning_data: ConditioningData,
exit_stack: ExitStack,
latent_height: int,
latent_width: int,
dtype: torch.dtype,
) -> Optional[list[IPAdapterData]]:
"""If IP-Adapter is enabled, then this function loads the requisite models, and adds the image prompt embeddings
to the `conditioning_data` (in-place).
@ -505,7 +651,6 @@ class DenoiseLatentsInvocation(BaseInvocation):
return None
ip_adapter_data_list = []
conditioning_data.ip_adapter_conditioning = []
for single_ip_adapter in ip_adapter:
ip_adapter_model: Union[IPAdapter, IPAdapterPlus] = exit_stack.enter_context(
context.models.load(single_ip_adapter.ip_adapter_model)
@ -528,9 +673,10 @@ class DenoiseLatentsInvocation(BaseInvocation):
single_ipa_images, image_encoder_model
)
conditioning_data.ip_adapter_conditioning.append(
IPAdapterConditioningInfo(image_prompt_embeds, uncond_image_prompt_embeds)
)
mask = single_ip_adapter.mask
if mask is not None:
mask = context.tensors.load(mask.tensor_name)
mask = self._preprocess_regional_prompt_mask(mask, latent_height, latent_width, dtype=dtype)
ip_adapter_data_list.append(
IPAdapterData(
@ -538,6 +684,8 @@ class DenoiseLatentsInvocation(BaseInvocation):
weight=single_ip_adapter.weight,
begin_step_percent=single_ip_adapter.begin_step_percent,
end_step_percent=single_ip_adapter.end_step_percent,
ip_adapter_conditioning=IPAdapterConditioningInfo(image_prompt_embeds, uncond_image_prompt_embeds),
mask=mask,
)
)
@ -627,6 +775,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
steps: int,
denoising_start: float,
denoising_end: float,
seed: int,
) -> Tuple[int, List[int], int]:
assert isinstance(scheduler, ConfigMixin)
if scheduler.config.get("cpu_only", False):
@ -655,7 +804,15 @@ class DenoiseLatentsInvocation(BaseInvocation):
timesteps = timesteps[t_start_idx : t_start_idx + t_end_idx]
num_inference_steps = len(timesteps) // scheduler.order
return num_inference_steps, timesteps, init_timestep
scheduler_step_kwargs = {}
scheduler_step_signature = inspect.signature(scheduler.step)
if "generator" in scheduler_step_signature.parameters:
# At some point, someone decided that schedulers that accept a generator should use the original seed with
# all bits flipped. I don't know the original rationale for this, but now we must keep it like this for
# reproducibility.
scheduler_step_kwargs = {"generator": torch.Generator(device=device).manual_seed(seed ^ 0xFFFFFFFF)}
return num_inference_steps, timesteps, init_timestep, scheduler_step_kwargs
def prep_inpaint_mask(
self, context: InvocationContext, latents: torch.Tensor
@ -749,7 +906,11 @@ class DenoiseLatentsInvocation(BaseInvocation):
)
pipeline = self.create_pipeline(unet, scheduler)
conditioning_data = self.get_conditioning_data(context, scheduler, unet, seed)
_, _, latent_height, latent_width = latents.shape
conditioning_data = self.get_conditioning_data(
context=context, unet=unet, latent_height=latent_height, latent_width=latent_width
)
controlnet_data = self.prep_control_data(
context=context,
@ -763,16 +924,19 @@ class DenoiseLatentsInvocation(BaseInvocation):
ip_adapter_data = self.prep_ip_adapter_data(
context=context,
ip_adapter=self.ip_adapter,
conditioning_data=conditioning_data,
exit_stack=exit_stack,
latent_height=latent_height,
latent_width=latent_width,
dtype=unet.dtype,
)
num_inference_steps, timesteps, init_timestep = self.init_scheduler(
num_inference_steps, timesteps, init_timestep, scheduler_step_kwargs = self.init_scheduler(
scheduler,
device=unet.device,
steps=self.steps,
denoising_start=self.denoising_start,
denoising_end=self.denoising_end,
seed=seed,
)
result_latents = pipeline.latents_from_embeddings(
@ -785,6 +949,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
masked_latents=masked_latents,
gradient_mask=gradient_mask,
num_inference_steps=num_inference_steps,
scheduler_step_kwargs=scheduler_step_kwargs,
conditioning_data=conditioning_data,
control_data=controlnet_data,
ip_adapter_data=ip_adapter_data,
@ -799,7 +964,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
mps.empty_cache()
name = context.tensors.save(tensor=result_latents)
return LatentsOutput.build(latents_name=name, latents=result_latents, seed=seed)
return LatentsOutput.build(latents_name=name, latents=result_latents, seed=None)
@invocation(

View File

@ -0,0 +1,36 @@
import torch
from invokeai.app.invocations.baseinvocation import BaseInvocation, InvocationContext, invocation
from invokeai.app.invocations.fields import InputField, TensorField, WithMetadata
from invokeai.app.invocations.primitives import MaskOutput
@invocation(
"rectangle_mask",
title="Create Rectangle Mask",
tags=["conditioning"],
category="conditioning",
version="1.0.1",
)
class RectangleMaskInvocation(BaseInvocation, WithMetadata):
"""Create a rectangular mask."""
width: int = InputField(description="The width of the entire mask.")
height: int = InputField(description="The height of the entire mask.")
x_left: int = InputField(description="The left x-coordinate of the rectangular masked region (inclusive).")
y_top: int = InputField(description="The top y-coordinate of the rectangular masked region (inclusive).")
rectangle_width: int = InputField(description="The width of the rectangular masked region.")
rectangle_height: int = InputField(description="The height of the rectangular masked region.")
def invoke(self, context: InvocationContext) -> MaskOutput:
mask = torch.zeros((1, self.height, self.width), dtype=torch.bool)
mask[:, self.y_top : self.y_top + self.rectangle_height, self.x_left : self.x_left + self.rectangle_width] = (
True
)
mask_tensor_name = context.tensors.save(mask)
return MaskOutput(
mask=TensorField(tensor_name=mask_tensor_name),
width=self.width,
height=self.height,
)

View File

@ -15,6 +15,7 @@ from invokeai.app.invocations.fields import (
InputField,
LatentsField,
OutputField,
TensorField,
UIComponent,
)
from invokeai.app.services.images.images_common import ImageDTO
@ -405,9 +406,19 @@ class ColorInvocation(BaseInvocation):
# endregion
# region Conditioning
@invocation_output("mask_output")
class MaskOutput(BaseInvocationOutput):
"""A torch mask tensor."""
mask: TensorField = OutputField(description="The mask.")
width: int = OutputField(description="The width of the mask in pixels.")
height: int = OutputField(description="The height of the mask in pixels.")
@invocation_output("conditioning_output")
class ConditioningOutput(BaseInvocationOutput):
"""Base class for nodes that output a single conditioning tensor"""

View File

@ -270,7 +270,7 @@ class DownloadQueueService(DownloadQueueServiceBase):
job.dest.parent.mkdir(parents=True, exist_ok=True)
job.download_path = job.dest
assert job.download_path
assert job.download_path is not None
# Don't clobber an existing file. See commit 82c2c85202f88c6d24ff84710f297cfc6ae174af
# for code that instead resumes an interrupted download.
@ -280,6 +280,9 @@ class DownloadQueueService(DownloadQueueServiceBase):
# append ".downloading" to the path
in_progress_path = self._in_progress_path(job.download_path)
# catch rare race condition that is appearing in unit tests.
assert in_progress_path.parent.exists(), f"Directory doesn't exist! in_progress_path={in_progress_path}; parent={in_progress_path.parent}"
# Signal the caller that the download is starting. At this point, key fields such as
# download_path and total_bytes will be populated. We call it here because we might
# discover that the local file is already complete and generate a COMPLETED status.
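A hedged sketch of the ".downloading" convention referenced above; _in_progress_path here is a plausible reconstruction for illustration, not the service's actual implementation.

from pathlib import Path


def _in_progress_path(download_path: Path) -> Path:
    # Write to "<name>.downloading" while the transfer is running, then rename to the
    # final path on completion so a partial file never clobbers a finished one.
    return download_path.with_name(download_path.name + ".downloading")


assert _in_progress_path(Path("models/sd15.safetensors")).name == "sd15.safetensors.downloading"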

View File

@ -86,6 +86,12 @@ class DefaultSessionProcessor(SessionProcessorBase):
self._poll_now()
elif event_name == "batch_enqueued":
self._poll_now()
elif event_name == "queue_item_status_changed" and event[1]["data"]["queue_item"]["status"] in [
"completed",
"failed",
"canceled",
]:
self._poll_now()
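A reduced sketch of the polling trigger added above, assuming the nested event payload shown in the diff (event[1]["data"]["queue_item"]["status"]); only the branches visible here are reproduced, and the helper name is illustrative.

TERMINAL_STATUSES = {"completed", "failed", "canceled"}


def should_poll_now(event_name: str, event: tuple) -> bool:
    # Poll immediately when a batch is enqueued or when a queue item reaches a terminal state.
    if event_name == "batch_enqueued":
        return True
    if event_name == "queue_item_status_changed":
        return event[1]["data"]["queue_item"]["status"] in TERMINAL_STATUSES
    return False


assert should_poll_now("queue_item_status_changed", ("event", {"data": {"queue_item": {"status": "failed"}}}))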
def resume(self) -> SessionProcessorStatus:
if not self._resume_event.is_set():

View File

@ -245,6 +245,18 @@ class ImagesInterface(InvocationContextInterface):
"""
return self._services.images.get_dto(image_name)
def get_path(self, image_name: str, thumbnail: bool = False) -> Path:
"""Gets the internal path to an image or thumbnail.
Args:
image_name: The name of the image to get the path of.
thumbnail: Get the path of the thumbnail instead of the full image.
Returns:
The local path of the image or thumbnail.
"""
return self._services.images.get_path(image_name, thumbnail)
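A hedged usage sketch of the new accessor from inside an invocation; context stands for the InvocationContext passed to invoke(), and the image name is a placeholder.

from pathlib import Path


def image_paths(context, image_name: str = "example.png") -> tuple[Path, Path]:
    # get_path exposes the internal filesystem location of the stored image; the
    # thumbnail flag switches to the thumbnail variant of the same image.
    return (
        context.images.get_path(image_name),
        context.images.get_path(image_name, thumbnail=True),
    )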
class TensorsInterface(InvocationContextInterface):
def save(self, tensor: Tensor) -> str:

View File

@ -1,182 +0,0 @@
# copied from https://github.com/tencent-ailab/IP-Adapter (Apache License 2.0)
# and modified as needed
# tencent-ailab comment:
# modified from https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers.models.attention_processor import AttnProcessor2_0 as DiffusersAttnProcessor2_0
from invokeai.backend.ip_adapter.ip_attention_weights import IPAttentionProcessorWeights
# Create a version of AttnProcessor2_0 that is a sub-class of nn.Module. This is required for IP-Adapter state_dict
# loading.
class AttnProcessor2_0(DiffusersAttnProcessor2_0, nn.Module):
def __init__(self):
DiffusersAttnProcessor2_0.__init__(self)
nn.Module.__init__(self)
def __call__(
self,
attn,
hidden_states,
encoder_hidden_states=None,
attention_mask=None,
temb=None,
ip_adapter_image_prompt_embeds=None,
):
"""Re-definition of DiffusersAttnProcessor2_0.__call__(...) that accepts and ignores the
ip_adapter_image_prompt_embeds parameter.
"""
return DiffusersAttnProcessor2_0.__call__(
self, attn, hidden_states, encoder_hidden_states, attention_mask, temb
)
class IPAttnProcessor2_0(torch.nn.Module):
r"""
Attention processor for IP-Adapter for PyTorch 2.0.
Args:
hidden_size (`int`):
The hidden size of the attention layer.
cross_attention_dim (`int`):
The number of channels in the `encoder_hidden_states`.
scale (`float`, defaults to 1.0):
The weight scale of the image prompt.
"""
def __init__(self, weights: list[IPAttentionProcessorWeights], scales: list[float]):
super().__init__()
if not hasattr(F, "scaled_dot_product_attention"):
raise ImportError("AttnProcessor2_0 requires PyTorch 2.0. To use it, please upgrade PyTorch to 2.0.")
assert len(weights) == len(scales)
self._weights = weights
self._scales = scales
def __call__(
self,
attn,
hidden_states,
encoder_hidden_states=None,
attention_mask=None,
temb=None,
ip_adapter_image_prompt_embeds=None,
):
"""Apply IP-Adapter attention.
Args:
ip_adapter_image_prompt_embeds (torch.Tensor): The image prompt embeddings.
Shape: (batch_size, num_ip_images, seq_len, ip_embedding_len).
"""
residual = hidden_states
if attn.spatial_norm is not None:
hidden_states = attn.spatial_norm(hidden_states, temb)
input_ndim = hidden_states.ndim
if input_ndim == 4:
batch_size, channel, height, width = hidden_states.shape
hidden_states = hidden_states.view(batch_size, channel, height * width).transpose(1, 2)
batch_size, sequence_length, _ = (
hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
)
if attention_mask is not None:
attention_mask = attn.prepare_attention_mask(attention_mask, sequence_length, batch_size)
# scaled_dot_product_attention expects attention_mask shape to be
# (batch, heads, source_length, target_length)
attention_mask = attention_mask.view(batch_size, attn.heads, -1, attention_mask.shape[-1])
if attn.group_norm is not None:
hidden_states = attn.group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
query = attn.to_q(hidden_states)
if encoder_hidden_states is None:
encoder_hidden_states = hidden_states
elif attn.norm_cross:
encoder_hidden_states = attn.norm_encoder_hidden_states(encoder_hidden_states)
key = attn.to_k(encoder_hidden_states)
value = attn.to_v(encoder_hidden_states)
inner_dim = key.shape[-1]
head_dim = inner_dim // attn.heads
query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
# the output of sdp = (batch, num_heads, seq_len, head_dim)
# TODO: add support for attn.scale when we move to Torch 2.1
hidden_states = F.scaled_dot_product_attention(
query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)
hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
hidden_states = hidden_states.to(query.dtype)
if encoder_hidden_states is not None:
# If encoder_hidden_states is not None, then we are doing cross-attention, not self-attention. In this case,
# we will apply IP-Adapter conditioning. We validate the inputs for IP-Adapter conditioning here.
assert ip_adapter_image_prompt_embeds is not None
assert len(ip_adapter_image_prompt_embeds) == len(self._weights)
for ipa_embed, ipa_weights, scale in zip(
ip_adapter_image_prompt_embeds, self._weights, self._scales, strict=True
):
# The batch dimensions should match.
assert ipa_embed.shape[0] == encoder_hidden_states.shape[0]
# The token_len dimensions should match.
assert ipa_embed.shape[-1] == encoder_hidden_states.shape[-1]
ip_hidden_states = ipa_embed
# Expected ip_hidden_state shape: (batch_size, num_ip_images, ip_seq_len, ip_image_embedding)
ip_key = ipa_weights.to_k_ip(ip_hidden_states)
ip_value = ipa_weights.to_v_ip(ip_hidden_states)
# Expected ip_key and ip_value shape: (batch_size, num_ip_images, ip_seq_len, head_dim * num_heads)
ip_key = ip_key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
ip_value = ip_value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
# Expected ip_key and ip_value shape: (batch_size, num_heads, num_ip_images * ip_seq_len, head_dim)
# TODO: add support for attn.scale when we move to Torch 2.1
ip_hidden_states = F.scaled_dot_product_attention(
query, ip_key, ip_value, attn_mask=None, dropout_p=0.0, is_causal=False
)
# Expected ip_hidden_states shape: (batch_size, num_heads, query_seq_len, head_dim)
ip_hidden_states = ip_hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
ip_hidden_states = ip_hidden_states.to(query.dtype)
# Expected ip_hidden_states shape: (batch_size, query_seq_len, num_heads * head_dim)
hidden_states = hidden_states + scale * ip_hidden_states
# linear proj
hidden_states = attn.to_out[0](hidden_states)
# dropout
hidden_states = attn.to_out[1](hidden_states)
if input_ndim == 4:
hidden_states = hidden_states.transpose(-1, -2).reshape(batch_size, channel, height, width)
if attn.residual_connection:
hidden_states = hidden_states + residual
hidden_states = hidden_states / attn.rescale_output_factor
return hidden_states

View File

@ -37,7 +37,7 @@ class ModelLoader(ModelLoaderBase):
self._logger = logger
self._ram_cache = ram_cache
self._convert_cache = convert_cache
self._torch_dtype = torch_dtype(choose_torch_device(), app_config)
self._torch_dtype = torch_dtype(choose_torch_device())
def load_model(self, model_config: AnyModelConfig, submodel_type: Optional[SubModelType] = None) -> LoadedModel:
"""

View File

@ -117,7 +117,7 @@ class ModelCacheBase(ABC, Generic[T]):
@property
@abstractmethod
def stats(self) -> CacheStats:
def stats(self) -> Optional[CacheStats]:
"""Return collected CacheStats object."""
pass

View File

@ -269,12 +269,14 @@ class ModelCache(ModelCacheBase[AnyModel]):
if torch.device(source_device).type == torch.device(target_device).type:
return
# may raise an exception here if insufficient GPU VRAM
self._check_free_vram(target_device, cache_entry.size)
start_model_to_time = time.time()
snapshot_before = self._capture_memory_snapshot()
cache_entry.model.to(target_device)
try:
cache_entry.model.to(target_device)
except Exception as e: # blow away cache entry
self._delete_cache_entry(cache_entry)
raise e
snapshot_after = self._capture_memory_snapshot()
end_model_to_time = time.time()
self.logger.debug(
@ -329,11 +331,11 @@ class ModelCache(ModelCacheBase[AnyModel]):
f" {in_ram_models}/{in_vram_models}({locked_in_vram_models})"
)
def make_room(self, model_size: int) -> None:
def make_room(self, size: int) -> None:
"""Make enough room in the cache to accommodate a new model of indicated size."""
# calculate how much memory this model will require
# multiplier = 2 if self.precision==torch.float32 else 1
bytes_needed = model_size
bytes_needed = size
maximum_size = self.max_cache_size * GIG # stored in GB, convert to bytes
current_size = self.cache_size()
@ -388,12 +390,11 @@ class ModelCache(ModelCacheBase[AnyModel]):
# 1 from onnx runtime object
if not cache_entry.locked and refs <= (3 if "onnx" in model_key else 2):
self.logger.debug(
f"Removing {model_key} from RAM cache to free at least {(model_size/GIG):.2f} GB (-{(cache_entry.size/GIG):.2f} GB)"
f"Removing {model_key} from RAM cache to free at least {(size/GIG):.2f} GB (-{(cache_entry.size/GIG):.2f} GB)"
)
current_size -= cache_entry.size
models_cleared += 1
del self._cache_stack[pos]
del self._cached_models[model_key]
self._delete_cache_entry(cache_entry)
del cache_entry
else:
@ -421,23 +422,6 @@ class ModelCache(ModelCacheBase[AnyModel]):
self.logger.debug(f"After making room: cached_models={len(self._cached_models)}")
def _free_vram(self, device: torch.device) -> int:
vram_device = ( # mem_get_info() needs an indexed device
device if device.index is not None else torch.device(str(device), index=0)
)
free_mem, _ = torch.cuda.mem_get_info(vram_device)
for _, cache_entry in self._cached_models.items():
if cache_entry.loaded and not cache_entry.locked:
free_mem += cache_entry.size
return free_mem
def _check_free_vram(self, target_device: torch.device, needed_size: int) -> None:
if target_device.type != "cuda":
return
free_mem = self._free_vram(target_device)
if needed_size > free_mem:
needed_gb = round(needed_size / GIG, 2)
free_gb = round(free_mem / GIG, 2)
raise torch.cuda.OutOfMemoryError(
f"Insufficient VRAM to load model, requested {needed_gb}GB but only had {free_gb}GB free"
)
def _delete_cache_entry(self, cache_entry: CacheRecord[AnyModel]) -> None:
self._cache_stack.remove(cache_entry.key)
del self._cached_models[cache_entry.key]

View File

@ -21,10 +21,12 @@ from pydantic import Field
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
from invokeai.app.services.config.config_default import get_config
from invokeai.backend.ip_adapter.ip_adapter import IPAdapter
from invokeai.backend.ip_adapter.unet_patcher import UNetPatcher
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import ConditioningData
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import (
IPAdapterData,
TextConditioningData,
)
from invokeai.backend.stable_diffusion.diffusion.shared_invokeai_diffusion import InvokeAIDiffuserComponent
from invokeai.backend.stable_diffusion.diffusion.unet_attention_patcher import UNetAttentionPatcher
from invokeai.backend.util.attention import auto_detect_slice_size
from invokeai.backend.util.devices import normalize_device
@ -149,16 +151,6 @@ class ControlNetData:
resize_mode: str = Field(default="just_resize")
@dataclass
class IPAdapterData:
ip_adapter_model: IPAdapter = Field(default=None)
# TODO: change to polymorphic so can do different weights per step (once implemented...)
weight: Union[float, List[float]] = Field(default=1.0)
# weight: float = Field(default=1.0)
begin_step_percent: float = Field(default=0.0)
end_step_percent: float = Field(default=1.0)
@dataclass
class T2IAdapterData:
"""A structure containing the information required to apply conditioning from a single T2I-Adapter model."""
@ -295,7 +287,8 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
self,
latents: torch.Tensor,
num_inference_steps: int,
conditioning_data: ConditioningData,
scheduler_step_kwargs: dict[str, Any],
conditioning_data: TextConditioningData,
*,
noise: Optional[torch.Tensor],
timesteps: torch.Tensor,
@ -308,7 +301,7 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
mask: Optional[torch.Tensor] = None,
masked_latents: Optional[torch.Tensor] = None,
gradient_mask: Optional[bool] = False,
seed: Optional[int] = None,
seed: int,
) -> torch.Tensor:
if init_timestep.shape[0] == 0:
return latents
@ -326,20 +319,6 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
latents = self.scheduler.add_noise(latents, noise, batched_t)
if mask is not None:
# if no noise provided, noisify unmasked area based on seed (or 0 as fallback)
if noise is None:
noise = torch.randn(
orig_latents.shape,
dtype=torch.float32,
device="cpu",
generator=torch.Generator(device="cpu").manual_seed(seed or 0),
).to(device=orig_latents.device, dtype=orig_latents.dtype)
latents = self.scheduler.add_noise(latents, noise, batched_t)
latents = torch.lerp(
orig_latents, latents.to(dtype=orig_latents.dtype), mask.to(dtype=orig_latents.dtype)
)
if is_inpainting_model(self.unet):
if masked_latents is None:
raise Exception("Source image required for inpaint mask when inpaint model used!")
@ -348,6 +327,15 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
self._unet_forward, mask, masked_latents
)
else:
# if no noise provided, noisify unmasked area based on seed
if noise is None:
noise = torch.randn(
orig_latents.shape,
dtype=torch.float32,
device="cpu",
generator=torch.Generator(device="cpu").manual_seed(seed),
).to(device=orig_latents.device, dtype=orig_latents.dtype)
additional_guidance.append(AddsMaskGuidance(mask, orig_latents, self.scheduler, noise, gradient_mask))
try:
@ -355,6 +343,7 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
latents,
timesteps,
conditioning_data,
scheduler_step_kwargs=scheduler_step_kwargs,
additional_guidance=additional_guidance,
control_data=control_data,
ip_adapter_data=ip_adapter_data,
@ -380,7 +369,8 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
self,
latents: torch.Tensor,
timesteps,
conditioning_data: ConditioningData,
conditioning_data: TextConditioningData,
scheduler_step_kwargs: dict[str, Any],
*,
additional_guidance: List[Callable] = None,
control_data: List[ControlNetData] = None,
@ -397,22 +387,17 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
if timesteps.shape[0] == 0:
return latents
ip_adapter_unet_patcher = None
extra_conditioning_info = conditioning_data.text_embeddings.extra_conditioning
if extra_conditioning_info is not None and extra_conditioning_info.wants_cross_attention_control:
attn_ctx = self.invokeai_diffuser.custom_attention_context(
self.invokeai_diffuser.model,
extra_conditioning_info=extra_conditioning_info,
)
self.use_ip_adapter = False
elif ip_adapter_data is not None:
# TODO(ryand): Should we raise an exception if both custom attention and IP-Adapter attention are active?
# As it is now, the IP-Adapter will silently be skipped.
ip_adapter_unet_patcher = UNetPatcher([ipa.ip_adapter_model for ipa in ip_adapter_data])
attn_ctx = ip_adapter_unet_patcher.apply_ip_adapter_attention(self.invokeai_diffuser.model)
self.use_ip_adapter = True
else:
attn_ctx = nullcontext()
use_ip_adapter = ip_adapter_data is not None
use_regional_prompting = (
conditioning_data.cond_regions is not None or conditioning_data.uncond_regions is not None
)
unet_attention_patcher = None
self.use_ip_adapter = use_ip_adapter
attn_ctx = nullcontext()
if use_ip_adapter or use_regional_prompting:
ip_adapters = [ipa.ip_adapter_model for ipa in ip_adapter_data] if use_ip_adapter else None
unet_attention_patcher = UNetAttentionPatcher(ip_adapters)
attn_ctx = unet_attention_patcher.apply_ip_adapter_attention(self.invokeai_diffuser.model)
with attn_ctx:
if callback is not None:
@ -435,11 +420,11 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
conditioning_data,
step_index=i,
total_step_count=len(timesteps),
scheduler_step_kwargs=scheduler_step_kwargs,
additional_guidance=additional_guidance,
control_data=control_data,
ip_adapter_data=ip_adapter_data,
t2i_adapter_data=t2i_adapter_data,
ip_adapter_unet_patcher=ip_adapter_unet_patcher,
)
latents = step_output.prev_sample
predicted_original = getattr(step_output, "pred_original_sample", None)
@ -463,14 +448,14 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
self,
t: torch.Tensor,
latents: torch.Tensor,
conditioning_data: ConditioningData,
conditioning_data: TextConditioningData,
step_index: int,
total_step_count: int,
scheduler_step_kwargs: dict[str, Any],
additional_guidance: List[Callable] = None,
control_data: List[ControlNetData] = None,
ip_adapter_data: Optional[list[IPAdapterData]] = None,
t2i_adapter_data: Optional[list[T2IAdapterData]] = None,
ip_adapter_unet_patcher: Optional[UNetPatcher] = None,
):
# invokeai_diffuser has batched timesteps, but diffusers schedulers expect a single value
timestep = t[0]
@ -485,23 +470,6 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
# i.e. before or after passing it to InvokeAIDiffuserComponent
latent_model_input = self.scheduler.scale_model_input(latents, timestep)
# handle IP-Adapter
if self.use_ip_adapter and ip_adapter_data is not None: # somewhat redundant but logic is clearer
for i, single_ip_adapter_data in enumerate(ip_adapter_data):
first_adapter_step = math.floor(single_ip_adapter_data.begin_step_percent * total_step_count)
last_adapter_step = math.ceil(single_ip_adapter_data.end_step_percent * total_step_count)
weight = (
single_ip_adapter_data.weight[step_index]
if isinstance(single_ip_adapter_data.weight, List)
else single_ip_adapter_data.weight
)
if step_index >= first_adapter_step and step_index <= last_adapter_step:
# Only apply this IP-Adapter if the current step is within the IP-Adapter's begin/end step range.
ip_adapter_unet_patcher.set_scale(i, weight)
else:
# Otherwise, set the IP-Adapter's scale to 0, so it has no effect.
ip_adapter_unet_patcher.set_scale(i, 0.0)
# Handle ControlNet(s)
down_block_additional_residuals = None
mid_block_additional_residual = None
@ -550,6 +518,7 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
step_index=step_index,
total_step_count=total_step_count,
conditioning_data=conditioning_data,
ip_adapter_data=ip_adapter_data,
down_block_additional_residuals=down_block_additional_residuals, # for ControlNet
mid_block_additional_residual=mid_block_additional_residual, # for ControlNet
down_intrablock_additional_residuals=down_intrablock_additional_residuals, # for T2I-Adapter
@ -569,7 +538,7 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
)
# compute the previous noisy sample x_t -> x_t-1
step_output = self.scheduler.step(noise_pred, timestep, latents, **conditioning_data.scheduler_args)
step_output = self.scheduler.step(noise_pred, timestep, latents, **scheduler_step_kwargs)
# TODO: discuss injection point options. For now this is a patch to get progress images working with inpainting again.
for guidance in additional_guidance:

View File

@ -1,27 +1,17 @@
import dataclasses
import inspect
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union
import math
from dataclasses import dataclass
from typing import List, Optional, Union
import torch
from .cross_attention_control import Arguments
@dataclass
class ExtraConditioningInfo:
tokens_count_including_eos_bos: int
cross_attention_control_args: Optional[Arguments] = None
@property
def wants_cross_attention_control(self):
return self.cross_attention_control_args is not None
from invokeai.backend.ip_adapter.ip_adapter import IPAdapter
@dataclass
class BasicConditioningInfo:
"""SD 1/2 text conditioning information produced by Compel."""
embeds: torch.Tensor
extra_conditioning: Optional[ExtraConditioningInfo]
def to(self, device, dtype=None):
self.embeds = self.embeds.to(device=device, dtype=dtype)
@ -35,6 +25,8 @@ class ConditioningFieldData:
@dataclass
class SDXLConditioningInfo(BasicConditioningInfo):
"""SDXL text conditioning information produced by Compel."""
pooled_embeds: torch.Tensor
add_time_ids: torch.Tensor
@ -57,37 +49,74 @@ class IPAdapterConditioningInfo:
@dataclass
class ConditioningData:
unconditioned_embeddings: BasicConditioningInfo
text_embeddings: BasicConditioningInfo
"""
Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
`guidance_scale` is defined as `w` of equation 2. of [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf).
Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate
images that are closely linked to the text `prompt`, usually at the expense of lower image quality.
"""
guidance_scale: Union[float, List[float]]
""" for models trained using zero-terminal SNR ("ztsnr"), it's suggested to use guidance_rescale_multiplier of 0.7 .
ref [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf)
"""
guidance_rescale_multiplier: float = 0
scheduler_args: dict[str, Any] = field(default_factory=dict)
class IPAdapterData:
ip_adapter_model: IPAdapter
ip_adapter_conditioning: IPAdapterConditioningInfo
mask: torch.Tensor
ip_adapter_conditioning: Optional[list[IPAdapterConditioningInfo]] = None
# Either a single weight applied to all steps, or a list of weights for each step.
weight: Union[float, List[float]] = 1.0
begin_step_percent: float = 0.0
end_step_percent: float = 1.0
@property
def dtype(self):
return self.text_embeddings.dtype
def scale_for_step(self, step_index: int, total_steps: int) -> float:
first_adapter_step = math.floor(self.begin_step_percent * total_steps)
last_adapter_step = math.ceil(self.end_step_percent * total_steps)
weight = self.weight[step_index] if isinstance(self.weight, List) else self.weight
if step_index >= first_adapter_step and step_index <= last_adapter_step:
# Only apply this IP-Adapter if the current step is within the IP-Adapter's begin/end step range.
return weight
# Otherwise, set the IP-Adapter's scale to 0, so it has no effect.
return 0.0
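A standalone sketch of the scale_for_step() scheduling above, handy for checking how begin/end step percentages map onto concrete step indices; the dataclass is a stripped-down stand-in for IPAdapterData, not the real class.

import math
from dataclasses import dataclass
from typing import List, Union


@dataclass
class StepScaler:
    weight: Union[float, List[float]] = 1.0
    begin_step_percent: float = 0.0
    end_step_percent: float = 1.0

    def scale_for_step(self, step_index: int, total_steps: int) -> float:
        first_step = math.floor(self.begin_step_percent * total_steps)
        last_step = math.ceil(self.end_step_percent * total_steps)
        weight = self.weight[step_index] if isinstance(self.weight, list) else self.weight
        # The adapter only contributes while the current step is inside its window.
        return weight if first_step <= step_index <= last_step else 0.0


scaler = StepScaler(weight=0.8, begin_step_percent=0.25, end_step_percent=0.75)
assert scaler.scale_for_step(4, 20) == 0.0   # before step 5: inactive
assert scaler.scale_for_step(10, 20) == 0.8  # inside the step 5..15 window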
def add_scheduler_args_if_applicable(self, scheduler, **kwargs):
scheduler_args = dict(self.scheduler_args)
step_method = inspect.signature(scheduler.step)
for name, value in kwargs.items():
try:
step_method.bind_partial(**{name: value})
except TypeError:
# FIXME: don't silently discard arguments
pass # debug("%s does not accept argument named %r", scheduler, name)
else:
scheduler_args[name] = value
return dataclasses.replace(self, scheduler_args=scheduler_args)
@dataclass
class Range:
start: int
end: int
class TextConditioningRegions:
def __init__(
self,
masks: torch.Tensor,
ranges: list[Range],
):
# A binary mask indicating the regions of the image that the prompt should be applied to.
# Shape: (1, num_prompts, height, width)
# Dtype: torch.bool
self.masks = masks
# A list of ranges indicating the start and end indices of the embeddings that the corresponding mask applies to.
# ranges[i] contains the embedding range for the i'th prompt / mask.
self.ranges = ranges
assert self.masks.shape[1] == len(self.ranges)
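To make the expected layout concrete, a small hedged example of building region data in the shape the assertion above requires: a (1, num_prompts, H, W) boolean mask stack plus one embedding Range per prompt (all numbers are arbitrary, and the import path is the one used elsewhere in this PR).

import torch

from invokeai.backend.stable_diffusion.diffusion.conditioning_data import Range, TextConditioningRegions

# Two regional prompts over a 64x64 mask grid.
masks = torch.zeros((1, 2, 64, 64), dtype=torch.bool)
masks[0, 0, :, :32] = True  # prompt 0 covers the left half
masks[0, 1, :, 32:] = True  # prompt 1 covers the right half

# ranges[i] is the slice of the concatenated text embeddings belonging to prompt i
# (77-token chunks are used purely as an example length).
regions = TextConditioningRegions(
    masks=masks,
    ranges=[Range(start=0, end=77), Range(start=77, end=154)],
)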
class TextConditioningData:
def __init__(
self,
uncond_text: Union[BasicConditioningInfo, SDXLConditioningInfo],
cond_text: Union[BasicConditioningInfo, SDXLConditioningInfo],
uncond_regions: Optional[TextConditioningRegions],
cond_regions: Optional[TextConditioningRegions],
guidance_scale: Union[float, List[float]],
guidance_rescale_multiplier: float = 0,
):
self.uncond_text = uncond_text
self.cond_text = cond_text
self.uncond_regions = uncond_regions
self.cond_regions = cond_regions
# Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
# `guidance_scale` is defined as `w` of equation 2. of [Imagen Paper](https://arxiv.org/pdf/2205.11487.pdf).
# Guidance scale is enabled by setting `guidance_scale > 1`. A higher guidance scale encourages the model to generate
# images that are closely linked to the text `prompt`, usually at the expense of lower image quality.
self.guidance_scale = guidance_scale
# For models trained using zero-terminal SNR ("ztsnr"), it's suggested to use guidance_rescale_multiplier of 0.7.
# See [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
self.guidance_rescale_multiplier = guidance_rescale_multiplier
def is_sdxl(self):
assert isinstance(self.uncond_text, SDXLConditioningInfo) == isinstance(self.cond_text, SDXLConditioningInfo)
return isinstance(self.cond_text, SDXLConditioningInfo)

View File

@ -1,218 +0,0 @@
# adapted from bloc97's CrossAttentionControl colab
# https://github.com/bloc97/CrossAttentionControl
import enum
from dataclasses import dataclass, field
from typing import Optional
import torch
from compel.cross_attention_control import Arguments
from diffusers.models.attention_processor import Attention, SlicedAttnProcessor
from diffusers.models.unets.unet_2d_condition import UNet2DConditionModel
from invokeai.backend.util.devices import torch_dtype
class CrossAttentionType(enum.Enum):
SELF = 1
TOKENS = 2
class CrossAttnControlContext:
def __init__(self, arguments: Arguments):
"""
:param arguments: Arguments for the cross-attention control process
"""
self.cross_attention_mask: Optional[torch.Tensor] = None
self.cross_attention_index_map: Optional[torch.Tensor] = None
self.arguments = arguments
def get_active_cross_attention_control_types_for_step(
self, percent_through: float = None
) -> list[CrossAttentionType]:
"""
Should cross-attention control be applied on the given step?
:param percent_through: How far through the step sequence are we (0.0=pure noise, 1.0=completely denoised image). Expected range 0.0..<1.0.
:return: A list of attention types that cross-attention control should be performed for on the given step. May be [].
"""
if percent_through is None:
return [CrossAttentionType.SELF, CrossAttentionType.TOKENS]
opts = self.arguments.edit_options
to_control = []
if opts["s_start"] <= percent_through < opts["s_end"]:
to_control.append(CrossAttentionType.SELF)
if opts["t_start"] <= percent_through < opts["t_end"]:
to_control.append(CrossAttentionType.TOKENS)
return to_control
def setup_cross_attention_control_attention_processors(unet: UNet2DConditionModel, context: CrossAttnControlContext):
"""
Inject attention parameters and functions into the passed in model to enable cross attention editing.
:param model: The unet model to inject into.
:return: None
"""
# adapted from init_attention_edit
device = context.arguments.edited_conditioning.device
# urgh. should this be hardcoded?
max_length = 77
# mask=1 means use base prompt attention, mask=0 means use edited prompt attention
mask = torch.zeros(max_length, dtype=torch_dtype(device))
indices_target = torch.arange(max_length, dtype=torch.long)
indices = torch.arange(max_length, dtype=torch.long)
for name, a0, a1, b0, b1 in context.arguments.edit_opcodes:
if b0 < max_length:
if name == "equal": # or (name == "replace" and a1 - a0 == b1 - b0):
# these tokens have not been edited
indices[b0:b1] = indices_target[a0:a1]
mask[b0:b1] = 1
context.cross_attention_mask = mask.to(device)
context.cross_attention_index_map = indices.to(device)
old_attn_processors = unet.attn_processors
if torch.backends.mps.is_available():
# see note in StableDiffusionGeneratorPipeline.__init__ about borked slicing on MPS
unet.set_attn_processor(SwapCrossAttnProcessor())
else:
# try to re-use an existing slice size
default_slice_size = 4
slice_size = next(
(p.slice_size for p in old_attn_processors.values() if type(p) is SlicedAttnProcessor), default_slice_size
)
unet.set_attn_processor(SlicedSwapCrossAttnProcesser(slice_size=slice_size))
@dataclass
class SwapCrossAttnContext:
modified_text_embeddings: torch.Tensor
index_map: torch.Tensor # maps from original prompt token indices to the equivalent tokens in the modified prompt
mask: torch.Tensor # in the target space of the index_map
cross_attention_types_to_do: list[CrossAttentionType] = field(default_factory=list)
def wants_cross_attention_control(self, attn_type: CrossAttentionType) -> bool:
return attn_type in self.cross_attention_types_to_do
@classmethod
def make_mask_and_index_map(
cls, edit_opcodes: list[tuple[str, int, int, int, int]], max_length: int
) -> tuple[torch.Tensor, torch.Tensor]:
# mask=1 means use original prompt attention, mask=0 means use modified prompt attention
mask = torch.zeros(max_length)
indices_target = torch.arange(max_length, dtype=torch.long)
indices = torch.arange(max_length, dtype=torch.long)
for name, a0, a1, b0, b1 in edit_opcodes:
if b0 < max_length:
if name == "equal":
# these tokens remain the same as in the original prompt
indices[b0:b1] = indices_target[a0:a1]
mask[b0:b1] = 1
return mask, indices
class SlicedSwapCrossAttnProcesser(SlicedAttnProcessor):
# TODO: dynamically pick slice size based on memory conditions
def __call__(
self,
attn: Attention,
hidden_states,
encoder_hidden_states=None,
attention_mask=None,
# kwargs
swap_cross_attn_context: SwapCrossAttnContext = None,
**kwargs,
):
attention_type = CrossAttentionType.SELF if encoder_hidden_states is None else CrossAttentionType.TOKENS
# if cross-attention control is not in play, just call through to the base implementation.
if (
attention_type is CrossAttentionType.SELF
or swap_cross_attn_context is None
or not swap_cross_attn_context.wants_cross_attention_control(attention_type)
):
# print(f"SwapCrossAttnContext for {attention_type} not active - passing request to superclass")
return super().__call__(attn, hidden_states, encoder_hidden_states, attention_mask)
# else:
# print(f"SwapCrossAttnContext for {attention_type} active")
batch_size, sequence_length, _ = hidden_states.shape
attention_mask = attn.prepare_attention_mask(
attention_mask=attention_mask,
target_length=sequence_length,
batch_size=batch_size,
)
query = attn.to_q(hidden_states)
dim = query.shape[-1]
query = attn.head_to_batch_dim(query)
original_text_embeddings = encoder_hidden_states
modified_text_embeddings = swap_cross_attn_context.modified_text_embeddings
original_text_key = attn.to_k(original_text_embeddings)
modified_text_key = attn.to_k(modified_text_embeddings)
original_value = attn.to_v(original_text_embeddings)
modified_value = attn.to_v(modified_text_embeddings)
original_text_key = attn.head_to_batch_dim(original_text_key)
modified_text_key = attn.head_to_batch_dim(modified_text_key)
original_value = attn.head_to_batch_dim(original_value)
modified_value = attn.head_to_batch_dim(modified_value)
# compute slices and prepare output tensor
batch_size_attention = query.shape[0]
hidden_states = torch.zeros(
(batch_size_attention, sequence_length, dim // attn.heads),
device=query.device,
dtype=query.dtype,
)
# do slices
for i in range(max(1, hidden_states.shape[0] // self.slice_size)):
start_idx = i * self.slice_size
end_idx = (i + 1) * self.slice_size
query_slice = query[start_idx:end_idx]
original_key_slice = original_text_key[start_idx:end_idx]
modified_key_slice = modified_text_key[start_idx:end_idx]
attn_mask_slice = attention_mask[start_idx:end_idx] if attention_mask is not None else None
original_attn_slice = attn.get_attention_scores(query_slice, original_key_slice, attn_mask_slice)
modified_attn_slice = attn.get_attention_scores(query_slice, modified_key_slice, attn_mask_slice)
# because the prompt modifications may result in token sequences shifted forwards or backwards,
# the original attention probabilities must be remapped to account for token index changes in the
# modified prompt
remapped_original_attn_slice = torch.index_select(
original_attn_slice, -1, swap_cross_attn_context.index_map
)
# only some tokens taken from the original attention probabilities. this is controlled by the mask.
mask = swap_cross_attn_context.mask
inverse_mask = 1 - mask
attn_slice = remapped_original_attn_slice * mask + modified_attn_slice * inverse_mask
del remapped_original_attn_slice, modified_attn_slice
attn_slice = torch.bmm(attn_slice, modified_value[start_idx:end_idx])
hidden_states[start_idx:end_idx] = attn_slice
# done
hidden_states = attn.batch_to_head_dim(hidden_states)
# linear proj
hidden_states = attn.to_out[0](hidden_states)
# dropout
hidden_states = attn.to_out[1](hidden_states)
return hidden_states
class SwapCrossAttnProcessor(SlicedSwapCrossAttnProcesser):
def __init__(self):
super(SwapCrossAttnProcessor, self).__init__(slice_size=int(1e9)) # massive slice size = don't slice

View File

@ -0,0 +1,198 @@
from typing import Optional
import torch
import torch.nn.functional as F
from diffusers.models.attention_processor import Attention, AttnProcessor2_0
from invokeai.backend.ip_adapter.ip_attention_weights import IPAttentionProcessorWeights
from invokeai.backend.stable_diffusion.diffusion.regional_ip_data import RegionalIPData
from invokeai.backend.stable_diffusion.diffusion.regional_prompt_data import RegionalPromptData
class CustomAttnProcessor2_0(AttnProcessor2_0):
"""A custom implementation of AttnProcessor2_0 that supports additional Invoke features.
This implementation is based on
https://github.com/huggingface/diffusers/blame/fcfa270fbd1dc294e2f3a505bae6bcb791d721c3/src/diffusers/models/attention_processor.py#L1204
Supported custom features:
- IP-Adapter
- Regional prompt attention
"""
def __init__(
self,
ip_adapter_weights: Optional[list[IPAttentionProcessorWeights]] = None,
):
"""Initialize a CustomAttnProcessor2_0.
Note: Arguments that are the same for all attention layers are passed to __call__(). Arguments that are
layer-specific are passed to __init__().
Args:
ip_adapter_weights: The IP-Adapter attention weights. ip_adapter_weights[i] contains the attention weights
for the i'th IP-Adapter.
"""
super().__init__()
self._ip_adapter_weights = ip_adapter_weights
def _is_ip_adapter_enabled(self) -> bool:
return self._ip_adapter_weights is not None
def __call__(
self,
attn: Attention,
hidden_states: torch.FloatTensor,
encoder_hidden_states: Optional[torch.FloatTensor] = None,
attention_mask: Optional[torch.FloatTensor] = None,
temb: Optional[torch.FloatTensor] = None,
# For regional prompting:
regional_prompt_data: Optional[RegionalPromptData] = None,
percent_through: Optional[torch.FloatTensor] = None,
# For IP-Adapter:
regional_ip_data: Optional[RegionalIPData] = None,
) -> torch.FloatTensor:
"""Apply attention.
Args:
regional_prompt_data: The regional prompt data for the current batch. If not None, this will be used to
apply regional prompt masking.
regional_ip_data: The IP-Adapter data for the current batch.
"""
# If true, we are doing cross-attention, if false we are doing self-attention.
is_cross_attention = encoder_hidden_states is not None
# Start unmodified block from AttnProcessor2_0.
# vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
residual = hidden_states
if attn.spatial_norm is not None:
hidden_states = attn.spatial_norm(hidden_states, temb)
input_ndim = hidden_states.ndim
if input_ndim == 4:
batch_size, channel, height, width = hidden_states.shape
hidden_states = hidden_states.view(batch_size, channel, height * width).transpose(1, 2)
batch_size, sequence_length, _ = (
hidden_states.shape if encoder_hidden_states is None else encoder_hidden_states.shape
)
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# End unmodified block from AttnProcessor2_0.
_, query_seq_len, _ = hidden_states.shape
# Handle regional prompt attention masks.
if regional_prompt_data is not None and is_cross_attention:
assert percent_through is not None
prompt_region_attention_mask = regional_prompt_data.get_cross_attn_mask(
query_seq_len=query_seq_len, key_seq_len=sequence_length
)
if attention_mask is None:
attention_mask = prompt_region_attention_mask
else:
attention_mask = prompt_region_attention_mask + attention_mask
# Start unmodified block from AttnProcessor2_0.
# vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
if attention_mask is not None:
attention_mask = attn.prepare_attention_mask(attention_mask, sequence_length, batch_size)
# scaled_dot_product_attention expects attention_mask shape to be
# (batch, heads, source_length, target_length)
attention_mask = attention_mask.view(batch_size, attn.heads, -1, attention_mask.shape[-1])
if attn.group_norm is not None:
hidden_states = attn.group_norm(hidden_states.transpose(1, 2)).transpose(1, 2)
query = attn.to_q(hidden_states)
if encoder_hidden_states is None:
encoder_hidden_states = hidden_states
elif attn.norm_cross:
encoder_hidden_states = attn.norm_encoder_hidden_states(encoder_hidden_states)
key = attn.to_k(encoder_hidden_states)
value = attn.to_v(encoder_hidden_states)
inner_dim = key.shape[-1]
head_dim = inner_dim // attn.heads
query = query.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
key = key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
value = value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
# the output of sdp = (batch, num_heads, seq_len, head_dim)
# TODO: add support for attn.scale when we move to Torch 2.1
hidden_states = F.scaled_dot_product_attention(
query, key, value, attn_mask=attention_mask, dropout_p=0.0, is_causal=False
)
hidden_states = hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
hidden_states = hidden_states.to(query.dtype)
# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
# End unmodified block from AttnProcessor2_0.
# Apply IP-Adapter conditioning.
if is_cross_attention:
if self._is_ip_adapter_enabled():
assert regional_ip_data is not None
ip_masks = regional_ip_data.get_masks(query_seq_len=query_seq_len)
assert (
len(regional_ip_data.image_prompt_embeds)
== len(self._ip_adapter_weights)
== len(regional_ip_data.scales)
== ip_masks.shape[1]
)
for ipa_index, ipa_embed in enumerate(regional_ip_data.image_prompt_embeds):
ipa_weights = self._ip_adapter_weights[ipa_index]
ipa_scale = regional_ip_data.scales[ipa_index]
ip_mask = ip_masks[0, ipa_index, ...]
# The batch dimensions should match.
assert ipa_embed.shape[0] == encoder_hidden_states.shape[0]
# The token_len dimensions should match.
assert ipa_embed.shape[-1] == encoder_hidden_states.shape[-1]
ip_hidden_states = ipa_embed
# Expected ip_hidden_state shape: (batch_size, num_ip_images, ip_seq_len, ip_image_embedding)
ip_key = ipa_weights.to_k_ip(ip_hidden_states)
ip_value = ipa_weights.to_v_ip(ip_hidden_states)
# Expected ip_key and ip_value shape: (batch_size, num_ip_images, ip_seq_len, head_dim * num_heads)
ip_key = ip_key.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
ip_value = ip_value.view(batch_size, -1, attn.heads, head_dim).transpose(1, 2)
# Expected ip_key and ip_value shape: (batch_size, num_heads, num_ip_images * ip_seq_len, head_dim)
# TODO: add support for attn.scale when we move to Torch 2.1
ip_hidden_states = F.scaled_dot_product_attention(
query, ip_key, ip_value, attn_mask=None, dropout_p=0.0, is_causal=False
)
# Expected ip_hidden_states shape: (batch_size, num_heads, query_seq_len, head_dim)
ip_hidden_states = ip_hidden_states.transpose(1, 2).reshape(batch_size, -1, attn.heads * head_dim)
ip_hidden_states = ip_hidden_states.to(query.dtype)
# Expected ip_hidden_states shape: (batch_size, query_seq_len, num_heads * head_dim)
hidden_states = hidden_states + ipa_scale * ip_hidden_states * ip_mask
else:
# If IP-Adapter is not enabled, then regional_ip_data should not be passed in.
assert regional_ip_data is None
# Start unmodified block from AttnProcessor2_0.
# vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
# linear proj
hidden_states = attn.to_out[0](hidden_states)
# dropout
hidden_states = attn.to_out[1](hidden_states)
if input_ndim == 4:
hidden_states = hidden_states.transpose(-1, -2).reshape(batch_size, channel, height, width)
if attn.residual_connection:
hidden_states = hidden_states + residual
hidden_states = hidden_states / attn.rescale_output_factor
return hidden_states
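For orientation, a hedged sketch of how a processor like the one above can be installed on a diffusers UNet and cleanly removed again. It mirrors the context-manager pattern used by the patchers in this PR but is not the actual UNetAttentionPatcher implementation; make_processor is a caller-supplied factory, e.g. lambda: CustomAttnProcessor2_0(ip_adapter_weights=None).

from contextlib import contextmanager


@contextmanager
def swap_attn_processors(unet, make_processor):
    # Install a freshly constructed custom processor on every attention layer of a
    # diffusers UNet, then restore the original processors when the context exits.
    original_processors = unet.attn_processors
    try:
        unet.set_attn_processor({name: make_processor() for name in original_processors})
        yield unet
    finally:
        unet.set_attn_processor(original_processors)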

View File

@ -0,0 +1,72 @@
import torch
class RegionalIPData:
"""A class to manage the data for regional IP-Adapter conditioning."""
def __init__(
self,
image_prompt_embeds: list[torch.Tensor],
scales: list[float],
masks: list[torch.Tensor],
dtype: torch.dtype,
device: torch.device,
max_downscale_factor: int = 8,
):
"""Initialize a `IPAdapterConditioningData` object."""
assert len(image_prompt_embeds) == len(scales) == len(masks)
# The image prompt embeddings.
# image_prompt_embeds[i] contains the image prompt embeddings for the i'th IP-Adapter. Each tensor
# has shape (batch_size, num_ip_images, seq_len, ip_embedding_len).
self.image_prompt_embeds = image_prompt_embeds
# The scales for the IP-Adapter attention.
# scales[i] contains the attention scale for the i'th IP-Adapter.
self.scales = scales
# The IP-Adapter masks.
# self._masks_by_seq_len[s] contains the spatial masks for the downsampling level with query sequence length of
# s. It has shape (batch_size, num_ip_images, query_seq_len, 1). The masks have values of 1.0 for included
# regions and 0.0 for excluded regions.
self._masks_by_seq_len = self._prepare_masks(masks, max_downscale_factor, device, dtype)
def _prepare_masks(
self, masks: list[torch.Tensor], max_downscale_factor: int, device: torch.device, dtype: torch.dtype
) -> dict[int, torch.Tensor]:
"""Prepare the masks for the IP-Adapter attention."""
# Concatenate the masks so that they can be processed more efficiently.
mask_tensor = torch.cat(masks, dim=1)
mask_tensor = mask_tensor.to(device=device, dtype=dtype)
masks_by_seq_len: dict[int, torch.Tensor] = {}
# Downsample the spatial dimensions by factors of 2 until max_downscale_factor is reached.
downscale_factor = 1
while downscale_factor <= max_downscale_factor:
b, num_ip_adapters, h, w = mask_tensor.shape
# Assert that the batch size is 1, because I haven't thought through batch handling for this feature yet.
assert b == 1
# The IP-Adapters are applied in the cross-attention layers, where the query sequence length is the h * w of
# the spatial features.
query_seq_len = h * w
masks_by_seq_len[query_seq_len] = mask_tensor.view((b, num_ip_adapters, -1, 1))
downscale_factor *= 2
if downscale_factor <= max_downscale_factor:
# We use max pooling because we downscale to a pretty low resolution, so we don't want small mask
# regions to be lost entirely.
#
# ceil_mode=True is set to mirror the downsampling behavior of SD and SDXL.
#
# TODO(ryand): In the future, we may want to experiment with other downsampling methods.
mask_tensor = torch.nn.functional.max_pool2d(mask_tensor, kernel_size=2, stride=2, ceil_mode=True)
return masks_by_seq_len
def get_masks(self, query_seq_len: int) -> torch.Tensor:
"""Get the mask for the given query sequence length."""
return self._masks_by_seq_len[query_seq_len]
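A small hedged example of exercising the mask-downsampling above with dummy data; shapes follow the class comments (batch size 1, one IP-Adapter, a 64x64 spatial mask), and the import path is the one used elsewhere in this PR.

import torch

from invokeai.backend.stable_diffusion.diffusion.regional_ip_data import RegionalIPData

# One IP-Adapter with stacked uncond/cond image-prompt embeddings of shape
# (batch_size=1, num_ip_images=2, seq_len=4, embed_dim=768) and a full-coverage mask.
regional_ip = RegionalIPData(
    image_prompt_embeds=[torch.randn(1, 2, 4, 768)],
    scales=[1.0],
    masks=[torch.ones((1, 1, 64, 64))],
    dtype=torch.float32,
    device=torch.device("cpu"),
)

# Masks are pre-flattened per attention resolution; at downscale factor 1 the query
# sequence length is 64 * 64.
assert regional_ip.get_masks(query_seq_len=64 * 64).shape == (1, 1, 64 * 64, 1)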

View File

@ -0,0 +1,105 @@
import torch
import torch.nn.functional as F
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import (
TextConditioningRegions,
)
class RegionalPromptData:
"""A class to manage the prompt data for regional conditioning."""
def __init__(
self,
regions: list[TextConditioningRegions],
device: torch.device,
dtype: torch.dtype,
max_downscale_factor: int = 8,
):
"""Initialize a `RegionalPromptData` object.
Args:
regions (list[TextConditioningRegions]): regions[i] contains the prompt regions for the i'th sample in the
batch.
device (torch.device): The device to use for the attention masks.
dtype (torch.dtype): The data type to use for the attention masks.
max_downscale_factor: Spatial masks will be prepared for downscale factors from 1 to max_downscale_factor
in steps of 2x.
"""
self._regions = regions
self._device = device
self._dtype = dtype
# self._spatial_masks_by_seq_len[b][s] contains the spatial masks for the b'th batch sample with a query
# sequence length of s.
self._spatial_masks_by_seq_len: list[dict[int, torch.Tensor]] = self._prepare_spatial_masks(
regions, max_downscale_factor
)
self._negative_cross_attn_mask_score = -10000.0
def _prepare_spatial_masks(
self, regions: list[TextConditioningRegions], max_downscale_factor: int = 8
) -> list[dict[int, torch.Tensor]]:
"""Prepare the spatial masks for all downscaling factors."""
# batch_sample_masks_by_seq_len[b][s] contains the spatial masks for the b'th batch sample with a query sequence length
# of s.
batch_sample_masks_by_seq_len: list[dict[int, torch.Tensor]] = []
for batch_sample_regions in regions:
batch_sample_masks_by_seq_len.append({})
batch_sample_masks = batch_sample_regions.masks.to(device=self._device, dtype=self._dtype)
# Downsample the spatial dimensions by factors of 2 until max_downscale_factor is reached.
downscale_factor = 1
while downscale_factor <= max_downscale_factor:
b, _num_prompts, h, w = batch_sample_masks.shape
assert b == 1
query_seq_len = h * w
batch_sample_masks_by_seq_len[-1][query_seq_len] = batch_sample_masks
downscale_factor *= 2
if downscale_factor <= max_downscale_factor:
# We use max pooling because we downscale to a pretty low resolution, so we don't want small prompt
# regions to be lost entirely.
#
# ceil_mode=True is set to mirror the downsampling behavior of SD and SDXL.
#
# TODO(ryand): In the future, we may want to experiment with other downsampling methods (e.g.
# nearest interpolation), and could potentially use a weighted mask rather than a binary mask.
batch_sample_masks = F.max_pool2d(batch_sample_masks, kernel_size=2, stride=2, ceil_mode=True)
return batch_sample_masks_by_seq_len
def get_cross_attn_mask(self, query_seq_len: int, key_seq_len: int) -> torch.Tensor:
"""Get the cross-attention mask for the given query sequence length.
Args:
query_seq_len: The length of the flattened spatial features at the current downscaling level.
key_seq_len (int): The sequence length of the prompt embeddings (which act as the key in the cross-attention
layers). This is most likely equal to the max embedding range end, but we pass it explicitly to be sure.
Returns:
torch.Tensor: The cross-attention score mask.
shape: (batch_size, query_seq_len, key_seq_len).
dtype: float
"""
batch_size = len(self._spatial_masks_by_seq_len)
batch_spatial_masks = [self._spatial_masks_by_seq_len[b][query_seq_len] for b in range(batch_size)]
# Create an empty attention mask with the correct shape.
attn_mask = torch.zeros((batch_size, query_seq_len, key_seq_len), dtype=self._dtype, device=self._device)
for batch_idx in range(batch_size):
batch_sample_spatial_masks = batch_spatial_masks[batch_idx]
batch_sample_regions = self._regions[batch_idx]
# Flatten the spatial dimensions of the mask by reshaping to (1, num_prompts, query_seq_len, 1).
_, num_prompts, _, _ = batch_sample_spatial_masks.shape
batch_sample_query_masks = batch_sample_spatial_masks.view((1, num_prompts, query_seq_len, 1))
for prompt_idx, embedding_range in enumerate(batch_sample_regions.ranges):
batch_sample_query_scores = batch_sample_query_masks[0, prompt_idx, :, :].clone()
batch_sample_query_mask = batch_sample_query_scores > 0.5
batch_sample_query_scores[batch_sample_query_mask] = 0.0
batch_sample_query_scores[~batch_sample_query_mask] = self._negative_cross_attn_mask_score
attn_mask[batch_idx, :, embedding_range.start : embedding_range.end] = batch_sample_query_scores
return attn_mask
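And a matching hedged example for the regional prompt masks: one batch sample with two prompt regions over an 8x8 feature map yields a cross-attention score mask of shape (1, 64, 154), with 0.0 where a prompt applies and a large negative score elsewhere (imports as used elsewhere in this PR).

import torch

from invokeai.backend.stable_diffusion.diffusion.conditioning_data import Range, TextConditioningRegions
from invokeai.backend.stable_diffusion.diffusion.regional_prompt_data import RegionalPromptData

masks = torch.zeros((1, 2, 8, 8), dtype=torch.bool)
masks[0, 0, :, :4] = True  # prompt 0: left half
masks[0, 1, :, 4:] = True  # prompt 1: right half
regions = TextConditioningRegions(masks=masks, ranges=[Range(start=0, end=77), Range(start=77, end=154)])

prompt_data = RegionalPromptData(regions=[regions], device=torch.device("cpu"), dtype=torch.float32)
# 8 * 8 = 64 spatial queries attending over 154 prompt tokens.
attn_mask = prompt_data.get_cross_attn_mask(query_seq_len=64, key_seq_len=154)
assert attn_mask.shape == (1, 64, 154)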

View File

@ -1,26 +1,20 @@
from __future__ import annotations
import math
from contextlib import contextmanager
from typing import Any, Callable, Optional, Union
import torch
from diffusers import UNet2DConditionModel
from typing_extensions import TypeAlias
from invokeai.app.services.config.config_default import get_config
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import (
ConditioningData,
ExtraConditioningInfo,
SDXLConditioningInfo,
)
from .cross_attention_control import (
CrossAttentionType,
CrossAttnControlContext,
SwapCrossAttnContext,
setup_cross_attention_control_attention_processors,
IPAdapterData,
Range,
TextConditioningData,
TextConditioningRegions,
)
from invokeai.backend.stable_diffusion.diffusion.regional_ip_data import RegionalIPData
from invokeai.backend.stable_diffusion.diffusion.regional_prompt_data import RegionalPromptData
ModelForwardCallback: TypeAlias = Union[
# x, t, conditioning, Optional[cross-attention kwargs]
@ -58,31 +52,8 @@ class InvokeAIDiffuserComponent:
self.conditioning = None
self.model = model
self.model_forward_callback = model_forward_callback
self.cross_attention_control_context = None
self.sequential_guidance = config.sequential_guidance
@contextmanager
def custom_attention_context(
self,
unet: UNet2DConditionModel,
extra_conditioning_info: Optional[ExtraConditioningInfo],
):
old_attn_processors = unet.attn_processors
try:
self.cross_attention_control_context = CrossAttnControlContext(
arguments=extra_conditioning_info.cross_attention_control_args,
)
setup_cross_attention_control_attention_processors(
unet,
self.cross_attention_control_context,
)
yield None
finally:
self.cross_attention_control_context = None
unet.set_attn_processor(old_attn_processors)
def do_controlnet_step(
self,
control_data,
@ -90,7 +61,7 @@ class InvokeAIDiffuserComponent:
timestep: torch.Tensor,
step_index: int,
total_step_count: int,
conditioning_data,
conditioning_data: TextConditioningData,
):
down_block_res_samples, mid_block_res_sample = None, None
@ -123,28 +94,28 @@ class InvokeAIDiffuserComponent:
added_cond_kwargs = None
if cfg_injection: # only applying ControlNet to conditional instead of in unconditioned
if type(conditioning_data.text_embeddings) is SDXLConditioningInfo:
if conditioning_data.is_sdxl():
added_cond_kwargs = {
"text_embeds": conditioning_data.text_embeddings.pooled_embeds,
"time_ids": conditioning_data.text_embeddings.add_time_ids,
"text_embeds": conditioning_data.cond_text.pooled_embeds,
"time_ids": conditioning_data.cond_text.add_time_ids,
}
encoder_hidden_states = conditioning_data.text_embeddings.embeds
encoder_hidden_states = conditioning_data.cond_text.embeds
encoder_attention_mask = None
else:
if type(conditioning_data.text_embeddings) is SDXLConditioningInfo:
if conditioning_data.is_sdxl():
added_cond_kwargs = {
"text_embeds": torch.cat(
[
# TODO: how to pad? just by zeros? or even truncate?
conditioning_data.unconditioned_embeddings.pooled_embeds,
conditioning_data.text_embeddings.pooled_embeds,
conditioning_data.uncond_text.pooled_embeds,
conditioning_data.cond_text.pooled_embeds,
],
dim=0,
),
"time_ids": torch.cat(
[
conditioning_data.unconditioned_embeddings.add_time_ids,
conditioning_data.text_embeddings.add_time_ids,
conditioning_data.uncond_text.add_time_ids,
conditioning_data.cond_text.add_time_ids,
],
dim=0,
),
@ -153,8 +124,8 @@ class InvokeAIDiffuserComponent:
encoder_hidden_states,
encoder_attention_mask,
) = self._concat_conditionings_for_batch(
conditioning_data.unconditioned_embeddings.embeds,
conditioning_data.text_embeddings.embeds,
conditioning_data.uncond_text.embeds,
conditioning_data.cond_text.embeds,
)
if isinstance(control_datum.weight, list):
# if controlnet has multiple weights, use the weight for the current step
@ -198,24 +169,15 @@ class InvokeAIDiffuserComponent:
self,
sample: torch.Tensor,
timestep: torch.Tensor,
conditioning_data: ConditioningData,
conditioning_data: TextConditioningData,
ip_adapter_data: Optional[list[IPAdapterData]],
step_index: int,
total_step_count: int,
down_block_additional_residuals: Optional[torch.Tensor] = None, # for ControlNet
mid_block_additional_residual: Optional[torch.Tensor] = None, # for ControlNet
down_intrablock_additional_residuals: Optional[torch.Tensor] = None, # for T2I-Adapter
):
cross_attention_control_types_to_do = []
if self.cross_attention_control_context is not None:
percent_through = step_index / total_step_count
cross_attention_control_types_to_do = (
self.cross_attention_control_context.get_active_cross_attention_control_types_for_step(percent_through)
)
wants_cross_attention_control = len(cross_attention_control_types_to_do) > 0
if wants_cross_attention_control or self.sequential_guidance:
# If wants_cross_attention_control is True, we force the sequential mode to be used, because cross-attention
# control is currently only supported in sequential mode.
if self.sequential_guidance:
(
unconditioned_next_x,
conditioned_next_x,
@ -223,7 +185,9 @@ class InvokeAIDiffuserComponent:
x=sample,
sigma=timestep,
conditioning_data=conditioning_data,
cross_attention_control_types_to_do=cross_attention_control_types_to_do,
ip_adapter_data=ip_adapter_data,
step_index=step_index,
total_step_count=total_step_count,
down_block_additional_residuals=down_block_additional_residuals,
mid_block_additional_residual=mid_block_additional_residual,
down_intrablock_additional_residuals=down_intrablock_additional_residuals,
@ -236,6 +200,9 @@ class InvokeAIDiffuserComponent:
x=sample,
sigma=timestep,
conditioning_data=conditioning_data,
ip_adapter_data=ip_adapter_data,
step_index=step_index,
total_step_count=total_step_count,
down_block_additional_residuals=down_block_additional_residuals,
mid_block_additional_residual=mid_block_additional_residual,
down_intrablock_additional_residuals=down_intrablock_additional_residuals,
@ -294,53 +261,84 @@ class InvokeAIDiffuserComponent:
def _apply_standard_conditioning(
self,
x,
sigma,
conditioning_data: ConditioningData,
x: torch.Tensor,
sigma: torch.Tensor,
conditioning_data: TextConditioningData,
ip_adapter_data: Optional[list[IPAdapterData]],
step_index: int,
total_step_count: int,
down_block_additional_residuals: Optional[torch.Tensor] = None, # for ControlNet
mid_block_additional_residual: Optional[torch.Tensor] = None, # for ControlNet
down_intrablock_additional_residuals: Optional[torch.Tensor] = None, # for T2I-Adapter
):
) -> tuple[torch.Tensor, torch.Tensor]:
"""Runs the conditioned and unconditioned UNet forward passes in a single batch for faster inference speed at
the cost of higher memory usage.
"""
x_twice = torch.cat([x] * 2)
sigma_twice = torch.cat([sigma] * 2)
cross_attention_kwargs = None
if conditioning_data.ip_adapter_conditioning is not None:
cross_attention_kwargs = {}
if ip_adapter_data is not None:
ip_adapter_conditioning = [ipa.ip_adapter_conditioning for ipa in ip_adapter_data]
# Note that we 'stack' to produce tensors of shape (batch_size, num_ip_images, seq_len, token_len).
cross_attention_kwargs = {
"ip_adapter_image_prompt_embeds": [
torch.stack(
[ipa_conditioning.uncond_image_prompt_embeds, ipa_conditioning.cond_image_prompt_embeds]
)
for ipa_conditioning in conditioning_data.ip_adapter_conditioning
]
}
image_prompt_embeds = [
torch.stack([ipa_conditioning.uncond_image_prompt_embeds, ipa_conditioning.cond_image_prompt_embeds])
for ipa_conditioning in ip_adapter_conditioning
]
scales = [ipa.scale_for_step(step_index, total_step_count) for ipa in ip_adapter_data]
ip_masks = [ipa.mask for ipa in ip_adapter_data]
regional_ip_data = RegionalIPData(
image_prompt_embeds=image_prompt_embeds, scales=scales, masks=ip_masks, dtype=x.dtype, device=x.device
)
cross_attention_kwargs["regional_ip_data"] = regional_ip_data
added_cond_kwargs = None
if type(conditioning_data.text_embeddings) is SDXLConditioningInfo:
if conditioning_data.is_sdxl():
added_cond_kwargs = {
"text_embeds": torch.cat(
[
# TODO: how to pad? just by zeros? or even truncate?
conditioning_data.unconditioned_embeddings.pooled_embeds,
conditioning_data.text_embeddings.pooled_embeds,
conditioning_data.uncond_text.pooled_embeds,
conditioning_data.cond_text.pooled_embeds,
],
dim=0,
),
"time_ids": torch.cat(
[
conditioning_data.unconditioned_embeddings.add_time_ids,
conditioning_data.text_embeddings.add_time_ids,
conditioning_data.uncond_text.add_time_ids,
conditioning_data.cond_text.add_time_ids,
],
dim=0,
),
}
if conditioning_data.cond_regions is not None or conditioning_data.uncond_regions is not None:
# TODO(ryand): We currently initialize RegionalPromptData for every denoising step. The text conditionings
# and masks are not changing from step-to-step, so this really only needs to be done once. While this seems
# painfully inefficient, the time spent is typically negligible compared to the forward inference pass of
# the UNet. The main reason that this hasn't been moved up to eliminate redundancy is that it is slightly
# awkward to handle both standard conditioning and sequential conditioning further up the stack.
regions = []
for c, r in [
(conditioning_data.uncond_text, conditioning_data.uncond_regions),
(conditioning_data.cond_text, conditioning_data.cond_regions),
]:
if r is None:
# Create a dummy mask and range for text conditioning that doesn't have region masks.
_, _, h, w = x.shape
r = TextConditioningRegions(
masks=torch.ones((1, 1, h, w), dtype=x.dtype),
ranges=[Range(start=0, end=c.embeds.shape[1])],
)
regions.append(r)
cross_attention_kwargs["regional_prompt_data"] = RegionalPromptData(
regions=regions, device=x.device, dtype=x.dtype
)
cross_attention_kwargs["percent_through"] = step_index / total_step_count
both_conditionings, encoder_attention_mask = self._concat_conditionings_for_batch(
conditioning_data.unconditioned_embeddings.embeds, conditioning_data.text_embeddings.embeds
conditioning_data.uncond_text.embeds, conditioning_data.cond_text.embeds
)
both_results = self.model_forward_callback(
x_twice,
@ -360,8 +358,10 @@ class InvokeAIDiffuserComponent:
self,
x: torch.Tensor,
sigma,
conditioning_data: ConditioningData,
cross_attention_control_types_to_do: list[CrossAttentionType],
conditioning_data: TextConditioningData,
ip_adapter_data: Optional[list[IPAdapterData]],
step_index: int,
total_step_count: int,
down_block_additional_residuals: Optional[torch.Tensor] = None, # for ControlNet
mid_block_additional_residual: Optional[torch.Tensor] = None, # for ControlNet
down_intrablock_additional_residuals: Optional[torch.Tensor] = None, # for T2I-Adapter
@ -391,53 +391,48 @@ class InvokeAIDiffuserComponent:
if mid_block_additional_residual is not None:
uncond_mid_block, cond_mid_block = mid_block_additional_residual.chunk(2)
# If cross-attention control is enabled, prepare the SwapCrossAttnContext.
cross_attn_processor_context = None
if self.cross_attention_control_context is not None:
# Note that the SwapCrossAttnContext is initialized with an empty list of cross_attention_types_to_do.
# This list is empty because cross-attention control is not applied in the unconditioned pass. This field
# will be populated before the conditioned pass.
cross_attn_processor_context = SwapCrossAttnContext(
modified_text_embeddings=self.cross_attention_control_context.arguments.edited_conditioning,
index_map=self.cross_attention_control_context.cross_attention_index_map,
mask=self.cross_attention_control_context.cross_attention_mask,
cross_attention_types_to_do=[],
)
#####################
# Unconditioned pass
#####################
cross_attention_kwargs = None
cross_attention_kwargs = {}
# Prepare IP-Adapter cross-attention kwargs for the unconditioned pass.
if conditioning_data.ip_adapter_conditioning is not None:
if ip_adapter_data is not None:
ip_adapter_conditioning = [ipa.ip_adapter_conditioning for ipa in ip_adapter_data]
# Note that we 'unsqueeze' to produce tensors of shape (batch_size=1, num_ip_images, seq_len, token_len).
cross_attention_kwargs = {
"ip_adapter_image_prompt_embeds": [
torch.unsqueeze(ipa_conditioning.uncond_image_prompt_embeds, dim=0)
for ipa_conditioning in conditioning_data.ip_adapter_conditioning
]
}
image_prompt_embeds = [
torch.unsqueeze(ipa_conditioning.uncond_image_prompt_embeds, dim=0)
for ipa_conditioning in ip_adapter_conditioning
]
# Prepare cross-attention control kwargs for the unconditioned pass.
if cross_attn_processor_context is not None:
cross_attention_kwargs = {"swap_cross_attn_context": cross_attn_processor_context}
scales = [ipa.scale_for_step(step_index, total_step_count) for ipa in ip_adapter_data]
ip_masks = [ipa.mask for ipa in ip_adapter_data]
regional_ip_data = RegionalIPData(
image_prompt_embeds=image_prompt_embeds, scales=scales, masks=ip_masks, dtype=x.dtype, device=x.device
)
cross_attention_kwargs["regional_ip_data"] = regional_ip_data
# Prepare SDXL conditioning kwargs for the unconditioned pass.
added_cond_kwargs = None
is_sdxl = type(conditioning_data.text_embeddings) is SDXLConditioningInfo
if is_sdxl:
if conditioning_data.is_sdxl():
added_cond_kwargs = {
"text_embeds": conditioning_data.unconditioned_embeddings.pooled_embeds,
"time_ids": conditioning_data.unconditioned_embeddings.add_time_ids,
"text_embeds": conditioning_data.uncond_text.pooled_embeds,
"time_ids": conditioning_data.uncond_text.add_time_ids,
}
# Prepare prompt regions for the unconditioned pass.
if conditioning_data.uncond_regions is not None:
cross_attention_kwargs["regional_prompt_data"] = RegionalPromptData(
regions=[conditioning_data.uncond_regions], device=x.device, dtype=x.dtype
)
cross_attention_kwargs["percent_through"] = step_index / total_step_count
# Run unconditioned UNet denoising (i.e. negative prompt).
unconditioned_next_x = self.model_forward_callback(
x,
sigma,
conditioning_data.unconditioned_embeddings.embeds,
conditioning_data.uncond_text.embeds,
cross_attention_kwargs=cross_attention_kwargs,
down_block_additional_residuals=uncond_down_block,
mid_block_additional_residual=uncond_mid_block,
@ -449,36 +444,43 @@ class InvokeAIDiffuserComponent:
# Conditioned pass
###################
cross_attention_kwargs = None
cross_attention_kwargs = {}
# Prepare IP-Adapter cross-attention kwargs for the conditioned pass.
if conditioning_data.ip_adapter_conditioning is not None:
if ip_adapter_data is not None:
ip_adapter_conditioning = [ipa.ip_adapter_conditioning for ipa in ip_adapter_data]
# Note that we 'unsqueeze' to produce tensors of shape (batch_size=1, num_ip_images, seq_len, token_len).
cross_attention_kwargs = {
"ip_adapter_image_prompt_embeds": [
torch.unsqueeze(ipa_conditioning.cond_image_prompt_embeds, dim=0)
for ipa_conditioning in conditioning_data.ip_adapter_conditioning
]
}
image_prompt_embeds = [
torch.unsqueeze(ipa_conditioning.cond_image_prompt_embeds, dim=0)
for ipa_conditioning in ip_adapter_conditioning
]
# Prepare cross-attention control kwargs for the conditioned pass.
if cross_attn_processor_context is not None:
cross_attn_processor_context.cross_attention_types_to_do = cross_attention_control_types_to_do
cross_attention_kwargs = {"swap_cross_attn_context": cross_attn_processor_context}
scales = [ipa.scale_for_step(step_index, total_step_count) for ipa in ip_adapter_data]
ip_masks = [ipa.mask for ipa in ip_adapter_data]
regional_ip_data = RegionalIPData(
image_prompt_embeds=image_prompt_embeds, scales=scales, masks=ip_masks, dtype=x.dtype, device=x.device
)
cross_attention_kwargs["regional_ip_data"] = regional_ip_data
# Prepare SDXL conditioning kwargs for the conditioned pass.
added_cond_kwargs = None
if is_sdxl:
if conditioning_data.is_sdxl():
added_cond_kwargs = {
"text_embeds": conditioning_data.text_embeddings.pooled_embeds,
"time_ids": conditioning_data.text_embeddings.add_time_ids,
"text_embeds": conditioning_data.cond_text.pooled_embeds,
"time_ids": conditioning_data.cond_text.add_time_ids,
}
# Prepare prompt regions for the conditioned pass.
if conditioning_data.cond_regions is not None:
cross_attention_kwargs["regional_prompt_data"] = RegionalPromptData(
regions=[conditioning_data.cond_regions], device=x.device, dtype=x.dtype
)
cross_attention_kwargs["percent_through"] = step_index / total_step_count
# Run conditioned UNet denoising (i.e. positive prompt).
conditioned_next_x = self.model_forward_callback(
x,
sigma,
conditioning_data.text_embeddings.embeds,
conditioning_data.cond_text.embeds,
cross_attention_kwargs=cross_attention_kwargs,
down_block_additional_residuals=cond_down_block,
mid_block_additional_residual=cond_mid_block,
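
The batched path in _apply_standard_conditioning above concatenates the unconditioned and conditioned inputs so the UNet runs once per denoising step instead of twice, trading memory for speed. A minimal sketch of that classifier-free guidance pattern, with unet standing in for any callable that returns a noise-prediction tensor (not the InvokeAI model_forward_callback itself):

import torch

def batched_cfg_step(unet, x, sigma, uncond_embeds, cond_embeds, guidance_scale=7.5):
    # Stack the latents and timesteps so both passes share a single forward call.
    x_twice = torch.cat([x] * 2)
    sigma_twice = torch.cat([sigma] * 2)
    both_embeds = torch.cat([uncond_embeds, cond_embeds], dim=0)

    noise_pred = unet(x_twice, sigma_twice, both_embeds)

    # Split the batch back into its unconditioned and conditioned halves.
    uncond_pred, cond_pred = noise_pred.chunk(2)

    # Standard classifier-free guidance combination.
    return uncond_pred + guidance_scale * (cond_pred - uncond_pred)

The sequential path shown earlier in this file performs the same combination with two smaller forward passes, which is why it can vary the cross-attention kwargs (IP-Adapter embeddings, regional prompt data) between the unconditioned and conditioned passes.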

View File

@ -1,52 +1,46 @@
from contextlib import contextmanager
from typing import Optional
from diffusers.models import UNet2DConditionModel
from invokeai.backend.ip_adapter.attention_processor import AttnProcessor2_0, IPAttnProcessor2_0
from invokeai.backend.ip_adapter.ip_adapter import IPAdapter
from invokeai.backend.stable_diffusion.diffusion.custom_atttention import CustomAttnProcessor2_0
class UNetPatcher:
"""A class that contains multiple IP-Adapters and can apply them to a UNet."""
class UNetAttentionPatcher:
"""A class for patching a UNet with CustomAttnProcessor2_0 attention layers."""
def __init__(self, ip_adapters: list[IPAdapter]):
def __init__(self, ip_adapters: Optional[list[IPAdapter]]):
self._ip_adapters = ip_adapters
self._scales = [1.0] * len(self._ip_adapters)
def set_scale(self, idx: int, value: float):
self._scales[idx] = value
def _prepare_attention_processors(self, unet: UNet2DConditionModel):
"""Prepare a dict of attention processors that can be injected into a unet, and load the IP-Adapter attention
weights into them.
weights into them (if IP-Adapters are being applied).
Note that the `unet` param is only used to determine attention block dimensions and naming.
"""
# Construct a dict of attention processors based on the UNet's architecture.
attn_procs = {}
for idx, name in enumerate(unet.attn_processors.keys()):
if name.endswith("attn1.processor"):
attn_procs[name] = AttnProcessor2_0()
if name.endswith("attn1.processor") or self._ip_adapters is None:
# "attn1" processors do not use IP-Adapters.
attn_procs[name] = CustomAttnProcessor2_0()
else:
# Collect the weights from each IP Adapter for the idx'th attention processor.
attn_procs[name] = IPAttnProcessor2_0(
attn_procs[name] = CustomAttnProcessor2_0(
[ip_adapter.attn_weights.get_attention_processor_weights(idx) for ip_adapter in self._ip_adapters],
self._scales,
)
return attn_procs
@contextmanager
def apply_ip_adapter_attention(self, unet: UNet2DConditionModel):
"""A context manager that patches `unet` with IP-Adapter attention processors."""
"""A context manager that patches `unet` with CustomAttnProcessor2_0 attention layers."""
attn_procs = self._prepare_attention_processors(unet)
orig_attn_processors = unet.attn_processors
try:
# Note to future devs: set_attn_processor(...) does something slightly unexpected - it pops elements from the
# passed dict. So, if you wanted to keep the dict for future use, you'd have to make a moderately-shallow copy
# of it. E.g. `attn_procs_copy = {k: v for k, v in attn_procs.items()}`.
# Note to future devs: set_attn_processor(...) does something slightly unexpected - it pops elements from
# the passed dict. So, if you wanted to keep the dict for future use, you'd have to make a
# moderately-shallow copy of it. E.g. `attn_procs_copy = {k: v for k, v in attn_procs.items()}`.
unet.set_attn_processor(attn_procs)
yield None
finally:
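
apply_ip_adapter_attention relies on a swap-and-restore pattern: remember the UNet's current attention processors, install the custom ones, and restore the originals when the context exits, even if an error occurs. A generic sketch of that pattern, using the diffusers UNet2DConditionModel API but a hypothetical new_procs dict:

from contextlib import contextmanager

from diffusers.models import UNet2DConditionModel


@contextmanager
def swap_attn_processors(unet: UNet2DConditionModel, new_procs: dict):
    # Keep a reference to the original processors so they can be restored.
    orig_procs = unet.attn_processors
    try:
        # set_attn_processor may pop elements from the dict it is given, so hand it
        # a shallow copy if new_procs needs to be reused after patching.
        unet.set_attn_processor(dict(new_procs))
        yield
    finally:
        unet.set_attn_processor(orig_procs)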

View File

@ -6,8 +6,7 @@ from typing import Literal, Optional, Union
import torch
from torch import autocast
from invokeai.app.services.config import InvokeAIAppConfig
from invokeai.app.services.config.config_default import get_config
from invokeai.app.services.config.config_default import PRECISION, get_config
CPU_DEVICE = torch.device("cpu")
CUDA_DEVICE = torch.device("cuda")
@ -33,35 +32,34 @@ def get_torch_device_name() -> str:
return torch.cuda.get_device_name(device) if device.type == "cuda" else device.type.upper()
# We are in transition here from using a single global AppConfig to allowing multiple
# configurations. It is strongly recommended to pass the app_config to this function.
def choose_precision(
device: torch.device, app_config: Optional[InvokeAIAppConfig] = None
) -> Literal["float32", "float16", "bfloat16"]:
def choose_precision(device: torch.device) -> Literal["float32", "float16", "bfloat16"]:
"""Return an appropriate precision for the given torch device."""
app_config = app_config or get_config()
app_config = get_config()
if device.type == "cuda":
device_name = torch.cuda.get_device_name(device)
if not ("GeForce GTX 1660" in device_name or "GeForce GTX 1650" in device_name):
if app_config.precision == "float32":
return "float32"
elif app_config.precision == "bfloat16":
return "bfloat16"
else:
return "float16"
if "GeForce GTX 1660" in device_name or "GeForce GTX 1650" in device_name:
# These GPUs have limited support for float16
return "float32"
elif app_config.precision == "auto" or app_config.precision == "autocast":
# Default to float16 for CUDA devices
return "float16"
else:
# Use the user-defined precision
return app_config.precision
elif device.type == "mps":
return "float16"
if app_config.precision == "auto" or app_config.precision == "autocast":
# Default to float16 for MPS devices
return "float16"
else:
# Use the user-defined precision
return app_config.precision
# CPU / safe fallback
return "float32"
# We are in transition here from using a single global AppConfig to allowing multiple
# configurations. It is strongly recommended to pass the app_config to this function.
def torch_dtype(
device: Optional[torch.device] = None,
app_config: Optional[InvokeAIAppConfig] = None,
) -> torch.dtype:
def torch_dtype(device: Optional[torch.device] = None) -> torch.dtype:
device = device or choose_torch_device()
precision = choose_precision(device, app_config)
precision = choose_precision(device)
if precision == "float16":
return torch.float16
if precision == "bfloat16":
@ -71,7 +69,7 @@ def torch_dtype(
return torch.float32
def choose_autocast(precision):
def choose_autocast(precision: PRECISION):
"""Returns an autocast context or nullcontext for the given precision string"""
# float16 currently requires autocast to avoid errors like:
# 'expected scalar type Half but found Float'
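
With these changes the precision rules reduce to: GTX 1650/1660 cards always get float32, other CUDA devices and MPS default to float16 when the configured precision is "auto" or "autocast" (otherwise the user's explicit setting wins), and CPU falls back to float32. A small usage sketch, assuming these helpers remain importable from invokeai.backend.util.devices:

import torch

from invokeai.backend.util.devices import choose_precision, choose_torch_device, torch_dtype

device = choose_torch_device()        # e.g. torch.device("cuda") on a CUDA machine
precision = choose_precision(device)  # "float32", "float16", or "bfloat16" per the rules above
dtype = torch_dtype(device)           # the matching torch.dtype

# Allocate latents in the precision the rest of the pipeline will use.
latents = torch.zeros(1, 4, 64, 64, device=device, dtype=dtype)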

View File

@ -0,0 +1,53 @@
import torch
def to_standard_mask_dim(mask: torch.Tensor) -> torch.Tensor:
"""Standardize the dimensions of a mask tensor.
Args:
mask (torch.Tensor): A mask tensor. The shape can be (1, h, w) or (h, w).
Returns:
torch.Tensor: The output mask tensor. The shape is (1, h, w).
"""
# Standardize the mask shape to (1, h, w).
if mask.ndim == 2:
mask = mask.unsqueeze(0)
elif mask.ndim == 3 and mask.shape[0] == 1:
pass
else:
raise ValueError(f"Unsupported mask shape: {mask.shape}. Expected (1, h, w) or (h, w).")
return mask
def to_standard_float_mask(mask: torch.Tensor, out_dtype: torch.dtype) -> torch.Tensor:
"""Standardize the format of a mask tensor.
Args:
mask (torch.Tensor): A mask tensor. The dtype can be any bool, float, or int type. The shape must be (1, h, w)
or (h, w).
out_dtype (torch.dtype): The dtype of the output mask tensor. Must be a float type.
Returns:
torch.Tensor: The output mask tensor. The dtype is out_dtype. The shape is (1, h, w). All values are either 0.0
or 1.0.
"""
if not out_dtype.is_floating_point:
raise ValueError(f"out_dtype must be a float type, but got {out_dtype}")
mask = to_standard_mask_dim(mask)
# Set masked regions to 1.0 and unmasked regions to 0.0.
if mask.dtype == torch.bool:
mask = mask.to(out_dtype)
else:
mask = mask.to(out_dtype)
mask_region = mask > 0.5
mask[mask_region] = 1.0
mask[~mask_region] = 0.0
return mask
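
Together the two helpers accept a mask supplied as (h, w) or (1, h, w) in bool, int, or float form and return a (1, h, w) float mask whose values are exactly 0.0 or 1.0. A short usage sketch:

import torch

# A boolean mask with a rectangular masked region.
bool_mask = torch.zeros(64, 64, dtype=torch.bool)
bool_mask[16:48, 16:48] = True

float_mask = to_standard_float_mask(bool_mask, out_dtype=torch.float16)
assert float_mask.shape == (1, 64, 64)
assert set(float_mask.unique().tolist()) <= {0.0, 1.0}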

View File

@ -8,7 +8,7 @@
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="0">
<title>Invoke - Community Edition</title>
<link rel="icon" type="icon" href="assets/images/invoke-favicon.svg" />
<link id="invoke-favicon" rel="icon" type="icon" href="assets/images/invoke-favicon.svg" />
<style>
html,
body {
@ -23,4 +23,4 @@
<script type="module" src="/src/main.tsx"></script>
</body>
</html>
</html>

View File

@ -1,6 +1,7 @@
import type { KnipConfig } from 'knip';
const config: KnipConfig = {
project: ['src/**/*.{ts,tsx}!'],
ignore: [
// This file is only used during debugging
'src/app/store/middleware/debugLoggerMiddleware.ts',
@ -10,6 +11,9 @@ const config: KnipConfig = {
'src/features/nodes/types/v2/**',
],
ignoreBinaries: ['only-allow'],
paths: {
'public/*': ['public/*'],
},
};
export default config;

View File

@ -24,7 +24,7 @@
"build": "pnpm run lint && vite build",
"typegen": "node scripts/typegen.js",
"preview": "vite preview",
"lint:knip": "knip --tags=-@knipignore",
"lint:knip": "knip",
"lint:dpdm": "dpdm --no-warning --no-tree --transform --exit-code circular:1 src/main.tsx",
"lint:eslint": "eslint --max-warnings=0 .",
"lint:prettier": "prettier --check .",
@ -52,6 +52,7 @@
},
"dependencies": {
"@chakra-ui/react-use-size": "^2.1.0",
"@dagrejs/dagre": "^1.1.1",
"@dagrejs/graphlib": "^2.2.1",
"@dnd-kit/core": "^6.1.0",
"@dnd-kit/sortable": "^8.0.0",

View File

@ -11,6 +11,9 @@ dependencies:
'@chakra-ui/react-use-size':
specifier: ^2.1.0
version: 2.1.0(react@18.2.0)
'@dagrejs/dagre':
specifier: ^1.1.1
version: 1.1.1
'@dagrejs/graphlib':
specifier: ^2.2.1
version: 2.2.1
@ -3092,6 +3095,12 @@ packages:
dev: true
optional: true
/@dagrejs/dagre@1.1.1:
resolution: {integrity: sha512-AQfT6pffEuPE32weFzhS/u3UpX+bRXUARIXL7UqLaxz497cN8pjuBlX6axO4IIECE2gBV8eLFQkGCtKX5sDaUA==}
dependencies:
'@dagrejs/graphlib': 2.2.1
dev: false
/@dagrejs/graphlib@2.2.1:
resolution: {integrity: sha512-xJsN1v6OAxXk6jmNdM+OS/bBE8nDCwM0yDNprXR18ZNatL6to9ggod9+l2XtiLhXfLm0NkE7+Er/cpdlM+SkUA==}
engines: {node: '>17.0.0'}

View File

@ -0,0 +1,5 @@
<svg width="16" height="16" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg">
<rect width="16" height="16" rx="2" fill="#E6FD13"/>
<path d="M9.61889 5.45H12.5V3.5H3.5V5.45H6.38111L9.61889 10.55H12.5V12.5H3.5V10.55H6.38111" stroke="black"/>
<circle cx="12" cy="4" r="3" fill="#f5480c" stroke="#0d1117" stroke-width="1"/>
</svg>


View File

@ -291,7 +291,6 @@
"canvasMerged": "تم دمج الخط",
"sentToImageToImage": "تم إرسال إلى صورة إلى صورة",
"sentToUnifiedCanvas": "تم إرسال إلى لوحة موحدة",
"parametersSet": "تم تعيين المعلمات",
"parametersNotSet": "لم يتم تعيين المعلمات",
"metadataLoadFailed": "فشل تحميل البيانات الوصفية"
},

View File

@ -480,7 +480,6 @@
"canvasMerged": "Leinwand zusammengeführt",
"sentToImageToImage": "Gesendet an Bild zu Bild",
"sentToUnifiedCanvas": "Gesendet an Leinwand",
"parametersSet": "Parameter festlegen",
"parametersNotSet": "Parameter nicht festgelegt",
"metadataLoadFailed": "Metadaten konnten nicht geladen werden",
"setCanvasInitialImage": "Ausgangsbild setzen",

View File

@ -326,7 +326,8 @@
"drop": "Drop",
"dropOrUpload": "$t(gallery.drop) or Upload",
"dropToUpload": "$t(gallery.drop) to Upload",
"deleteImage": "Delete Image",
"deleteImage_one": "Delete Image",
"deleteImage_other": "Delete {{count}} Images",
"deleteImageBin": "Deleted images will be sent to your operating system's Bin.",
"deleteImagePermanent": "Deleted images cannot be restored.",
"download": "Download",
@ -849,6 +850,7 @@
"version": "Version",
"versionUnknown": " Version Unknown",
"workflow": "Workflow",
"graph": "Graph",
"workflowAuthor": "Author",
"workflowContact": "Contact",
"workflowDescription": "Short Description",
@ -1041,10 +1043,10 @@
"metadataLoadFailed": "Failed to load metadata",
"modelAddedSimple": "Model Added to Queue",
"modelImportCanceled": "Model Import Canceled",
"parameters": "Parameters",
"parameterNotSet": "{{parameter}} not set",
"parameterSet": "{{parameter}} set",
"parametersNotSet": "Parameters Not Set",
"parametersSet": "Parameters Set",
"problemCopyingCanvas": "Problem Copying Canvas",
"problemCopyingCanvasDesc": "Unable to export base layer",
"problemCopyingImage": "Unable to Copy Image",
@ -1423,6 +1425,7 @@
"eraseBoundingBox": "Erase Bounding Box",
"eraser": "Eraser",
"fillBoundingBox": "Fill Bounding Box",
"hideBoundingBox": "Hide Bounding Box",
"initialFitImageSize": "Fit Image Size on Drop",
"invertBrushSizeScrollDirection": "Invert Scroll for Brush Size",
"layer": "Layer",
@ -1440,6 +1443,7 @@
"saveMask": "Save $t(unifiedCanvas.mask)",
"saveToGallery": "Save To Gallery",
"scaledBoundingBox": "Scaled Bounding Box",
"showBoundingBox": "Show Bounding Box",
"showCanvasDebugInfo": "Show Additional Canvas Info",
"showGrid": "Show Grid",
"showResultsOn": "Show Results (On)",
@ -1482,7 +1486,11 @@
"workflowName": "Workflow Name",
"newWorkflowCreated": "New Workflow Created",
"workflowCleared": "Workflow Cleared",
"workflowEditorMenu": "Workflow Editor Menu"
"workflowEditorMenu": "Workflow Editor Menu",
"loadFromGraph": "Load Workflow from Graph",
"convertGraph": "Convert Graph",
"loadWorkflow": "$t(common.load) Workflow",
"autoLayout": "Auto Layout"
},
"app": {
"storeNotInitialized": "Store is not initialized"

View File

@ -363,7 +363,6 @@
"canvasMerged": "Lienzo consolidado",
"sentToImageToImage": "Enviar hacia Imagen a Imagen",
"sentToUnifiedCanvas": "Enviar hacia Lienzo Consolidado",
"parametersSet": "Parámetros establecidos",
"parametersNotSet": "Parámetros no establecidos",
"metadataLoadFailed": "Error al cargar metadatos",
"serverError": "Error en el servidor",

View File

@ -298,7 +298,6 @@
"canvasMerged": "Canvas fusionné",
"sentToImageToImage": "Envoyé à Image à Image",
"sentToUnifiedCanvas": "Envoyé à Canvas unifié",
"parametersSet": "Paramètres définis",
"parametersNotSet": "Paramètres non définis",
"metadataLoadFailed": "Échec du chargement des métadonnées"
},

View File

@ -306,7 +306,6 @@
"canvasMerged": "קנבס מוזג",
"sentToImageToImage": "נשלח לתמונה לתמונה",
"sentToUnifiedCanvas": "נשלח אל קנבס מאוחד",
"parametersSet": "הגדרת פרמטרים",
"parametersNotSet": "פרמטרים לא הוגדרו",
"metadataLoadFailed": "טעינת מטא-נתונים נכשלה"
},

View File

@ -444,7 +444,8 @@
"hfTokenInvalidErrorMessage2": "Aggiornalo in ",
"main": "Principali",
"noModelsInstalledDesc1": "Installa i modelli con",
"ipAdapters": "Adattatori IP"
"ipAdapters": "Adattatori IP",
"noMatchingModels": "Nessun modello corrispondente"
},
"parameters": {
"images": "Immagini",
@ -526,7 +527,12 @@
"aspect": "Aspetto",
"setToOptimalSizeTooLarge": "$t(parameters.setToOptimalSize) (potrebbe essere troppo grande)",
"remixImage": "Remixa l'immagine",
"coherenceEdgeSize": "Dim. bordo"
"coherenceEdgeSize": "Dim. bordo",
"infillMosaicTileWidth": "Larghezza piastrella",
"infillMosaicMinColor": "Colore minimo",
"infillMosaicMaxColor": "Colore massimo",
"infillMosaicTileHeight": "Altezza piastrella",
"infillColorValue": "Colore di riempimento"
},
"settings": {
"models": "Modelli",
@ -569,7 +575,6 @@
"canvasMerged": "Tela unita",
"sentToImageToImage": "Inviato a Immagine a Immagine",
"sentToUnifiedCanvas": "Inviato a Tela Unificata",
"parametersSet": "Parametri impostati",
"parametersNotSet": "Parametri non impostati",
"metadataLoadFailed": "Impossibile caricare i metadati",
"serverError": "Errore del Server",
@ -621,7 +626,8 @@
"uploadInitialImage": "Carica l'immagine iniziale",
"problemDownloadingImage": "Impossibile scaricare l'immagine",
"prunedQueue": "Coda ripulita",
"modelImportCanceled": "Importazione del modello annullata"
"modelImportCanceled": "Importazione del modello annullata",
"parameters": "Parametri"
},
"tooltip": {
"feature": {
@ -690,7 +696,10 @@
"coherenceModeBoxBlur": "Sfocatura Box",
"coherenceModeStaged": "Maschera espansa",
"invertBrushSizeScrollDirection": "Inverti scorrimento per dimensione pennello",
"discardCurrent": "Scarta l'attuale"
"discardCurrent": "Scarta l'attuale",
"initialFitImageSize": "Adatta dimensione immagine al rilascio",
"hideBoundingBox": "Nascondi il rettangolo di selezione",
"showBoundingBox": "Mostra il rettangolo di selezione"
},
"accessibility": {
"invokeProgressBar": "Barra di avanzamento generazione",
@ -833,7 +842,8 @@
"editMode": "Modifica nell'editor del flusso di lavoro",
"resetToDefaultValue": "Ripristina il valore predefinito",
"noFieldsViewMode": "Questo flusso di lavoro non ha campi selezionati da visualizzare. Visualizza il flusso di lavoro completo per configurare i valori.",
"edit": "Modifica"
"edit": "Modifica",
"graph": "Grafico"
},
"boards": {
"autoAddBoard": "Aggiungi automaticamente bacheca",
@ -1347,13 +1357,13 @@
]
},
"seamlessTilingXAxis": {
"heading": "Asse X di piastrellatura senza cuciture",
"heading": "Piastrella senza giunte sull'asse X",
"paragraphs": [
"Affianca senza soluzione di continuità un'immagine lungo l'asse orizzontale."
]
},
"seamlessTilingYAxis": {
"heading": "Asse Y di piastrellatura senza cuciture",
"heading": "Piastrella senza giunte sull'asse Y",
"paragraphs": [
"Affianca senza soluzione di continuità un'immagine lungo l'asse verticale."
]
@ -1477,7 +1487,11 @@
"name": "Nome",
"updated": "Aggiornato",
"projectWorkflows": "Flussi di lavoro del progetto",
"opened": "Aperto"
"opened": "Aperto",
"convertGraph": "Converti grafico",
"loadWorkflow": "$t(common.load) Flusso di lavoro",
"autoLayout": "Disposizione automatica",
"loadFromGraph": "Carica il flusso di lavoro dal grafico"
},
"app": {
"storeNotInitialized": "Il negozio non è inizializzato"

View File

@ -420,7 +420,6 @@
"canvasMerged": "Canvas samengevoegd",
"sentToImageToImage": "Gestuurd naar Afbeelding naar afbeelding",
"sentToUnifiedCanvas": "Gestuurd naar Centraal canvas",
"parametersSet": "Parameters ingesteld",
"parametersNotSet": "Parameters niet ingesteld",
"metadataLoadFailed": "Fout bij laden metagegevens",
"serverError": "Serverfout",

View File

@ -267,7 +267,6 @@
"canvasMerged": "Scalono widoczne warstwy",
"sentToImageToImage": "Wysłano do Obraz na obraz",
"sentToUnifiedCanvas": "Wysłano do trybu uniwersalnego",
"parametersSet": "Ustawiono parametry",
"parametersNotSet": "Nie ustawiono parametrów",
"metadataLoadFailed": "Błąd wczytywania metadanych"
},

View File

@ -310,7 +310,6 @@
"canvasMerged": "Tela Fundida",
"sentToImageToImage": "Mandar Para Imagem Para Imagem",
"sentToUnifiedCanvas": "Enviada para a Tela Unificada",
"parametersSet": "Parâmetros Definidos",
"parametersNotSet": "Parâmetros Não Definidos",
"metadataLoadFailed": "Falha ao tentar carregar metadados"
},

View File

@ -307,7 +307,6 @@
"canvasMerged": "Tela Fundida",
"sentToImageToImage": "Mandar Para Imagem Para Imagem",
"sentToUnifiedCanvas": "Enviada para a Tela Unificada",
"parametersSet": "Parâmetros Definidos",
"parametersNotSet": "Parâmetros Não Definidos",
"metadataLoadFailed": "Falha ao tentar carregar metadados"
},

View File

@ -448,7 +448,9 @@
"loraModels": "LoRAs",
"main": "Основные",
"noModelsInstalled": "Нет установленных моделей",
"noModelsInstalledDesc1": "Установите модели с помощью"
"noModelsInstalledDesc1": "Установите модели с помощью",
"noMatchingModels": "Нет подходящих моделей",
"ipAdapters": "IP адаптеры"
},
"parameters": {
"images": "Изображения",
@ -532,7 +534,12 @@
"lockAspectRatio": "Заблокировать соотношение",
"remixImage": "Ремикс изображения",
"coherenceMinDenoise": "Мин. шумоподавление",
"coherenceEdgeSize": "Размер края"
"coherenceEdgeSize": "Размер края",
"infillMosaicTileWidth": "Ширина плиток",
"infillMosaicTileHeight": "Высота плиток",
"infillMosaicMinColor": "Мин цвет",
"infillMosaicMaxColor": "Макс цвет",
"infillColorValue": "Цвет заливки"
},
"settings": {
"models": "Модели",
@ -575,7 +582,6 @@
"canvasMerged": "Холст объединен",
"sentToImageToImage": "Отправить в img2img",
"sentToUnifiedCanvas": "Отправлено на Единый холст",
"parametersSet": "Параметры заданы",
"parametersNotSet": "Параметры не заданы",
"metadataLoadFailed": "Не удалось загрузить метаданные",
"serverError": "Ошибка сервера",
@ -627,7 +633,8 @@
"uploadInitialImage": "Загрузить начальное изображение",
"resetInitialImage": "Сбросить начальное изображение",
"prunedQueue": "Урезанная очередь",
"modelImportCanceled": "Импорт модели отменен"
"modelImportCanceled": "Импорт модели отменен",
"parameters": "Параметры"
},
"tooltip": {
"feature": {
@ -696,7 +703,8 @@
"coherenceModeGaussianBlur": "Размытие по Гауссу",
"coherenceModeBoxBlur": "коробчатое размытие",
"discardCurrent": "Отбросить текущее",
"invertBrushSizeScrollDirection": "Инвертировать прокрутку для размера кисти"
"invertBrushSizeScrollDirection": "Инвертировать прокрутку для размера кисти",
"initialFitImageSize": "Подогнать размер изображения при перебросе"
},
"accessibility": {
"uploadImage": "Загрузить изображение",
@ -922,7 +930,8 @@
"modelSize": "Размер модели",
"small": "Маленький",
"body": "Тело",
"hands": "Руки"
"hands": "Руки",
"selectCLIPVisionModel": "Выбрать модель CLIP Vision"
},
"boards": {
"autoAddBoard": "Авто добавление Доски",

View File

@ -315,7 +315,6 @@
"canvasMerged": "Полотно об'єднане",
"sentToImageToImage": "Надіслати до img2img",
"sentToUnifiedCanvas": "Надіслати на полотно",
"parametersSet": "Параметри задані",
"parametersNotSet": "Параметри не задані",
"metadataLoadFailed": "Не вдалося завантажити метадані",
"serverError": "Помилка сервера",

View File

@ -65,7 +65,12 @@
"nextPage": "下一页",
"saveAs": "保存为",
"ai": "ai",
"or": "或"
"or": "或",
"aboutDesc": "使用 Invoke 工作?查看:",
"add": "添加",
"loglevel": "日志级别",
"copy": "复制",
"localSystem": "本地系统"
},
"gallery": {
"galleryImageSize": "预览大小",
@ -487,7 +492,6 @@
"canvasMerged": "画布已合并",
"sentToImageToImage": "已发送到图生图",
"sentToUnifiedCanvas": "已发送到统一画布",
"parametersSet": "参数已设定",
"parametersNotSet": "参数未设定",
"metadataLoadFailed": "加载元数据失败",
"uploadFailedInvalidUploadDesc": "必须是单张的 PNG 或 JPEG 图片",
@ -600,7 +604,8 @@
"loadMore": "加载更多",
"mode": "模式",
"resetUI": "$t(accessibility.reset) UI",
"createIssue": "创建问题"
"createIssue": "创建问题",
"about": "关于"
},
"tooltip": {
"feature": {
@ -1202,7 +1207,16 @@
"workflows": "工作流",
"noDescription": "无描述",
"uploadWorkflow": "从文件中加载",
"newWorkflowCreated": "已创建新的工作流"
"newWorkflowCreated": "已创建新的工作流",
"name": "名称",
"defaultWorkflows": "默认工作流",
"created": "已创建",
"ascending": "升序",
"descending": "降序",
"updated": "已更新",
"userWorkflows": "我的工作流",
"projectWorkflows": "项目工作流",
"opened": "已打开"
},
"app": {
"storeNotInitialized": "商店尚未初始化"
@ -1220,7 +1234,8 @@
"title": "生成"
},
"advanced": {
"title": "高级"
"title": "高级",
"options": "$t(accordions.advanced.title) 选项"
},
"image": {
"title": "图像"

View File

@ -1,5 +1,6 @@
import { Box, useGlobalModifiersInit } from '@invoke-ai/ui-library';
import { useSocketIO } from 'app/hooks/useSocketIO';
import { useSyncQueueStatus } from 'app/hooks/useSyncQueueStatus';
import { useLogger } from 'app/logging/useLogger';
import { appStarted } from 'app/store/middleware/listenerMiddleware/listeners/appStarted';
import { useAppDispatch, useAppSelector } from 'app/store/storeHooks';
@ -70,6 +71,7 @@ const App = ({ config = DEFAULT_CONFIG, selectedImage }: Props) => {
}, [dispatch]);
useStarterModelsToast();
useSyncQueueStatus();
return (
<ErrorBoundary onReset={handleReset} FallbackComponent={AppErrorBoundaryFallback}>

View File

@ -0,0 +1,25 @@
import { useEffect } from 'react';
import { useGetQueueStatusQuery } from 'services/api/endpoints/queue';
const baseTitle = document.title;
const invokeLogoSVG = 'assets/images/invoke-favicon.svg';
const invokeAlertLogoSVG = 'assets/images/invoke-alert-favicon.svg';
/**
* This hook synchronizes the queue status with the page's title and favicon.
* It should be considered a singleton and only used once in the component tree.
*/
export const useSyncQueueStatus = () => {
const { queueSize } = useGetQueueStatusQuery(undefined, {
selectFromResult: (res) => ({
queueSize: res.data ? res.data.queue.pending + res.data.queue.in_progress : 0,
}),
});
useEffect(() => {
document.title = queueSize > 0 ? `(${queueSize}) ${baseTitle}` : baseTitle;
const faviconEl = document.getElementById('invoke-favicon');
if (faviconEl instanceof HTMLLinkElement) {
faviconEl.href = queueSize > 0 ? invokeAlertLogoSVG : invokeLogoSVG;
}
}, [queueSize]);
};

View File

@ -1,12 +1,18 @@
import { isAnyOf } from '@reduxjs/toolkit';
import { logger } from 'app/logging/logger';
import type { AppStartListening } from 'app/store/middleware/listenerMiddleware';
import { canvasBatchIdsReset, commitStagingAreaImage, discardStagedImages } from 'features/canvas/store/canvasSlice';
import {
canvasBatchIdsReset,
commitStagingAreaImage,
discardStagedImages,
resetCanvas,
setInitialCanvasImage,
} from 'features/canvas/store/canvasSlice';
import { addToast } from 'features/system/store/systemSlice';
import { t } from 'i18next';
import { queueApi } from 'services/api/endpoints/queue';
const matcher = isAnyOf(commitStagingAreaImage, discardStagedImages);
const matcher = isAnyOf(commitStagingAreaImage, discardStagedImages, resetCanvas, setInitialCanvasImage);
export const addCommitStagingAreaImageListener = (startAppListening: AppStartListening) => {
startAppListening({

View File

@ -1,5 +1,4 @@
import { Flex, Image, Spinner } from '@invoke-ai/ui-library';
/** @knipignore */
import InvokeLogoWhite from 'public/assets/images/invoke-symbol-wht-lrg.svg';
import { memo } from 'react';

View File

@ -49,14 +49,20 @@ const selector = createMemoizedSelector(selectCanvasSlice, (canvas) => {
const ClearStagingIntermediatesIconButton = () => {
const dispatch = useAppDispatch();
const { t } = useTranslation();
const totalStagedImages = useAppSelector((s) => s.canvas.layerState.stagingArea.images.length);
const handleDiscardStagingArea = useCallback(() => {
dispatch(discardStagedImages());
}, [dispatch]);
const handleDiscardStagingImage = useCallback(() => {
dispatch(discardStagedImage());
}, [dispatch]);
// Discarding all staged images triggers cancelation of all canvas batches. It's too easy to accidentally
// click the discard button, so to prevent accidental cancelation of all batches, we only discard the current
// image if there are more than one staged images.
if (totalStagedImages > 1) {
dispatch(discardStagedImage());
}
}, [dispatch, totalStagedImages]);
return (
<>
@ -67,6 +73,7 @@ const ClearStagingIntermediatesIconButton = () => {
onClick={handleDiscardStagingImage}
colorScheme="invokeBlue"
fontSize={16}
isDisabled={totalStagedImages <= 1}
/>
<IconButton
tooltip={`${t('unifiedCanvas.discardAll')} (Esc)`}

View File

@ -13,7 +13,13 @@ import {
} from 'features/canvas/store/actions';
import { $canvasBaseLayer, $tool } from 'features/canvas/store/canvasNanostore';
import { isStagingSelector } from 'features/canvas/store/canvasSelectors';
import { resetCanvas, resetCanvasView, setIsMaskEnabled, setLayer } from 'features/canvas/store/canvasSlice';
import {
resetCanvas,
resetCanvasView,
setIsMaskEnabled,
setLayer,
setShouldShowBoundingBox,
} from 'features/canvas/store/canvasSlice';
import type { CanvasLayer } from 'features/canvas/store/canvasTypes';
import { LAYER_NAMES_DICT } from 'features/canvas/store/canvasTypes';
import { memo, useCallback, useMemo } from 'react';
@ -23,6 +29,8 @@ import {
PiCopyBold,
PiCrosshairSimpleBold,
PiDownloadSimpleBold,
PiEyeBold,
PiEyeSlashBold,
PiFloppyDiskBold,
PiHandGrabbingBold,
PiStackBold,
@ -44,6 +52,7 @@ const IAICanvasToolbar = () => {
const isStaging = useAppSelector(isStagingSelector);
const { t } = useTranslation();
const { isClipboardAPIAvailable } = useCopyImageToClipboard();
const shouldShowBoundingBox = useAppSelector((s) => s.canvas.shouldShowBoundingBox);
const { getUploadButtonProps, getUploadInputProps } = useImageUploadButton({
postUploadAction: { type: 'SET_CANVAS_INITIAL_IMAGE' },
@ -61,6 +70,18 @@ const IAICanvasToolbar = () => {
[]
);
useHotkeys(
'shift+h',
() => {
dispatch(setShouldShowBoundingBox(!shouldShowBoundingBox));
},
{
enabled: () => !isStaging,
preventDefault: true,
},
[shouldShowBoundingBox]
);
useHotkeys(
['r'],
() => {
@ -125,6 +146,10 @@ const IAICanvasToolbar = () => {
$tool.set('move');
}, []);
const handleSetShouldShowBoundingBox = useCallback(() => {
dispatch(setShouldShowBoundingBox(!shouldShowBoundingBox));
}, [dispatch, shouldShowBoundingBox]);
const handleResetCanvasView = useCallback(
(shouldScaleTo1 = false) => {
const canvasBaseLayer = $canvasBaseLayer.get();
@ -212,6 +237,13 @@ const IAICanvasToolbar = () => {
isChecked={tool === 'move' || isStaging}
onClick={handleSelectMoveTool}
/>
<IconButton
aria-label={`${shouldShowBoundingBox ? t('unifiedCanvas.hideBoundingBox') : t('unifiedCanvas.showBoundingBox')} (Shift + H)`}
tooltip={`${shouldShowBoundingBox ? t('unifiedCanvas.hideBoundingBox') : t('unifiedCanvas.showBoundingBox')} (Shift + H)`}
icon={shouldShowBoundingBox ? <PiEyeBold /> : <PiEyeSlashBold />}
onClick={handleSetShouldShowBoundingBox}
isDisabled={isStaging}
/>
<IconButton
aria-label={`${t('unifiedCanvas.resetView')} (R)`}
tooltip={`${t('unifiedCanvas.resetView')} (R)`}

View File

@ -7,12 +7,7 @@ import {
resetToolInteractionState,
} from 'features/canvas/store/canvasNanostore';
import { isStagingSelector } from 'features/canvas/store/canvasSelectors';
import {
clearMask,
setIsMaskEnabled,
setShouldShowBoundingBox,
setShouldSnapToGrid,
} from 'features/canvas/store/canvasSlice';
import { clearMask, setIsMaskEnabled, setShouldSnapToGrid } from 'features/canvas/store/canvasSlice';
import { isInteractiveTarget } from 'features/canvas/util/isInteractiveTarget';
import { activeTabNameSelector } from 'features/ui/store/uiSelectors';
import { useCallback, useEffect } from 'react';
@ -21,7 +16,6 @@ import { useHotkeys } from 'react-hotkeys-hook';
const useInpaintingCanvasHotkeys = () => {
const dispatch = useAppDispatch();
const activeTabName = useAppSelector(activeTabNameSelector);
const shouldShowBoundingBox = useAppSelector((s) => s.canvas.shouldShowBoundingBox);
const isStaging = useAppSelector(isStagingSelector);
const isMaskEnabled = useAppSelector((s) => s.canvas.isMaskEnabled);
const shouldSnapToGrid = useAppSelector((s) => s.canvas.shouldSnapToGrid);
@ -79,18 +73,6 @@ const useInpaintingCanvasHotkeys = () => {
}
);
useHotkeys(
'shift+h',
() => {
dispatch(setShouldShowBoundingBox(!shouldShowBoundingBox));
},
{
enabled: () => !isStaging,
preventDefault: true,
},
[activeTabName, shouldShowBoundingBox]
);
const onKeyDown = useCallback(
(e: KeyboardEvent) => {
if (e.repeat || e.key !== ' ' || isInteractiveTarget(e.target) || activeTabName !== 'unifiedCanvas') {

View File

@ -190,7 +190,6 @@ export const canvasSlice = createSlice({
],
};
state.futureLayerStates = [];
state.batchIds = [];
const newScale = calculateScale(
stageDimensions.width,
@ -286,40 +285,14 @@ export const canvasSlice = createSlice({
},
discardStagedImages: (state) => {
pushToPrevLayerStates(state);
state.layerState.stagingArea = deepClone(initialLayerState.stagingArea);
resetStagingArea(state);
state.futureLayerStates = [];
state.shouldShowStagingOutline = true;
state.shouldShowStagingImage = true;
state.batchIds = [];
},
discardStagedImage: (state) => {
const { images, selectedImageIndex } = state.layerState.stagingArea;
pushToPrevLayerStates(state);
images.splice(selectedImageIndex, 1);
if (images.length === 0) {
pushToPrevLayerStates(state);
state.layerState.stagingArea = deepClone(initialLayerState.stagingArea);
state.futureLayerStates = [];
state.shouldShowStagingOutline = true;
state.shouldShowStagingImage = true;
state.batchIds = [];
}
if (selectedImageIndex >= images.length) {
state.layerState.stagingArea.selectedImageIndex = images.length - 1;
}
if (!images.length) {
state.shouldShowStagingImage = false;
state.shouldShowStagingOutline = false;
}
state.layerState.stagingArea.selectedImageIndex = Math.max(0, images.length - 1);
state.futureLayerStates = [];
},
addFillRect: (state) => {
@ -433,7 +406,6 @@ export const canvasSlice = createSlice({
pushToPrevLayerStates(state);
state.layerState = deepClone(initialLayerState);
state.futureLayerStates = [];
state.batchIds = [];
state.boundingBoxCoordinates = {
...initialCanvasState.boundingBoxCoordinates,
};
@ -534,12 +506,9 @@ export const canvasSlice = createSlice({
...imageToCommit,
});
}
state.layerState.stagingArea = deepClone(initialLayerState.stagingArea);
resetStagingArea(state);
state.futureLayerStates = [];
state.shouldShowStagingOutline = true;
state.shouldShowStagingImage = true;
state.batchIds = [];
},
setBoundingBoxScaleMethod: {
reducer: (state, action: PayloadActionWithOptimalDimension<BoundingBoxScaleMethod>) => {
@ -647,12 +616,19 @@ export const canvasSlice = createSlice({
if (batch_status.in_progress === 0 && batch_status.pending === 0) {
state.batchIds = state.batchIds.filter((id) => id !== batch_status.batch_id);
}
const queueItemStatus = action.payload.data.queue_item.status;
if (queueItemStatus === 'canceled' || queueItemStatus === 'failed') {
resetStagingAreaIfEmpty(state);
}
});
builder.addMatcher(queueApi.endpoints.clearQueue.matchFulfilled, (state) => {
state.batchIds = [];
resetStagingAreaIfEmpty(state);
});
builder.addMatcher(queueApi.endpoints.cancelByBatchIds.matchFulfilled, (state, action) => {
state.batchIds = state.batchIds.filter((id) => !action.meta.arg.originalArgs.batch_ids.includes(id));
resetStagingAreaIfEmpty(state);
});
},
});
@ -726,7 +702,7 @@ export const canvasPersistConfig: PersistConfig<CanvasState> = {
name: canvasSlice.name,
initialState: initialCanvasState,
migrate: migrateCanvasState,
persistDenylist: [],
persistDenylist: ['shouldShowStagingImage', 'shouldShowStagingOutline'],
};
const pushToPrevLayerStates = (state: CanvasState) => {
@ -742,3 +718,15 @@ const pushToFutureLayerStates = (state: CanvasState) => {
state.futureLayerStates = state.futureLayerStates.slice(0, MAX_HISTORY);
}
};
const resetStagingAreaIfEmpty = (state: CanvasState) => {
if (state.batchIds.length === 0 && state.layerState.stagingArea.images.length === 0) {
resetStagingArea(state);
}
};
const resetStagingArea = (state: CanvasState) => {
state.layerState.stagingArea = { ...initialCanvasState.layerState.stagingArea };
state.shouldShowStagingImage = initialCanvasState.shouldShowStagingImage;
state.shouldShowStagingOutline = initialCanvasState.shouldShowStagingOutline;
};

View File

@ -103,7 +103,7 @@ const ParamControlAdapterModel = ({ id }: ParamControlAdapterModelProps) => {
return (
<Flex sx={{ gap: 2 }}>
<Tooltip label={value?.description}>
<Tooltip label={selectedModel?.description}>
<FormControl
isDisabled={!isEnabled}
isInvalid={!value || mainModel?.base !== modelConfig?.base}

View File

@ -13,13 +13,15 @@ export const DeleteImageButton = memo((props: DeleteImageButtonProps) => {
const { onClick, isDisabled } = props;
const { t } = useTranslation();
const isConnected = useAppSelector((s) => s.system.isConnected);
const imageSelectionLength: number = useAppSelector((s) => s.gallery.selection.length);
const labelMessage: string = `${t('gallery.deleteImage', { count: imageSelectionLength })} (Del)`;
return (
<IconButton
onClick={onClick}
icon={<PiTrashSimpleBold />}
tooltip={`${t('gallery.deleteImage')} (Del)`}
aria-label={`${t('gallery.deleteImage')} (Del)`}
tooltip={labelMessage}
aria-label={labelMessage}
isDisabled={isDisabled || !isConnected}
colorScheme="error"
/>

View File

@ -80,7 +80,7 @@ const DeleteImageModal = () => {
return (
<ConfirmationAlertDialog
title={t('gallery.deleteImage')}
title={t('gallery.deleteImage', { count: imagesToDelete.length })}
isOpen={isModalOpen}
onClose={handleClose}
cancelButtonText={t('boards.cancel')}

View File

@ -6,7 +6,6 @@ import type { RemoveFromBoardDropData } from 'features/dnd/types';
import AutoAddIcon from 'features/gallery/components/Boards/AutoAddIcon';
import BoardContextMenu from 'features/gallery/components/Boards/BoardContextMenu';
import { autoAddBoardIdChanged, boardIdSelected } from 'features/gallery/store/gallerySlice';
/** @knipignore */
import InvokeLogoSVG from 'public/assets/images/invoke-symbol-wht-lrg.svg';
import { memo, useCallback, useMemo, useState } from 'react';
import { useTranslation } from 'react-i18next';

View File

@ -51,6 +51,7 @@ const CurrentImageButtons = () => {
const shouldShowImageDetails = useAppSelector((s) => s.ui.shouldShowImageDetails);
const shouldShowProgressInViewer = useAppSelector((s) => s.ui.shouldShowProgressInViewer);
const lastSelectedImage = useAppSelector(selectLastSelectedImage);
const selection = useAppSelector((s) => s.gallery.selection);
const shouldDisableToolbarButtons = useAppSelector(selectShouldDisableToolbarButtons);
const isUpscalingEnabled = useFeatureStatus('upscaling').isFeatureEnabled;
@ -102,8 +103,8 @@ const CurrentImageButtons = () => {
if (!imageDTO) {
return;
}
dispatch(imagesToDeleteSelected([imageDTO]));
}, [dispatch, imageDTO]);
dispatch(imagesToDeleteSelected(selection));
}, [dispatch, imageDTO, selection]);
useHotkeys(
'Shift+U',

View File

@ -188,7 +188,7 @@ const SingleSelectionMenuItems = (props: SingleSelectionMenuItemsProps) => {
)}
<MenuDivider />
<MenuItem color="error.300" icon={<PiTrashSimpleBold />} onClickCapture={handleDelete}>
{t('gallery.deleteImage')}
{t('gallery.deleteImage', { count: 1 })}
</MenuItem>
</>
);

View File

@ -180,7 +180,7 @@ const GalleryImage = (props: HoverableImageProps) => {
<IAIDndImageIcon
onClick={handleDelete}
icon={<PiTrashSimpleFill size="16px" />}
tooltip={t('gallery.deleteImage')}
tooltip={t('gallery.deleteImage', { count: 1 })}
styleOverrides={imageIconStyleOverrides}
/>
)}

View File

@ -33,6 +33,7 @@ const ImageMetadataActions = (props: Props) => {
<MetadataItem metadata={metadata} handlers={handlers.scheduler} />
<MetadataItem metadata={metadata} handlers={handlers.cfgScale} />
<MetadataItem metadata={metadata} handlers={handlers.cfgRescaleMultiplier} />
<MetadataItem metadata={metadata} handlers={handlers.initialImage} />
<MetadataItem metadata={metadata} handlers={handlers.strength} />
<MetadataItem metadata={metadata} handlers={handlers.hrfEnabled} />
<MetadataItem metadata={metadata} handlers={handlers.hrfMethod} />

View File

@ -189,6 +189,12 @@ export const handlers = {
recaller: recallers.cfgScale,
}),
height: buildHandlers({ getLabel: () => t('metadata.height'), parser: parsers.height, recaller: recallers.height }),
initialImage: buildHandlers({
getLabel: () => t('metadata.initImage'),
parser: parsers.initialImage,
recaller: recallers.initialImage,
renderValue: async (imageDTO) => imageDTO.image_name,
}),
negativePrompt: buildHandlers({
getLabel: () => t('metadata.negativePrompt'),
parser: parsers.negativePrompt,
@ -405,6 +411,6 @@ export const parseAndRecallAllMetadata = async (metadata: unknown, skip: (keyof
})
);
if (results.some((result) => result.status === 'fulfilled')) {
parameterSetToast(t('toast.parametersSet'));
parameterSetToast(t('toast.parameters'));
}
};

View File

@ -1,3 +1,4 @@
import { getStore } from 'app/store/nanostores/store';
import {
initialControlNet,
initialIPAdapter,
@ -57,6 +58,8 @@ import {
isParameterWidth,
} from 'features/parameters/types/parameterSchemas';
import { get, isArray, isString } from 'lodash-es';
import { imagesApi } from 'services/api/endpoints/images';
import type { ImageDTO } from 'services/api/types';
import {
isControlNetModelConfig,
isIPAdapterModelConfig,
@ -135,6 +138,14 @@ const parseCFGRescaleMultiplier: MetadataParseFunc<ParameterCFGRescaleMultiplier
const parseScheduler: MetadataParseFunc<ParameterScheduler> = (metadata) =>
getProperty(metadata, 'scheduler', isParameterScheduler);
const parseInitialImage: MetadataParseFunc<ImageDTO> = async (metadata) => {
const imageName = await getProperty(metadata, 'init_image', isString);
const imageDTORequest = getStore().dispatch(imagesApi.endpoints.getImageDTO.initiate(imageName));
const imageDTO = await imageDTORequest.unwrap();
imageDTORequest.unsubscribe();
return imageDTO;
};
const parseWidth: MetadataParseFunc<ParameterWidth> = (metadata) => getProperty(metadata, 'width', isParameterWidth);
const parseHeight: MetadataParseFunc<ParameterHeight> = (metadata) =>
@ -145,8 +156,13 @@ const parseSteps: MetadataParseFunc<ParameterSteps> = (metadata) => getProperty(
const parseStrength: MetadataParseFunc<ParameterStrength> = (metadata) =>
getProperty(metadata, 'strength', isParameterStrength);
const parseHRFEnabled: MetadataParseFunc<ParameterHRFEnabled> = (metadata) =>
getProperty(metadata, 'hrf_enabled', isParameterHRFEnabled);
const parseHRFEnabled: MetadataParseFunc<ParameterHRFEnabled> = async (metadata) => {
try {
return await getProperty(metadata, 'hrf_enabled', isParameterHRFEnabled);
} catch {
return false;
}
};
const parseHRFStrength: MetadataParseFunc<ParameterStrength> = (metadata) =>
getProperty(metadata, 'hrf_strength', isParameterStrength);
@ -213,12 +229,16 @@ const parseLoRA: MetadataParseFunc<LoRA> = async (metadataItem) => {
};
const parseAllLoRAs: MetadataParseFunc<LoRA[]> = async (metadata) => {
const lorasRaw = await getProperty(metadata, 'loras', isArray);
const parseResults = await Promise.allSettled(lorasRaw.map((lora) => parseLoRA(lora)));
const loras = parseResults
.filter((result): result is PromiseFulfilledResult<LoRA> => result.status === 'fulfilled')
.map((result) => result.value);
return loras;
try {
const lorasRaw = await getProperty(metadata, 'loras', isArray);
const parseResults = await Promise.allSettled(lorasRaw.map((lora) => parseLoRA(lora)));
const loras = parseResults
.filter((result): result is PromiseFulfilledResult<LoRA> => result.status === 'fulfilled')
.map((result) => result.value);
return loras;
} catch {
return [];
}
};
const parseControlNet: MetadataParseFunc<ControlNetConfigMetadata> = async (metadataItem) => {
@ -277,12 +297,16 @@ const parseControlNet: MetadataParseFunc<ControlNetConfigMetadata> = async (meta
};
const parseAllControlNets: MetadataParseFunc<ControlNetConfigMetadata[]> = async (metadata) => {
const controlNetsRaw = await getProperty(metadata, 'controlnets', isArray);
const parseResults = await Promise.allSettled(controlNetsRaw.map((cn) => parseControlNet(cn)));
const controlNets = parseResults
.filter((result): result is PromiseFulfilledResult<ControlNetConfigMetadata> => result.status === 'fulfilled')
.map((result) => result.value);
return controlNets;
try {
const controlNetsRaw = await getProperty(metadata, 'controlnets', isArray);
const parseResults = await Promise.allSettled(controlNetsRaw.map((cn) => parseControlNet(cn)));
const controlNets = parseResults
.filter((result): result is PromiseFulfilledResult<ControlNetConfigMetadata> => result.status === 'fulfilled')
.map((result) => result.value);
return controlNets;
} catch {
return [];
}
};
const parseT2IAdapter: MetadataParseFunc<T2IAdapterConfigMetadata> = async (metadataItem) => {
@ -337,12 +361,16 @@ const parseT2IAdapter: MetadataParseFunc<T2IAdapterConfigMetadata> = async (meta
};
const parseAllT2IAdapters: MetadataParseFunc<T2IAdapterConfigMetadata[]> = async (metadata) => {
const t2iAdaptersRaw = await getProperty(metadata, 't2iAdapters', isArray);
const parseResults = await Promise.allSettled(t2iAdaptersRaw.map((t2iAdapter) => parseT2IAdapter(t2iAdapter)));
const t2iAdapters = parseResults
.filter((result): result is PromiseFulfilledResult<T2IAdapterConfigMetadata> => result.status === 'fulfilled')
.map((result) => result.value);
return t2iAdapters;
try {
const t2iAdaptersRaw = await getProperty(metadata, 't2iAdapters', isArray);
const parseResults = await Promise.allSettled(t2iAdaptersRaw.map((t2iAdapter) => parseT2IAdapter(t2iAdapter)));
const t2iAdapters = parseResults
.filter((result): result is PromiseFulfilledResult<T2IAdapterConfigMetadata> => result.status === 'fulfilled')
.map((result) => result.value);
return t2iAdapters;
} catch {
return [];
}
};
const parseIPAdapter: MetadataParseFunc<IPAdapterConfigMetadata> = async (metadataItem) => {
@ -383,12 +411,16 @@ const parseIPAdapter: MetadataParseFunc<IPAdapterConfigMetadata> = async (metada
};
const parseAllIPAdapters: MetadataParseFunc<IPAdapterConfigMetadata[]> = async (metadata) => {
const ipAdaptersRaw = await getProperty(metadata, 'ipAdapters', isArray);
const parseResults = await Promise.allSettled(ipAdaptersRaw.map((ipAdapter) => parseIPAdapter(ipAdapter)));
const ipAdapters = parseResults
.filter((result): result is PromiseFulfilledResult<IPAdapterConfigMetadata> => result.status === 'fulfilled')
.map((result) => result.value);
return ipAdapters;
try {
const ipAdaptersRaw = await getProperty(metadata, 'ipAdapters', isArray);
const parseResults = await Promise.allSettled(ipAdaptersRaw.map((ipAdapter) => parseIPAdapter(ipAdapter)));
const ipAdapters = parseResults
.filter((result): result is PromiseFulfilledResult<IPAdapterConfigMetadata> => result.status === 'fulfilled')
.map((result) => result.value);
return ipAdapters;
} catch {
return [];
}
};
export const parsers = {
@ -402,6 +434,7 @@ export const parsers = {
cfgScale: parseCFGScale,
cfgRescaleMultiplier: parseCFGRescaleMultiplier,
scheduler: parseScheduler,
initialImage: parseInitialImage,
width: parseWidth,
height: parseHeight,
steps: parseSteps,

View File

@ -17,6 +17,7 @@ import type {
import { modelSelected } from 'features/parameters/store/actions';
import {
heightRecalled,
initialImageChanged,
setCfgRescaleMultiplier,
setCfgScale,
setImg2imgStrength,
@ -61,6 +62,7 @@ import {
setRefinerStart,
setRefinerSteps,
} from 'features/sdxl/store/sdxlSlice';
import type { ImageDTO } from 'services/api/types';
const recallPositivePrompt: MetadataRecallFunc<ParameterPositivePrompt> = (positivePrompt) => {
getStore().dispatch(setPositivePrompt(positivePrompt));
@ -94,6 +96,10 @@ const recallScheduler: MetadataRecallFunc<ParameterScheduler> = (scheduler) => {
getStore().dispatch(setScheduler(scheduler));
};
const recallInitialImage: MetadataRecallFunc<ImageDTO> = async (imageDTO) => {
getStore().dispatch(initialImageChanged(imageDTO));
};
const recallWidth: MetadataRecallFunc<ParameterWidth> = (width) => {
getStore().dispatch(widthRecalled(width));
};
@ -171,11 +177,11 @@ const recallLoRA: MetadataRecallFunc<LoRA> = (lora) => {
};
const recallAllLoRAs: MetadataRecallFunc<LoRA[]> = (loras) => {
const { dispatch } = getStore();
dispatch(lorasReset());
if (!loras.length) {
return;
}
const { dispatch } = getStore();
dispatch(lorasReset());
loras.forEach((lora) => {
dispatch(loraRecalled(lora));
});
@ -186,11 +192,11 @@ const recallControlNet: MetadataRecallFunc<ControlNetConfigMetadata> = (controlN
};
const recallControlNets: MetadataRecallFunc<ControlNetConfigMetadata[]> = (controlNets) => {
const { dispatch } = getStore();
dispatch(controlNetsReset());
if (!controlNets.length) {
return;
}
const { dispatch } = getStore();
dispatch(controlNetsReset());
controlNets.forEach((controlNet) => {
dispatch(controlAdapterRecalled(controlNet));
});
@ -201,11 +207,11 @@ const recallT2IAdapter: MetadataRecallFunc<T2IAdapterConfigMetadata> = (t2iAdapt
};
const recallT2IAdapters: MetadataRecallFunc<T2IAdapterConfigMetadata[]> = (t2iAdapters) => {
const { dispatch } = getStore();
dispatch(t2iAdaptersReset());
if (!t2iAdapters.length) {
return;
}
const { dispatch } = getStore();
dispatch(t2iAdaptersReset());
t2iAdapters.forEach((t2iAdapter) => {
dispatch(controlAdapterRecalled(t2iAdapter));
});
@ -216,11 +222,11 @@ const recallIPAdapter: MetadataRecallFunc<IPAdapterConfigMetadata> = (ipAdapter)
};
const recallIPAdapters: MetadataRecallFunc<IPAdapterConfigMetadata[]> = (ipAdapters) => {
const { dispatch } = getStore();
dispatch(ipAdaptersReset());
if (!ipAdapters.length) {
return;
}
const { dispatch } = getStore();
dispatch(ipAdaptersReset());
ipAdapters.forEach((ipAdapter) => {
dispatch(controlAdapterRecalled(ipAdapter));
});
@ -235,6 +241,7 @@ export const recallers = {
cfgScale: recallCFGScale,
cfgRescaleMultiplier: recallCFGRescaleMultiplier,
scheduler: recallScheduler,
initialImage: recallInitialImage,
width: recallWidth,
height: recallHeight,
steps: recallSteps,

View File

@ -74,7 +74,6 @@ export const InstallModelForm = () => {
onClick={handleSubmit(onSubmit)}
isDisabled={!formState.dirtyFields.location}
isLoading={isLoading}
type="submit"
size="sm"
>
{t('modelManager.install')}

View File

@ -86,7 +86,6 @@ export const ControlNetOrT2IAdapterDefaultSettings = () => {
colorScheme="invokeYellow"
isDisabled={!formState.isDirty}
onClick={handleSubmit(onSubmit)}
type="submit"
isLoading={isLoadingUpdateModel}
>
{t('common.save')}

View File

@ -116,7 +116,6 @@ export const MainModelDefaultSettings = () => {
colorScheme="invokeYellow"
isDisabled={!formState.isDirty}
onClick={handleSubmit(onSubmit)}
type="submit"
isLoading={isLoadingUpdateModel}
>
{t('common.save')}

View File

@ -88,7 +88,6 @@ export const TriggerPhrases = () => {
<Button
leftIcon={<PiPlusBold />}
size="sm"
type="submit"
onClick={addTriggerPhrase}
isDisabled={!phrase || Boolean(errors.length)}
isLoading={isLoading}

View File

@ -3,6 +3,7 @@ import 'reactflow/dist/style.css';
import { Flex } from '@invoke-ai/ui-library';
import { IAINoContentFallback } from 'common/components/IAIImageFallback';
import TopPanel from 'features/nodes/components/flow/panels/TopPanel/TopPanel';
import { LoadWorkflowFromGraphModal } from 'features/workflowLibrary/components/LoadWorkflowFromGraphModal/LoadWorkflowFromGraphModal';
import { SaveWorkflowAsDialog } from 'features/workflowLibrary/components/SaveWorkflowAsDialog/SaveWorkflowAsDialog';
import type { AnimationProps } from 'framer-motion';
import { AnimatePresence, motion } from 'framer-motion';
@ -61,6 +62,7 @@ const NodeEditor = () => {
<BottomLeftPanel />
<MinimapPanel />
<SaveWorkflowAsDialog />
<LoadWorkflowFromGraphModal />
</motion.div>
)}
</AnimatePresence>

View File

@ -37,34 +37,50 @@ const NumberFieldInputComponent = (
);
const min = useMemo(() => {
let min = -NUMPY_RAND_MAX;
if (!isNil(fieldTemplate.minimum)) {
return fieldTemplate.minimum;
min = fieldTemplate.minimum;
}
if (!isNil(fieldTemplate.exclusiveMinimum)) {
return fieldTemplate.exclusiveMinimum + 0.01;
min = fieldTemplate.exclusiveMinimum + 0.01;
}
return;
return min;
}, [fieldTemplate.exclusiveMinimum, fieldTemplate.minimum]);
const max = useMemo(() => {
let max = NUMPY_RAND_MAX;
if (!isNil(fieldTemplate.maximum)) {
return fieldTemplate.maximum;
max = fieldTemplate.maximum;
}
if (!isNil(fieldTemplate.exclusiveMaximum)) {
return fieldTemplate.exclusiveMaximum - 0.01;
max = fieldTemplate.exclusiveMaximum - 0.01;
}
return;
return max;
}, [fieldTemplate.exclusiveMaximum, fieldTemplate.maximum]);
const step = useMemo(() => {
if (isNil(fieldTemplate.multipleOf)) {
return isIntegerField ? 1 : 0.1;
}
return fieldTemplate.multipleOf;
}, [fieldTemplate.multipleOf, isIntegerField]);
const fineStep = useMemo(() => {
if (isNil(fieldTemplate.multipleOf)) {
return isIntegerField ? 1 : 0.01;
}
return fieldTemplate.multipleOf;
}, [fieldTemplate.multipleOf, isIntegerField]);
return (
<CompositeNumberInput
defaultValue={fieldTemplate.default}
onChange={handleValueChanged}
value={field.value}
min={min ?? -NUMPY_RAND_MAX}
max={max ?? NUMPY_RAND_MAX}
step={isIntegerField ? 1 : 0.1}
fineStep={isIntegerField ? 1 : 0.01}
min={min}
max={max}
step={step}
fineStep={fineStep}
className="nodrag"
/>
);

View File

@ -1,7 +1,6 @@
import { Button, Flex, Image, Text } from '@invoke-ai/ui-library';
import { useAppDispatch } from 'app/store/storeHooks';
import { workflowModeChanged } from 'features/nodes/store/workflowSlice';
/** @knipignore */
import InvokeLogoSVG from 'public/assets/images/invoke-symbol-wht-lrg.svg';
import { useCallback } from 'react';
import { useTranslation } from 'react-i18next';

View File

@ -1,26 +1,18 @@
import type { RootState } from 'app/store/store';
import { fetchModelConfigWithTypeGuard } from 'features/metadata/util/modelFetchingHelpers';
import {
type CreateDenoiseMaskInvocation,
type ImageDTO,
isRefinerMainModelModelConfig,
type NonNullableGraph,
type SeamlessModeInvocation,
} from 'services/api/types';
import type { NonNullableGraph, SeamlessModeInvocation } from 'services/api/types';
import { isRefinerMainModelModelConfig } from 'services/api/types';
import {
CANVAS_OUTPUT,
INPAINT_IMAGE_RESIZE_UP,
INPAINT_CREATE_MASK,
LATENTS_TO_IMAGE,
MASK_COMBINE,
MASK_RESIZE_UP,
SDXL_CANVAS_IMAGE_TO_IMAGE_GRAPH,
SDXL_CANVAS_INPAINT_GRAPH,
SDXL_CANVAS_OUTPAINT_GRAPH,
SDXL_CANVAS_TEXT_TO_IMAGE_GRAPH,
SDXL_MODEL_LOADER,
SDXL_REFINER_DENOISE_LATENTS,
SDXL_REFINER_INPAINT_CREATE_MASK,
SDXL_REFINER_MODEL_LOADER,
SDXL_REFINER_NEGATIVE_CONDITIONING,
SDXL_REFINER_POSITIVE_CONDITIONING,
@ -33,9 +25,7 @@ export const addSDXLRefinerToGraph = async (
state: RootState,
graph: NonNullableGraph,
baseNodeId: string,
modelLoaderNodeId?: string,
canvasInitImage?: ImageDTO,
canvasMaskImage?: ImageDTO
modelLoaderNodeId?: string
): Promise<void> => {
const {
refinerModel,
@ -51,11 +41,9 @@ export const addSDXLRefinerToGraph = async (
return;
}
const { seamlessXAxis, seamlessYAxis, vaePrecision } = state.generation;
const { seamlessXAxis, seamlessYAxis } = state.generation;
const { boundingBoxScaleMethod } = state.canvas;
const fp32 = vaePrecision === 'fp32';
const isUsingScaledDimensions = ['auto', 'manual'].includes(boundingBoxScaleMethod);
const modelConfig = await fetchModelConfigWithTypeGuard(refinerModel.key, isRefinerMainModelModelConfig);
@ -214,67 +202,9 @@ export const addSDXLRefinerToGraph = async (
);
if (graph.id === SDXL_CANVAS_INPAINT_GRAPH || graph.id === SDXL_CANVAS_OUTPAINT_GRAPH) {
graph.nodes[SDXL_REFINER_INPAINT_CREATE_MASK] = {
type: 'create_denoise_mask',
id: SDXL_REFINER_INPAINT_CREATE_MASK,
is_intermediate: true,
fp32,
};
if (isUsingScaledDimensions) {
graph.edges.push({
source: {
node_id: INPAINT_IMAGE_RESIZE_UP,
field: 'image',
},
destination: {
node_id: SDXL_REFINER_INPAINT_CREATE_MASK,
field: 'image',
},
});
} else {
graph.nodes[SDXL_REFINER_INPAINT_CREATE_MASK] = {
...(graph.nodes[SDXL_REFINER_INPAINT_CREATE_MASK] as CreateDenoiseMaskInvocation),
image: canvasInitImage,
};
}
if (graph.id === SDXL_CANVAS_INPAINT_GRAPH) {
if (isUsingScaledDimensions) {
graph.edges.push({
source: {
node_id: MASK_RESIZE_UP,
field: 'image',
},
destination: {
node_id: SDXL_REFINER_INPAINT_CREATE_MASK,
field: 'mask',
},
});
} else {
graph.nodes[SDXL_REFINER_INPAINT_CREATE_MASK] = {
...(graph.nodes[SDXL_REFINER_INPAINT_CREATE_MASK] as CreateDenoiseMaskInvocation),
mask: canvasMaskImage,
};
}
}
if (graph.id === SDXL_CANVAS_OUTPAINT_GRAPH) {
graph.edges.push({
source: {
node_id: isUsingScaledDimensions ? MASK_RESIZE_UP : MASK_COMBINE,
field: 'image',
},
destination: {
node_id: SDXL_REFINER_INPAINT_CREATE_MASK,
field: 'mask',
},
});
}
graph.edges.push({
source: {
node_id: SDXL_REFINER_INPAINT_CREATE_MASK,
node_id: INPAINT_CREATE_MASK,
field: 'denoise_mask',
},
destination: {

View File

@ -17,7 +17,6 @@ import {
SDXL_CANVAS_OUTPAINT_GRAPH,
SDXL_CANVAS_TEXT_TO_IMAGE_GRAPH,
SDXL_IMAGE_TO_IMAGE_GRAPH,
SDXL_REFINER_INPAINT_CREATE_MASK,
SDXL_REFINER_SEAMLESS,
SDXL_TEXT_TO_IMAGE_GRAPH,
SEAMLESS,
@ -166,27 +165,6 @@ export const addVAEToGraph = async (
);
}
if (refinerModel) {
if (graph.id === SDXL_CANVAS_INPAINT_GRAPH || graph.id === SDXL_CANVAS_OUTPAINT_GRAPH) {
graph.edges.push({
source: {
node_id: isSeamlessEnabled
? isUsingRefiner
? SDXL_REFINER_SEAMLESS
: SEAMLESS
: isAutoVae
? modelLoaderNodeId
: VAE_LOADER,
field: 'vae',
},
destination: {
node_id: SDXL_REFINER_INPAINT_CREATE_MASK,
field: 'vae',
},
});
}
}
if (vae) {
upsertMetadata(graph, { vae });
}

View File

@ -133,7 +133,7 @@ export const buildCanvasSDXLInpaintGraph = async (
id: INPAINT_CREATE_MASK,
is_intermediate,
coherence_mode: canvasCoherenceMode,
minimum_denoise: canvasCoherenceMinDenoise,
minimum_denoise: refinerModel ? Math.max(0.2, canvasCoherenceMinDenoise) : canvasCoherenceMinDenoise,
edge_radius: canvasCoherenceEdgeSize,
},
[SDXL_DENOISE_LATENTS]: {
@ -426,14 +426,7 @@ export const buildCanvasSDXLInpaintGraph = async (
// Add Refiner if enabled
if (refinerModel) {
await addSDXLRefinerToGraph(
state,
graph,
SDXL_DENOISE_LATENTS,
modelLoaderNodeId,
canvasInitImage,
canvasMaskImage
);
await addSDXLRefinerToGraph(state, graph, SDXL_DENOISE_LATENTS, modelLoaderNodeId);
if (seamlessXAxis || seamlessYAxis) {
modelLoaderNodeId = SDXL_REFINER_SEAMLESS;
}

View File

@ -156,7 +156,7 @@ export const buildCanvasSDXLOutpaintGraph = async (
is_intermediate,
coherence_mode: canvasCoherenceMode,
edge_radius: canvasCoherenceEdgeSize,
minimum_denoise: canvasCoherenceMinDenoise,
minimum_denoise: refinerModel ? Math.max(0.2, canvasCoherenceMinDenoise) : canvasCoherenceMinDenoise,
},
[SDXL_DENOISE_LATENTS]: {
type: 'denoise_latents',
@ -582,7 +582,7 @@ export const buildCanvasSDXLOutpaintGraph = async (
// Add Refiner if enabled
if (refinerModel) {
await addSDXLRefinerToGraph(state, graph, SDXL_DENOISE_LATENTS, modelLoaderNodeId, canvasInitImage);
await addSDXLRefinerToGraph(state, graph, SDXL_DENOISE_LATENTS, modelLoaderNodeId);
if (seamlessXAxis || seamlessYAxis) {
modelLoaderNodeId = SDXL_REFINER_SEAMLESS;
}

View File

@ -0,0 +1,148 @@
import * as dagre from '@dagrejs/dagre';
import { logger } from 'app/logging/logger';
import { getStore } from 'app/store/nanostores/store';
import { NODE_WIDTH } from 'features/nodes/types/constants';
import type { FieldInputInstance } from 'features/nodes/types/field';
import type { WorkflowV3 } from 'features/nodes/types/workflow';
import { buildFieldInputInstance } from 'features/nodes/util/schema/buildFieldInputInstance';
import { t } from 'i18next';
import { forEach } from 'lodash-es';
import type { NonNullableGraph } from 'services/api/types';
import { v4 as uuidv4 } from 'uuid';
/**
* Converts a graph to a workflow. This is a best-effort conversion and may not be perfect.
* For example, if a graph references an unknown node type, that node will be skipped.
* @param graph The graph to convert to a workflow
* @param autoLayout Whether to auto-layout the nodes using `dagre`. If false, nodes will be simply stacked on top of one another with an offset.
* @returns The workflow.
*/
export const graphToWorkflow = (graph: NonNullableGraph, autoLayout = true): WorkflowV3 => {
const invocationTemplates = getStore().getState().nodes.templates;
if (!invocationTemplates) {
throw new Error(t('app.storeNotInitialized'));
}
// Initialize the workflow
const workflow: WorkflowV3 = {
name: '',
author: '',
contact: '',
description: '',
meta: {
category: 'user',
version: '3.0.0',
},
notes: '',
tags: '',
version: '',
exposedFields: [],
edges: [],
nodes: [],
};
// Convert nodes
forEach(graph.nodes, (node) => {
const template = invocationTemplates[node.type];
// Skip missing node templates - this is a best-effort conversion
if (!template) {
logger('nodes').warn(`Node type ${node.type} not found in invocationTemplates`);
return;
}
// Build field input instances for each attr
const inputs: Record<string, FieldInputInstance> = {};
forEach(node, (value, key) => {
// Ignore the non-input keys - I think this is all of them?
if (key === 'id' || key === 'type' || key === 'is_intermediate' || key === 'use_cache') {
return;
}
const inputTemplate = template.inputs[key];
// Skip missing input templates
if (!inputTemplate) {
logger('nodes').warn(`Input ${key} not found in template for node type ${node.type}`);
return;
}
// This _should_ be all we need to do!
const inputInstance = buildFieldInputInstance(node.id, inputTemplate);
inputInstance.value = value;
inputs[key] = inputInstance;
});
workflow.nodes.push({
id: node.id,
type: 'invocation',
position: { x: 0, y: 0 }, // we'll do layout later, just need something here
data: {
id: node.id,
type: node.type,
version: template.version,
label: '',
notes: '',
isOpen: true,
isIntermediate: node.is_intermediate ?? false,
useCache: node.use_cache ?? true,
inputs,
},
});
});
forEach(graph.edges, (edge) => {
workflow.edges.push({
id: uuidv4(), // we don't have edge IDs in the graph
type: 'default',
source: edge.source.node_id,
sourceHandle: edge.source.field,
target: edge.destination.node_id,
targetHandle: edge.destination.field,
});
});
if (autoLayout) {
// Best-effort auto layout via dagre - not perfect but better than nothing
const dagreGraph = new dagre.graphlib.Graph();
// `rankdir` and `align` could be tweaked, but it's gonna be janky no matter what we choose
dagreGraph.setGraph({ rankdir: 'TB', align: 'UL' });
dagreGraph.setDefaultEdgeLabel(() => ({}));
// We don't know the dimensions of the nodes until we load the graph into `reactflow` - use a reasonable value
forEach(graph.nodes, (node) => {
const width = NODE_WIDTH;
const height = NODE_WIDTH * 1.5;
dagreGraph.setNode(node.id, { width, height });
});
graph.edges.forEach((edge) => {
dagreGraph.setEdge(edge.source.node_id, edge.destination.node_id);
});
// This does the magic
dagre.layout(dagreGraph);
// Update the workflow now that we've got the positions
workflow.nodes.forEach((node) => {
const nodeWithPosition = dagreGraph.node(node.id);
node.position = {
x: nodeWithPosition.x - nodeWithPosition.width / 2,
y: nodeWithPosition.y - nodeWithPosition.height / 2,
};
});
} else {
// Stack nodes with a 50px, 50px offset from the previous node
let x = 0;
let y = 0;
workflow.nodes.forEach((node) => {
node.position = { x, y };
x = x + 50;
y = y + 50;
});
}
return workflow;
};
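
A minimal usage sketch for the new graphToWorkflow utility, assuming a raw graph JSON string as input; the LoadWorkflowFromGraphModal added later in this diff performs essentially the same steps in its parse callback. convertGraphJson here is a hypothetical wrapper, not part of the changeset:

import { graphToWorkflow } from 'features/nodes/util/workflow/graphToWorkflow';

// Hypothetical helper: parse a raw graph JSON string, convert it to a workflow,
// and return the workflow as pretty-printed JSON. Assumes the input parses to a
// NonNullableGraph; a production caller would validate before converting.
const convertGraphJson = (graphRaw: string, autoLayout = true): string => {
  const graph = JSON.parse(graphRaw);
  const workflow = graphToWorkflow(graph, autoLayout);
  return JSON.stringify(workflow, null, 2);
};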

View File

@ -1,4 +1,4 @@
import { Combobox, FormControl, FormLabel, Tooltip } from '@invoke-ai/ui-library';
import { Box, Combobox, FormControl, FormLabel, Tooltip } from '@invoke-ai/ui-library';
import { createMemoizedSelector } from 'app/store/createMemoizedSelector';
import { useAppDispatch, useAppSelector } from 'app/store/storeHooks';
import { InformationalPopover } from 'common/components/InformationalPopover/InformationalPopover';
@ -46,20 +46,22 @@ const ParamMainModelSelect = () => {
});
return (
<Tooltip label={tooltipLabel}>
<FormControl isDisabled={!modelConfigs.length} isInvalid={!value || !modelConfigs.length}>
<InformationalPopover feature="paramModel">
<FormLabel>{t('modelManager.model')}</FormLabel>
</InformationalPopover>
<Combobox
value={value}
placeholder={placeholder}
options={options}
onChange={onChange}
noOptionsMessage={noOptionsMessage}
/>
</FormControl>
</Tooltip>
<FormControl isDisabled={!modelConfigs.length} isInvalid={!value || !modelConfigs.length}>
<InformationalPopover feature="paramModel">
<FormLabel>{t('modelManager.model')}</FormLabel>
</InformationalPopover>
<Tooltip label={tooltipLabel}>
<Box w="full">
<Combobox
value={value}
placeholder={placeholder}
options={options}
onChange={onChange}
noOptionsMessage={noOptionsMessage}
/>
</Box>
</Tooltip>
</FormControl>
);
};

View File

@ -29,6 +29,7 @@ const selector = createMemoizedSelector(
const { shouldRandomizeSeed, model } = generation;
const { hrfEnabled } = hrf;
const badges: string[] = [];
const isSDXL = model?.base === 'sdxl';
if (activeTabName === 'unifiedCanvas') {
const {
@ -53,10 +54,10 @@ const selector = createMemoizedSelector(
badges.push('Manual Seed');
}
if (hrfEnabled) {
if (hrfEnabled && !isSDXL) {
badges.push('HiRes Fix');
}
return { badges, activeTabName, isSDXL: model?.base === 'sdxl' };
return { badges, activeTabName, isSDXL };
}
);

View File

@ -21,7 +21,6 @@ import {
import ScrollableContent from 'common/components/OverlayScrollbars/ScrollableContent';
import { discordLink, githubLink, websiteLink } from 'features/system/store/constants';
import { map } from 'lodash-es';
/** @knipignore */
import InvokeLogoYellow from 'public/assets/images/invoke-tag-lrg.svg';
import type { ReactElement } from 'react';
import { cloneElement, memo, useCallback } from 'react';

View File

@ -2,7 +2,6 @@
import { Image, Text, Tooltip } from '@invoke-ai/ui-library';
import { useStore } from '@nanostores/react';
import { $logo } from 'app/store/nanostores/logo';
/** @knipignore */
import InvokeLogoYellow from 'public/assets/images/invoke-symbol-ylw-lrg.svg';
import { memo, useMemo, useRef } from 'react';
import { useGetAppVersionQuery } from 'services/api/endpoints/appInfo';

View File

@ -0,0 +1,111 @@
import {
Button,
Checkbox,
Flex,
FormControl,
FormLabel,
Modal,
ModalBody,
ModalCloseButton,
ModalContent,
ModalHeader,
ModalOverlay,
Spacer,
Textarea,
} from '@invoke-ai/ui-library';
import { useStore } from '@nanostores/react';
import { useAppDispatch } from 'app/store/storeHooks';
import { workflowLoadRequested } from 'features/nodes/store/actions';
import { graphToWorkflow } from 'features/nodes/util/workflow/graphToWorkflow';
import { atom } from 'nanostores';
import type { ChangeEvent } from 'react';
import { useCallback, useState } from 'react';
import { useTranslation } from 'react-i18next';
const $isOpen = atom<boolean>(false);
export const useLoadWorkflowFromGraphModal = () => {
const isOpen = useStore($isOpen);
const onOpen = useCallback(() => {
$isOpen.set(true);
}, []);
const onClose = useCallback(() => {
$isOpen.set(false);
}, []);
return { isOpen, onOpen, onClose };
};
export const LoadWorkflowFromGraphModal = () => {
const { t } = useTranslation();
const dispatch = useAppDispatch();
const { isOpen, onClose } = useLoadWorkflowFromGraphModal();
const [graphRaw, setGraphRaw] = useState<string>('');
const [workflowRaw, setWorkflowRaw] = useState<string>('');
const [shouldAutoLayout, setShouldAutoLayout] = useState(true);
const onChangeGraphRaw = useCallback((e: ChangeEvent<HTMLTextAreaElement>) => {
setGraphRaw(e.target.value);
}, []);
const onChangeWorkflowRaw = useCallback((e: ChangeEvent<HTMLTextAreaElement>) => {
setWorkflowRaw(e.target.value);
}, []);
const onChangeShouldAutoLayout = useCallback((e: ChangeEvent<HTMLInputElement>) => {
setShouldAutoLayout(e.target.checked);
}, []);
const parse = useCallback(() => {
const graph = JSON.parse(graphRaw);
const workflow = graphToWorkflow(graph, shouldAutoLayout);
setWorkflowRaw(JSON.stringify(workflow, null, 2));
}, [graphRaw, shouldAutoLayout]);
const loadWorkflow = useCallback(() => {
const workflow = JSON.parse(workflowRaw);
dispatch(workflowLoadRequested({ workflow, asCopy: true }));
onClose();
}, [dispatch, onClose, workflowRaw]);
return (
<Modal isOpen={isOpen} onClose={onClose} isCentered>
<ModalOverlay />
<ModalContent w="80vw" h="80vh" maxW="unset" maxH="unset">
<ModalHeader>{t('workflows.loadFromGraph')}</ModalHeader>
<ModalCloseButton />
<ModalBody as={Flex} flexDir="column" gap={4} w="full" h="full" pb={4}>
<Flex gap={4}>
<Button onClick={parse} size="sm" flexShrink={0}>
{t('workflows.convertGraph')}
</Button>
<FormControl>
<FormLabel>{t('workflows.autoLayout')}</FormLabel>
<Checkbox isChecked={shouldAutoLayout} onChange={onChangeShouldAutoLayout} />
</FormControl>
<Spacer />
<Button onClick={loadWorkflow} size="sm" flexShrink={0}>
{t('workflows.loadWorkflow')}
</Button>
</Flex>
<FormControl orientation="vertical" h="50%">
<FormLabel>{t('nodes.graph')}</FormLabel>
<Textarea
h="full"
value={graphRaw}
fontFamily="monospace"
whiteSpace="pre-wrap"
overflowWrap="normal"
onChange={onChangeGraphRaw}
/>
</FormControl>
<FormControl orientation="vertical" h="50%">
<FormLabel>{t('nodes.workflow')}</FormLabel>
<Textarea
h="full"
value={workflowRaw}
fontFamily="monospace"
whiteSpace="pre-wrap"
overflowWrap="normal"
onChange={onChangeWorkflowRaw}
/>
</FormControl>
</ModalBody>
</ModalContent>
</Modal>
);
};
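
Because the modal's open state lives in a module-level nanostores atom rather than Redux, any component can toggle it through useLoadWorkflowFromGraphModal without prop drilling; the LoadWorkflowFromGraphMenuItem in the next file is the real consumer, and the button below is only a hypothetical sketch of the same pattern:

import { Button } from '@invoke-ai/ui-library';
import { useLoadWorkflowFromGraphModal } from 'features/workflowLibrary/components/LoadWorkflowFromGraphModal/LoadWorkflowFromGraphModal';

// Hypothetical consumer: opens the modal from anywhere in the component tree.
export const OpenLoadWorkflowFromGraphButton = () => {
  const { onOpen } = useLoadWorkflowFromGraphModal();
  return <Button onClick={onOpen}>Load workflow from graph</Button>;
};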

View File

@ -0,0 +1,18 @@
import { MenuItem } from '@invoke-ai/ui-library';
import { useLoadWorkflowFromGraphModal } from 'features/workflowLibrary/components/LoadWorkflowFromGraphModal/LoadWorkflowFromGraphModal';
import { memo } from 'react';
import { useTranslation } from 'react-i18next';
import { PiFlaskBold } from 'react-icons/pi';
const LoadWorkflowFromGraphMenuItem = () => {
const { t } = useTranslation();
const { onOpen } = useLoadWorkflowFromGraphModal();
return (
<MenuItem as="button" icon={<PiFlaskBold />} onClick={onOpen}>
{t('workflows.loadFromGraph')}
</MenuItem>
);
};
export default memo(LoadWorkflowFromGraphMenuItem);

View File

@ -6,8 +6,10 @@ import {
MenuList,
useDisclosure,
useGlobalMenuClose,
useShiftModifier,
} from '@invoke-ai/ui-library';
import DownloadWorkflowMenuItem from 'features/workflowLibrary/components/WorkflowLibraryMenu/DownloadWorkflowMenuItem';
import LoadWorkflowFromGraphMenuItem from 'features/workflowLibrary/components/WorkflowLibraryMenu/LoadWorkflowFromGraphMenuItem';
import { NewWorkflowMenuItem } from 'features/workflowLibrary/components/WorkflowLibraryMenu/NewWorkflowMenuItem';
import SaveWorkflowAsMenuItem from 'features/workflowLibrary/components/WorkflowLibraryMenu/SaveWorkflowAsMenuItem';
import SaveWorkflowMenuItem from 'features/workflowLibrary/components/WorkflowLibraryMenu/SaveWorkflowMenuItem';
@ -20,6 +22,7 @@ import { PiDotsThreeOutlineFill } from 'react-icons/pi';
const WorkflowLibraryMenu = () => {
const { t } = useTranslation();
const { isOpen, onOpen, onClose } = useDisclosure();
const shift = useShiftModifier();
useGlobalMenuClose(onClose);
return (
<Menu isOpen={isOpen} onOpen={onOpen} onClose={onClose}>
@ -38,6 +41,8 @@ const WorkflowLibraryMenu = () => {
<DownloadWorkflowMenuItem />
<MenuDivider />
<SettingsMenuItem />
{shift && <MenuDivider />}
{shift && <LoadWorkflowFromGraphMenuItem />}
</MenuList>
</Menu>
);

File diff suppressed because one or more lines are too long

View File

@ -134,7 +134,6 @@ export type CollectInvocation = S['CollectInvocation'];
export type ImageResizeInvocation = S['ImageResizeInvocation'];
export type InfillPatchMatchInvocation = S['InfillPatchMatchInvocation'];
export type InfillTileInvocation = S['InfillTileInvocation'];
export type CreateDenoiseMaskInvocation = S['CreateDenoiseMaskInvocation'];
export type CreateGradientMaskInvocation = S['CreateGradientMaskInvocation'];
export type CanvasPasteBackInvocation = S['CanvasPasteBackInvocation'];
export type NoiseInvocation = S['NoiseInvocation'];

View File

@ -27,6 +27,7 @@ from invokeai.app.invocations.fields import (
OutputField,
UIComponent,
UIType,
WithBoard,
WithMetadata,
WithWorkflow,
)
@ -75,7 +76,6 @@ from invokeai.backend.stable_diffusion.diffusers_pipeline import PipelineInterme
from invokeai.backend.stable_diffusion.diffusion.conditioning_data import (
BasicConditioningInfo,
ConditioningFieldData,
ExtraConditioningInfo,
SDXLConditioningInfo,
)
from invokeai.backend.util.devices import CPU_DEVICE, CUDA_DEVICE, MPS_DEVICE, choose_precision, choose_torch_device
@ -105,6 +105,7 @@ __all__ = [
"OutputField",
"UIComponent",
"UIType",
"WithBoard",
"WithMetadata",
"WithWorkflow",
# invokeai.app.invocations.latent
@ -149,7 +150,6 @@ __all__ = [
# invokeai.backend.stable_diffusion.diffusion.conditioning_data
"BasicConditioningInfo",
"ConditioningFieldData",
"ExtraConditioningInfo",
"SDXLConditioningInfo",
# invokeai.backend.stable_diffusion.diffusers_pipeline
"PipelineIntermediateState",

View File

@ -1 +1 @@
__version__ = "4.0.3"
__version__ = "4.0.4"

View File

@ -33,7 +33,7 @@ classifiers = [
]
dependencies = [
# Core generation dependencies, pinned for reproducible builds.
"accelerate==0.28.0",
"accelerate==0.29.2",
"clip_anytorch==2.5.2", # replacing "clip @ https://github.com/openai/CLIP/archive/eaa22acb90a5876642d0507623e859909230a52d.zip",
"compel==2.0.2",
"controlnet-aux==0.0.7",
@ -47,16 +47,16 @@ dependencies = [
"pytorch-lightning==2.1.3",
"safetensors==0.4.2",
"timm==0.6.13", # needed to override timm latest in controlnet_aux, see https://github.com/isl-org/ZoeDepth/issues/26
"torch==2.2.1",
"torch==2.2.2",
"torchmetrics==0.11.4",
"torchsde==0.2.6",
"torchvision==0.17.1",
"transformers==4.39.1",
"torchvision==0.17.2",
"transformers==4.39.3",
# Core application dependencies, pinned for reproducible builds.
"fastapi-events==0.11.0",
"fastapi==0.110.0",
"huggingface-hub==0.21.4",
"huggingface-hub==0.22.2",
"pydantic-settings==2.2.1",
"pydantic==2.6.3",
"python-socketio==5.11.1",
@ -96,7 +96,7 @@ dependencies = [
[project.optional-dependencies]
"xformers" = [
# Core generation dependencies, pinned for reproducible builds.
"xformers==0.0.25; sys_platform!='darwin'",
"xformers==0.0.25post1; sys_platform!='darwin'",
# Auxiliary dependencies, pinned only if necessary.
"triton; sys_platform=='linux'",
]
@ -256,7 +256,6 @@ module = [
"invokeai.backend.model_management.seamless",
"invokeai.backend.model_management.util",
"invokeai.backend.stable_diffusion.diffusers_pipeline",
"invokeai.backend.stable_diffusion.diffusion.cross_attention_control",
"invokeai.backend.stable_diffusion.diffusion.shared_invokeai_diffusion",
"invokeai.backend.util.hotfixes",
"invokeai.backend.util.mps_fixes",

View File

@ -3,6 +3,7 @@ Test the model installer
"""
import platform
import time
import uuid
from pathlib import Path
from typing import Any, Dict
@ -255,6 +256,10 @@ def test_huggingface_download(mm2_installer: ModelInstallServiceBase, mm2_app_co
assert isinstance(bus, EventServiceBase)
assert store is not None
# test hypothesis that there is a race condition between
# creating temporary directory and downloading into it.
time.sleep(2)
job = mm2_installer.import_model(source)
job_list = mm2_installer.wait_for_installs(timeout=10)
assert len(job_list) == 1

View File

@ -1,8 +1,8 @@
import pytest
import torch
from invokeai.backend.ip_adapter.unet_patcher import UNetPatcher
from invokeai.backend.model_manager import BaseModelType, ModelType, SubModelType
from invokeai.backend.stable_diffusion.diffusion.unet_attention_patcher import UNetAttentionPatcher
from invokeai.backend.util.test_utils import install_and_load_model
@ -77,7 +77,7 @@ def test_ip_adapter_unet_patch(model_params, model_installer, torch_device):
ip_embeds = torch.randn((1, 3, 4, 768)).to(torch_device)
cross_attention_kwargs = {"ip_adapter_image_prompt_embeds": [ip_embeds]}
ip_adapter_unet_patcher = UNetPatcher([ip_adapter])
ip_adapter_unet_patcher = UNetAttentionPatcher([ip_adapter])
with ip_adapter_unet_patcher.apply_ip_adapter_attention(unet):
output = unet(**dummy_unet_input, cross_attention_kwargs=cross_attention_kwargs).sample

Some files were not shown because too many files have changed in this diff.