mirror of https://github.com/invoke-ai/InvokeAI, synced 2024-08-30 20:32:17 +00:00
wip updating docs
parent 93a24445dc
commit 5e7ed964d2
@ -239,28 +239,24 @@ Generate an image with a given prompt, record the seed of the image, and then
use the `prompt2prompt` syntax to substitute words in the original prompt for
words in a new prompt. This works for `img2img` as well.

- `a ("fluffy cat").swap("smiling dog") eating a hotdog`.
- quotes optional: `a (fluffy cat).swap(smiling dog) eating a hotdog`.
- for single word substitutions parentheses are also optional:
`a cat.swap(dog) eating a hotdog`.
- Supports options `s_start`, `s_end`, `t_start`, `t_end` (each 0-1) loosely
corresponding to bloc97's `prompt_edit_spatial_start/_end` and
`prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand.
- Example usage: `a (cat).swap(dog, s_end=0.3) eating a hotdog` - the `s_end`
argument means that the "spatial" (self-attention) edit will stop having any
effect after 30% (=0.3) of the steps have been done, leaving Stable
Diffusion with 70% of the steps where it is free to decide for itself how to
reshape the cat-form into a dog form.
- The numbers represent a percentage through the step sequence where the edits
should happen. 0 means the start (noisy starting image), 1 is the end (final
image).
- For img2img, the step sequence does not start at 0 but instead at
(1-strength) - so if strength is 0.7, s_start and s_end must both be
greater than 0.3 (1-0.7) to have any effect.
- Convenience option `shape_freedom` (0-1) to specify how much "freedom" Stable
Diffusion should have to change the shape of the subject being swapped.
- `a (cat).swap(dog, shape_freedom=0.5) eating a hotdog`.
For example, consider the prompt `a cat.swap(dog) playing with a ball in the forest`. Normally, because of the way words interact with each other during Stable Diffusion image generation, these two prompts would generate different compositions:
- `a cat playing with a ball in the forest`
- `a dog playing with a ball in the forest`

| `a cat playing with a ball in the forest` | `a dog playing with a ball in the forest` |
| --- | --- |
| img | img |


- For multiple word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
- To swap text that contains a comma, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to bloc97's `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to
intuitively understand. `t_start` and `t_end` are used to control on which steps cross-attention control should run. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
- For example, if doing a diffusion with 10 steps for the prompt `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will be run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but the pixels that represent `dog` will be locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
- Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest` with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt had been used instead. The final 3 steps will just run `a cat playing with a ball in the forest`.
> For img2img, the step sequence does not start at 0 but instead at `(1.0-strength)` - so if the img2img `strength` is `0.7`, `t_start` and `t_end` must both be greater than `0.3` (`1.0-0.7`) to have any effect.
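
The step arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the code InvokeAI actually runs: the function name `swap_active_steps`, its parameters, and the half-open interval `[t_start, t_end)` are assumptions made for the example.

```python
def swap_active_steps(total_steps, t_start=0.0, t_end=1.0, strength=1.0):
    """Return the indices of the steps on which cross-attention control runs.

    Hypothetical helper for illustration only. `strength` models img2img,
    where the step sequence begins at (1 - strength) instead of 0.
    """
    sequence_start = 1.0 - strength  # 0.0 for plain txt2img
    active = []
    for step in range(total_steps):
        # Position of this step within the full 0..1 step sequence.
        position = sequence_start + (step / total_steps) * strength
        # Assumption: the window is half-open, [t_start, t_end).
        if t_start <= position < t_end:
            active.append(step)
    return active


# 10 steps, t_start=0.3: the first 3 steps run without the swap.
print(swap_active_steps(10, t_start=0.3, t_end=1.0))
# 10 steps, t_end=0.7: the final 3 steps run without the swap.
print(swap_active_steps(10, t_start=0.0, t_end=0.7))
# img2img with strength 0.7: a window ending at 0.3 never becomes active.
print(swap_active_steps(10, t_start=0.0, t_end=0.3, strength=0.7))
```

Under these assumptions the first call yields steps 3 to 9, the second yields steps 0 to 6, and the third yields no steps at all, which matches the img2img note that `t_start` and `t_end` must exceed `1.0-strength` to have any effect.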

Prompt2prompt `.swap()` is not compatible with xformers, which will be temporarily disabled when doing a `.swap()` - so you should expect to use more VRAM and run slower than with xformers enabled.

The `prompt2prompt` code is based off
[bloc97's colab](https://github.com/bloc97/CrossAttentionControl).

@ -594,12 +594,12 @@ class SlicedSwapCrossAttnProcesser(SlicedAttnProcessor):
modified_text_embeddings = swap_cross_attn_context.modified_text_embeddings
original_text_key = attn.to_k(original_text_embeddings)
modified_text_key = attn.to_k(modified_text_embeddings)
#original_value = attn.to_v(original_text_embeddings)
original_value = attn.to_v(original_text_embeddings)
modified_value = attn.to_v(modified_text_embeddings)

original_text_key = attn.head_to_batch_dim(original_text_key)
modified_text_key = attn.head_to_batch_dim(modified_text_key)
#original_value = attn.head_to_batch_dim(original_value)
original_value = attn.head_to_batch_dim(original_value)
modified_value = attn.head_to_batch_dim(modified_value)

# compute slices and prepare output tensor
@ -636,7 +636,7 @@ class SlicedSwapCrossAttnProcesser(SlicedAttnProcessor):
del remapped_original_attn_slice, modified_attn_slice

attn_slice = torch.bmm(attn_slice, modified_value[start_idx:end_idx])
attn_slice = torch.bmm(attn_slice, original_value[start_idx:end_idx])
hidden_states[start_idx:end_idx] = attn_slice