---
title: Prompting-Features
---

# :octicons-command-palette-24: Prompting-Features

## **Negative and Unconditioned Prompts**

Any words between a pair of square brackets will instruct Stable
Diffusion to attempt to ban the concept from the generated image. The
same effect is achieved by placing words in the "Negative Prompts"
textbox in the Web UI.

```text
this is a test prompt [not really] to make you understand [cool] how this works.
```

In the above statement, the words `not really cool` will be ignored by Stable
Diffusion.

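To make the bracket rule concrete, here is a minimal Python sketch that separates bracketed terms from the rest of a prompt. It is not InvokeAI's parser: the function name `split_negative_prompt` is hypothetical, and the sketch assumes simple, non-nested brackets with no escaping.

```python
import re


def split_negative_prompt(prompt):
    """Separate [bracketed] terms from the rest of a prompt (illustrative only)."""
    # Collect the text found inside each pair of square brackets.
    negative_terms = re.findall(r"\[([^\[\]]*)\]", prompt)
    # Remove the bracketed spans, then collapse the leftover whitespace.
    positive = re.sub(r"\[[^\[\]]*\]", " ", prompt)
    positive = " ".join(positive.split())
    negative = " ".join(term.strip() for term in negative_terms if term.strip())
    return positive, negative


positive, negative = split_negative_prompt(
    "this is a test prompt [not really] to make you understand [cool] how this works."
)
print(positive)
print(negative)
```
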
Here's a prompt that depicts what it does.

original prompt:

`#!bash "A fantastical translucent pony made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve"`

`#!bash parameters: steps=20, dimensions=512x768, CFG=7.5, Scheduler=k_euler_a, seed=1654590180`

<figure markdown>

data:image/s3,"s3://crabby-images/7e413/7e413a99acf797c0f2b91a3487fd6d62c9b6bbac" alt="step1"

</figure>

That image has a woman, so if we want the horse without a rider, we can
influence the image not to have a woman by putting [woman] in the prompt, like
this:

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman]"`
(same parameters as above)

<figure markdown>

data:image/s3,"s3://crabby-images/52079/520791bf89f982fd5154681a085bdec484895bd7" alt="step2"

</figure>

That's nice - but say we also don't want the image to be quite so blue. We can
add "blue" to the list of negative prompts, so it's now [woman blue]:

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue]"`
(same parameters as above)

<figure markdown>

data:image/s3,"s3://crabby-images/f9cab/f9cab3c89c19a528a7802115f3741c2c579a9adb" alt="step3"

</figure>

Getting close - but there's no sense in having a saddle when our horse doesn't
have a rider, so we'll add one more negative prompt: [woman blue saddle].

`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue saddle]"`
(same parameters as above)

<figure markdown>

data:image/s3,"s3://crabby-images/0df9a/0df9a502fe6f0cdd85550d8d7ffa2d89cdfb084b" alt="step4"

</figure>

!!! notes "Notes about this feature:"

    * The only requirement for words to be ignored is that they are in between a pair of square brackets.
    * You can provide multiple words within the same bracket.
    * You can provide multiple brackets with multiple words in different places of your prompt. That works just fine.
    * To improve typical anatomy problems, you can add negative prompts like `[bad anatomy, extra legs, extra arms, extra fingers, poorly drawn hands, poorly drawn feet, disfigured, out of frame, tiling, bad art, deformed, mutated]`.

---

## **Prompt Syntax Features**

The InvokeAI prompting language has the following features:

### Attention weighting

Append `-` or `+`, or a weight between `0` and `2` (`1`=default), to a word or
phrase to decrease or increase "attention" (= a mix of per-token CFG
weighting multiplier and, for `-`, a weighted blend with the prompt without the
term).

The following syntax is recognised:

- single words without parentheses: `a tall thin man picking apricots+`
- single or multiple words with parentheses:
  `a tall thin man picking (apricots)+` `a tall thin man picking (apricots)-`
  `a tall thin man (picking apricots)+` `a tall thin man (picking apricots)-`
- more effect with more symbols: `a tall thin man (picking apricots)++`
- nesting: `a tall thin man (picking apricots+)++` (`apricots` effectively gets
  `+++`)
- all of the above with explicit numbers: `a tall thin man picking (apricots)1.1`
  `a tall thin man (picking (apricots)1.3)1.1`. (`+` is equivalent to 1.1, `++`
  is pow(1.1,2), `+++` is pow(1.1,3), etc; `-` means 0.9, `--` means pow(0.9,2),
  etc.)
- attention also applies to `[unconditioning]`, so
  `a tall thin man picking apricots [(ladder)0.01]` will _very gently_ nudge SD
  away from trying to draw the man on a ladder

You can use this to increase or decrease the amount of something. Starting from
this prompt of `a man picking apricots from a tree`, let's see what happens if
we increase and decrease how much attention we want Stable Diffusion to pay to
the word `apricots`:

<figure markdown>

data:image/s3,"s3://crabby-images/4950b/4950b6a833d40ebbdd4610dab765ae79b3043246" alt="an AI generated image of a man picking apricots from a tree"

</figure>

Using `-` to reduce apricot-ness:

| `a man picking apricots- from a tree` | `a man picking apricots-- from a tree` | `a man picking apricots--- from a tree` |
| --- | --- | --- |
| data:image/s3,"s3://crabby-images/cecda/cecdafc9304319b839721779956d79be59a0bdca" alt="an AI generated image of a man picking apricots from a tree, with smaller apricots" | data:image/s3,"s3://crabby-images/41562/41562f76c43d1e5177068e1b45ee617e136001e1" alt="an AI generated image of a man picking apricots from a tree, with even smaller and fewer apricots" | data:image/s3,"s3://crabby-images/9c221/9c221e0f00e62cd79d189bf3e616c0a1238425cf" alt="an AI generated image of a man picking apricots from a tree, with very few very small apricots" |

Using `+` to increase apricot-ness:

| `a man picking apricots+ from a tree` | `a man picking apricots++ from a tree` | `a man picking apricots+++ from a tree` | `a man picking apricots++++ from a tree` | `a man picking apricots+++++ from a tree` |
| --- | --- | --- | --- | --- |
| data:image/s3,"s3://crabby-images/a9df5/a9df533404f14b309abd8149ddddfb12405c54ab" alt="an AI generated image of a man picking apricots from a tree, with larger, more vibrant apricots" | data:image/s3,"s3://crabby-images/1eb8a/1eb8a5ac4fc0d1662bcb38bfa24f5e65518bfe26" alt="an AI generated image of a man picking apricots from a tree with even larger, even more vibrant apricots" | data:image/s3,"s3://crabby-images/2e255/2e255bd0d75fe6491daff94291ac43bfbfe3f83c" alt="an AI generated image of a man picking apricots from a tree, but the man has been replaced by a pile of apricots" | data:image/s3,"s3://crabby-images/d8103/d8103091e4e36c848459c10b70312fb48eb3e827" alt="an AI generated image of a man picking apricots from a tree, but the man has been replaced by a mound of giant melting-looking apricots" | data:image/s3,"s3://crabby-images/814d1/814d198b11c837e785d78a4deaabc3d3cd849631" alt="an AI generated image of a man picking apricots from a tree, but the man and the leaves and parts of the ground have all been replaced by giant melting-looking apricots" |

You can also change the balance between different parts of a prompt. For
example, below is a `mountain man`:

<figure markdown>

data:image/s3,"s3://crabby-images/61991/619916f6ed93f3b3f96a86546ae0fd0ac55ac4e6" alt="an AI generated image of a mountain man"

</figure>

And here he is with more mountain:

| `mountain+ man` | `mountain++ man` | `mountain+++ man` |
| --- | --- | --- |
| data:image/s3,"s3://crabby-images/70176/70176a7305088c98a28fb7977990a1e3fe0aae3b" alt="" | data:image/s3,"s3://crabby-images/cdcaa/cdcaacd54e0c646c4408fdb76fe8cd92e211cbc7" alt="" | data:image/s3,"s3://crabby-images/5bcf8/5bcf832abbfe7f82bb75aab9006005428d11a6bc" alt="" |

Or, alternatively, with more man:

| `mountain man+` | `mountain man++` | `mountain man+++` | `mountain man++++` |
| --- | --- | --- | --- |
| data:image/s3,"s3://crabby-images/d072a/d072acf3e8599bf3aeb799425b39d2c22c4dc4ae" alt="" | data:image/s3,"s3://crabby-images/9b56c/9b56ca5eae6fc5642eedf97aa268505133f26ff8" alt="" | data:image/s3,"s3://crabby-images/38617/386173a2140b562de8a532cc5b343e831c9f737f" alt="" | data:image/s3,"s3://crabby-images/f48b4/f48b4e183eb499b65ea1530a71b195dbd04550eb" alt="" |

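The arithmetic behind the `+` and `-` symbols in this section can be sketched in a few lines of Python. The helper names `attention_weight` and `nested_weight` are hypothetical, and combining nested groups by multiplication is an assumption chosen because it reproduces the "`apricots` effectively gets `+++`" behaviour described for nesting; it is not taken from the InvokeAI source.

```python
def attention_weight(symbols: str) -> float:
    """Translate a run of '+' or '-' into a multiplier (1.1**n or 0.9**n)."""
    if symbols == "":
        return 1.0  # no symbols means the default weight
    if set(symbols) == {"+"}:
        return 1.1 ** len(symbols)
    if set(symbols) == {"-"}:
        return 0.9 ** len(symbols)
    raise ValueError(f"expected only '+' or only '-', got {symbols!r}")


def nested_weight(*levels: str) -> float:
    """Multiply the weights of nested groups (assumed combination rule)."""
    weight = 1.0
    for level in levels:
        weight *= attention_weight(level)
    return weight


for symbols in ("-", "+", "++", "+++"):
    print(symbols, round(attention_weight(symbols), 4))

# `(picking apricots+)++`: the inner `+` and the outer `++` combine like `+++`.
print(round(nested_weight("+", "++"), 4))
```

Under this reading, five `+` symbols already give a multiplier of roughly 1.61, which is consistent with how quickly the apricot images above drift away from the original composition.
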
### Blending between prompts

- `("a tall thin man picking apricots", "a tall thin man picking pears").blend(1,1)`
- The existing prompt blending using `:<weight>` will continue to be supported -
  `("a tall thin man picking apricots", "a tall thin man picking pears").blend(1,1)`
  is equivalent to
  `a tall thin man picking apricots:1 a tall thin man picking pears:1` in the
  old syntax.
- Attention weights can be nested inside blends.
- Non-normalized blends are supported by passing `no_normalize` as an additional
  argument to the blend weights, e.g.
  `("a tall thin man picking apricots", "a tall thin man picking pears").blend(1,-1,no_normalize)`.
  This is very fun for exploring local maxima in the feature space, but it also
  makes it easy to produce garbage output.

See the section below on "Prompt Blending" for more information about how this
works.

### Cross-Attention Control ('prompt2prompt')

Sometimes an image you generate is almost right, and you just want to change one
detail without affecting the rest. You could use a photo editor and inpainting
to overpaint the area, but that's a pain. Here's where `prompt2prompt` comes in
handy.

Generate an image with a given prompt, record the seed of the image, and then
use the `prompt2prompt` syntax to substitute words in the original prompt for
words in a new prompt. This works for `img2img` as well.

For example, consider the prompt `a cat.swap(dog) playing with a ball in the forest`. Normally, because of the way words interact with each other during Stable Diffusion image generation, these two prompts would generate different compositions:

- `a cat playing with a ball in the forest`
- `a dog playing with a ball in the forest`

| `a cat playing with a ball in the forest` | `a dog playing with a ball in the forest` |
| --- | --- |
| img | img |

- For multiple word swaps, use parentheses: `a (fluffy cat).swap(barking dog) playing with a ball in the forest`.
- To swap a comma, use quotes: `a ("fluffy, grey cat").swap("big, barking dog") playing with a ball in the forest`.
- Supports options `t_start` and `t_end` (each 0-1) loosely corresponding to bloc97's `prompt_edit_tokens_start/_end` but with the math swapped to make it easier to intuitively understand. `t_start` and `t_end` are used to control on which steps cross-attention control should run. With the default values `t_start=0` and `t_end=1`, cross-attention control is active on every step of image generation. Other values can be used to turn cross-attention control off for part of the image generation process.
    - For example, if doing a diffusion with 10 steps for the prompt `a cat.swap(dog, t_start=0.3, t_end=1.0) playing with a ball in the forest`, the first 3 steps will be run as `a cat playing with a ball in the forest`, while the last 7 steps will run as `a dog playing with a ball in the forest`, but the pixels that represent `dog` will be locked to the pixels that would have represented `cat` if the `cat` prompt had been used instead.
    - Conversely, for `a cat.swap(dog, t_start=0, t_end=0.7) playing with a ball in the forest`, the first 7 steps will run as `a dog playing with a ball in the forest` with the pixels that represent `dog` locked to the same pixels that would have represented `cat` if the `cat` prompt had been used instead. The final 3 steps will just run `a cat playing with a ball in the forest`.

    > For img2img, the step sequence does not start at 0 but instead at `(1.0-strength)` - so if the img2img `strength` is `0.7`, `t_start` and `t_end` must both be greater than `0.3` (`1.0-0.7`) to have any effect.

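The two 10-step examples above can be reproduced with a small Python sketch that marks the steps on which cross-attention control would be active. The function name `swap_active_steps` and the rounding of `t * num_steps` to a whole step are assumptions made for illustration; the actual scheduler may map fractions to steps differently.

```python
def swap_active_steps(num_steps, t_start=0.0, t_end=1.0):
    """Return one boolean per step: True where the swapped prompt is in effect."""
    first = round(t_start * num_steps)  # first step with the swap active
    last = round(t_end * num_steps)     # first step after the swap ends
    return [first <= step < last for step in range(num_steps)]


late_swap = swap_active_steps(10, t_start=0.3, t_end=1.0)
early_swap = swap_active_steps(10, t_start=0.0, t_end=0.7)

# Show each schedule as a row of C (cat prompt) and D (dog prompt) markers.
print("".join("D" if active else "C" for active in late_swap))
print("".join("D" if active else "C" for active in early_swap))
```

With the defaults `t_start=0` and `t_end=1`, every entry is `True`, matching the statement that cross-attention control is then active on every step.
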
Prompt2prompt `.swap()` is not compatible with xformers, which will be temporarily disabled when doing a `.swap()` - so you should expect to use more VRAM and run slower than with xformers enabled.

The `prompt2prompt` code is based off
[bloc97's colab](https://github.com/bloc97/CrossAttentionControl).

### Escaping parentheses () and speech marks ""

If the model you are using has parentheses () or speech marks "" as part of its
syntax, you will need to "escape" these using a backslash, so that `(my_keyword)`
becomes `\(my_keyword\)`. Otherwise, the prompt parser will attempt to interpret
the parentheses as part of the prompt syntax and it will get confused.

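If you assemble prompts in a script, the escaping can be automated. The following is a minimal sketch, assuming that a single backslash before each parenthesis or double quote is all that is required; `escape_prompt_syntax` is a hypothetical helper, not part of InvokeAI.

```python
import re


def escape_prompt_syntax(text):
    """Put a backslash in front of every parenthesis and double quote."""
    # In the replacement, `\\` emits a literal backslash and `\1` the matched character.
    return re.sub(r'([()"])', r"\\\1", text)


escaped = escape_prompt_syntax("(my_keyword)")
print(escaped)
```
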
---

## **Prompt Blending**

You may blend together different sections of the prompt to explore the AI's
latent semantic space and generate interesting (and often surprising!)
variations. The syntax is:

```bash
blue sphere:0.25 red cube:0.75 hybrid
```

This will tell the sampler to blend 25% of the concept of a blue sphere with 75%
of the concept of a red cube. The blend weights can use any combination of
integers and floating point numbers, and they do not need to add up to 1.
Everything to the left of the `:XX` up to the previous `:XX` is used for
merging, so the overall effect is:

```bash
0.25 * "blue sphere" + 0.75 * "red cube" + hybrid
```

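The "everything to the left of the `:XX`" rule can be illustrated with a short Python sketch that splits a prompt into weighted parts plus an unweighted tail. This is not InvokeAI's parser: `parse_legacy_blend` and `normalized_weights` are hypothetical names, and dividing each weight by the sum of all weights is only one plausible reading of "they do not need to add up to 1".

```python
import re


def parse_legacy_blend(prompt):
    """Split 'text:weight text:weight tail' into [(text, weight), ...] and the tail."""
    parts = []
    position = 0
    for match in re.finditer(r":(-?\d+(?:\.\d+)?)", prompt):
        # Everything since the previous weight belongs to this weight.
        text = prompt[position:match.start()].strip()
        parts.append((text, float(match.group(1))))
        position = match.end()
    tail = prompt[position:].strip()
    return parts, tail


def normalized_weights(parts):
    """Rescale weights so they sum to 1 (assumed normalization rule)."""
    total = sum(weight for _, weight in parts)
    if total == 0:
        raise ValueError("weights sum to zero and cannot be normalized")
    return [(text, weight / total) for text, weight in parts]


parts, tail = parse_legacy_blend("blue sphere:0.25 red cube:0.75 hybrid")
print(parts, tail)
print(normalized_weights(parse_legacy_blend("apricots:1 pears:1")[0]))
```

The zero-sum guard matters for weight pairs such as `1` and `-1`, which cannot be rescaled this way.
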
Because you are exploring the "mind" of the AI, the AI's way of mixing two
concepts may not match yours, leading to surprising effects. To illustrate, here
are several images generated using various combinations of blend weights. As
usual, unless you fix the seed, the prompts will give you different results each
time you run them.

<figure markdown>

### "blue sphere, red cube, hybrid"

</figure>

This example doesn't use blending at all and represents the default way of mixing
concepts.

<figure markdown>

data:image/s3,"s3://crabby-images/9fcfc/9fcfc167200942a41fc7761a749ba50b2994b2b3" alt="blue-sphere-red-cube-hyprid"

</figure>

It's interesting to see how the AI expressed the concept of "cube" as the four
quadrants of the enclosing frame. If you look closely, there is depth there, so
the enclosing frame is actually a cube.

<figure markdown>

### "blue sphere:0.25 red cube:0.75 hybrid"

data:image/s3,"s3://crabby-images/aa338/aa338670c6995ce28a7c61f6df9c37228e392d38" alt="blue-sphere-25-red-cube-75"

</figure>

Now that's interesting. We get neither a blue sphere nor a red cube, but a red
sphere embedded in a brick wall, which represents a melding of concepts within
the AI's "latent space" of semantic representations. Where is Ludwig
Wittgenstein when you need him?

<figure markdown>

### "blue sphere:0.75 red cube:0.25 hybrid"

data:image/s3,"s3://crabby-images/067ed/067edb73cec52bb248b94306f41b0681ff297495" alt="blue-sphere-75-red-cube-25"

</figure>

Definitely more blue-spherey. The cube is gone entirely, but it's really cool
abstract art.

<figure markdown>

### "blue sphere:0.5 red cube:0.5 hybrid"

data:image/s3,"s3://crabby-images/27c08/27c0865f751864d60e567299ba67f780beaa7cab" alt="blue-sphere-5-red-cube-5-hybrid"

</figure>

Whoa...! I see blue and red, but no spheres or cubes. Is the word "hybrid"
summoning up the concept of some sort of sci-fi creature? Let's find out.

<figure markdown>

### "blue sphere:0.5 red cube:0.5"

data:image/s3,"s3://crabby-images/0b05d/0b05d5c6a65a8dd803cf1a5a5506b652c089cf0a" alt="blue-sphere-5-red-cube-5"

</figure>

Indeed, removing the word "hybrid" produces an image that is more like what we'd
expect.

In conclusion, prompt blending is great for exploring creative space,
but takes some trial and error to achieve the desired effect.