## What type of PR is this? (check all applicable)
- [x] Refactor
- [x] Feature
- [x] Bug Fix
- [?] Optimization
## Have you discussed this change with the InvokeAI team?
- [x] No
## Description
- Fixed the filter type select using `images` instead of `all` -- probably the result of a merge conflict.
- Added a loading state for model lists. Handy when the model list takes longer than a second to fetch for any reason; better to show this than an empty screen.
- Refactored the render-model-list function so the display component is modified in one place rather than duplicated.
### Other Issues
- The list can get a bit laggy on initial load when you have hundreds of models / LoRAs. This needs to be fixed; I will make another PR for it.
## What type of PR is this? (check all applicable)
- [x] Refactor
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: trivial
## Description
Adds a few obviously missing `Optional` on fields that default to
`None`.
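For illustration, here is a minimal sketch of the pattern being fixed (the model and field names are hypothetical, not the actual fields touched by this PR):
```py
# Minimal sketch with hypothetical names: a field that defaults to None
# should be annotated Optional so the annotation matches the default.
from typing import Optional
from pydantic import BaseModel

class ExampleConfig(BaseModel):
    # before: description: str = None   (annotation and default disagree)
    description: Optional[str] = None   # after: Optional matches the None default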
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [X] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [X] No, because: Just a documentation update
## Have you updated all relevant documentation?
- [X] Yes
- [ ] No
## Description
Updated documentation with a getting started guide & a glossary of terms needed to get started.
Updated the landing page flow for users.
<img width="1430" alt="Screenshot 2023-07-27 at 9 53 25 PM"
src="https://github.com/invoke-ai/InvokeAI/assets/7254508/d0006ba7-2ed4-4044-a1bc-ca9a99df9397">
## Related Tickets & Documents
<!--
For pull requests that relate or
close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [x] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
This is a relatively stable release that corrects the urgent windows
install and model manager problems in 3.0.1. It still has two known
bugs:
1. Many inpainting models are not loading correctly.
2. The merge script is failing to start.
- Remove FaceMask and add a link to the FaceTools repository, which includes FaceMask, FaceOff, and FacePlace
- Move Ideal Size up from under the template entry
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Have you discussed this change with the InvokeAI team?
- [X] Yes - bug discovered by jpphoto
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] Not needed
## Description
The user can customize the location of the models directory by setting the configuration variable `models_dir`. However, the model manager and the TUI installer were both treating model-relative paths as relative to the InvokeAI root rather than the designated models directory. This has been fixed by changing path resolution calls from `config.root_path` to `config.models_path`.
Unfortunately there were many instances that needed replacement, so this
needs a bit of functional testing - try adding models, removing models,
renaming them, converting checkpoints, etc.
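As a rough sketch of the change (names simplified; this is not the actual model manager code):
```py
# Minimal sketch: resolve model-relative paths against the configured models
# directory instead of the InvokeAI root.
from pathlib import Path

def resolve_model_path(model_path: str, models_path: Path) -> Path:
    p = Path(model_path)
    # previously this was effectively `config.root_path / p`
    return p if p.is_absolute() else models_path / p
```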
## What type of PR is this? (check all applicable)
- [x] Optimization
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
This PR does two things:
1. if the environment variable INVOKEAI_ROOT is defined at install time,
the zipfile installer will default to its value when asking the user
where to install the software
2. If the user has more than 72 models of any type installed, then the
list will be truncated in the TUI and the user given a warning. Anything
larger than this number of models causes the vertical space to overflow.
The only effect of truncation is that the user will not be able to see
and delete the models that were truncated. The message advises the user
to go to the Web Model Manager interface in this event.
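A minimal sketch of the first change (the helper name is hypothetical):
```py
# Minimal sketch, hypothetical helper: prefer $INVOKEAI_ROOT as the suggested
# install location when it is defined, otherwise fall back to ~/invokeai.
import os
from pathlib import Path

def suggested_install_dir() -> Path:
    env_root = os.environ.get("INVOKEAI_ROOT")
    return Path(env_root).expanduser() if env_root else Path.home() / "invokeai"
```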
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No
## Description
This PR fixes several issues with the 3.0.0 conversion script:
- Handles checkpoint variants that don't put dots between fields in the
long state dict key names
- Handles ema, non-ema, pruned and non-pruned ckpts
- Does not add safety checker to converted checkpoints
- Respects precision of original checkpoint file
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [X] Not needed
## Description
Windows users have been getting a lot of OSErrors while installing 3.0.1
during the pip dependency installation phase. Generally the errors have
involved just two packages, pydantic and numpy. Looking at the install
logs, I see that both of these packages are first installed under one
version number by a dependency, and then uninstalled and replaced by a
slightly different version specified in invoke's `pyproject.toml`. I
think this is the problem - maybe the earlier package is not completely
closed before it is uninstalled and reinstalled.
This PR relaxes the pinning of numpy and pydantic in `pyproject.toml`. Everything seems to install and run properly. Hopefully it will address the Windows install bug as well.
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Have you discussed this change with the InvokeAI team?
- [x] Yes
## Description
- SDXL Metadata was not being retrieved. This PR fixes it.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [X] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [X] No, because:
not yet; making this PR to show the change
## Have you updated relevant documentation?
- [ ] Yes
- [ ] No
## Description
Temporarily change the Node String input to a textbox, to allow easier input of prompts and larger strings. It works for me, but please tell me if I did it wrong and whether the size is OK.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: minor fix, let me know your thoughts
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue # https://github.com/invoke-ai/InvokeAI/issues/4017
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : Requires mps device
## [optional] Are there any post deployment tasks we need to perform?
Please test on an MPS (M1/M2) device.
Relevant code causing the error in #4017 (diffusers `01b6ec21fa`, `src/diffusers/schedulers/scheduling_euler_discrete.py`, L263-L268):
```py
self.sigmas = torch.from_numpy(sigmas).to(device=device)
if str(device).startswith("mps"):
    # mps does not support float64
    self.timesteps = torch.from_numpy(timesteps).to(device, dtype=torch.float32)
else:
    self.timesteps = torch.from_numpy(timesteps).to(device=device)
```
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Description
- Fix SDXL Concat Link animation not considering the fact that prompt
boxes can be resized.
- Also fixed a minor issue where the overlaying animation box would
block the prompt input resize slightly. Should be good now.
## What type of PR is this? (check all applicable)
- [x] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
## Have you updated all relevant documentation?
- [x] Yes
## Description
Added solutions for installation issues related to large SDXL files and
Windows dependency glitches.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
Make the prompt area styling match across all tabs / models and move all prompt-related components into a parent component for quick additions.
Cherry-picked stuff from the Styles PR since we aren't going to merge that.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
- make the `SDXLConcatLink` icon match existing colors in light mode
- make the link toggle button use the accent color when active (it's not super obvious, but at least there is *some* visual difference for the button)
## What type of PR is this? (check all applicable)
- [x] Feature
## Have you discussed this change with the InvokeAI team?
- [x] Yes
## Have you updated all relevant documentation?
- [x] Yes - this makes invokeai behave the way it is described in
LOGGING.md
## Description
Prior to this PR, the uvicorn embedded web server did all its logging
independently of the InvokeAI logging infrastructure, and none of the
command-line or `invokeai.yaml` configuration directives, such as
`log_level` had any effect on its output. This PR replaces the uvicorn
logger with InvokeAI's, simultaneously creating a more uniform output
experience, as well as a unified way to control the amount and
destination of the logs.
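Conceptually the change amounts to something like the following sketch (the function name and wiring are assumptions, not the actual implementation):
```py
# Minimal sketch with assumed names: point uvicorn's loggers at InvokeAI's
# handlers so log_level and log_handlers from invokeai.yaml apply to both.
import logging

def adopt_uvicorn_loggers(app_logger: logging.Logger) -> None:
    for name in ("uvicorn", "uvicorn.error", "uvicorn.access"):
        uv_logger = logging.getLogger(name)
        uv_logger.handlers = app_logger.handlers              # reuse formatting and destinations
        uv_logger.setLevel(app_logger.getEffectiveLevel())    # honor the configured log_level
        uv_logger.propagate = False                           # avoid duplicate log lines
```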
Here's what we used to see at startup:
```
[2023-07-27 07:29:48,027]::[InvokeAI]::INFO --> InvokeAI version 3.0.1rc2
[2023-07-27 07:29:48,027]::[InvokeAI]::INFO --> Root directory = /home/lstein/invokeai-main
[2023-07-27 07:29:48,028]::[InvokeAI]::INFO --> GPU device = cuda NVIDIA GeForce RTX 4070
[2023-07-27 07:29:48,040]::[InvokeAI]::INFO --> Scanning /home/lstein/invokeai-main/models for new models
[2023-07-27 07:29:49,263]::[InvokeAI]::INFO --> Scanned 22 files and directories, imported 10 models
[2023-07-27 07:29:49,271]::[InvokeAI]::INFO --> Model manager service initialized
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:9090 (Press CTRL+C to quit)
INFO: 127.0.0.1:44404 - "GET /socket.io/?EIO=4&transport=polling&t=OcN7Pvd HTTP/1.1" 200 OK
INFO: 127.0.0.1:44404 - "POST /socket.io/?EIO=4&transport=polling&t=OcN7Pw6&sid=SB-NsBKLSrW7YtM0AAAA HTTP/1.1" 200 OK
INFO: ('127.0.0.1', 44418) - "WebSocket /socket.io/?EIO=4&transport=websocket&sid=SB-NsBKLSrW7YtM0AAAA" [accepted]
INFO: connection open
INFO: 127.0.0.1:44430 - "GET /socket.io/?EIO=4&transport=polling&t=OcN7Pw9&sid=SB-NsBKLSrW7YtM0AAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:44404 - "GET /socket.io/?EIO=4&transport=polling&t=OcN7PwU&sid=SB-NsBKLSrW7YtM0AAAA HTTP/1.1" 200 OK
INFO: 127.0.0.1:44404 - "GET /api/v1/images/?is_intermediate=true HTTP/1.1" 200 OK
INFO: 127.0.0.1:43448 - "GET / HTTP/1.1" 200 OK
INFO: connection closed
INFO: 127.0.0.1:43448 - "GET /assets/index-5a784cdd.js HTTP/1.1" 200 OK
INFO: 127.0.0.1:43458 - "GET /assets/favicon-0d253ced.ico HTTP/1.1" 304 Not Modified
INFO: 127.0.0.1:43448 - "GET /locales/en.json HTTP/1.1" 200 OK
```
And here's what we see with the new implementation:
```
[2023-07-27 12:05:28,810]::[uvicorn.error]::INFO --> Started server process [875161]
[2023-07-27 12:05:28,810]::[uvicorn.error]::INFO --> Waiting for application startup.
[2023-07-27 12:05:28,810]::[InvokeAI]::INFO --> InvokeAI version 3.0.1rc2
[2023-07-27 12:05:28,810]::[InvokeAI]::INFO --> Root directory = /home/lstein/invokeai-main
[2023-07-27 12:05:28,811]::[InvokeAI]::INFO --> GPU device = cuda NVIDIA GeForce RTX 4070
[2023-07-27 12:05:28,823]::[InvokeAI]::INFO --> Scanning /home/lstein/invokeai-main/models for new models
[2023-07-27 12:05:29,970]::[InvokeAI]::INFO --> Scanned 22 files and directories, imported 10 models
[2023-07-27 12:05:29,977]::[InvokeAI]::INFO --> Model manager service initialized
[2023-07-27 12:05:29,980]::[uvicorn.error]::INFO --> Application startup complete.
[2023-07-27 12:05:29,981]::[uvicorn.error]::INFO --> Uvicorn running on http://127.0.0.1:9090 (Press CTRL+C to quit)
[2023-07-27 12:05:32,140]::[uvicorn.access]::INFO --> 127.0.0.1:45236 - "GET /socket.io/?EIO=4&transport=polling&t=OcO6ILb HTTP/1.1" 200
[2023-07-27 12:05:32,142]::[uvicorn.access]::INFO --> 127.0.0.1:45248 - "GET /socket.io/?EIO=4&transport=polling&t=OcO6ILb HTTP/1.1" 200
[2023-07-27 12:05:32,154]::[uvicorn.access]::INFO --> 127.0.0.1:45236 - "POST /socket.io/?EIO=4&transport=polling&t=OcO6ILr&sid=13O4-5uLx5eP-NuqAAAA HTTP/1.1" 200
[2023-07-27 12:05:32,168]::[uvicorn.access]::INFO --> 127.0.0.1:45252 - "POST /socket.io/?EIO=4&transport=polling&t=OcO6IM0&sid=0KRqxmh7JLc8t7wZAAAB HTTP/1.1" 200
[2023-07-27 12:05:32,171]::[uvicorn.error]::INFO --> ('127.0.0.1', 45264) - "WebSocket /socket.io/?EIO=4&transport=websocket&sid=0KRqxmh7JLc8t7wZAAAB" [accepted]
[2023-07-27 12:05:32,172]::[uvicorn.error]::INFO --> connection open
[2023-07-27 12:05:32,174]::[uvicorn.access]::INFO --> 127.0.0.1:45276 - "GET /socket.io/?EIO=4&transport=polling&t=OcO6IM3&sid=0KRqxmh7JLc8t7wZAAAB HTTP/1.1" 200
```
You can also divert everything to a file with an `invokeai.yaml` entry like this:
```yaml
Logging:
  log_handlers:
    - file=/home/lstein/invokeai/logs/access_log
  log_format: plain
  log_level: info
```
This works with syslog and other log handlers as well.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
https://github.com/huggingface/diffusers/releases/tag/v0.19.0
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
This updates InvokeAI's pyproject.toml to the minimum library versions needed to support Python 3.11. It also updates the installer to find and allow Python 3.11, and updates the documentation accordingly.
Between 3.10 and 3.11 there was a change to the handling of `enum`
interpolation into strings that caused the model manager to break. I
think I have fixed the places where this was a problem, but there may be
other instances in which this will cause problems. Please be alert for
errors involving `ModelType` or `BaseModelType`.
I also took the opportunity to add a `SilenceWarnings()` context to the
t2i and i2i invocations. This quenches nags from diffusers about the
HuggingFace NSFW library.
I have tested basic functionality (t2i, i2i, inpaint, lora, controlnet,
nodes) on 3.10 and 3.11 and all seems good. Please test more
extensively!
## Added/updated tests?
- [x] Yes - existing tests run to completion
- [ ] No
## [optional] Are there any post deployment tasks we need to perform?
Should be a drop-in replacement.
* add upper bound for minWidth to prevent crash with cypress
* add fallback so UI doesn't crash when backend isn't running
---------
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
When multiple Python versions are installed with `pyenv`, the executable (shim) exists but returns an error when trying to run it unless activated with `pyenv`. This commit ensures the Python executable is usable.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature (dev feature and reformatting)
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
Introducing black to the code base as a first step towards this:
https://github.com/invoke-ai/InvokeAI/discussions/3721
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : Not applicable
## [optional] Are there any post deployment tasks we need to perform?
All active branches will be affected by this and will need to be
updated.
This PR adds a new github workflow for black as well as config for
pre-commit hooks to those who wish to use it
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] Not needed
## Description
This bugfix enables InvokeAI to convert sd-1, sd-2 and sdxl base model
checkpoints (.safetensors) to diffusers.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [x] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No
## Description
This PR causes the installer to install, by default, the fine-tuned
SDXL-1.0 VAE located at
https://huggingface.co/madebyollin/sdxl-vae-fp16-fix.
Although this VAE is supposed to run at fp16 precision, it currently only works in InvokeAI at fp32. However, because it is a fine-tune, it may have fewer of the watermark-related artifacts that we see with the SDXL-1.0 VAE.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] Not necessary
## Description
When adding new core models to a 3.0.0 root directory needed to support
SDXL, the configure script was (under some conditions) overwriting
models.yaml. This PR corrects the problem.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
I have reworked the console TUIs for the configure and model install scripts to require much less vertical space. In the event that the "NEXT" button is still missing and "page 1/2" is displayed, scrolling beyond the last checkbox will now automatically move to page 2, where the buttons are displayed. This is not ideal, but it will no longer block users completely.
If users continue to have problems after this, I'll get rid of the TUI altogether and replace it with a web form.
## Added/updated tests?
- [ ] Yes
- [x] No : not needed
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [x] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because they trust me
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
* Added the RAIL++ license for SDXL
* Updated configure script with URLs for both the original RAIL-M and
RAIL++ licenses
* Added invisible watermark documentation and renamed doc file
* Updated documentation for installation
* Updated documentation on settings in invokeai.yaml
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
Metadata was not getting saved because the accumulator was not plugged in when the watermark or NSFW nodes were turned off.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because there was no time!
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No
## Description
Hotfix for issue of SD1 and SD2 legacy safetensors models not converting
in 3.0.1rc1.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
This PR adds NSFW checker and invisible watermark fields. The NSFW
checker takes an image input and produces an image output. If NSFW
content is detected, the output image will be blurred and a "caution"
icon pasted into its upper left corner. A boolean `active` field
controls whether the checker is active. If turned off it simply returns
a copy of the image.
The invisible watermark node adds invisible text to the image, defaulting to "InvokeAI". To decode the watermark, use the
`invisible-watermark` command, which is part of the
`invisible-watermark` library:
```
$ invisible-watermark -v -a decode -t bytes -m dwtDct -l 64 ./bluebird-watermark.png
decode time ms: 14.129877090454102
InvokeAI
```
Note that the `-l` (length) argument is mandatory. It is set to 64 here
because the watermark `InvokeAI` is 8 bytes/64 bits long. The length
must match in order for the watermark to be decoded correctly.
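The same decode can be done from Python via the `imwatermark` module that ships with the `invisible-watermark` package; a minimal sketch (the file name is just an example):
```py
# Minimal sketch: decode the default "InvokeAI" watermark (8 bytes = 64 bits)
# using the imwatermark module from the invisible-watermark package.
import cv2
from imwatermark import WatermarkDecoder

image = cv2.imread("./bluebird-watermark.png")   # example watermarked output image
decoder = WatermarkDecoder("bytes", 64)          # length must match the embedded mark
watermark = decoder.decode(image, "dwtDct")
print(watermark.decode("utf-8"))                 # -> "InvokeAI"
```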
Both nodes are now incorporated into the linear Text2Image and
Image2Image UIs, including the canvas. They are not implemented for
inpaint currently.
The nodes can be disabled with configuration options:
```
invisible_watermark: false
nsfw_checker: false
```
or at launch time with `--no-invisible_watermark` and
`--no-nsfw_checker`.
feat(ui) use `as` for menuitem links
I had requested this be done with the chakra `Link` component, but actually using `as` is correct according to the docs. For other components you are supposed to use `Link`, but it looks like `MenuItem` has this built in.
Fixed in all places where we use it.
Also:
- fix github icon
- give menu hamburger button padding
- add menu motion props so it animates the same as other menus
feat(ui): restore ColorModeButton
@maryhipp
chore(ui): lint
feat(ui): remove colormodebutton again
sry
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No - not yet, WIP
## Description
This PR adds support for loading and converting checkpoint-format
ControlNet and SDXL models. The SDXL and SDXL-refiner model conversions
are working; however, saving the unet in safetensors format leads to corrupted model files, so it currently saves in .bin format (after scanning the input model).
ControlNet conversion seems to be working but needs further testing.
To use this PR, you will need to copy the files
`invokeai/configs/stable-diffusion/sd_xl_base.yaml` and
`invokeai/configs/stable-diffusion/sd_xl_refiner.yaml` into
`INVOKEAI/configs/stable-diffusion`. You will also need to run
`invokeai-configure --yes --skip-sd` in order to install additional core
model files needed by the converter.
## What type of PR is this? (check all applicable)
- [x] Feature
## Have you discussed this change with the InvokeAI team?
- [x] Yes
## Description
- Update the Aspect Ratio tags to show the aspect ratio values rather than Wide / Square etc.
- Updated the LoRA input to take values between -50 and 50, because I found some LoRAs that are actually trained to work at -25 and +15 too. These input caps should mostly suffice. If there's ever a LoRA that goes bonkers beyond that, we can change it.
- Fixed LoRAs being sorted the wrong way in Lora Select.
- Fixed Embeddings being sorted the wrong way in Embedding Select.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
- add `addNSFWCheckerToGraph` and `addWatermarkerToGraph` functions
- use them in all linear graph creation
- add state & toggles to settings modal to enable these
- trigger queries for app config on socket connect
- disable the nsfw/watermark booleans if we get the app config and they are not available
## What type of PR is this? (check all applicable)
- [x] Feature
## Have you discussed this change with the InvokeAI team?
- [x] Yes
## Description
This PR adds support for SDXL Models in the Linear UI
### DONE
- SDXL Base Text To Image Support
- SDXL Base Image To Image Support
- SDXL Refiner Support
- SDXL Relevant UI
## [optional] Are there any post deployment tasks we need to perform?
Double check to ensure nothing major changed with 1.0 -- in any case, those changes would mostly be backend-related. If the Refiner is scrapped for 1.0 models, then we simply disable the Refiner Graph.
Rolled back the earlier split of the refiner model query.
Now, when you use `useGetMainModelsQuery()`, you must provide it an array of base model types.
They are provided as constants for simplicity:
- ALL_BASE_MODELS
- NON_REFINER_BASE_MODELS
- REFINER_BASE_MODELS
Opted to just use args for the hook instead of wrapping the hook in another hook; we can tidy this up later if desired.
We can derive `isRefinerAvailable` from the query result (e.g. are there any refiner models installed?). This is a piece of server state, so by using the list models response directly, we avoid needing to manually keep the client in sync with the server.
Created a `useIsRefinerAvailable()` hook to return this boolean wherever it is needed.
Also updated the main models & refiner models endpoints to only return the appropriate models. Now we don't need to filter the data on these endpoints.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [X] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [X] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [X] No
## Description
Updated the script that closes stale issues to use the newest version of actions/stale.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [X] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
Not sure how this script gets kicked off
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [x] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: This is a minor fix that I happened upon while
reading
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
Within the `mkdocs.yml` file, there's a typo where `Model Merging` is
spelled as `Model Mergeing`. I also found some unnecessary white space
that I removed.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : Not big enough of a change to require tests (unless it is)
## [optional] Are there any post deployment tasks we need to perform?
Might need to re-run the yml file for docs to regenerate, but I'm hardly
familiar with the codebase so 🤷
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: n/a
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No n/a
## Description
Add a generation mode indicator to canvas.
- use the existing logic to determine if generation is txt2img, img2img,
inpaint or outpaint
- technically `outpaint` and `inpaint` are the same, just display "Inpaint" if it's either
- debounce this by 1s to prevent jank
I was going to disable controlnet conditionally when the mode is inpaint
but that involves a lot of fiddly changes to the controlnet UI
components. Instead, I'm hoping we can get inpaint moved over to latents
by next release, at which point controlnet will work.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
https://github.com/invoke-ai/InvokeAI/assets/4822129/87464ae9-4136-4367-b992-e243ff0d05b4
## Added/updated tests?
- [ ] Yes
- [x] No : n/a
## [optional] Are there any post deployment tasks we need to perform?
n/a
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No, n/a
## Description
When a queue item is popped for processing, we need to retrieve its
session from the DB. Pydantic serializes the graph at this stage.
It's possible for a graph to have been made invalid during the graph
preparation stage (e.g. an ancestor node executes, and its output is not
valid for its successor node's input field).
When this occurs, the session in the DB will fail validation, but we
don't have a chance to find out until it is retrieved and parsed by
pydantic.
This logic was previously not wrapped in any exception handling.
Just after retrieving a session, we retrieve the specific invocation to
execute from the session. It's possible that this could also have some
sort of error, though it should be impossible for it to be a pydantic
validation error (that would have been caught during session
validation). There was also no exception handling here.
When either of these processes fail, the processor gets soft-locked
because the processor's cleanup logic is never run. (I didn't dig deeper
into exactly what cleanup is not happening, because the fix is to just
handle the exceptions.)
This PR adds exception handling to both the session retrieval and node
retrieval and events for each: `session_retrieval_error` and
`invocation_retrieval_error`.
These events are caught and displayed in the UI as toasts, along with
the type of the python exception (e.g. `Validation Error`). The events
are also logged to the browser console.
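In pseudocode, the processor loop now looks roughly like this sketch (names are simplified; this is not the actual service API):
```py
# Minimal sketch with simplified names: wrap both retrieval steps so a failure
# emits an error event and the loop keeps running instead of soft-locking.
def process_queue_item(queue_item, services) -> None:
    try:
        session = services.graph_execution_manager.get(queue_item.session_id)
    except Exception as exc:                 # e.g. a pydantic ValidationError
        services.events.emit("session_retrieval_error", error=str(exc))
        return                               # skip this item; keep the processor alive

    try:
        invocation = session.next()          # pick the next node to execute
    except Exception as exc:
        services.events.emit("invocation_retrieval_error", error=str(exc))
        return

    # ... execute `invocation` as before ...
```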
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
Closes #3860, #3412
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
Create a valid graph that will become invalid during execution. Here's an example:

This is valid before execution, but the `width` field of the `Noise`
node will end up with an invalid value (`0`). Previously, this would
soft-lock the app and you'd have to restart it.
Now, with this graph, you will get an error toast, and the app will not
get locked up.
## Added/updated tests?
- [x] Yes (ish)
- [ ] No
@Kyle0654 @brandonrising
It seems because the processor runs in its own thread, `pytest` cannot
catch exceptions raised in the processor.
I added a test that does work, insofar as it does recreate the issue.
But, because the exception occurs in a separate thread, the test doesn't
see it. The result is that the test passes even without the fix.
So when running the test, we see the exception:
```py
Exception in thread invoker_processor:
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/processor.py", line 50, in __process
self.__invoker.services.graph_execution_manager.get(
File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/sqlite.py", line 79, in get
return self._parse_item(result[0])
File "/home/bat/Documents/Code/InvokeAI/invokeai/app/services/sqlite.py", line 52, in _parse_item
return parse_raw_as(item_type, item)
File "pydantic/tools.py", line 82, in pydantic.tools.parse_raw_as
File "pydantic/tools.py", line 38, in pydantic.tools.parse_obj_as
File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
```
But `pytest` doesn't actually see it as an exception. Not sure how to
fix this, it's a bit beyond me.
## [optional] Are there any post deployment tasks we need to perform?
nope don't think so
## What type of PR is this? (check all applicable)
- [x] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Description
`search_for_models` is explicitly typed as taking a singular `Path` but
was given a list because some later function in the stack expects a
list. Fixed that to be compatible with the paths. This is the only use
of that function.
The `list()` call is unrelated but removes a type warning since it's
supposed to return a list, not a set. I can revert it if requested.
This was found through pylance type errors. Go types!
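A hypothetical before/after of the annotation (the body here is illustrative only, not the real function):
```py
# Hypothetical illustration of the annotation change only; previously annotated
# as taking a single Path even though callers pass a list.
from pathlib import Path
from typing import List

def search_for_models(directories: List[Path]) -> List[Path]:
    found: List[Path] = []
    for directory in directories:
        found.extend(p for p in directory.rglob("*.safetensors") if p.is_file())
    return found
```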
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Description
This import is missing and used later in the file.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [ ] No
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No: n/a
## Description
At some point I typo'd this and set the max seed to signed int32 max. It
should be *un*signed int32 max.
This restored the seed range to what it was in v2.3.
Also fixed a bug in the Noise node which resulted in the max valid seed
being one less than intended.
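For reference, a minimal sketch of the corrected bounds (numpy's `uint32` limit is the constraint discussed under Related Issues below):
```py
# Minimal sketch: the corrected seed bounds. numpy RNG seeds max out at the
# largest unsigned 32-bit value, not the signed one.
import numpy as np

SEED_MIN = 0
SEED_MAX = np.iinfo(np.uint32).max      # 4294967295, not 2147483647 (signed int32 max)

def random_seed(rng: np.random.Generator) -> int:
    # high is exclusive in rng.integers, so add 1 to make SEED_MAX reachable
    return int(rng.integers(SEED_MIN, SEED_MAX + 1))
```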
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issues
#2843 is against v2.3 and increases the range of valid seeds substantially. Maybe we can explore this in the future, but as of v3.0 we use numpy for an RNG in a few places, and it maxes out at the max `uint32`. I will close that PR as this one supersedes it.
- Closes #3866
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
You should be able to use seeds up to and including `4294967295`.
## Added/updated tests?
- [ ] Yes
- [x] No : don't think we have any relevant tests
## [optional] Are there any post deployment tasks we need to perform?
nope!
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Have you discussed this change with the InvokeAI team?
- [x] Yes, we feel very passionate about this.
## Description
Uploading an incorrect JSON file to the Node Editor would crash the app.
While this is a much larger problem that we will tackle while refining
the Node Editor, this is a fix that should address 99% of the cases out
there.
When saving an InvokeAI node graph, there are three primary keys.
1. `nodes` - which has all the node-related data.
2. `edges` - which has all the edge-related data.
3. `viewport` - which has all the viewport-related data.
So when we load back the JSON, we now check if all three of these keys
exist in the retrieved JSON object. While the `viewport` itself is not a
mandatory key to repopulate the graph, checking for it will allow us to
treat it as an additional check to ensure that the graph was saved from
InvokeAI.
As a result ...
- If you upload an invalid JSON file, the app now warns you that the
JSON is invalid.
- If you upload a JSON from a graph editor that is not InvokeAI, it simply warns you that you are uploading a non-InvokeAI graph.
So effectively, you should not be able to load any graph that is not
generated by ReactFlow.
Here are the edge cases:
- What happens if a user maintains the above key structure but tampers with the data inside it? Well, I tested it. It turns out that because we validate and build the graph based on the JSON data, if you tamper with any data that is needed to rebuild a node, it will simply skip that node and load the rest of the graph with the valid data.
- What happens if a user uploads a graph that was made by some other random ReactFlow app? Well, same as above. Because we do not parse that in our setup, it will simply skip it and only display what we are set up to display.
I think that just about covers 99% of the cases where this could go wrong. If there are any other edge cases, I can add checks if need be, but I can't think of any at the moment.
## Related Tickets & Documents
### Closes
- #3893
- #3881
## [optional] Are there any post deployment tasks we need to perform?
Yes. Making @psychedelicious a little bit happier. :P
## What type of PR is this? (check all applicable)
- [x] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: n/a
## Have you updated all relevant documentation?
- [ ] Yes
- [x] No n/a
## Description
Big cleanup:
- improve & simplify the app logging
- resolve all TS issues
- resolve all circular dependencies
- fix all lint/format issues
## QA Instructions, Screenshots, Recordings
`yarn lint` passes:

<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : n/a
## [optional] Are there any post deployment tasks we need to perform?
bask in the glory of what *should* be a fully-passing frontend lint on
this PR
Added the Ideal Size node
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [X] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [X] No, because: It's a community node addition
## Have you updated all relevant documentation?
- [X] Yes
- [ ] No
## Description
Added a reference to my community node that calculates the ideal size
for initial latent generation that avoids duplication. This is the logic
that was present in 2.3.5's first pass of high-res optimization.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [X] No : This is a documentation change that references my community
node.
## [optional] Are there any post deployment tasks we need to perform?
Add Face Mask to communityNodes.md
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [x] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Have you updated all relevant documentation?
- [x] Yes
- [ ] No
## Description
Add Face Mask to the communityNodes.md list.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [x] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: just updated docs to try to help lead new users to the installs a little more easily
## Have you updated relevant documentation?
- [x] Yes
- [ ] No
## Description
Some minor docs tweaks
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [x] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
Revised boards logic and UI
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue # discord convos
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : n/a
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Description
On MPS, generating images at resolutions above ~1536x1536 results in "fried" output. The main problem is that such resolutions produce tensors larger than 4 GB. It looks like some MPS internals can't handle this properly, so to mitigate it I break the attention calculation into chunks.
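Roughly, the mitigation looks like this sketch (the chunk size and function name are placeholders, not the actual values used):
```py
# Minimal sketch with a placeholder chunk size: compute attention over slices
# of the query dimension so no intermediate tensor grows past the MPS limit.
import torch

def chunked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                      chunk_size: int = 4096) -> torch.Tensor:
    scale = q.shape[-1] ** -0.5
    out = torch.empty_like(q)
    for start in range(0, q.shape[-2], chunk_size):
        end = start + chunk_size
        attn = (q[..., start:end, :] @ k.transpose(-2, -1)) * scale
        out[..., start:end, :] = attn.softmax(dim=-1) @ v
    return out
```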
## QA Instructions, Screenshots, Recordings
Example of bad output:

## What type of PR is this? (check all applicable)
- [x] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
This is a WIP to collect documentation enhancements and other polish prior to the final 3.0.0 release. Minor bug fixes may go in here if non-controversial. It should be merged into main prior to the final release.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Have you updated relevant documentation?
- [ ] Yes
- [ ] No
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [x] Bug Fix
## Desc
Fixes a bug where the board name is not displayed in the header if there
are no images in it.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
Add progress preview for sdxl generation nodes
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ X] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [X] Yes
- [ ] No, because:
## Have you updated relevant documentation?
- [X] Yes (swagger)
- [ ] No
## Description
This adds new routes for getting and setting the command line console
logging level.
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [X] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission
## Have you discussed this change with the InvokeAI team?
- [X] Yes (discussed with @hipsterusername yesterday)
- [ ] No, because:
## Have you updated relevant documentation?
- [ ] Yes
- [X] No, not yet (but the change to default ControlNet resizing doesn't
require any user documentation)
## Description
This PR adds resize modes (just_resize, crop_resize, fill_resize) to
InvokeAI's ControlNet node. The implementation is largely based on
lllyasviel's, which includes a high quality resizer specifically
intended to handle common ControlNet preprocessor outputs, such as
binary (black/white) images, grayscale images, and binary or grayscale
thin lines. Previously, the InvokeAI ControlNet implementation only did a
simple resize with independent x/y scaling to match the noise latent.
### "just_resize" mode (the default setting)
With the new implementation, using the default "just_resize" mode,
ControlNet images are still resized with independent x/y scaling to
match the noise latent resolution, but with the high quality resizer. As
a result, images generated in InvokeAI now look much closer to
counterparts generated via sd-webui-controlnet. See the example below. All
inference runs use prompt="old man", the same ControlNet canny edge
detection preprocessor, model, and control image, and identical other
parameters except for control_mode. The top row is the previous simple
resize implementation; the bottom row uses the new high quality resizer
with "just_resize" mode. Control_mode is: left="balanced", middle="more
prompt", right="more control". The high quality resize images are
identical (at least by eye) to the output from sd-webui-controlnet with
the same settings.

## "crop_resize" and "fill_resize" modes
The other two resize modes are "crop_resize" and "fill_resize". Whereas
"just_resize" ignores any aspect ratio mismatch between the ControlNet
image and the noise latent, these other modes preserve the aspect ratio
of the ControlNet image. The "crop_resize" mode does this by cropping
the image, and the "fill_resize" option does this by expanding the image
(adding fill pixels). See the example below. In this case all inference
runs use prompt="old man", the ControlNet Midas depth detection
preprocessor and depth model, the same 512x512 control image,
control_mode="balanced", and identical other parameters except for
resize_mode and noise latent dimensions. For the top row the noise latent
size is 768x512, and for the bottom row it is 512x768. Resize_mode is:
left="just_resize", middle="crop_resize", right="fill_resize".

## Are there any post deployment tasks we need to perform?
To use "just_resize" mode in linear UI, no post deployment work is
needed. The default is switched from old resizer to new high quality
resizer.
To use "just_resize", "crop_resize", and "fill_resize" modes in node UI,
no post deployment work is needed. There is also an additional option
"just_resize_simple" that uses the old resizer, mainly left in for testing
and for anyone curious to see the difference.
To use "crop_resize" and "fill_resize" in linear UI, there will need to
be some work to incorporate choice of three modes in ControlNet UI
(probably best to not expose "just_resize_simple" in linear UI, it just
confuses things).
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [X] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [X] Yes
- [ ] No, because:
## Description
This changes the "sync" route from a GET to POST method, in keeping with
the Representational Existential(?) State Transfer (REST) protocol.
* feat(ui): enhance clear intermediates feature
- retrieve the # of intermediates using a new query (just uses list images endpoint w/ limit of 0)
- display the count in the UI
- add types for clearIntermediates mutation
- minor styling and verbiage changes
* feat(ui): remove unused settings option for guides
* feat(ui): use solid badge variant
consistent with the rest of the usage of badges
* feat(ui): update board ctx menu, add board auto-add
- add context menu to system boards - the only action is "select board". did this so that you don't think it's broken when you click it
- add auto-add board. you can right click a user board to enable it for auto-add, or use the gallery settings popover to select it. the invoke button has a tooltip on a short delay to remind you that you have auto-add enabled
- made useBoardName hook, provide it a board id and it gets you the board name
- removed `boardIdToAdTo` state & logic, updated workflows to auto-switch and auto-add on image generation
* fix(ui): clear controlnet when clearing intermediates
* feat: Make Add Board icon a button
* feat(db, api): clear intermediates now clears all of them
* feat(ui): make reset webui text subtext style
* feat(ui): board name change submits on blur
---------
Co-authored-by: blessedcoolant <54517381+blessedcoolant@users.noreply.github.com>
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [x] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because: documentation update that needs review from the team
before going live
## Description
I updated the contribution guidelines, adding more structure and a
getting started guide. Also re-organized the tabs so that the most
commonly used come first.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
run `mkdocs serve` to check it out
## Added/updated tests?
- [ ] Yes
- [X] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [X] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [X] Yes
- [ ] No, because:
## Description
ImageToLatentsInvocation defaulted to float16 rather than detecting the
requested precision from the config.
This caused an exception to be raised on systems that don't support
float16 (e.g. CPU).
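A minimal sketch of the intent, with illustrative names (the actual
invocation reads the precision setting from the app config):
```python
import torch

def dtype_from_config(precision: str, device: torch.device) -> torch.dtype:
    # Use float16 only when it is both requested and supported; fall back
    # to float32 on devices such as the CPU.
    if precision == "float16" and device.type != "cpu":
        return torch.float16
    return torch.float32
```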
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
* feat(ui): migrate listImages to RTK query using createEntityAdapter
- see comments in `endpoints/images.ts` for explanation of the caching
- so far, only manually updating `all` images when new image is generated. no other manual cache updates are implemented, but will be needed.
- fixed some weirdness with loading state components (like the spinners in gallery)
- added `useThumbnailFallback` for `IAIDndImage`, this displays the tiny webp thumbnail while the full-size images load
- comment out some old thunk related stuff in gallerySlice, which is no longer needed
* feat(ui): add manual cache updates for board changes (wip)
- update RTK Query caches when adding/removing single image to/from board
- work more on migrating all image-related operations to RTK Query
* update AddImagesToBoardContext so that it works when user uses context menu + modal
* handle case where no image is selected
* get assets working for main list and boards - dnd only
* feat(ui): migrate image uploads to RTK Query
- minor refactor of `ImageUploader` and `useImageUploadButton` hooks, simplify some logic
- style filesystem upload overlay to match existing UI
- replace all old `imageUploaded` thunks with `uploadImage` RTK Query calls, update associated logic including canvas related uploads
- simplify `PostUploadAction`s that only need to display user input
* feat(ui): remove `receivedPageOfImages` thunks
* feat(ui): remove `receivedImageUrls` thunk
* feat(ui): finish removing all images thunks
stuff now broken:
- image usage
- delete board images
- on first load, no image selected
* feat(ui): simplify `updateImage` cache manipulation
- we don't actually ever change categories, so we can remove a lot of logic
* feat(ui): simplify canvas autosave
- instead of using a network request to set the canvas generation as not intermediate, we can just do that in the graph
* feat(ui): simplify & handle edge cases in cache updates
* feat(db, api): support `board_id='none'` for `get_many` images queries
This allows us to get all images that are not on a board.
* chore(ui): regen types
* feat(ui): add `All Assets`, `No Board` boards
Restructure boards:
- `all images` is all images
- `all assets` is all assets
- `no board` is all images/assets without a board set
- user boards may have images and assets
Update caching logic
- much simpler without every board having sub-views of images and assets
- update drag and drop operations for all possible interactions
* chore(ui): regen types
* feat(ui): move download to top of context menu
* feat(ui): improve drop overlay styles
* fix(ui): fix image not selected on first load
- listen for first load of all images board, then select the first image
* feat(ui): refactor board deletion
api changes:
- add route to list all image names for a board. this is required to handle board + image deletion. we need to know every image in the board to determine the image usage across the app. this is fetched only when the delete board and images modal is opened so it's as efficient as it can be.
- update the delete board route to respond with a list of deleted `board_images` and `images`, as image names. this is needed to perform accurate clientside state & cache updates after deleting.
db changes:
- remove unused `board_images` service method to get paginated images dtos for a board. this is now done thru the list images endpoint & images service. needs a small logic change on `images.delete_images_on_board`
ui changes:
- simplify the delete board modal - no context, just minor prop drilling. this is feasible for boards only because the components that need to trigger and manipulate the modal are very close together in the tree
- add cache updates for `deleteBoard` & `deleteBoardAndImages` mutations
- the only thing we cannot do directly is on `deleteBoardAndImages`, update the `No Board` board. we'd need to insert image dtos that we may not have loaded. instead, i am just invalidating the tags for that `listImages` cache. so when you `deleteBoardAndImages`, the `No Board` will re-fetch the initial image limit. i think this is more efficient than e.g. fetching all image dtos to insert then inserting them.
- handle image usage for `deleteBoardAndImages`
- update all (i think/hope) the little bits and pieces in the UI to accommodate these changes
* fix(ui): fix board selection logic
* feat(ui): add delete board modal loading state
* fix(ui): use thumbnails for board cover images
* fix(ui): fix race condition with board selection
when selecting a board that doesn't have any images loaded, we need to wait until the images have loaded before selecting the first image.
this logic is debounced to ~1000ms.
* feat(ui): name 'No Board' correctly, change icon
* fix(ui): do not cache listAllImageNames query
if we cache it, we can end up with stale image usage during deletion.
we could of course manually update the cache as we do elsewhere, but because this is a relatively infrequent network request, i'd rather accept the increased resource usage here than add more cache management complexity.
* feat(ui): reduce drag preview opacity, remove border
* fix(ui): fix incorrect queryArg used in `deleteImage` and `updateImage` cache updates
* fix(ui): fix doubled open in new tab
* fix(ui): fix new generations not getting added to 'No Board'
* fix(ui): fix board id not changing on new image when autosave enabled
* fix(ui): context menu when selection is 0
need to revise how context menu is triggered later, when we approach multi select
* fix(ui): fix deleting does not update counts for all images and all assets
* fix(ui): fix all assets board name in boards list collapse button
* fix(ui): ensure we never go under 0 for total board count
* fix(ui): fix text overflow on board names
---------
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
* new route to clear intermediates
* UI to clear intermediates from settings modal
* cleanup
* PR feedback
---------
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Description
In transformers 4.31.0, `text_model.embeddings.position_ids` is no
longer part of the state_dict.
The fix is untested as I can't run it right now, but it should be
correct. I also need to check how transformers 4.30.2 works with this fix.
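A hedged sketch of the fix's direction (the real change is in
`convert_ckpt_to_diffusers.py`; the helper name here is hypothetical):
```python
def load_converted_clip_weights(text_model, text_model_dict):
    # transformers >= 4.31.0 no longer keeps position_ids in the CLIP
    # state_dict, so drop the stale key before loading the converted
    # checkpoint weights.
    text_model_dict.pop("text_model.embeddings.position_ids", None)
    text_model.load_state_dict(text_model_dict)
```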
## Related Tickets & Documents
8e5d1619b3 (diff-7f53db5caa73a4cbeb0dca3b396e3d52f30f025b8c48d4daf51eb7abb6e2b949R191)
https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.register_buffer
## QA Instructions, Screenshots, Recordings
```
File "C:\Users\artis\Documents\invokeai\.venv\lib\site-packages\invokeai\backend\model_management\convert_ckpt_to_diffusers.py", line 844, in convert_ldm_clip_checkpoint
text_model.load_state_dict(text_model_dict)
File "C:\Users\artis\Documents\invokeai\.venv\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
Unexpected key(s) in state_dict: "text_model.embeddings.position_ids".
```
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [X] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [X] No, because:
## Description
Fix for
```
File "/home/invokeuser/InvokeAI/invokeai/app/services/processor.py",
line 70, in __process
outputs = invocation.invoke(
File "/home/invokeuser/InvokeAI/invokeai/app/invocations/latent.py",
line 660, in invoke
device=choose_torch_device()
NameError: name 'choose_torch_device' is not defined
```
when using the scale latents node
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [X] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [X] Yes
- [ ] No, because:
## Description
This PR points mkdocs to the `main` branch again, so that the 3.0.0
documentation appears in gh-pages.
It also makes a minor tweak to the tooltip for model imports, so that
users know that URLs are accepted.
Also rebuilds frontend for use in beta testing.
I've opted to leave out any additional upscaling parameters like scale
and denoising strength, which, from my review of the ESRGAN code, don't
do much:
- scale just resizes the image using CV2 after the AI upscaling, so
that's not particularly useful
- denoising strength is only valid for one class of model, which we are
no longer supporting
If there is demand, we can implement output size/scale UI and handle it
by passing the upscaled image to a resize/scale node.
I also understand we previously had some functionality to blend the
upscaled image with the original. If that is desired, we would need to
implement that as a node that we can pass the upscaled image to.
Demo:
https://github.com/invoke-ai/InvokeAI/assets/4822129/32eee615-62a1-40ce-a183-87e7d935fbf1
---
[feat(nodes): add RealESRGAN_x2plus.pth, update upscale
nodes](dbc256c5b4)
- add `RealESRGAN_x2plus.pth` model to installer @lstein
- add `RealESRGAN_x2plus.pth` to `realesrgan` node
- rename `RealESRGAN` to `ESRGAN` in nodes
- make `scale_factor` optional in `img_scale` node
[feat(ui): restore ad-hoc
upscaling](b3fd29e5ad)
- remove face restoration entirely
- add dropdown for ESRGAN model select
- add ad-hoc upscaling graph and workflow
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [x] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
There is no VRAM cleanup on model offload, which leads to VRAM filling
up and slow generation speed.
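A minimal sketch of the idea, assuming the cleanup happens right after a
model is moved off the GPU (the actual change lives in the model cache):
```python
import gc
import torch

def offload_model(model: torch.nn.Module) -> None:
    # Move the model to system RAM, then release the allocator's cached
    # blocks so the freed VRAM is actually available to the next model.
    model.to("cpu")
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```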
## What type of PR is this? (check all applicable)
- [x] Feature
- [x] Optimization
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [x] Optimization
- [ ] Documentation Update
## Description
Various fixes to consume less memory and make SDXL run on 8 GB of VRAM.
Most changes are due to moving all output tensors to the CPU, so that
cached tensors don't consume VRAM.
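A minimal sketch of the pattern, with hypothetical helper names (the
real changes touch the individual invocations):
```python
import torch

def cache_output(cache: dict, key: str, tensor: torch.Tensor) -> None:
    # Detach and move to the CPU before caching so the cached copy does
    # not keep VRAM allocated between invocations.
    cache[key] = tensor.detach().to("cpu")

def load_output(cache: dict, key: str, device: torch.device) -> torch.Tensor:
    # Move back to the execution device only when the tensor is needed.
    return cache[key].to(device)
```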
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
Fixes a bug in the `inpaint` node introduced by the new version of
`compel`. The other nodes were updated, but this one was missed. Fixed
by @StAlKeR7779 ty
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue # discord reports
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : n/a, bugfix
This contains minor fixes to the beta as well as the version bump to
3.0.0.
Fixes include:
- Warning the user when the installer window size is inadequate for the TUI.
- Selection of the most frequently downloaded controlnet models for
default installation.
- Adding the LowRA LoRA for dark image enhancement
- Documentation
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [x] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
Making some final style fixes before we push the next 3.0 version
tomorrow.
- Fixed light mode colors in Settings Modal.
- Double checked other light mode colors. Nothing seems off.
- Added Base Model badge to the model list item. Makes it visually
better and also serves as a quick glance feature for the user.
- Some minor styling updates to the Node Editor.
- Fixed the 'G', 'O', 'Shift+G' and 'Shift+O' hotkeys (used to toggle
the panels) not resizing the canvas. #3780
- Fixed hotkey 'N' not working for Snap To Grid on Canvas.
- Fixed brush opacity hotkeys not working.
- Cleaned up hotkeys modal of hotkeys that are no longer used.
- Updated compel requirement to `2.0.0`
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #3780
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [x] Yes
- [ ] No, because:
## Description
Hides SDXL models from the linear UI model select. Just a stopgap.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : n/a
## [optional] Are there any post deployment tasks we need to perform?
- add `RealESRGAN_x2plus.pth` model to installer
- add `RealESRGAN_x2plus.pth` to `realesrgan` node
- rename `RealESRGAN` to `ESRGAN` in nodes
- make `scale_factor` optional in `img_scale` node
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [x] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [x] No, because:
If it's not useful, they do not have to use it 😄
## Description
While I was still in viewportcontrols.tsx, I added an option to toggle
the minimap off (the default is on) and added tooltips to the buttons in
viewportcontrols.tsx.
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
This is a WIP to add SDXL support.
Tasks:
- [x] SDXL model loading support
- [x] SDXL model installation
- [x] SDXL model loader
- [x] SDXL base invocations for text2latent and latent2latent
- [ ] SDXL refiner invocations for text2latent and latent2latent
- [x] Compel support / pooled embeddings
- [ ] Linear UI graph for SDXL
- [ ] Documentation
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [ ] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [ ] No, because:
## Description
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue #
- Closes #
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [ ] No : _please replace this line with details on why tests
have not been included_
## [optional] Are there any post deployment tasks we need to perform?
fix json formatting to not have big red comment blocks
## What type of PR is this? (check all applicable)
- [ ] Refactor
- [ ] Feature
- [ ] Bug Fix
- [ ] Optimization
- [X] Documentation Update
## Have you discussed this change with the InvokeAI team?
- [ ] Yes
- [X] No, because: simple docs fix
## Description
Fix LOCAL_DEVELOPMENT.md json comment highlighting
## Related Tickets & Documents
<!--
For pull requests that relate or close an issue, please include them
below.
For example having the text: "closes #1234" would connect the current
pull
request to issue 1234. And when we merge the pull request, Github will
automatically close the issue.
-->
- Related Issue # n/a
- Closes # n/a
## QA Instructions, Screenshots, Recordings
<!--
Please provide steps on how to test changes, any hardware or
software specifications as well as any other pertinent information.
-->
## Added/updated tests?
- [ ] Yes
- [x] No : simple docs change
This PR completely ports over the Model Manager to 3.0 -- all of the
functionality has now been restored in addition to the following
changes.
- Model Manager now has been moved to its own tab on the left hand side.
- Model Manager has three tabs - Model Manager, Import Models and Merge
Models
- The edit forms for the Models now allow the users to update the model
name and the base model too along with other details.
- Checkpoint Edit form now displays the available config files from
InvokeAI and also allows users to supply their own custom config file.
- Under Import Models you can directly add models or scan a folder for
your checkpoint files.
- Adding models has two modes -- Simple and Advanced.
- In Simple Mode, you simply need to pass a path and InvokeAI will try
to determine what kind of model it is and fill in the rest of the
details accordingly. This input lets you supply local paths to
diffusers, local paths to checkpoints, huggingface repo IDs to download
models, or CivitAI links.
- Simple Mode also allows you to download different model types such as
VAEs and ControlNet models, not just main models.
- In cases where InvokeAI's auto-detection fails to read a model
correctly, you can take the manual approach and go to Advanced, where
you can configure your model exactly the way you want it while adding
it. Both Diffusers and Checkpoint models now have their own custom
forms.
- Scan Models has been cleaned up. It will now only display the models
that are not already installed in InvokeAI, and each item will have two
options, Quick Add and Advanced, replicating the Add Model behavior
from above.
- Scan Models now has a search bar for you to search through your
scanned models.
- Merge Models functionality has been restored.
This is a wrap for this PR.
**TODO: (Probably for 3.1)**
- Add model management for model types such as VAEs and ControlNet
models
- Replace the VAE slot on the edit forms with the installed VAE drop
down + custom option
[feat(nodes): emit model loading
events](7b6159f8d6)
- remove dependency on having access to a `node` during emits; this
would need a bit of additional args passed through the system and I
don't think it's necessary at this point. This also allowed us to drop
an extraneous fetching/parsing of the session from the db.
- provide the invocation context to all `get_model()` calls, so the
events are able to be emitted
- test all model loading events in the app and confirm socket events are
received
[feat(ui): add listeners for model load
events](c487166d9c)
- currently only exposed as DEBUG-level logs
---
One change I missed in the commit messages is that the `ModelInfo` class
is not serializable, so I split out the pieces of information we didn't
already have (hash, location, precision) and added them to the event
payload directly.
This small patch improves the stability of `invokeai-*` scripts by
avoiding crashes in the model manager while scanning the models
directory for new and removed models.
Both support the same actions:
- Open in new tab
- Copy image (if supported by browser)
- Use prompt
- Use seed
- Use all
- Send to img2img
- Send to canvas
- Change board
- Download image
- Delete
- restore copy image functionality* in image context menu, current image buttons
- give IAIDndImage the same context menu
* copying image to clipboard is not possible on Firefox unless the user enables a setting which is disabled by default. if the browser does not support copying an image, the copy functionality is disabled.
- filename -> file_path
- pre and post prompt changed to optional
- clearer pre and post prompt descriptions
- handle pre and post prompt passed as None
- max_prompts defaults to 1 instead of 0 to avoid accidentally processing large prompt files with it set to 0 when adding a new node.
This PR adds several default models to the ones selected at install
time. It also removes the GFPGAN and text2clip models, which should
shave a little time off the install process.
## ESRGAN:
* models/core/upscaling/realesrgan/RealESRGAN_x4plus.pth
* models/core/upscaling/realesrgan/RealESRGAN_x4plus_anime_6B.pth
* models/core/upscaling/realesrgan/ESRGAN_SRx4_DF2KOST_official-ff704c30.pth
## ControlNet
* models/sd-1/controlnet/canny
* models/sd-1/controlnet/depth
* models/sd-1/controlnet/lineart
* models/sd-1/controlnet/openpose
## Embedding (textual inversion)
* models/sd-1/embedding/EasyNegative.safetensors
- remove dependency on having access to a `node` during emits; this would need a bit of additional args passed through the system and I don't think it's necessary at this point. this also allowed us to drop an extraneous fetching/parsing of the session from the db.
- provide the invocation context to all `get_model()` calls, so the events are able to be emitted
- test all model loading events in the app and confirm socket events are received
- update controlnet state to use object format for model
- update model-parsing helper functions to log errors
- update nodes components, types and state
- remove controlnets from state when models are loaded and the controlnet's model is not available
# Multiple enhancements to the model manager REST API
1. add a `/sync` route for synchronizing the in-memory model lists to
models.yaml, the models directory, and the autoimport directories.
2. added optional destination directories to convert_model and
merge_model operations.
3. added a `/ckpt_confs` route for retrieving known legacy checkpoint
configuration files.
4. added a `/search` route for finding all models in a directory located
in the server filesystem
5. added an `/add` route for manual addition of local models
6. added a `/rename` route for renaming and/or rebasing models
7. changed the path of the `import_model` route to `/import`
# Slightly annoying detail:
When adding a model manually using `/add`, the body JSON must exactly
match one of the model configurations returned by `list_models` (i.e.
there is no defaulting of fields). This includes the `error` field,
which should be set to "null".
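A hedged example of what such a request could look like; the URL prefix
and the exact field set are assumptions, and the body must mirror a
config returned by `list_models`:
```python
import requests

# Hypothetical model config; every field must match what list_models
# returns for this model type -- there is no defaulting of fields.
config = {
    "model_name": "my-checkpoint",      # hypothetical name
    "base_model": "sd-1",
    "model_type": "main",
    "path": "/data/models/my-checkpoint.safetensors",  # hypothetical path
    "model_format": "checkpoint",
    "error": None,                      # serializes to the required JSON null
}

# Only the /add suffix comes from this PR; the prefix is an assumption.
requests.post("http://127.0.0.1:9090/api/v1/models/add", json=config)
```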
1. add a /sync route for synchronizing the in-memory model lists to
models.yaml, the models directory, and the autoimport directories.
2. add optional destination_directories to convert_model and merge_model
operations.
3. add /ckpt_confs route for retrieving known legacy checkpoint configuration
files.
4. add /search route for finding all models in a directory located in the server
filesystem
DONE:
- Restore Update Model functionality
- Restore Delete Model functionality
- Restore Model Convert functionality
- Restore Model Merge functionality
- Refine UX (fine tweaks when everything is done - TODO)
TODO
- Add Model (will be finished in a future PR once the backend work is
done)
IAIMantineSelect and IAIMantineMultiSelect have a bit of extra logic that prevents simple select functionality from working as expected.
- extract the styles into hooks
- rename those two components to IAIMantineSearchableSelect and IAIMantineSearchableMultiSelect
- Create IAIMantineSelect (which is just a dropdown) and use it in model manager and a few other places
When we only have a few options to present and searching is not efficient, we should use this instead.
Image files are immutable and we expect deletion to result in no further
requests for a given image, so we can set the max-age to something
thicc.
Resolves #3426
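A hedged sketch with FastAPI; the route path, storage path, and exact
header value are illustrative assumptions:
```python
from fastapi import FastAPI
from fastapi.responses import FileResponse

app = FastAPI()

@app.get("/api/v1/images/{image_name}/full")
def get_full_image(image_name: str) -> FileResponse:
    # Image files never change after creation, so clients may cache them
    # aggressively; a year-long immutable max-age is one such value.
    return FileResponse(
        f"outputs/images/{image_name}",  # hypothetical storage path
        headers={"Cache-Control": "public, max-age=31536000, immutable"},
    )
```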
@ebr @brandonrising @maryhipp
- simplify UI logic in `ModelManagerPanel` components
- fix up the types a bit to make it easier to select models
- remove `openModel` state, just make it a useState since it is very local to model manager
similar to the previous commit, update the node editor to not just store models as strings - instead, store the model object.
the model select components in nodes are now just kinda copy-pastes of the linear UI versions of the same components, but they were different enough that we can't just share them.
i explored adding some props to override the linear ui components' logic, but it was too brittle. so just copy/paste.
We were storing all types of models by their model ID, which is a format like `sd-1/main/deliberate`.
This meant we had to do a lot of extra parsing, because the nodes actually want something like `{base_model: 'sd-1', model_name: 'deliberate'}`.
Some of this parsing was done with zod's error-throwing `parse()` method, and in other places it was done with brittle string parsing.
This commit refactors the state to use the object form of models.
There is still a bit of string parsing done to construct the ID from the object form, but it's far less complicated.
Also, the zod parsing is now done using `safeParse()`, which does not throw. This requires a few more conditional checks, but should prevent further crashes.
* feat(ui): salvaged gallery UI enhancements
* restore boardimage functionality, load boardimages and remove some caching optimizations in the name of data integrity
* fix assets, fix load more params
* jk NOW fix assets, fix load more params
---------
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
Co-authored-by: Mary Hipp Rogers <maryhipp@gmail.com>
- available infill methods is server state - remove it from client state, use the query to populate the dropdown
- add listener to ensure the selected infill method is an available one
As the comment in this branch says, we want to use the conditional run:
```python
if cfg_injection: # only applying ControlNet to conditional instead of in unconditioned
```
But the code used the unconditioned
embeddings (`conditioning_data.unconditioned_embeddings`).
Later, the code confirms that we want to run conditional generation, both
by the comment and by the tensor concatenation order (all the code expects
to get a [uc, c] tensor):
```python
if cfg_injection:
# Inferred ControlNet only for the conditional batch.
# To apply the output of ControlNet to both the unconditional and conditional batches,
# add 0 to the unconditional batch to keep it unchanged.
down_samples = [torch.cat([torch.zeros_like(d), d]) for d in down_samples]
mid_sample = torch.cat([torch.zeros_like(mid_sample), mid_sample])
```
Adds a Clear Nodes button with a confirmation dialog. I think I did it
right 😃
I am sure there is a way to make the confirmation look better and have
Yes/No instead of OK/Cancel.
- Restore recall functionality to `CurrentImageButtons` and `ImageContextMenu`.
- Debounce metadata requests for `ImageMetadataViewer` and `CurrentImageButtons` by 500ms. It's possible to scroll through these really fast, so we want to debounce the network requests.
- `ImageContextMenu` is lazy-mounted so it does not need to be debounced; it makes the metadata request as soon as you click it.
- Move next/prev image selection logic into hook and add the hotkeys for this to `CurrentImageButtons`. The hotkeys now work when metadata viewer is open.
I will follow up with improved loading state during the debounced calls in the future
- Update for new routes
- Update model storage in state to be `MainModelField` type instead of `string`, simplifies a lot of model handling
- Update model-related stuff for model `name` --> `model_name`
- Update linear graphs to use `MetadataAccumulator`
- Update `ImageMetadataViewer` UI
- Ensure all `recall` functions work (well, the ones that are active anyways)
Metadata for the Linear UI is now sneakily provided via a `MetadataAccumulator` node, which the client populates / hooks up while building the graph.
Additionally, we provide the unexpanded graph with the metadata API response.
Both of these are embedded into the PNGs.
- Remove `metadata` from `ImageDTO`
- Split up the `images/` routes to accommodate this; metadata is only retrieved per-image
- `images/{image_name}` now gets the DTO
- `images/{image_name}/metadata` gets the new metadata
- `images/{image_name}/full` gets the full-sized image file
- Remove old metadata service
- Add `MetadataAccumulator` node, `CoreMetadataField`, hook up to `LatentsToImage` node
- Add `get_raw()` method to `ItemStorage`, retrieves the row from DB as a string, no pydantic parsing
- Update `images`-related services to handle storing and retrieving the new metadata
- Add `get_metadata_graph_from_raw_session` which extracts the `graph` from `session` without needing to hydrate the session in pydantic, in preparation for providing it as metadata; also removes all references to the `MetadataAccumulator` node
Our model fields use `model_name`, but the API response uses `name`. Some places use `model_type` but the API response used `type`.
Changed the API response to provide `model_name` and `model_type`, which simplifies how we manage models on the client substantially.
- rewrite Dockerfile
- add a stage to build the UI
- add docker-compose.yml
- add docker-entrypoint.sh such that any command may be used at runtime
- docker-compose adds .env support - add a sample .env file
* fix the test of the config system
* Add torchmetrics==0.11.4 to installer
- Closes #3700
- Closes #3658
---------
Co-authored-by: Lincoln Stein <lstein@gmail.com>
Co-authored-by: Eugene Brodsky <ebr@users.noreply.github.com>
To be consistent with max_cache_size, the amount of memory to hold in
VRAM for model caching is now controlled by the max_vram_cache_size
configuration parameter.
[feat(ui): memoize ImageContextMenu
selector](265996d230)
Without the selector itself being memoized, the gallery was rerendering
on every progress image.
[feat(ui): memoize NextPrevImageButtons
component](a7b8109ac2)
This was rerendering on every progress image, now it doesn't
[fix(ui): correctly set disabled on invoke button during
generation](1c45d18e6d)
It wasn't disabled when it should have been, looked clickable during
generation.
[fix(nodes): remove board_id column from images
table](00e26ffa9a)
This is extraneous; the `board_images` table holds image-board
relationships. @maryhipp
Image files are immutable and we expect deletion to result in no further requests for a given image, so we can set the max-age to something thicc.
Resolves #3426
Just a small thing for now, as nodes are all still WIP, but since
@psychedelicious was nice enough to add the progress image node for me,
what I noticed was missing now is the cancel button on the nodes tab.
@psychedelicious @blessedcoolant Somehow I deleted the branch the other
version of this pull request was on. 🤭
Just an idea; if you think it's worthwhile please make changes (I did
what I could).
I added a "load more" to the right arrow to avoid having to open the
gallery to load more images.
I am not sure about the icon I used; maybe it should just be the normal
arrow, so you don't even need to show that it's loading more images.
There is an issue with it not disappearing once all images have been
loaded (I did play around for a while to try and fix that).
Some users want the model select to take the full width because their
model names might be long. As this is a more frequently used feature,
rearrange it to do that, followed by the VAE (as it is related to the
model) and the Sampler next to it.
I made a recent change to the function that finds the default root
directory location that broke it when run under Conda (where VIRTUAL_ENV
is not set). This revision fixes the issue.
Mantine's multiselect does not let you edit the search box with the mouse, paste into it, etc. A normal select is fine.
I can't remember why I made LoRA etc. multiselects, but everything seems to work with normal selects, so I've changed to that.
- `isLoading` - now `true` *only* on first load
- added `isFetching` - `true` whenever gallery images are fetching
- on first load, show a spinner instead of skeletons. this prevents an awkward flash of skeletons into empty gallery when the gallery doesn't have enough images to fill it.
- removed `imageCategoriesChanged` listener, bc now on app start, both images and assets will be populated. leaving this in caused jank flashes of skeletons when switching gallery tabs when gallery doesn't have images to load
taking the coward's way out on this and just fetching 100 images & 100 assets on app start...
- add `appStarted` action, dispatched once on mount in App.tsx. listener fetches 100 images & 100 assets
- fix bug with selectedBoardId & assets tab
The shift key listener didn't catch key presses when focused in a
textarea or input field, causing jank on slider number inputs.
Add keydown and keyup listeners to all such fields, which ensures that
the `shift` state is always correct.
Also add the action tracking it to `actionsDenylist` to not clutter up
devtools.
The shift key listener didn't catch key presses when focused in a textarea or input field, causing jank on slider number inputs.
Add keydown and keyup listeners to all such fields, which ensures that the `shift` state is always correct.
Also add the action tracking it to `actionsDenylist` to not clutter up devtools.
There was a prop on IAISlider to make the input component readonly - I
didn't know this existed and at some point used a component with that
prop as a template for other sliders, copying the flag over.
It's not actually used anywhere, so I removed the prop entirely,
enabling the number inputs everywhere.
There was a prop on IAISlider to make the input component readonly - I didn't know this existed and at some point used a component with that prop as a template for other sliders, copying the flag over.
It's not actually used anywhere, so I removed the prop entirely, enabling the number inputs everywhere.
I'm not sure if this was just my local install, but even after a fresh
`yarn install` my upload network request was failing because no file was
passed in. I don't think the `bodySerializer` part is getting run
This PR is to allow FP16 precision to work on Macs with MPS. In
addition, it centralizes the torch fixes/workarounds required for MPS
into a new backend utility `mps_fixes.py`. This is conditionally
imported in `api_app.py`/`cli_app.py`.
Many MANY thanks to @StAlKeR7779 for patiently working to debug and fix
these issues.
- No longer fail root directory probing if invokeai.yaml is missing
(test is now whether a `models/core` directory exists).
- Migrate script does not overwrite previously-installed models.
- Can run migrate script on an existing 2.3 version directory
with --from and --to pointing to same 2.3 root.
Clip Skip breaks when you supply a number greater than the number of
layers for the model type, so this caps it on the frontend based on the
model (see the sketch after this list):
- `sd-1` at 12
- `sd-2` at 24
- Will update later to whatever SDXL needs if it is different.
- Also fixes LoRAs breaking with Clip Skip.
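A hedged sketch of the capping logic (the actual cap is applied in the
frontend; the numbers come from the list above):
```python
# Maximum clip skip per base model, per the list above; SDXL is TBD.
CLIP_SKIP_MAX = {"sd-1": 12, "sd-2": 24}

def clamp_clip_skip(value: int, base_model: str) -> int:
    return max(0, min(value, CLIP_SKIP_MAX.get(base_model, 12)))
```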
My PR to fix an issue with the handling of formdata in `openapi-fetch` is released. This means we no longer need to patch the package (no patches at all now!).
This PR bumps its version and adds a transformer to our typegen script to handle typing binary form fields correctly as `Blob`.
Also regens types.
This is PR adds the following API methods for managing models:
* list_models (GET)
* update_model (PATCH)
* import_model (POST)
* delete_model (DELETE)
* convert_model (PUT)
* merge_models (PUT)
* load images on gallery render
* wait for models to be loaded before you can invoke
---------
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
This PR enables model manager importation of diffusers-style .bin LoRAs.
However, since there is no backend support for this type of LoRA yet,
attempts to use them will result in an unimplemented error.
It closes #3636 and #3637.
The list models route should just be the base route path, and should use query parameters as opposed to path parameters (which cannot be optional)
Removed defaults for update model route - for the purposes of the API, we should always be explicit with this
This PR fixes the migrate script so that it uses the same directory for
both the tokenizer and text encoder CLIP models. This will fix a crash
that occurred during checkpoint->diffusers conversions
This PR also removes the check for an existing models directory in the
target root directory when `invokeai-migrate3` is run.
* close modal when user clicks cancel
* close modal when delete image context cleared
---------
Co-authored-by: Mary Hipp <maryhipp@Marys-MacBook-Air.local>
A user discovered that 2.3 models whose symbolic names contain the "/"
character are not imported properly by the `migrate-models-3` script.
This fixes the issue by changing "/" to underscore at import time.
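A minimal sketch of the rename (the helper name is illustrative):
```python
def sanitize_model_name(name: str) -> str:
    # 2.3 model names may contain "/", which the importer cannot handle,
    # so replace it with an underscore at import time.
    return name.replace("/", "_")
```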
- Accordions now may be opened or closed regardless of whether or not
their contents are enabled or active
- Accordions have a short text indicator alerting the user if their
contents are enabled, either a simple `Enabled` or, for accordions like
LoRA or ControlNet, `X Active` if any are active
https://github.com/invoke-ai/InvokeAI/assets/4822129/43db63bd-7ef3-43f2-8dad-59fc7200af2e
- Accordions now may be opened or closed regardless of whether or not their contents are enabled or active
- Accordions have a short text indicator alerting the user if their contents are enabled, either a simple `Enabled` or, for accordions like LoRA or ControlNet, `X Active` if any are active
This caused a lot of re-rendering whenever the selection changed, which caused a huge performance hit. It also made changing the current image lag a bit.
Instead of providing an array of image names as a multi-select dnd payload, there is now no multi-select dnd payload at all - instead, the payload types are used by the `imageDropped` listener to pull the selection out of redux.
Now, the only big re-renders are when the selectionCount changes. In the future I'll figure out a good way to do image names as payload without incurring re-renders.
Every `GalleryImage` was rerendering any time the app rerendered bc the selector function itself was not memoized. This resulted in the memoization cache inside the selector constantly being reset.
Same for `BatchImage`.
Also updated memoization for a few other selectors.
Eg `useGetMainModelsQuery()`, `useGetLoRAModelsQuery()` instead of `useListModelsQuery({base_type})`.
Add specific adapters for each model type. Models are just more organised and easier to consume now.
Also updated LoRA UI to use the model name.
This PR is to allow FP16 precision to work on Macs with MPS. In addition, it centralizes the torch fixes/workarounds
required for MPS into a new backend utility file `mps_fixes.py`. This is conditionally imported in `api_app.py`/`cli_app.py`.
Many MANY thanks to StAlKeR7779 for patiently working to debug and fix these issues.
This PR is for adjusting the unit tests in the `tests` directory so that
they no longer throw errors.
I've removed two tests that were obsoleted by the shift to latent nodes,
but `test_graph_execution_state.py` and `test_invoker.py` are throwing
this validation error:
```
TypeError: InvocationServices.__init__() missing 2 required positional arguments: 'boards' and 'board_images'
```
The `invokeai-configure` script migrates the old invokeai.init file to
the new invokeai.yaml format. However, the parser for the invokeai.init
file was missing the names of the k* samplers and was giving a parser
error on any invokeai.init file that referred to one of these samplers.
This PR fixes the problem.
Ironically, there is no longer the concept of the preferred scheduler in
3.0, and so these sampler names are simply ignored and not written into
`invokeai.yaml`
This introduces the core functionality for batch operations on images and multiple selection in the gallery/batch manager.
A number of other substantial changes are included:
- `imagesSlice` is consolidated into `gallerySlice`, allowing for simpler selection of filtered images
- `batchSlice` is added to manage the batch
- The wonky context pattern for image deletion has been changed, much simpler now using a `imageDeletionSlice` and redux listeners; this needs to be implemented still for the other image modals
- Minimum gallery size in px implemented as a hook
- Many style fixes & several bug fixes
TODO:
- The UI and UX need to be figured out, especially for controlnet
- Batch processing is not hooked up; generation does not do anything with batch
- Routes to support batch image operations, specifically delete and add/remove to/from boards
@blessedcoolant it looks like with the new theme buttons not being
transparent, the progress bar was completely hidden. I moved it to be on
top; however, it was not transparent so it hid the invoke text. After
trying for a while I couldn't get it to be transparent, so I just made
the height 15%.
- Set min size for floating gallery panel
- Correct the default pinned width (it cannot be less than the min width
and this was sometimes happening during window resize)
- Set min size for floating gallery panel
- Correct the default pinned width (it cannot be less than the min width and this was sometimes happening during window resize)
Add `useMinimumPanelSize()` hook to provide minimum resizable panel sizes (in pixels).
The library we are using for the gallery panel uses percentages only. To provide a minimum size in pixels, we need to do some math to calculate the percentage of window size that corresponds to the desired min width in pixels.
The node polyfills needed to run the `swagger-parser` library (used to
dereference the OpenAPI schema) cause the canvas tab to immediately
crash when the package build is used in another react application.
I'm sure this is fixable but it's not clear what is causing the issue
and troubleshooting is very time consuming.
Selectively rolling back the implementation of `swagger-parser`.
The node polyfills needed to run the `swagger-parser` library (used to dereference the OpenAPI schema) cause the canvas tab to immediately crash when the package build is used in another react application.
I'm sure this is fixable but it's not clear what is causing the issue and troubleshooting is very time consuming.
Selectively rolling back the implementation of `swagger-parser`.
[feat(ui): remove themes, add hand-crafted dark and light
modes](032c7e68d0)
[032c7e6](032c7e68d0)
Themes are very fun but due to the differences in perceived saturation
and lightness across the color spectrum, it's impossible to have
multiple themes that look great without hand-crafting *every* shade for
*every* theme. We've ended up with 4 OK themes (well, 3, because the
light theme was pretty bad).
I've removed the themes and added color mode support. There is now a
single dark and light mode,
each with their own color palette and the classic grey / purple / yellow
invoke colors that
@blessedcoolant first designed.
I've re-styled almost everything except the model manager and lightbox,
which I keep forgetting
to work on.
One new concept is the Chakra `layerStyle`. This lets us define "layers"
- think body, first layer,
second layer, etc - that can be applied on various components. By
defining layers, we can be more
consistent about the z-axis and its relationship to color and lightness.
Themes are very fun but due to the differences in perceived saturation and lightness across
the color spectrum, it's impossible to have multiple themes that look great without hand-
crafting *every* shade for *every* theme. We've ended up with 4 OK themes (well, 3, because the
light theme was pretty bad).
I've removed the themes and added color mode support. There is now a single dark and light mode,
each with their own color palette and the classic grey / purple / yellow invoke colors that
@blessedcoolant first designed.
I've re-styled almost everything except the model manager and lightbox, which I keep forgetting
to work on.
One new concept is the Chakra `layerStyle`. This lets us define "layers" - think body, first layer,
second layer, etc - that can be applied on various components. By defining layers, we can be more
consistent about the z-axis and its relationship to color and lightness.
The TS Language Server slows down immensely with our translation JSON, which is used to provide kinda-type-safe translation keys. I say "kinda", because you don't get autocomplete - you only get red squigglies when the key is incorrect.
To improve the performance, we can opt out of this process entirely, at the cost of no red squigglies for translation keys. Hopefully we can resolve this in the future.
It's not clear why this became an issue only recently (like past couple weeks). We've tried rolling back the app dependencies, VSCode extensions, VSCode itself, and the TS version to before the time when the issue started, but nothing seems to improve the performance.
1. Disable `resolveJsonModule` in `tsconfig.json`
2. Ignore TS in `i18n.ts` when importing the JSON
3. Comment out the custom types in `i18.d.ts` entirely
It's possible that only `3` is needed to fix the issue.
I've tested building the app and running the build - it works fine, and translation works fine.
Rewrite LoRA to be applied by model patching, as it gives us two benefits
(see the sketch below):
1) On model execution, the result is calculated only on the model
weights, while with hooks we need to calculate on the model and each LoRA.
2) As the LoRA is now patched into the model weights, there is no need to
keep the LoRA in VRAM.
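A minimal sketch of the difference, with illustrative names; a
hook-based approach would instead recompute the LoRA contribution on
every forward pass:
```python
import torch

def patch_lora_weight(layer: torch.nn.Linear, up: torch.Tensor,
                      down: torch.Tensor, scale: float) -> torch.Tensor:
    # Fold the LoRA delta into the layer weight once; afterwards the
    # forward pass runs on the patched weight alone and the LoRA tensors
    # no longer need to stay in VRAM.
    original = layer.weight.data.clone()
    layer.weight.data += scale * (up @ down)
    return original  # keep this around to restore the weight later
```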
Results:
Speed:
| loras count | hook | patch |
| --- | --- | --- |
| 0 | ~4.92 it/s | ~4.92 it/s |
| 1 | ~3.51 it/s | ~4.89 it/s |
| 2 | ~2.76 it/s | ~4.92 it/s |
VRAM:
| loras count | hook | patch |
| --- | --- | --- |
| 0 | ~3.6 gb | ~3.6 gb |
| 1 | ~4.0 gb | ~3.6 gb |
| 2 | ~4.4 gb | ~3.7 gb |
This is based on #3547, so wait for that to merge first.
# Restore invokeai-configure and invokeai-model-install
This PR updates invokeai-configure and invokeai-model-install to work
with the new model manager file layout. It addresses a naming issue for
`ModelType.Main` (was `ModelType.Pipeline`) requested by
@blessedcoolant, and adds back the feature that allows users to dump
models into an `autoimport` directory for discovery at startup time.
Trying to get a few ControlNet extras in before 3.0 release:
- SegmentAnything ControlNet preprocessor node
- LeResDepth ControlNet preprocessor node (but commented out till
controlnet_aux v0.0.6 is released & required by InvokeAI)
- TileResampler ControlNet preprocessor node (should be equivalent to
Mikubill/sd-webui-controlnet extension tile_resampler)
- fix for Midas ControlNet preprocessor error with images that have
alpha channel
Example usage of SegmentAnything preprocessor node:

The installer TUI requires a minimum window width and height to provide
a satisfactory user experience. If, after trying and exhausting all
means of enlarging the window (on Linux, Mac and Windows) the window is
still too small, this PR generates a message telling the user to enlarge
the window and pausing until they do so. If the user fails to enlarge
the window the program will proceed, and either issue an error message
that it can't continue (on Windows), or show a clipped display that the
user can remedy by enlarging the window.
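A minimal sketch of the idea (not the actual TUI code; the thresholds and helper name are assumptions for illustration): poll the terminal size and pause until the user enlarges the window or a timeout expires.

```python
import shutil
import time

MIN_COLS, MIN_LINES = 130, 38  # assumed minimums, for illustration only

def pause_until_window_is_big_enough(timeout: float = 60.0) -> None:
    """Poll the terminal size and wait for the user to enlarge the window."""
    start = time.time()
    cols, lines = shutil.get_terminal_size()
    while (cols < MIN_COLS or lines < MIN_LINES) and time.time() - start < timeout:
        print(
            f"Please enlarge your window to at least {MIN_COLS}x{MIN_LINES} "
            f"(currently {cols}x{lines})...",
            end="\r",
        )
        time.sleep(1)
        cols, lines = shutil.get_terminal_size()
```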
"Fixes" the test suite generally so it doesn't fail CI, but some tests
needed to be skipped/xfailed due to recent refactor.
- ignore three test suites that broke following the model manager
refactor
- move `InvocationServices` fixture to `conftest.py`
- add `boards` items to the `InvocationServices` fixture
This PR makes the unit tests work, but end-to-end tests are temporarily
commented out due to `invokeai-configure` being broken in `main` -
pending #3547
Looks like a lot of the tests need to be rewritten as they reference
`TextToImageInvocation` / `ImageToImageInvocation`
This PR adds the "control_mode" option to ControlNet implementation.
Possible control_mode options are:
- balanced -- this is the default, same as previous implementation
without control_mode
- more_prompt -- pays more attention to the prompt
- more_control -- pays more attention to the ControlNet (in earlier
implementations this was called "guess_mode")
- unbalanced -- pays even more attention to the ControlNet
balanced, more_prompt, and more_control should be nearly identical to
the equivalent options in the [auto1111 sd-webui-controlnet
extension](https://github.com/Mikubill/sd-webui-controlnet#more-control-modes-previously-called-guess-mode)
The changes to enable balanced, more_prompt, and more_control are
managed deeper in the code by two booleans, "soft_injection" and
"cfg_injection". The three control mode options in sd-webui-controlnet
map to these booleans like:
!soft_injection && !cfg_injection ⇒ BALANCED
soft_injection && cfg_injection ⇒ MORE_CONTROL
soft_injection && !cfg_injection ⇒ MORE_PROMPT
The "unbalanced" option simply exposes the fourth possible combination
of these two booleans:
!soft_injection && cfg_injection ⇒ UNBALANCED
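For reference, this mapping boils down to a tiny lookup. The sketch below just illustrates the relationship described above, not the actual InvokeAI code:

```python
def control_mode_flags(control_mode: str) -> tuple[bool, bool]:
    """Return (soft_injection, cfg_injection) for a given control_mode."""
    mapping = {
        "balanced": (False, False),
        "more_control": (True, True),
        "more_prompt": (True, False),
        "unbalanced": (False, True),
    }
    return mapping[control_mode]
```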
With "unbalanced" mode it is very easy to overdrive the controlnet
inputs. It's recommended to use a cfg_scale between 2 and 4 to mitigate
this, along with lowering controlnet weight and possibly lowering "end
step percent". With those caveats, "unbalanced" can yield interesting
results.
Example of all four modes using Canny edge detection ControlNet with
prompt "old man", identical params except for control_mode:

Top middle: BALANCED
Top right: MORE_CONTROL
Bottom middle: MORE_PROMPT
Bottom right: UNBALANCED
I kind of chose this seed because it shows pretty rough results with
BALANCED (the default), but in my opinion better results with both
MORE_CONTROL and MORE_PROMPT. And you can definitely see how MORE_PROMPT
pays more attention to the prompt, and MORE_CONTROL pays more attention
to the control image. It also shows that UNBALANCED with the default cfg_scale etc. is unusable.
But here are four examples from same series (same seed etc), all have
control_mode = UNBALANCED but now cfg_scale is set to 3.

And param differences are:
Top middle: prompt="old man", control_weight=0.3, end_step_percent=0.5
Top right: prompt="old man", control_weight=0.4, end_step_percent=1.0
Bottom middle: prompt=None, control_weight=0.3, end_step_percent=0.5
Bottom right: prompt=None, control_weight=0.4, end_step_percent=1.0
So with the right settings UNBALANCED seems useful.
Everything seems to be working.
- Due to a change to `reactflow`, I regenerated `yarn.lock`
- New chakra CLI fixes issue I had made a patch for; removed the patch
- Change to fontsource changed how we import that font
- Change to fontawesome means we lost the txt2img tab icon, just chose a
similar one
Only "real" conflicts were in:
invokeai/frontend/web/src/features/controlNet/components/ControlNet.tsx
invokeai/frontend/web/src/features/controlNet/store/controlNetSlice.ts
- Reset and Upload buttons along top of initial image
- Also had to mess around with the control net & DnD image stuff after changing the styles
- Abstract image upload logic into hook - does not handle native HTML drag and drop upload - only the button click upload
`openapi-fetch` does not handle non-JSON `body`s, always stringifying them, and sets the `content-type` to `application/json`.
The patch here does two things:
- Do not stringify `body` if it is one of the types that should not be stringified (https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch#body)
- Do not add `content-type: application/json` unless it really is stringified JSON.
Upstream issue: https://github.com/drwpow/openapi-typescript/issues/1123
I'm a bit lost on fixing the types and adding tests, so I'm not raising a PR upstream.
*migrate from `openapi-typescript-codegen` to `openapi-typescript` and `openapi-fetch`*
`openapi-typescript-codegen` is not very actively maintained - it's been over a year since the last update.
`openapi-typescript` and `openapi-fetch` are part of the actively maintained repo. key differences:
- provides a `fetch` client instead of `axios`, which means we need to be a bit more verbose with typing thunks
- fetch client is created at runtime and has a very nice typescript DX
- generates a single file with all types in it, from which we then extract individual types. i don't like how verbose this is, but i do like how it is more explicit.
- removed npm api generation scripts - now we have a single `typegen` script
overall i have more confidence in this new library.
*use nanostores for api base and token*
very simple reactive store for api base url and token. this was suggested in the `openapi-fetch` docs and i quite like the strategy.
*organise rtk-query api*
split out each endpoint (models, images, boards, boardImages) into their own api extensions. tidy!
Unsure at which point it broke, but I can no longer convert a VAE (or a model whose VAE is part of it) without this fix.
Need further research - maybe it's a breaking change in `transformers`?
Changes:
* Linux `install.sh` now prints the maximum python version to use in
case no installed python version matches
Commits:
fix(linux): installer script prints maximum python version usable
PR for the Model Manager UI work related to 3.0
[DONE]
- Update ModelType Config names to be specific so that the front end can
parse them correctly.
- Rebuild frontend schema to reflect these changes.
- Update Linear UI Text To Image and Image to Image to work with the new
model loader.
- Updated the ModelInput component in the Node Editor to work with the
new changes.
[TODO REMEMBER]
- Add proper types for ModelLoaderType in `ModelSelect.tsx`
[TODO]
- Everything else.
Basically updated all slices to be more descriptive in their names. Did so in order to make sure there's a good naming scheme available for secondary models.
To determine whether the Load More button should work, we need to keep track of how many images are left to load for a given board or category.
The Assets tab doesn't work, though. Need to figure out a better way to handle this.
We need to access the initial image dimensions during the creation of the `ImageToImage` graph to determine if we need to resize the image.
Because the `initialImage` is now just an image name, we need to either store (easy) or dynamically retrieve its dimensions during graph creation (a bit less easy).
Took the easiest path. May need to revise this in the future.
Images that are used as parameters (e.g. init image, canvas images) are stored as full `ImageDTO` objects in state, separate from and duplicating any object representing those same objects in the `imagesSlice`.
We cannot store only image names as parameters, then pull the full `ImageDTO` from `imagesSlice`, because if an image is not on a loaded page, it doesn't exist in `imagesSlice`. For example, if you scroll down a few pages in the gallery and send that image to canvas, on reloading the app, the canvas will be unable to load that image.
We solved this temporarily by storing the full `ImageDTO` object wherever it was needed, but this is both inefficient and allows for stale `ImageDTO`s across the app.
One other possible solution was to just fetch the `ImageDTO` for all images at startup, and insert them into the `imagesSlice`, but then we run into an issue where we are displaying images in the gallery totally out of context.
For example, if an image from several pages into the gallery was sent to canvas, and the user refreshes, we'd display the first 20 images in gallery. Then to populate the canvas, we'd fetch that image we sent to canvas and add it to `imagesSlice`. Now we'd have 21 images in the gallery: 1 to 20 and whichever image we sent to canvas. Weird.
Using `rtk-query` solves this by allowing us to very easily fetch individual images in the components that need them, and not directly interact with `imagesSlice`.
This commit changes all references to images-as-parameters to store only the name of the image, and not the full `ImageDTO` object. Then, we use an `rtk-query` generated `useGetImageDTOQuery()` hook in each of those components to fetch the image.
We can use cache invalidation when we mutate any image to trigger automated re-running of the query and all the images are automatically kept up to date.
This also obviates the need for the convoluted URL fetching scheme for images that are used as parameters. The `imagesSlice` still needs this handling, unfortunately.
Added SDE schedulers.
Problem - they add random noise on each step, so to get a consistent image we need to provide a seed or generator. I've done that, but if you think it's better done another way, feel free to change it.
Also made the ancestral schedulers reproducible, done the same way as for the SDE schedulers.
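A minimal sketch of the reproducibility idea (not the actual scheduler code): the per-step noise is drawn from a seeded `torch.Generator` rather than from global RNG state, so the same seed reproduces the same image.

```python
import torch

seed = 1234
generator = torch.Generator(device="cpu").manual_seed(seed)

# e.g. the extra noise an ancestral/SDE scheduler adds at each step would be
# sampled from this generator instead of the global RNG:
noise = torch.randn((1, 4, 64, 64), generator=generator)
```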
- Add graph builders for canvas txt2img & img2img - they are mostly copy and paste from the linear graph builders but different in a few ways that are very tricky to work around. Just made totally new functions for them.
- Canvas txt2img and img2img support ControlNet (not inpaint/outpaint). There's no way to determine in real-time which mode the canvas is in just yet, so we cannot disable the ControlNet UI when the mode will be inpaint/outpaint - it will always display. It's possible to determine this in near-real-time, will add this at some point.
- Canvas inpaint/outpaint migrated to use model loader, though inpaint/outpaint are still using the non-latents nodes.
Instead of manually creating every node and edge, we can simply copy/paste the base graph from node editor, then sub in parameters.
This is a much more intelligible process. We still need to handle seed, img2img fit and controlnet separately.
- Ports Schedulers to use IAIMantineSelect.
- Adds ability to favorite schedulers in Settings. Favorited schedulers
show up at the top of the list.
- Adds IAIMantineMultiSelect component.
- Change SettingsSchedulers component to use IAIMantineMultiSelect
instead of Chakra Menus.
- remove UI-specific state (the enabled schedulers) from redux, instead derive it in a selector
- simplify logic by putting schedulers in an object instead of an array
- rename `activeSchedulers` to `enabledSchedulers`
- remove need for `useEffect()` when `enabledSchedulers` changes by adding a listener for the `enabledSchedulersChanged` action/event to `generationSlice`
- increase type safety by making `enabledSchedulers` an array of `SchedulerParam`, which is created by the zod schema for scheduler
Basically updated all slices to be more descriptive in their names. Did so in order to make sure there's a good naming scheme available for secondary models.
Update the text to image and image to image graphs to work with the new model loader. Currently only supports 1.x models. Will update this soon to make it work with all models.
- `DiskImageStorage` and `DiskLatentsStorage` have now both been updated
to exclusively work with `Path` objects and not rely on the `os` lib to
handle pathing related functions.
- We now also validate the existence of the required image output
folders and latent output folders to ensure that the app does not break
in case the required folders get tampered with mid-session.
- Just overall general cleanup.
Tested it. Nothing seems to be breaking.
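A rough sketch of the `pathlib`-based approach, with a hypothetical class name and simplified methods (not the real `DiskLatentsStorage` API):

```python
from pathlib import Path

import torch


class DiskLatentsStorageSketch:
    """Hypothetical, simplified stand-in for a Path-based latents store."""

    def __init__(self, output_folder: Path):
        self._output_folder = Path(output_folder)
        # Validate/create the folder up front so a tampered directory
        # does not break the app mid-session.
        self._output_folder.mkdir(parents=True, exist_ok=True)

    def get_path(self, name: str) -> Path:
        return self._output_folder / f"{name}.pt"

    def save(self, name: str, latents: torch.Tensor) -> None:
        torch.save(latents, self.get_path(name))
```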
- remove `image_origin` from most places where we interact with images
- consolidate image file storage into a single `images/` dir
Images have an `image_origin` attribute but it is not actually used when retrieving images, nor will it ever be. It is still used when creating images and helps to differentiate between internally generated images and uploads.
It was included in eg API routes and image service methods as a holdover from the previous app implementation where images were not managed in a database. Now that we have images in a db, we can do away with this and simplify basically everything that touches images.
The one potentially controversial change is to no longer separate internal and external images on disk. If we retain this separation, we have to keep `image_origin` around in a number of spots, and it makes getting image paths on disk painful.
So, I have gotten rid of this organisation. Images are now all stored in `images/`, regardless of their origin. As we improve the image management features, this change will hopefully become transparent.
Diffusers is due for an update soon. #3512
Opening up a PR now with the required changes for when the new version
is live.
I've tested it out on Windows and nothing has broken from what I could
tell. I'd like someone to run some tests on Linux / Mac just to make
sure. Refer to the PR above on how to test it or install the release
branch.
```
pip install diffusers[torch]==0.17.0
```
Feel free to push any other changes to this PR you see fit.
There are some bugs with it that I cannot figure out related to `floating-ui` and `downshift`'s handling of refs.
Will need to revisit this component in the future.
* Testing change to LatentsToText to allow setting different cfg_scale values per diffusion step.
* Adding first attempt at float param easing node, using Penner easing functions.
* Core implementation of ControlNet and MultiControlNet.
* Added support for ControlNet and MultiControlNet to legacy non-nodal Txt2Img in backend/generator. Although backend/generator will likely disappear by v3.x, right now they are very useful for testing core ControlNet and MultiControlNet functionality while node codebase is rapidly evolving.
* Added example of using ControlNet with legacy Txt2Img generator
* Resolving rebase conflict
* Added first controlnet preprocessor node for canny edge detection.
* Initial port of controlnet node support from generator-based TextToImageInvocation node to latent-based TextToLatentsInvocation node
* Switching to ControlField for output from controlnet nodes.
* Resolving conflicts in rebase to origin/main
* Refactored ControlNet nodes so they subclass from PreprocessedControlInvocation, and only need to override run_processor(image) (instead of reimplementing invoke())
* changes to base class for controlnet nodes
* Added HED, LineArt, and OpenPose ControlNet nodes
* Added an additional "raw_processed_image" output port to controlnets, mainly so could route ImageField to a ShowImage node
* Added more preprocessor nodes for:
MidasDepth
ZoeDepth
MLSD
NormalBae
Pidi
LineartAnime
ContentShuffle
Removed pil_output options, ControlNet preprocessors should always output as PIL. Removed diagnostics and other general cleanup.
* Prep for splitting pre-processor and controlnet nodes
* Refactored controlnet nodes: split out controlnet stuff into a separate node, stripped controlnet stuff from image processing/analysis nodes.
* Added resizing of controlnet image based on noise latent. Fixes a tensor mismatch issue.
* More rebase repair.
* Added support for using multiple control nets. Unfortunately this breaks direct usage of Control node output port ==> TextToLatent control input port -- passing through a Collect node is now required. Working on fixing this...
* Fixed use of ControlNet control_weight parameter
* Fixed lint-ish formatting error
* Refactored controlnet node to output ControlField that bundles control info.
* Cleaning up TextToLatent arg testing
* Cleaning up mistakes after rebase.
* Removed last bits of dtype and device hardwiring from controlnet section
* Refactored ControlNet support to consolidate multiple parameters into a data struct. Also redid how multiple controlnets are handled.
* Added support for specifying which step iteration to start using
each ControlNet, and which step to end using each controlnet (specified as fraction of total steps)
* Cleaning up prior to submitting ControlNet PR. Mostly turning off diagnostic printing. Also fixed error when there is no controlnet input.
* Added dependency on controlnet-aux v0.0.3
* Commented out ZoeDetector. Will re-instate once there's a controlnet-aux release that supports it.
* Switched ControlNet node modelname input from free text to a default list of popular ControlNet model names.
* Fix to work with current stable release of controlnet_aux (v0.0.3). Turned off pre-processor params that were added post v0.0.3. Also changed defaults for shuffle.
* Refactored most of controlnet code into its own method to declutter TextToLatents.invoke(), and make upcoming integration with LatentsToLatents easier.
* Cleaning up after ControlNet refactor in TextToLatentsInvocation
* Extended node-based ControlNet support to LatentsToLatentsInvocation.
* chore(ui): regen api client
* fix(ui): add value to conditioning field
* fix(ui): add control field type
* fix(ui): fix node ui type hints
* fix(nodes): controlnet input accepts list or single controlnet
* Moved to controlnet_aux v0.0.4, reinstated Zoe controlnet preprocessor. Also in pyproject.toml had to specify downgrade of timm to 0.6.13 _after_ controlnet-aux installs timm >= 0.9.2, because timm >0.6.13 breaks Zoe preprocessor.
* Added Mediapipe image processor for use as ControlNet preprocessor.
Also hacked in ability to specify HF subfolder when loading ControlNet models from string.
* Fixed bug where MediapipFaceProcessorInvocation was ignoring max_faces and min_confidence params.
* Added nodes for float params: ParamFloatInvocation and FloatCollectionOutput. Also added FloatOutput.
* Added mediapipe install requirement. Should be able to remove once controlnet_aux package adds mediapipe to its requirements.
* Added float to FIELD_TYPE_MAP in constants.ts
* Progress toward improvement in fieldTemplateBuilder.ts getFieldType()
* Fixed controlnet preprocessors and controlnet handling in TextToLatents to work with revised Image services.
* Cleaning up from merge, re-adding cfg_scale to FIELD_TYPE_MAP
* Making sure cfg_scale of type list[float] can be used in image metadata, to support param easing for cfg_scale
* Fixed math for per-step param easing.
* Added option to show plot of param value at each step
* Just cleaning up after adding param easing plot option, removing vestigial code.
* Modified control_weight ControlNet param to be polymorphic --
can now be either a single float weight applied for all steps, or a list of floats of size total_steps that specifies the weight for each step.
* Added more informative error message when _validate_edge() throws an error.
* Just improving the param easing bar chart title to include the easing type.
* Added requirement for easing-functions package
* Taking out some diagnostic prints.
* Added option to use both easing function and mirror of easing function together.
* Fixed recently introduced problem (when main was pulled in), triggered by num_steps in StepParamEasingInvocation not having a default value -- just added a default.
---------
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
In some cases the command-line was getting parsed before the logger was
initialized, causing the logger not to pick up custom logging
instructions from `--log_handlers`. This PR fixes the issue.
[fix(ui): blur tab on
click](93f3658a4a)
Fixes issue where after clicking a tab, using the arrow keys changes tab
instead of changing selected image
[fix(ui): fix canvas not filling screen on first
load](68be95acbb)
[feat(ui): remove clear temp folder canvas
button](813f79f0f9)
This button is nonfunctional.
Soon we will introduce a different way to handle clearing out
intermediate images (likely automated).
There was an issue where for graphs w/ iterations, your images were output all at once, at the very end of processing. So if you canceled halfway through an execution of 10 nodes, you wouldn't get any images - even though you'd completed 5 images' worth of inference.
## Cause
Because graphs were executed breadth-first (i.e. depth-by-depth), leaf nodes were necessarily processed last. For image generation graphs, your `LatentsToImage` nodes will be leaf nodes, and will be in the last depth to be executed.
For example, a `TextToLatents` graph w/ 3 iterations would execute all 3 `TextToLatents` nodes fully before moving to the next depth, where the `LatentsToImage` nodes produce output images, resulting in a node execution order like this:
1. TextToLatents
2. TextToLatents
3. TextToLatents
4. LatentsToImage
5. LatentsToImage
6. LatentsToImage
## Solution
This PR makes two changes to graph execution so that it executes as deeply as it can along each branch of the graph.
### Eager node preparation
We now prepare as many nodes as possible, instead of just a single node at a time.
We also need to change the conditions in which nodes are prepared. Previously, nodes were prepared only when all of their direct ancestors were executed.
The updated logic prepares nodes that:
- are *not* `Iterate` nodes whose inputs have *not* been executed
- do *not* have any unexecuted `Iterate` ancestor nodes
This results in graphs always being maximally prepared.
### Always execute the deepest prepared node
We now choose the next node to execute by traversing from the bottom of the graph instead of the top, choosing the first node whose inputs are all executed.
This means we always execute the deepest node possible.
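As a rough sketch (with assumed helper names, not the real graph execution API), the selection step amounts to walking candidates deepest-first and returning the first one whose inputs are all executed:

```python
def next_node_to_execute(prepared, executed, inputs_of, depth_of):
    # Walk prepared nodes from the bottom of the graph up. The first node
    # whose inputs have all been executed is the deepest runnable node, so
    # leaf-ward nodes like LatentsToImage run as soon as they are ready.
    for node in sorted(prepared, key=depth_of, reverse=True):
        if all(parent in executed for parent in inputs_of(node)):
            return node
    return None
```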
## Result
Graphs now execute depth-first, so instead of an execution order like this:
1. TextToLatents
2. TextToLatents
3. TextToLatents
4. LatentsToImage
5. LatentsToImage
6. LatentsToImage
... we get an execution order like this:
1. TextToLatents
2. LatentsToImage
3. TextToLatents
4. LatentsToImage
5. TextToLatents
6. LatentsToImage
Immediately after inference, the image is decoded and sent to the gallery.
fixes #3400
This PR creates the databases directory at app startup time. It also
removes a couple of debugging statements that were inadvertently left in
the model manager.
# Make InvokeAI package installable by mere mortals
This commit makes InvokeAI 3.0 installable via PyPi.org and/or the installer script. The install process is now pretty much identical to the 2.3 process, including creating the launcher scripts `invoke.sh` and `invoke.bat`.
Main changes:
1. Moved static web pages into `invokeai/frontend/web` and modified the
API to look for them there. This allows pip to copy the files into the
distribution directory so that the user no longer has to be in the repo root to
launch, and enables PyPi installations with `pip install invokeai`
2. Update invoke.sh and invoke.bat to launch the new web application
properly. This also changes the wording for launching the CLI from
"generate images" to "explore the InvokeAI node system," since I would
not recommend using the CLI to generate images routinely.
3. Fix a bug in the checkpoint converter script that was identified
during testing.
4. Better error reporting when checkpoint converter fails.
5. Rebuild front end.
# Major improvements to the model installer.
1. The text user interface for `invokeai-model-install` has been
expanded to allow the user to install controlnet, LoRA, textual
inversion, diffusers and checkpoint models. The user can install
interactively (without leaving the TUI), or in batch mode after exiting
the application.

2. The `invokeai-model-install` command now lets you list, add and
delete models from the command line:
## Listing models
```
$ invokeai-model-install --list diffusers
Diffuser models:
analog-diffusion-1.0 not loaded diffusers An SD-1.5 model trained on diverse analog photographs (2.13 GB)
d&d-diffusion-1.0 not loaded diffusers Dungeons & Dragons characters (2.13 GB)
deliberate-1.0 not loaded diffusers Versatile model that produces detailed images up to 768px (4.27 GB)
DreamShaper not loaded diffusers Imported diffusers model DreamShaper
sd-inpainting-1.5 not loaded diffusers RunwayML SD 1.5 model optimized for inpainting, diffusers version (4.27 GB)
sd-inpainting-2.0 not loaded diffusers Stable Diffusion version 2.0 inpainting model (5.21 GB)
stable-diffusion-1.5 not loaded diffusers Stable Diffusion version 1.5 diffusers model (4.27 GB)
stable-diffusion-2.1 not loaded diffusers Stable Diffusion version 2.1 diffusers model, trained on 768 pixel images (5.21 GB)
```
```
$ invokeai-model-install --list tis
Loading Python libraries...
Installed Textual Inversion Embeddings:
EasyNegative
ahx-beta-453407d
```
## Installing models
(this example shows correct handling of a server side error at Civitai)
```
$ invokeai-model-install --diffusers https://civitai.com/api/download/models/46259 Linaqruf/anything-v3.0
Loading Python libraries...
[2023-06-05 22:17:23,556]::[InvokeAI]::INFO --> INSTALLING EXTERNAL MODELS
[2023-06-05 22:17:23,557]::[InvokeAI]::INFO --> Probing https://civitai.com/api/download/models/46259 for import
[2023-06-05 22:17:23,557]::[InvokeAI]::INFO --> https://civitai.com/api/download/models/46259 appears to be a URL
[2023-06-05 22:17:23,763]::[InvokeAI]::ERROR --> An error occurred during downloading /home/lstein/invokeai-test/models/ldm/stable-diffusion-v1/46259: Internal Server Error
[2023-06-05 22:17:23,763]::[InvokeAI]::ERROR --> ERROR DOWNLOADING https://civitai.com/api/download/models/46259: {"error":"Invalid database operation","cause":{"clientVersion":"4.12.0"}}
[2023-06-05 22:17:23,764]::[InvokeAI]::INFO --> Probing Linaqruf/anything-v3.0 for import
[2023-06-05 22:17:23,764]::[InvokeAI]::DEBUG --> Linaqruf/anything-v3.0 appears to be a HuggingFace diffusers repo_id
[2023-06-05 22:17:23,768]::[InvokeAI]::INFO --> Loading diffusers model from Linaqruf/anything-v3.0
[2023-06-05 22:17:23,769]::[InvokeAI]::DEBUG --> Using faster float16 precision
[2023-06-05 22:17:23,883]::[InvokeAI]::ERROR --> An unexpected error occurred while downloading the model: 404 Client Error. (Request ID: Root=1-647e9733-1b0ee3af67d6ac3456b1ebfc)
Revision Not Found for url: https://huggingface.co/Linaqruf/anything-v3.0/resolve/fp16/model_index.json.
Invalid rev id: fp16)
Downloading (…)ain/model_index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 511/511 [00:00<00:00, 2.57MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 472/472 [00:00<00:00, 6.13MB/s]
Downloading (…)cheduler_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 341/341 [00:00<00:00, 3.30MB/s]
Downloading (…)okenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 807/807 [00:00<00:00, 11.3MB/s]
```
## Deleting models
```
invokeai-model-install --delete --diffusers anything-v3
Loading Python libraries...
[2023-06-05 22:19:45,927]::[InvokeAI]::INFO --> Processing requested deletions
[2023-06-05 22:19:45,927]::[InvokeAI]::INFO --> anything-v3...
[2023-06-05 22:19:45,927]::[InvokeAI]::INFO --> Deleting the cached model directory for Linaqruf/anything-v3.0
[2023-06-05 22:19:45,948]::[InvokeAI]::WARNING --> Deletion of this model is expected to free 4.3G
```
1. Contents of autoscan directory field are restored after doing an installation.
2. Activate dialogue to choose V2 parameterization when importing from a directory.
3. Remove autoscan directory from init file when its checkbox is unselected.
4. Add widget cycling behavior to install models form.
The processor is automatically selected when the model is changed.
But if the user manually changes the processor, changes the processor settings, or disables the new `Auto configure processor` switch, auto processing is disabled.
The user can re-enable auto configure by turning the switch back on.
When auto configure is disabled, a small dot is overlaid on the expand button to remind the user that the system is not auto configuring the processor for them.
If auto configure is enabled, the processor settings are reset to the default for the selected model.
Add uploading to IAIDndImage
- add `postUploadAction` arg to `imageUploaded` thunk, with several current valid options (set control image, set init, set nodes image, set canvas, or toast)
- updated IAIDndImage to optionally allow click to upload
- when the controlnet model is changed, if there is a default processor for the model set, the processor is changed.
- once a control image is selected (and processed), changing the model does not change the processor - must be manually changed
- Also fixed up order in which logger is created in invokeai-web
so that handlers are installed after command-line options are
parsed (and not before!)
This handles the case when an image is deleted but is still in use as e.g. an init image on canvas, or a control image. If we just delete the image, canvas/controlnet/etc. may break (the image would just fail to load).
When an image is deleted, the app checks to see if it is in use in:
- Image to Image
- ControlNet
- Unified Canvas
- Node Editor
The delete dialog will always open if the image is in use anywhere, and the user is advised that deleting the image will reset the feature(s).
Even if the user has ticked the box to not confirm on delete, the dialog will still show if the image is in use somewhere.
- fix "bounding box region only" not being respected when saving
- add toasts for each action
- improve workflow `take()` predicates to use the requestId
- responsive changes were causing a lot of weird layout issues, had to remove the rest of them
- canvas (non-beta) toolbar now wraps
- reduces minH for prompt boxes a bit
1. Model installer works correctly under Windows 11 Terminal
2. Fixed crash when configure script hands control off to installer
3. Kill install subprocess on keyboard interrupt
4. Command-line functionality for --yes configuration and model installation
restored.
5. New command-line features:
- install/delete lists of diffusers, LoRAs, controlnets and textual inversions
using repo ids, paths or URLs.
Help:
```
usage: invokeai-model-install [-h] [--diffusers [DIFFUSERS ...]] [--loras [LORAS ...]] [--controlnets [CONTROLNETS ...]] [--textual-inversions [TEXTUAL_INVERSIONS ...]] [--delete] [--full-precision | --no-full-precision]
[--yes] [--default_only] [--list-models {diffusers,loras,controlnets,tis}] [--config_file CONFIG_FILE] [--root_dir ROOT]
InvokeAI model downloader
options:
-h, --help show this help message and exit
--diffusers [DIFFUSERS ...]
List of URLs or repo_ids of diffusers to install/delete
--loras [LORAS ...] List of URLs or repo_ids of LoRA/LyCORIS models to install/delete
--controlnets [CONTROLNETS ...]
List of URLs or repo_ids of controlnet models to install/delete
--textual-inversions [TEXTUAL_INVERSIONS ...]
List of URLs or repo_ids of textual inversion embeddings to install/delete
--delete Delete models listed on command line rather than installing them
--full-precision, --no-full-precision
use 32-bit weights instead of faster 16-bit weights (default: False)
--yes, -y answer "yes" to all prompts
--default_only only install the default model
--list-models {diffusers,loras,controlnets,tis}
list installed models
--config_file CONFIG_FILE, -c CONFIG_FILE
path to configuration file to create
--root_dir ROOT path to root of install directory
```
There was a potential gotcha in the config system that was previously
merged with main. The `InvokeAIAppConfig` object was configuring itself
from the command line and configuration file within its initialization
routine. However, this could cause it to read `argv` from the command
line at unexpected times. This PR fixes the object so that it only reads
from the init file and command line when its `parse_args()` method is
explicitly called, which should be done at startup time in any top level
script that uses it.
In addition, using the `get_invokeai_config()` function to get a global
version of the config object didn't feel pythonic to me, so I have
changed this to `InvokeAIAppConfig.get_config()` throughout.
## Updated Usage
In the main script, at startup time, do the following:
```
from invokeai.app.services.config import InvokeAIAppConfig
config = InvokeAIAppConfig.get_config()
config.parse_args()
```
In non-main scripts, it is not necessary (or recommended) to call
`parse_args()`:
```
from invokeai.app.services.config import InvokeAIAppConfig
config = InvokeAIAppConfig.get_config()
```
The configuration object properties can be overridden when
`get_config()` is called by passing initialization values in the usual
way. If a property is set this way, then it will not be changed by
subsequent calls to `parse_args()`, but can only be changed by
explicitly setting the property.
```
config = InvokeAIAppConfig.get_config(nsfw_checker=True)
config.parse_args(argv=['--no-nsfw_checker'])
config.nsfw_checker
# True
```
You may specify alternative argv lists and configuration files in
`parse_args()`:
```
from omegaconf import OmegaConf

config.parse_args(
    argv=['--no-nsfw_checker'],
    conf=OmegaConf.load('/tmp/test.yaml'),
)
```
For backward compatibility, the `get_invokeai_config()` function is
still available from the module, but has been removed from the rest of
the source tree.
This PR adds long prompt support and enables compel's new `.and()` concatenation feature, which improves image quality, especially with SD2.1.
example of a long prompt:
> a moist sloppy pindlesackboy sloppy hamblin' bogomadong, Clem Fandango
is pissed-off, Wario's Woods in background, making a noise like
ga-woink-a

the same prompt broken into fragments and concatenated using `.and()`
(syntax works like `.blend()`):
```
("a moist sloppy pindlesackboy sloppy hamblin' bogomadong",
"Clem Fandango is pissed-off",
"Wario's Woods in background",
"making a noise like ga-woink-a").and()
```

and a less silly example:
> A dream of a distant galaxy, by Caspar David Friedrich, matte
painting, trending on artstation, HQ

the same prompt broken into two fragments and concatenated:
```
("A dream of a distant galaxy, by Caspar David Friedrich, matte painting",
"trending on artstation, HQ").and()
```

as with `.blend()` you can also weight the parts eg `("a man eating an
apple", "sitting on the roof of a car", "high quality, trending on
artstation, 8K UHD").and(1, 0.5, 0.5)` which will assign weight `1` to
`a man eating an apple` and `0.5` to `sitting on the roof of a car` and
`high quality, trending on artstation, 8K UHD`.
Implement `dnd-kit` for image drag and drop
- vastly simplifies logic bc we can drag and drop non-serializable data (like an `ImageDTO`)
- also much prettier
- also will fix conflicts with file upload via OS drag and drop, bc `dnd-kit` does not use native HTML drag and drop API
- Implemented for Init image, controlnet, and node editor so far
More progress on the ControlNet UI
- The invokeai.db database file has now been moved into
`INVOKEAIROOT/databases`. Using plural here for possible
future with more than one database file.
- Removed a few dangling debug messages that appeared during
testing.
- Rebuilt frontend to test web.
This PR provides a number of options for controlling how InvokeAI logs
messages, including options to log to a file, syslog and a web server.
Several logging handlers can be configured simultaneously.
## Controlling How InvokeAI Logs Status Messages
InvokeAI logs status messages using a configurable logging system. You
can log to the terminal window, to a designated file on the local
machine, to the syslog facility on a Linux or Mac, or to a properly
configured web server. You can configure several logs at the same time,
and control the level of message logged and the logging format (to a
limited extent).
Three command-line options control logging:
### `--log_handlers <handler1> <handler2> ...`
This option activates one or more log handlers. Options are "console",
"file", "syslog" and "http". To specify more than one, separate them by
spaces:
```bash
invokeai-web --log_handlers console syslog=/dev/log file=C:\Users\fred\invokeai.log
```
The format of these options is described below.
### `--log_format {plain|color|legacy|syslog}`
This controls the format of log messages written to the console. Only
the "console" log handler is currently affected by this setting.
* "plain" provides formatted messages like this:
```bash
[2023-05-24 23:18:50,352]::[InvokeAI]::DEBUG --> this is a debug message
[2023-05-24 23:18:50,352]::[InvokeAI]::INFO --> this is an informational messages
[2023-05-24 23:18:50,352]::[InvokeAI]::WARNING --> this is a warning
[2023-05-24 23:18:50,352]::[InvokeAI]::ERROR --> this is an error
[2023-05-24 23:18:50,352]::[InvokeAI]::CRITICAL --> this is a critical error
```
* "color" produces similar output, but the text will be color coded to
indicate the severity of the message.
* "legacy" produces output similar to InvokeAI versions 2.3 and earlier:
```bash
### this is a critical error
*** this is an error
** this is a warning
>> this is an informational messages
| this is a debug message
```
* "syslog" produces messages suitable for syslog entries:
```bash
InvokeAI [2691178] <CRITICAL> this is a critical error
InvokeAI [2691178] <ERROR> this is an error
InvokeAI [2691178] <WARNING> this is a warning
InvokeAI [2691178] <INFO> this is an informational messages
InvokeAI [2691178] <DEBUG> this is a debug message
```
(note that the date, time and hostname will be added by the syslog
system)
### `--log_level {debug|info|warning|error|critical}`
Providing this command-line option will cause only messages at the
specified level or above to be emitted.
## Console logging
When "console" is provided to `--log_handlers`, messages will be written
to the command line window in which InvokeAI was launched. By default,
the color formatter will be used unless overridden by `--log_format`.
## File logging
When "file" is provided to `--log_handlers`, entries will be written to
the file indicated in the path argument. By default, the "plain" format
will be used:
```bash
invokeai-web --log_handlers file=/var/log/invokeai.log
```
## Syslog logging
When "syslog" is requested, entries will be sent to the syslog system.
There are a variety of ways to control where the log message is sent:
* Send to the local machine using the `/dev/log` socket:
```
invokeai-web --log_handlers syslog=/dev/log
```
* Send to the local machine using a UDP message:
```
invokeai-web --log_handlers syslog=localhost
```
* Send to the local machine using a UDP message on a nonstandard port:
```
invokeai-web --log_handlers syslog=localhost:512
```
* Send to a remote machine named "loghost" on the local LAN using
facility LOG_USER and UDP packets:
```
invokeai-web --log_handlers syslog=loghost,facility=LOG_USER,socktype=SOCK_DGRAM
```
This can be abbreviated `syslog=loghost`, as LOG_USER and SOCK_DGRAM are
defaults.
* Send to a remote machine named "loghost" using the facility LOCAL0 and
using a TCP socket:
```
invokeai-web --log_handlers syslog=loghost,facility=LOG_LOCAL0,socktype=SOCK_STREAM
```
If no arguments are specified (just a bare "syslog"), then the logging
system will look for a UNIX socket named `/dev/log`, and if not found
try to send a UDP message to `localhost`. The Macintosh OS used to
support logging to a socket named `/var/run/syslog`, but this feature
has since been disabled.
## Web logging
If you have access to a web server that is configured to log messages
when a particular URL is requested, you can log using the "http" method:
```
invokeai-web --log_handlers http=http://my.server/path/to/logger,method=POST
```
The optional [,method=] part can be used to specify whether the URL
accepts GET (default) or POST messages.
Currently password authentication and SSL are not supported.
## Using the configuration file
You can set and forget logging options by adding a "Logging" section to
`invokeai.yaml`:
```
InvokeAI:
[... other settings...]
Logging:
log_handlers:
- console
- syslog=/dev/log
log_level: info
log_format: color
```
1. Separated the "starter models" and "more models" sections. This gives us room to list all installed diffusers models, not just those that are on the starter list.
2. Support mouse-based paste into the textboxes with either middle
or right mouse buttons.
3. Support terminal-style cursor movement:
^A to move to beginning of line
^E to move to end of line
^K kill text to right and put in killring
^Y yank text back
4. Internal code cleanup.
The gallery could get into a state where it thought it had just reached the end of the list and would endlessly fetch more images, even when there were no more images to fetch (weird, I know).
Add some logic to remove the `end reached` handler when there are no more images to load.
It doesn't work for the img2img pipelines, but the implemented conditional display could break the scheduler selection dropdown.
Simple fix until diffusers merges the fix - never use this scheduler.
This commit makes InvokeAI 3.0 installable via PyPi.org and the installer script.
Main changes:
1. Move static web pages into `invokeai/frontend/web` and modify the
API to look for them there. This allows pip to copy the files into the
distribution directory so that the user no longer has to be in the repo root to launch.
2. Update invoke.sh and invoke.bat to launch the new web application
properly. This also changes the wording for launching the CLI from
"generate images" to "explore the InvokeAI node system," since I would
not recommend using the CLI to generate images routinely.
3. Fix a bug in the checkpoint converter script that was identified
during testing.
4. Better error reporting when checkpoint converter fails.
5. Rebuild front end.
The `ModelsList` OpenAPI schema is generated as being keyed by plain strings. This means that API consumers do not know the shape of the dict. It _should_ be keyed by the `SDModelType` enum.
Unfortunately, `fastapi` does not actually handle this correctly yet; it still generates the schema with plain string keys.
Adding this anyways though in hopes that it will be resolved upstream and we can get the correct schema. Until then, I'll implement the (simple but annoying) logic on the frontend.
https://github.com/pydantic/pydantic/issues/4393
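As a minimal pydantic sketch of the intent (illustrative enum members and a simplified value type, not the real `ModelsList` definition), keying the dict by the enum is what should let the generated schema advertise the allowed keys:

```python
from enum import Enum
from typing import Dict

from pydantic import BaseModel


class SDModelType(str, Enum):
    # Illustrative members only; the real enum lives in the model manager.
    diffusers = "diffusers"
    vae = "vae"
    lora = "lora"


class ModelsListSketch(BaseModel):
    # dict keyed by the enum rather than by plain str
    models: Dict[SDModelType, dict]
```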
1. If an external VAE is specified in the config file, then
get_model(submodel=vae) will return the external VAE, not the one
burnt into the parent diffusers pipeline.
2. The mechanism in (1) is generalized such that you can now have
"unet:", "text_encoder:" and similar stanzas in the config file.
Valid formats of these subsections:

    unet:
      repo_id: foo/bar

    unet:
      path: /path/to/local/folder

    unet:
      repo_id: foo/bar
      subfolder: unet
In the near future, these will also be used to attach external
parts to the pipeline, generalizing VAE behavior.
3. Accommodate callers (i.e. the WebUI) that are passing the
model key ("diffusers/stable-diffusion-1.5") to get_model()
instead of the tuple of model_name and model_type.
4. Fixed bug in VAE model attaching code.
5. Rebuilt web front end.
# Invoke AI - Generative AI for Professional Creatives
## Professional Creative Tools for Stable Diffusion, Custom-Trained Models, and more.
To learn more about Invoke AI, get started instantly, or implement our Business solutions, visit [invoke.ai](https://invoke.ai)
[![discord badge]][discord link]
_**Note: The UI is not fully functional on `main`. If you need a stable UI based on `main`, use the `pre-nodes` tag while we [migrate to a new backend](https://github.com/invoke-ai/InvokeAI/discussions/3246).**_
InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products.
**Quick links**: [[How to Install](https://invoke-ai.github.io/InvokeAI/#installation)] [<a href="https://discord.gg/ZmtBAhwWhy">Discord Server</a>] [<a href="https://invoke-ai.github.io/InvokeAI/">Documentation and Tutorials</a>] [<a href="https://github.com/invoke-ai/InvokeAI/">Code and Downloads</a>] [<a href="https://github.com/invoke-ai/InvokeAI/issues">Bug Reports</a>] [<a href="https://github.com/invoke-ai/InvokeAI/discussions">Discussion, Ideas & Q&A</a>]
_Note: InvokeAI is rapidly evolving. Please use the
[Issues](https://github.com/invoke-ai/InvokeAI/issues) tab to report bugs and make feature
requests. Be sure to use the provided templates. They will help us diagnose issues faster._
(Replace `v3.0.0` with the current release number if this document is out of date).
The first command will install and upgrade new software to run InvokeAI. The second will prepare the 2.3 directory for use with 3.0. You may now launch the WebUI in the usual way, by selecting option [1] from the launcher script.
#### Migration Caveats
The migration script will migrate your invokeai settings and models,
including textual inversion models, LoRAs and merges that you may have
installed previously. However it does **not** migrate the generated
images stored in your 2.3-format outputs directory. You will need to
manually import selected images into the 3.0 gallery via drag-and-drop.
## Hardware Requirements
InvokeAI is supported across Linux, Windows and macOS. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver).
You will need one of the following:
- An NVIDIA-based graphics card with 4 GB or more VRAM memory. 6-8 GB of VRAM is highly recommended for rendering using the Stable Diffusion XL models.
- An Apple computer with an M1 chip.
- An AMD-based graphics card with 4 GB or more VRAM memory (Linux only), 6-8 GB for XL rendering.
We do not recommend the GTX 1650 or 1660 series video cards. They are
unable to run in half-precision mode and do not have sufficient VRAM
to render 512x512 images.
**Memory** - At least 12 GB Main Memory RAM.
**Disk** - At least 12 GB of free disk space for the machine learning model, Python, and all its dependencies.
## Features
InvokeAI offers a locally hosted Web Server & React Frontend, with an industry leading user experience.
The Unified Canvas is a fully integrated canvas implementation with support for all core generation capabilities, in/outpainting, brush tools, and more. This creative tool unlocks the capability for artists to create with AI as a creative collaborator, and can be used to augment AI-generated imagery, sketches, photography, renders, and more.
### *Advanced Prompt Syntax*
InvokeAI's advanced prompt syntax allows for token weighting, cross-attention control, and prompt blending, allowing for fine-tuned tweaking of your invocations and exploration of the latent space.
### *Node Architecture & Editor (Beta)*
InvokeAI's backend is built on a graph-based execution architecture. This allows for customizable generation pipelines to be developed by professional users looking to create specific workflows to support their production use-cases, and will be extended in the future with additional capabilities.
### *Command Line Interface*
For users utilizing a terminal-based environment, or who want to take advantage of CLI features, InvokeAI offers an extensive and actively supported command-line interface that provides the full suite of generation functionality available in the tool.
### *Board & Gallery Management*
Invoke AI provides an organized gallery system for easily storing, accessing, and remixing your content in the Invoke workspace. Images can be dragged/dropped onto any Image-base UI element in the application, and rich metadata within the Image allows for easy recall of key prompts or settings used in your workflow.
All commands are to be run from the `docker` directory: `cd docker`
#### Linux
1. Ensure buildkit is enabled in the Docker daemon settings (`/etc/docker/daemon.json`)
2. Install the `docker compose` plugin using your package manager, or follow a [tutorial](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-compose-on-ubuntu-22-04).
- The deprecated `docker-compose` (hyphenated) CLI continues to work for now.
3. Ensure docker daemon is able to access the GPU.
- You may need to install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
#### macOS
1. Ensure Docker has at least 16GB RAM
2. Enable VirtioFS for file sharing
3. Enable `docker compose` V2 support
This is done via Docker Desktop preferences
## Quickstart
1. Make a copy of `env.sample` and name it `.env` (`cp env.sample .env` (Mac/Linux) or `copy env.sample .env` (Windows)). Make changes as necessary. Set `INVOKEAI_ROOT` to an absolute path to:
a. the desired location of the InvokeAI runtime directory, or
b. an existing, v3.0.0 compatible runtime directory.
1. `docker compose up`
The image will be built automatically if needed.
The runtime directory (holding models and outputs) will be created in the location specified by `INVOKEAI_ROOT`. The default location is `~/invokeai`. The runtime directory will be populated with the base configs and models necessary to start generating.
### Use a GPU
- Linux is *recommended* for GPU support in Docker.
- WSL2 is *required* for Windows.
- Only `x86_64` architecture is supported.
The Docker daemon on the system must be already set up to use the GPU. In case of Linux, this involves installing `nvidia-docker-runtime` and configuring the `nvidia` runtime as default. Steps will be different for AMD. Please see Docker documentation for the most up-to-date instructions for using your GPU with Docker.
## Customize
Check the `.env.sample` file. It contains some environment variables for running in Docker. Copy it, name it `.env`, and fill it in with your own values. Next time you run `docker compose up`, your custom values will be used.
You can also set these values in `docker-compose.yml` directly, but `.env` will help avoid conflicts when code is updated.
Example (most values are optional):
```
INVOKEAI_ROOT=/Volumes/WorkDrive/invokeai
HUGGINGFACE_TOKEN=the_actual_token
CONTAINER_UID=1000
GPU_DRIVER=cuda
```
## Even Moar Customizing!
See the `docker-compose.yml` file. The `command` instruction can be uncommented and used to run arbitrary startup commands. Some examples are shown below.
### Reconfigure the runtime directory
This can be used to download additional models from the supported model list.
In conjunction with `INVOKEAI_ROOT`, it can also be used to initialize a runtime directory.
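As a hedged sketch, the same effect can be achieved with a one-off run from the shell; the `invokeai` service name here is an assumption, so check your compose file:
```bash
# Re-run the configuration script inside the container to (re)initialize
# the runtime directory pointed to by INVOKEAI_ROOT.
docker compose run --rm invokeai invokeai-configure
```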
Stable Diffusion distribution by InvokeAI: https://github.com/invoke-ai
The Docker image tracks the `main` branch of the InvokeAI project, which means it includes the latest features, but may contain some bugs.
Your working directory is mounted under the `/workspace` path inside the pod. The models are in `/workspace/invokeai/models`, and outputs are in `/workspace/invokeai/outputs`.
> **Only the /workspace directory will persist between pod restarts!**
> **If you _terminate_ (not just _stop_) the pod, the /workspace will be lost.**
## Quickstart
1. Launch a pod from this template. **It will take about 5-10 minutes to run through the initial setup**. Be patient.
1. Wait for the application to load.
- TIP: you know it's ready when the CPU usage goes idle
- You can also check the logs for a line that says "_Point your browser at..._"
1. Open the Invoke AI web UI: click the `Connect` => `connect over HTTP` button.
1. Generate some art!
## Other things you can do
At any point you may edit the pod configuration and set an arbitrary Docker command. For example, you could run a command to download some models using `curl`, or fetch some images and place them into your outputs to continue a working session.
If you need to run *multiple commands*, define them in the Docker Command field like this:
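A hedged sketch, chaining commands with `bash -c` (the URL, file name, and start command are placeholders):
```bash
# Download a model with curl, then hand off to the image's usual start command.
bash -c "curl -L -o /workspace/invokeai/models/some-model.safetensors https://example.com/some-model.safetensors && <your usual start command>"
```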
This image includes a couple of handy tools to help you get the data into the pod (such as your custom models or embeddings), and out of the pod (such as downloading your outputs). Here are your options for getting your data in and out of the pod:
- **SSH server**:
1. Make sure to create and set your Public Key in the RunPod settings (follow the official instructions)
1. Add an exposed port 22 (TCP) in the pod settings!
1. When your pod restarts, you will see a new entry in the `Connect` dialog. Use this SSH server to `scp` or `sftp` your files as necessary, or SSH into the pod using the fully fledged SSH server.
- **Magic Wormhole**:
1. On your computer, `pip install magic-wormhole` (see above instructions for details)
1. Connect to the command line **using the "light" SSH client** or the browser-based console. _Currently there's a bug where `wormhole` isn't available when connected to the "full" SSH server, as described above_.
1. `wormhole send /workspace/invokeai/outputs` will send the entire `outputs` directory. You can also send individual files.
1. Once packaged, you will see a `wormhole receive <123-some-words>` command. Copy it.
1. Paste this command into the terminal on your local machine to securely download the payload.
1. It works the same in reverse: you can `wormhole send` some models from your computer to the pod. Again, save your files somewhere in `/workspace` or they will be lost when the pod is stopped.
- **RunPod's Cloud Sync feature** may be used to sync the persistent volume to cloud storage. You could, for example, copy the entire `/workspace` to S3, add some custom models to it, and copy it back from S3 when launching new pod configurations. Follow the Cloud Sync instructions.
### Disable the NSFW checker
The NSFW checker is enabled by default. To disable it, edit the pod configuration and set the following command:
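The exact command depends on the image's entrypoint; as a hedged sketch using the CLI flags documented elsewhere in this guide (adjust host/port to match your pod template):
```bash
# Hypothetical override: start the web UI with the NSFW checker disabled.
invokeai --web --host 0.0.0.0 --no-nsfw_checker
```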
This release (along with the post1 and post2 follow-on releases) expands support for additional LoRA and LyCORIS models, upgrades diffusers versions, and fixes a few bugs.
### LoRA and LyCORIS Support Improvement
A number of LoRA/LyCORIS fine-tune files (those which alter the text encoder as well as the unet model) were not having the desired effect in InvokeAI. This bug has now been fixed. Full documentation of LoRA support is available at InvokeAI LoRA Support.
Previously, InvokeAI did not distinguish between LoRA/LyCORIS models based on Stable Diffusion v1.5 vs those based on v2.0 and 2.1, leading to a crash when an incompatible model was loaded. This has now been fixed. In addition, the web pulldown menus for LoRA and Textual Inversion selection have been enhanced to show only those files that are compatible with the currently-selected Stable Diffusion model.
Support for the newer LoKR LyCORIS files has been added.
### Library Updates and Speed/Reproducibility Advancements
The major enhancement in this version is that NVIDIA users no longer need to decide between speed and reproducibility. Previously, if you activated the Xformers library, you would see improvements in speed and memory usage, but multiple images generated with the same seed and other parameters would be slightly different from each other. This is no longer the case. Relative to 2.3.5 you will see improved performance when running without Xformers, and even better performance when Xformers is activated. In both cases, images generated with the same settings will be identical.
Here are the new library versions:
| Library   | Version |
|-----------|---------|
| Torch     | 2.0.0   |
| Diffusers | 0.16.1  |
| Xformers  | 0.0.19  |
| Compel    | 1.1.5   |
### Other Improvements
### Performance Improvements
When a model is loaded for the first time, InvokeAI calculates its checksum for incorporation into the PNG metadata. This process could take up to a minute on network-mounted disks and WSL mounts. This release noticeably speeds up the process.
### Bug Fixes
The "import models from directory" and "import from URL" functionality in the console-based model installer has now been fixed.
When running the WebUI, we have reduced the number of times that InvokeAI reaches out to HuggingFace to fetch the list of embeddable Textual Inversion models. We have also caught and fixed a problem with the updater not correctly detecting when another instance of the updater is running.
## v2.3.4 <small>(7 April 2023)</small>
### What's New in 2.3.4
This feature release adds support for LoRA (Low-Rank Adaptation) and LyCORIS (Lora beYond Conventional) models, as well as some minor bug fixes.
### LoRA and LyCORIS Support
LoRA files contain fine-tuning weights that enable particular styles, subjects or concepts to be applied to generated images. LyCORIS files are an extended variant of LoRA. InvokeAI supports the most common LoRA/LyCORIS format, which ends in the suffix `.safetensors`. You will find numerous LoRA and LyCORIS models for download at Civitai, and a small but growing number at Hugging Face. Full documentation of LoRA support is available at InvokeAI LoRA Support. (Pre-release note: this page will only be available after release.)
To use LoRA/LyCORIS models in InvokeAI:
Download the `.safetensors` files of your choice and place them in `/path/to/invokeai/loras`. This directory was not present in earlier versions of InvokeAI but will be created for you the first time you run the command-line or web client. You can also create the directory manually.
Add `withLora(lora-file,weight)` to your prompts. The weight is optional and will default to 1.0. A few examples, assuming that a LoRA file named `loras/sushi.safetensors` is present:
family sitting at dinner table eating sushi withLora(sushi,0.9)
family sitting at dinner table eating sushi withLora(sushi, 0.75)
family sitting at dinner table eating sushi withLora(sushi)
Multiple withLora() prompt fragments are allowed. The weight can be arbitrarily large, but the useful range is roughly 0.5 to 1.0. Higher weights make the LoRA's influence stronger. Negative weights are also allowed, which can lead to some interesting effects.
Generate as you usually would! If you find that the image is too "crisp" try reducing the overall CFG value or reducing individual LoRA weights. As is the case with all fine-tunes, you'll get the best results when running the LoRA on top of the model similar to, or identical with, the one that was used during the LoRA's training. Don't try to load a SD 1.x-trained LoRA into a SD 2.x model, and vice versa. This will trigger a non-fatal error message and generation will not proceed.
You can change the location of the loras directory by passing the `--lora_directory` option to `invokeai`.
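For example (the path is illustrative):
```bash
# Launch InvokeAI with a custom LoRA directory.
invokeai --lora_directory /path/to/my/loras
```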
### New WebUI LoRA and Textual Inversion Buttons
This version adds two new web interface buttons for inserting LoRA and Textual Inversion triggers into the prompt as shown in the screenshot below.
Clicking on one or the other of the buttons will bring up a menu of available LoRA/LyCORIS or Textual Inversion trigger terms. Select a menu item to insert the properly-formatted `withLora()` or `<textual-inversion>` prompt fragment into the positive prompt. The number in parentheses indicates the number of trigger terms currently in the prompt. You may click the button again and deselect the LoRA or trigger to remove it from the prompt, or simply edit the prompt directly.
Currently terms are inserted into the positive prompt textbox only. However, some textual inversion embeddings are designed to be used with negative prompts. To move a textual inversion trigger into the negative prompt, simply cut and paste it.
By default the Textual Inversion menu only shows locally installed models found at startup time in /path/to/invokeai/embeddings. However, InvokeAI has the ability to dynamically download and install additional Textual Inversion embeddings from the HuggingFace Concepts Library. You may choose to display the most popular of these (with five or more likes) in the Textual Inversion menu by going to Settings and turning on "Show Textual Inversions from HF Concepts Library." When this option is activated, the locally-installed TI embeddings will be shown first, followed by uninstalled terms from Hugging Face. See The Hugging Face Concepts Library and Importing Textual Inversion files for more information.
### Minor features and fixes
This release changes model switching behavior so that the command-line and Web UIs save the last model used and restore it the next time they are launched. It also improves the behavior of the installer so that the pip utility is kept up to date.
### Known Bugs in 2.3.4
These are known bugs in the release.
The Ancestral DPMSolverMultistepScheduler (k_dpmpp_2a) sampler is not yet implemented for diffusers models and will disappear from the WebUI Sampler menu when a diffusers model is selected.
Windows Defender will sometimes raise Trojan or backdoor alerts for the codeformer.pth face restoration model, as well as the CIDAS/clipseg and runwayml/stable-diffusion-v1.5 models. These are false positives and can be safely ignored. InvokeAI performs a malware scan on all models as they are loaded. For additional security, you should use safetensors models whenever they are available.
## v2.3.3 <small>(28 March 2023)</small>
This is a bugfix and minor feature release.
### Bugfixes
Since version 2.3.2 the following bugs have been fixed:
- When using legacy checkpoints with an external VAE, the VAE file is now scanned for malware prior to loading. Previously only the main model weights file was scanned.
- Textual inversion will select an appropriate batchsize based on whether xformers is active, and will default to xformers enabled if the library is detected.
- The batch script log file names have been fixed to be compatible with Windows.
- Occasional corruption of the .next_prefix file (which stores the next output file name in sequence) on Windows systems is now detected and corrected.
- Support loading of legacy config files that have no personalization (textual inversion) section.
- An infinite loop when opening the developer's console from within the invoke.sh script has been corrected.
- Documentation fixes, including a recipe for detecting and fixing problems with the AMD GPU ROCm driver.
### Enhancements
- It is now possible to load and run several community-contributed SD-2.0 based models, including the often-requested "Illuminati" model.
- The "NegativePrompts" embedding file, and others like it, can now be loaded by placing it in the InvokeAI embeddings directory.
- If no `--model` is specified at launch time, InvokeAI will remember the last model used and restore it the next time it is launched.
- On Linux systems, the invoke.sh launcher now uses a prettier console-based interface. To take advantage of it, install the `dialog` package using your package manager (e.g. `sudo apt install dialog`).
- When loading legacy models (safetensors/ckpt) you can specify a custom config file and/or a VAE by placing like-named files in the same directory as the model, following this example:
my-favorite-model.ckpt
my-favorite-model.yaml
my-favorite-model.vae.pt # or my-favorite-model.vae.safetensors
### Known Bugs in 2.3.3
These are known bugs in the release.
The Ancestral DPMSolverMultistepScheduler (k_dpmpp_2a) sampler is not yet implemented for diffusers models and will disappear from the WebUI Sampler menu when a diffusers model is selected.
Windows Defender will sometimes raise Trojan or backdoor alerts for the codeformer.pth face restoration model, as well as the CIDAS/clipseg and runwayml/stable-diffusion-v1.5 models. These are false positives and can be safely ignored. InvokeAI performs a malware scan on all models as they are loaded. For additional security, you should use safetensors models whenever they are available.
## v2.3.2 <small>(11 March 2023)</small>
This is a bugfix and minor feature release.
### Bugfixes
Since version 2.3.1 the following bugs have been fixed:
- Black images appearing for potential NSFW images when generating with legacy checkpoint models and both `--no-nsfw_checker` and `--ckpt_convert` turned on.
- Black images appearing when generating from models fine-tuned on Stable-Diffusion-2-1-base. When importing V2-derived models, you may be asked to select whether the model was derived from a "base" model (512 pixels) or the 768-pixel SD-2.1 model.
- The "Use All" button was not restoring the Hi-Res Fix setting on the WebUI.
- When using the model installer console app, models failed to import correctly when importing from directories with spaces in their names. A similar issue with the output directory was also fixed.
- Crashes that occurred during model merging.
- Restored previous naming of Stable Diffusion base and 768 models.
- Upgraded to latest versions of the diffusers, transformers, safetensors and accelerate libraries upstream. We hope that this will fix the `assertion NDArray > 2**32` issue that macOS users have had when generating images larger than 768x768 pixels. Please report back.
- As part of the upgrade to diffusers, the location of the diffusers-based models has changed from `models/diffusers` to `models/hub`. When you launch InvokeAI for the first time, it will prompt you to OK a one-time move. This should be quick and harmless, but if you have modified your `models/diffusers` directory in some way, for example using symlinks, you may wish to cancel the migration and make appropriate adjustments.
New "Invokeai-batch" script
### Invoke AI Batch
2.3.2 introduces a new command-line only script called invokeai-batch that can be used to generate hundreds of images from prompts and settings that vary systematically. This can be used to try the same prompt across multiple combinations of models, steps, CFG settings and so forth. It also allows you to template prompts and generate a combinatorial list like:
a shack in the mountains, photograph
a shack in the mountains, watercolor
a shack in the mountains, oil painting
a chalet in the mountains, photograph
a chalet in the mountains, watercolor
a chalet in the mountains, oil painting
a shack in the desert, photograph
...
If you have a system with multiple GPUs, or a single GPU with lots of VRAM, you can parallelize generation across the combinatorial set, reducing wait times and using your system's resources efficiently (make sure you have good GPU cooling).
To try invokeai-batch out, launch the "developer's console" using the invoke launcher script, or activate the invokeai virtual environment manually. From the console, give the command `invokeai-batch --help` in order to learn how the script works and create your first template file for dynamic prompt generation.
### Known Bugs in 2.3.2
These are known bugs in the release.
The Ancestral DPMSolverMultistepScheduler (k_dpmpp_2a) sampler is not yet implemented for diffusers models and will disappear from the WebUI Sampler menu when a diffusers model is selected.
Windows Defender will sometimes raise a Trojan alert for the codeformer.pth face restoration model. As far as we have been able to determine, this is a false positive and can be safely whitelisted.
## v2.3.1 <small>(22 February 2023)</small>
This is primarily a bugfix release, but it does provide several new features that will improve the user experience.
### Enhanced support for model management
InvokeAI now makes it convenient to add, remove and modify models. You can individually import models that are stored on your local system, scan an entire folder and its subfolders for models and import them automatically, and even directly import models from the internet by providing their download URLs. You also have the option of designating a local folder to scan for new models each time InvokeAI is restarted.
There are three ways of accessing the model management features:
1. **From the WebUI**: click on the cube to the right of the model selection menu. This will bring up a form that allows you to import models individually from your local disk or scan a directory for models to import.
2. **Using the Model Installer App**: choose option (5) _download and install models_ from the invoke launcher script to start a new console-based application for model management. You can use this to select from a curated set of starter models, or import checkpoint, safetensors, and diffusers models from a local disk or the internet. The example below shows importing two checkpoint URLs from popular SD sites and a HuggingFace diffusers model using its Repository ID. It also shows how to designate a folder to be scanned at startup time for new models to import. Command-line users can start this app using the command `invokeai-model-install`.
3. **Using the Command Line Client (CLI)**: the `!install_model` and `!convert_model` commands have been enhanced to allow entering of URLs and local directories to scan and import. The first command installs `.ckpt` and `.safetensors` files as-is. The second one converts them into the faster diffusers format before installation.
Internally InvokeAI is able to probe the contents of a `.ckpt` or `.safetensors` file to distinguish among v1.x, v2.x and inpainting models. This means that you do not need to include "inpaint" in your model names to use an inpainting model. Note that Stable Diffusion v2.x models will be autoconverted into a diffusers model the first time you use them.
Please see INSTALLING MODELS for more information on model management.
### An Improved Installer Experience
The installer now launches a console-based UI for setting and changing commonly-used startup options:
After selecting the desired options, the installer installs several support models needed by InvokeAI's face reconstruction and upscaling features and then launches the interface for selecting and installing models shown earlier. At any time, you can edit the startup options by launching invoke.sh/invoke.bat and entering option (6) _change InvokeAI startup options_.
Command-line users can launch the new configure app using `invokeai-configure`.
This release also comes with a renewed updater. To do an update without going through a whole reinstallation, launch invoke.sh or invoke.bat and choose option (9) _update InvokeAI_. This will bring you to a screen that prompts you to update to the latest released version, to the most current development version, or to any released or unreleased version you choose by selecting the tag or branch of the desired version.
Command-line users can run the updater by typing `invokeai-update`.
### Image Symmetry Options
There are now features to generate horizontal and vertical symmetry during generation. The way these work is to wait until a selected step in the generation process and then to turn on a mirror image effect. In addition to generating some cool images, you can also use this to make side-by-side comparisons of how an image will look with more or fewer steps. Access this option from the WebUI by selecting Symmetry from the image generation settings, or within the CLI by using the options `--h_symmetry_time_pct` and `--v_symmetry_time_pct` (these can be abbreviated to `--h_sym` and `--v_sym` like all other options).
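For example, a hedged sketch at the interactive CLI prompt (the prompt text and percentages are illustrative):
```bash
# Apply horizontal and vertical symmetry one third of the way through generation.
invoke> an ornate mandala --h_symmetry_time_pct 0.33 --v_symmetry_time_pct 0.33
```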
### A New Unified Canvas Look
This release introduces a beta version of the WebUI Unified Canvas. To try it out, open up the settings dialogue in the WebUI (gear icon) and select Use Canvas Beta Layout:
Refresh the screen and go to the Unified Canvas (left side of screen, third icon from the top). The new layout is designed to provide more space to work in and to keep the image controls close to the image itself:
### Model conversion and merging within the WebUI
The WebUI now has an intuitive interface for model merging, as well as for permanent conversion of models from legacy .ckpt/.safetensors formats into diffusers format. These options are also available directly from the invoke.sh/invoke.bat scripts.
### An easier way to contribute translations to the WebUI
We have migrated our translation efforts to Weblate, a FOSS translation product. Maintaining the growing project's translations is now far simpler for the maintainers and community. Please review our brief translation guide for more information on how to contribute.
This release also includes numerous internal bugfixes and fixes for performance issues.
### Bug Fixes
This release quashes multiple bugs that were reported in 2.3.0. Major internal changes include upgrading to diffusers 0.13.0, and using the compel library for prompt parsing. See the Detailed Change Log for a detailed list of bugs caught and squished.
Summary of InvokeAI command line scripts (all accessible via the launcher menu):

| Command | Description |
|---------|-------------|
| `invokeai` | Command line interface |
| `invokeai --web` | Web interface |
| `invokeai-model-install` | Model installer with console forms-based front end |
| `invokeai-ti --gui` | Textual inversion, with a console forms-based front end |
| `invokeai-merge --gui` | Model merging, with a console forms-based front end |
| `invokeai-configure` | Startup configuration; can also be used to reinstall support models |
| `invokeai-update` | InvokeAI software updater |
### Known Bugs in 2.3.1
These are known bugs in the release.
MacOS users generating 768x768 pixel images or greater using diffusers models may experience a hard crash with assertion NDArray > 2**32 This appears to be an issu...
## v2.3.0 <small>(15 January 2023)</small>
**Transition to diffusers**
The following sections describe what's new for InvokeAI.
Invoke AI originated as a project built by the community, and that vision carries forward today as we aim to build the best pro-grade tools available. We work together to incorporate the latest in AI/ML research, making these tools available in over 20 languages to artists and creatives around the world as part of our fully permissive OSS project designed for individual users to self-host and use.
## Contributing to Invoke AI
Anyone who wishes to contribute to InvokeAI, whether it be with features, bug fixes, code cleanup, testing, code reviews, documentation or translation, is very much encouraged to do so.
To join, just raise your hand on the InvokeAI Discord server (#dev-chat) or the GitHub discussion board.
### Areas of contribution:
#### Development
If you’d like to help with development, please see our [development guide](contribution_guides/development.md). If you’re unfamiliar with contributing to open source projects, there is a tutorial contained within the development guide.
#### Documentation
If you’d like to help with documentation, please see our [documentation guide](contribution_guides/documenation.md).
#### Translation
If you'd like to help with translation, please see our [translation guide](docs/contributing/.contribution_guides/translation.md).
#### Tutorials
Please reach out to @imic or @hipsterusername on [Discord](https://discord.gg/ZmtBAhwWhy) to help create tutorials for InvokeAI.
We hope you enjoy using our software as much as we enjoy creating it, and we hope that some of those of you who are reading this will elect to become part of our contributor community.
### Contributors
This project is a combined effort of dedicated people from across the world. [Check out the list of all these amazing people](https://invoke-ai.github.io/InvokeAI/other/CONTRIBUTORS/). We thank them for their time, hard work and effort.
### Code of Conduct
The InvokeAI community is a welcoming place, and we want your help in maintaining that. Please review our [Code of Conduct](https://github.com/invoke-ai/InvokeAI/blob/main/CODE_OF_CONDUCT.md) to learn more - it's essential to maintaining a respectful and inclusive environment.
By making a contribution to this project, you certify that:
1. The contribution was created in whole or in part by you and you have the right to submit it under the open-source license indicated in this project’s GitHub repository; or
2. The contribution is based upon previous work that, to the best of your knowledge, is covered under an appropriate open-source license and you have the right under that license to submit that work with modifications, whether created in whole or in part by you, under the same open-source license (unless you are permitted to submit under a different license); or
3. The contribution was provided directly to you by some other person who certified (1) or (2) and you have not modified it; or
4. You understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information you submit with it, including your sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open-source license(s) involved.
This disclaimer is not a license and does not grant any rights or permissions. You must obtain necessary permissions and licenses, including from third parties, before contributing to this project.
This disclaimer is provided "as is" without warranty of any kind, whether expressed or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, or non-infringement. In no event shall the authors or copyright holders be liable for any claim, damages, or other liability, whether in an action of contract, tort, or otherwise, arising from, out of, or in connection with the contribution or the use or other dealings in the contribution.
### Support
For support, please use this repository's [GitHub Issues](https://github.com/invoke-ai/InvokeAI/issues), or join the [Discord](https://discord.gg/ZmtBAhwWhy).
Original portions of the software are Copyright (c) 2023 by respective contributors.
---
Remember, your contributions help make this project great. We're excited to see what you'll bring to our community!
| Name | `image` | The variable that will hold our image |
| Type Hint | `Union[ImageField, None]` | The types for our field. Indicates that the image can either be an `ImageField` type or `None` |
| Field | `Field(description="The input image", default=None)` | The image variable is a field which needs a description and a default value that we set to `None`. |
Great. Now let us create our other inputs for `width` and `height`
| type_hints | `Dict[str, Literal["integer", "float", "boolean", "string", "enum", "image", "latents", "model", "control"]]` | `type_hint: "model"` provides type hints related to the model like displaying a list of available models |
| tags | `List[str]` | `tags: ['resize', 'image']` will classify your invocation under the tags of resize and image. |
| title | `str` | `title: 'Resize Image'` will rename your invocation to this custom title rather than infer it from the name of the Invocation class. |
So let us update your `ResizeInvocation` with some extra configuration and see
If you are looking to help with a code contribution, InvokeAI uses several different technologies under the hood: Python (Pydantic, FastAPI, diffusers) and Typescript (React, Redux Toolkit, ChakraUI, Mantine, Konva). Familiarity with StableDiffusion and image generation concepts is helpful, but not essential.
For more information, please review our area specific documentation:
If you don't feel ready to make a code contribution yet, no problem! You can also help out in other ways, such as [documentation](documentation.md) or [translation](translation.md).
There are two paths to making a development contribution:
1. Choosing an open issue to address. Open issues can be found in the [Issues](https://github.com/invoke-ai/InvokeAI/issues?q=is%3Aissue+is%3Aopen) section of the InvokeAI repository. These are tagged by the issue type (bug, enhancement, etc.) along with the “good first issues” tag denoting if they are suitable for first time contributors.
   - Additional items can be found on our [roadmap](https://github.com/orgs/invoke-ai/projects/7). The roadmap is organized in terms of priority, and contains features of varying size and complexity. If there is an inflight item you’d like to help with, reach out to the contributor assigned to the item to see how you can help.
2. Opening a new issue or feature to add. **Please make sure you have searched through existing issues before creating new ones.**
*Regardless of what you choose, please post in the [#dev-chat](https://discord.com/channels/1020123559063990373/1049495067846524939) channel of the Discord before you start development in order to confirm that the issue or feature is aligned with the current direction of the project. We value our contributors time and effort and want to ensure that no one’s time is being misspent.*
## Best Practices:
* Keep your pull requests small. Smaller pull requests are more likely to be accepted and merged.
* Comments! Commenting your code helps reviewers easily understand your contribution.
* Use Python and Typescript’s typing systems, and consider using an editor with [LSP](https://microsoft.github.io/language-server-protocol/) support to streamline development.
* Make all communications public. This ensures knowledge is shared with the whole community.
## **How do I make a contribution?**
Never made an open source contribution before? Wondering how contributions work in our project? Here's a quick rundown!
Before starting these steps, ensure you have your local environment [configured for development](../LOCAL_DEVELOPMENT.md).
1. Find a [good first issue](https://github.com/invoke-ai/InvokeAI/contribute) that you are interested in addressing or a feature that you would like to add. Then, reach out to our team in the [#dev-chat](https://discord.com/channels/1020123559063990373/1049495067846524939) channel of the Discord to ensure you are setup for success.
2. Fork the [InvokeAI](https://github.com/invoke-ai/InvokeAI) repository to your GitHub profile. This means that you will have a copy of the repository under **your-GitHub-username/InvokeAI**.
3. Clone the repository to your local machine using:
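For example, cloning the fork created in the previous step (replace the username placeholder with your own):
```bash
git clone https://github.com/your-GitHub-username/InvokeAI.git
cd InvokeAI
```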
If you're unfamiliar with using Git through the command line, [GitHub Desktop](https://desktop.github.com) is an easy-to-use alternative with a UI. You can do all the same steps listed here, but through the interface.
4. Create a new branch for your fix using:
```bash
git checkout -b branch-name-here
```
5. Make the appropriate changes for the issue you are trying to address or the feature that you want to add.
6. Add the file contents of the changed files to the "snapshot" git uses to manage the state of the project, also known as the index:
```bash
git add insert-paths-of-changed-files-here
```
7. Store the contents of the index with a descriptive message.
```bash
git commit -m "Insert a short message of the changes made here"
```
8. Push the changes to the remote repository using:
```bash
git push origin branch-name-here
```
9. Submit a pull request to the **main** branch of the InvokeAI repository.
10. Title the pull request with a short description of the changes made and the issue or bug number associated with your change. For example, you can title a pull request like so: "Added more log outputting to resolve #1234".
11. In the description of the pull request, explain the changes that you made, any issues you think exist with the pull request you made, and any questions you have for the maintainer. It's OK if your pull request is not perfect (no pull request is), the reviewer will be able to help you fix any problems and improve it!
12. Wait for the pull request to be reviewed by other collaborators.
13. Make changes to the pull request if the reviewer(s) recommend them.
14. Celebrate your success after your pull request is merged!
If you’d like to learn more about contributing to Open Source projects, here is a [Getting Started Guide](https://opensource.com/article/19/7/create-pull-request-github).
## **Where can I go for help?**
If you need help, you can ask questions in the [#dev-chat](https://discord.com/channels/1020123559063990373/1049495067846524939) channel of the Discord.
For frontend related work, **@psychedelicious** is the best person to reach out to.
For backend related work, please reach out to **@blessedcoolant**, **@lstein**, **@StAlKeR7779** or **@psychedelicious**.
## **What does the Code of Conduct mean for me?**
Our [Code of Conduct](CODE_OF_CONDUCT.md) means that you are responsible for treating everyone on the project with respect and courtesy regardless of their identity. If you are the victim of any inappropriate behavior or comments as described in our Code of Conduct, we are here for you and will do our best to ensure that the abuser is reprimanded appropriately, per our code.
The UI is a fairly straightforward Typescript React app, with the Unified Canvas being more complex.
Code is located in `invokeai/frontend/web/` for review.
## Stack
State management is Redux via [Redux Toolkit](https://github.com/reduxjs/redux-toolkit). We lean heavily on RTK:
- `createAsyncThunk` for HTTP requests
- `createEntityAdapter` for fetching images and models
- `createListenerMiddleware` for workflows
The API client and associated types are generated from the OpenAPI schema. See API_CLIENT.md.
Communication with the server is a mix of HTTP and [socket.io](https://github.com/socketio/socket.io-client) (with a simple socket.io redux middleware to help).
[Chakra-UI](https://github.com/chakra-ui/chakra-ui) & [Mantine](https://github.com/mantinedev/mantine) for components and styling.
[Konva](https://github.com/konvajs/react-konva) for the canvas, but we are pushing the limits of what is feasible with it (and HTML canvas in general). We plan to rebuild it with [PixiJS](https://github.com/pixijs/pixijs) to take advantage of WebGL's improved raster handling.
[Vite](https://vitejs.dev/) for bundling.
Localisation is via [i18next](https://github.com/i18next/react-i18next), but translation happens on our [Weblate](https://hosted.weblate.org/engage/invokeai/) project. Only the English source strings should be changed on this repo.
## Contributing
Thanks for your interest in contributing to the InvokeAI Web UI!
We encourage you to ping @psychedelicious and @blessedcoolant on [Discord](https://discord.gg/ZmtBAhwWhy) if you want to contribute, just to touch base and ensure your work doesn't conflict with anything else going on. The project is very active.
### Dev Environment
**Setup**
1. Install [node](https://nodejs.org/en/download/). You can confirm node is installed with:
```bash
node --version
```
2. Install [yarn classic](https://classic.yarnpkg.com/lang/en/) and confirm it is installed by running this:
```bash
npm install --global yarn
yarn --version
```
From `invokeai/frontend/web/` run `yarn install` to get everything set up.
Start everything in dev mode:
1. Ensure your virtual environment is running
2. Start the dev server: `yarn dev`
3. Start the InvokeAI Nodes backend: `python scripts/invokeai-web.py # run from the repo root`
4. Point your browser to the dev server address, e.g. [http://localhost:5173/](http://localhost:5173/)
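Putting the steps together, a typical dev session might look like this sketch (two terminals):
```bash
# Terminal 1: frontend dev server
cd invokeai/frontend/web
yarn install
yarn dev
# Terminal 2: backend, run from the repo root inside your InvokeAI virtual environment
python scripts/invokeai-web.py
```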
### VSCode Remote Dev
We've noticed an intermittent issue with the VSCode Remote Dev port forwarding. If you use this feature of VSCode, you may intermittently click the Invoke button and then get nothing until the request times out. We suggest disabling the IDE's port forwarding feature and doing it manually via SSH:
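A hedged sketch of manual forwarding (the hostname is illustrative; 9090 is the default API port and 5173 the Vite dev server):
```bash
# Forward the InvokeAI API (9090) and the Vite dev server (5173) over SSH.
ssh -L 9090:localhost:9090 -L 5173:localhost:5173 user@remote-host
```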
Documentation is an important part of any open source project. It provides a clear and concise way to communicate how the software works, how to use it, and how to troubleshoot issues. Without proper documentation, it can be difficult for users to understand the purpose and functionality of the project.
## Contributing
All documentation is maintained in the InvokeAI GitHub repository. If you come across documentation that is out of date or incorrect, please submit a pull request with the necessary changes.
When updating or creating documentation, please keep in mind InvokeAI is a tool for everyone, not just those who have familiarity with generative art.
## Help & Questions
Please ping @imic1 or @hipsterusername in the [Discord](https://discord.com/channels/1020123559063990373/1049495067846524939) if you have any questions.
InvokeAI uses [Weblate](https://weblate.org/) for translation. Weblate is a FOSS project providing a scalable translation service. Weblate automates the tedious parts of managing translation of a growing project, and the service is generously provided at no cost to FOSS projects like InvokeAI.
## Contributing
If you'd like to contribute by adding or updating a translation, please visit our [Weblate project](https://hosted.weblate.org/engage/invokeai/). You'll need to sign in with your GitHub account (a number of other accounts are supported, including Google).
Once signed in, select a language and then the Web UI component. From here you can Browse and Translate strings from English to your chosen language. Zen mode offers a simpler translation experience.
Your changes will be attributed to you in the automated PR process; you don't need to do anything else.
## Help & Questions
Please check Weblate's [documentation](https://docs.weblate.org/en/latest/index.html) or ping @Harvestor on [Discord](https://discord.com/channels/1020123559063990373/1049495067846524939) if you have any questions.
## Thanks
Thanks to the InvokeAI community for their efforts to translate the project!
Tutorials help new & existing users expand their ability to use InvokeAI to the full extent of our features and services.
Currently, we have a set of tutorials available on our [YouTube channel](https://www.youtube.com/@invokeai), but as InvokeAI continues to evolve with new updates, we want to ensure that we are giving our users the resources they need to succeed.
Tutorials can be in the form of videos or article walkthroughs on a subject of your choice. We recommend focusing tutorials on the key image generation methods, or on a specific component within one of the image generation methods.
## Contributing
Please reach out to @imic or @hipsterusername on [Discord](https://discord.gg/ZmtBAhwWhy) to help create tutorials for InvokeAI.
| `--seamless_axes` | | `x,y` | Specify which axes to use circular convolution on. |
| `--log_tokenization` | `-t` | `False` | Display a color-coded list of the parsed tokens derived from the prompt |
| `--skip_normalization` | `-x` | `False` | Weighted subprompts will not be normalized. See [Weighted Prompts](../features/OTHER.md#weighted-prompts) |
| `--upscale <int> <float>` | `-U <int> <float>` | `-U 1 0.75` | Upscale image by magnification factor (2, 4), and set strength of upscaling (0.0-1.0). If strength not set, will default to 0.75. |
| `--facetool_strength <float>` | `-G <float> ` | `-G0` | Fix faces (defaults to using the GFPGAN algorithm); argument indicates how hard the algorithm should try (0.0-1.0) |
| `--facetool <name>` | `-ft <name>` | `-ft gfpgan` | Select face restoration algorithm to use: gfpgan, codeformer |
| `--codeformer_fidelity` | `-cf <float>` | `0.75` | Used along with CodeFormer. Takes values between 0 and 1. 0 produces high quality but low accuracy. 1 produces high accuracy but low quality |
| `--save_original` | `-save_orig` | `False` | When upscaling or fixing faces, this will cause the original image to be saved rather than replaced. |
| `--variation <float>` | `-v<float>` | `0.0` | Add a bit of noise (0.0=none, 1.0=high) to the image in order to generate a series of variations. Usually used in combination with `-S<seed>` and `-n<int>` to generate a series a riffs on a starting image. See [Variations](../features/VARIATIONS.md). |
| `--with_variations <pattern>` | | `None` | Combine two or more variations. See [Variations](../features/VARIATIONS.md) for now to use this. |
| `--save_intermediates <n>` | | `None` | Save the image from every nth step into an "intermediates" folder inside the output directory |
| `--h_symmetry_time_pct <float>` | | `None` | Create symmetry along the X axis at the desired percent complete of the generation process. (Must be between 0.0 and 1.0; set to a very small number like 0.0001 for just after the first step of generation.) |
| `--v_symmetry_time_pct <float>` | | `None` | Create symmetry along the Y axis at the desired percent complete of the generation process. (Must be between 0.0 and 1.0; set to a very small number like 0.0001 for just after the first step of generation.) |
by `-M`. You may also supply just a single initial image with the areas
to overpaint made transparent, but you must be careful not to destroy
the pixels underneath when you create the transparent areas. See
[Inpainting](INPAINTING.md) for details.
inpainting accepts all the arguments used for txt2img and img2img, as well as
the `--mask` (`-M`) and `--text_mask` (`-tm`) arguments:
invoke> a piece of cake -I /path/to/breakfast.png -tm bagel 0.6
You can load and use hundreds of community-contributed Textual
Inversion models just by typing the appropriate trigger phrase. Please
see [Concepts Library](../features/CONCEPTS.md) for more details.
# :material-library-shelves: Textual Inversions and LoRAs
With the advances in research, many new capabilities are available to customize the knowledge and understanding of novel concepts not originally contained in the base model.
## Using Textual Inversion Files
and artistic styles. They are also known as "embeds" in the machine learning
world.
Each TI file introduces one or more vocabulary terms to the SD model. These are
known in InvokeAI as "triggers." Triggers are denoted using angle brackets
as in "<trigger-phrase>". The two most common types of
TI files that you'll encounter are `.pt` and `.bin` files, which are produced by
different TI training packages. InvokeAI supports both formats, but its
[built-in TI training system](TRAINING.md) produces `.pt`.
The [Hugging Face company](https://huggingface.co/sd-concepts-library) has
amassed a large library of >800 community-contributed TI files covering a
broad range of subjects and styles. You can also install your own or others' TI files
by placing them in the designated directory for the compatible model type.
### An Example
You can also combine styles and concepts:
The configuration settings are divided into several distinct
groups in `invokeai.yaml`:
### Web Server
| Setting | Default Value | Description |
|----------|----------------|--------------|
| `host` | `localhost` | Name or IP address of the network interface that the web server will listen on |
| `port` | `9090` | Network port number that the web server will listen on |
| `allow_origins` | `[]` | A list of host names or IP addresses that are allowed to connect to the InvokeAI API in the format `['host1','host2',...]` |
| `allow_credentials` | `true` | Require credentials for a foreign host to access the InvokeAI API (don't change this) |
| `allow_methods` | `*` | List of HTTP methods ("GET", "POST") that the web server is allowed to use when accessing the API |
| `allow_headers` | `*` | List of HTTP headers that the web server will accept when accessing the API |
The documentation for InvokeAI's API can be accessed by browsing to the following URL: [http://localhost:9090/docs](http://localhost:9090/docs).
### Features
These configuration settings allow you to enable and disable various InvokeAI features:
| Setting | Default Value | Description |
|----------|----------------|--------------|
| `esrgan` | `true` | Activate the ESRGAN upscaling options|
| `internet_available` | `true` | When a resource is not available locally, try to fetch it via the internet |
| `log_tokenization` | `false` | Before each text2image generation, print a color-coded representation of the prompt to the console; this can help understand why a prompt is not working as expected |
| `patchmatch` | `true` | Activate the "patchmatch" algorithm for improved inpainting |
| `restore` | `true` | Activate the facial restoration features (DEPRECATED; restoration features will be removed in 3.0.0) |
### Memory/Performance
These options tune InvokeAI's memory and performance characteristics.
| Setting | Default Value | Description |
|----------|----------------|--------------|
| `always_use_cpu` | `false` | Use the CPU to generate images, even if a GPU is available |
| `free_gpu_mem` | `false` | Aggressively free up GPU memory after each operation; this will allow you to run in low-VRAM environments with some performance penalties |
| `max_cache_size` | `6` | Amount of CPU RAM (in GB) to reserve for caching models in memory; more cache allows you to keep models in memory and switch among them quickly |
| `max_vram_cache_size` | `2.75` | Amount of GPU VRAM (in GB) to reserve for caching models in VRAM; more cache speeds up generation but reduces the size of the images that can be generated. This can be set to zero to maximize the amount of memory available for generation. |
| `precision` | `auto` | Floating point precision. One of `auto`, `float16` or `float32`. `float16` will consume half the memory of `float32` but produce slightly lower-quality images. The `auto` setting will guess the proper precision based on your video card and operating system |
| `sequential_guidance` | `false` | Calculate guidance in serial rather than in parallel, lowering memory requirements at the cost of some performance loss |
| `xformers_enabled` | `true` | If the x-formers memory-efficient attention module is installed, activate it for better memory usage and generation speed|
| `tiled_decode` | `false` | If true, then during the VAE decoding phase the image will be decoded a section at a time, reducing memory consumption at the cost of a performance hit |
### Paths
These options set the paths of various directories and files used by
InvokeAI. Relative paths are interpreted relative to INVOKEAI_ROOT, so
if INVOKEAI_ROOT is `/home/fred/invokeai` and the path is
`autoimport/main`, then the corresponding directory will be located at
`/home/fred/invokeai/autoimport/main`.
| Setting | Default Value | Description |
|----------|----------------|--------------|
| `autoimport_dir` | `autoimport/main` | At startup time, read and import any main model files found in this directory |
| `lora_dir` | `autoimport/lora` | At startup time, read and import any LoRA/LyCORIS models found in this directory |
| `embedding_dir` | `autoimport/embedding` | At startup time, read and import any textual inversion (embedding) models found in this directory |
| `controlnet_dir` | `autoimport/controlnet` | At startup time, read and import any ControlNet models found in this directory |
| `conf_path` | `configs/models.yaml` | Location of the `models.yaml` model configuration file |
| `models_dir` | `models` | Location of the directory containing models installed by InvokeAI's model manager |
| `legacy_conf_dir` | `configs/stable-diffusion` | Location of the directory containing the .yaml configuration files for legacy checkpoint models |
| `db_dir` | `databases` | Location of the directory containing InvokeAI's image, schema and session database |
| `outdir` | `outputs` | Location of the directory in which the gallery of generated and uploaded images will be stored |
| `use_memory_db` | `false` | Keep database information in memory rather than on disk; this will not preserve image gallery information across restarts |
Note that the autoimport directories will be searched recursively,
allowing you to organize the models into folders and subfolders in any
way you wish. In addition, while we have split up autoimport
directories by the type of model they contain, this isn't
necessary. You can combine different model types in the same folder
and InvokeAI will figure out what they are. So you can easily use just
one autoimport directory by commenting out the unneeded paths:
```
Paths:
autoimport_dir: autoimport
# lora_dir: null
# embedding_dir: null
# controlnet_dir: null
```
### Logging
These settings control the information, warning, and debugging
messages printed to the console log while InvokeAI is running:
| Setting | Default Value | Description |
|----------|----------------|--------------|
| `log_handlers` | `console` | This controls where log messages are sent, and can be a list of one or more destinations. Values include `console`, `file`, `syslog` and `http`. These are described in more detail below |
| `log_format` | `color` | This controls the formatting of the log messages. Values are `plain`, `color`, `legacy` and `syslog` |
| `log_level` | `debug` | This filters messages according to the level of severity and can be one of `debug`, `info`, `warning`, `error` and `critical`. For example, setting to `warning` will display all messages at the warning level or higher, but won't display "debug" or "info" messages |
Several different log handler destinations are available, and multiple destinations are supported by providing a list:
```
log_handlers:
- console
- syslog=localhost
- file=/var/log/invokeai.log
```
* `console` is the default. It prints log messages to the command-line window from which InvokeAI was launched.
* `syslog` is only available on Linux and Macintosh systems. It uses
the operating system's "syslog" facility to write log file entries
locally or to a remote logging machine. `syslog` offers a variety
of configuration options:
```
syslog=/dev/log        - log to the /dev/log device
syslog=localhost       - log to the network logger running on the local machine
syslog=localhost:512   - same as above, but using a non-standard port
```
Command-line users can launch the model installer using the command
`invokeai-model-install`.
_Be aware that some ControlNet models require additional code
functionality in order to work properly, so just installing a
third-party ControlNet model may not have the desired effect._ Please
read and follow the documentation for installing a third party model
not currently included among InvokeAI's default list.
The models currently supported include:
**Canny**:
When the Canny model is used in ControlNet, Invoke will attempt to generate images that match the edges detected.
Canny edge detection works by detecting the edges in an image by looking for abrupt changes in intensity. It is known for its ability to detect edges accurately while reducing noise and false edges, and the preprocessor can identify more information by decreasing the thresholds.
**M-LSD**:
M-LSD is another edge detection algorithm used in ControlNet. It stands for Multi-Scale Line Segment Detector.
It detects straight line segments in an image by analyzing the local structure of the image at multiple scales. It can be useful for architectural imagery, or anything where straight-line structural information is needed for the resulting output.
**Lineart**:
The Lineart model in ControlNet generates line drawings from an input image. The resulting pre-processed image is a simplified version of the original, with only the outlines of objects visible. The Lineart model in ControlNet is known for its ability to accurately capture the contours of the objects in an input sketch.
**Lineart Anime**:
A variant of the Lineart model that generates line drawings with a distinct style inspired by anime and manga art styles.
**Depth**:
A model that generates depth maps of images, allowing you to create more realistic 3D models or to simulate depth effects in post-processing.
**Normal Map (BAE):**
A model that generates normal maps from input images, allowing for more realistic lighting effects in 3D rendering.
**Image Segmentation**:
A model that divides input images into segments or regions, each of which corresponds to a different object or part of the image. (More details coming soon)
**Openpose**:
The OpenPose control model allows for the identification of the general pose of a character by pre-processing an existing image with a clear human structure. With advanced options, Openpose can also detect the face or hands in the image.
**Mediapipe Face**:
The MediaPipe Face identification processor is able to clearly identify facial features in order to capture vivid expressions of human faces.
**Tile (experimental)**:
The Tile model fills out details in the image to match the image, rather than the prompt. The Tile Model is a versatile tool that offers a range of functionalities. Its primary capabilities can be boiled down to two main behaviors:
- It can reinterpret specific details within an image and create fresh, new elements.
- It has the ability to disregard global instructions if there's a discrepancy between them and the local context or specific parts of the image. In such cases, it uses the local context to guide the process.
The Tile Model can be a powerful tool in your arsenal for enhancing image quality and details. If there are undesirable elements in your images, such as blurriness caused by resizing, this model can effectively eliminate these issues, resulting in cleaner, crisper images. Moreover, it can generate and add refined details to your images, improving their overall quality and appeal.
**Pix2Pix (experimental)**
With Pix2Pix, you can input an image into the controlnet, and then "instruct" the model to change it using your prompt. For example, you can say "Make it winter" to add more wintry elements to a scene.
**Inpaint**: Coming Soon - Currently this model is available but not functional on the Canvas. An upcoming release will provide additional capabilities for using this model when inpainting.
Each of these models can be adjusted and combined with other ControlNet models to achieve different results, giving you even more control over your image generation process.
## Using ControlNet
To use ControlNet, you can simply select the desired model and adjust both the ControlNet and Pre-processor settings to achieve the desired result. You can also use multiple ControlNet models at the same time, allowing you to achieve even more complex effects or styles in your generated images.
Each ControlNet has two settings that are applied to the generation:
- **Weight** - Strength of the ControlNet model applied to the generation, within the section defined by Start/End.
- **Start/End** - 0 represents the start of the generation, 1 represents the end. The Start/End setting controls which steps during the generation process have the ControlNet applied (see the sketch below).
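As a rough illustration of how the Start/End fractions map onto sampler steps (the exact rounding inside InvokeAI may differ), consider this hypothetical helper:

```python
import math

def controlnet_steps(start: float, end: float, total_steps: int) -> range:
    """Hypothetical helper: the sampler steps a ControlNet with the given
    Start/End fractions would be applied to (0.0 = first step, 1.0 = last)."""
    return range(math.floor(start * total_steps), math.ceil(end * total_steps))

# With 30 steps, Start=0.0 / End=0.5 applies the ControlNet on steps 0-14 only.
print(list(controlnet_steps(0.0, 0.5, 30)))
```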
Additionally, each ControlNet section can be expanded in order to manipulate settings for the image pre-processor that adjusts your uploaded image before it is used when you Invoke.
*The node editor is experimental. We've made it accessible because we use it to develop the application, but we have not addressed the many known rough edges. It's very easy to shoot yourself in the foot, and we cannot offer support for it until it sees full release (ETA v3.1). Everything is subject to change without warning.*
The nodes editor is a blank canvas allowing for the use of individual functions and image transformations to control the image generation workflow. The node processing flow is usually done from left (inputs) to right (outputs), though linearity can become abstracted the more complex the node graph becomes. Node inputs and outputs are connected by dragging connectors from node to node.
To better understand how nodes are used, think of how an electric power bar works. It takes in one input (electricity from a wall outlet) and passes it to multiple devices through multiple outputs. Similarly, a node could have multiple inputs and outputs functioning at the same (or different) time, but all node outputs pass information onward like a power bar passes electricity. Not all outputs are compatible with all inputs, however - Each node has different constraints on how it is expecting to input/output information. In general, node outputs are colour-coded to match compatible inputs of other nodes.
## Anatomy of a Node
Individual nodes are made up of the following:
- Inputs: Edge points on the left side of the node window where you connect outputs from other nodes.
- Outputs: Edge points on the right side of the node window where you connect to inputs on other nodes.
- Options: Various options which are either manually configured, or overridden by connecting an output from another node to the input.
## Diffusion Overview
Taking the time to understand the diffusion process will help you to understand how to set up your nodes in the nodes editor.
There are two main spaces Stable Diffusion works in: image space and latent space.
Image space represents images in pixel form that you look at. Latent space represents compressed inputs. It’s in latent space that Stable Diffusion processes images. A VAE (Variational Auto Encoder) is responsible for compressing and encoding inputs into latent space, as well as decoding outputs back into image space.
When you generate an image using text-to-image, multiple steps occur in latent space:
1. Random noise is generated at the chosen height and width. The noise’s characteristics are dictated by the chosen (or not chosen) seed. This noise tensor is passed into latent space. We’ll call this noise A.
1. Using a model’s U-Net, a noise predictor examines noise A, and the words tokenized by CLIP from your prompt (conditioning). It generates its own noise tensor to predict what the final image might look like in latent space. We’ll call this noise B.
1. Noise B is subtracted from noise A in an attempt to create a final latent image indicative of the inputs. This step is repeated for the number of sampler steps chosen.
1. The VAE decodes the final latent image from latent space into image space.
image-to-image is a similar process, with only step 1 being different:
1. The input image is encoded from image space into latent space by the VAE. Noise is then added to the input latent image. Denoising Strength dictates how much noise is added, 0 being none, and 1 being all-encompassing. We’ll call this noise A. The process is then the same as steps 2-4 in the text-to-image explanation above.
Furthermore, a model provides the CLIP prompt tokenizer, the VAE, and a U-Net (where noise prediction occurs given a prompt and initial noise tensor).
A noise scheduler (eg. DPM++ 2M Karras) schedules the subtraction of noise from the latent image across the sampler steps chosen (step 3 above). Less noise is usually subtracted at higher sampler steps.
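The loop below is a minimal sketch of steps 1-4 above, not InvokeAI's actual implementation: `unet`, `scheduler`, and `vae` are assumed stand-ins whose interfaces mirror the roles described in this overview.

```python
import torch

def text_to_image_sketch(unet, scheduler, vae, conditioning,
                         height=512, width=512, steps=30, seed=1234):
    """Minimal sketch of the text-to-image process described above.

    Assumes: unet(latents, t, conditioning) returns the predicted noise ("noise B"),
    scheduler.step(noise_pred, t, latents) subtracts a scheduled portion of it,
    and vae.decode(latents) maps latents back into image space.
    """
    generator = torch.Generator().manual_seed(seed)          # the seed dictates noise A
    latents = torch.randn((1, 4, height // 8, width // 8),   # noise A, in latent space
                          generator=generator)

    for t in scheduler.timesteps[:steps]:
        noise_pred = unet(latents, t, conditioning)          # noise B
        latents = scheduler.step(noise_pred, t, latents)     # subtract scheduled noise

    return vae.decode(latents)                               # decode back into image space
```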
| Node | Function |
| --- | --- |
| CannyImageProcessor | Canny edge detection for ControlNet |
| ClipSkip | Skip layers in clip text_encoder model |
| Collect | Collects values into a collection |
| Prompt (Compel) | Parses a prompt using the compel package into conditioning |
| ContentShuffleImageProcessor | Applies content shuffle processing to image |
| ControlNet | Collects ControlNet info to pass to other nodes |
| CvInpaint | Simple inpaint using opencv |
| Divide | Divides two numbers |
| DynamicPrompt | Parses a prompt using adieyal/dynamicprompts' random or combinatorial generator |
| FloatLinearRange | Creates a range |
| HedImageProcessor | Applies HED edge detection to image |
| ImageBlur | Blurs an image |
| ImageChannel | Gets a channel from an image |
| ImageCollection | Load a collection of images and provide it as output |
| ImageConvert | Converts an image to a different mode |
| ImageCrop | Crops an image to a specified box. The box can be outside of the image. |
| ImageInverseLerp | Inverse linear interpolation of all pixels of an image |
| ImageLerp | Linear interpolation of all pixels of an image |
| ImageMultiply | Multiplies two images together using `PIL.ImageChops.Multiply()` |
| ImageNSFWBlurInvocation | Detects and blurs images that may contain sexually explicit content |
| ImagePaste | Pastes an image into another image |
| ImageProcessor | Base class for invocations that preprocess images for ControlNet |
| ImageResize | Resizes an image to specific dimensions |
| ImageScale | Scales an image by a factor |
| ImageToLatents | Encodes an image into latents |
| ImageWatermarkInvocation | Adds an invisible watermark to images |
| InfillColor | Infills transparent areas of an image with a solid color |
| InfillPatchMatch | Infills transparent areas of an image using the PatchMatch algorithm |
| InfillTile | Infills transparent areas of an image with tiles of the image |
| Inpaint | Generates an image using inpaint |
| Iterate | Iterates over a list of items |
| LatentsToImage | Generates an image from latents |
| LatentsToLatents | Generates latents using latents as base image |
| LeresImageProcessor | Applies leres processing to image |
| LineartAnimeImageProcessor | Applies line art anime processing to image |
| LineartImageProcessor | Applies line art processing to image |
| LoadImage | Load an image and provide it as output |
| Lora Loader | Apply selected lora to unet and text_encoder |
| Model Loader | Loads a main model, outputting its submodels |
| MaskFromAlpha | Extracts the alpha channel of an image as a mask |
| MediapipeFaceProcessor | Applies mediapipe face processing to image |
| MidasDepthImageProcessor | Applies Midas depth processing to image |
| MlsdImageProcessor | Applies MLSD processing to image |
| Multiply | Multiplies two numbers |
| Noise | Generates latent noise |
| NormalbaeImageProcessor | Applies NormalBAE processing to image |
| OpenposeImageProcessor | Applies Openpose processing to image |
| ParamFloat | A float parameter |
| ParamInt | An integer parameter |
| PidiImageProcessor | Applies PIDI processing to an image |
| Progress Image | Displays the progress image in the Node Editor |
| RandomInt | Outputs a single random integer |
| RandomRange | Creates a collection of random numbers |
| Range | Creates a range of numbers from start to stop with step |
| RangeOfSize | Creates a range from start to start + size with step |
| ResizeLatents | Resizes latents to explicit width/height (in pixels). Provided dimensions are floor-divided by 8. |
| RestoreFace | Restores faces in the image |
| ScaleLatents | Scales latents by a given factor |
| SegmentAnythingProcessor | Applies segment anything processing to image |
| ShowImage | Displays a provided image, and passes it forward in the pipeline |
| StepParamEasing | Experimental per-step parameter for easing for denoising steps |
| Subtract | Subtracts two numbers |
| TextToLatents | Generates latents from conditionings |
| TileResampleProcessor | Base class for invocations that preprocess images for ControlNet |
| Upscale | Upscales an image |
| VAE Loader | Loads a VAE model, outputting a VaeLoaderOutput |
| ZoeDepthImageProcessor | Applies Zoe depth processing to image |
## Node Grouping Concepts
There are several node grouping concepts that can be examined with a narrow focus. These (and other) groupings can be pieced together to make up functional graph setups, and are important to understanding how groups of nodes work together as part of a whole. Note that the screenshots below aren't examples of complete functioning node graphs (see Examples).
### Noise
As described, an initial noise tensor is necessary for the latent diffusion process. As a result, all non-image *ToLatents nodes require a noise node input.

### Conditioning
As described, conditioning is necessary for the latent diffusion process, whether empty or not. As a result, all non-image *ToLatents nodes require positive and negative conditioning inputs. Conditioning is reliant on a CLIP tokenizer provided by the Model Loader node.
The ImageToLatents node doesn't require a noise node input, but requires a VAE input to convert the image from image space into latent space. In reverse, the LatentsToImage node requires a VAE input to convert from latent space back into image space.

### Defined & Random Seeds
It is common to want to use both the same seed (for continuity) and random seeds (for variance). To define a seed, simply enter it into the 'Seed' field on a noise node. Conversely, the RandomInt node generates a random integer between 'Low' and 'High', and can be used as input to the 'Seed' edge point on a noise node to randomize your seed.
### ControlNet
Control means to guide the diffusion process to adhere to a defined input or structure. Control can be provided as input to non-image *ToLatents nodes from ControlNet nodes. ControlNet nodes usually require an image processor which converts an input image for use with ControlNet.
### LoRA
The Lora Loader node lets you load a LoRA (say that ten times fast) and pass it as output to both the Prompt (Compel) and non-image *ToLatents nodes. A model's CLIP tokenizer is passed through the LoRA into Prompt (Compel), where it affects conditioning. A model's U-Net is also passed through the LoRA into a non-image *ToLatents node, where it affects noise prediction.

### Scaling
Use the ImageScale, ScaleLatents, and Upscale nodes to upscale images and/or latent images. The chosen method differs across contexts. However, be aware that latents are already noisy and compressed at their original resolution; scaling an image could produce more detailed results.
### Iteration
Iteration is a common concept in any processing, and means to repeat a process with given input. In nodes, you're able to use the Iterate node to iterate through collections usually gathered by the Collect node. The Iterate node has many potential uses, from processing a collection of images one after another, to varying seeds across multiple image generations and more. This screenshot demonstrates how to collect several images and pass them out one at a time.
### Multiple Image Generation
Multiple image generation in the node editor is done using the RandomRange node. In this case, the 'Size' field represents the number of images to generate. As RandomRange produces a collection of integers, we need to add the Iterate node to iterate through the collection.
To control seeds across generations takes some care. The first row in the screenshot will generate multiple images with different seeds, but using the same RandomRange parameters across invocations will result in the same group of random seeds being used across the images, producing repeatable results. In the second row, adding the RandomInt node as input to RandomRange's 'Seed' edge point will ensure that seeds are varied across all images across invocations, producing varied results.
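A plain-Python analogy (not the node implementation) of why the second row behaves differently: a seeded generator with fixed parameters always yields the same collection, while feeding in a fresh random seed varies it.

```python
import random

def random_range(low: int, high: int, size: int, seed: int) -> list[int]:
    """Rough stand-in for the RandomRange node: a seeded collection of integers."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(size)]

# Same parameters (including the seed) -> identical collection on every invocation.
print(random_range(0, 2**31, 3, seed=1234))
print(random_range(0, 2**31, 3, seed=1234))

# Wiring a fresh RandomInt into the 'Seed' input varies the collection each time.
print(random_range(0, 2**31, 3, seed=random.randint(0, 2**31)))
```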
## Examples
With our knowledge of node grouping and the diffusion process, let’s break down some basic graphs in the nodes editor. Note that a node's options can be overridden by inputs from other nodes. These examples aren't strict rules to follow and only demonstrate some basic configurations.
### Basic text-to-image Node Graph

- Model Loader: A necessity to generating images (as we’ve read above). We choose our model from the dropdown. It outputs a U-Net, CLIP tokenizer, and VAE.
- Prompt (Compel): Another necessity. Two prompt nodes are created. One will output positive conditioning (what you want, ‘dog’), one will output negative (what you don’t want, ‘cat’). They both input the CLIP tokenizer that the Model Loader node outputs.
- Noise: Consider this noise A from step one of the text-to-image explanation above. Choose a seed number, width, and height.
- TextToLatents: This node takes many inputs for converting and processing text & noise from image space into latent space, hence the name TextTo**Latents**. In this setup, it inputs positive and negative conditioning from the prompt nodes for processing (step 2 above). It inputs noise from the noise node for processing (steps 2 & 3 above). Lastly, it inputs a U-Net from the Model Loader node for processing (step 2 above). It outputs latents for use in the next LatentsToImage node. Choose number of sampler steps, CFG scale, and scheduler.
- LatentsToImage: This node takes in processed latents from the TextToLatents node, and the model’s VAE from the Model Loader node which is responsible for decoding latents back into the image space, hence the name LatentsTo**Image**. This node is the last stop, and once the image is decoded, it is saved to the gallery.
### Basic image-to-image Node Graph

- Model Loader: Choose a model from the dropdown.
- Prompt (Compel): Two prompt nodes. One positive (dog), one negative (dog). Same CLIP inputs from the Model Loader node as before.
- ImageToLatents: Upload a source image directly in the node window, via drag'n'drop from the gallery, or passed in as input. The ImageToLatents node inputs the VAE from the Model Loader node to encode the chosen image from image space into latent space, hence the name ImageTo**Latents**. It outputs latents for use in the next LatentsToLatents node. It also outputs the source image's width and height for use in the next Noise node if the final image is to be the same dimensions as the source image.
- Noise: A noise tensor is created with the width and height of the source image, and connected to the next LatentsToLatents node. Notice the width and height fields are overridden by the input from the ImageToLatents width and height outputs.
- LatentsToLatents: The inputs and options are nearly identical to TextToLatents, except that LatentsToLatents also takes latents as an input. Considering our source image is already converted to latents in the last ImageToLatents node, and text + noise are no longer the only inputs to process, we use the LatentsToLatents node.
- LatentsToImage: Like previously, the LatentsToImage node will use the VAE from the Model Loader as input to decode the latents from LatentsToLatents into image space, and save it to the gallery.
### Basic ControlNet Node Graph

- Model Loader
- Prompt (Compel)
- Noise: Width and height of the CannyImageProcessor ControlNet image is passed in to set the dimensions of the noise passed to TextToLatents.
- CannyImageProcessor: The CannyImageProcessor node is used to process the source image being used as a ControlNet. Each ControlNet processor node applies control in different ways, and has some different options to configure. Width and height are passed to noise, as mentioned. The processed ControlNet image is output to the ControlNet node.
- ControlNet: Select the type of control model. In this case, canny is chosen as the CannyImageProcessor was used to generate the ControlNet image. Configure the control node options, and pass the control output to TextToLatents.
- TextToLatents: Similar to the basic text-to-image example, except ControlNet is passed to the control input edge point.
```bash
dream> a red car --steps 25 -C 9.8 --perlin 0.1 --fnformat {prompt}_steps.{steps}_cfg.{cfg_scale}_perlin.{perlin}.png
```
generates a file with the name: `outputs/img-samples/a red car_steps.25_cfg.9.8_perlin.0.1.png`
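The placeholders behave like ordinary template fields; a quick sketch of the substitution (the CLI itself may sanitize values differently):

```python
# Hypothetical illustration of how the --fnformat placeholders expand.
template = "{prompt}_steps.{steps}_cfg.{cfg_scale}_perlin.{perlin}.png"
print(template.format(prompt="a red car", steps=25, cfg_scale=9.8, perlin=0.1))
# -> a red car_steps.25_cfg.9.8_perlin.0.1.png
```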
---
## **Thresholding and Perlin Noise Initialization Options**
Two new options are the thresholding (`--threshold`) and the perlin noise initialization (`--perlin`) options. Thresholding limits the range of the latent values during optimization, which helps combat oversaturation with higher CFG scale values. Perlin noise initialization starts with a percentage (a value ranging from 0 to 1) of perlin noise mixed into the initial noise. Both features allow for more variations and options in the course of generating images.
For better intuition into what these options do in practice:

In generating this graphic, the perlin noise at initialization was programmatically varied across the diagram with the values 0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0, and the threshold was varied going down with the values 0, 1, 2, 3, 4, 5, 10, 20, 100. The other options are fixed, so the initial prompt is as follows (no thresholding or perlin noise):
```bash
invoke> "a portrait of a beautiful young lady" -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 0 --perlin 0
```
Here's an example of another prompt used when setting the threshold to 5 and perlin noise to 0.2:
```bash
invoke> "a portrait of a beautiful young lady" -S 1950357039 -s 100 -C 20 -A k_euler_a --threshold 5 --perlin 0.2
```
!!! note
    Currently the thresholding feature is only implemented for the k-diffusion style samplers, and empirically appears to work best with `k_euler_a` and `k_dpm_2_a`. Using 0 disables thresholding. Using 0 for perlin noise disables using perlin noise for initialization. Finally, using 1 for perlin noise uses only perlin noise for initialization.
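For intuition only, here is a small NumPy sketch of the two ideas: mixing a fraction of structured noise into the Gaussian initial noise, and limiting the range of values by clamping. The in-repo implementation differs (true perlin noise, and where exactly the threshold is applied); the structured noise below is a crude stand-in.

```python
import numpy as np

rng = np.random.default_rng(1950357039)

# Gaussian noise that would normally seed the latents.
gaussian = rng.standard_normal((4, 64, 64))

# Crude stand-in for perlin-style structured noise: a coarse random grid
# blown up by repetition (real perlin noise is smoothly interpolated).
coarse = rng.standard_normal((4, 8, 8))
structured = np.repeat(np.repeat(coarse, 8, axis=1), 8, axis=2)

# --perlin 0.2: mix 20% structured noise into the initial noise.
perlin = 0.2
init_noise = (1 - perlin) * gaussian + perlin * structured

# --threshold 5: one way to limit the range of latent values is to clamp them.
threshold = 5.0
init_noise = np.clip(init_noise, -threshold, threshold)
print(init_noise.min(), init_noise.max())
```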
---
## **Simplified API**
For programmers who wish to incorporate stable-diffusion into other products, this repository
includes a simplified API for text to image generation, which lets you create images from a prompt
in just three lines of code:
```python
from ldm.generate import Generate
g = Generate()
outputs = g.txt2img("a unicorn in manhattan")
```
Outputs is a list of lists in the format `[[filename1, seed1], [filename2, seed2], ...]`.
Please see the documentation in ldm/generate.py for more information.
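Continuing the snippet above, each entry can be unpacked as a filename/seed pair:

```python
# 'outputs' comes from the txt2img call in the example above.
for filename, seed in outputs:
    print(f"{filename} was generated with seed {seed}")
```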
When the script is finished, each of the 27 combinations
of adjective, sampler and CFG will be executed.
The command-line interface provides `!fetch` and `!replay` commands
which allow you to read the prompts from a single previously-generated
image or a whole directory of them, write the prompts to a file, and
then replay them. Or you can create your own file of prompts and feed
them to the command-line client from within an interactive session.
See [Command-Line Interface](CLI.md) for details.
---
## **Negative and Unconditioned Prompts**
Any words between a pair of square brackets will instruct Stable
Diffusion to attempt to ban the concept from the generated image. The
same effect is achieved by placing words in the "Negative Prompts"
textbox in the Web UI.
```text
this is a test prompt [not really] to make you understand [cool] how this works.
```

Here's a prompt that depicts what it does.
original prompt:
`#!bash "A fantastical translucent pony made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
`#!bash "A fantastical translucent pony made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve"`
That image has a woman, so if we want the horse without a rider, we can
influence the image not to have a woman by putting [woman] in the prompt, like
this:
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman]" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman]"`
(same parameters as above)
That's nice - but say we also don't want the image to be quite so blue. We can
add "blue" to the list of negative prompts, so it's now [woman blue]:
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue]" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue]"`
(same parameters as above)
Getting close - but there's no sense in having a saddle when our horse doesn't
have a rider, so we'll add one more negative prompt: [woman blue saddle].
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue saddle]" -s 20 -W 512 -H 768 -C 7.5 -A k_euler_a -S 1654590180`
`#!bash "A fantastical translucent poney made of water and foam, ethereal, radiant, hyperalism, scottish folklore, digital painting, artstation, concept art, smooth, 8 k frostbite 3 engine, ultra detailed, art by artgerm and greg rutkowski and magali villeneuve [woman blue saddle]"`
(same parameters as above)
Prompt2prompt `.swap()` is not compatible with xformers, which will be temporarily disabled when it is used.
Note that `prompt2prompt` is not currently working with the runwayML inpainting
model, and may never work due to the way this model is set up. If you attempt to
use `prompt2prompt` you will get the original image back. However, since this
model is so good at inpainting, a good substitute is to use the `clipseg` text
masking option:
```bash
invoke> a fluffy cat eating a hotdog
Outputs:
[1010] outputs/000025.2182095108.png: a fluffy cat eating a hotdog
invoke> a smiling dog eating a hotdog -I 000025.2182095108.png -tm cat
```
### Escaping parentheses () and speech marks ""
If the model you are using has parentheses () or speech marks "" as part of its syntax, you will need to "escape" them with a backslash (e.g. `\(` or `\"`) so they are not interpreted as prompt syntax.
Is the word "hybrid" summoning up the concept of some sort of scifi creature? Let's find out.
Indeed, removing the word "hybrid" produces an image that is more like what we'd
expect.
In conclusion, prompt blending is great for exploring creative space, but can be
difficult to direct. A forthcoming release of InvokeAI will feature more
deterministic prompt weighting.
## Dynamic Prompts
Dynamic Prompts are a powerful feature designed to produce a variety of prompts based on user-defined options. Using a special syntax, you can construct a prompt with multiple possibilities, and the system will automatically generate a series of permutations based on your settings. This is extremely beneficial for ideation, exploring various scenarios, or testing different concepts swiftly and efficiently.
### Structure of a Dynamic Prompt
A Dynamic Prompt comprises regular text, supplemented with alternatives enclosed within curly braces {} and separated by a vertical bar |. For example: {option1|option2|option3}. The system will then select one of the options to include in the final prompt. This flexible system allows for options to be placed throughout the text as needed.
Furthermore, Dynamic Prompts can designate multiple selections from a single group of options. This feature is triggered by prefixing the options with a numerical value followed by $$. For example, in {2$$option1|option2|option3}, the system will select two distinct options from the set.
### Creating Dynamic Prompts
To create a Dynamic Prompt, follow these steps:
1. Draft your sentence or phrase, identifying words or phrases with multiple possible options.
2. Encapsulate the different options within curly braces {}.
3. Within the braces, separate each option using a vertical bar |.
4. If you want to include multiple options from a single group, prefix with the desired number and $$.
For instance: A {house|apartment|lodge|cottage} in {summer|winter|autumn|spring} designed in {2$$style1|style2|style3}.
### How Dynamic Prompts Work
Once a Dynamic Prompt is configured, the system generates an array of combinations using the options provided. Each group of options in curly braces is treated independently, with the system selecting one option from each group. For a prefixed set (e.g., 2$$), the system will select two distinct options.
For example, the following prompts could be generated from the above Dynamic Prompt:
- A house in summer designed in style1, style2
- A lodge in autumn designed in style3, style1
- A cottage in winter designed in style2, style3
- And many more!
When the `Combinatorial` setting is on, Invoke will disable the "Images" selection, and generate every combination up until the setting for Max Prompts is reached.
When the `Combinatorial` setting is off, Invoke will randomly generate combinations up until the setting for Images has been reached.
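To make the expansion concrete, here is a tiny, self-contained sketch of the syntax described above. It is not the dynamicprompts library InvokeAI uses, just an illustration of combinatorial expansion versus random sampling:

```python
import itertools
import random
import re

GROUP = re.compile(r"\{([^{}]+)\}")

def group_options(body: str) -> list[str]:
    """Options contributed by one {...} group; {2$$a|b|c} yields 2-option combos."""
    if "$$" in body:
        count, rest = body.split("$$", 1)
        return [", ".join(c) for c in itertools.combinations(rest.split("|"), int(count))]
    return body.split("|")

def expansions(prompt: str) -> list[str]:
    """Every combination of the prompt's groups (the 'Combinatorial' behaviour)."""
    groups = [group_options(m.group(1)) for m in GROUP.finditer(prompt)]
    results = []
    for picks in itertools.product(*groups):
        picks = iter(picks)
        results.append(GROUP.sub(lambda _: next(picks), prompt))
    return results

prompt = "A {house|lodge} in {summer|winter} designed in {2$$style1|style2|style3}"
all_prompts = expansions(prompt)      # 'Combinatorial' on: every combination
print(len(all_prompts))               # 2 * 2 * 3 = 12
print(random.choice(all_prompts))     # 'Combinatorial' off: random draws up to 'Images'
```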
### Tips and Tricks for Using Dynamic Prompts
Below are some useful strategies for creating Dynamic Prompts:
- Utilize Dynamic Prompts to generate a wide spectrum of prompts, perfect for brainstorming and exploring diverse ideas.
- Ensure that the options within a group are contextually relevant to the part of the sentence where they are used. For instance, group building types together, and seasons together.
- Apply the 2$$ prefix when you want to incorporate more than one option from a single group. This becomes quite handy when mixing and matching different elements.
- Experiment with different quantities for the prefix. For example, 3$$ will select three distinct options.
- Be aware of coherence in your prompts. Although the system can generate all possible combinations, not all may semantically make sense. Therefore, carefully choose the options for each group.
- Always review and fine-tune the generated prompts as needed. While Dynamic Prompts can help you generate a multitude of combinations, the final polishing and refining remain in your hands.
You may personalize the generated images to provide your own styles or objects
Start the front end by selecting choice (3):
```sh
1: Browser-based UI
2: Command-line interface
3: Run textual inversion training
4: Merge models (diffusers type only)
5: Download and install models
6: Change InvokeAI startup options
7: Re-run the configure script to fix a broken install
8: Open the developer console
9: Update InvokeAI
10: Command-line help
Q: Quit
Please enter 1-10, Q: [1]
```
From the command line, with the InvokeAI virtual environment active, you can launch training directly with the `invokeai-ti` command:
```bash
invokeai-ti \
  ...
  --only_save_embeds
```
## Using Embeddings
After training completes, the resultant embeddings will be saved into your `$INVOKEAI_ROOT/embeddings/<trigger word>/learned_embeds.bin`.
These will be automatically loaded when you start InvokeAI.
Add the trigger word, surrounded by angle brackets, to use that embedding. For example, if your trigger word was `terence`, use `<terence>` in prompts. This is the same syntax used by the HuggingFace concepts library.
**Note:** `.pt` embeddings do not require the angle brackets.
## Troubleshooting
### `Cannot load embedding for <trigger>. It was trained on a model with token dimension 1024, but the current model has token dimension 768`
| Argument | Description |
| --- | --- |
| `--host HOST` | Web server: Host or IP to listen on. Set to 0.0.0.0 to accept traffic from other devices on your network. |
| `--port PORT` | Web server: Port to listen on |
| `--certfile CERTFILE` | Web server: Path to certificate file to use for SSL. Use together with --keyfile |
| `--keyfile KEYFILE` | Web server: Path to private key file to use for SSL. Use together with --certfile |
| `--gui` | Start InvokeAI GUI - This is the "desktop mode" version of the web app. It uses Flask to create a desktop app experience of the webserver. |
New to image generation with AI? You’re in the right place!
This is a high level walkthrough of some of the concepts and terms you’ll see as you start using InvokeAI. Please note, this is not an exhaustive guide and may be out of date due to the rapidly changing nature of the space.
## Using InvokeAI
### **Prompt Crafting**
- Prompts are the basis of using InvokeAI, providing the models directions on what to generate. As a general rule of thumb, the more detailed your prompt is, the better your result will be.
*To get started, here’s an easy template to use for structuring your prompts:*
- Subject, Style, Quality, Aesthetic
- **Subject:** What your image will be about. E.g. “a futuristic city with trains”, “penguins floating on icebergs”, “friends sharing beers”
- **Style:** The style or medium in which your image will be in. E.g. “photograph”, “pencil sketch”, “oil paints”, or “pop art”, “cubism”, “abstract”
- **Quality:** A particular aspect or trait that you would like to see emphasized in your image. E.g. "award-winning", "featured in {relevant set of high quality works}", "professionally acclaimed". Many people often use "masterpiece".
- **Aesthetics:** The visual impact and design of the artwork. This can be colors, mood, lighting, setting, etc.
- There are two prompt boxes: *Positive Prompt* & *Negative Prompt*.
- A **Positive** Prompt includes words you want the model to reference when creating an image.
- A **Negative** Prompt is for anything you want the model to eliminate when creating an image. It doesn’t always interpret things exactly the way you would, but it helps control the generation process. Always try to include a few terms - you can typically use lower quality image terms like “blurry” or “distorted” with good success.
- Some example prompts you can try on your own:
- A detailed oil painting of a tranquil forest at sunset with vibrant+ colors and soft, golden light filtering through the trees
- friends sharing beers in a busy city, realistic colored pencil sketch, twilight, masterpiece, bright, lively
### Generation Workflows
- Invoke offers a number of different workflows for interacting with models to produce images. Each is extremely powerful on its own, but together provide you an unparalleled way of producing high quality creative outputs that align with your vision.
- **Text to Image:** The text to image tab focuses on the key workflow of using a prompt to generate a new image. It includes other features that help control the generation process as well.
- **Image to Image:** With image to image, you provide an image as a reference (called the “initial image”), which provides more guidance around color and structure to the AI as it generates a new image. This is provided alongside the same features as Text to Image.
- **Unified Canvas:** The Unified Canvas is an advanced AI-first image editing tool that is easy to use, but hard to master. Drag an image onto the canvas from your gallery in order to regenerate certain elements, edit content or colors (known as inpainting), or extend the image with an exceptional degree of consistency and clarity (called outpainting).
### Improving Image Quality
- Fine tuning your prompt - the more specific you are, the closer the image will turn out to what is in your head! Adding more details in the Positive Prompt or Negative Prompt can help add / remove pieces of your image to improve it - You can also use advanced techniques like upweighting and downweighting to control the influence of certain words. [Learn more here](https://invoke-ai.github.io/InvokeAI/features/PROMPTS/#prompt-syntax-features).
- **Tip: If you’re seeing poor results, adding the things you don’t like about the image to your Negative Prompt may help. E.g. distorted, low quality, unrealistic, etc.**
- Explore different models - Other models can produce different results due to the data they’ve been trained on. Each model has specific language and settings it works best with; a model’s documentation is your friend here. Play around with some and see what works best for you!
- Increasing Steps - The number of steps used controls how much time the model is given to produce an image, and depends on the “Scheduler” used. The scheduler controls how each step is processed by the model. More steps tend to mean better results, but take longer - We recommend at least 30 steps for most generations.
- Tweak and Iterate - Remember, it’s best to change one thing at a time so you know what is working and what isn't. Sometimes you just need to try a new image, and other times using a new prompt might be the ticket. For testing, consider turning off the “random” Seed - Using the same seed with the same settings will produce the same image, which makes it the perfect way to learn exactly what your changes are doing.
- Explore Advanced Settings - InvokeAI has a full suite of tools available to allow you complete control over your image creation process - Check out our [docs if you want to learn more](https://invoke-ai.github.io/InvokeAI/features/).
## Terms & Concepts
If you're interested in learning more, check out [this presentation](https://docs.google.com/presentation/d/1IO78i8oEXFTZ5peuHHYkVF-Y3e2M6iM5tCnc-YBfcCM/edit?usp=sharing) from one of our maintainers (@lstein).
### Stable Diffusion
Stable Diffusion is a deep learning, text-to-image model that is the foundation of the capabilities found in InvokeAI. Since the release of Stable Diffusion, many subsequent models have been created based on it that are designed to generate specific types of images.
### Prompts
Prompts provide the models directions on what to generate. As a general rule of thumb, the more detailed your prompt is, the better your result will be.
### Models
Models are the magic that power InvokeAI. These files represent the output of training a machine on understanding massive amounts of images - providing them with the capability to generate new images using just a text description of what you’d like to see. (Like Stable Diffusion!)
Invoke offers a simple way to download several different models upon installation, but many more can be discovered online. Each model can produce a unique style of output, based on the images it was trained on - Try out different models to see which best fits your creative vision!
- *Models that contain “inpainting” in the name are designed for use with the inpainting feature of the Unified Canvas*
### Scheduler
Schedulers guide the process of removing noise (de-noising) from data. They determine:
1. The number of steps to take to remove the noise.
2. Whether the steps are random (stochastic) or predictable (deterministic).
3. The specific method (algorithm) used for de-noising.
Experimenting with different schedulers is recommended as each will produce different outputs!
### Steps
The number of de-noising steps the model works through during each generation.
Schedulers can be intricate and there's often a balance to strike between how quickly they can de-noise data and how well they can do it. It's typically advised to experiment with different schedulers to see which one gives the best results. There has been a lot written on the internet about different schedulers, as well as exploring what the right level of "steps" are for each. You can save generation time by reducing the number of steps used, but you'll want to make sure that you are satisfied with the quality of images produced!
### Low-Rank Adaptations / LoRAs
Low-Rank Adaptations (LoRAs) are like a smaller, more focused version of models, intended to focus on training a better understanding of how a specific character, style, or concept looks.
### Textual Inversion Embeddings
Textual Inversion Embeddings, like LoRAs, assist with more easily prompting for certain characters, styles, or concepts. However, embeddings are trained to update the relationship between a specific word (known as the “trigger”) and the intended output.
### ControlNet
ControlNets are neural network models that are able to extract key features from an existing image and use these features to guide the output of the image generation model.
### VAE
A variational auto-encoder (VAE) is an encode/decode model that translates the "latents" image produced during the image generation process into the large pixel images that we see.