c22d529528
This PR adds the core of the node-based invocation system first discussed in https://github.com/invoke-ai/InvokeAI/discussions/597 and implements it through a basic CLI and API. This supersedes #1047, which was too far behind to rebase.

## Architecture

### Invocations

The core of the new system is **invocations**, found in `/ldm/invoke/app/invocations`. These represent individual nodes of execution, each with inputs and outputs. Core invocations are already implemented (`txt2img`, `img2img`, `upscale`, `face_restore`), as well as a debug invocation (`show_image`). To implement a new invocation, all that is required is to add a new implementation in this folder (there is a markdown document describing the specifics, though it is slightly out-of-date).

### Sessions

Invocations and the links between them are maintained in a **session**. These can be queued for invocation (either the next ready node, or all nodes). Some notes:

* Sessions may be added to at any time (including after invocation), but may not otherwise be modified.
* Links are always added with a node, and are always links from existing nodes to the new node. These links can be relative "history" links, e.g. `-1` to link from a previously executed node, and can either link specific outputs or opportunistically link all matching outputs by name and type by using `*`.
* There are no iteration/looping constructs. Most needs for this could be solved by either duplicating nodes or cloning sessions. This is open for discussion, but is a difficult problem to solve in a way that doesn't make the code even more complex/confusing (especially regarding node ids and history).

### Services

These make up the core of the invocation system, found in `/ldm/invoke/app/services`. One of the key design philosophies here is that most components should be replaceable when possible. For example, if someone wants to use cloud storage for their images, they should be able to replace the image storage service easily.
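As a rough illustration of the "replaceable service" idea, here is a minimal sketch. Every name in it (`ImageStorage`, `InMemoryImageStorage`, `PrefixedImageStorage`, `Services`, `run_fake_invocation`) is hypothetical and chosen for this example; the actual interfaces in `/ldm/invoke/app/services` may look quite different.

```python
from abc import ABC, abstractmethod
from typing import Dict


class ImageStorage(ABC):
    """Hypothetical storage interface; not the project's actual API."""

    @abstractmethod
    def save(self, name: str, data: bytes) -> None:
        ...

    @abstractmethod
    def load(self, name: str) -> bytes:
        ...


class InMemoryImageStorage(ImageStorage):
    """Simplest possible backend: keeps images in a dict."""

    def __init__(self) -> None:
        self._images: Dict[str, bytes] = {}

    def save(self, name: str, data: bytes) -> None:
        self._images[name] = data

    def load(self, name: str) -> bytes:
        return self._images[name]


class PrefixedImageStorage(ImageStorage):
    """Stand-in for a remote backend: stores keys under a prefix in another store."""

    def __init__(self, backend: ImageStorage, prefix: str) -> None:
        self._backend = backend
        self._prefix = prefix

    def save(self, name: str, data: bytes) -> None:
        self._backend.save(self._prefix + name, data)

    def load(self, name: str) -> bytes:
        return self._backend.load(self._prefix + name)


class Services:
    """Bundle of services handed to invocations (assumed shape)."""

    def __init__(self, images: ImageStorage) -> None:
        self.images = images


def run_fake_invocation(services: Services, name: str) -> bytes:
    """Pretend node: writes an image, then reads it back via the service."""
    services.images.save(name, b"fake-image-bytes")
    return services.images.load(name)
```

Because the pretend invocation only talks to the `ImageStorage` interface, swapping `InMemoryImageStorage()` for `PrefixedImageStorage(...)` (or a real cloud-backed class) requires no change to the invocation itself.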
The services are broken down as follows (several of these are intentionally implemented with an initial simple/naïve approach):

* Invoker: Responsible for creating and executing **sessions** and managing the services used to do so.
* Session Manager: Manages session history. An on-disk implementation is provided, which stores sessions as JSON files on disk and caches recently used sessions for quick access.
* Image Storage: Stores images of multiple types. An on-disk implementation is provided, which stores images on disk and retains recently used images in an in-memory cache.
* Invocation Queue: Used to queue invocations for execution. An in-memory implementation is provided.
* Events: An event system, primarily used with socket.io to support future web UI integration.

## Apps

Apps are available through the `/scripts/invoke-new.py` script (to be integrated/renamed).

### CLI

```
python scripts/invoke-new.py
```

Implements a simple CLI. The CLI creates a single session, and automatically links all inputs to the previous node's output. Commands are automatically generated from all invocations, with command options being automatically generated from invocation inputs. Help is also available for the CLI and for each command, and is very verbose. Additionally, the CLI supports command piping for single-line entry of multiple commands. Example:

```
> txt2img --prompt "a cat eating sushi" --steps 20 --seed 1234 | upscale | show_image
```

### API

```
python scripts/invoke-new.py --api --host 0.0.0.0
```

Implements an API using FastAPI with Socket.io support for signaling. API documentation is available at `http://localhost:9090/docs` or `http://localhost:9090/redoc`. This includes OpenAPI schema for all available invocations, session interaction APIs, and image APIs. Socket.io signals are per-session, and can be subscribed to by session id. These aren't currently auto-documented, though the code for event emission is centralized in `/ldm/invoke/app/services/events.py`.
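To make "signals are per-session, and can be subscribed to by session id" concrete, here is a minimal in-process sketch of that routing idea. It does not use Socket.io and is not the code in `events.py`; the class `SessionEvents`, its methods, and the event name `invocation_complete` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

# A handler receives the event name and its payload.
Handler = Callable[[str, Dict[str, Any]], None]


class SessionEvents:
    """Hypothetical event bus that routes events by session id."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, session_id: str, handler: Handler) -> None:
        """Register a handler for events belonging to one session."""
        self._handlers[session_id].append(handler)

    def emit(self, session_id: str, event_name: str, payload: Dict[str, Any]) -> int:
        """Deliver an event to that session's subscribers; return how many got it."""
        handlers = list(self._handlers.get(session_id, []))
        for handler in handlers:
            handler(event_name, payload)
        return len(handlers)


# Usage: only the subscriber of "session-a" sees its events.
events = SessionEvents()
received: List[str] = []
events.subscribe("session-a", lambda name, payload: received.append(name))
events.emit("session-a", "invocation_complete", {"node": "1"})
events.emit("session-b", "invocation_complete", {"node": "7"})
```

With Socket.io, the same effect is typically achieved by placing each client in a room named after the session id, so the routing table above would be handled by the server library rather than by hand.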
A very simple test HTML page and script are available at `http://localhost:9090/static/test.html`. This demonstrates creating a session from a graph, invoking it, and receiving signals from Socket.io.

## What's left?

* There are a number of features not currently covered by invocations. I kept the set of invocations small during core development in order to simplify refactoring as I went. Now that the invocation code has stabilized, I'd love some help filling those out!
* There's no image metadata generated. It would be fairly straightforward (and would make good sense) to serialize either a session and node reference into an image, or the entire node into the image. There are a lot of questions to answer around source images, linked images, etc. though. This history is all stored in the session as well, and with complex sessions, the metadata in an image may lose its value. This needs some further discussion.
* We need a list of features (both current and future) that would be difficult to implement without looping constructs so we can have a good conversation around it. I'm really hoping we can avoid needing looping/iteration in the graph execution, since it'll necessitate separating an execution of a graph into its own concept/system, and will further complicate the system.
* The API likely needs further filling out to support the UI. I think using the new API for the current UI is possible, and potentially interesting, since it could work like the new/demo CLI in a "single operation at a time" workflow. I don't know how compatible that will be with our UI goals though. Still, it would be nice to support only a single API.
* Deeper separation of systems. I intentionally tried not to touch Generate or other systems too much, but a lot could be gained by breaking those apart. Even breaking apart Args into two pieces (command line arguments and the parser for the current CLI) would make it easier to maintain. This is probably in the future though.
InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies. InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products.
Quick links: [How to Install] [Discord Server] [Documentation and Tutorials] [Code and Downloads] [Bug Reports] [Discussion, Ideas & Q&A]
Note: InvokeAI is rapidly evolving. Please use the Issues tab to report bugs and make feature requests. Be sure to use the provided templates. They will help us diagnose issues faster.
Table of Contents
- Quick Start
- Installation
- Hardware Requirements
- Features
- Latest Changes
- Troubleshooting
- Contributing
- Contributors
- Support
- Further Reading
Getting Started with InvokeAI
For full installation and upgrade instructions, please see: InvokeAI Installation Overview
Automatic Installer (suggested for 1st time users)
1. Go to the bottom of the Latest Release Page.
2. Download the .zip file for your OS (Windows/macOS/Linux).
3. Unzip the file.
4. If you are on Windows, double-click on the `install.bat` script. On macOS, open a Terminal window, drag the file `install.sh` from Finder into the Terminal, and press return. On Linux, run `install.sh`.
5. You'll be asked to confirm the location of the folder in which to install InvokeAI and its image generation model files. Pick a location with at least 15 GB of free disk space, or more if you plan on installing lots of models.
6. Wait while the installer does its thing. After installing the software, the installer will launch a script that lets you configure InvokeAI and select a set of starting image generation models.
7. Find the folder that InvokeAI was installed into (it is not the same as the unpacked zip file directory!). The default location of this folder (if you didn't change it in step 5) is `~/invokeai` on Linux/Mac systems, and `C:\Users\YourName\invokeai` on Windows. This directory will contain launcher scripts named `invoke.sh` and `invoke.bat`.
8. On Windows systems, double-click on the `invoke.bat` file. On macOS, open a Terminal window, drag `invoke.sh` from the folder into the Terminal, and press return. On Linux, run `invoke.sh`.
9. Press 2 to open the "browser-based UI", press enter/return, wait a minute or two for Stable Diffusion to start up, then open your browser and go to http://localhost:9090.
10. Type `banana sushi` in the box on the top left and click `Invoke`.
Command-Line Installation (for users familiar with Terminals)
You must have Python 3.9 or 3.10 installed on your machine. Earlier or later versions are not supported.
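If you are unsure which interpreter your `python` command points at, a small check like the following can help. This is only a convenience sketch, not part of InvokeAI; the function name `is_supported_python` is made up for this example, and the accepted versions simply mirror the requirement stated above.

```python
import sys


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True for Python 3.9.x or 3.10.x, per the requirement above."""
    major, minor = version_info[0], version_info[1]
    return (major, minor) in {(3, 9), (3, 10)}


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {current} is in the supported range.")
    else:
        print(f"Python {current} is not supported; install 3.9 or 3.10.")
```

Save it to a file and run it with the same `python` command you intend to use for the steps below.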
1. Open a command-line window on your machine. PowerShell is recommended on Windows.
2. Create a directory to install InvokeAI into. You'll need at least 15 GB of free space:

        mkdir invokeai

3. Create a virtual environment named `.venv` inside this directory:

        cd invokeai
        python -m venv .venv --prompt InvokeAI

4. Activate the virtual environment (do it every time you run InvokeAI).

    For Linux/Mac users:

        source .venv/bin/activate

    For Windows users:

        .venv\Scripts\activate

5. Install the InvokeAI module and its dependencies. Choose the command suited for your platform & GPU.

    For Windows/Linux with an NVIDIA GPU:

        pip install InvokeAI[xformers] --use-pep517 --extra-index-url https://download.pytorch.org/whl/cu117

    For Linux with an AMD GPU:

        pip install InvokeAI --use-pep517 --extra-index-url https://download.pytorch.org/whl/rocm5.2

    For Macintoshes, either Intel or M1/M2:

        pip install InvokeAI --use-pep517

6. Configure InvokeAI and install a starting set of image generation models (you only need to do this once):

        invokeai-configure

7. Launch the web server (do it every time you run InvokeAI):

        invokeai --web

8. Point your browser to http://localhost:9090 to bring up the web interface.
9. Type `banana sushi` in the box on the top left and click `Invoke`.

Be sure to activate the virtual environment each time before re-launching InvokeAI, using `source .venv/bin/activate` or `.venv\Scripts\activate`.
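A quick way to confirm that the virtual environment is actually active is to compare `sys.prefix` with `sys.base_prefix`, which differ inside an environment created by `python -m venv`. The helper name `in_virtualenv` below is an assumption for this sketch and is not something InvokeAI provides.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix, base_prefix: str = sys.base_prefix) -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; activate .venv first.")
```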
Detailed Installation Instructions
This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). For full installation and upgrade instructions, please see: InvokeAI Installation Overview
Hardware Requirements
InvokeAI is supported across Linux, Windows and macOS. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver).
System
You will need one of the following:
- An NVIDIA-based graphics card with 4 GB or more of VRAM.
- An Apple computer with an M1 chip.
- An AMD-based graphics card with 4 GB or more of VRAM (Linux only).
We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.
Memory
- At least 12 GB of main memory (RAM).
Disk
- At least 12 GB of free disk space for the machine learning model, Python, and all its dependencies.
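The disk requirement can be checked before installing with the standard library's `shutil.disk_usage`. This is a sketch under two assumptions: the helper name `has_free_space` is invented for this example, and "GB" is interpreted as 10^9 bytes.

```python
import shutil

BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def has_free_space(path: str = ".", required_gb: float = 12) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * BYTES_PER_GB


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / BYTES_PER_GB
    verdict = "enough" if has_free_space(".") else "not enough"
    print(f"{free_gb:.1f} GB free in the current directory: {verdict} for 12 GB.")
```

Pass the intended install directory as `path`, and raise `required_gb` if you plan to install many models.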
Features
Feature documentation can be reviewed by navigating to the InvokeAI Documentation page
Web Server & UI
InvokeAI offers a locally hosted Web Server & React Frontend, with an industry leading user experience. The Web-based UI allows for simple and intuitive workflows, and is responsive for use on mobile devices and tablets accessing the web server.
Unified Canvas
The Unified Canvas is a fully integrated canvas implementation with support for all core generation capabilities, in/outpainting, brush tools, and more. This creative tool unlocks the capability for artists to create with AI as a creative collaborator, and can be used to augment AI-generated imagery, sketches, photography, renders, and more.
Advanced Prompt Syntax
InvokeAI's advanced prompt syntax allows for token weighting, cross-attention control, and prompt blending, allowing for fine-tuned tweaking of your invocations and exploration of the latent space.
Command Line Interface
For users utilizing a terminal-based environment, or who want to take advantage of CLI features, InvokeAI offers an extensive and actively supported command-line interface that provides the full suite of generation functionality available in the tool.
Other features
- Support for both ckpt and diffusers models
- SD 2.0, 2.1 support
- Noise Control & Thresholding
- Popular Sampler Support
- Upscaling & Face Restoration Tools
- Embedding Manager & Support
- Model Manager & Support
Coming Soon
- Node-Based Architecture & UI
- And more...
Latest Changes
For our latest changes, view our Release Notes and the CHANGELOG.
Troubleshooting
Please check out our Q&A to get solutions for common installation problems and other issues.
Contributing
Anyone who wishes to contribute to this project, whether documentation, features, bug fixes, code cleanup, testing, or code reviews, is very much encouraged to do so.
To join, just raise your hand on the InvokeAI Discord server (#dev-chat) or the GitHub discussion board.
If you'd like to help with translation, please see our translation guide.
If you are unfamiliar with how to contribute to GitHub projects, here is a Getting Started Guide. A full set of contribution guidelines, along with templates, is in progress. You can make your pull request against the "main" branch.
We hope you enjoy using our software as much as we enjoy creating it, and we hope that some of those of you who are reading this will elect to become part of our community.
Welcome to InvokeAI!
Contributors
This fork is a combined effort of various people from across the world. Check out the list of all these amazing people. We thank them for their time, hard work and effort.
Thanks to Weblate for generously providing translation services to this project.
Support
For support, please use this repository's GitHub Issues tracking service, or join the Discord.
Original portions of the software are Copyright (c) 2023 by respective contributors.