mirror of
https://github.com/invoke-ai/InvokeAI
synced 2024-08-30 20:32:17 +00:00
add missing doc files
parent
b1d43eae46
commit
5a22a83f4c
13
README.md
@ -41,10 +41,13 @@ _This repository was formerly known as lstein/stable-diffusion_
|
||||
[latest release link]: https://github.com/invoke-ai/InvokeAI/releases
|
||||
</div>
|
||||
|
||||
This is a fork of [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion), the open
|
||||
source text-to-image generator. It provides a streamlined process with various new features and
|
||||
options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on
|
||||
GPU cards with as little as 4 GB of RAM.
|
||||
This is a fork of
|
||||
[CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion),
|
||||
the open source text-to-image generator. It provides a streamlined
|
||||
process with various new features and options to aid the image
|
||||
generation process. It runs on Windows, Mac and Linux machines, and on
|
||||
GPU cards with as little as 4 GB of RAM. It provides both a polished
|
||||
Web interface, and an easy-to-use command-line interface.
|
||||
|
||||
_Note: This fork is rapidly evolving. Please use the
|
||||
[Issues](https://github.com/invoke-ai/InvokeAI/issues) tab to report bugs and make feature
|
||||
@ -109,6 +112,7 @@ you can try starting `invoke.py` with the `--precision=float32` flag:
|
||||
|
||||
#### Major Features
|
||||
|
||||
- [Web Server](docs/features/WEB.md)
|
||||
- [Interactive Command Line Interface](docs/features/CLI.md)
|
||||
- [Image To Image](docs/features/IMG2IMG.md)
|
||||
- [Inpainting Support](docs/features/INPAINTING.md)
|
||||
@ -116,7 +120,6 @@ you can try starting `invoke.py` with the `--precision=float32` flag:
|
||||
- [Upscaling, face-restoration and outpainting](docs/features/POSTPROCESS.md)
|
||||
- [Seamless Tiling](docs/features/OTHER.md#seamless-tiling)
|
||||
- [Google Colab](docs/features/OTHER.md#google-colab)
|
||||
- [Web Server](docs/features/WEB.md)
|
||||
- [Reading Prompts From File](docs/features/PROMPTS.md#reading-prompts-from-a-file)
|
||||
- [Shortcut: Reusing Seeds](docs/features/OTHER.md#shortcuts-reusing-seeds)
|
||||
- [Prompt Blending](docs/features/PROMPTS.md#prompt-blending)
|
||||
|
@ -20,39 +20,33 @@ The default face restoration module is GFPGAN. The default upscale is
|
||||
Real-ESRGAN. For an alternative face restoration module, see [CodeFormer
|
||||
Support] below.
|
||||
|
||||
As of version 1.14, environment.yaml will install the Real-ESRGAN package into
|
||||
the standard install location for python packages, and will put GFPGAN into a
|
||||
subdirectory of "src" in the InvokeAI directory. (The reason for this is
|
||||
that the standard GFPGAN distribution has a minor bug that adversely affects
|
||||
image color.) Upscaling with Real-ESRGAN should "just work" without further
|
||||
intervention. Simply pass the --upscale (-U) option on the invoke> command line,
|
||||
or indicate the desired scale on the popup in the Web GUI.
|
||||
As of version 1.14, environment.yaml will install the Real-ESRGAN
|
||||
package into the standard install location for python packages, and
|
||||
will put GFPGAN into a subdirectory of "src" in the InvokeAI
|
||||
directory. Upscaling with Real-ESRGAN should "just work" without
|
||||
further intervention. Simply pass the --upscale (-U) option on the
|
||||
invoke> command line, or indicate the desired scale on the popup in
|
||||
the Web GUI.
|
||||
|
||||
For **GFPGAN** to work, there is one additional step needed. You will need to
|
||||
download and copy the GFPGAN
|
||||
[models file](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth)
|
||||
into **src/gfpgan/experiments/pretrained_models**. On Mac and Linux systems,
|
||||
here's how you'd do it using **wget**:
|
||||
**GFPGAN** requires a series of downloadable model files to
|
||||
work. These are loaded when you run `scripts/preload_models.py`. If
|
||||
GFPGAN is failing with an error, please run the following from the
|
||||
InvokeAI directory:
|
||||
|
||||
```bash
|
||||
wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth -P src/gfpgan/experiments/pretrained_models/
|
||||
```
|
||||
~~~~
|
||||
python scripts/preload_models.py
|
||||
~~~~
|
||||
|
||||
Make sure that you're in the InvokeAI directory when you do this.
|
||||
If you do not run this script in advance, the GFPGAN module will attempt
|
||||
to download the model files the first time you try to perform facial
|
||||
reconstruction.
|
||||
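Before running on a machine without Internet access, it can help to confirm that the model files are already on disk. The following is a minimal sketch, not part of InvokeAI: the helper name `missing_model_files` is hypothetical, and the two paths are assumptions taken from the locations mentioned in this document, which may differ between versions.

```python
from pathlib import Path

# Assumed locations, taken from paths mentioned in this document.
# They may differ between InvokeAI versions.
EXPECTED_MODEL_FILES = [
    "src/gfpgan/experiments/pretrained_models/GFPGANv1.4.pth",
    "ldm/restoration/codeformer/weights/codeformer.pth",
]


def missing_model_files(invokeai_dir, expected=EXPECTED_MODEL_FILES):
    """Return the expected model files that are not present under invokeai_dir."""
    root = Path(invokeai_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the InvokeAI directory, as with preload_models.py.
    missing = missing_model_files(".")
    if missing:
        print("Missing model files; consider running scripts/preload_models.py:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected model files are present.")
```

If the list is non-empty, running `python scripts/preload_models.py` while the machine still has network access should fetch the files.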
|
||||
Alternatively, if you have GFPGAN installed elsewhere, or if you are using an
|
||||
earlier version of this package which asked you to install GFPGAN in a sibling
|
||||
directory, you may use the `--gfpgan_dir` argument with `invoke.py` to set a
|
||||
custom path to your GFPGAN directory. _There are other GFPGAN related boot
|
||||
arguments if you wish to customize further._
|
||||
|
||||
!!! warning "Internet connection needed"
|
||||
|
||||
Users whose GPU machines are isolated from the Internet (e.g.
|
||||
on a University cluster) should be aware that the first time you run invoke.py with GFPGAN and
|
||||
Real-ESRGAN turned on, it will try to download model files from the Internet. To rectify this, you
|
||||
may run `python3 scripts/preload_models.py` after you have installed GFPGAN and all its
|
||||
dependencies.
|
||||
Alternatively, if you have GFPGAN installed elsewhere, or if you are
|
||||
using an earlier version of this package which asked you to install
|
||||
GFPGAN in a sibling directory, you may use the `--gfpgan_dir` argument
|
||||
with `invoke.py` to set a custom path to your GFPGAN directory. _There
|
||||
are other GFPGAN related boot arguments if you wish to customize
|
||||
further._
|
||||
|
||||
## Usage
|
||||
|
||||
@ -124,15 +118,15 @@ actions.
|
||||
This repo also allows you to perform face restoration using
|
||||
[CodeFormer](https://github.com/sczhou/CodeFormer).
|
||||
|
||||
In order to set up CodeFormer to work, you need to download the models like with
|
||||
GFPGAN. You can do this either by running `preload_models.py` or by manually
|
||||
downloading the
|
||||
[model file](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer.pth)
|
||||
In order to set up CodeFormer to work, you need to download the models
|
||||
like with GFPGAN. You can do this either by running
|
||||
`preload_models.py` or by manually downloading the [model
|
||||
file](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer.pth)
|
||||
and saving it to `ldm/restoration/codeformer/weights` folder.
|
||||
|
||||
You can use the `-ft` prompt argument to swap between CodeFormer and the default
|
||||
GFPGAN. The above-mentioned `-G` prompt argument will allow you to control the
|
||||
strength of the restoration effect.
|
||||
You can use the `-ft` prompt argument to swap between CodeFormer and the
|
||||
default GFPGAN. The above-mentioned `-G` prompt argument will allow
|
||||
you to control the strength of the restoration effect.
|
||||
|
||||
### Usage:
|
||||
|
||||
|
@ -20,10 +20,273 @@ wildcard `0.0.0.0`. For example:
|
||||
(ldm) ~/InvokeAI$ python3 scripts/invoke.py --web --host 0.0.0.0
|
||||
```
|
||||
|
||||
# Quick guided walkthrough of the WebGUI's features
|
||||
|
||||
While most of the WebGUI's features are intuitive, here is a guided
|
||||
walkthrough of its various components.
|
||||
|
||||
<img src="../assets/invoke-web-server-1.png" width=350>
|
||||
|
||||
The screenshot above shows the Text to Image tab of the WebGUI. There
|
||||
are three main sections:
|
||||
|
||||
1. A **control panel** on the left, which contains various settings
|
||||
for text to image generation. The most important part is the text
|
||||
field (currently showing `strawberry sushi`) for entering the text
|
||||
prompt, and the camera icon directly underneath that will render the
|
||||
image. We'll call this the *Invoke* button from now on.
|
||||
|
||||
2. The **current image** section in the middle, which shows a large
|
||||
format version of the image you are currently working on. A series of
|
||||
buttons at the top ("image to image", "Use All", "Use Seed", etc.) lets
|
||||
you modify the image in various ways.
|
||||
|
||||
3. A **gallery** section on the right that contains a history of the
|
||||
images you have generated. These images are read and written to the
|
||||
directory specified at launch time in `--outdir`.
|
||||
|
||||
In addition to these three elements, there is a series of icons for
|
||||
changing global settings, reporting bugs, and changing the theme on
|
||||
the upper right.
|
||||
|
||||
There is also a series of icons to the left of the control panel (see
|
||||
highlighted area in the screenshot below) which select among a series
|
||||
of tabs for performing different types of operations.
|
||||
|
||||
<img src="../assets/invoke-web-server-2.png">
|
||||
|
||||
From top to bottom, these are:
|
||||
|
||||
1. Text to Image - generate images from text
|
||||
2. Image to Image - from an uploaded starting image (drawing or photograph) generate a new one, modified by the text prompt
|
||||
3. Inpainting (pending) - Interactively erase portions of a starting image and have the AI fill in the erased region from a text prompt.
|
||||
4. Outpainting (pending) - Interactively add blank space to the borders of a starting image and fill in the background from a text prompt.
|
||||
5. Postprocessing (pending) - Interactively postprocess generated images using a variety of filters.
|
||||
|
||||
The inpainting, outpainting and postprocessing tabs are currently in
|
||||
development. However, limited versions of their features can already
|
||||
be accessed through the Text to Image and Image to Image tabs.
|
||||
|
||||
## Walkthrough
|
||||
|
||||
The following walkthrough will exercise most (but not all) of the
|
||||
WebGUI's feature set.
|
||||
|
||||
### Text to Image
|
||||
|
||||
1. Launch the WebGUI using `python scripts/invoke.py --web` and
|
||||
connect to it with your browser by accessing
|
||||
`http://localhost:9090`. If the browser and server are running on
|
||||
different machines on your LAN, add the option `--host 0.0.0.0` to the
|
||||
launch command line and connect to the machine hosting the web server
|
||||
using its IP address or domain name.
|
||||
|
||||
2. If all goes well, the WebGUI should come up and you'll see a green
|
||||
`connected` message on the upper right.
|
||||
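The launch step can be sketched in Python as follows. This is an illustrative sketch only: the helper names `build_launch_command`, `webgui_url` and `is_listening` are hypothetical, while the `--web` and `--host` options and port 9090 are taken from the text above.

```python
import socket


def build_launch_command(host=None):
    """Build the argv list for launching the WebGUI.

    Pass host="0.0.0.0" to accept connections from other machines on the LAN.
    """
    command = ["python", "scripts/invoke.py", "--web"]
    if host is not None:
        command += ["--host", host]
    return command


def webgui_url(server="localhost", port=9090):
    """Return the URL to open in the browser (9090 is the port used above)."""
    return f"http://{server}:{port}"


def is_listening(server="localhost", port=9090, timeout=2.0):
    """Return True if something accepts TCP connections at server:port."""
    try:
        with socket.create_connection((server, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(" ".join(build_launch_command(host="0.0.0.0")))
    print(webgui_url())
    print("listening" if is_listening() else "not reachable yet")
```

`is_listening` only confirms that a TCP port is open. The green `connected` message in the browser remains the real indication that the WebGUI is up.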
|
||||
#### Basics
|
||||
|
||||
3. Generate an image by typing *strawberry sushi* into the large
|
||||
prompt field on the upper left and then clicking on the Invoke button
|
||||
(the one with the Camera icon). After a short wait, you'll see a large
|
||||
image of sushi in the image panel, and a new thumbnail in the gallery
|
||||
on the right.
|
||||
|
||||
If you need more room on the screen, you can turn the gallery off
|
||||
by clicking on the **x** to the right of "Your Invocations". You can
|
||||
turn it back on later by clicking the image icon that appears in the
|
||||
gallery's place.
|
||||
|
||||
The images are written into the directory indicated by the `--outdir`
|
||||
option provided at script launch time. By default, this is
|
||||
`outputs/img-samples` under the InvokeAI directory.
|
||||
|
||||
4. Generate a bunch of strawberry sushi images by increasing the
|
||||
number of requested images by adjusting the Images counter just below
|
||||
the Camera button. As each is generated, it will be added to the
|
||||
gallery. You can switch the active image by clicking on the gallery
|
||||
thumbnails.
|
||||
|
||||
5. Try playing with different settings, including image width and
|
||||
height, the Sampler, the Steps and the CFG scale.
|
||||
|
||||
Image *Width* and *Height* do what you'd expect. However, be aware that
|
||||
larger images consume more VRAM and take longer to generate.
|
||||
|
||||
The *Sampler* controls how the AI selects the image to display. Some
|
||||
samplers are more "creative" than others and will produce a wider
|
||||
range of variations (see next section). Some samplers run faster than
|
||||
others.
|
||||
|
||||
*Steps* controls how many noising/denoising/sampling steps the AI will
|
||||
take. The higher this value, the more refined the image will be, but
|
||||
the longer the image will take to generate. A typical strategy is to
|
||||
generate images with a low number of steps in order to select one to
|
||||
work on further, and then regenerate it using a higher number of
|
||||
steps.
|
||||
|
||||
The *CFG Scale* controls how hard the AI tries to match the generated
|
||||
image to the input prompt. You can go as high or low as you like, but
|
||||
generally values greater than 20 won't improve things much, and values
|
||||
lower than 5 will produce unexpected images. There are complex
|
||||
interactions between *Steps*, *CFG Scale* and the *Sampler*, so
|
||||
experiment to find out what works for you.
|
||||
|
||||
6. To regenerate a previously-generated image, select the image you
|
||||
want and click *Use All*. This loads the text prompt and other
|
||||
original settings into the control panel. If you then press *Invoke*
|
||||
it will regenerate the image exactly. You can also selectively modify
|
||||
the prompt or other settings to tweak the image.
|
||||
|
||||
Alternatively, you may click on *Use Seed* to load just the image's
|
||||
seed, and leave other settings unchanged.
|
||||
|
||||
7. To regenerate a Stable Diffusion image that was generated by
|
||||
another SD package, you need to know its text prompt and its
|
||||
*Seed*. Copy-paste the prompt into the prompt box, unset the
|
||||
*Randomize Seed* control in the control panel, and copy-paste the
|
||||
desired *Seed* into its text field. When you Invoke, you will get
|
||||
something similar to the original image. It will not be exact unless
|
||||
you also set the correct values for the original sampler, CFG,
|
||||
steps and dimensions, but it will (usually) be close.
|
||||
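The difference between *Use All* and *Use Seed*, and the rule of thumb for *CFG Scale*, can be sketched as below. This is a model of the behavior described in the steps above, not the WebGUI's actual code: the `GenerationSettings` record and all function names are hypothetical, and the example values are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical record of the settings named in the walkthrough."""
    prompt: str
    seed: int
    sampler: str
    steps: int
    cfg_scale: float
    width: int
    height: int


def use_all(source):
    """Mimic *Use All*: copy every setting of the selected image."""
    return replace(source)


def use_seed(current, source):
    """Mimic *Use Seed*: take only the seed, keep the other current settings."""
    return replace(current, seed=source.seed)


def cfg_scale_note(cfg_scale):
    """Rule of thumb from the text above; not a hard limit."""
    if cfg_scale > 20:
        return "values above 20 rarely improve the image"
    if cfg_scale < 5:
        return "values below 5 tend to produce unexpected images"
    return "within the usual range"


if __name__ == "__main__":
    # Illustrative values only.
    original = GenerationSettings("strawberry sushi", 1234, "k_euler_a", 50, 7.5, 512, 512)
    current = replace(original, seed=99, steps=20)
    print(use_all(original) == original)   # every setting matches
    print(use_seed(current, original))     # seed copied, steps unchanged
    print(cfg_scale_note(25))
```

Only the *Use All* case carries every setting across, which is why it regenerates the image exactly while a seed plus prompt alone usually gets close.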
|
||||
#### Variations on a theme
|
||||
|
||||
5. Let's try generating some variations. Select your favorite sushi
|
||||
image from the gallery to load it. Then select "Use All" from the list
|
||||
of buttons above. This will load up all the settings used to generate
|
||||
this image, including its unique seed.
|
||||
|
||||
Go down to the Variations section of the Control Panel and set the
|
||||
button to On. Set Variation Amount to 0.2 to generate a modest
|
||||
number of variations on the image, and also set the Images counter
|
||||
to 4. Press the *Invoke* button. This will generate a series of related
|
||||
images. To obtain smaller variations, just lower the Variation
|
||||
Amount. You may also experiment with changing the Sampler. Some
|
||||
samplers generate more variability than others. *k_euler_a* is
|
||||
particularly creative, while *ddim* is pretty conservative.
|
||||
|
||||
6. For even more variations, experiment with increasing the setting
|
||||
for *Perlin*. This adds a bit of noise to the image generation
|
||||
process. Note that values of Perlin noise greater than 0.15 produce
|
||||
poor images for several of the samplers.
|
||||
|
||||
#### Facial reconstruction and upscaling
|
||||
|
||||
Stable Diffusion frequently produces mangled faces, particularly when
|
||||
there are multiple figures in the same scene. Stable Diffusion has
|
||||
particular issues with generating realistic eyes. InvokeAI provides
|
||||
the ability to reconstruct faces using either the GFPGAN or CodeFormer
|
||||
libraries. For more information see [POSTPROCESS](POSTPROCESS.md).
|
||||
|
||||
7. Invoke a prompt that generates a mangled face. A prompt that often
|
||||
gives this is "portrait of a lawyer, 3/4 shot" (this is not intended
|
||||
as a slur against lawyers!) Once you have an image that needs some
|
||||
touching up, load it into the Image panel, and press the button with
|
||||
the face icon (highlighted in the first screenshot below). A dialog
|
||||
box will appear. Leave *Strength* at 0.8 and press *Restore Faces*. If
|
||||
all goes well, the eyes and other aspects of the face will be improved
|
||||
(see the second screenshot).
|
||||
|
||||
<img src="../assets/invoke-web-server-3.png">
|
||||
<img src="../assets/invoke-web-server-4.png">
|
||||
|
||||
The facial reconstruction *Strength* field adjusts how aggressively
|
||||
the face library will try to alter the face. It can be as high as 1.0,
|
||||
but be aware that this often softens the face in an airbrushed style, losing
|
||||
some details. The default 0.8 is usually sufficient.
|
||||
|
||||
8. "Upscaling" is the process of increasing the size of an image while
|
||||
retaining the sharpness. InvokeAI uses an external library called
|
||||
"Real-ESRGAN" to do this. To invoke upscaling, simply select an image and
|
||||
press the *HD* button above it. You can select between 2X and 4X
|
||||
upscaling, and adjust the upscaling strength, which has much the same
|
||||
meaning as in facial reconstruction. Try running this on one of your
|
||||
previously-generated images.
|
||||
|
||||
9. Finally, you can run facial reconstruction and/or upscaling
|
||||
automatically after each Invocation. Go to the Advanced Options
|
||||
section of the Control Panel and turn on *Restore Face* and/or
|
||||
*Upscale*.
|
||||
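For readers who also use the command-line interface, the same post-processing choices map onto the `-G`, `-U` and `-ft` switches mentioned in the post-processing documentation. The sketch below composes such a switch string. The helper `postprocess_switches` is hypothetical, and the argument shapes (`-U <scale> <strength>`, `-ft codeformer`) are assumptions that should be checked against [POSTPROCESS](POSTPROCESS.md) before use.

```python
def postprocess_switches(restore_strength=None, upscale=None, face_tool=None):
    """Compose invoke> switches for face restoration and upscaling.

    restore_strength: float for -G, e.g. 0.8
    upscale: (scale, strength) tuple for -U, e.g. (2, 0.75)  [assumed shape]
    face_tool: value for -ft, e.g. "codeformer"              [assumed value]
    """
    parts = []
    if restore_strength is not None:
        if not 0.0 <= restore_strength <= 1.0:
            raise ValueError("restoration strength should be between 0.0 and 1.0")
        parts.append(f"-G {restore_strength}")
    if upscale is not None:
        scale, strength = upscale
        if scale not in (2, 4):
            raise ValueError("the walkthrough offers 2X and 4X upscaling")
        parts.append(f"-U {scale} {strength}")
    if face_tool is not None:
        parts.append(f"-ft {face_tool}")
    return " ".join(parts)


if __name__ == "__main__":
    # Illustrative prompt line combining restoration and upscaling.
    print('"portrait of a lawyer, 3/4 shot" ' + postprocess_switches(0.8, (2, 0.75)))
```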
|
||||
### Image to Image
|
||||
|
||||
InvokeAI lets you take an existing image and use it as the basis for a
|
||||
new creation. You can use any sort of image, including a photograph, a
|
||||
scanned sketch, or a digital drawing, as long as it is in PNG or JPEG
|
||||
format.
|
||||
|
||||
For this tutorial, we'll use files named
|
||||
[Lincoln-and-Parrot-512.png](../assets/Lincoln-and-Parrot-512.png),
|
||||
and
|
||||
[Lincoln-and-Parrot-512-transparent.png](../assets/Lincoln-and-Parrot-512-transparent.png).
|
||||
Download these images to your local machine now to continue with the walkthrough.
|
||||
|
||||
10. Click on the *Image to Image* tab icon, which is the second icon
|
||||
from the top on the left-hand side of the screen:
|
||||
|
||||
<img src="../assets/invoke-web-server-5.png">
|
||||
|
||||
This will bring you to a screen similar to the one shown here:
|
||||
|
||||
<img src="../assets/invoke-web-server-6.png" width=350>
|
||||
|
||||
Drag-and-drop the Lincoln-and-Parrot image into the Image panel, or
|
||||
click the blank area to get an upload dialog. The image will load into
|
||||
an area marked *Initial Image*. (The WebGUI will also load the most
|
||||
recently-generated image from the gallery into a section on the left,
|
||||
but this image will be replaced in the next step.)
|
||||
|
||||
11. Go to the prompt box and type *old sea captain with raven on
|
||||
shoulder* and press Invoke. A derived image will appear to the right
|
||||
of the original one:
|
||||
|
||||
<img src="../assets/invoke-web-server-7.png" width=350>
|
||||
|
||||
12. Experiment with the different settings. The most influential one
|
||||
in Image to Image is *Image to Image Strength* located about midway
|
||||
down the control panel. By default it is set to 0.75, but can range
|
||||
from 0.0 to 0.99. The higher the value, the more of the original image
|
||||
the AI will replace. A value of 0 will leave the initial image
|
||||
completely unchanged, while 0.99 will replace it completely. However,
|
||||
the Sampler and CFG Scale also influence the final result. You can
|
||||
also generate variations in the same way as described in Text to
|
||||
Image.
|
||||
|
||||
13. What if we only want to change certain part(s) of the image and
|
||||
leave the rest intact? This is called Inpainting, and a future version
|
||||
of the InvokeAI web server will provide an interactive painting canvas
|
||||
on which you can directly draw the areas you wish to Inpaint into. For
|
||||
now, you can achieve this effect by using an external photo-editing tool
|
||||
to make one or more regions of the image transparent as described in
|
||||
[INPAINTING.md](INPAINTING.md) and uploading that.
|
||||
|
||||
The file
|
||||
[Lincoln-and-Parrot-512-transparent.png](../assets/Lincoln-and-Parrot-512-transparent.png)
|
||||
is a version of the earlier image in which the area around the parrot
|
||||
has been replaced with transparency. Click on the "x" in the upper
|
||||
right of the Initial Image and upload the transparent version. Using
|
||||
the same prompt "old sea captain with raven on shoulder" try Invoking
|
||||
an image. This time, only the parrot will be replaced, leaving the
|
||||
rest of the original image intact:
|
||||
|
||||
<img src="../assets/invoke-web-server-8.png" width=350>
|
||||
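To build some intuition for the *Image to Image Strength* setting from step 12: in the upstream CompVis `img2img` script, the strength is multiplied by the number of sampling steps to decide how many denoising steps are applied to the initial image. The sketch below reproduces that arithmetic. Whether this fork computes it in exactly the same way is an assumption, and the function name `steps_applied` is hypothetical.

```python
def steps_applied(strength, steps):
    """Number of denoising steps actually run on the initial image.

    Mirrors `int(strength * steps)` from the upstream CompVis img2img script;
    treating this fork the same way is an assumption.
    """
    if not 0.0 <= strength <= 0.99:
        raise ValueError("the WebGUI accepts strengths from 0.0 to 0.99")
    return int(strength * steps)


if __name__ == "__main__":
    for strength in (0.0, 0.3, 0.75, 0.99):
        print(f"strength={strength:<4} -> {steps_applied(strength, 50)} of 50 steps")
```

Under this model a strength of 0 runs no denoising steps, which matches the statement that the initial image is left unchanged, while values near 0.99 run almost the full schedule.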
|
||||
## Parting remarks
|
||||
|
||||
This concludes the walkthrough, but there are several more features that you
|
||||
can explore. Please check out the [Command Line Interface](CLI.md)
|
||||
documentation for further explanation of the advanced features that
|
||||
were not covered here.
|
||||
|
||||
The WebGUI is under rapid development. Check back regularly for
|
||||
updates!
|
||||
|
||||
## Credits
|
||||
|
||||
Kudos to [Psychedelicious](https://github.com/psychedelicious),
|
||||
[BlessedCoolant](https://github.com/blessedcoolant), [Tesseract
|
||||
Cat](https://github.com/TesseractCat),
|
||||
[dagf2101](https://github.com/dagf2101), and many others who
|
||||
contributed to this code.
|
||||
|
||||
![Dream Web Server](../assets/invoke_web_server.png)
|
||||
|