From 5a22a83f4ceb7a3d57ee87ac1a86407a5211c249 Mon Sep 17 00:00:00 2001
From: Lincoln Stein
Date: Sun, 9 Oct 2022 11:38:39 -0400
Subject: [PATCH] add missing doc files

---
 README.md                    |  13 +-
 docs/features/POSTPROCESS.md |  66 ++++-----
 docs/features/WEB.md         | 265 ++++++++++++++++++++++++++++++++++-
 3 files changed, 302 insertions(+), 42 deletions(-)

diff --git a/README.md b/README.md
index aa778dcb56..9606d30b9c 100644
--- a/README.md
+++ b/README.md
@@ -41,10 +41,13 @@ _This repository was formerly known as lstein/stable-diffusion_
 [latest release link]: https://github.com/invoke-ai/InvokeAI/releases
 
-This is a fork of [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion), the open
-source text-to-image generator. It provides a streamlined process with various new features and
-options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on
-GPU cards with as little as 4 GB or RAM.
+This is a fork of
+[CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion),
+the open source text-to-image generator. It provides a streamlined
+process with various new features and options to aid the image
+generation process. It runs on Windows, Mac and Linux machines, on
+GPU cards with as little as 4 GB of RAM. It provides both a polished
+Web interface and an easy-to-use command-line interface.
 
 _Note: This fork is rapidly evolving. Please use the
 [Issues](https://github.com/invoke-ai/InvokeAI/issues) tab to report bugs and make feature
@@ -109,6 +112,7 @@ you can try starting `invoke.py` with the `--precision=float32` flag:
 
 #### Major Features
 
+- [Web Server](docs/features/WEB.md)
 - [Interactive Command Line Interface](docs/features/CLI.md)
 - [Image To Image](docs/features/IMG2IMG.md)
 - [Inpainting Support](docs/features/INPAINTING.md)
@@ -116,7 +120,6 @@ you can try starting `invoke.py` with the `--precision=float32` flag:
 - [Upscaling, face-restoration and outpainting](docs/features/POSTPROCESS.md)
 - [Seamless Tiling](docs/features/OTHER.md#seamless-tiling)
 - [Google Colab](docs/features/OTHER.md#google-colab)
-- [Web Server](docs/features/WEB.md)
 - [Reading Prompts From File](docs/features/PROMPTS.md#reading-prompts-from-a-file)
 - [Shortcut: Reusing Seeds](docs/features/OTHER.md#shortcuts-reusing-seeds)
 - [Prompt Blending](docs/features/PROMPTS.md#prompt-blending)
diff --git a/docs/features/POSTPROCESS.md b/docs/features/POSTPROCESS.md
index fbcd1c8005..b5156f54f0 100644
--- a/docs/features/POSTPROCESS.md
+++ b/docs/features/POSTPROCESS.md
@@ -20,39 +20,33 @@ The default face restoration module is GFPGAN. The default upscale is
 Real-ESRGAN. For an alternative face restoration module, see [CodeFormer Support] below.
 
-As of version 1.14, environment.yaml will install the Real-ESRGAN package into
-the standard install location for python packages, and will put GFPGAN into a
-subdirectory of "src" in the InvokeAI directory. (The reason for this is
-that the standard GFPGAN distribution has a minor bug that adversely affects
-image color.) Upscaling with Real-ESRGAN should "just work" without further
-intervention. Simply pass the --upscale (-U) option on the invoke> command line,
-or indicate the desired scale on the popup in the Web GUI.
+As of version 1.14, environment.yaml will install the Real-ESRGAN
+package into the standard install location for python packages, and
+will put GFPGAN into a subdirectory of "src" in the InvokeAI
+directory. Upscaling with Real-ESRGAN should "just work" without
+further intervention. Simply pass the --upscale (-U) option on the
+invoke> command line, or indicate the desired scale on the popup in
+the Web GUI.
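+
+For example, this hypothetical invoke> session (the prompt and the
+2X / 0.75 values are placeholders) requests a 2X upscale at 75%
+strength:
+
+```bash
+invoke> "strawberry sushi" -U 2 0.75
+```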
 
-For **GFPGAN** to work, there is one additional step needed. You will need to
-download and copy the GFPGAN
-[models file](https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth)
-into **src/gfpgan/experiments/pretrained_models**. On Mac and Linux systems,
-here's how you'd do it using **wget**:
+**GFPGAN** requires a series of downloadable model files to
+work. These are loaded when you run `scripts/preload_models.py`. If
+GFPGAN is failing with an error, please run the following from the
+InvokeAI directory:
 
-```bash
-wget https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.4.pth -P src/gfpgan/experiments/pretrained_models/
-```
+```bash
+python scripts/preload_models.py
+```
 
-Make sure that you're in the InvokeAI directory when you do this.
+If you do not run this script in advance, the GFPGAN module will
+attempt to download the model files the first time you try to perform
+facial reconstruction.
 
-Alternatively, if you have GFPGAN installed elsewhere, or if you are using an
-earlier version of this package which asked you to install GFPGAN in a sibling
-directory, you may use the `--gfpgan_dir` argument with `invoke.py` to set a
-custom path to your GFPGAN directory. _There are other GFPGAN related boot
-arguments if you wish to customize further._
-
-!!! warning "Internet connection needed"
-
-    Users whose GPU machines are isolated from the Internet (e.g.
-    on a University cluster) should be aware that the first time you run invoke.py with GFPGAN and
-    Real-ESRGAN turned on, it will try to download model files from the Internet. To rectify this, you
-    may run `python3 scripts/preload_models.py` after you have installed GFPGAN and all its
-    dependencies.
+Alternatively, if you have GFPGAN installed elsewhere, or if you are
+using an earlier version of this package which asked you to install
+GFPGAN in a sibling directory, you may use the `--gfpgan_dir` argument
+with `invoke.py` to set a custom path to your GFPGAN directory. _There
+are other GFPGAN-related boot arguments if you wish to customize
+further._
 
 ## Usage
 
@@ -124,15 +118,15 @@ actions.
 This repo also allows you to perform face restoration using
 [CodeFormer](https://github.com/sczhou/CodeFormer).
 
-In order to setup CodeFormer to work, you need to download the models like with
-GFPGAN. You can do this either by running `preload_models.py` or by manually
-downloading the
-[model file](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer.pth)
+To set up CodeFormer, you need to download the models just as you did
+for GFPGAN. You can do this either by running
+`preload_models.py` or by manually downloading the [model
+file](https://github.com/sczhou/CodeFormer/releases/download/v0.1.0/codeformer.pth)
 and saving it to the `ldm/restoration/codeformer/weights` folder.
 
-You can use `-ft` prompt argument to swap between CodeFormer and the default
-GFPGAN. The above mentioned `-G` prompt argument will allow you to control the
-strength of the restoration effect.
+You can use the `-ft` prompt argument to swap between CodeFormer and
+the default GFPGAN. The above-mentioned `-G` prompt argument will
+allow you to control the strength of the restoration effect.
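+
+For instance, a hypothetical command (the prompt and strength values
+are placeholders) that restores faces with CodeFormer at strength 0.8
+might look like this:
+
+```bash
+invoke> "portrait of an elderly gentleman" -G 0.8 -ft codeformer
+```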
 
 ### Usage:
 
diff --git a/docs/features/WEB.md b/docs/features/WEB.md
index 79f66314fa..e634a83be2 100644
--- a/docs/features/WEB.md
+++ b/docs/features/WEB.md
@@ -20,10 +20,273 @@ wildcard `0.0.0.0`.
 For example:
 
 ```
 (ldm) ~/InvokeAI$ python3 scripts/invoke.py --web --host 0.0.0.0
 ```
 
+# Quick guided walkthrough of the WebGUI's features
+
+While most of the WebGUI's features are intuitive, here is a guided
+walkthrough of its various components.
+
+The screenshot above shows the Text to Image tab of the WebGUI. There
+are three main sections:
+
+1. A **control panel** on the left, which contains various settings
+for text to image generation. The most important part is the text
+field (currently showing `strawberry sushi`) for entering the text
+prompt, and the camera icon directly underneath that will render the
+image. We'll call this the *Invoke* button from now on.
+
+2. The **current image** section in the middle, which shows a large
+format version of the image you are currently working on. A series of
+buttons at the top ("image to image", "Use All", "Use Seed", etc.)
+lets you modify the image in various ways.
+
+3. A **gallery** section on the right that contains a history of the
+images you have generated. These images are read and written to the
+directory specified at launch time in `--outdir`.
+
+In addition to these three elements, there is a series of icons for
+changing global settings, reporting bugs, and changing the theme on
+the upper right.
+
+There is also a series of icons to the left of the control panel (see
+the highlighted area in the screenshot below) which select among a
+series of tabs for performing different types of operations.
+
+From top to bottom, these are:
+
+1. Text to Image - generate images from text
+2. Image to Image - from an uploaded starting image (drawing or photograph) generate a new one, modified by the text prompt
+3. Inpainting (pending) - Interactively erase portions of a starting image and have the AI fill in the erased region from a text prompt.
+4. Outpainting (pending) - Interactively add blank space to the borders of a starting image and fill in the background from a text prompt.
+5. Postprocessing (pending) - Interactively postprocess generated images using a variety of filters.
+
+The inpainting, outpainting and postprocessing tabs are currently in
+development. However, limited versions of their features can already
+be accessed through the Text to Image and Image to Image tabs.
+
+## Walkthrough
+
+The following walkthrough will exercise most (but not all) of the
+WebGUI's feature set.
+
+### Text to Image
+
+1. Launch the WebGUI using `python scripts/invoke.py --web` and
+connect to it with your browser by accessing
+`http://localhost:9090`. If the browser and server are running on
+different machines on your LAN, add the option `--host 0.0.0.0` to the
+launch command line and connect to the machine hosting the web server
+using its IP address or domain name.
+
+2. If all goes well, the WebGUI should come up and you'll see a green
+`connected` message on the upper right.
+
+#### Basics
+
+3. Generate an image by typing *strawberry sushi* into the large
+prompt field on the upper left and then clicking on the Invoke button
+(the one with the Camera icon). After a short wait, you'll see a large
+image of sushi in the image panel, and a new thumbnail in the gallery
+on the right.
+
+If you need more room on the screen, you can turn the gallery off
+by clicking on the **x** to the right of "Your Invocations". You can
+turn it back on later by clicking the image icon that appears in the
+gallery's place.
+
+The images are written into the directory indicated by the `--outdir`
+option provided at script launch time. By default, this is
+`outputs/img-samples` under the InvokeAI directory.
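+
+For example, to launch the server and write images to a different
+folder (the path below is purely illustrative):
+
+```bash
+python scripts/invoke.py --web --outdir /home/me/sushi-pictures
+```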
+
+4. Generate a bunch of strawberry sushi images by increasing the
+number of requested images with the Images counter just below the
+Camera button. As each is generated, it will be added to the
+gallery. You can switch the active image by clicking on the gallery
+thumbnails.
+
+5. Try playing with different settings, including image width and
+height, the Sampler, the Steps and the CFG scale.
+
+Image *Width* and *Height* do what you'd expect. However, be aware
+that larger images consume more VRAM and take longer to generate.
+
+The *Sampler* controls the denoising algorithm used to generate the
+image. Some samplers are more "creative" than others and will produce
+a wider range of variations (see next section). Some samplers run
+faster than others.
+
+*Steps* controls how many noising/denoising/sampling steps the AI
+will take. The higher this value, the more refined the image will be,
+but the longer the image will take to generate. A typical strategy is
+to generate images with a low number of steps in order to select one
+to work on further, and then regenerate it using a higher number of
+steps.
+
+The *CFG Scale* controls how hard the AI tries to match the generated
+image to the input prompt. You can go as high or low as you like, but
+generally values greater than 20 won't improve things much, and
+values lower than 5 will produce unexpected images. There are complex
+interactions between *Steps*, *CFG Scale* and the *Sampler*, so
+experiment to find out what works for you.
+
+6. To regenerate a previously-generated image, select the image you
+want and click *Use All*. This loads the text prompt and other
+original settings into the control panel. If you then press *Invoke*
+it will regenerate the image exactly. You can also selectively modify
+the prompt or other settings to tweak the image.
+
+Alternatively, you may click on *Use Seed* to load just the image's
+seed, and leave other settings unchanged.
+
+7. To regenerate a Stable Diffusion image that was generated by
+another SD package, you need to know its text prompt and its
+*Seed*. Copy-paste the prompt into the prompt box, unset the
+*Randomize Seed* control in the control panel, and copy-paste the
+desired *Seed* into its text field. When you Invoke, you will get
+something similar to the original image. It will not be exact unless
+you also set the correct values for the original sampler, CFG,
+steps and dimensions, but it will (usually) be close.
+
+#### Variations on a theme
+
+8. Let's try generating some variations. Select your favorite sushi
+image from the gallery to load it. Then select *Use All* from the
+list of buttons above. This will load up all the settings used to
+generate this image, including its unique seed.
+
+Go down to the Variations section of the Control Panel and set the
+button to On. Set Variation Amount to 0.2 to generate a modest
+number of variations on the image, and also set the Images counter to
+4. Press the *Invoke* button. This will generate a series of related
+images. To obtain smaller variations, just lower the Variation
+Amount. You may also experiment with changing the Sampler. Some
+samplers generate more variability than others. *k_euler_a* is
+particularly creative, while *ddim* is pretty conservative.
+
+9. For even more variations, experiment with increasing the setting
+for *Perlin*. This adds a bit of noise to the image generation
+process. Note that values of Perlin noise greater than 0.15 produce
+poor images for several of the samplers.
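+
+If you prefer the command line, the same controls map onto invoke>
+flags. This hypothetical session (the seed and all values are
+placeholders) requests four mild variations of seed 1234 with a touch
+of Perlin noise:
+
+```bash
+invoke> "strawberry sushi" -S 1234 -n4 -v 0.2 --perlin 0.1
+```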
+
+#### Facial reconstruction and upscaling
+
+Stable Diffusion frequently produces mangled faces, particularly when
+there are multiple figures in the same scene, and it has particular
+trouble generating realistic eyes. InvokeAI provides the ability to
+reconstruct faces using either the GFPGAN or CodeFormer
+libraries. For more information see [POSTPROCESS](POSTPROCESS.md).
+
+10. Invoke a prompt that generates a mangled face. A prompt that
+often gives this is "portrait of a lawyer, 3/4 shot" (this is not
+intended as a slur against lawyers!) Once you have an image that
+needs some touching up, load it into the Image panel, and press the
+button with the face icon (highlighted in the first screenshot
+below). A dialog box will appear. Leave *Strength* at 0.8 and press
+*Restore Faces*. If all goes well, the eyes and other aspects of the
+face will be improved (see the second screenshot).
+
+The facial reconstruction *Strength* field adjusts how aggressively
+the face library will try to alter the face. It can be as high as
+1.0, but be aware that this often softens the face in an airbrushed
+style, losing some details. The default 0.8 is usually sufficient.
+
+11. "Upscaling" is the process of increasing the size of an image
+while retaining the sharpness. InvokeAI uses an external library
+called "ESRGAN" to do this. To invoke upscaling, simply select an
+image and press the *HD* button above it. You can select between 2X
+and 4X upscaling, and adjust the upscaling strength, which has much
+the same meaning as in facial reconstruction. Try running this on one
+of your previously-generated images.
+
+12. Finally, you can run facial reconstruction and/or upscaling
+automatically after each Invocation. Go to the Advanced Options
+section of the Control Panel and turn on *Restore Face* and/or
+*Upscale*.
+
+### Image to Image
+
+InvokeAI lets you take an existing image and use it as the basis for
+a new creation. You can use any sort of image, including a
+photograph, a scanned sketch, or a digital drawing, as long as it is
+in PNG or JPEG format.
+
+For this tutorial, we'll use files named
+[Lincoln-and-Parrot-512.png](../assets/Lincoln-and-Parrot-512.png)
+and
+[Lincoln-and-Parrot-512-transparent.png](../assets/Lincoln-and-Parrot-512-transparent.png).
+Download these images to your local machine now to continue with the
+walkthrough.
+
+13. Click on the *Image to Image* tab icon, which is the second icon
+from the top on the left-hand side of the screen:
+
+This will bring you to a screen similar to the one shown here:
+
+Drag-and-drop the Lincoln-and-Parrot image into the Image panel, or
+click the blank area to get an upload dialog. The image will load
+into an area marked *Initial Image*. (The WebGUI will also load the
+most recently-generated image from the gallery into a section on the
+left, but this image will be replaced in the next step.)
+
+14. Go to the prompt box and type *old sea captain with raven on
+shoulder* and press Invoke. A derived image will appear to the right
+of the original one:
+
+15. Experiment with the different settings. The most influential one
+in Image to Image is *Image to Image Strength* located about midway
+down the control panel. By default it is set to 0.75, but can range
+from 0.0 to 0.99. The higher the value, the more of the original
+image the AI will replace. A value of 0 will leave the initial image
+completely unchanged, while 0.99 will replace it completely. However,
+the Sampler and CFG Scale also influence the final result. You can
+also generate variations in the same way as described in Text to
+Image.
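+
+To reproduce this image-to-image step from the command line, a
+hypothetical invocation (the image path and strength value are
+placeholders) supplies the starting image with -I and the strength
+with -f:
+
+```bash
+invoke> "old sea captain with raven on shoulder" -I ./Lincoln-and-Parrot-512.png -f 0.75
+```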
+
+16. What if we only want to change certain part(s) of the image and
+leave the rest intact? This is called Inpainting, and a future
+version of the InvokeAI web server will provide an interactive
+painting canvas on which you can directly draw the areas you wish to
+Inpaint into. For now, you can achieve this effect by using an
+external photo editor to make one or more regions of the image
+transparent as described in [INPAINTING.md](INPAINTING.md), and then
+uploading that.
+
+The file
+[Lincoln-and-Parrot-512-transparent.png](../assets/Lincoln-and-Parrot-512-transparent.png)
+is a version of the earlier image in which the area around the parrot
+has been replaced with transparency. Click on the "x" in the upper
+right of the Initial Image and upload the transparent version. Using
+the same prompt "old sea captain with raven on shoulder", try
+Invoking an image. This time, only the parrot will be replaced,
+leaving the rest of the original image intact:
+
+## Parting remarks
+
+This concludes the walkthrough, but there are several more features
+that you can explore. Please check out the
+[Command Line Interface](CLI.md) documentation for further
+explanation of the advanced features that were not covered here.
+
+The WebGUI is under rapid development. Check back regularly for
+updates!
+
+## Credits
+
 Kudos to [Psychedelicious](https://github.com/psychedelicious),
 [BlessedCoolant](https://github.com/blessedcoolant),
 [Tesseract Cat](https://github.com/TesseractCat),
 [dagf2101](https://github.com/dagf2101), and many others who
 contributed to this code.
 
-![Dream Web Server](../assets/invoke_web_server.png)