Merge branch 'development' into main

This merge adds the following major features:

* Support for image variations.

* Security fix for webGUI (binds to localhost by default; use
--host=0.0.0.0 to allow access from external interfaces; see the example
after this list).

* Scalable configs/models.yaml configuration file for adding more
models as they become available.

* More tuning and exception handling for M1 hardware running MPS.

* Various documentation fixes.
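
For example, with the new web-server flags added to scripts/dream.py (a sketch; adjust host and port to your needs):

```
# default: the web GUI now binds to 127.0.0.1 (localhost) on port 9090
python scripts/dream.py --web

# expose it to other machines on your network
python scripts/dream.py --web --host 0.0.0.0 --port 9090
```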
This commit is contained in:
Lincoln Stein 2022-09-03 11:58:46 -04:00
commit 4ffdf73412
16 changed files with 820 additions and 107 deletions

View File

@ -1,37 +1,81 @@
# Apple Silicon Mac Users
# macOS Instructions
Several people have gotten Stable Diffusion to work on Apple Silicon
Macs using Anaconda, miniforge, etc. I've gathered up most of their instructions and
put them in this fork (and readme). Things have moved really fast and so these
instructions change often. Hopefully things will settle down a little.
Requirements
There are several places where people are discussing Apple
MPS functionality: [the original CompVis
issue](https://github.com/CompVis/stable-diffusion/issues/25), and generally on
[lstein's fork](https://github.com/lstein/stable-diffusion/).
- macOS 12.3 Monterey or later
- Python
- Patience
- Apple Silicon*
You have to have macOS 12.3 Monterey or later. Anything earlier than that won't work.
*I haven't tested any of this on Intel Macs but I have read that one person got
it to work, so Apple Silicon might not be required.
Tested on a 2022 MacBook Air (M2) with 10-core GPU and 24 GB unified memory.
Things have moved really fast, so these instructions change frequently and are
often out-of-date. One of the problems is that there are so many different ways to
run this.
How to:
We are trying to build a testing setup so that when we make changes it doesn't
always break.
How to (this hasn't been 100% tested yet):
First get the weights checkpoint download started - it's big:
1. Sign up at https://huggingface.co
2. Go to the [Stable Diffusion model page](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original)
3. Accept the terms and click Access Repository:
4. Download [sd-v1-4.ckpt (4.27 GB)](https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/blob/main/sd-v1-4.ckpt) and note where you have saved it (probably the Downloads folder)
While that is downloading, open Terminal and run the following commands one at a time.
```
# install brew (and Xcode command line tools):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
# install cmake, protobuf, and rust:
brew install cmake protobuf rust
# install miniconda (M1 arm64 version):
curl https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -o Miniconda3-latest-MacOSX-arm64.sh
/bin/bash Miniconda3-latest-MacOSX-arm64.sh
# clone the repo
git clone https://github.com/lstein/stable-diffusion.git
cd stable-diffusion
#
# wait until the checkpoint file has downloaded, then proceed
#
# create symlink to checkpoint
mkdir -p models/ldm/stable-diffusion-v1/
PATH_TO_CKPT="$HOME/Documents/stable-diffusion-v-1-4-original" # or wherever yours is.
PATH_TO_CKPT="$HOME/Downloads" # or wherever you saved sd-v1-4.ckpt
ln -s "$PATH_TO_CKPT/sd-v1-4.ckpt" models/ldm/stable-diffusion-v1/model.ckpt
CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
# install packages
PIP_EXISTS_ACTION=w CONDA_SUBDIR=osx-arm64 conda env create -f environment-mac.yaml
conda activate ldm
# only need to do this once
python scripts/preload_models.py
# run SD!
python scripts/dream.py --full_precision # half-precision requires autocast and won't work
```
After you follow all the instructions and run dream.py, you might get several errors. Here are the errors I've seen and found solutions for.
The original scripts should work as well.
```
python scripts/orig_scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms
```
Note: `export PIP_EXISTS_ACTION=w` is a precaution against `conda env create -f environment-mac.yaml`
never finishing in some situations, so it isn't required but won't hurt.
After you follow all the instructions and run dream.py, you might get several
errors. Here are the errors I've seen and found solutions for.
### Is it slow?
@ -52,27 +96,37 @@ One debugging step is to update to the latest version of PyTorch nightly.
conda install pytorch torchvision torchaudio -c pytorch-nightly
Or you can clean everything up.
If `conda env create -f environment-mac.yaml` takes forever, run this.
git clean -f
And run this.
conda clean --yes --all
Or you can reset Anaconda.
Or you could reset Anaconda.
conda update --force-reinstall -y -n base -c defaults conda
### "No module named cv2" (or some other module)
### "No module named cv2", torch, 'ldm', 'transformers', 'taming', etc.
Did you remember to `conda activate ldm`? If your terminal prompt
There are several causes of these errors.
First, did you remember to `conda activate ldm`? If your terminal prompt
begins with "(ldm)" then you activated it. If it begins with "(base)" or
something else, you haven't.
If it says you're missing taming you need to rebuild your virtual
Second, you might've run `./scripts/preload_models.py` or `./scripts/dream.py`
instead of `python ./scripts/preload_models.py` or `python ./scripts/dream.py`.
The cause of this error is long so it's below.
Third, if it says you're missing taming, you need to rebuild your virtual
environment.
conda env remove -n ldm
conda env create -f environment-mac.yaml
If you have activated the ldm virtual environment and tried rebuilding
Fourth, if you have activated the ldm virtual environment and tried rebuilding
it, the problem may be that I have something installed that you
don't and you'll just need to install it manually. Make sure you
activate the virtual environment so it installs there instead of
@ -83,6 +137,56 @@ globally.
You might also need to install Rust (I mention this again below).
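For example, to add a missing module to the active ldm environment (a sketch; `opencv-python`, which provides `cv2`, is just one illustration of a missing module):
```
conda activate ldm
# install whichever package provides the module named in the error
pip install opencv-python
```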
### How many snakes are living in your computer?
Here's the reason why you have to specify which python to use. There are
several versions of python on macOS and the computer may be
picking the wrong one. More specifically, preload_models.py and dream.py say to
use the first `python3` found in the PATH environment variable. You can see which one
is being picked with `which python3`. These are the most likely paths you'll see.
% which python3
/usr/bin/python3
The above path is part of the OS. However, that path is a stub that asks you if
you want to install Xcode. If you have Xcode installed already,
/usr/bin/python3 will execute /Library/Developer/CommandLineTools/usr/bin/python3 or
/Applications/Xcode.app/Contents/Developer/usr/bin/python3 (depending on which
Xcode you've selected with `xcode-select`).
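You can check which developer directory is currently selected like this (a sketch; the output shown is illustrative and yours may differ):
% xcode-select -p
/Library/Developer/CommandLineTools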
% which python3
/opt/homebrew/bin/python3
If you installed python3 with Homebrew and you've modified your path to search
for Homebrew binaries before system ones, you'll see the above path.
% which python
/opt/anaconda3/bin/python
If you drop the "3" you get an entirely different python: this is what you'll
see if you have Anaconda installed (there is an /opt/anaconda3/bin/python3 as
well). Note: starting in macOS 12.3, /usr/bin/python no longer exists (it was
Python 2 anyway).
(ldm) % which python
/Users/name/miniforge3/envs/ldm/bin/python
This is what you'll see if you have miniforge and you've correctly activated
the ldm environment. This is the goal.
It's all a mess and you should know [how to modify the path environment variable](https://support.apple.com/guide/terminal/use-environment-variables-apd382cc5fa-4f58-4449-b20a-41c53c006f8f/mac)
if you want to fix it. Here's a brief list of the places where you can modify it
(I don't really have the time to explain it all here).
- ~/.zshrc
- ~/.bash_profile
- ~/.bashrc
- /etc/paths.d
- /etc/paths
Which one you use will depend on what you have installed, though putting a file
in /etc/paths.d is what I prefer to do; a sketch of that approach follows.
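For example (a minimal sketch; substitute whatever directory you actually want on your PATH):
```
# create a one-line file that path_helper picks up when a new login shell starts
echo "/opt/homebrew/bin" | sudo tee /etc/paths.d/homebrew
# open a new terminal window, then re-check with `which python3`
```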
### Debugging?
@ -139,7 +243,7 @@ the environment variable `CONDA_SUBDIR=osx-arm64`, like so:
This error happens with Anaconda on Macs when the Intel-only `mkl` is pulled in by
a dependency. [nomkl](https://stackoverflow.com/questions/66224879/what-is-the-nomkl-python-package-used-for)
is a metapackage designed to prevent this, by making it impossible to install
`mkl`, but if your environment is already broken it may not work.
Do *not* use `os.environ['KMP_DUPLICATE_LIB_OK']='True'` or equivalents as this
masks the underlying issue of using Intel packages.
@ -207,7 +311,7 @@ change instead. This is a 32-bit vs 16-bit problem.
What? Intel? On an Apple Silicon?
Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

View File

@ -21,17 +21,21 @@ text-to-image generator. This fork supports:
2. A basic Web interface that allows you to run a local web server for
generating images in your browser.
3. A notebook for running the code on Google Colab.
4. Support for img2img in which you provide a seed image to guide the
3. Support for img2img in which you provide a seed image to guide the
image creation. (inpainting & masking coming soon)
4. A notebook for running the code on Google Colab.
5. Upscaling and face fixing using the optional ESRGAN and GFPGAN
packages.
6. Weighted subprompts for prompt tuning.
7. Textual inversion for customization of the prompt language and images.
7. [Image variations](VARIATIONS.md) which allow you to systematically
generate variations of an image you like and combine two or more
images to merge the best features of both.
8. Textual inversion for customization of the prompt language and images.
8. ...and more!
@ -42,8 +46,11 @@ improvements and bug fixes.
# Table of Contents
1. [Major Features](#features)
2. [Changelog](#latest)
2. [Changelog](#latest-changes)
3. [Installation](#installation)
1. [Linux](#linux)
1. [Windows](#windows)
1. [MacOS](README-Mac-MPS.md)
4. [Troubleshooting](#troubleshooting)
5. [Contributing](#contributing)
6. [Support](#support)
@ -331,8 +338,10 @@ and introducing a new vocabulary to the fixed model.
To train, prepare a folder that contains images sized at 512x512 and execute the following:
WINDOWS: As the default backend is not available on Windows, if you're using that platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND=gloo`
```
# As the default backend is not available on Windows, if you're using that platform, execute SET PL_TORCH_DISTRIBUTED_BACKEND=gloo
(ldm) ~/stable-diffusion$ python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \
-t \
--actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
@ -505,6 +514,30 @@ This will bring your local copy into sync with the remote one.
## Windows
### Notebook install (semi-automated)
We have a
[Jupyter notebook](https://github.com/lstein/stable-diffusion/blob/main/Stable-Diffusion-local-Windows.ipynb)
with cell-by-cell installation steps. It will download the code in this repo as
one of the steps, so instead of cloning this repo, simply download the notebook
from the link above and load it up in VSCode (with the appropriate
extensions installed), Jupyter, or JupyterLab, and start running the cells one by one.
Note that you will need NVIDIA drivers, Python 3.10, and Git installed
beforehand - simplified
[step-by-step instructions](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)
are available in the wiki (you'll only need steps 1, 2, & 3).
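A quick way to sanity-check those prerequisites from a command prompt (a sketch; version output will vary):
```
python --version   # should report Python 3.10.x
git --version      # confirms Git is on the PATH
nvidia-smi         # confirms the NVIDIA driver is installed and sees your GPU
```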
### Manual installs
#### pip
See
[Easy-peasy Windows install](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)
in the wiki.
#### Conda
1. Install Anaconda3 (miniconda3 version) from here: https://docs.anaconda.com/anaconda/install/windows/
2. Install Git from here: https://git-scm.com/download/win

View File

@ -0,0 +1,259 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Easy-peasy Windows install"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that you will need NVIDIA drivers, Python 3.10, and Git installed\n",
"beforehand - simplified\n",
"[step-by-step instructions](https://github.com/lstein/stable-diffusion/wiki/Easy-peasy-Windows-install)\n",
"are available in the wiki (you'll only need steps 1, 2, & 3 )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Run each cell in turn. In VSCode, either hit SHIFT-ENTER, or click on the little ▶️ to the left of the cell. In Jupyter/JupyterLab, you **must** hit SHIFT-ENTER"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install pew"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cmd\n",
"git clone https://github.com/lstein/stable-diffusion.git"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd stable-diffusion"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile requirements.txt\n",
"albumentations==0.4.3\n",
"einops==0.3.0\n",
"huggingface-hub==0.8.1\n",
"imageio-ffmpeg==0.4.2\n",
"imageio==2.9.0\n",
"kornia==0.6.0\n",
"omegaconf==2.1.1\n",
"opencv-python==4.6.0.66\n",
"pillow==9.2.0\n",
"pudb==2019.2\n",
"pytorch-lightning==1.4.2\n",
"streamlit==1.12.0\n",
"# Regular \"taming-transformers\" doesn't seem to work\n",
"taming-transformers-rom1504==0.0.6\n",
"test-tube>=0.7.5\n",
"torch-fidelity==0.3.0\n",
"torchmetrics==0.6.0\n",
"torchvision==0.12.0\n",
"transformers==4.19.2\n",
"git+https://github.com/openai/CLIP.git@main#egg=clip\n",
"git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion\n",
"# No CUDA in PyPi builds\n",
"torch@https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp310-cp310-win_amd64.whl\n",
"# No MKL in PyPi builds (faster, more robust than OpenBLAS)\n",
"numpy@https://download.lfd.uci.edu/pythonlibs/archived/numpy-1.22.4+mkl-cp310-cp310-win_amd64.whl\n",
"-e .\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cmd\n",
"pew new --python 3.10 -r requirements.txt --dont-activate ldm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Switch the notebook kernel to the new 'ldm' environment!\n",
"\n",
"## VSCode: restart VSCode and come back to this cell\n",
"\n",
"1. Ctrl+Shift+P\n",
"1. Type \"Select Interpreter\" and select \"Jupyter: Select Interpreter to Start Jupyter Server\"\n",
"1. VSCode will say that it needs to install packages. Click the \"Install\" button.\n",
"1. Once the install is finished, do 1 & 2 again\n",
"1. Pick 'ldm'\n",
"1. Run the following cell"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%cd stable-diffusion"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"## Jupyter/JupyterLab\n",
"\n",
"1. Run the cell below\n",
"1. Click on the toolbar where it says \"(ipyknel)\" ↗️. You should get a pop-up asking you to \"Select Kernel\". Pick 'ldm' from the drop-down.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### DO NOT RUN THE FOLLOWING CELL IF YOU ARE USING VSCODE!!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# DO NOT RUN THIS CELL IF YOU ARE USING VSCODE!!\n",
"%%cmd\n",
"pew workon ldm\n",
"pip3 install ipykernel\n",
"python -m ipykernel install --name=ldm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### When running the next cell, Jupyter/JupyterLab users might get a warning saying \"IProgress not found\". This can be ignored."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%run \"scripts/preload_models.py\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%cmd\n",
"mkdir \"models/ldm/stable-diffusion-v1\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now copy the SD model you downloaded from Hugging Face into the above new directory, and (if necessary) rename it to 'model.ckpt'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Now go create some magic!\n",
"\n",
"VSCode\n",
"\n",
"- The actual input box for the 'dream' prompt will appear at the very top of the VSCode window. Type in your commands and hit 'ENTER'.\n",
"- To quit, hit the 'Interrupt' button in the toolbar up there ⬆️ a couple of times, then hit ENTER (you'll probably see a terrifying traceback from Python - just ignore it).\n",
"\n",
"Jupyter/JupyterLab\n",
"\n",
"- The input box for the 'dream' prompt will appear below. Type in your commands and hit 'ENTER'.\n",
"- To quit, hit the interrupt button (⏹️) in the toolbar up there ⬆️ a couple of times, then hit ENTER (you'll probably see a terrifying traceback from Python - just ignore it)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%run \"scripts/dream.py\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Once this seems to be working well, you can try opening a terminal\n",
"\n",
"- VSCode: type ('CTRL+`')\n",
"- Jupyter/JupyterLab: File|New Terminal\n",
"- Or jump out of the notebook entirely, and open Powershell/Command Prompt\n",
"\n",
"Now:\n",
"\n",
"1. `cd` to wherever the 'stable-diffusion' directory is\n",
"1. Run `pew workon ldm`\n",
"1. Run `winpty python scripts\\dream.py`"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.6 ('ldm')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
},
"vscode": {
"interpreter": {
"hash": "a05e4574567b7bc2c98f7f9aa579f9ea5b8739b54844ab610ac85881c4be2659"
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}

113
VARIATIONS.md Normal file
View File

@ -0,0 +1,113 @@
# Cheat Sheet for Generating Variations
Release 1.13 of SD-Dream adds support for image variations. There are two things that you can do:
1. Generate a series of systematic variations of an image, given a
prompt. The amount of variation from one image to the next can be
controlled.
2. Given two or more variations that you like, you can combine them in
a weighted fashion.
This cheat sheet provides a quick guide for how this works in
practice, using variations to create the desired image of Xena,
Warrior Princess.
## Step 1 -- find a base image that you like
The prompt we will use throughout is "lucy lawless as xena, warrior
princess, character portrait, high resolution." This will be indicated
as "prompt" in the examples below.
First we let SD create a series of images in the usual way, in this case
requesting six iterations:
~~~
dream> lucy lawless as xena, warrior princess, character portrait, high resolution -n6
...
Outputs:
./outputs/Xena/000001.1579445059.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S1579445059
./outputs/Xena/000001.1880768722.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S1880768722
./outputs/Xena/000001.332057179.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S332057179
./outputs/Xena/000001.2224800325.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S2224800325
./outputs/Xena/000001.465250761.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S465250761
./outputs/Xena/000001.3357757885.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -S3357757885
~~~
The one with seed 3357757885 looks nice:
<img src="static/variation_walkthru/000001.3357757885.png"/>
Let's try to generate some variations. Using the same seed, we pass
the argument -v0.2 (or --variation_amount), which generates a series of
variations each differing by a variation amount of 0.2. This number
ranges from 0 to 1.0, with higher numbers being larger amounts of
variation.
~~~
dream> "prompt" -n6 -S3357757885 -v0.2
...
Outputs:
./outputs/Xena/000002.784039624.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 784039624:0.2 -S3357757885
./outputs/Xena/000002.3647897225.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.2 -S3357757885
./outputs/Xena/000002.917731034.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 917731034:0.2 -S3357757885
./outputs/Xena/000002.4116285959.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 4116285959:0.2 -S3357757885
./outputs/Xena/000002.1614299449.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 1614299449:0.2 -S3357757885
./outputs/Xena/000002.1335553075.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 1335553075:0.2 -S3357757885
~~~
Note that the output for each image has a -V option giving the
"variant subseed" for that image, consisting of a seed followed by the
variation amount used to generate it.
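That metadata can be fed back in: passing the original seed together with the printed subseed and weight should regenerate that particular variation (a sketch, using the first variation above):
~~~
dream> "prompt" -S3357757885 -V3647897225:0.2
~~~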
This gives us a series of closely-related variations, including the
two shown here.
<img src="static/variation_walkthru/000002.3647897225.png">
<img src="static/variation_walkthru/000002.1614299449.png">
I like the expression on Xena's face in the first one (subseed
3647897225), and the armor on her shoulder in the second one (subseed
1614299449). Can we combine them to get the best of both worlds?
We combine the two variations using -V (--with_variations). Again, we
must provide the seed for the originally-chosen image in order for
this to work.
~~~
dream> "prompt" -S3357757885 -V3647897225,0.1;1614299449,0.1
Outputs:
./outputs/Xena/000003.1614299449.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1 -S3357757885
~~~
Here we are providing equal weights (0.1 and 0.1) for both
subseeds. The resulting image is close, but not exactly what I
wanted:
<img src="static/variation_walkthru/000003.1614299449.png">
We could either try combining the images with different weights, or
generate more variations around the almost-but-not-quite image. We
do the latter, using both the -V (combining) and -v (variation
strength) options. Note that we use -n6 to generate 6 variations:
~~~~
dream> "prompt" -S3357757885 -V3647897225,0.1;1614299449,0.1 -v0.05 -n6
Outputs:
./outputs/Xena/000004.3279757577.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,3279757577:0.05 -S3357757885
./outputs/Xena/000004.2853129515.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,2853129515:0.05 -S3357757885
./outputs/Xena/000004.3747154981.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,3747154981:0.05 -S3357757885
./outputs/Xena/000004.2664260391.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,2664260391:0.05 -S3357757885
./outputs/Xena/000004.1642517170.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,1642517170:0.05 -S3357757885
./outputs/Xena/000004.2183375608.png: "prompt" -s50 -W512 -H512 -C7.5 -Ak_lms -V 3647897225:0.1,1614299449:0.1,2183375608:0.05 -S3357757885
~~~~
This produces six images, all slight variations on the combination of
the chosen two images. Here's the one I like best:
<img src="static/variation_walkthru/000004.3747154981.png">
As you can see, this is a very powerful tool, which, when combined with
subprompt weighting, gives you great control over the content and
quality of your generated images.

18
configs/models.yaml Normal file
View File

@ -0,0 +1,18 @@
# This file describes the alternative machine learning models
# available to the dream script.
#
# To add a new model, follow the examples below. Each
# model requires a model config file, a weights file,
# and the width and height of the images it
# was trained on.
laion400m:
config: configs/latent-diffusion/txt2img-1p4B-eval.yaml
weights: models/ldm/text2img-large/model.ckpt
width: 256
height: 256
stable-diffusion-1.4:
config: configs/stable-diffusion/v1-inference.yaml
weights: models/ldm/stable-diffusion-v1/model.ckpt
width: 512
height: 512
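Once an entry exists in configs/models.yaml it can be selected at launch with the new --model flag (a sketch; these flags and defaults are the ones added to scripts/dream.py in this merge):
```
# select the laion400m entry defined above (stable-diffusion-1.4 is the default)
python scripts/dream.py --model laion400m
# --config defaults to configs/models.yaml; point it elsewhere to use another file
```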

View File

@ -69,6 +69,11 @@ class PromptFormatter:
switches.append(f'-G{opt.gfpgan_strength}')
if opt.upscale:
switches.append(f'-U {" ".join([str(u) for u in opt.upscale])}')
if opt.variation_amount > 0:
switches.append(f'-v{opt.variation_amount}')
if opt.with_variations:
formatted_variations = ','.join(f'{seed}:{weight}' for seed, weight in opt.with_variations)
switches.append(f'-V{formatted_variations}')
if t2i.full_precision:
switches.append('-F')
return ' '.join(switches)

View File

@ -66,8 +66,8 @@ class KSampler(object):
img_callback(k_callback_values['x'], k_callback_values['i'])
sigmas = self.model.get_sigmas(S)
if x_T:
x = x_T
if x_T is not None:
x = x_T * sigmas[0]
else:
x = (
torch.randn([batch_size, *shape], device=self.device)

View File

@ -151,7 +151,7 @@ class T2I:
self.grid = grid
self.ddim_eta = ddim_eta
self.precision = precision
self.full_precision = full_precision
self.full_precision = True if choose_torch_device() == 'mps' else full_precision
self.strength = strength
self.embedding_path = embedding_path
self.device_type = device_type
@ -226,6 +226,8 @@ class T2I:
upscale = None,
sampler_name = None,
log_tokenization= False,
with_variations = None,
variation_amount = 0.0,
**args,
): # eat up additional cruft
"""
@ -244,6 +246,8 @@ class T2I:
ddim_eta // image randomness (eta=0.0 means the same seed always produces the same image)
step_callback // a function or method that will be called each step
image_callback // a function or method that will be called each time an image is generated
with_variations // a weighted list [(seed_1, weight_1), (seed_2, weight_2), ...] of variations which should be applied before doing any generation
variation_amount // optional 0-1 value to slerp from -S noise to random noise (allows variations on an image)
To use the step callback, define a function that receives two arguments:
- Image GPU data
@ -262,7 +266,6 @@ class T2I:
"""
# TODO: convert this into a getattr() loop
steps = steps or self.steps
seed = seed or self.seed
width = width or self.width
height = height or self.height
cfg_scale = cfg_scale or self.cfg_scale
@ -270,6 +273,7 @@ class T2I:
iterations = iterations or self.iterations
strength = strength or self.strength
self.log_tokenization = log_tokenization
with_variations = [] if with_variations is None else with_variations
model = (
self.load_model()
@ -278,7 +282,20 @@ class T2I:
assert (
0.0 <= strength <= 1.0
), 'can only work with strength in [0.0, 1.0]'
assert (
0.0 <= variation_amount <= 1.0
), '-v --variation_amount must be in [0.0, 1.0]'
if len(with_variations) > 0 or variation_amount > 1.0:
assert seed is not None,\
'seed must be specified when using with_variations'
if variation_amount == 0.0:
assert iterations == 1,\
'when using --with_variations, multiple iterations are only possible when using --variation_amount'
assert all(0 <= weight <= 1 for _, weight in with_variations),\
f'variation weights must be in [0.0, 1.0]: got {[weight for _, weight in with_variations]}'
seed = seed or self.seed
width, height, _ = self._resolution_check(width, height, log=True)
# TODO: - Check if this is still necessary to run on M1 devices.
@ -301,24 +318,26 @@ class T2I:
try:
if init_img:
assert os.path.exists(init_img), f'{init_img}: File not found'
images_iterator = self._img2img(
init_image = self._load_img(init_img, width, height, fit).to(self.device)
with scope(self.device.type):
init_latent = self.model.get_first_stage_encoding(
self.model.encode_first_stage(init_image)
) # move to latent space
print(f' DEBUG: seed at make_image time ={seed}')
make_image = self._img2img(
prompt,
precision_scope=scope,
steps=steps,
cfg_scale=cfg_scale,
ddim_eta=ddim_eta,
skip_normalize=skip_normalize,
init_img=init_img,
width=width,
height=height,
fit=fit,
init_latent=init_latent,
strength=strength,
callback=step_callback,
)
else:
images_iterator = self._txt2img(
make_image = self._txt2img(
prompt,
precision_scope=scope,
steps=steps,
cfg_scale=cfg_scale,
ddim_eta=ddim_eta,
@ -328,11 +347,38 @@ class T2I:
callback=step_callback,
)
initial_noise = None
if variation_amount > 0 or len(with_variations) > 0:
# use fixed initial noise plus random noise per iteration
seed_everything(seed)
initial_noise = self._get_noise(init_img,width,height)
for v_seed, v_weight in with_variations:
seed = v_seed
seed_everything(seed)
next_noise = self._get_noise(init_img,width,height)
initial_noise = self.slerp(v_weight, initial_noise, next_noise)
if variation_amount > 0:
random.seed() # reset RNG to an actually random state, so we can get a random seed for variations
seed = random.randrange(0,np.iinfo(np.uint32).max)
device_type = choose_autocast_device(self.device)
with scope(device_type), self.model.ema_scope():
for n in trange(iterations, desc='Generating'):
seed_everything(seed)
image = next(images_iterator)
x_T = None
if variation_amount > 0:
seed_everything(seed)
target_noise = self._get_noise(init_img,width,height)
x_T = self.slerp(variation_amount, initial_noise, target_noise)
elif initial_noise is not None:
# i.e. we specified particular variations
x_T = initial_noise
else:
seed_everything(seed)
if self.device.type == 'mps':
x_T = self._get_noise(init_img,width,height)
# make_image will do the equivalent of get_noise itself
print(f' DEBUG: seed at make_image() invocation time ={seed}')
image = make_image(x_T)
results.append([image, seed])
if image_callback is not None:
image_callback(image, seed)
@ -406,7 +452,6 @@ class T2I:
def _txt2img(
self,
prompt,
precision_scope,
steps,
cfg_scale,
ddim_eta,
@ -416,12 +461,13 @@ class T2I:
callback,
):
"""
An infinite iterator of images from the prompt.
Returns a function returning an image derived from the prompt and the initial image
Return value depends on the seed at the time you call it
"""
sampler = self.sampler
while True:
def make_image(x_T):
uc, c = self._get_uc_and_c(prompt, skip_normalize)
shape = [
self.latent_channels,
@ -431,6 +477,7 @@ class T2I:
samples, _ = sampler.sample(
batch_size=1,
S=steps,
x_T=x_T,
conditioning=c,
shape=shape,
verbose=False,
@ -439,26 +486,24 @@ class T2I:
eta=ddim_eta,
img_callback=callback
)
yield self._sample_to_image(samples)
return self._sample_to_image(samples)
return make_image
@torch.no_grad()
def _img2img(
self,
prompt,
precision_scope,
steps,
cfg_scale,
ddim_eta,
skip_normalize,
init_img,
width,
height,
fit,
init_latent,
strength,
callback, # Currently not implemented for img2img
):
"""
An infinite iterator of images from the prompt and the initial image
Returns a function returning an image derived from the prompt and the initial image
Return value depends on the seed at the time you call it
"""
# PLMS sampler not supported yet, so ignore previous sampler
@ -470,24 +515,20 @@ class T2I:
else:
sampler = self.sampler
init_image = self._load_img(init_img, width, height,fit).to(self.device)
with precision_scope(self.device.type):
init_latent = self.model.get_first_stage_encoding(
self.model.encode_first_stage(init_image)
) # move to latent space
sampler.make_schedule(
ddim_num_steps=steps, ddim_eta=ddim_eta, verbose=False
)
t_enc = int(strength * steps)
while True:
def make_image(x_T):
uc, c = self._get_uc_and_c(prompt, skip_normalize)
# encode (scaled latent)
z_enc = sampler.stochastic_encode(
init_latent, torch.tensor([t_enc]).to(self.device)
init_latent,
torch.tensor([t_enc]).to(self.device),
noise=x_T
)
# decode it
samples = sampler.decode(
@ -498,7 +539,8 @@ class T2I:
unconditional_guidance_scale=cfg_scale,
unconditional_conditioning=uc,
)
yield self._sample_to_image(samples)
return self._sample_to_image(samples)
return make_image
# TODO: does this actually need to run every loop? does anything in it vary by random seed?
def _get_uc_and_c(self, prompt, skip_normalize):
@ -513,8 +555,7 @@ class T2I:
# i dont know if this is correct.. but it works
c = torch.zeros_like(uc)
# normalize each "sub prompt" and add it
for i in range(0, len(weighted_subprompts)):
subprompt, weight = weighted_subprompts[i]
for subprompt, weight in weighted_subprompts:
self._log_tokenization(subprompt)
c = torch.add(
c,
@ -564,6 +605,27 @@ class T2I:
return self.model
# returns a tensor filled with random numbers from a normal distribution
def _get_noise(self,init_img,width,height):
if init_img:
if self.device.type == 'mps':
return torch.randn_like(init_latent, device='cpu').to(self.device)
else:
return torch.randn_like(init_latent, device=self.device)
else:
if self.device.type == 'mps':
return torch.randn([1,
self.latent_channels,
height // self.downsampling_factor,
width // self.downsampling_factor],
device='cpu').to(self.device)
else:
return torch.randn([1,
self.latent_channels,
height // self.downsampling_factor,
width // self.downsampling_factor],
device=self.device)
def _set_sampler(self):
msg = f'>> Setting Sampler to {self.sampler_name}'
if self.sampler_name == 'plms':
@ -619,7 +681,7 @@ class T2I:
print(
f'>> loaded input image of size {image.width}x{image.height} from {path}'
)
# The logic here is:
# 1. If "fit" is true, then the image will be fit into the bounding box defined
# by width and height. It will do this in a way that preserves the init image's
@ -644,7 +706,7 @@ class T2I:
if resize_needed:
return InitImageResizer(image).resize(x,y)
return image
def _fit_image(self,image,max_dimensions):
w,h = max_dimensions
@ -677,10 +739,10 @@ class T2I:
(?:\\\:|[^:])+ # match one or more non ':' characters or escaped colons '\:'
) # end 'prompt'
(?: # non-capture group
:+ # match one or more ':' characters
:+ # match one or more ':' characters
(?P<weight> # capture group for 'weight'
-?\d+(?:\.\d+)? # match positive or negative integer or decimal number
)? # end weight capture group, make optional
)? # end weight capture group, make optional
\s* # strip spaces after weight
| # OR
$ # else, if no ':' then match end of line
@ -741,3 +803,41 @@ class T2I:
print(">> This input is larger than your defaults. If you run out of memory, please use a smaller image.")
return width, height, resize_needed
def slerp(self, t, v0, v1, DOT_THRESHOLD=0.9995):
'''
Spherical linear interpolation
Args:
t (float/np.ndarray): Float value between 0.0 and 1.0
v0 (np.ndarray): Starting vector
v1 (np.ndarray): Final vector
DOT_THRESHOLD (float): Threshold for considering the two vectors as
collinear. Not recommended to alter this.
Returns:
v2 (np.ndarray): Interpolation vector between v0 and v1
'''
inputs_are_torch = False
if not isinstance(v0, np.ndarray):
inputs_are_torch = True
v0 = v0.detach().cpu().numpy()
if not isinstance(v1, np.ndarray):
inputs_are_torch = True
v1 = v1.detach().cpu().numpy()
dot = np.sum(v0 * v1 / (np.linalg.norm(v0) * np.linalg.norm(v1)))
if np.abs(dot) > DOT_THRESHOLD:
v2 = (1 - t) * v0 + t * v1
else:
theta_0 = np.arccos(dot)
sin_theta_0 = np.sin(theta_0)
theta_t = theta_0 * t
sin_theta_t = np.sin(theta_t)
s0 = np.sin(theta_0 - theta_t) / sin_theta_0
s1 = sin_theta_t / sin_theta_0
v2 = s0 * v0 + s1 * v1
if inputs_are_torch:
v2 = torch.from_numpy(v2).to(self.device)
return v2

View File

@ -4,12 +4,13 @@ huggingface-hub==0.8.1
imageio==2.9.0
imageio-ffmpeg==0.4.2
kornia==0.6.0
numpy==1.19.2
numpy==1.23.1
--pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
omegaconf==2.1.1
opencv-python==4.1.2.30
opencv-python==4.6.0.66
pillow==9.2.0
pudb==2019.2
torch==1.11.0
torch==1.12.1
torchvision==0.12.0
pytorch-lightning==1.4.2
streamlit==1.12.0
@ -19,4 +20,4 @@ torchmetrics==0.6.0
transformers==4.19.2
-e git+https://github.com/openai/CLIP.git@main#egg=clip
-e git+https://github.com/CompVis/taming-transformers.git@master#egg=taming-transformers
-e git+https://github.com/lstein/k-diffusion.git@master#egg=k-diffusion
-e git+https://github.com/Birch-san/k-diffusion.git@mps#egg=k-diffusion

View File

@ -9,31 +9,33 @@ import sys
import copy
import warnings
import time
from ldm.dream.devices import choose_torch_device
import ldm.dream.readline
from ldm.dream.pngwriter import PngWriter, PromptFormatter
from ldm.dream.server import DreamServer, ThreadingDreamServer
from ldm.dream.image_util import make_grid
from omegaconf import OmegaConf
def main():
"""Initialize command-line parsers and the diffusion model"""
arg_parser = create_argv_parser()
opt = arg_parser.parse_args()
if opt.laion400m:
# defaults suitable to the older latent diffusion weights
width = 256
height = 256
config = 'configs/latent-diffusion/txt2img-1p4B-eval.yaml'
weights = 'models/ldm/text2img-large/model.ckpt'
else:
# some defaults suitable for stable diffusion weights
width = 512
height = 512
config = 'configs/stable-diffusion/v1-inference.yaml'
if '.ckpt' in opt.weights:
weights = opt.weights
else:
weights = f'models/ldm/stable-diffusion-v1/{opt.weights}.ckpt'
print('--laion400m flag has been deprecated. Please use --model laion400m instead.')
sys.exit(-1)
if opt.weights != 'model':
print('--weights argument has been deprecated. Please configure ./configs/models.yaml, and call it using --model instead.')
sys.exit(-1)
try:
models = OmegaConf.load(opt.config)
width = models[opt.model].width
height = models[opt.model].height
config = models[opt.model].config
weights = models[opt.model].weights
except (FileNotFoundError, IOError, KeyError) as e:
print(f'{e}. Aborting.')
sys.exit(-1)
print('* Initializing, be patient...\n')
sys.path.append('.')
@ -99,7 +101,7 @@ def main():
cmd_parser = create_cmd_parser()
if opt.web:
dream_server_loop(t2i)
dream_server_loop(t2i, opt.host, opt.port)
else:
main_loop(t2i, opt.outdir, opt.prompt_as_dir, cmd_parser, infile)
@ -181,9 +183,32 @@ def main_loop(t2i, outdir, prompt_as_dir, parser, infile):
print(f'No previous seed at position {opt.seed} found')
opt.seed = None
normalized_prompt = PromptFormatter(t2i, opt).normalize_prompt()
do_grid = opt.grid or t2i.grid
individual_images = not do_grid
if opt.with_variations is not None:
# shotgun parsing, woo
parts = []
broken = False # python doesn't have labeled loops...
for part in opt.with_variations.split(','):
seed_and_weight = part.split(':')
if len(seed_and_weight) != 2:
print(f'could not parse with_variation part "{part}"')
broken = True
break
try:
seed = int(seed_and_weight[0])
weight = float(seed_and_weight[1])
except ValueError:
print(f'could not parse with_variation part "{part}"')
broken = True
break
parts.append([seed, weight])
if broken:
continue
if len(parts) > 0:
opt.with_variations = parts
else:
opt.with_variations = None
if opt.outdir:
if not os.path.exists(opt.outdir):
@ -211,7 +236,7 @@ def main_loop(t2i, outdir, prompt_as_dir, parser, infile):
file_writer = PngWriter(current_outdir)
prefix = file_writer.unique_prefix()
seeds = set()
results = []
results = [] # list of filename, prompt pairs
grid_images = dict() # seed -> Image, only used if `do_grid`
def image_writer(image, seed, upscaled=False):
if do_grid:
@ -221,10 +246,26 @@ def main_loop(t2i, outdir, prompt_as_dir, parser, infile):
filename = f'{prefix}.{seed}.postprocessed.png'
else:
filename = f'{prefix}.{seed}.png'
path = file_writer.save_image_and_prompt_to_png(image, f'{normalized_prompt} -S{seed}', filename)
if opt.variation_amount > 0:
iter_opt = argparse.Namespace(**vars(opt)) # copy
this_variation = [[seed, opt.variation_amount]]
if opt.with_variations is None:
iter_opt.with_variations = this_variation
else:
iter_opt.with_variations = opt.with_variations + this_variation
iter_opt.variation_amount = 0
normalized_prompt = PromptFormatter(t2i, iter_opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{iter_opt.seed}'
elif opt.with_variations is not None:
normalized_prompt = PromptFormatter(t2i, opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{opt.seed}' # use the original seed - the per-iteration value is the last variation-seed
else:
normalized_prompt = PromptFormatter(t2i, opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{seed}'
path = file_writer.save_image_and_prompt_to_png(image, metadata_prompt, filename)
if (not upscaled) or opt.save_original:
# only append to results if we didn't overwrite an earlier output
results.append([path, seed])
results.append([path, metadata_prompt])
seeds.add(seed)
@ -235,11 +276,12 @@ def main_loop(t2i, outdir, prompt_as_dir, parser, infile):
first_seed = next(iter(seeds))
filename = f'{prefix}.{first_seed}.png'
# TODO better metadata for grid images
metadata_prompt = f'{normalized_prompt} -S{first_seed}'
normalized_prompt = PromptFormatter(t2i, opt).normalize_prompt()
metadata_prompt = f'{normalized_prompt} -S{first_seed} --grid -N{len(grid_images)}'
path = file_writer.save_image_and_prompt_to_png(
grid_img, metadata_prompt, filename
)
results = [[path, seeds]]
results = [[path, metadata_prompt]]
last_seeds = list(seeds)
@ -253,7 +295,7 @@ def main_loop(t2i, outdir, prompt_as_dir, parser, infile):
print('Outputs:')
log_path = os.path.join(current_outdir, 'dream_log.txt')
write_log_message(normalized_prompt, results, log_path)
write_log_message(results, log_path)
print('goodbye!')
@ -270,7 +312,7 @@ def get_next_command(infile=None) -> str: #command string
print(f'#{command}')
return command
def dream_server_loop(t2i):
def dream_server_loop(t2i, host, port):
print('\n* --web was specified, starting web server...')
# Change working directory to the stable-diffusion directory
os.chdir(
@ -279,9 +321,13 @@ def dream_server_loop(t2i):
# Start server
DreamServer.model = t2i
dream_server = ThreadingDreamServer(("0.0.0.0", 9090))
print("\nStarted Stable Diffusion dream server!")
print("Point your browser at http://localhost:9090 or use the host's DNS name or IP address.")
dream_server = ThreadingDreamServer((host, port))
print(">> Started Stable Diffusion dream server!")
if host == '0.0.0.0':
print(f"Point your browser at http://localhost:{port} or use the host's DNS name or IP address.")
else:
print(">> Default host address now 127.0.0.1 (localhost). Use --host 0.0.0.0 to bind any address.")
print(f">> Point your browser at http://{host}:{port}.")
try:
dream_server.serve_forever()
@ -291,9 +337,9 @@ def dream_server_loop(t2i):
dream_server.server_close()
def write_log_message(prompt, results, log_path):
def write_log_message(results, log_path):
"""logs the name of the output image, prompt, and prompt args to the terminal and log file"""
log_lines = [f'{r[0]}: {prompt} -S{r[1]}\n' for r in results]
log_lines = [f'{path}: {prompt}\n' for path, prompt in results]
print(*log_lines, sep='')
with open(log_path, 'a', encoding='utf-8') as file:
@ -347,9 +393,7 @@ def create_argv_parser():
'--full_precision',
dest='full_precision',
action='store_true',
help='Use slower full precision math for calculations',
# MPS only functions with full precision, see https://github.com/lstein/stable-diffusion/issues/237
default=choose_torch_device() == 'mps',
help='Use more memory-intensive full precision math for calculations',
)
parser.add_argument(
'-g',
@ -417,6 +461,18 @@ def create_argv_parser():
action='store_true',
help='Start in web server mode.',
)
parser.add_argument(
'--host',
type=str,
default='127.0.0.1',
help='Web server: Host or IP to listen on. Set to 0.0.0.0 to accept traffic from other devices on your network.'
)
parser.add_argument(
'--port',
type=int,
default='9090',
help='Web server: Port to listen on'
)
parser.add_argument(
'--weights',
default='model',
@ -429,6 +485,16 @@ def create_argv_parser():
default='cuda',
help="device to run stable diffusion on. defaults to cuda `torch.cuda.current_device()` if available"
)
parser.add_argument(
'--model',
default='stable-diffusion-1.4',
help='Indicates which diffusion model to load. (currently "stable-diffusion-1.4" (default) or "laion400m")',
)
parser.add_argument(
'--config',
default ='configs/models.yaml',
help ='Path to configuration file for alternate models.',
)
return parser
@ -546,6 +612,20 @@ def create_cmd_parser():
action='store_true',
help='shows how the prompt is split into tokens'
)
parser.add_argument(
'-v',
'--variation_amount',
default=0.0,
type=float,
help='If > 0, generates variations on the initial seed instead of random seeds per iteration. Must be between 0 and 1. Higher values will be more different.'
)
parser.add_argument(
'-V',
'--with_variations',
default=None,
type=str,
help='list of variations to apply, in the format `seed:weight,seed:weight,...`'
)
return parser

View File

@ -44,7 +44,7 @@ div {
}
#results {
text-align: center;
max-width: 1000px;
// max-width: 1024px;
margin: auto;
padding-top: 10px;
}
@ -64,7 +64,7 @@ input[type="number"] {
width: 150px;
}
hr {
width: 200px;
// width: 200px;
}
label {
white-space: nowrap;

Five binary image files were added (not shown): 429 KiB, 445 KiB, 426 KiB, 427 KiB, and 424 KiB.