add documentation and minor bug fixes

- Add new documentation for the textual inversion training process
- Move `main.py` into the deprecated scripts folder
- Fix a bug in `textual_inversion.py` that prevented it from loading
  the globals module correctly
- Sort models alphabetically in the console front end
- Show only diffusers models in the console front end
Lincoln Stein 2023-01-20 16:55:50 -05:00
parent 195294e74f
commit 080fc4b380
6 changed files with 239 additions and 63 deletions

Binary image file added (124 KiB; not shown).

@@ -10,83 +10,259 @@ You may personalize the generated images to provide your own styles or objects
 by training a new LDM checkpoint and introducing a new vocabulary to the fixed
 model as a (.pt) embeddings file. Alternatively, you may use or train
 HuggingFace Concepts embeddings files (.bin) from
-<https://huggingface.co/sd-concepts-library> and its associated notebooks.
-
-## **Training**
-
-To train, prepare a folder that contains images sized at 512x512 and execute the
-following:
-
-### WINDOWS
-
-As the default backend is not available on Windows, if you're using that
-platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND` to `gloo`
-
-```bash
-python3 ./main.py -t \
-    --base ./configs/stable-diffusion/v1-finetune.yaml \
-    --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
-    -n my_cat \
-    --gpus 0 \
-    --data_root D:/textual-inversion/my_cat \
-    --init_word 'cat'
-```
-
-During the training process, files will be created in
-`/logs/[project][time][project]/` where you can see the process.
-
-Conditioning contains the training prompts inputs, reconstruction the input
-images for the training epoch samples, samples scaled for a sample of the prompt
-and one with the init word provided.
-
-On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.
-
-!!! note
-
-    According to the associated paper, the optimal number of
-    images is 3-5. Your model may not converge if you use more images than
-    that.
-
-Training will run indefinitely, but you may wish to stop it (with ctrl-c) before
-the heat death of the universe, when you find a low loss epoch or around ~5000
-iterations. Note that you can set a fixed limit on the number of training steps
-by decreasing the "max_steps" option in
-configs/stable_diffusion/v1-finetune.yaml (currently set to 4000000)
-
-## **Run the Model**
-
-Once the model is trained, specify the trained .pt or .bin file when starting
-invoke using
-
-```bash
-python3 ./scripts/invoke.py \
-    --embedding_path /path/to/embedding.pt
-```
-
-Then, to utilize your subject at the invoke prompt
-
-```bash
-invoke> "a photo of *"
-```
-
-This also works with image2image
-
-```bash
-invoke> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
-```
-
-For .pt files it's also possible to train multiple tokens (modify the
-placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine
-LDM checkpoints using:
-
-```bash
-python3 ./scripts/merge_embeddings.py \
-    --manager_ckpts /path/to/first/embedding.pt \
-    [</path/to/second/embedding.pt>,[...]] \
-    --output_path /path/to/output/embedding.pt
-```
-
-Credit goes to rinongal and the repository
-
-Please see [the repository](https://github.com/rinongal/textual_inversion) and
-associated paper for details and limitations.
+<https://huggingface.co/sd-concepts-library> and its associated
+notebooks.
+
+## **Hardware and Software Requirements**
+
+You will need a GPU to perform training in a reasonable length of
+time, and at least 12 GB of VRAM. We recommend using the [`xformers`
+library](../installation/070_INSTALL_XFORMERS) to accelerate the
+training process further. During training, about ~8 GB is temporarily
+needed in order to store intermediate models, checkpoints and logs.
+
+## **Preparing for Training**
+
+To train, prepare a folder that contains 3-5 images that illustrate
+the object or concept. It is good to provide a variety of examples or
+poses to avoid overtraining the system. Format these images as PNG
+(preferred) or JPG. You do not need to resize or crop the images in
+advance, but for more control you may wish to do so.
+
+Place the training images in a directory on the machine InvokeAI runs
+on. We recommend placing them in a subdirectory of the
+`text-inversion-training-data` folder located in the InvokeAI root
+directory, ordinarily `~/invokeai` (Linux/Mac), or
+`C:\Users\your_name\invokeai` (Windows). For example, to create an
+embedding for the "psychedelic" style, you'd place the training images
+into the directory
+`~/invokeai/text-inversion-training-data/psychedelic`.
+
+## **Launching Training Using the Console Front End**
+
+InvokeAI 2.3 and higher comes with a text console-based training front
+end. From within the `invoke.sh`/`invoke.bat` Invoke launcher script,
+start the front end by selecting choice (3):
+
+```sh
+Do you want to generate images using the
+1. command-line
+2. browser-based UI
+3. textual inversion training
+4. open the developer console
+Please enter 1, 2, 3, or 4: [1] 3
+```
+
+From the command line, with the InvokeAI virtual environment active,
+you can launch the front end with the command `textual_inversion_fe`.
+
+This will launch a text-based front end that will look like this:
+
+<figure markdown>
+![ti-frontend](../assets/textual-inversion/ti-frontend.png)
+</figure>
+
+The interface is keyboard-based. Move from field to field using
+control-N (^N) to move to the next field and control-P (^P) to the
+previous one. <Tab> and <shift-TAB> work as well. Once a field is
+active, use the cursor keys. In a checkbox group, use the up and down
+cursor keys to move from choice to choice, and <space> to select a
+choice. In a scrollbar, use the left and right cursor keys to increase
+and decrease the value of the scroll. In textfields, type the desired
+values.
+
+The number of parameters may look intimidating, but in most cases the
+predefined defaults work fine. The red circled fields in the above
+illustration are the ones you will adjust most frequently.
+
+### Model Name
+
+This will list all the diffusers models that are currently
+installed. Select the one you wish to use as the basis for your
+embedding. Be aware that if you use an SD-1.X-based model for your
+training, you will only be able to use this embedding with other
+SD-1.X-based models. Similarly, if you train on SD-2.X, you will only
+be able to use the embeddings with models based on SD-2.X.
+
+### Trigger Term
+
+This is the prompt term you will use to trigger the embedding. Type a
+single word or phrase you wish to use as the trigger, for example
+"psychedelic" (without angle brackets). Within InvokeAI, you will then
+be able to activate the trigger using the syntax `<psychedelic>`.
+
+### Initializer
+
+This is a single character that is used internally during the training
+process as a placeholder for the trigger term. It defaults to "*" and
+can usually be left alone.
+
+### Resume from last saved checkpoint
+
+As training proceeds, textual inversion will write a series of
+intermediate files that can be used to resume training from where it
+was left off in the case of an interruption. This checkbox will be
+automatically selected if you provide a previously used trigger term
+and at least one checkpoint file is found on disk.
+
+Note that as of 20 January 2023, resume does not seem to be working
+properly due to an issue with the upstream code.
+
+### Data Training Directory
+
+This is the location of the images to be used for training. When you
+select a trigger term like "my-trigger", the frontend will prepopulate
+this field with `~/invokeai/text-inversion-training-data/my-trigger`,
+but you can change the path to wherever you want.
+
+### Output Destination Directory
+
+This is the location of the logs, checkpoint files, and embedding
+files created during training. When you select a trigger term like
+"my-trigger", the frontend will prepopulate this field with
+`~/invokeai/text-inversion-output/my-trigger`, but you can change the
+path to wherever you want.
+
+### Image resolution
+
+The images in the training directory will be automatically scaled to
+the value you use here. For best results, you will want to use the
+same default resolution as the underlying model (512 pixels for
+SD-1.5, 768 for the larger version of SD-2.1).
+
+### Center crop images
+
+If this is selected, your images will be center cropped to make them
+square before resizing them to the desired resolution. Center cropping
+can indiscriminately cut off the top of subjects' heads for portrait
+aspect images, so if you have images like this, you may wish to use a
+photo editor to manually crop them to a square aspect ratio.
+
+### Mixed precision
+
+Select the floating point precision for the embedding. "no" will
+result in full 32-bit precision, "fp16" will provide 16-bit
+precision, and "bf16" will provide mixed precision (only available
+when XFormers is used).
+
+### Max training steps
+
+How many steps the training will take before the model converges. Most
+training sets will converge with 2000-3000 steps.
+
+### Batch size
+
+This adjusts how many training images are processed simultaneously in
+each step. Higher values will cause the training process to run more
+quickly, but use more memory. The default size will run on GPUs with
+as little as 12 GB.
+
+### Learning rate
+
+The rate at which the system adjusts its internal weights during
+training. Higher values risk overtraining (getting the same image each
+time), and lower values will take more steps to train a good
+model. The default of 0.0005 is conservative; you may wish to increase
+it to 0.005 to speed up training.
+
+### Scale learning rate by number of GPUs, steps and batch size
+
+If this is selected (the default) the system will adjust the provided
+learning rate to improve performance.
+
+### Use xformers acceleration
+
+This will activate XFormers memory-efficient attention. You need to
+have XFormers installed for this to have an effect.
+
+### Learning rate scheduler
+
+This adjusts how the learning rate changes over the course of
+training. The default "constant" means to use a constant learning rate
+for the entire training session. The other values scale the learning
+rate according to various formulas.
+
+Only "constant" is supported by the XFormers library.
+
+### Gradient accumulation steps
+
+This is a parameter that allows you to use bigger batch sizes than
+your GPU's VRAM would ordinarily accommodate, at the cost of some
+performance.
+
+### Warmup steps
+
+If "constant_with_warmup" is selected in the learning rate scheduler,
+then this provides the number of warmup steps. Warmup steps have a
+very low learning rate, and are one way of preventing early
+overtraining.
+
+## The training run
+
+Start the training run by advancing to the OK button (bottom right)
+and pressing <enter>. A series of progress messages will be displayed
+as the training process proceeds. This may take an hour or two,
+depending on settings and the speed of your system. Various log and
+checkpoint files will be written into the output directory (ordinarily
+`~/invokeai/text-inversion-output/my-model/`).
+
+At the end of successful training, the system will copy the file
+`learned_embeds.bin` into the InvokeAI root directory's `embeddings`
+directory, using a subdirectory named after the trigger token. For
+example, if the trigger token was `psychedelic`, then look for the
+embeddings file in
+`~/invokeai/embeddings/psychedelic/learned_embeds.bin`.
+
+You may now launch InvokeAI and try out a prompt that uses the trigger
+term. For example `a plate of banana sushi in <psychedelic> style`.
+
+## **Training with the Command-Line Script**
+
+InvokeAI also comes with a traditional command-line script for
+launching textual inversion training. It is named
+`textual_inversion`, and can be launched from within the
+"developer's console", or from the command line after activating
+InvokeAI's virtual environment.
+
+It accepts a large number of arguments, which can be summarized by
+passing the `--help` argument:
+
+```sh
+textual_inversion --help
+```
+
+Typical usage is shown here:
+
+```sh
+python textual_inversion.py \
+       --model=stable-diffusion-1.5 \
+       --resolution=512 \
+       --learnable_property=style \
+       --initializer_token='*' \
+       --placeholder_token='<psychedelic>' \
+       --train_data_dir=/home/lstein/invokeai/training-data/psychedelic \
+       --output_dir=/home/lstein/invokeai/text-inversion-training/psychedelic \
+       --scale_lr \
+       --train_batch_size=8 \
+       --gradient_accumulation_steps=4 \
+       --max_train_steps=3000 \
+       --learning_rate=0.0005 \
+       --resume_from_checkpoint=latest \
+       --lr_scheduler=constant \
+       --mixed_precision=fp16 \
+       --only_save_embeds
+```
+
+## Reading
+
+For more information on textual inversion, please see the following
+resources:
+
+* The [textual inversion repository](https://github.com/rinongal/textual_inversion) and
+  associated paper for details and limitations.
+* [HuggingFace's textual inversion training
+  page](https://huggingface.co/docs/diffusers/training/text_inversion)
+
+---
+
+copyright (c) 2023, Lincoln Stein and the InvokeAI Development Team
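
The documentation added above expects training images to be collected under
`text-inversion-training-data/<trigger>` inside the InvokeAI root. Below is a
minimal sketch of staging images that way; the helper function and the source
path are hypothetical, not part of InvokeAI.

```python
# Hypothetical helper (not part of InvokeAI): copy PNG/JPG training images
# into ~/invokeai/text-inversion-training-data/<trigger>, the layout that the
# console front end pre-populates in its Data Training Directory field.
from pathlib import Path
import shutil

def stage_training_images(src_dir: str, trigger: str, root: str = "~/invokeai") -> Path:
    dest = Path(root).expanduser() / "text-inversion-training-data" / trigger
    dest.mkdir(parents=True, exist_ok=True)
    for img in sorted(Path(src_dir).iterdir()):
        if img.suffix.lower() in {".png", ".jpg", ".jpeg"}:  # PNG preferred, JPG accepted
            shutil.copy2(img, dest / img.name)
    return dest

# e.g. stage the 3-5 example images for a "psychedelic" style embedding
print(stage_training_images("/tmp/psychedelic-samples", "psychedelic"))
```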
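The typical-usage command in the new documentation combines
`--train_batch_size=8` with `--gradient_accumulation_steps=4`. A quick sketch
of the resulting effective batch size, using illustrative variable names
rather than InvokeAI parameters:

```python
# Gradient accumulation trades speed for memory: each weight update is
# computed from several smaller micro-batches instead of one large batch.
train_batch_size = 8              # images per micro-batch (limited by VRAM)
gradient_accumulation_steps = 4   # micro-batches accumulated per weight update

effective_batch_size = train_batch_size * gradient_accumulation_steps
print(effective_batch_size)       # 32 images contribute to each weight update
```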

@@ -746,7 +746,7 @@ def initialize_rootdir(root:str,yes_to_all:bool=False):
     safety_checker = '--nsfw_checker' if enable_safety_checker else '--no-nsfw_checker'
-    for name in ('models','configs','embeddings'):
+    for name in ('models','configs','embeddings','text-inversion-data','text-inversion-training-data'):
         os.makedirs(os.path.join(root,name), exist_ok=True)
     for src in (['configs']):
         dest = os.path.join(root,src)

@@ -1,11 +1,11 @@
 #!/usr/bin/env python
 # Copyright 2023, Lincoln Stein @lstein
-from ldm.invoke.globals import Globals, set_root
+from ldm.invoke.globals import Globals, global_set_root
 from ldm.invoke.textual_inversion_training import parse_args, do_textual_inversion_training
 
 if __name__ == "__main__":
     args = parse_args()
-    set_root(args.root_dir or Globals.root)
+    global_set_root(args.root_dir or Globals.root)
     kwargs = vars(args)
     do_textual_inversion_training(**kwargs)

@@ -13,8 +13,8 @@ from pathlib import Path
 from typing import List
 import argparse
 
-TRAINING_DATA = 'training-data'
-TRAINING_DIR = 'text-inversion-training'
+TRAINING_DATA = 'text-inversion-training-data'
+TRAINING_DIR = 'text-inversion-output'
 CONF_FILE = 'preferences.conf'
 
 class textualInversionForm(npyscreen.FormMultiPageAction):
@@ -219,7 +219,7 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
     def get_model_names(self)->(List[str],int):
         conf = OmegaConf.load(os.path.join(Globals.root,'configs/models.yaml'))
-        model_names = sorted(list(conf.keys()))
+        model_names = [idx for idx in sorted(list(conf.keys())) if conf[idx].get('format',None)=='diffusers']
         defaults = [idx for idx in range(len(model_names)) if 'default' in conf[model_names[idx]]]
         return (model_names,defaults[0])
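
The revised `get_model_names()` above sorts the configured models
alphabetically and keeps only `diffusers`-format entries. A small sketch of
the same comprehension applied to a made-up `models.yaml`-style config (the
model entries below are invented for illustration):

```python
# Illustrative only: a toy config standing in for configs/models.yaml.
from omegaconf import OmegaConf

conf = OmegaConf.create({
    "stable-diffusion-2.1": {"format": "diffusers"},
    "stable-diffusion-1.5": {"format": "diffusers", "default": True},
    "old-checkpoint-model": {"format": "ckpt"},
})

# Same filtering/sorting as the updated front end: alphabetical order,
# diffusers models only.
model_names = [idx for idx in sorted(list(conf.keys())) if conf[idx].get("format", None) == "diffusers"]
print(model_names)  # ['stable-diffusion-1.5', 'stable-diffusion-2.1']
```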