InvokeAI/docs/features/TEXTUAL_INVERSION.md
blessedcoolant 0881d429f2
Docs Update (#466)
Authored-by: @blessedcoolant 
Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2022-09-11 11:52:43 -04:00

2.7 KiB

Personalizing Text-to-Image Generation

You may personalize the generated images to provide your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model.

To train, prepare a folder that contains images sized at 512x512 and execute the following:

WINDOWS: As the default backend is not available on Windows, if you're using that platform, set the environment variable PL_TORCH_DISTRIBUTED_BACKEND=gloo

(ldm) ~/stable-diffusion$ python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \
                                            -t \
                                            --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
                                            -n my_cat \
                                            --gpus 0, \
                                            --data_root D:/textual-inversion/my_cat \
                                            --init_word 'cat'

During the training process, files will be created in /logs/[project][time][project]/ where you can see the process.

Conditioning contains the training prompts inputs, reconstruction the input images for the training epoch samples, samples scaled for a sample of the prompt and one with the init word provided.

On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.

Note: According to the associated paper, the optimal number of images is 3-5. Your model may not converge if you use more images than that.

Training will run indefinately, but you may wish to stop it before the heat death of the universe, when you find a low loss epoch or around ~5000 iterations.

Once the model is trained, specify the trained .pt file when starting dream using

(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/embedding.pt --full_precision

Then, to utilize your subject at the dream prompt

dream> "a photo of *"

This also works with image2image

dream> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4

It's also possible to train multiple token (modify the placeholder string in configs/stable-diffusion/v1-finetune.yaml) and combine LDM checkpoints using:

(ldm) ~/stable-diffusion$ python3 ./scripts/merge_embeddings.py \
                                            --manager_ckpts /path/to/first/embedding.pt /path/to/second/embedding.pt [...] \
                                            --output_path /path/to/output/embedding.pt

Credit goes to rinongal and the repository located at https://github.com/rinongal/textual_inversion Please see the repository and associated paper for details and limitations.