# **Personalizing Text-to-Image Generation**

You may personalize the generated images to include your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model.

To train, prepare a folder that contains images sized at 512x512 and execute the following:
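
If your source photos aren't already square 512x512 images, one way to prepare the folder is a center-crop-and-resize pass; a minimal sketch using ImageMagick (the `mogrify` tool and the `my_cat` folder are assumptions, not part of this repo):

```
# fill a 512x512 box, center-crop, and overwrite each JPEG in place (folder name is an example)
mogrify -resize 512x512^ -gravity center -extent 512x512 ./my_cat/*.jpg
```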

**WINDOWS**: The default distributed backend is not available on Windows. If you're using that platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND=gloo`.
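
For example, the variable can be set for the current shell session before launching the training command below (in PowerShell, `$env:PL_TORCH_DISTRIBUTED_BACKEND = "gloo"` does the same):

```
rem Windows cmd: use the gloo backend for this session
set PL_TORCH_DISTRIBUTED_BACKEND=gloo
```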

```
(ldm) ~/stable-diffusion$ python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \
    -t \
    --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
    -n my_cat \
    --gpus 0, \
    --data_root D:/textual-inversion/my_cat \
    --init_word 'cat'
```

During the training process, files will be created in `/logs/[project][time][project]/` where you can follow its progress:

- `conditioning` contains the training prompts
- `inputs` and `reconstruction` contain the input images for the training epoch
- `samples` and `samples scaled` contain a sample generated from the prompt, and one generated with the provided init word
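
As an illustration, a run directory might look roughly like this (the run name below is hypothetical, and the exact layout depends on the trainer version):

```
logs/my_cat2024-01-01T00-00-00_my_cat/   # hypothetical run name
├── configs/       # the resolved training configuration
├── checkpoints/   # model and embedding snapshots
└── images/        # conditioning, inputs, reconstruction, and samples
```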

On an RTX 3090, the process for SD will take ~1h at ~1.6 iterations/sec.

_Note_: According to the associated paper, the optimal number of images is 3-5. Your model may not converge if you use more images than that.

Training will run indefinitely, but you may wish to stop it before the heat death of the universe, when you find a low-loss epoch or after ~5000 iterations.
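
When you stop it, the embedding snapshots saved so far can typically be found under the run's `checkpoints/` directory; the run name and file pattern below are assumptions, so adjust them to your actual `/logs/` layout:

```
# run directory name and snapshot pattern are hypothetical
(ldm) ~/stable-diffusion$ ls ./logs/my_cat2024-01-01T00-00-00_my_cat/checkpoints/embeddings_gs-*.pt
```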

Once the model is trained, specify the trained .pt file when starting dream using:

```
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/embedding.pt --full_precision
```

Then, to utilize your subject at the dream prompt:

```
dream> "a photo of *"
```

This also works with image2image:

```
dream> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```

It's also possible to train multiple tokens (modify the placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine LDM checkpoints using:

```
(ldm) ~/stable-diffusion$ python3 ./scripts/merge_embeddings.py \
    --manager_ckpts /path/to/first/embedding.pt /path/to/second/embedding.pt [...] \
    --output_path /path/to/output/embedding.pt
```
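
The merged file can then be loaded like any single embedding, e.g. by reusing the invocation shown earlier:

```
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/output/embedding.pt --full_precision
```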

Credit goes to rinongal and the repository located at https://github.com/rinongal/textual_inversion. Please see the repository and associated paper for details and limitations.