InvokeAI/docs/features/TEXTUAL_INVERSION.md

# **Personalizing Text-to-Image Generation**

You may personalize the generated images to provide your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model as a (.pt) embeddings file. Alternatively, you may use or train HuggingFace Concepts embeddings files (.bin) from https://huggingface.co/sd-concepts-library and its associated notebooks.

**Training**

To train, prepare a folder that contains images sized at 512x512 and execute the following:

**WINDOWS**: As the default backend is not available on Windows, if you're using that platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND=gloo`

```
(ldm) ~/stable-diffusion$ python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \
                                            -t \
                                            --actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
                                            -n my_cat \
                                            --gpus 0, \
                                            --data_root D:/textual-inversion/my_cat \
                                            --init_word 'cat'
```

During the training process, files will be created in
/logs/[project][time][project]/ where you can see the process.

Conditioning contains the training prompts inputs, reconstruction the
input images for the training epoch samples, samples scaled for a
sample of the prompt and one with the init word provided.

On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.

_Note_: According to the associated paper, the optimal number of
images is 3-5. Your model may not converge if you use more images than
that.

Training will run indefinitely, but you may wish to stop it (with
ctrl-c) before the heat death of the universe, when you find a low
loss epoch or around ~5000 iterations. Note that you can set a fixed
limit on the number of training steps by decreasing the "max_steps"
option in configs/stable_diffusion/v1-finetune.yaml (currently set to
4000000)

**Running**

Once the model is trained, specify the trained .pt or .bin file when
starting dream using

```
(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/embedding.pt --full_precision
```

Then, to utilize your subject at the dream prompt

```
dream> "a photo of *"
```

This also works with image2image

```
dream> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
```

For .pt files it's also possible to train multiple tokens (modify the placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine LDM checkpoints using:

```
(ldm) ~/stable-diffusion$ python3 ./scripts/merge_embeddings.py \
                                            --manager_ckpts /path/to/first/embedding.pt /path/to/second/embedding.pt [...] \
                                            --output_path /path/to/output/embedding.pt
```

Credit goes to rinongal and the repository located at https://github.com/rinongal/textual_inversion Please see the repository and associated paper for details and limitations.
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00			`# Personalizing Text-to-Image Generation`

Adding support for .bin files from huggingface concepts (#498) * Adding support for .bin files from huggingface concepts * Updating documentation to include huggingface .bin info 2022-09-11 19:44:26 +00:00			`You may personalize the generated images to provide your own styles or objects by training a new LDM checkpoint and introducing a new vocabulary to the fixed model as a (.pt) embeddings file. Alternatively, you may use or train HuggingFace Concepts embeddings files (.bin) from https://huggingface.co/sd-concepts-library and its associated notebooks.`

			`Training`
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
			`To train, prepare a folder that contains images sized at 512x512 and execute the following:`

			WINDOWS: As the default backend is not available on Windows, if you're using that platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND=gloo`

			```
			`(ldm) ~/stable-diffusion$ python3 ./main.py --base ./configs/stable-diffusion/v1-finetune.yaml \`
			`-t \`
			`--actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \`
			`-n my_cat \`
			`--gpus 0, \`
			`--data_root D:/textual-inversion/my_cat \`
			`--init_word 'cat'`
			```

upped max_steps in v1-finetune.yaml and fixed TI docs to address #493 2022-09-11 20:20:14 +00:00			`During the training process, files will be created in`
			`/logs/[project][time][project]/ where you can see the process.`
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
upped max_steps in v1-finetune.yaml and fixed TI docs to address #493 2022-09-11 20:20:14 +00:00			`Conditioning contains the training prompts inputs, reconstruction the`
			`input images for the training epoch samples, samples scaled for a`
			`sample of the prompt and one with the init word provided.`
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
			`On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.`

upped max_steps in v1-finetune.yaml and fixed TI docs to address #493 2022-09-11 20:20:14 +00:00			`_Note_: According to the associated paper, the optimal number of`
			`images is 3-5. Your model may not converge if you use more images than`
			`that.`
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
upped max_steps in v1-finetune.yaml and fixed TI docs to address #493 2022-09-11 20:20:14 +00:00			`Training will run indefinitely, but you may wish to stop it (with`
			`ctrl-c) before the heat death of the universe, when you find a low`
			`loss epoch or around ~5000 iterations. Note that you can set a fixed`
			`limit on the number of training steps by decreasing the "max_steps"`
			`option in configs/stable_diffusion/v1-finetune.yaml (currently set to`
			`4000000)`
Adding support for .bin files from huggingface concepts (#498) * Adding support for .bin files from huggingface concepts * Updating documentation to include huggingface .bin info 2022-09-11 19:44:26 +00:00
			`Running`
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
upped max_steps in v1-finetune.yaml and fixed TI docs to address #493 2022-09-11 20:20:14 +00:00			`Once the model is trained, specify the trained .pt or .bin file when`
			`starting dream using`
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
			```
			`(ldm) ~/stable-diffusion$ python3 ./scripts/dream.py --embedding_path /path/to/embedding.pt --full_precision`
			```

			`Then, to utilize your subject at the dream prompt`

			```
			`dream> "a photo of *"`
			```

			`This also works with image2image`

			```
			`dream> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4`
			```

Adding support for .bin files from huggingface concepts (#498) * Adding support for .bin files from huggingface concepts * Updating documentation to include huggingface .bin info 2022-09-11 19:44:26 +00:00			For .pt files it's also possible to train multiple tokens (modify the placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine LDM checkpoints using:
Docs Update (#466) Authored-by: @blessedcoolant Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com> 2022-09-11 15:52:43 +00:00
			```
			`(ldm) ~/stable-diffusion$ python3 ./scripts/merge_embeddings.py \`
			`--manager_ckpts /path/to/first/embedding.pt /path/to/second/embedding.pt [...] \`
			`--output_path /path/to/output/embedding.pt`
			```

			`Credit goes to rinongal and the repository located at https://github.com/rinongal/textual_inversion Please see the repository and associated paper for details and limitations.`