mirror of
https://github.com/invoke-ai/InvokeAI
synced 2024-08-30 20:32:17 +00:00
Merge branch 'main' into feat/import-with-vae
commit
079ec4cb5c
BIN
docs/assets/textual-inversion/ti-frontend.png
Normal file
Binary file not shown (size: 124 KiB).
@ -10,83 +10,263 @@ You may personalize the generated images to provide your own styles or objects
|
|||||||
by training a new LDM checkpoint and introducing a new vocabulary to the fixed
|
by training a new LDM checkpoint and introducing a new vocabulary to the fixed
|
||||||
model as a (.pt) embeddings file. Alternatively, you may use or train
|
model as a (.pt) embeddings file. Alternatively, you may use or train
|
||||||
HuggingFace Concepts embeddings files (.bin) from
|
HuggingFace Concepts embeddings files (.bin) from
|
||||||
<https://huggingface.co/sd-concepts-library> and its associated notebooks.
|
<https://huggingface.co/sd-concepts-library> and its associated
|
||||||
|
notebooks.
|
||||||
|
|
||||||
## **Training**
|
## **Hardware and Software Requirements**
|
||||||
|
|
||||||
To train, prepare a folder that contains images sized at 512x512 and execute the
|
You will need a GPU to perform training in a reasonable length of
|
||||||
following:
|
time, and at least 12 GB of VRAM. We recommend using the [`xformers`
|
||||||
|
library](../installation/070_INSTALL_XFORMERS) to accelerate the
|
||||||
|
training process further. During training, about 8 GB is temporarily
|
||||||
|
needed in order to store intermediate models, checkpoints and logs.
|
||||||
|
|
||||||
### WINDOWS
|
## **Preparing for Training**
|
||||||
|
|
||||||
As the default backend is not available on Windows, if you're using that
|
To train, prepare a folder that contains 3-5 images that illustrate
|
||||||
platform, set the environment variable `PL_TORCH_DISTRIBUTED_BACKEND` to `gloo`
|
the object or concept. It is good to provide a variety of examples or
|
||||||
|
poses to avoid overtraining the system. Format these images as PNG
|
||||||
|
(preferred) or JPG. You do not need to resize or crop the images in
|
||||||
|
advance, but for more control you may wish to do so.
|
||||||
|
|
||||||
```bash
|
Place the training images in a directory on the machine InvokeAI runs
|
||||||
python3 ./main.py -t \
|
on. We recommend placing them in a subdirectory of the
|
||||||
--base ./configs/stable-diffusion/v1-finetune.yaml \
|
`text-inversion-training-data` folder located in the InvokeAI root
|
||||||
--actual_resume ./models/ldm/stable-diffusion-v1/model.ckpt \
|
directory, ordinarily `~/invokeai` (Linux/Mac), or
|
||||||
-n my_cat \
|
`C:\Users\your_name\invokeai` (Windows). For example, to create an
|
||||||
--gpus 0 \
|
embedding for the "psychedelic" style, you'd place the training images
|
||||||
--data_root D:/textual-inversion/my_cat \
|
into the directory
|
||||||
--init_word 'cat'
|
`~/invokeai/text-inversion-training-data/psychedelic`.
|
||||||
|
|
||||||
|
## **Launching Training Using the Console Front End**
|
||||||
|
|
||||||
|
InvokeAI 2.3 and higher comes with a text console-based training front
|
||||||
|
end. From within the `invoke.sh`/`invoke.bat` Invoke launcher script,
|
||||||
|
start the front end by selecting choice (3):
|
||||||
|
|
||||||
|
```sh
|
||||||
|
Do you want to generate images using the
|
||||||
|
1. command-line
|
||||||
|
2. browser-based UI
|
||||||
|
3. textual inversion training
|
||||||
|
4. open the developer console
|
||||||
|
Please enter 1, 2, 3, or 4: [1] 3
|
||||||
```
|
```
|
||||||
|
|
||||||
During the training process, files will be created in
|
From the command line, with the InvokeAI virtual environment active,
|
||||||
`/logs/[project][time][project]/` where you can see the process.
|
you can launch the front end with the command
|
||||||
|
`textual_inversion_fe`.
|
||||||
|
|
||||||
Conditioning contains the training prompts inputs, reconstruction the input
|
This will launch a text-based front end that will look like this:
|
||||||
images for the training epoch samples, samples scaled for a sample of the prompt
|
|
||||||
and one with the init word provided.
|
|
||||||
|
|
||||||
On a RTX3090, the process for SD will take ~1h @1.6 iterations/sec.
|
<figure markdown>
|
||||||
|
![ti-frontend](../assets/textual-inversion/ti-frontend.png)
|
||||||
|
</figure>
|
||||||
|
|
||||||
!!! note
|
The interface is keyboard-based. Move from field to field using
|
||||||
|
control-N (^N) to move to the next field and control-P (^P) to the
|
||||||
|
previous one. <Tab> and <shift-TAB> work as well. Once a field is
|
||||||
|
active, use the cursor keys. In a checkbox group, use the up and down
|
||||||
|
cursor keys to move from choice to choice, and <space> to select a
|
||||||
|
choice. In a scrollbar, use the left and right cursor keys to increase
|
||||||
|
and decrease the value of the scroll. In textfields, type the desired
|
||||||
|
values.
|
||||||
|
|
||||||
According to the associated paper, the optimal number of
|
The number of parameters may look intimidating, but in most cases the
|
||||||
images is 3-5. Your model may not converge if you use more images than
|
predefined defaults work fine. The red circled fields in the above
|
||||||
that.
|
illustration are the ones you will adjust most frequently.
|
||||||
|
|
||||||
Training will run indefinitely, but you may wish to stop it (with ctrl-c) before
|
### Model Name
|
||||||
the heat death of the universe, when you find a low loss epoch or around ~5000
|
|
||||||
iterations. Note that you can set a fixed limit on the number of training steps
|
|
||||||
by decreasing the "max_steps" option in
|
|
||||||
configs/stable_diffusion/v1-finetune.yaml (currently set to 4000000)
|
|
||||||
|
|
||||||
## **Run the Model**
|
This will list all the diffusers models that are currently
|
||||||
|
installed. Select the one you wish to use as the basis for your
|
||||||
|
embedding. Be aware that if you use a SD-1.X-based model for your
|
||||||
|
training, you will only be able to use this embedding with other
|
||||||
|
SD-1.X-based models. Similarly, if you train on SD-2.X, you will only
|
||||||
|
be able to use the embeddings with models based on SD-2.X.
|
||||||
|
|
||||||
Once the model is trained, specify the trained .pt or .bin file when starting
|
### Trigger Term
|
||||||
invoke using
|
|
||||||
|
|
||||||
```bash
|
This is the prompt term you will use to trigger the embedding. Type a
|
||||||
python3 ./scripts/invoke.py \
|
single word or phrase you wish to use as the trigger, for example
|
||||||
--embedding_path /path/to/embedding.pt
|
"psychedelic" (without angle brackets). Within InvokeAI, you will then
|
||||||
|
be able to activate the trigger using the syntax `<psychedelic>`.
|
||||||
|
|
||||||
|
### Initializer
|
||||||
|
|
||||||
|
This is a single character that is used internally during the training
|
||||||
|
process as a placeholder for the trigger term. It defaults to "*" and
|
||||||
|
can usually be left alone.
|
||||||
|
|
||||||
|
### Resume from last saved checkpoint
|
||||||
|
|
||||||
|
As training proceeds, textual inversion will write a series of
|
||||||
|
intermediate files that can be used to resume training from where it
|
||||||
|
was left off in the case of an interruption. This checkbox will be
|
||||||
|
automatically selected if you provide a previously used trigger term
|
||||||
|
and at least one checkpoint file is found on disk.
|
||||||
|
|
||||||
|
Note that as of 20 January 2023, resume does not seem to be working
|
||||||
|
properly due to an issue with the upstream code.
|
||||||
|
|
||||||
|
### Data Training Directory
|
||||||
|
|
||||||
|
This is the location of the images to be used for training. When you
|
||||||
|
select a trigger term like "my-trigger", the frontend will prepopulate
|
||||||
|
this field with `~/invokeai/text-inversion-training-data/my-trigger`,
|
||||||
|
but you can change the path to wherever you want.
|
||||||
|
|
||||||
|
### Output Destination Directory
|
||||||
|
|
||||||
|
This is the location of the logs, checkpoint files, and embedding
|
||||||
|
files created during training. When you select a trigger term like
|
||||||
|
"my-trigger", the frontend will prepopulate this field with
|
||||||
|
`~/invokeai/text-inversion-output/my-trigger`, but you can change the
|
||||||
|
path to wherever you want.
|
||||||
|
|
||||||
|
### Image resolution
|
||||||
|
|
||||||
|
The images in the training directory will be automatically scaled to
|
||||||
|
the value you use here. For best results, you will want to use the
|
||||||
|
same default resolution of the underlying model (512 pixels for
|
||||||
|
SD-1.5, 768 for the larger version of SD-2.1).
|
||||||
|
|
||||||
|
### Center crop images
|
||||||
|
|
||||||
|
If this is selected, your images will be center cropped to make them
|
||||||
|
square before resizing them to the desired resolution. Center cropping
|
||||||
|
can indiscriminately cut off the top of subjects' heads for portrait
|
||||||
|
aspect images, so if you have images like this, you may wish to use a
|
||||||
|
photo editor to manually crop them to a square aspect ratio.
|
||||||
|
|
||||||
|
### Mixed precision
|
||||||
|
|
||||||
|
Select the floating point precision for the embedding. "no" will
|
||||||
|
result in a full 32-bit precision, "fp16" will provide 16-bit
|
||||||
|
precision, and "bf16" will provide mixed precision (only available
|
||||||
|
when XFormers is used).
|
||||||
|
|
||||||
|
### Max training steps
|
||||||
|
|
||||||
|
How many steps the training will take before the model converges. Most
|
||||||
|
training sets will converge with 2000-3000 steps.
|
||||||
|
|
||||||
|
### Batch size
|
||||||
|
|
||||||
|
This adjusts how many training images are processed simultaneously in
|
||||||
|
each step. Higher values will cause the training process to run more
|
||||||
|
quickly, but use more memory. The default size will run with GPUs with
|
||||||
|
as little as 12 GB.
|
||||||
|
|
||||||
|
### Learning rate
|
||||||
|
|
||||||
|
The rate at which the system adjusts its internal weights during
|
||||||
|
training. Higher values risk overtraining (getting the same image each
|
||||||
|
time), and lower values will take more steps to train a good
|
||||||
|
model. The default of 0.0005 is conservative; you may wish to increase
|
||||||
|
it to 0.005 to speed up training.
|
||||||
|
|
||||||
|
### Scale learning rate by number of GPUs, steps and batch size
|
||||||
|
|
||||||
|
If this is selected (the default) the system will adjust the provided
|
||||||
|
learning rate to improve performance.
|
||||||
|
|
||||||
|
### Use xformers acceleration
|
||||||
|
|
||||||
|
This will activate XFormers memory-efficient attention. You need to
|
||||||
|
have XFormers installed for this to have an effect.
|
||||||
|
|
||||||
|
### Learning rate scheduler
|
||||||
|
|
||||||
|
This adjusts how the learning rate changes over the course of
|
||||||
|
training. The default "constant" means to use a constant learning rate
|
||||||
|
for the entire training session. The other values scale the learning
|
||||||
|
rate according to various formulas.
|
||||||
|
|
||||||
|
Only "constant" is supported by the XFormers library.
|
||||||
|
|
||||||
|
### Gradient accumulation steps
|
||||||
|
|
||||||
|
This is a parameter that allows you to use bigger batch sizes than
|
||||||
|
your GPU's VRAM would ordinarily accommodate, at the cost of some
|
||||||
|
performance.
|
||||||
|
|
||||||
|
### Warmup steps
|
||||||
|
|
||||||
|
If "constant_with_warmup" is selected in the learning rate scheduler,
|
||||||
|
then this provides the number of warmup steps. Warmup steps have a
|
||||||
|
very low learning rate, and are one way of preventing early
|
||||||
|
overtraining.
|
||||||
|
|
||||||
|
## The training run
|
||||||
|
|
||||||
|
Start the training run by advancing to the OK button (bottom right)
|
||||||
|
and pressing <enter>. A series of progress messages will be displayed
|
||||||
|
as the training process proceeds. This may take an hour or two,
|
||||||
|
depending on settings and the speed of your system. Various log and
|
||||||
|
checkpoint files will be written into the output directory (ordinarily
|
||||||
|
`~/invokeai/text-inversion-output/my-model/`).
|
||||||
|
|
||||||
|
At the end of successful training, the system will copy the file
|
||||||
|
`learned_embeds.bin` into the InvokeAI root directory's `embeddings`
|
||||||
|
directory, using a subdirectory named after the trigger token. For
|
||||||
|
example, if the trigger token was `psychedelic`, then look for the
|
||||||
|
embeddings file in
|
||||||
|
`~/invokeai/embeddings/psychedelic/learned_embeds.bin`
|
||||||
|
|
||||||
|
You may now launch InvokeAI and try out a prompt that uses the trigger
|
||||||
|
term. For example, `a plate of banana sushi in <psychedelic> style`.
|
||||||
|
|
||||||
|
## **Training with the Command-Line Script**
|
||||||
|
|
||||||
|
InvokeAI also comes with a traditional command-line script for
|
||||||
|
launching textual inversion training. It is named
|
||||||
|
`textual_inversion`, and can be launched from within the
|
||||||
|
"developer's console", or from the command line after activating
|
||||||
|
InvokeAI's virtual environment.
|
||||||
|
|
||||||
|
It accepts a large number of arguments, which can be summarized by
|
||||||
|
passing the `--help` argument:
|
||||||
|
|
||||||
|
```sh
|
||||||
|
textual_inversion --help
|
||||||
```
|
```
|
||||||
|
|
||||||
Then, to utilize your subject at the invoke prompt
|
Typical usage is shown here:
|
||||||
|
```sh
|
||||||
```bash
|
python textual_inversion.py \
|
||||||
invoke> "a photo of *"
|
--model=stable-diffusion-1.5 \
|
||||||
|
--resolution=512 \
|
||||||
|
--learnable_property=style \
|
||||||
|
--initializer_token='*' \
|
||||||
|
--placeholder_token='<psychedelic>' \
|
||||||
|
--train_data_dir=/home/lstein/invokeai/text-inversion-training-data/psychedelic \
|
||||||
|
--output_dir=/home/lstein/invokeai/text-inversion-output/psychedelic \
|
||||||
|
--scale_lr \
|
||||||
|
--train_batch_size=8 \
|
||||||
|
--gradient_accumulation_steps=4 \
|
||||||
|
--max_train_steps=3000 \
|
||||||
|
--learning_rate=0.0005 \
|
||||||
|
--resume_from_checkpoint=latest \
|
||||||
|
--lr_scheduler=constant \
|
||||||
|
--mixed_precision=fp16 \
|
||||||
|
--only_save_embeds
|
||||||
```
|
```
|
||||||
|
|
||||||
This also works with image2image
|
## Reading
|
||||||
|
|
||||||
```bash
|
For more information on textual inversion, please see the following
|
||||||
invoke> "waterfall and rainbow in the style of *" --init_img=./init-images/crude_drawing.png --strength=0.5 -s100 -n4
|
resources:
|
||||||
```
|
|
||||||
|
|
||||||
For .pt files it's also possible to train multiple tokens (modify the
|
* The [textual inversion repository](https://github.com/rinongal/textual_inversion) and
|
||||||
placeholder string in `configs/stable-diffusion/v1-finetune.yaml`) and combine
|
associated paper for details and limitations.
|
||||||
LDM checkpoints using:
|
* [HuggingFace's textual inversion training
|
||||||
|
page](https://huggingface.co/docs/diffusers/training/text_inversion)
|
||||||
|
* [HuggingFace example script
|
||||||
|
documentation](https://github.com/huggingface/diffusers/tree/main/examples/textual_inversion)
|
||||||
|
(Note that this script is similar, but not identical, to
|
||||||
|
`textual_inversion`, but it produces embedding files that are completely compatible.)
|
||||||
|
|
||||||
```bash
|
---
|
||||||
python3 ./scripts/merge_embeddings.py \
|
|
||||||
--manager_ckpts /path/to/first/embedding.pt \
|
|
||||||
[</path/to/second/embedding.pt>,[...]] \
|
|
||||||
--output_path /path/to/output/embedding.pt
|
|
||||||
```
|
|
||||||
|
|
||||||
Credit goes to rinongal and the repository
|
copyright (c) 2023, Lincoln Stein and the InvokeAI Development Team
|
||||||
|
|
||||||
Please see [the repository](https://github.com/rinongal/textual_inversion) and
|
|
||||||
associated paper for details and limitations.
|
|
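The documentation changes above say that the front end prepopulates the data and output directories from the trigger term. A minimal sketch of that convention follows. `default_dirs` is a hypothetical helper, not an InvokeAI function, and the root path used in the example is an assumption; the two folder names are the `TRAINING_DATA` and `TRAINING_DIR` values set later in this commit.

```python
from pathlib import Path
from typing import Tuple

# Folder names as set by the front-end constants in this commit.
TRAINING_DATA = "text-inversion-training-data"
TRAINING_DIR = "text-inversion-output"


def default_dirs(root: Path, trigger: str) -> Tuple[str, str]:
    """Return (train_data_dir, output_dir) for a trigger term.

    Hypothetical helper for illustration. The paths are returned as
    strings, matching the str() conversion this commit applies to the
    form field values.
    """
    return (
        str(root / TRAINING_DATA / trigger),
        str(root / TRAINING_DIR / trigger),
    )


# Assumed root directory; a real installation may use a different one.
data_dir, out_dir = default_dirs(Path("/home/user/invokeai"), "psychedelic")
print(data_dir)
print(out_dir)
```

Changing the trigger term changes both paths, which is why the form recomputes them whenever the trigger field is edited.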
@ -4,7 +4,6 @@
|
|||||||
# and modified slightly by Lincoln Stein (@lstein) to work with InvokeAI
|
# and modified slightly by Lincoln Stein (@lstein) to work with InvokeAI
|
||||||
|
|
||||||
import argparse
|
import argparse
|
||||||
from argparse import Namespace
|
|
||||||
import logging
|
import logging
|
||||||
import math
|
import math
|
||||||
import os
|
import os
|
||||||
@ -207,6 +206,12 @@ def parse_args():
|
|||||||
parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
|
parser.add_argument("--adam_epsilon", type=float, default=1e-08, help="Epsilon value for the Adam optimizer")
|
||||||
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
|
parser.add_argument("--push_to_hub", action="store_true", help="Whether or not to push the model to the Hub.")
|
||||||
parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
|
parser.add_argument("--hub_token", type=str, default=None, help="The token to use to push to the Model Hub.")
|
||||||
|
parser.add_argument(
|
||||||
|
"--hub_model_id",
|
||||||
|
type=str,
|
||||||
|
default=None,
|
||||||
|
help="The name of the repository to keep in sync with the local `output_dir`.",
|
||||||
|
)
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--logging_dir",
|
"--logging_dir",
|
||||||
type=Path,
|
type=Path,
|
||||||
@ -455,7 +460,8 @@ def do_textual_inversion_training(
|
|||||||
checkpointing_steps:int=500,
|
checkpointing_steps:int=500,
|
||||||
resume_from_checkpoint:Path=None,
|
resume_from_checkpoint:Path=None,
|
||||||
enable_xformers_memory_efficient_attention:bool=False,
|
enable_xformers_memory_efficient_attention:bool=False,
|
||||||
root_dir:Path=None
|
root_dir:Path=None,
|
||||||
|
hub_model_id:str=None,
|
||||||
):
|
):
|
||||||
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
|
env_local_rank = int(os.environ.get("LOCAL_RANK", -1))
|
||||||
if env_local_rank != -1 and env_local_rank != local_rank:
|
if env_local_rank != -1 and env_local_rank != local_rank:
|
||||||
@ -518,10 +524,10 @@ def do_textual_inversion_training(
|
|||||||
pretrained_model_name_or_path = model_conf.get('repo_id',None) or Path(model_conf.get('path'))
|
pretrained_model_name_or_path = model_conf.get('repo_id',None) or Path(model_conf.get('path'))
|
||||||
assert pretrained_model_name_or_path, f"models.yaml error: neither 'repo_id' nor 'path' is defined for {model}"
|
assert pretrained_model_name_or_path, f"models.yaml error: neither 'repo_id' nor 'path' is defined for {model}"
|
||||||
pipeline_args = dict(cache_dir=global_cache_dir('diffusers'))
|
pipeline_args = dict(cache_dir=global_cache_dir('diffusers'))
|
||||||
|
|
||||||
# Load tokenizer
|
# Load tokenizer
|
||||||
if tokenizer_name:
|
if tokenizer_name:
|
||||||
tokenizer = CLIPTokenizer.from_pretrained(tokenizer_name,cache_dir=global_cache_dir('transformers'))
|
tokenizer = CLIPTokenizer.from_pretrained(tokenizer_name,**pipeline_args)
|
||||||
else:
|
else:
|
||||||
tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_name_or_path, subfolder="tokenizer", **pipeline_args)
|
tokenizer = CLIPTokenizer.from_pretrained(pretrained_model_name_or_path, subfolder="tokenizer", **pipeline_args)
|
||||||
|
|
||||||
@ -631,7 +637,7 @@ def do_textual_inversion_training(
|
|||||||
text_encoder, optimizer, train_dataloader, lr_scheduler
|
text_encoder, optimizer, train_dataloader, lr_scheduler
|
||||||
)
|
)
|
||||||
|
|
||||||
# For mixed precision training we cast the text_encoder and vae weights to half-precision
|
# For mixed precision training we cast the unet and vae weights to half-precision
|
||||||
# as these models are only used for inference, keeping weights in full precision is not required.
|
# as these models are only used for inference, keeping weights in full precision is not required.
|
||||||
weight_dtype = torch.float32
|
weight_dtype = torch.float32
|
||||||
if accelerator.mixed_precision == "fp16":
|
if accelerator.mixed_precision == "fp16":
|
||||||
@ -670,6 +676,7 @@ def do_textual_inversion_training(
|
|||||||
logger.info(f" Total optimization steps = {max_train_steps}")
|
logger.info(f" Total optimization steps = {max_train_steps}")
|
||||||
global_step = 0
|
global_step = 0
|
||||||
first_epoch = 0
|
first_epoch = 0
|
||||||
|
resume_step = None
|
||||||
|
|
||||||
# Potentially load in the weights and states from a previous save
|
# Potentially load in the weights and states from a previous save
|
||||||
if resume_from_checkpoint:
|
if resume_from_checkpoint:
|
||||||
@ -680,15 +687,22 @@ def do_textual_inversion_training(
|
|||||||
dirs = os.listdir(output_dir)
|
dirs = os.listdir(output_dir)
|
||||||
dirs = [d for d in dirs if d.startswith("checkpoint")]
|
dirs = [d for d in dirs if d.startswith("checkpoint")]
|
||||||
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
|
dirs = sorted(dirs, key=lambda x: int(x.split("-")[1]))
|
||||||
path = dirs[-1]
|
path = dirs[-1] if len(dirs) > 0 else None
|
||||||
accelerator.print(f"Resuming from checkpoint {path}")
|
|
||||||
accelerator.load_state(os.path.join(output_dir, path))
|
if path is None:
|
||||||
global_step = int(path.split("-")[1])
|
accelerator.print(
|
||||||
|
f"Checkpoint '{resume_from_checkpoint}' does not exist. Starting a new training run."
|
||||||
resume_global_step = global_step * gradient_accumulation_steps
|
)
|
||||||
first_epoch = resume_global_step // num_update_steps_per_epoch
|
resume_from_checkpoint = None
|
||||||
resume_step = resume_global_step % num_update_steps_per_epoch
|
else:
|
||||||
|
accelerator.print(f"Resuming from checkpoint {path}")
|
||||||
|
accelerator.load_state(os.path.join(output_dir, path))
|
||||||
|
global_step = int(path.split("-")[1])
|
||||||
|
|
||||||
|
resume_global_step = global_step * gradient_accumulation_steps
|
||||||
|
first_epoch = global_step // num_update_steps_per_epoch
|
||||||
|
resume_step = resume_global_step % (num_update_steps_per_epoch * gradient_accumulation_steps)
|
||||||
|
|
||||||
# Only show the progress bar once on each machine.
|
# Only show the progress bar once on each machine.
|
||||||
progress_bar = tqdm(range(global_step, max_train_steps), disable=not accelerator.is_local_main_process)
|
progress_bar = tqdm(range(global_step, max_train_steps), disable=not accelerator.is_local_main_process)
|
||||||
progress_bar.set_description("Steps")
|
progress_bar.set_description("Steps")
|
||||||
@ -700,7 +714,7 @@ def do_textual_inversion_training(
|
|||||||
text_encoder.train()
|
text_encoder.train()
|
||||||
for step, batch in enumerate(train_dataloader):
|
for step, batch in enumerate(train_dataloader):
|
||||||
# Skip steps until we reach the resumed step
|
# Skip steps until we reach the resumed step
|
||||||
if resume_from_checkpoint and epoch == first_epoch and step < resume_step:
|
if resume_step and resume_from_checkpoint and epoch == first_epoch and step < resume_step:
|
||||||
if step % gradient_accumulation_steps == 0:
|
if step % gradient_accumulation_steps == 0:
|
||||||
progress_bar.update(1)
|
progress_bar.update(1)
|
||||||
continue
|
continue
|
||||||
|
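The resume logic changed in this file can be exercised in isolation. The sketch below is a standalone approximation: `latest_checkpoint` and `resume_position` are hypothetical helper names, the checkpoint directories are created in a temporary folder, and the step counts are made-up values. The arithmetic copies the new side of the diff.

```python
import os
import tempfile
from typing import Optional, Tuple


def latest_checkpoint(output_dir: str) -> Optional[str]:
    """Return the newest 'checkpoint-<step>' entry, or None if none exist."""
    dirs = [d for d in os.listdir(output_dir) if d.startswith("checkpoint")]
    dirs = sorted(dirs, key=lambda name: int(name.split("-")[1]))
    return dirs[-1] if len(dirs) > 0 else None


def resume_position(
    global_step: int,
    gradient_accumulation_steps: int,
    num_update_steps_per_epoch: int,
) -> Tuple[int, int]:
    """Compute (first_epoch, resume_step) as on the new side of the diff."""
    resume_global_step = global_step * gradient_accumulation_steps
    first_epoch = global_step // num_update_steps_per_epoch
    resume_step = resume_global_step % (
        num_update_steps_per_epoch * gradient_accumulation_steps
    )
    return first_epoch, resume_step


with tempfile.TemporaryDirectory() as out:
    # With no checkpoint on disk the helper returns None, which is the
    # case where the patched code starts a new training run.
    empty_result = latest_checkpoint(out)
    for step in (500, 1000, 1550):
        os.makedirs(os.path.join(out, f"checkpoint-{step}"))
    newest = latest_checkpoint(out)

global_step = int(newest.split("-")[1])
first_epoch, resume_step = resume_position(global_step, 4, 100)
print(empty_result, newest, first_epoch, resume_step)
```

Sorting on the integer suffix matters here: a plain string sort would place `checkpoint-500` after `checkpoint-1550`.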
@ -72,8 +72,9 @@ class TextualInversionManager():
|
|||||||
self._add_textual_inversion(embedding_info['name'],
|
self._add_textual_inversion(embedding_info['name'],
|
||||||
embedding_info['embedding'],
|
embedding_info['embedding'],
|
||||||
defer_injecting_tokens=defer_injecting_tokens)
|
defer_injecting_tokens=defer_injecting_tokens)
|
||||||
except ValueError:
|
except ValueError as e:
|
||||||
print(f' | ignoring incompatible embedding {embedding_info["name"]}')
|
print(f' | Ignoring incompatible embedding {embedding_info["name"]}')
|
||||||
|
print(f' | The error was {str(e)}')
|
||||||
else:
|
else:
|
||||||
print(f'>> Failed to load embedding located at {ckpt_path}. Unsupported file.')
|
print(f'>> Failed to load embedding located at {ckpt_path}. Unsupported file.')
|
||||||
|
|
||||||
@ -157,7 +158,8 @@ class TextualInversionManager():
|
|||||||
try:
|
try:
|
||||||
self._inject_tokens_and_assign_embeddings(ti)
|
self._inject_tokens_and_assign_embeddings(ti)
|
||||||
except ValueError as e:
|
except ValueError as e:
|
||||||
print(f' | ignoring incompatible embedding trigger {ti.trigger_string}')
|
print(f' | Ignoring incompatible embedding trigger {ti.trigger_string}')
|
||||||
|
print(f' | The error was {str(e)}')
|
||||||
continue
|
continue
|
||||||
injected_token_ids.append(ti.trigger_token_id)
|
injected_token_ids.append(ti.trigger_token_id)
|
||||||
injected_token_ids.extend(ti.pad_token_ids)
|
injected_token_ids.extend(ti.pad_token_ids)
|
||||||
|
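The two hunks above make the manager print the underlying `ValueError` when it skips an embedding. The pattern can be sketched without the real class. `load_embeddings` and its dimension check are hypothetical stand-ins for the actual compatibility test, which is not shown in this diff.

```python
from typing import Dict, List


def load_embeddings(embeddings: List[Dict], expected_dim: int) -> List[str]:
    """Return the names of compatible embeddings, reporting the rest.

    The length check is a stand-in for whatever validation raises
    ValueError in the real code.
    """
    loaded = []
    for info in embeddings:
        try:
            actual_dim = len(info["embedding"])
            if actual_dim != expected_dim:
                raise ValueError(
                    f"expected {expected_dim} dimensions, got {actual_dim}"
                )
            loaded.append(info["name"])
        except ValueError as e:
            # Report which embedding was skipped and why, then carry on.
            print(f'   | Ignoring incompatible embedding {info["name"]}')
            print(f'   | The error was {str(e)}')
    return loaded


loaded = load_embeddings(
    [
        {"name": "psychedelic", "embedding": [0.0] * 4},
        {"name": "mismatched", "embedding": [0.0] * 3},
    ],
    expected_dim=4,
)
print(loaded)
```

Binding the exception with `as e` is what allows the second message; the original `except ValueError:` discarded the reason.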
@ -747,7 +747,7 @@ def initialize_rootdir(root:str,yes_to_all:bool=False):
|
|||||||
|
|
||||||
safety_checker = '--nsfw_checker' if enable_safety_checker else '--no-nsfw_checker'
|
safety_checker = '--nsfw_checker' if enable_safety_checker else '--no-nsfw_checker'
|
||||||
|
|
||||||
for name in ('models','configs','embeddings'):
|
for name in ('models','configs','embeddings','text-inversion-data','text-inversion-training-data'):
|
||||||
os.makedirs(os.path.join(root,name), exist_ok=True)
|
os.makedirs(os.path.join(root,name), exist_ok=True)
|
||||||
for src in (['configs']):
|
for src in (['configs']):
|
||||||
dest = os.path.join(root,src)
|
dest = os.path.join(root,src)
|
||||||
|
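The directory creation in this hunk relies on `os.makedirs(..., exist_ok=True)` being safe to repeat. A small sketch, using a temporary directory in place of the real root (`make_root_dirs` is a hypothetical name):

```python
import os
import tempfile

# Names from the new side of the diff. The front end in this commit
# writes output to 'text-inversion-output', which is not in this list;
# save_args() creates that directory on demand instead.
SUBDIRS = (
    "models",
    "configs",
    "embeddings",
    "text-inversion-data",
    "text-inversion-training-data",
)


def make_root_dirs(root: str) -> list:
    """Create each subdirectory under root and return the sorted listing."""
    for name in SUBDIRS:
        os.makedirs(os.path.join(root, name), exist_ok=True)
    return sorted(os.listdir(root))


with tempfile.TemporaryDirectory() as root:
    first = make_root_dirs(root)
    second = make_root_dirs(root)  # repeating the call raises no error

print(first)
```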
@ -1,11 +1,11 @@
|
|||||||
#!/usr/bin/env python
|
#!/usr/bin/env python
|
||||||
|
|
||||||
# Copyright 2023, Lincoln Stein @lstein
|
# Copyright 2023, Lincoln Stein @lstein
|
||||||
from ldm.invoke.globals import Globals, set_root
|
from ldm.invoke.globals import Globals, global_set_root
|
||||||
from ldm.invoke.textual_inversion_training import parse_args, do_textual_inversion_training
|
from ldm.invoke.textual_inversion_training import parse_args, do_textual_inversion_training
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
args = parse_args()
|
args = parse_args()
|
||||||
set_root(args.root_dir or Globals.root)
|
global_set_root(args.root_dir or Globals.root)
|
||||||
kwargs = vars(args)
|
kwargs = vars(args)
|
||||||
do_textual_inversion_training(**kwargs)
|
do_textual_inversion_training(**kwargs)
|
||||||
|
@ -6,14 +6,15 @@ import sys
|
|||||||
import re
|
import re
|
||||||
import shutil
|
import shutil
|
||||||
import traceback
|
import traceback
|
||||||
|
import curses
|
||||||
from ldm.invoke.globals import Globals, global_set_root
|
from ldm.invoke.globals import Globals, global_set_root
|
||||||
from omegaconf import OmegaConf
|
from omegaconf import OmegaConf
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
from typing import List
|
from typing import List
|
||||||
import argparse
|
import argparse
|
||||||
|
|
||||||
TRAINING_DATA = 'training-data'
|
TRAINING_DATA = 'text-inversion-training-data'
|
||||||
TRAINING_DIR = 'text-inversion-training'
|
TRAINING_DIR = 'text-inversion-output'
|
||||||
CONF_FILE = 'preferences.conf'
|
CONF_FILE = 'preferences.conf'
|
||||||
|
|
||||||
class textualInversionForm(npyscreen.FormMultiPageAction):
|
class textualInversionForm(npyscreen.FormMultiPageAction):
|
||||||
@ -43,6 +44,11 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
|
|||||||
except:
|
except:
|
||||||
pass
|
pass
|
||||||
|
|
||||||
|
self.add_widget_intelligent(
|
||||||
|
npyscreen.FixedText,
|
||||||
|
value='Use ctrl-N and ctrl-P to move to the <N>ext and <P>revious fields, cursor arrows to make a selection, and space to toggle checkboxes.'
|
||||||
|
)
|
||||||
|
|
||||||
self.model = self.add_widget_intelligent(
|
self.model = self.add_widget_intelligent(
|
||||||
npyscreen.TitleSelectOne,
|
npyscreen.TitleSelectOne,
|
||||||
name='Model Name:',
|
name='Model Name:',
|
||||||
@ -82,18 +88,18 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
|
|||||||
max_height=4,
|
max_height=4,
|
||||||
)
|
)
|
||||||
self.train_data_dir = self.add_widget_intelligent(
|
self.train_data_dir = self.add_widget_intelligent(
|
||||||
npyscreen.TitleFilenameCombo,
|
npyscreen.TitleFilename,
|
||||||
name='Data Training Directory:',
|
name='Data Training Directory:',
|
||||||
select_dir=True,
|
select_dir=True,
|
||||||
must_exist=True,
|
must_exist=False,
|
||||||
value=saved_args.get('train_data_dir',Path(Globals.root) / TRAINING_DATA / default_placeholder_token)
|
value=str(saved_args.get('train_data_dir',Path(Globals.root) / TRAINING_DATA / default_placeholder_token))
|
||||||
)
|
)
|
||||||
self.output_dir = self.add_widget_intelligent(
|
self.output_dir = self.add_widget_intelligent(
|
||||||
npyscreen.TitleFilenameCombo,
|
npyscreen.TitleFilename,
|
||||||
name='Output Destination Directory:',
|
name='Output Destination Directory:',
|
||||||
select_dir=True,
|
select_dir=True,
|
||||||
must_exist=False,
|
must_exist=False,
|
||||||
value=saved_args.get('output_dir',Path(Globals.root) / TRAINING_DIR / default_placeholder_token)
|
value=str(saved_args.get('output_dir',Path(Globals.root) / TRAINING_DIR / default_placeholder_token))
|
||||||
)
|
)
|
||||||
self.resolution = self.add_widget_intelligent(
|
self.resolution = self.add_widget_intelligent(
|
||||||
npyscreen.TitleSelectOne,
|
npyscreen.TitleSelectOne,
|
||||||
@ -182,8 +188,8 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
|
|||||||
def initializer_changed(self):
|
def initializer_changed(self):
|
||||||
placeholder = self.placeholder_token.value
|
placeholder = self.placeholder_token.value
|
||||||
self.prompt_token.value = f'(Trigger by using <{placeholder}> in your prompts)'
|
self.prompt_token.value = f'(Trigger by using <{placeholder}> in your prompts)'
|
||||||
self.train_data_dir.value = Path(Globals.root) / TRAINING_DATA / placeholder
|
self.train_data_dir.value = str(Path(Globals.root) / TRAINING_DATA / placeholder)
|
||||||
self.output_dir.value = Path(Globals.root) / TRAINING_DIR / placeholder
|
self.output_dir.value = str(Path(Globals.root) / TRAINING_DIR / placeholder)
|
||||||
self.resume_from_checkpoint.value = Path(self.output_dir.value).exists()
|
self.resume_from_checkpoint.value = Path(self.output_dir.value).exists()
|
||||||
|
|
||||||
def on_ok(self):
|
def on_ok(self):
|
||||||
@ -221,7 +227,7 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
|
|||||||
|
|
||||||
def get_model_names(self)->(List[str],int):
|
def get_model_names(self)->(List[str],int):
|
||||||
conf = OmegaConf.load(os.path.join(Globals.root,'configs/models.yaml'))
|
conf = OmegaConf.load(os.path.join(Globals.root,'configs/models.yaml'))
|
||||||
model_names = list(conf.keys())
|
model_names = [idx for idx in sorted(list(conf.keys())) if conf[idx].get('format',None)=='diffusers']
|
||||||
defaults = [idx for idx in range(len(model_names)) if 'default' in conf[model_names[idx]]]
|
defaults = [idx for idx in range(len(model_names)) if 'default' in conf[model_names[idx]]]
|
||||||
return (model_names,defaults[0])
|
return (model_names,defaults[0])
|
||||||
|
|
||||||
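The change to `get_model_names` restricts the list to models whose `format` is `diffusers`. The sketch below reproduces the filter on a plain dictionary rather than an OmegaConf object, and the model entries are invented for illustration. The fallback to index 0 when no entry is marked `default` is an addition of this sketch; the code in the diff indexes `defaults[0]` directly.

```python
from typing import Dict, List, Tuple


def diffusers_model_names(conf: Dict[str, dict]) -> Tuple[List[str], int]:
    """Return the sorted diffusers model names and the index of the default."""
    names = [
        name
        for name in sorted(conf.keys())
        if conf[name].get("format", None) == "diffusers"
    ]
    defaults = [i for i, name in enumerate(names) if "default" in conf[name]]
    # Assumption: fall back to the first entry when nothing is marked default.
    default_index = defaults[0] if defaults else 0
    return names, default_index


# Invented entries standing in for configs/models.yaml.
example_conf = {
    "stable-diffusion-1.5": {"format": "diffusers", "default": True},
    "legacy-model": {"format": "ckpt"},
    "sd-2.1": {"format": "diffusers"},
}
model_names, default_index = diffusers_model_names(example_conf)
print(model_names, default_index)
```

Filtering before computing the default index keeps the index aligned with the list that the form actually displays.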
@ -288,7 +294,9 @@ def save_args(args:dict):
|
|||||||
'''
|
'''
|
||||||
Save the current argument values to an omegaconf file
|
Save the current argument values to an omegaconf file
|
||||||
'''
|
'''
|
||||||
conf_file = Path(Globals.root) / TRAINING_DIR / CONF_FILE
|
dest_dir = Path(Globals.root) / TRAINING_DIR
|
||||||
|
os.makedirs(dest_dir, exist_ok=True)
|
||||||
|
conf_file = dest_dir / CONF_FILE
|
||||||
conf = OmegaConf.create(args)
|
conf = OmegaConf.create(args)
|
||||||
OmegaConf.save(config=conf, f=conf_file)
|
OmegaConf.save(config=conf, f=conf_file)
|
||||||
|
|
||||||
|
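The `save_args` change creates the destination directory before writing the preferences file, so the save no longer fails on a fresh installation. A standalone sketch of the same pattern follows. It writes JSON with the standard library instead of calling OmegaConf, which is a deliberate substitution to keep the example dependency-free, and the root is a temporary directory.

```python
import json
import os
import tempfile
from pathlib import Path

TRAINING_DIR = "text-inversion-output"
CONF_FILE = "preferences.conf"


def save_args(root: str, args: dict) -> Path:
    """Write args under <root>/text-inversion-output, creating it if needed."""
    dest_dir = Path(root) / TRAINING_DIR
    os.makedirs(dest_dir, exist_ok=True)
    conf_file = dest_dir / CONF_FILE
    # JSON stands in for the OmegaConf serialization used in the diff.
    conf_file.write_text(json.dumps(args, indent=2))
    return conf_file


with tempfile.TemporaryDirectory() as root:
    path = save_args(root, {"placeholder_token": "psychedelic", "resolution": 512})
    saved = json.loads(path.read_text())
    relative = path.relative_to(root).as_posix()

print(relative, saved)
```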