Merge branch 'main' into feat/import-with-vae

Kevin Turner 2023-01-23 00:17:46 -08:00 committed by GitHub
commit 70f8793700
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
14 changed files with 301 additions and 58 deletions

View File

@@ -93,9 +93,15 @@ getting InvokeAI up and running on your system. For alternative installation and
upgrade instructions, please see:
[InvokeAI Installation Overview](installation/)
-Linux users who wish to make use of the PyPatchMatch inpainting functions will
-need to perform a bit of extra work to enable this module. Instructions can be
-found at [Installing PyPatchMatch](installation/060_INSTALL_PATCHMATCH.md).
+Users who wish to make use of the **PyPatchMatch** inpainting functions
+will need to perform a bit of extra work to enable this
+module. Instructions can be found at [Installing
+PyPatchMatch](installation/060_INSTALL_PATCHMATCH.md).
+
+If you have an NVIDIA card, you can benefit from the significant
+memory savings and performance benefits provided by Facebook Lab's
+**xFormers** module. Instructions for Linux and Windows users can be found
+at [Installing xFormers](installation/070_INSTALL_XFORMERS.md).
## :fontawesome-solid-computer: Hardware Requirements

View File

@@ -0,0 +1,206 @@
---
title: Installing xFormers
---
# :material-image-size-select-large: Installing xFormers

xFormers is a toolbox that integrates with the PyTorch and CUDA
libraries to provide accelerated performance and reduced memory
consumption for applications using the transformers machine learning
architecture. After installing xFormers, InvokeAI users who have
CUDA GPUs will see a noticeable decrease in GPU memory consumption and
an increase in speed.

xFormers can be installed into a working InvokeAI installation without
any code changes or other updates. This document explains how to
install xFormers.

## Pip Install

For both Windows and Linux, you can install `xformers` in just a
couple of steps from the command line.

If you are used to launching `invoke.sh` or `invoke.bat` to start
InvokeAI, then run the launcher and select the "developer's console"
to get to the command line. If you run `invoke.py` directly from the
command line, just be sure to activate its virtual environment first.
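A minimal activation sketch, assuming the default `~/invokeai` runtime
location (adjust the path if you installed InvokeAI elsewhere):

```sh
# activate the InvokeAI virtual environment before installing xformers
source ~/invokeai/.venv/bin/activate
```
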
Then run the following three commands:
```sh
pip install xformers==0.0.16rc425
pip install triton
python -m xformers.info
```
The first command installs `xformers`, the second installs the
`triton` training accelerator, and the third prints out the `xformers`
installation status. If all goes well, you'll see a report like the
following:
```sh
xFormers 0.0.16rc425
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.flshattF: available
memory_efficient_attention.flshattB: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: available
memory_efficient_attention.tritonflashattB: available
swiglu.fused.p.cpp: available
is_triton_available: True
is_functorch_available: False
pytorch.version: 1.13.1+cu117
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA RTX A2000 12GB
build.info: available
build.cuda_version: 1107
build.python_version: 3.10.9
build.torch_version: 1.13.1+cu117
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.16rc425
source.privacy: open source
```
## Source Builds

`xformers` is currently under active development and at some point you
may wish to build it from source to get the latest features and
bugfixes.

### Source Build on Linux
Note that xFormers only works with true NVIDIA GPUs and will not work
properly with the ROCm driver for AMD acceleration.

If the prebuilt `xformers` wheel does not work on your system, or you
want the very latest changes, you can build and install xFormers from
source. These instructions were written for a system running Ubuntu
22.04, but other Linux distributions should be able to adapt this
recipe.

#### 1. Install CUDA Toolkit 11.7

You will need the CUDA developer's toolkit in order to compile and
install xFormers. **Do not try to install Ubuntu's nvidia-cuda-toolkit
package.** It is out of date and will cause conflicts with the NVIDIA
driver and binaries. Instead, install the CUDA Toolkit package provided
by NVIDIA itself. Go to [CUDA Toolkit 11.7
Downloads](https://developer.nvidia.com/cuda-11-7-0-download-archive)
and use the target selection wizard to choose your platform and Linux
distribution. Select an installer type of "runfile (local)" at the
last step.

This will provide you with a recipe for downloading and running an
install shell script that will install the toolkit and drivers. For
example, the install script recipe for Ubuntu 22.04 running on an
x86_64 system is:
```
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run
```

Rather than cut-and-paste this example, we recommend that you walk
through the toolkit wizard in order to get the most up-to-date
installer for your system.
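
Once the toolkit is installed you can confirm that the compiler is
visible before moving on. A quick sanity check, assuming the runfile's
default install prefix of `/usr/local/cuda-11.7`, looks like this:

```sh
# make the CUDA 11.7 toolchain visible in this shell, then query the compiler
export PATH=/usr/local/cuda-11.7/bin:$PATH
nvcc --version   # should report "Cuda compilation tools, release 11.7"
```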

#### 2. Confirm/Install PyTorch 1.13 with CUDA 11.7 support

If you are using InvokeAI 2.3 or higher, these libraries will already
be installed. If not, you can check whether you have the needed
libraries using a quick command. Activate the invokeai virtual
environment, either by entering the "developer's console", or manually
with a command similar to `source ~/invokeai/.venv/bin/activate`
(depending on where your `invokeai` directory is).

Then run the command:
```sh
python -c 'exec("import torch\nprint(torch.__version__)")'
```
If it prints __1.13.1+cu117__, you're good. If not, you can install the
most up-to-date libraries with this command:
```sh
pip install --upgrade --force-reinstall torch torchvision
```
#### 3. Install the triton module
This module isn't necessary for xFormers image inference optimization,
but avoids a startup warning.
```sh
pip install triton
```

#### 4. Install source code build prerequisites

To build xFormers from source, you will need the `build-essential`
package. If you don't have it installed already, run:
```sh
sudo apt install build-essential
```

#### 5. Build xFormers

As of January 2023 there is no stable pip wheel package for xFormers,
only the pre-release wheel used in the Pip Install section above.
Although there is a conda package, InvokeAI no longer officially
supports conda installations and you're on your own if you wish to try
this route.

Following the recipe provided at the [xFormers GitHub
page](https://github.com/facebookresearch/xformers), and with the
InvokeAI virtual environment active (see step 2 above), run the
following commands:
```sh
pip install ninja
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```

`TORCH_CUDA_ARCH_LIST` is the list of GPU architectures to compile
xFormers support for. You can speed up compilation by selecting only
the architecture specific to your system. You'll find the list of GPUs
and their architectures in NVIDIA's [GPU Compute
Capability](https://developer.nvidia.com/cuda-gpus) table.
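
For example, if the `xformers.info` report or the NVIDIA table shows
that your card has compute capability 8.6, a build restricted to that
single architecture would look like this (substitute your own card's
value):

```sh
# compile only for one GPU architecture (example: compute capability 8.6)
export TORCH_CUDA_ARCH_LIST="8.6"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```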
If the compile and install completes successfully, you can check that
xFormers is installed with this command:
```sh
python -m xformers.info
```
If successful, the top of the listing should indicate "available" for
each of the `memory_efficient_attention` modules, as shown here:
```sh
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.flshattF: available
memory_efficient_attention.flshattB: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: available
memory_efficient_attention.tritonflashattB: available
[...]
```
You can now launch InvokeAI and enjoy the benefits of xFormers.
### Windows
To come.

---

(c) Copyright 2023 Lincoln Stein and the InvokeAI Development Team

View File

@@ -19,6 +19,8 @@ experience and preferences.
those who prefer the `conda` tool, and one suited to those who prefer
`pip` and Python virtual environments. In our hands the pip install
is faster and more reliable, but your mileage may vary.
+Note that the conda installation method is currently deprecated and
+will no longer be supported at some point in the future.
This method is recommended for users who have previously used `conda`
or `pip` in the past, developers, and anyone who wishes to remain on

View File

@@ -45,6 +45,7 @@ def main():
Globals.try_patchmatch = args.patchmatch
Globals.always_use_cpu = args.always_use_cpu
Globals.internet_available = args.internet_available and check_internet()
+Globals.disable_xformers = not args.xformers
print(f'>> Internet connectivity is {Globals.internet_available}')
if not args.conf:

@@ -902,7 +903,7 @@ def prepare_image_metadata(
try:
filename = opt.fnformat.format(**wildcards)
except KeyError as e:
-print(f'** The filename format contains an unknown key \'{e.args[0]}\'. Will use \'{{prefix}}.{{seed}}.png\' instead')
+print(f'** The filename format contains an unknown key \'{e.args[0]}\'. Will use {{prefix}}.{{seed}}.png\' instead')
filename = f'{prefix}.{seed}.png'
except IndexError:
print(f'** The filename format is broken or complete. Will use \'{{prefix}}.{{seed}}.png\' instead')

View File

@@ -482,6 +482,12 @@ class Args(object):
action='store_true',
help='Force free gpu memory before final decoding',
)
+model_group.add_argument(
+'--xformers',
+action=argparse.BooleanOptionalAction,
+default=True,
+help='Enable/disable xformers support (default enabled if installed)',
+)
model_group.add_argument(
"--always_use_cpu",
dest="always_use_cpu",

View File

@@ -21,7 +21,7 @@ import os
import re
import torch
from pathlib import Path
-from ldm.invoke.globals import Globals
+from ldm.invoke.globals import Globals, global_cache_dir
from safetensors.torch import load_file
try:

@@ -637,7 +637,7 @@ def convert_ldm_bert_checkpoint(checkpoint, config):
def convert_ldm_clip_checkpoint(checkpoint):
-text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
+text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14",cache_dir=global_cache_dir('hub'))
keys = list(checkpoint.keys())

@@ -677,7 +677,8 @@ textenc_pattern = re.compile("|".join(protected.keys()))
def convert_paint_by_example_checkpoint(checkpoint):
-config = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14")
+cache_dir = global_cache_dir('hub')
+config = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
model = PaintByExampleImageEncoder(config)
keys = list(checkpoint.keys())

@@ -744,7 +745,8 @@ def convert_paint_by_example_checkpoint(checkpoint):
def convert_open_clip_checkpoint(checkpoint):
-text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
+cache_dir=global_cache_dir('hub')
+text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder", cache_dir=cache_dir)
keys = list(checkpoint.keys())

@@ -795,6 +797,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
):
checkpoint = load_file(checkpoint_path) if Path(checkpoint_path).suffix == '.safetensors' else torch.load(checkpoint_path)
+cache_dir = global_cache_dir('hub')
# Sometimes models don't have the global_step item
if "global_step" in checkpoint:

@@ -904,7 +907,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
if model_type == "FrozenOpenCLIPEmbedder":
text_model = convert_open_clip_checkpoint(checkpoint)
-tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer")
+tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer",cache_dir=global_cache_dir('diffusers'))
pipe = StableDiffusionPipeline(
vae=vae,
text_encoder=text_model,

@@ -917,8 +920,8 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
)
elif model_type == "PaintByExample":
vision_model = convert_paint_by_example_checkpoint(checkpoint)
-tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
+tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
-feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
+feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
pipe = PaintByExamplePipeline(
vae=vae,
image_encoder=vision_model,

@@ -929,9 +932,9 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
)
elif model_type in ['FrozenCLIPEmbedder','WeightedFrozenCLIPEmbedder']:
text_model = convert_ldm_clip_checkpoint(checkpoint)
-tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
+tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
-safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker")
+safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
-feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
+feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
pipe = StableDiffusionPipeline(
vae=vae,
text_encoder=text_model,

@@ -944,7 +947,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
else:
text_config = create_ldm_bert_config(original_config)
text_model = convert_ldm_bert_checkpoint(checkpoint, text_config)
-tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
+tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased",cache_dir=cache_dir)
pipe = LDMTextToImagePipeline(vqvae=vae, bert=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler)
pipe.save_pretrained(

View File

@@ -39,6 +39,7 @@ from diffusers.utils.outputs import BaseOutput
from torchvision.transforms.functional import resize as tv_resize
from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
+from ldm.invoke.globals import Globals
from ldm.models.diffusion.shared_invokeai_diffusion import InvokeAIDiffuserComponent, ThresholdSettings
from ldm.modules.textual_inversion_manager import TextualInversionManager

@@ -306,7 +307,7 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
textual_inversion_manager=self.textual_inversion_manager
)
-if is_xformers_available():
+if is_xformers_available() and not Globals.disable_xformers:
self.enable_xformers_memory_efficient_attention()
def image_from_embeddings(self, latents: torch.Tensor, num_inference_steps: int,

View File

@@ -3,6 +3,7 @@ ldm.invoke.generator.txt2img inherits from ldm.invoke.generator
'''
import math
+from diffusers.utils.logging import get_verbosity, set_verbosity, set_verbosity_error
from typing import Callable, Optional
import torch

@@ -66,6 +67,8 @@ class Txt2Img2Img(Generator):
second_pass_noise = self.get_noise_like(resized_latents)
+verbosity = get_verbosity()
+set_verbosity_error()
pipeline_output = pipeline.img2img_from_latents_and_embeddings(
resized_latents,
num_inference_steps=steps,

@@ -73,6 +76,7 @@ class Txt2Img2Img(Generator):
strength=strength,
noise=second_pass_noise,
callback=step_callback)
+set_verbosity(verbosity)
return pipeline.numpy_to_pil(pipeline_output.images)[0]

View File

@@ -43,6 +43,9 @@ Globals.always_use_cpu = False
# The CLI will test connectivity at startup time.
Globals.internet_available = True
+# Whether to disable xformers
+Globals.disable_xformers = False
# whether we are forcing full precision
Globals.full_precision = False

View File

@@ -27,6 +27,7 @@ import torch
import safetensors
import transformers
from diffusers import AutoencoderKL, logging as dlogging
+from diffusers.utils.logging import get_verbosity, set_verbosity, set_verbosity_error
from omegaconf import OmegaConf
from omegaconf.dictconfig import DictConfig
from picklescan.scanner import scan_file_path

@@ -871,11 +872,11 @@ class ModelManager(object):
return model
# diffusers really really doesn't like us moving a float16 model onto CPU
-import logging
+verbosity = get_verbosity()
-logging.getLogger('diffusers.pipeline_utils').setLevel(logging.CRITICAL)
+set_verbosity_error()
model.cond_stage_model.device = 'cpu'
model.to('cpu')
-logging.getLogger('pipeline_utils').setLevel(logging.INFO)
+set_verbosity(verbosity)
for submodel in ('first_stage_model','cond_stage_model','model'):
try:

View File

@@ -1,18 +1,16 @@
import math
-import os.path
+from functools import partial
from typing import Optional
+import clip
+import kornia
import torch
import torch.nn as nn
-from functools import partial
+from einops import repeat
-import clip
-from einops import rearrange, repeat
from transformers import CLIPTokenizer, CLIPTextModel
-import kornia
-from ldm.invoke.devices import choose_torch_device
-from ldm.invoke.globals import Globals, global_cache_dir
-#from ldm.modules.textual_inversion_manager import TextualInversionManager
+from ldm.invoke.devices import choose_torch_device
+from ldm.invoke.globals import global_cache_dir
from ldm.modules.x_transformer import (
Encoder,
TransformerWrapper,
@@ -654,21 +652,22 @@ class WeightedFrozenCLIPEmbedder(FrozenCLIPEmbedder):
per_token_weights += [weight] * len(this_fragment_token_ids)
# leave room for bos/eos
-if len(all_token_ids) > self.max_length - 2:
+max_token_count_without_bos_eos_markers = self.max_length - 2
-excess_token_count = len(all_token_ids) - self.max_length - 2
+if len(all_token_ids) > max_token_count_without_bos_eos_markers:
+excess_token_count = len(all_token_ids) - max_token_count_without_bos_eos_markers
# TODO build nice description string of how the truncation was applied
# this should be done by calling self.tokenizer.convert_ids_to_tokens() then passing the result to
# self.tokenizer.convert_tokens_to_string() for the token_ids on each side of the truncation limit.
print(f">> Prompt is {excess_token_count} token(s) too long and has been truncated")
-all_token_ids = all_token_ids[0:self.max_length]
+all_token_ids = all_token_ids[0:max_token_count_without_bos_eos_markers]
-per_token_weights = per_token_weights[0:self.max_length]
+per_token_weights = per_token_weights[0:max_token_count_without_bos_eos_markers]
-# pad out to a 77-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
+# pad out to a 77-entry array: [bos_token, <prompt tokens>, eos_token, pad_token…]
# (77 = self.max_length)
all_token_ids = [self.tokenizer.bos_token_id] + all_token_ids + [self.tokenizer.eos_token_id]
per_token_weights = [1.0] + per_token_weights + [1.0]
pad_length = self.max_length - len(all_token_ids)
-all_token_ids += [self.tokenizer.eos_token_id] * pad_length
+all_token_ids += [self.tokenizer.pad_token_id] * pad_length
per_token_weights += [1.0] * pad_length
all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long).to(self.device)

View File

@@ -3,8 +3,9 @@ import math
import torch
from transformers import CLIPTokenizer, CLIPTextModel
-from ldm.modules.textual_inversion_manager import TextualInversionManager
from ldm.invoke.devices import torch_dtype
+from ldm.modules.textual_inversion_manager import TextualInversionManager
class WeightedPromptFragmentsToEmbeddingsConverter():

@@ -22,8 +23,8 @@ class WeightedPromptFragmentsToEmbeddingsConverter():
return self.tokenizer.model_max_length
def get_embeddings_for_weighted_prompt_fragments(self,
-text: list[str],
+text: list[list[str]],
-fragment_weights: list[float],
+fragment_weights: list[list[float]],
should_return_tokens: bool = False,
device='cpu'
) -> torch.Tensor:

@@ -198,12 +199,12 @@ WeightedPromptFragmentsToEmbeddingsConverter():
all_token_ids = all_token_ids[0:max_token_count_without_bos_eos_markers]
per_token_weights = per_token_weights[0:max_token_count_without_bos_eos_markers]
-# pad out to a self.max_length-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
+# pad out to a self.max_length-entry array: [bos_token, <prompt tokens>, eos_token, pad_token…]
# (typically self.max_length == 77)
all_token_ids = [self.tokenizer.bos_token_id] + all_token_ids + [self.tokenizer.eos_token_id]
per_token_weights = [1.0] + per_token_weights + [1.0]
pad_length = self.max_length - len(all_token_ids)
-all_token_ids += [self.tokenizer.eos_token_id] * pad_length
+all_token_ids += [self.tokenizer.pad_token_id] * pad_length
per_token_weights += [1.0] * pad_length
all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long, device=device)

View File

@@ -676,6 +676,7 @@ def download_weights(opt:dict) -> Union[str, None]:
return
access_token = authenticate()
+if access_token is not None:
HfFolder.save_token(access_token)
print('\n** DOWNLOADING WEIGHTS **')

View File

@@ -115,6 +115,14 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
value=self.precisions.index(saved_args.get('mixed_precision','fp16')),
max_height=4,
)
+self.num_train_epochs = self.add_widget_intelligent(
+npyscreen.TitleSlider,
+name='Number of training epochs:',
+out_of=1000,
+step=50,
+lowest=1,
+value=saved_args.get('num_train_epochs',100)
+)
self.max_train_steps = self.add_widget_intelligent(
npyscreen.TitleSlider,
name='Max Training Steps:',

@@ -131,6 +139,22 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
lowest=1,
value=saved_args.get('train_batch_size',8),
)
+self.gradient_accumulation_steps = self.add_widget_intelligent(
+npyscreen.TitleSlider,
+name='Gradient Accumulation Steps (may need to decrease this to resume from a checkpoint):',
+out_of=10,
+step=1,
+lowest=1,
+value=saved_args.get('gradient_accumulation_steps',4)
+)
+self.lr_warmup_steps = self.add_widget_intelligent(
+npyscreen.TitleSlider,
+name='Warmup Steps:',
+out_of=100,
+step=1,
+lowest=0,
+value=saved_args.get('lr_warmup_steps',0),
+)
self.learning_rate = self.add_widget_intelligent(
npyscreen.TitleText,
name="Learning Rate:",

@@ -154,22 +178,6 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
scroll_exit = True,
value=self.lr_schedulers.index(saved_args.get('lr_scheduler','constant')),
)
-self.gradient_accumulation_steps = self.add_widget_intelligent(
-npyscreen.TitleSlider,
-name='Gradient Accumulation Steps:',
-out_of=10,
-step=1,
-lowest=1,
-value=saved_args.get('gradient_accumulation_steps',4)
-)
-self.lr_warmup_steps = self.add_widget_intelligent(
-npyscreen.TitleSlider,
-name='Warmup Steps:',
-out_of=100,
-step=1,
-lowest=0,
-value=saved_args.get('lr_warmup_steps',0),
-)
def initializer_changed(self):
placeholder = self.placeholder_token.value

@@ -236,7 +244,7 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
# all the integers
for attr in ('train_batch_size','gradient_accumulation_steps',
-'max_train_steps','lr_warmup_steps'):
+'num_train_epochs','max_train_steps','lr_warmup_steps'):
args[attr] = int(getattr(self,attr).value)
# the floats (just one)

@@ -324,6 +332,7 @@ if __name__ == '__main__':
save_args(args)
try:
+print(f'DEBUG: args = {args}')
do_textual_inversion_training(**args)
copy_to_embeddings_folder(args)
except Exception as e: