Merge branch 'main' into feat/import-with-vae
Commit 70f8793700
@@ -93,9 +93,15 @@ getting InvokeAI up and running on your system. For alternative installation and
 upgrade instructions, please see:
 [InvokeAI Installation Overview](installation/)
 
-Linux users who wish to make use of the PyPatchMatch inpainting functions will
-need to perform a bit of extra work to enable this module. Instructions can be
-found at [Installing PyPatchMatch](installation/060_INSTALL_PATCHMATCH.md).
+Users who wish to make use of the **PyPatchMatch** inpainting functions
+will need to perform a bit of extra work to enable this
+module. Instructions can be found at [Installing
+PyPatchMatch](installation/060_INSTALL_PATCHMATCH.md).
 
+If you have an NVIDIA card, you can benefit from the significant
+memory savings and performance benefits provided by Facebook Lab's
+**xFormers** module. Instructions for Linux and Windows users can be found
+at [Installing xFormers](installation/070_INSTALL_XFORMERS.md).
 
 ## :fontawesome-solid-computer: Hardware Requirements
docs/installation/070_INSTALL_XFORMERS.md (new file, 206 lines):
---
title: Installing xFormers
---

# :material-image-size-select-large: Installing xformers

xFormers is a toolbox that integrates with the PyTorch and CUDA
libraries to provide accelerated performance and reduced memory
consumption for applications using the transformers machine learning
architecture. After installing xFormers, InvokeAI users who have
CUDA GPUs will see a noticeable decrease in GPU memory consumption and
an increase in speed.

xFormers can be installed into a working InvokeAI installation without
any code changes or other updates. This document explains how to
install xFormers.

## Pip Install

For both Windows and Linux, you can install `xformers` in just a
couple of steps from the command line.

If you are used to launching `invoke.sh` or `invoke.bat` to start
InvokeAI, then run the launcher and select the "developer's console"
to get to the command line. If you run `invoke.py` directly from the
command line, then just be sure to activate its virtual environment.

Then run the following three commands:

```sh
pip install xformers==0.0.16rc425
pip install triton
python -m xformers.info
```

The first command installs `xformers`, the second installs the
`triton` training accelerator, and the third prints out the `xformers`
installation status. If all goes well, you'll see a report like the
following:

```sh
xFormers 0.0.16rc425
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.flshattF: available
memory_efficient_attention.flshattB: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: available
memory_efficient_attention.tritonflashattB: available
swiglu.fused.p.cpp: available
is_triton_available: True
is_functorch_available: False
pytorch.version: 1.13.1+cu117
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA RTX A2000 12GB
build.info: available
build.cuda_version: 1107
build.python_version: 3.10.9
build.torch_version: 1.13.1+cu117
build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE: Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.16rc425
source.privacy: open source
```
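
If you prefer to confirm the installation from Python rather than by
reading the report, the minimal sketch below (it assumes only that
`torch`, and optionally `xformers`, are importable from the active
virtual environment) prints the same essentials:

```python
# Minimal sketch: confirm that xFormers and a CUDA GPU are visible from Python.
# Run it inside the InvokeAI virtual environment.
import torch

try:
    import xformers
    import xformers.ops  # the memory-efficient attention operators live here
    print(f"xFormers version: {xformers.__version__}")
except ImportError:
    print("xFormers is not installed in this environment")

print(f"PyTorch version:  {torch.__version__}")
print(f"CUDA available:   {torch.cuda.is_available()}")
```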

## Source Builds

`xformers` is currently under active development and at some point you
may wish to build it from source to get the latest features and
bugfixes.

### Source Build on Linux

Note that xFormers only works with true NVIDIA GPUs and will not work
properly with the ROCm driver for AMD acceleration.

The latest development version of xFormers is not available as a pip
binary wheel and must be installed from source. These instructions
were written for a system running Ubuntu 22.04, but other Linux
distributions should be able to adapt this recipe.

#### 1. Install CUDA Toolkit 11.7

You will need the CUDA developer's toolkit in order to compile and
install xFormers. **Do not try to install Ubuntu's nvidia-cuda-toolkit
package.** It is out of date and will cause conflicts among the NVIDIA
driver and binaries. Instead install the CUDA Toolkit package provided
by NVIDIA itself. Go to [CUDA Toolkit 11.7
Downloads](https://developer.nvidia.com/cuda-11-7-0-download-archive)
and use the target selection wizard to choose your platform and Linux
distribution. Select an installer type of "runfile (local)" at the
last step.

This will provide you with a recipe for downloading and running an
install shell script that will install the toolkit and drivers. For
example, the install script recipe for Ubuntu 22.04 running on an
x86_64 system is:

```sh
wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
sudo sh cuda_11.7.0_515.43.04_linux.run
```

Rather than cut and paste this example, we recommend that you walk
through the toolkit wizard in order to get the most up-to-date
installer for your system.

#### 2. Confirm/Install PyTorch 1.13 with CUDA 11.7 support

If you are using InvokeAI 2.3 or higher, these will already be
installed. If not, you can check whether you have the needed libraries
using a quick command. Activate the invokeai virtual environment,
either by entering the "developer's console", or manually with a
command similar to `source ~/invokeai/.venv/bin/activate` (depending
on where your `invokeai` directory is).

Then run the command:

```sh
python -c 'exec("import torch\nprint(torch.__version__)")'
```

If it prints __1.13.1+cu117__ you're good. If not, you can install the
most up-to-date libraries with this command:

```sh
pip install --upgrade --force-reinstall torch torchvision
```
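
For a slightly fuller check than the one-liner above, this minimal
sketch (assuming only a working `torch` install) also reports the CUDA
version PyTorch was built against and whether a GPU is visible:

```python
# Minimal sketch: report the installed PyTorch build and CUDA status.
import torch

print(f"torch version:  {torch.__version__}")      # expect 1.13.1+cu117
print(f"CUDA build:     {torch.version.cuda}")     # expect 11.7
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```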

#### 3. Install the triton module

This module isn't necessary for xFormers image inference optimization,
but avoids a startup warning.

```sh
pip install triton
```

#### 4. Install source code build prerequisites

To build xFormers from source, you will need the `build-essential`
package. If you don't have it installed already, run:

```sh
sudo apt install build-essential
```

#### 5. Build xFormers

There is no pip wheel package for xFormers at this time (January
2023). Although there is a conda package, InvokeAI no longer
officially supports conda installations and you're on your own if you
wish to try this route.

Following the recipe provided at the [xFormers GitHub
page](https://github.com/facebookresearch/xformers), and with the
InvokeAI virtual environment active (see step 2), run the following
commands:

```sh
pip install ninja
export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```

The TORCH_CUDA_ARCH_LIST is a list of GPU architectures to compile
xFormers support for. You can speed up compilation by selecting only
the architecture specific to your system. You'll find the list of
GPUs and their architectures in NVIDIA's [GPU Compute
Capability](https://developer.nvidia.com/cuda-gpus) table.
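
If you are not sure which architecture your card uses, you can ask
PyTorch directly. A minimal sketch (assuming a CUDA-enabled `torch`
install) that prints the value in the form `TORCH_CUDA_ARCH_LIST`
expects:

```python
# Minimal sketch: print this machine's GPU compute capability so that
# TORCH_CUDA_ARCH_LIST can be trimmed to just the architecture you need.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA device detected; xFormers requires an NVIDIA GPU")

major, minor = torch.cuda.get_device_capability(0)
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f'Suggested setting: export TORCH_CUDA_ARCH_LIST="{major}.{minor}"')
```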

If the compile and install complete successfully, you can check that
xFormers is installed with this command:

```sh
python -m xformers.info
```

If successful, the top of the listing should indicate "available" for
each of the `memory_efficient_attention` modules, as shown here:

```sh
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.flshattF: available
memory_efficient_attention.flshattB: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: available
memory_efficient_attention.tritonflashattB: available
[...]
```

You can now launch InvokeAI and enjoy the benefits of xFormers.

### Windows

To come.

---
(c) Copyright 2023 Lincoln Stein and the InvokeAI Development Team
@@ -19,6 +19,8 @@ experience and preferences.
 those who prefer the `conda` tool, and one suited to those who prefer
 `pip` and Python virtual environments. In our hands the pip install
 is faster and more reliable, but your mileage may vary.
+Note that the conda installation method is currently deprecated and
+will not be supported at some point in the future.
 
 This method is recommended for users who have previously used `conda`
 or `pip` in the past, developers, and anyone who wishes to remain on
@@ -45,6 +45,7 @@ def main():
     Globals.try_patchmatch = args.patchmatch
     Globals.always_use_cpu = args.always_use_cpu
     Globals.internet_available = args.internet_available and check_internet()
+    Globals.disable_xformers = not args.xformers
     print(f'>> Internet connectivity is {Globals.internet_available}')
 
     if not args.conf:
@@ -902,7 +903,7 @@ def prepare_image_metadata(
     try:
         filename = opt.fnformat.format(**wildcards)
     except KeyError as e:
-        print(f'** The filename format contains an unknown key \'{e.args[0]}\'. Will use \'{{prefix}}.{{seed}}.png\' instead')
+        print(f'** The filename format contains an unknown key \'{e.args[0]}\'. Will use {{prefix}}.{{seed}}.png\' instead')
         filename = f'{prefix}.{seed}.png'
     except IndexError:
         print(f'** The filename format is broken or complete. Will use \'{{prefix}}.{{seed}}.png\' instead')
@@ -482,6 +482,12 @@ class Args(object):
             action='store_true',
             help='Force free gpu memory before final decoding',
         )
+        model_group.add_argument(
+            '--xformers',
+            action=argparse.BooleanOptionalAction,
+            default=True,
+            help='Enable/disable xformers support (default enabled if installed)',
+        )
         model_group.add_argument(
             "--always_use_cpu",
             dest="always_use_cpu",
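
For context, `argparse.BooleanOptionalAction` (Python 3.9+) is what
gives the new option a paired `--xformers` / `--no-xformers` form. A
standalone sketch of the same pattern, using a hypothetical parser
rather than InvokeAI's `Args` class:

```python
# Standalone sketch of the --xformers / --no-xformers flag pattern added above.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    '--xformers',
    action=argparse.BooleanOptionalAction,  # generates --xformers and --no-xformers
    default=True,
    help='Enable/disable xformers support (default enabled if installed)',
)

print(parser.parse_args([]).xformers)                  # True (default)
print(parser.parse_args(['--no-xformers']).xformers)   # False
```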
@@ -21,7 +21,7 @@ import os
 import re
 import torch
 from pathlib import Path
-from ldm.invoke.globals import Globals
+from ldm.invoke.globals import Globals, global_cache_dir
 from safetensors.torch import load_file
 
 try:
@@ -637,7 +637,7 @@ def convert_ldm_bert_checkpoint(checkpoint, config):
 
 
 def convert_ldm_clip_checkpoint(checkpoint):
-    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
+    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14",cache_dir=global_cache_dir('hub'))
 
     keys = list(checkpoint.keys())
 
@@ -677,7 +677,8 @@ textenc_pattern = re.compile("|".join(protected.keys()))
 
 
 def convert_paint_by_example_checkpoint(checkpoint):
-    config = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14")
+    cache_dir = global_cache_dir('hub')
+    config = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
     model = PaintByExampleImageEncoder(config)
 
     keys = list(checkpoint.keys())
@@ -744,7 +745,8 @@ def convert_paint_by_example_checkpoint(checkpoint):
 
 
 def convert_open_clip_checkpoint(checkpoint):
-    text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
+    cache_dir=global_cache_dir('hub')
+    text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder", cache_dir=cache_dir)
 
     keys = list(checkpoint.keys())
 
@@ -795,6 +797,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
     ):
 
     checkpoint = load_file(checkpoint_path) if Path(checkpoint_path).suffix == '.safetensors' else torch.load(checkpoint_path)
+    cache_dir = global_cache_dir('hub')
 
     # Sometimes models don't have the global_step item
     if "global_step" in checkpoint:
@@ -904,7 +907,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
 
     if model_type == "FrozenOpenCLIPEmbedder":
         text_model = convert_open_clip_checkpoint(checkpoint)
-        tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer")
+        tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer",cache_dir=global_cache_dir('diffusers'))
         pipe = StableDiffusionPipeline(
             vae=vae,
             text_encoder=text_model,
@@ -917,8 +920,8 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
         )
     elif model_type == "PaintByExample":
         vision_model = convert_paint_by_example_checkpoint(checkpoint)
-        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
-        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
+        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
+        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
         pipe = PaintByExamplePipeline(
             vae=vae,
             image_encoder=vision_model,
@@ -929,9 +932,9 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
         )
     elif model_type in ['FrozenCLIPEmbedder','WeightedFrozenCLIPEmbedder']:
         text_model = convert_ldm_clip_checkpoint(checkpoint)
-        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
-        safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker")
-        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
+        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
+        safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
+        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
         pipe = StableDiffusionPipeline(
             vae=vae,
             text_encoder=text_model,
@@ -944,7 +947,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
     else:
         text_config = create_ldm_bert_config(original_config)
         text_model = convert_ldm_bert_checkpoint(checkpoint, text_config)
-        tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
+        tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased",cache_dir=cache_dir)
         pipe = LDMTextToImagePipeline(vqvae=vae, bert=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler)
 
     pipe.save_pretrained(
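
The common thread in the hunks above is that every `from_pretrained()`
download is routed through InvokeAI's cache via `global_cache_dir('hub')`
rather than the default Hugging Face location. A minimal sketch of the
pattern, with a stand-in `global_cache_dir()` helper (any writable path
would do) in place of the one from `ldm.invoke.globals`:

```python
# Minimal sketch of the cache_dir pattern: point every from_pretrained()
# call at a single, predictable cache location.
from pathlib import Path
from transformers import CLIPTokenizer

def global_cache_dir(subfolder: str) -> Path:
    # Stand-in for ldm.invoke.globals.global_cache_dir(); InvokeAI resolves
    # this under its own root directory.
    return Path.home() / "invokeai" / "models" / subfolder

cache_dir = global_cache_dir('hub')
tokenizer = CLIPTokenizer.from_pretrained(
    "openai/clip-vit-large-patch14",
    cache_dir=cache_dir,  # downloads land here instead of ~/.cache/huggingface
)
```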
@@ -39,6 +39,7 @@ from diffusers.utils.outputs import BaseOutput
 from torchvision.transforms.functional import resize as tv_resize
 from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer
 
+from ldm.invoke.globals import Globals
 from ldm.models.diffusion.shared_invokeai_diffusion import InvokeAIDiffuserComponent, ThresholdSettings
 from ldm.modules.textual_inversion_manager import TextualInversionManager
 
@@ -306,7 +307,7 @@ class StableDiffusionGeneratorPipeline(StableDiffusionPipeline):
             textual_inversion_manager=self.textual_inversion_manager
         )
 
-        if is_xformers_available():
+        if is_xformers_available() and not Globals.disable_xformers:
             self.enable_xformers_memory_efficient_attention()
 
     def image_from_embeddings(self, latents: torch.Tensor, num_inference_steps: int,
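
This change makes xFormers opt-out rather than unconditional:
memory-efficient attention is enabled only when the package is
importable and the user has not disabled it. An illustrative sketch of
the same gating against a plain `diffusers` pipeline (the model id and
flag variable are examples, not InvokeAI's wiring):

```python
# Illustrative sketch: enable xFormers attention only if it is installed
# and the user has not opted out.
from diffusers import StableDiffusionPipeline
from diffusers.utils.import_utils import is_xformers_available

disable_xformers = False  # would come from a CLI flag such as --no-xformers

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
if is_xformers_available() and not disable_xformers:
    # Swap the attention implementation for xFormers' memory-efficient kernels.
    pipe.enable_xformers_memory_efficient_attention()
```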
@@ -3,6 +3,7 @@ ldm.invoke.generator.txt2img inherits from ldm.invoke.generator
 '''
 
 import math
+from diffusers.utils.logging import get_verbosity, set_verbosity, set_verbosity_error
 from typing import Callable, Optional
 
 import torch
@@ -66,6 +67,8 @@ class Txt2Img2Img(Generator):
 
         second_pass_noise = self.get_noise_like(resized_latents)
 
+        verbosity = get_verbosity()
+        set_verbosity_error()
         pipeline_output = pipeline.img2img_from_latents_and_embeddings(
             resized_latents,
             num_inference_steps=steps,
@@ -73,6 +76,7 @@ class Txt2Img2Img(Generator):
             strength=strength,
             noise=second_pass_noise,
             callback=step_callback)
+        set_verbosity(verbosity)
 
         return pipeline.numpy_to_pil(pipeline_output.images)[0]
 
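
This file and the model manager change further down use the same
save/silence/restore idiom to quiet `diffusers` warnings around a
single noisy call. A minimal sketch of the idiom, with a hypothetical
`noisy_call()` placeholder and a `try/finally` so the previous level is
restored even on error:

```python
# Minimal sketch: temporarily silence diffusers logging around one call,
# then restore whatever verbosity was in effect before.
from diffusers.utils.logging import get_verbosity, set_verbosity, set_verbosity_error

def noisy_call():
    # Hypothetical placeholder for the call whose warnings we want to suppress.
    pass

verbosity = get_verbosity()
set_verbosity_error()          # only ERROR and above will be emitted
try:
    noisy_call()
finally:
    set_verbosity(verbosity)   # put the original logging level back
```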
@@ -43,6 +43,9 @@ Globals.always_use_cpu = False
 # The CLI will test connectivity at startup time.
 Globals.internet_available = True
 
+# Whether to disable xformers
+Globals.disable_xformers = False
+
 # whether we are forcing full precision
 Globals.full_precision = False
 
@@ -27,6 +27,7 @@ import torch
 import safetensors
 import transformers
 from diffusers import AutoencoderKL, logging as dlogging
+from diffusers.utils.logging import get_verbosity, set_verbosity, set_verbosity_error
 from omegaconf import OmegaConf
 from omegaconf.dictconfig import DictConfig
 from picklescan.scanner import scan_file_path
@@ -871,11 +872,11 @@ class ModelManager(object):
             return model
 
         # diffusers really really doesn't like us moving a float16 model onto CPU
-        import logging
-        logging.getLogger('diffusers.pipeline_utils').setLevel(logging.CRITICAL)
+        verbosity = get_verbosity()
+        set_verbosity_error()
         model.cond_stage_model.device = 'cpu'
         model.to('cpu')
-        logging.getLogger('pipeline_utils').setLevel(logging.INFO)
+        set_verbosity(verbosity)
 
         for submodel in ('first_stage_model','cond_stage_model','model'):
             try:
@@ -1,18 +1,16 @@
 import math
-import os.path
+from functools import partial
 from typing import Optional
 
+import clip
+import kornia
 import torch
 import torch.nn as nn
-from functools import partial
-import clip
-from einops import rearrange, repeat
+from einops import repeat
 from transformers import CLIPTokenizer, CLIPTextModel
-import kornia
-from ldm.invoke.devices import choose_torch_device
-from ldm.invoke.globals import Globals, global_cache_dir
-#from ldm.modules.textual_inversion_manager import TextualInversionManager
 
+from ldm.invoke.devices import choose_torch_device
+from ldm.invoke.globals import global_cache_dir
 from ldm.modules.x_transformer import (
     Encoder,
     TransformerWrapper,
@@ -654,21 +652,22 @@ class WeightedFrozenCLIPEmbedder(FrozenCLIPEmbedder):
             per_token_weights += [weight] * len(this_fragment_token_ids)
 
         # leave room for bos/eos
-        if len(all_token_ids) > self.max_length - 2:
-            excess_token_count = len(all_token_ids) - self.max_length - 2
+        max_token_count_without_bos_eos_markers = self.max_length - 2
+        if len(all_token_ids) > max_token_count_without_bos_eos_markers:
+            excess_token_count = len(all_token_ids) - max_token_count_without_bos_eos_markers
             # TODO build nice description string of how the truncation was applied
             # this should be done by calling self.tokenizer.convert_ids_to_tokens() then passing the result to
             # self.tokenizer.convert_tokens_to_string() for the token_ids on each side of the truncation limit.
             print(f">> Prompt is {excess_token_count} token(s) too long and has been truncated")
-            all_token_ids = all_token_ids[0:self.max_length]
-            per_token_weights = per_token_weights[0:self.max_length]
+            all_token_ids = all_token_ids[0:max_token_count_without_bos_eos_markers]
+            per_token_weights = per_token_weights[0:max_token_count_without_bos_eos_markers]
 
-        # pad out to a 77-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
+        # pad out to a 77-entry array: [bos_token, <prompt tokens>, eos_token, pad_token…]
         # (77 = self.max_length)
         all_token_ids = [self.tokenizer.bos_token_id] + all_token_ids + [self.tokenizer.eos_token_id]
         per_token_weights = [1.0] + per_token_weights + [1.0]
         pad_length = self.max_length - len(all_token_ids)
-        all_token_ids += [self.tokenizer.eos_token_id] * pad_length
+        all_token_ids += [self.tokenizer.pad_token_id] * pad_length
         per_token_weights += [1.0] * pad_length
 
         all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long).to(self.device)
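
The substance of this hunk: truncation now leaves room for the BOS/EOS
markers (`max_length - 2`), and padding uses the tokenizer's pad token
instead of repeating EOS. A small self-contained worked example with
made-up token ids (real CLIP ids differ):

```python
# Worked example of the padding scheme above, using made-up token ids.
max_length = 77
bos_token_id, eos_token_id, pad_token_id = 1, 2, 0  # illustrative, not CLIP's real ids

prompt_token_ids = list(range(100, 180))   # 80 prompt tokens: too long
max_without_markers = max_length - 2       # leave room for bos/eos -> 75

if len(prompt_token_ids) > max_without_markers:
    prompt_token_ids = prompt_token_ids[:max_without_markers]

token_ids = [bos_token_id] + prompt_token_ids + [eos_token_id]
token_ids += [pad_token_id] * (max_length - len(token_ids))  # pad, don't repeat eos

assert len(token_ids) == max_length
print(token_ids[:3], "...", token_ids[-3:])
```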
@@ -3,8 +3,9 @@ import math
 import torch
 from transformers import CLIPTokenizer, CLIPTextModel
 
-from ldm.modules.textual_inversion_manager import TextualInversionManager
 from ldm.invoke.devices import torch_dtype
+from ldm.modules.textual_inversion_manager import TextualInversionManager
 
 
 class WeightedPromptFragmentsToEmbeddingsConverter():
 
@@ -22,8 +23,8 @@ class WeightedPromptFragmentsToEmbeddingsConverter():
         return self.tokenizer.model_max_length
 
     def get_embeddings_for_weighted_prompt_fragments(self,
-                 text: list[str],
-                 fragment_weights: list[float],
+                 text: list[list[str]],
+                 fragment_weights: list[list[float]],
                  should_return_tokens: bool = False,
                  device='cpu'
             ) -> torch.Tensor:
@@ -198,12 +199,12 @@ class WeightedPromptFragmentsToEmbeddingsConverter():
         all_token_ids = all_token_ids[0:max_token_count_without_bos_eos_markers]
         per_token_weights = per_token_weights[0:max_token_count_without_bos_eos_markers]
 
-        # pad out to a self.max_length-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
+        # pad out to a self.max_length-entry array: [bos_token, <prompt tokens>, eos_token, pad_token…]
         # (typically self.max_length == 77)
         all_token_ids = [self.tokenizer.bos_token_id] + all_token_ids + [self.tokenizer.eos_token_id]
         per_token_weights = [1.0] + per_token_weights + [1.0]
         pad_length = self.max_length - len(all_token_ids)
-        all_token_ids += [self.tokenizer.eos_token_id] * pad_length
+        all_token_ids += [self.tokenizer.pad_token_id] * pad_length
         per_token_weights += [1.0] * pad_length
 
         all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long, device=device)
@@ -676,6 +676,7 @@ def download_weights(opt:dict) -> Union[str, None]:
         return
 
     access_token = authenticate()
+    if access_token is not None:
         HfFolder.save_token(access_token)
 
     print('\n** DOWNLOADING WEIGHTS **')
@@ -115,6 +115,14 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
             value=self.precisions.index(saved_args.get('mixed_precision','fp16')),
             max_height=4,
         )
+        self.num_train_epochs = self.add_widget_intelligent(
+            npyscreen.TitleSlider,
+            name='Number of training epochs:',
+            out_of=1000,
+            step=50,
+            lowest=1,
+            value=saved_args.get('num_train_epochs',100)
+        )
         self.max_train_steps = self.add_widget_intelligent(
             npyscreen.TitleSlider,
             name='Max Training Steps:',
@@ -131,6 +139,22 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
             lowest=1,
             value=saved_args.get('train_batch_size',8),
         )
+        self.gradient_accumulation_steps = self.add_widget_intelligent(
+            npyscreen.TitleSlider,
+            name='Gradient Accumulation Steps (may need to decrease this to resume from a checkpoint):',
+            out_of=10,
+            step=1,
+            lowest=1,
+            value=saved_args.get('gradient_accumulation_steps',4)
+        )
+        self.lr_warmup_steps = self.add_widget_intelligent(
+            npyscreen.TitleSlider,
+            name='Warmup Steps:',
+            out_of=100,
+            step=1,
+            lowest=0,
+            value=saved_args.get('lr_warmup_steps',0),
+        )
         self.learning_rate = self.add_widget_intelligent(
             npyscreen.TitleText,
             name="Learning Rate:",
@@ -154,22 +178,6 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
             scroll_exit = True,
             value=self.lr_schedulers.index(saved_args.get('lr_scheduler','constant')),
         )
-        self.gradient_accumulation_steps = self.add_widget_intelligent(
-            npyscreen.TitleSlider,
-            name='Gradient Accumulation Steps:',
-            out_of=10,
-            step=1,
-            lowest=1,
-            value=saved_args.get('gradient_accumulation_steps',4)
-        )
-        self.lr_warmup_steps = self.add_widget_intelligent(
-            npyscreen.TitleSlider,
-            name='Warmup Steps:',
-            out_of=100,
-            step=1,
-            lowest=0,
-            value=saved_args.get('lr_warmup_steps',0),
-        )
 
     def initializer_changed(self):
         placeholder = self.placeholder_token.value
@@ -236,7 +244,7 @@ class textualInversionForm(npyscreen.FormMultiPageAction):
 
         # all the integers
         for attr in ('train_batch_size','gradient_accumulation_steps',
-                     'max_train_steps','lr_warmup_steps'):
+                     'num_train_epochs','max_train_steps','lr_warmup_steps'):
            args[attr] = int(getattr(self,attr).value)
 
        # the floats (just one)
@@ -324,6 +332,7 @@ if __name__ == '__main__':
     save_args(args)
 
     try:
+        print(f'DEBUG: args = {args}')
        do_textual_inversion_training(**args)
        copy_to_embeddings_folder(args)
    except Exception as e: