From c1521be4451aa620a14ff60394f65619ea0978e7 Mon Sep 17 00:00:00 2001
From: Lincoln Stein
Date: Wed, 18 Jan 2023 09:31:19 -0500
Subject: [PATCH 1/7] add instructions for installing xFormers on linux

---
 docs/index.md                             |  12 +-
 docs/installation/070_INSTALL_XFORMERS.md | 140 ++++++++++++++++++++++
 2 files changed, 149 insertions(+), 3 deletions(-)
 create mode 100644 docs/installation/070_INSTALL_XFORMERS.md

diff --git a/docs/index.md b/docs/index.md
index 3c5bd3904b..c38f840d32 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -93,9 +93,15 @@ getting InvokeAI up and running on your system. For alternative installation
 and upgrade instructions, please see:
 [InvokeAI Installation Overview](installation/)
 
-Linux users who wish to make use of the PyPatchMatch inpainting functions will
-need to perform a bit of extra work to enable this module. Instructions can be
-found at [Installing PyPatchMatch](installation/060_INSTALL_PATCHMATCH.md).
+Users who wish to make use of the **PyPatchMatch** inpainting functions
+will need to perform a bit of extra work to enable this
+module. Instructions can be found at [Installing
+PyPatchMatch](installation/060_INSTALL_PATCHMATCH.md).
+
+If you have an NVIDIA card, you can take advantage of the significant
+memory savings and performance gains provided by Facebook Research's
+**xFormers** module. Instructions for Linux and Windows users can be found
+at [Installing xFormers](installation/070_INSTALL_XFORMERS.md).
 
 ## :fontawesome-solid-computer: Hardware Requirements
 
diff --git a/docs/installation/070_INSTALL_XFORMERS.md b/docs/installation/070_INSTALL_XFORMERS.md
new file mode 100644
index 0000000000..a406c28c22
--- /dev/null
+++ b/docs/installation/070_INSTALL_XFORMERS.md
@@ -0,0 +1,140 @@
+---
+title: Installing xFormers
+---
+
+# :material-image-size-select-large: Installing xFormers
+
+xFormers is a toolbox that integrates with the pyTorch and CUDA
+libraries to provide accelerated performance and reduced memory
+consumption for applications using the transformer machine learning
+architecture. After installing xFormers, InvokeAI users who have
+CUDA GPUs will see a noticeable decrease in GPU memory consumption and
+an increase in speed.
+
+xFormers can be installed into a working InvokeAI installation without
+any code changes or other updates. This document explains how to
+install xFormers.
+
+## Linux
+
+Note that xFormers only works with true NVIDIA GPUs and will not work
+properly with the ROCm driver for AMD acceleration.
+
+xFormers is not currently available as a pip binary wheel and must be
+installed from source. These instructions were written for a system
+running Ubuntu 22.04, but other Linux distributions should be able to
+adapt this recipe.
+
+### 1. Install CUDA Toolkit 11.7
+
+You will need the CUDA developer's toolkit in order to compile and
+install xFormers. **Do not try to install Ubuntu's nvidia-cuda-toolkit
+package.** It is out of date and will cause conflicts between the NVIDIA
+driver and binaries. Instead install the CUDA Toolkit package provided
+by NVIDIA itself. Go to [CUDA Toolkit 11.7
+Downloads](https://developer.nvidia.com/cuda-11-7-0-download-archive)
+and use the target selection wizard to choose your platform and Linux
+distribution. Select an installer type of "runfile (local)" at the
+last step.
+
+This will provide you with a recipe for downloading and running an
+install shell script that will install the toolkit and drivers. For
+example, the install script recipe for Ubuntu 22.04 running on an
+x86_64 system is:
+
+```
+wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda_11.7.0_515.43.04_linux.run
+sudo sh cuda_11.7.0_515.43.04_linux.run
+```
+
+Rather than cut-and-paste this example, we recommend that you walk
+through the toolkit wizard in order to get the most up to date
+installer for your system.
+
+### 2. Confirm/Install pyTorch 1.13 with CUDA 11.7 support
+
+If you are using InvokeAI 2.3 or higher, these will already be
+installed. If not, you can check whether you have the needed libraries
+using a quick command. Activate the invokeai virtual environment,
+either by entering the "developer's console", or manually with a
+command similar to `source ~/invokeai/.venv/bin/activate` (depending
+on where your `invokeai` directory is).
+
+Then run the command:
+
+```sh
+python -c 'import torch; print(torch.__version__)'
+```
+
+If it prints __1.13.1+cu117__, you're good. If not, you can install the
+most up to date libraries with this command:
+
+```sh
+pip install --upgrade --force-reinstall torch torchvision
+```
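+
+As an additional sanity check (optional, and not required by the
+recipe), you can confirm from Python that this pyTorch build is paired
+with CUDA 11.7 and can actually see your GPU:
+
+```python
+import torch
+
+# Expect "1.13.1+cu117", "11.7", and True on a working NVIDIA setup.
+print(torch.__version__)
+print(torch.version.cuda)
+print(torch.cuda.is_available())
+```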
+
+### 3. Install source code build prerequisites
+
+To build xFormers from source, you will need the `build-essential`
+package. If you don't have it installed already, run:
+
+```sh
+sudo apt install build-essential
+```
+
+### 4. Build xFormers
+
+There is no pip wheel package for xFormers at this time (January
+2023). Although there is a conda package, InvokeAI no longer
+officially supports conda installations and you're on your own if you
+wish to try this route.
+
+Following the recipe provided at the [xFormers GitHub
+page](https://github.com/facebookresearch/xformers), and with the
+InvokeAI virtual environment active (see step 2), run the following
+commands:
+
+```sh
+pip install ninja
+export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;7.5;8.0;8.6"
+pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
+```
+
+TORCH_CUDA_ARCH_LIST is the list of GPU architectures to compile
+xFormers support for. You can speed up compilation by selecting only
+the architecture specific to your system. You'll find the list of
+GPUs and their architectures in NVIDIA's [GPU Compute
+Capability](https://developer.nvidia.com/cuda-gpus) table.
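+
+If you'd rather query the card than consult the table, pyTorch can
+report the compute capability of the installed GPU directly; this
+optional snippet prints it in the same form used by
+TORCH_CUDA_ARCH_LIST:
+
+```python
+import torch
+
+# Prints e.g. "NVIDIA RTX A2000 12GB: 8.6"; in that case
+# TORCH_CUDA_ARCH_LIST="8.6" is enough for this machine.
+major, minor = torch.cuda.get_device_capability(0)
+print(f"{torch.cuda.get_device_name(0)}: {major}.{minor}")
+```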
+
+If the compile and install completes successfully, you can check that
+xFormers is installed with this command:
+
+```sh
+python -m xformers.info
+```
+
+If successful, the top of the listing should indicate "available" for
+each of the `memory_efficient_attention` modules, as shown here:
+
+```sh
+memory_efficient_attention.cutlassF: available
+memory_efficient_attention.cutlassB: available
+memory_efficient_attention.flshattF: available
+memory_efficient_attention.flshattB: available
+memory_efficient_attention.smallkF: available
+memory_efficient_attention.smallkB: available
+memory_efficient_attention.tritonflashattF: available
+memory_efficient_attention.tritonflashattB: available
+[...]
+```
+
+You can now launch InvokeAI and enjoy the benefits of xFormers.
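+
+As a final, optional smoke test, you can invoke the memory-efficient
+attention operator directly. This is a minimal sketch that assumes a
+CUDA GPU is present; it just runs one attention call on random data:
+
+```python
+import torch
+import xformers.ops as xops
+
+# Tensors are shaped [batch, sequence_length, heads, head_dim].
+q = torch.randn(1, 64, 8, 40, device="cuda", dtype=torch.float16)
+out = xops.memory_efficient_attention(q, q, q)
+print(out.shape)  # torch.Size([1, 64, 8, 40])
+```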
+
+## Windows
+
+To come
+
+## Macintosh
+
+Since CUDA is unavailable on Macintosh systems, you will not benefit
+from xFormers.

From 284b432ffd4d53a3ff313758d7e4bc978f4e74ba Mon Sep 17 00:00:00 2001
From: Lincoln Stein
Date: Wed, 18 Jan 2023 22:34:36 -0500
Subject: [PATCH 2/7] add triton install instructions

---
 docs/installation/070_INSTALL_XFORMERS.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/docs/installation/070_INSTALL_XFORMERS.md b/docs/installation/070_INSTALL_XFORMERS.md
index a406c28c22..99138744f8 100644
--- a/docs/installation/070_INSTALL_XFORMERS.md
+++ b/docs/installation/070_INSTALL_XFORMERS.md
@@ -73,7 +73,16 @@ most up to date libraries with this command:
 pip install --upgrade --force-reinstall torch torchvision
 ```
 
-### 3. Install source code build prerequisites
+### 3. Install the triton module
+
+This module isn't necessary for xFormers image inference optimization,
+but avoids a startup warning.
+
+```sh
+pip install triton
+```
+
+### 4. Install source code build prerequisites
 
 To build xFormers from source, you will need the `build-essential`
 package. If you don't have it installed already, run:
@@ -82,7 +91,7 @@ package. If you don't have it installed already, run:
 sudo apt install build-essential
 ```
 
-### 4. Build xFormers
+### 5. Build xFormers
 
 There is no pip wheel package for xFormers at this time (January
 2023). Although there is a conda package, InvokeAI no longer

From da81165a4bef30afed96afdbd060637566b9509b Mon Sep 17 00:00:00 2001
From: michaelk71
Date: Fri, 20 Jan 2023 19:03:12 +0100
Subject: [PATCH 3/7] Update index.md

---
 docs/installation/index.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/installation/index.md b/docs/installation/index.md
index ef50cbab5f..51753f2c9b 100644
--- a/docs/installation/index.md
+++ b/docs/installation/index.md
@@ -18,7 +18,9 @@ experience and preferences.
   InvokeAI and its dependencies. We offer two recipes: one suited to
   those who prefer the `conda` tool, and one suited to those who prefer
   `pip` and Python virtual environments. In our hands the pip install
-  is faster and more reliable, but your mileage may vary.
+  is faster and more reliable, but your mileage may vary.
+  Note that the conda installation method is currently deprecated and
+  will no longer be supported at some point in the future.
 
   This method is recommended for users who have previously used `conda`
   or `pip` in the past, developers, and anyone who wishes to remain on

From 9b73292fcb5f3b2fa6eed50b118d75f328e7eed6 Mon Sep 17 00:00:00 2001
From: Lincoln Stein
Date: Fri, 20 Jan 2023 17:28:14 -0500
Subject: [PATCH 4/7] add pip install documentation for xformers

---
 docs/installation/070_INSTALL_XFORMERS.md | 77 ++++++++++++++++++++---
 1 file changed, 67 insertions(+), 10 deletions(-)

diff --git a/docs/installation/070_INSTALL_XFORMERS.md b/docs/installation/070_INSTALL_XFORMERS.md
index 99138744f8..be54a3ee86 100644
--- a/docs/installation/070_INSTALL_XFORMERS.md
+++ b/docs/installation/070_INSTALL_XFORMERS.md
@@ -15,7 +15,65 @@ xFormers can be installed into a working InvokeAI installation without
 any code changes or other updates. This document explains how to
 install xFormers.
 
-## Linux
+## Pip Install
+
+For both Windows and Linux, you can install `xformers` in just a
+couple of steps from the command line.
+
+If you are used to launching `invoke.sh` or `invoke.bat` to start
+InvokeAI, then run the launcher and select the "developer's console"
+to get to the command line. If you run `invoke.py` directly from the
+command line, then just be sure to activate its virtual environment.
+
+Then run the following three commands:
+
+```sh
+pip install xformers==0.0.16rc425
+pip install triton
+python -m xformers.info
+```
+
+The first command installs `xformers`, the second installs the
+`triton` training accelerator, and the third prints out the `xformers`
+installation status. If all goes well, you'll see a report like the
+following:
+
+```sh
+xFormers 0.0.16rc425
+memory_efficient_attention.cutlassF: available
+memory_efficient_attention.cutlassB: available
+memory_efficient_attention.flshattF: available
+memory_efficient_attention.flshattB: available
+memory_efficient_attention.smallkF: available
+memory_efficient_attention.smallkB: available
+memory_efficient_attention.tritonflashattF: available
+memory_efficient_attention.tritonflashattB: available
+swiglu.fused.p.cpp: available
+is_triton_available: True
+is_functorch_available: False
+pytorch.version: 1.13.1+cu117
+pytorch.cuda: available
+gpu.compute_capability: 8.6
+gpu.name: NVIDIA RTX A2000 12GB
+build.info: available
+build.cuda_version: 1107
+build.python_version: 3.10.9
+build.torch_version: 1.13.1+cu117
+build.env.TORCH_CUDA_ARCH_LIST: 5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
+build.env.XFORMERS_BUILD_TYPE: Release
+build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
+build.env.NVCC_FLAGS: None
+build.env.XFORMERS_PACKAGE_FROM: wheel-v0.0.16rc425
+source.privacy: open source
+```
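+
+If you prefer to check these facts from a script rather than by eye,
+the same information is available as ordinary Python attributes; a
+small optional sketch:
+
+```python
+import torch
+import xformers
+
+# These values should agree with the `python -m xformers.info` report.
+print("xformers:", xformers.__version__)
+print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
+```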
+
+## Source Builds
+
+`xformers` is currently under active development and at some point you
+may wish to build it from source to get the latest features and
+bugfixes.
+
+### Source Build on Linux
 
 Note that xFormers only works with true NVIDIA GPUs and will not work
 properly with the ROCm driver for AMD acceleration.
@@ -25,7 +83,7 @@ installed from source. These instructions were written for a system
 running Ubuntu 22.04, but other Linux distributions should be able to
 adapt this recipe.
 
-### 1. Install CUDA Toolkit 11.7
+#### 1. Install CUDA Toolkit 11.7
 
 You will need the CUDA developer's toolkit in order to compile and
 install xFormers. **Do not try to install Ubuntu's nvidia-cuda-toolkit
@@ -51,7 +109,7 @@ Rather than cut-and-paste this example, we recommend that you walk
 through the toolkit wizard in order to get the most up to date
 installer for your system.
 
-### 2. Confirm/Install pyTorch 1.13 with CUDA 11.7 support
+#### 2. Confirm/Install pyTorch 1.13 with CUDA 11.7 support
 
 If you are using InvokeAI 2.3 or higher, these will already be
 installed. If not, you can check whether you have the needed libraries
 using a quick command. Activate the invokeai virtual environment,
 either by entering the "developer's console", or manually with a
@@ -73,7 +131,7 @@ most up to date libraries with this command:
 pip install --upgrade --force-reinstall torch torchvision
 ```
 
-### 3. Install the triton module
+#### 3. Install the triton module
 
 This module isn't necessary for xFormers image inference optimization,
 but avoids a startup warning.
@@ -82,7 +140,7 @@ but avoids a startup warning.
 pip install triton
 ```
 
-### 4. Install source code build prerequisites
+#### 4. Install source code build prerequisites
 
 To build xFormers from source, you will need the `build-essential`
 package. If you don't have it installed already, run:
@@ -91,7 +149,7 @@ package. If you don't have it installed already, run:
 sudo apt install build-essential
 ```
 
-### 5. Build xFormers
+#### 5. Build xFormers
 
 There is no pip wheel package for xFormers at this time (January
 2023). Although there is a conda package, InvokeAI no longer
 officially supports conda installations and you're on your own if you
 wish to try this route.
 
 You can now launch InvokeAI and enjoy the benefits of xFormers.
 
-## Windows
+### Windows
 
 To come
 
-## Macintosh
-
-Since CUDA is unavailable on Macintosh systems, you will not benefit
-from xFormers.
+
+---
+(c) Copyright 2023 Lincoln Stein and the InvokeAI Development Team

From d35ec3398d8255f5621ec29f8c0eb0ec8a7c3129 Mon Sep 17 00:00:00 2001
From: Kevin Turner <83819+keturn@users.noreply.github.com>
Date: Fri, 20 Jan 2023 19:23:12 -0800
Subject: [PATCH 5/7] fix: use pad_token for padding

Stable Diffusion does not use the eos_token for padding.
---
 ldm/modules/encoders/modules.py               | 18 ++++++++----------
 ldm/modules/prompt_to_embeddings_converter.py | 11 ++++++-----
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/ldm/modules/encoders/modules.py b/ldm/modules/encoders/modules.py
index aafb1299ad..6715b229f1 100644
--- a/ldm/modules/encoders/modules.py
+++ b/ldm/modules/encoders/modules.py
@@ -1,18 +1,16 @@
 import math
-import os.path
+from functools import partial
 from typing import Optional
 
+import clip
+import kornia
 import torch
 import torch.nn as nn
-from functools import partial
-import clip
-from einops import rearrange, repeat
+from einops import repeat
 from transformers import CLIPTokenizer, CLIPTextModel
-import kornia
 
-from ldm.invoke.devices import choose_torch_device
-from ldm.invoke.globals import Globals, global_cache_dir
-#from ldm.modules.textual_inversion_manager import TextualInversionManager
+from ldm.invoke.devices import choose_torch_device
+from ldm.invoke.globals import global_cache_dir
 from ldm.modules.x_transformer import (
     Encoder,
     TransformerWrapper,
@@ -663,12 +661,12 @@ class WeightedFrozenCLIPEmbedder(FrozenCLIPEmbedder):
             all_token_ids = all_token_ids[0:self.max_length]
             per_token_weights = per_token_weights[0:self.max_length]
 
-        # pad out to a 77-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
+        # pad out to a 77-entry array: [bos_token, <prompt tokens>, eos_token, pad_token…]
         # (77 = self.max_length)
         all_token_ids = [self.tokenizer.bos_token_id] + all_token_ids + [self.tokenizer.eos_token_id]
         per_token_weights = [1.0] + per_token_weights + [1.0]
         pad_length = self.max_length - len(all_token_ids)
-        all_token_ids += [self.tokenizer.eos_token_id] * pad_length
+        all_token_ids += [self.tokenizer.pad_token_id] * pad_length
         per_token_weights += [1.0] * pad_length
 
         all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long).to(self.device)
diff --git a/ldm/modules/prompt_to_embeddings_converter.py b/ldm/modules/prompt_to_embeddings_converter.py
index ab989e4892..dea15d61b4 100644
--- a/ldm/modules/prompt_to_embeddings_converter.py
+++ b/ldm/modules/prompt_to_embeddings_converter.py
@@ -3,8 +3,9 @@ import math
 import torch
 from transformers import CLIPTokenizer, CLIPTextModel
 
-from ldm.modules.textual_inversion_manager import TextualInversionManager
 from ldm.invoke.devices import torch_dtype
+from ldm.modules.textual_inversion_manager import TextualInversionManager
+
 
 class WeightedPromptFragmentsToEmbeddingsConverter():
@@ -22,8 +23,8 @@ class WeightedPromptFragmentsToEmbeddingsConverter():
         return self.tokenizer.model_max_length
 
     def get_embeddings_for_weighted_prompt_fragments(self,
-                 text: list[str],
-                 fragment_weights: list[float],
+                 text: list[list[str]],
+                 fragment_weights: list[list[float]],
                  should_return_tokens: bool = False,
                  device='cpu'
             ) -> torch.Tensor:
@@ -198,12 +199,12 @@ class WeightedPromptFragmentsToEmbeddingsConverter():
         all_token_ids = all_token_ids[0:max_token_count_without_bos_eos_markers]
         per_token_weights = per_token_weights[0:max_token_count_without_bos_eos_markers]
 
-        # pad out to a self.max_length-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
+        # pad out to a self.max_length-entry array: [bos_token, <prompt tokens>, eos_token, pad_token…]
         # (typically self.max_length == 77)
         all_token_ids = [self.tokenizer.bos_token_id] + all_token_ids + [self.tokenizer.eos_token_id]
         per_token_weights = [1.0] + per_token_weights + [1.0]
         pad_length = self.max_length - len(all_token_ids)
-        all_token_ids += [self.tokenizer.eos_token_id] * pad_length
+        all_token_ids += [self.tokenizer.pad_token_id] * pad_length
         per_token_weights += [1.0] * pad_length
 
         all_token_ids_tensor = torch.tensor(all_token_ids, dtype=torch.long, device=device)
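The point of this fix is that a tokenizer's padding token is not
necessarily its end-of-sequence token. The sketch below (an editorial
illustration, not part of the patch) prints the special-token ids of
the two tokenizers used by Stable Diffusion models: for the SD1 CLIP
tokenizer the eos and pad ids happen to coincide, while the SD2
tokenizer pads with a different token, which is where padding with
`eos_token_id` went wrong:

```python
from transformers import CLIPTokenizer

# Padding must use pad_token_id; it only coincidentally equals
# eos_token_id for some tokenizers.
for repo, kwargs in [
    ("openai/clip-vit-large-patch14", {}),
    ("stabilityai/stable-diffusion-2", {"subfolder": "tokenizer"}),
]:
    tok = CLIPTokenizer.from_pretrained(repo, **kwargs)
    print(repo, "eos:", tok.eos_token_id, "pad:", tok.pad_token_id)
```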
From e94c8fa285da208fc042ad16ebcb8303d19dc345 Mon Sep 17 00:00:00 2001
From: Damian Stewart
Date: Sat, 21 Jan 2023 12:06:23 +0100
Subject: [PATCH 6/7] fix long prompt weighting bug in ckpt codepath

---
 ldm/modules/encoders/modules.py | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/ldm/modules/encoders/modules.py b/ldm/modules/encoders/modules.py
index aafb1299ad..5b5f71cffd 100644
--- a/ldm/modules/encoders/modules.py
+++ b/ldm/modules/encoders/modules.py
@@ -654,14 +654,15 @@ class WeightedFrozenCLIPEmbedder(FrozenCLIPEmbedder):
             per_token_weights += [weight] * len(this_fragment_token_ids)
 
         # leave room for bos/eos
-        if len(all_token_ids) > self.max_length - 2:
-            excess_token_count = len(all_token_ids) - self.max_length - 2
+        max_token_count_without_bos_eos_markers = self.max_length - 2
+        if len(all_token_ids) > max_token_count_without_bos_eos_markers:
+            excess_token_count = len(all_token_ids) - max_token_count_without_bos_eos_markers
             # TODO build nice description string of how the truncation was applied
             # this should be done by calling self.tokenizer.convert_ids_to_tokens() then passing the result to
             # self.tokenizer.convert_tokens_to_string() for the token_ids on each side of the truncation limit.
             print(f">> Prompt is {excess_token_count} token(s) too long and has been truncated")
-            all_token_ids = all_token_ids[0:self.max_length]
-            per_token_weights = per_token_weights[0:self.max_length]
+            all_token_ids = all_token_ids[0:max_token_count_without_bos_eos_markers]
+            per_token_weights = per_token_weights[0:max_token_count_without_bos_eos_markers]
 
         # pad out to a 77-entry array: [eos_token, <prompt tokens>, eos_token, ..., eos_token]
         # (77 = self.max_length)
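There are two bugs being fixed here. `len(all_token_ids) - self.max_length - 2`
parses as `(len(all_token_ids) - self.max_length) - 2`, so the excess was
under-reported by four tokens, and the old code then sliced to
`self.max_length` entries, leaving no room for the bos/eos markers added
afterwards. A standalone sketch of the corrected arithmetic (illustrative
token ids only):

```python
# Corrected truncation: reserve two slots for bos/eos *before* slicing,
# so the padded result always comes out exactly max_length entries long.
BOS, EOS, PAD = 49406, 49407, 0  # illustrative ids only
max_length = 77

token_ids = list(range(1000, 1090))  # stand-in for a 90-token prompt
limit = max_length - 2               # leave room for bos and eos
if len(token_ids) > limit:
    print(f">> Prompt is {len(token_ids) - limit} token(s) too long and has been truncated")
    token_ids = token_ids[:limit]

padded = [BOS] + token_ids + [EOS]
padded += [PAD] * (max_length - len(padded))
assert len(padded) == max_length
```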
From ffcc5ad79582575fb145c69173226a808999e43b Mon Sep 17 00:00:00 2001
From: Lincoln Stein
Date: Mon, 23 Jan 2023 00:35:16 -0500
Subject: [PATCH 7/7] conversion script uses invokeai models cache by default

---
 ldm/invoke/ckpt_to_diffuser.py | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/ldm/invoke/ckpt_to_diffuser.py b/ldm/invoke/ckpt_to_diffuser.py
index 86281623a6..9b1735f831 100644
--- a/ldm/invoke/ckpt_to_diffuser.py
+++ b/ldm/invoke/ckpt_to_diffuser.py
@@ -21,7 +21,7 @@ import os
 import re
 import torch
 from pathlib import Path
-from ldm.invoke.globals import Globals
+from ldm.invoke.globals import Globals, global_cache_dir
 from safetensors.torch import load_file
 
 try:
@@ -637,7 +637,7 @@ def convert_ldm_bert_checkpoint(checkpoint, config):
 
 
 def convert_ldm_clip_checkpoint(checkpoint):
-    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")
+    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14",cache_dir=global_cache_dir('hub'))
 
     keys = list(checkpoint.keys())
 
@@ -677,7 +677,8 @@ textenc_pattern = re.compile("|".join(protected.keys()))
 
 
 def convert_paint_by_example_checkpoint(checkpoint):
-    config = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14")
+    cache_dir = global_cache_dir('hub')
+    config = CLIPVisionConfig.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
     model = PaintByExampleImageEncoder(config)
 
     keys = list(checkpoint.keys())
@@ -744,7 +745,8 @@ def convert_paint_by_example_checkpoint(checkpoint):
 
 
 def convert_open_clip_checkpoint(checkpoint):
-    text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder")
+    cache_dir=global_cache_dir('hub')
+    text_model = CLIPTextModel.from_pretrained("stabilityai/stable-diffusion-2", subfolder="text_encoder", cache_dir=cache_dir)
 
     keys = list(checkpoint.keys())
 
@@ -795,6 +797,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
                 ):
 
     checkpoint = load_file(checkpoint_path) if Path(checkpoint_path).suffix == '.safetensors' else torch.load(checkpoint_path)
+    cache_dir = global_cache_dir('hub')
 
     # Sometimes models don't have the global_step item
     if "global_step" in checkpoint:
@@ -904,7 +907,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
 
     if model_type == "FrozenOpenCLIPEmbedder":
         text_model = convert_open_clip_checkpoint(checkpoint)
-        tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer")
+        tokenizer = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer",cache_dir=global_cache_dir('diffusers'))
         pipe = StableDiffusionPipeline(
             vae=vae,
             text_encoder=text_model,
@@ -917,8 +920,8 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
         )
     elif model_type == "PaintByExample":
         vision_model = convert_paint_by_example_checkpoint(checkpoint)
-        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
-        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
+        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
+        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
         pipe = PaintByExamplePipeline(
             vae=vae,
             image_encoder=vision_model,
@@ -929,9 +932,9 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
         )
     elif model_type in ['FrozenCLIPEmbedder','WeightedFrozenCLIPEmbedder']:
         text_model = convert_ldm_clip_checkpoint(checkpoint)
-        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
-        safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker")
-        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker")
+        tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14",cache_dir=cache_dir)
+        safety_checker = StableDiffusionSafetyChecker.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
+        feature_extractor = AutoFeatureExtractor.from_pretrained("CompVis/stable-diffusion-safety-checker",cache_dir=cache_dir)
         pipe = StableDiffusionPipeline(
             vae=vae,
             text_encoder=text_model,
@@ -944,7 +947,7 @@ def convert_ckpt_to_diffuser(checkpoint_path:str,
     else:
         text_config = create_ldm_bert_config(original_config)
         text_model = convert_ldm_bert_checkpoint(checkpoint, text_config)
-        tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
+        tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased",cache_dir=cache_dir)
         pipe = LDMTextToImagePipeline(vqvae=vae, bert=text_model, tokenizer=tokenizer, unet=unet, scheduler=scheduler)
 
     pipe.save_pretrained(