use correct controlnet config file

Merge remote-tracking branch 'refs/remotes/origin/lstein/feat/diffusers-v0.30' into lstein/feat/diffusers-v0.30
pass configuration templates to from_single_file() using the config option
2024-08-30 20:32:17 +00:00 · 2024-08-27 11:39:34 -04:00 · 2024-08-17 15:58:58 -04:00 · 2024-08-17 15:57:02 -04:00 · 2024-08-17 14:13:33 -04:00 · 2024-08-17 14:06:55 -04:00
942 changed files with 621171 additions and 25928 deletions
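
A minimal sketch of what "pass configuration templates to from_single_file() using the config option" can look like in practice. It is not the code from this branch. The `CONFIG_TEMPLATES` mapping, its directory names, and the helpers `single_file_kwargs` and `load_controlnet` are hypothetical; the only real API assumed is the `config` keyword of diffusers' `from_single_file()`, which takes a Hub repo id or a local directory of diffusers-format config files.

```python
from pathlib import Path
from typing import Any, Dict, Optional

# Hypothetical mapping from a base-model variant to a local directory that
# holds a diffusers-format config template. The paths are placeholders.
CONFIG_TEMPLATES: Dict[str, Path] = {
    "sd-1": Path("configs") / "controlnet" / "sd-1",
    "sdxl": Path("configs") / "controlnet" / "sdxl",
}


def single_file_kwargs(variant: str, torch_dtype: Optional[Any] = None) -> Dict[str, Any]:
    """Build keyword arguments for a from_single_file() call.

    The matching template directory is passed through the `config` option so
    the loader does not have to guess the architecture from the checkpoint.
    """
    try:
        config_dir = CONFIG_TEMPLATES[variant]
    except KeyError:
        raise ValueError(f"No config template for variant {variant!r}") from None

    kwargs: Dict[str, Any] = {"config": str(config_dir)}
    if torch_dtype is not None:
        kwargs["torch_dtype"] = torch_dtype
    return kwargs


def load_controlnet(checkpoint_path: str, variant: str) -> Any:
    """Load a single-file ControlNet checkpoint with an explicit config template."""
    # Imported lazily so single_file_kwargs() stays usable without diffusers installed.
    from diffusers import ControlNetModel

    return ControlNetModel.from_single_file(checkpoint_path, **single_file_kwargs(variant))
```

Keeping the variant-to-template lookup in one place means choosing the wrong ControlNet config becomes a one-line fix in the mapping rather than a change at every call site.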
--- a/.github/workflows/python-tests.yml
+++ b/.github/workflows/python-tests.yml
@@ -60,7 +60,7 @@ jobs:
            extra-index-url: 'https://download.pytorch.org/whl/cpu'
            github-env: $GITHUB_ENV
          - platform: macos-default
-            os: macOS-14
+            os: macOS-12
            github-env: $GITHUB_ENV
          - platform: windows-cpu
            os: windows-2022
--- a/docker/README.md
+++ b/docker/README.md
@@ -1,22 +1,20 @@
 # Invoke in Docker

-First things first:
-
- Ensure that Docker can use your [NVIDIA][nvidia docker docs] or [AMD][amd docker docs] GPU.
- This document assumes a Linux system, but should work similarly under Windows with WSL2.
+- Ensure that Docker can use the GPU on your system
+- This documentation assumes Linux, but should work similarly under Windows with WSL2
 - We don't recommend running Invoke in Docker on macOS at this time. It works, but very slowly.

-## Quickstart
+## Quickstart :lightning:

-No `docker compose`, no persistence, single command, using the official images:
+No `docker compose`, no persistence, just a simple one-liner using the official images:

-**CUDA (NVIDIA GPU):**
+**CUDA:**

 ```bash
 docker run --runtime=nvidia --gpus=all --publish 9090:9090 ghcr.io/invoke-ai/invokeai
 ```

-**ROCm (AMD GPU):**
+**ROCm:**

 ```bash
 docker run --device /dev/kfd --device /dev/dri --publish 9090:9090 ghcr.io/invoke-ai/invokeai:main-rocm
@@ -24,20 +22,12 @@ docker run --device /dev/kfd --device /dev/dri --publish 9090:9090 ghcr.io/invok

 Open `http://localhost:9090` in your browser once the container finishes booting, install some models, and generate away!

-### Data persistence
-
-To persist your generated images and downloaded models outside of the container, add a `--volume/-v` flag to the above command, e.g.:
-
-```bash
-docker run --volume /some/local/path:/invokeai {...etc...}
-```
-
-`/some/local/path/invokeai` will contain all your data.
-It can *usually* be reused between different installs of Invoke. Tread with caution and read the release notes!
+> [!TIP]
+> To persist your data (including downloaded models) outside of the container, add a `--volume/-v` flag to the above command, e.g.: `docker run --volume /some/local/path:/invokeai <...the rest of the command>`

 ## Customize the container

-The included `run.sh` script is a convenience wrapper around `docker compose`. It can be helpful for passing additional build arguments to `docker compose`. Alternatively, the familiar `docker compose` commands work just as well.
+We ship the `run.sh` script, which is a convenient wrapper around `docker compose` for cases where custom image build args are needed. Alternatively, the familiar `docker compose` commands work just as well.

 ```bash
 cd docker
@@ -48,14 +38,11 @@ cp .env.sample .env

 It will take a few minutes to build the image the first time. Once the application starts up, open `http://localhost:9090` in your browser to invoke!

->[!TIP]
->When using the `run.sh` script, the container will continue running after Ctrl+C. To shut it down, use the `docker compose down` command.
-
 ## Docker setup in detail

 #### Linux

-1. Ensure buildkit is enabled in the Docker daemon settings (`/etc/docker/daemon.json`)
+1. Ensure builkit is enabled in the Docker daemon settings (`/etc/docker/daemon.json`)
 2. Install the `docker compose` plugin using your package manager, or follow a [tutorial](https://docs.docker.com/compose/install/linux/#install-using-the-repository).
    - The deprecated `docker-compose` (hyphenated) CLI probably won't work. Update to a recent version.
 3. Ensure docker daemon is able to access the GPU.
@@ -111,7 +98,25 @@ GPU_DRIVER=cuda

 Any environment variables supported by InvokeAI can be set here. See the [Configuration docs](https://invoke-ai.github.io/InvokeAI/features/CONFIGURATION/) for further detail.

---
+## Even More Customizing!

-[nvidia docker docs]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
-[amd docker docs]: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html
+See the `docker-compose.yml` file. The `command` instruction can be uncommented and used to run arbitrary startup commands. Some examples below.
+
+### Reconfigure the runtime directory
+
+Can be used to download additional models from the supported model list
+
+In conjunction with `INVOKEAI_ROOT` can be also used to initialize a runtime directory
+
+```yaml
+command:
+  - invokeai-configure
+  - --yes
+```
+
+Or install models:
+
+```yaml
+command:
+  - invokeai-model-install
+```
--- a/invokeai/app/api/routers/session_queue.py
+++ b/invokeai/app/api/routers/session_queue.py
@@ -11,7 +11,6 @@ from invokeai.app.services.session_queue.session_queue_common import (
    Batch,
    BatchStatus,
    CancelByBatchIDsResult,
-    CancelByOriginResult,
    ClearResult,
    EnqueueBatchResult,
    PruneResult,
@@ -106,19 +105,6 @@ async def cancel_by_batch_ids(
    return ApiDependencies.invoker.services.session_queue.cancel_by_batch_ids(queue_id=queue_id, batch_ids=batch_ids)


-@session_queue_router.put(
-    "/{queue_id}/cancel_by_origin",
-    operation_id="cancel_by_origin",
-    responses={200: {"model": CancelByBatchIDsResult}},
-)
-async def cancel_by_origin(
-    queue_id: str = Path(description="The queue id to perform this operation on"),
-    origin: str = Query(description="The origin to cancel all queue items for"),
-) -> CancelByOriginResult:
-    """Immediately cancels all queue items with the given origin"""
-    return ApiDependencies.invoker.services.session_queue.cancel_by_origin(queue_id=queue_id, origin=origin)
-
-
 @session_queue_router.put(
    "/{queue_id}/clear",
    operation_id="clear",
--- a/invokeai/app/api/routers/style_presets.py
+++ b/invokeai/app/api/routers/style_presets.py
@@ -26,10 +26,13 @@ from invokeai.app.services.style_preset_records.style_preset_records_common impo
 )


-class StylePresetFormData(BaseModel):
+class StylePresetUpdateFormData(BaseModel):
    name: str = Field(description="Preset name")
    positive_prompt: str = Field(description="Positive prompt")
    negative_prompt: str = Field(description="Negative prompt")
+
+
+class StylePresetCreateFormData(StylePresetUpdateFormData):
    type: PresetType = Field(description="Preset type")


@@ -92,10 +95,9 @@ async def update_style_preset(

    try:
        parsed_data = json.loads(data)
-        validated_data = StylePresetFormData(**parsed_data)
+        validated_data = StylePresetUpdateFormData(**parsed_data)

        name = validated_data.name
-        type = validated_data.type
        positive_prompt = validated_data.positive_prompt
        negative_prompt = validated_data.negative_prompt

@@ -103,7 +105,7 @@ async def update_style_preset(
        raise HTTPException(status_code=400, detail="Invalid preset data")

    preset_data = PresetData(positive_prompt=positive_prompt, negative_prompt=negative_prompt)
-    changes = StylePresetChanges(name=name, preset_data=preset_data, type=type)
+    changes = StylePresetChanges(name=name, preset_data=preset_data)

    style_preset_image = ApiDependencies.invoker.services.style_preset_image_files.get_url(style_preset_id)
    style_preset = ApiDependencies.invoker.services.style_preset_records.update(
@@ -143,7 +145,7 @@ async def create_style_preset(

    try:
        parsed_data = json.loads(data)
-        validated_data = StylePresetFormData(**parsed_data)
+        validated_data = StylePresetCreateFormData(**parsed_data)

        name = validated_data.name
        type = validated_data.type
--- a/invokeai/app/invocations/fields.py
+++ b/invokeai/app/invocations/fields.py
@@ -40,7 +40,6 @@ class UIType(str, Enum, metaclass=MetaEnum):

    # region Model Field Types
    MainModel = "MainModelField"
-    FluxMainModel = "FluxMainModelField"
    SDXLMainModel = "SDXLMainModelField"
    SDXLRefinerModel = "SDXLRefinerModelField"
    ONNXModel = "ONNXModelField"
@@ -49,7 +48,6 @@ class UIType(str, Enum, metaclass=MetaEnum):
    ControlNetModel = "ControlNetModelField"
    IPAdapterModel = "IPAdapterModelField"
    T2IAdapterModel = "T2IAdapterModelField"
-    T5EncoderModel = "T5EncoderModelField"
    SpandrelImageToImageModel = "SpandrelImageToImageModelField"
    # endregion

@@ -127,16 +125,13 @@ class FieldDescriptions:
    negative_cond = "Negative conditioning tensor"
    noise = "Noise tensor"
    clip = "CLIP (tokenizer, text encoder, LoRAs) and skipped layer count"
-    t5_encoder = "T5 tokenizer and text encoder"
    unet = "UNet (scheduler, LoRAs)"
-    transformer = "Transformer"
    vae = "VAE"
    cond = "Conditioning tensor"
    controlnet_model = "ControlNet model to load"
    vae_model = "VAE model to load"
    lora_model = "LoRA model to load"
    main_model = "Main model (UNet, VAE, CLIP) to load"
-    flux_model = "Flux model (Transformer) to load"
    sdxl_main_model = "SDXL Main model (UNet, VAE, CLIP1, CLIP2) to load"
    sdxl_refiner_model = "SDXL Refiner Main Modde (UNet, VAE, CLIP2) to load"
    onnx_main_model = "ONNX Main model (UNet, VAE, CLIP) to load"
@@ -236,12 +231,6 @@ class ColorField(BaseModel):
        return (self.r, self.g, self.b, self.a)


-class FluxConditioningField(BaseModel):
-    """A conditioning tensor primitive value"""
-
-    conditioning_name: str = Field(description="The name of conditioning tensor")
-
-
 class ConditioningField(BaseModel):
    """A conditioning tensor primitive value"""

--- a/invokeai/app/invocations/flux_text_encoder.py
+++ b/invokeai/app/invocations/flux_text_encoder.py
@@ -1,86 +0,0 @@
-from typing import Literal
-
-import torch
-from transformers import CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer
-
-from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
-from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField
-from invokeai.app.invocations.model import CLIPField, T5EncoderField
-from invokeai.app.invocations.primitives import FluxConditioningOutput
-from invokeai.app.services.shared.invocation_context import InvocationContext
-from invokeai.backend.flux.modules.conditioner import HFEncoder
-from invokeai.backend.stable_diffusion.diffusion.conditioning_data import ConditioningFieldData, FLUXConditioningInfo
-
-
-@invocation(
-    "flux_text_encoder",
-    title="FLUX Text Encoding",
-    tags=["prompt", "conditioning", "flux"],
-    category="conditioning",
-    version="1.0.0",
-    classification=Classification.Prototype,
-)
-class FluxTextEncoderInvocation(BaseInvocation):
-    """Encodes and preps a prompt for a flux image."""
-
-    clip: CLIPField = InputField(
-        title="CLIP",
-        description=FieldDescriptions.clip,
-        input=Input.Connection,
-    )
-    t5_encoder: T5EncoderField = InputField(
-        title="T5Encoder",
-        description=FieldDescriptions.t5_encoder,
-        input=Input.Connection,
-    )
-    t5_max_seq_len: Literal[256, 512] = InputField(
-        description="Max sequence length for the T5 encoder. Expected to be 256 for FLUX schnell models and 512 for FLUX dev models."
-    )
-    prompt: str = InputField(description="Text prompt to encode.")
-
-    @torch.no_grad()
-    def invoke(self, context: InvocationContext) -> FluxConditioningOutput:
-        t5_embeddings, clip_embeddings = self._encode_prompt(context)
-        conditioning_data = ConditioningFieldData(
-            conditionings=[FLUXConditioningInfo(clip_embeds=clip_embeddings, t5_embeds=t5_embeddings)]
-        )
-
-        conditioning_name = context.conditioning.save(conditioning_data)
-        return FluxConditioningOutput.build(conditioning_name)
-
-    def _encode_prompt(self, context: InvocationContext) -> tuple[torch.Tensor, torch.Tensor]:
-        # Load CLIP.
-        clip_tokenizer_info = context.models.load(self.clip.tokenizer)
-        clip_text_encoder_info = context.models.load(self.clip.text_encoder)
-
-        # Load T5.
-        t5_tokenizer_info = context.models.load(self.t5_encoder.tokenizer)
-        t5_text_encoder_info = context.models.load(self.t5_encoder.text_encoder)
-
-        prompt = [self.prompt]
-
-        with (
-            t5_text_encoder_info as t5_text_encoder,
-            t5_tokenizer_info as t5_tokenizer,
-        ):
-            assert isinstance(t5_text_encoder, T5EncoderModel)
-            assert isinstance(t5_tokenizer, T5Tokenizer)
-
-            t5_encoder = HFEncoder(t5_text_encoder, t5_tokenizer, False, self.t5_max_seq_len)
-
-            prompt_embeds = t5_encoder(prompt)
-
-        with (
-            clip_text_encoder_info as clip_text_encoder,
-            clip_tokenizer_info as clip_tokenizer,
-        ):
-            assert isinstance(clip_text_encoder, CLIPTextModel)
-            assert isinstance(clip_tokenizer, CLIPTokenizer)
-
-            clip_encoder = HFEncoder(clip_text_encoder, clip_tokenizer, True, 77)
-
-            pooled_prompt_embeds = clip_encoder(prompt)
-
-        assert isinstance(prompt_embeds, torch.Tensor)
-        assert isinstance(pooled_prompt_embeds, torch.Tensor)
-        return prompt_embeds, pooled_prompt_embeds
--- a/invokeai/app/invocations/flux_text_to_image.py
+++ b/invokeai/app/invocations/flux_text_to_image.py
@@ -1,172 +0,0 @@
-import torch
-from einops import rearrange
-from PIL import Image
-
-from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
-from invokeai.app.invocations.fields import (
-    FieldDescriptions,
-    FluxConditioningField,
-    Input,
-    InputField,
-    WithBoard,
-    WithMetadata,
-)
-from invokeai.app.invocations.model import TransformerField, VAEField
-from invokeai.app.invocations.primitives import ImageOutput
-from invokeai.app.services.session_processor.session_processor_common import CanceledException
-from invokeai.app.services.shared.invocation_context import InvocationContext
-from invokeai.backend.flux.model import Flux
-from invokeai.backend.flux.modules.autoencoder import AutoEncoder
-from invokeai.backend.flux.sampling import denoise, get_noise, get_schedule, prepare_latent_img_patches, unpack
-from invokeai.backend.stable_diffusion.diffusion.conditioning_data import FLUXConditioningInfo
-from invokeai.backend.util.devices import TorchDevice
-
-
-@invocation(
-    "flux_text_to_image",
-    title="FLUX Text to Image",
-    tags=["image", "flux"],
-    category="image",
-    version="1.0.0",
-    classification=Classification.Prototype,
-)
-class FluxTextToImageInvocation(BaseInvocation, WithMetadata, WithBoard):
-    """Text-to-image generation using a FLUX model."""
-
-    transformer: TransformerField = InputField(
-        description=FieldDescriptions.flux_model,
-        input=Input.Connection,
-        title="Transformer",
-    )
-    vae: VAEField = InputField(
-        description=FieldDescriptions.vae,
-        input=Input.Connection,
-    )
-    positive_text_conditioning: FluxConditioningField = InputField(
-        description=FieldDescriptions.positive_cond, input=Input.Connection
-    )
-    width: int = InputField(default=1024, multiple_of=16, description="Width of the generated image.")
-    height: int = InputField(default=1024, multiple_of=16, description="Height of the generated image.")
-    num_steps: int = InputField(
-        default=4, description="Number of diffusion steps. Recommend values are schnell: 4, dev: 50."
-    )
-    guidance: float = InputField(
-        default=4.0,
-        description="The guidance strength. Higher values adhere more strictly to the prompt, and will produce less diverse images. FLUX dev only, ignored for schnell.",
-    )
-    seed: int = InputField(default=0, description="Randomness seed for reproducibility.")
-
-    @torch.no_grad()
-    def invoke(self, context: InvocationContext) -> ImageOutput:
-        # Load the conditioning data.
-        cond_data = context.conditioning.load(self.positive_text_conditioning.conditioning_name)
-        assert len(cond_data.conditionings) == 1
-        flux_conditioning = cond_data.conditionings[0]
-        assert isinstance(flux_conditioning, FLUXConditioningInfo)
-
-        latents = self._run_diffusion(context, flux_conditioning.clip_embeds, flux_conditioning.t5_embeds)
-        image = self._run_vae_decoding(context, latents)
-        image_dto = context.images.save(image=image)
-        return ImageOutput.build(image_dto)
-
-    def _run_diffusion(
-        self,
-        context: InvocationContext,
-        clip_embeddings: torch.Tensor,
-        t5_embeddings: torch.Tensor,
-    ):
-        transformer_info = context.models.load(self.transformer.transformer)
-        inference_dtype = torch.bfloat16
-
-        # Prepare input noise.
-        x = get_noise(
-            num_samples=1,
-            height=self.height,
-            width=self.width,
-            device=TorchDevice.choose_torch_device(),
-            dtype=inference_dtype,
-            seed=self.seed,
-        )
-
-        img, img_ids = prepare_latent_img_patches(x)
-
-        is_schnell = "schnell" in transformer_info.config.config_path
-
-        timesteps = get_schedule(
-            num_steps=self.num_steps,
-            image_seq_len=img.shape[1],
-            shift=not is_schnell,
-        )
-
-        bs, t5_seq_len, _ = t5_embeddings.shape
-        txt_ids = torch.zeros(bs, t5_seq_len, 3, dtype=inference_dtype, device=TorchDevice.choose_torch_device())
-
-        # HACK(ryand): Manually empty the cache. Currently we don't check the size of the model before loading it from
-        # disk. Since the transformer model is large (24GB), there's a good chance that it will OOM on 32GB RAM systems
-        # if the cache is not empty.
-        context.models._services.model_manager.load.ram_cache.make_room(24 * 2**30)
-
-        with transformer_info as transformer:
-            assert isinstance(transformer, Flux)
-
-            def step_callback() -> None:
-                if context.util.is_canceled():
-                    raise CanceledException
-
-                # TODO: Make this look like the image before re-enabling
-                # latent_image = unpack(img.float(), self.height, self.width)
-                # latent_image = latent_image.squeeze()  # Remove unnecessary dimensions
-                # flattened_tensor = latent_image.reshape(-1)  # Flatten to shape [48*128*128]
-
-                # # Create a new tensor of the required shape [255, 255, 3]
-                # latent_image = flattened_tensor[: 255 * 255 * 3].reshape(255, 255, 3)  # Reshape to RGB format
-
-                # # Convert to a NumPy array and then to a PIL Image
-                # image = Image.fromarray(latent_image.cpu().numpy().astype(np.uint8))
-
-                # (width, height) = image.size
-                # width *= 8
-                # height *= 8
-
-                # dataURL = image_to_dataURL(image, image_format="JPEG")
-
-                # # TODO: move this whole function to invocation context to properly reference these variables
-                # context._services.events.emit_invocation_denoise_progress(
-                #     context._data.queue_item,
-                #     context._data.invocation,
-                #     state,
-                #     ProgressImage(dataURL=dataURL, width=width, height=height),
-                # )
-
-            x = denoise(
-                model=transformer,
-                img=img,
-                img_ids=img_ids,
-                txt=t5_embeddings,
-                txt_ids=txt_ids,
-                vec=clip_embeddings,
-                timesteps=timesteps,
-                step_callback=step_callback,
-                guidance=self.guidance,
-            )
-
-        x = unpack(x.float(), self.height, self.width)
-
-        return x
-
-    def _run_vae_decoding(
-        self,
-        context: InvocationContext,
-        latents: torch.Tensor,
-    ) -> Image.Image:
-        vae_info = context.models.load(self.vae.vae)
-        with vae_info as vae:
-            assert isinstance(vae, AutoEncoder)
-            latents = latents.to(dtype=TorchDevice.choose_torch_dtype())
-            img = vae.decode(latents)
-
-        img = img.clamp(-1, 1)
-        img = rearrange(img[0], "c h w -> h w c")
-        img_pil = Image.fromarray((127.5 * (img + 1.0)).byte().cpu().numpy())
-
-        return img_pil
--- a/invokeai/app/invocations/image.py
+++ b/invokeai/app/invocations/image.py
@@ -6,19 +6,13 @@ import cv2
 import numpy
 from PIL import Image, ImageChops, ImageFilter, ImageOps

-from invokeai.app.invocations.baseinvocation import (
-    BaseInvocation,
-    Classification,
-    invocation,
-    invocation_output,
-)
+from invokeai.app.invocations.baseinvocation import BaseInvocation, Classification, invocation
 from invokeai.app.invocations.constants import IMAGE_MODES
 from invokeai.app.invocations.fields import (
    ColorField,
    FieldDescriptions,
    ImageField,
    InputField,
-    OutputField,
    WithBoard,
    WithMetadata,
 )
@@ -1013,62 +1007,3 @@ class MaskFromIDInvocation(BaseInvocation, WithMetadata, WithBoard):
        image_dto = context.images.save(image=mask, image_category=ImageCategory.MASK)

        return ImageOutput.build(image_dto)
-
-
-@invocation_output("canvas_v2_mask_and_crop_output")
-class CanvasV2MaskAndCropOutput(ImageOutput):
-    offset_x: int = OutputField(description="The x offset of the image, after cropping")
-    offset_y: int = OutputField(description="The y offset of the image, after cropping")
-
-
-@invocation(
-    "canvas_v2_mask_and_crop",
-    title="Canvas V2 Mask and Crop",
-    tags=["image", "mask", "id"],
-    category="image",
-    version="1.0.0",
-    classification=Classification.Prototype,
-)
-class CanvasV2MaskAndCropInvocation(BaseInvocation, WithMetadata, WithBoard):
-    """Handles Canvas V2 image output masking and cropping"""
-
-    source_image: ImageField | None = InputField(
-        default=None,
-        description="The source image onto which the masked generated image is pasted. If omitted, the masked generated image is returned with transparency.",
-    )
-    generated_image: ImageField = InputField(description="The image to apply the mask to")
-    mask: ImageField = InputField(description="The mask to apply")
-    mask_blur: int = InputField(default=0, ge=0, description="The amount to blur the mask by")
-
-    def _prepare_mask(self, mask: Image.Image) -> Image.Image:
-        mask_array = numpy.array(mask)
-        kernel = numpy.ones((self.mask_blur, self.mask_blur), numpy.uint8)
-        dilated_mask_array = cv2.erode(mask_array, kernel, iterations=3)
-        dilated_mask = Image.fromarray(dilated_mask_array)
-        if self.mask_blur > 0:
-            mask = dilated_mask.filter(ImageFilter.GaussianBlur(self.mask_blur))
-        return ImageOps.invert(mask.convert("L"))
-
-    def invoke(self, context: InvocationContext) -> CanvasV2MaskAndCropOutput:
-        mask = self._prepare_mask(context.images.get_pil(self.mask.image_name))
-
-        if self.source_image:
-            generated_image = context.images.get_pil(self.generated_image.image_name)
-            source_image = context.images.get_pil(self.source_image.image_name)
-            source_image.paste(generated_image, (0, 0), mask)
-            image_dto = context.images.save(image=source_image)
-        else:
-            generated_image = context.images.get_pil(self.generated_image.image_name)
-            generated_image.putalpha(mask)
-            image_dto = context.images.save(image=generated_image)
-
-        # bbox = image.getbbox()
-        # image = image.crop(bbox)
-
-        return CanvasV2MaskAndCropOutput(
-            image=ImageField(image_name=image_dto.image_name),
-            offset_x=0,
-            offset_y=0,
-            width=image_dto.width,
-            height=image_dto.height,
-        )
--- a/invokeai/app/invocations/model.py
+++ b/invokeai/app/invocations/model.py
@@ -1,5 +1,5 @@
 import copy
-from typing import List, Literal, Optional
+from typing import List, Optional

 from pydantic import BaseModel, Field

@@ -13,14 +13,7 @@ from invokeai.app.invocations.baseinvocation import (
 from invokeai.app.invocations.fields import FieldDescriptions, Input, InputField, OutputField, UIType
 from invokeai.app.services.shared.invocation_context import InvocationContext
 from invokeai.app.shared.models import FreeUConfig
-from invokeai.backend.flux.util import max_seq_lengths
-from invokeai.backend.model_manager.config import (
-    AnyModelConfig,
-    BaseModelType,
-    CheckpointConfigBase,
-    ModelType,
-    SubModelType,
-)
+from invokeai.backend.model_manager.config import AnyModelConfig, BaseModelType, ModelType, SubModelType


 class ModelIdentifierField(BaseModel):
@@ -67,15 +60,6 @@ class CLIPField(BaseModel):
    loras: List[LoRAField] = Field(description="LoRAs to apply on model loading")


-class TransformerField(BaseModel):
-    transformer: ModelIdentifierField = Field(description="Info to load Transformer submodel")
-
-
-class T5EncoderField(BaseModel):
-    tokenizer: ModelIdentifierField = Field(description="Info to load tokenizer submodel")
-    text_encoder: ModelIdentifierField = Field(description="Info to load text_encoder submodel")
-
-
 class VAEField(BaseModel):
    vae: ModelIdentifierField = Field(description="Info to load vae submodel")
    seamless_axes: List[str] = Field(default_factory=list, description='Axes("x" and "y") to which apply seamless')
@@ -138,112 +122,6 @@ class ModelIdentifierInvocation(BaseInvocation):
        return ModelIdentifierOutput(model=self.model)


-@invocation_output("flux_model_loader_output")
-class FluxModelLoaderOutput(BaseInvocationOutput):
-    """Flux base model loader output"""
-
-    transformer: TransformerField = OutputField(description=FieldDescriptions.transformer, title="Transformer")
-    clip: CLIPField = OutputField(description=FieldDescriptions.clip, title="CLIP")
-    t5_encoder: T5EncoderField = OutputField(description=FieldDescriptions.t5_encoder, title="T5 Encoder")
-    vae: VAEField = OutputField(description=FieldDescriptions.vae, title="VAE")
-    max_seq_len: Literal[256, 512] = OutputField(
-        description="The max sequence length to used for the T5 encoder. (256 for schnell transformer, 512 for dev transformer)",
-        title="Max Seq Length",
-    )
-
-
-@invocation(
-    "flux_model_loader",
-    title="Flux Main Model",
-    tags=["model", "flux"],
-    category="model",
-    version="1.0.3",
-    classification=Classification.Prototype,
-)
-class FluxModelLoaderInvocation(BaseInvocation):
-    """Loads a flux base model, outputting its submodels."""
-
-    model: ModelIdentifierField = InputField(
-        description=FieldDescriptions.flux_model,
-        ui_type=UIType.FluxMainModel,
-        input=Input.Direct,
-    )
-
-    t5_encoder: ModelIdentifierField = InputField(
-        description=FieldDescriptions.t5_encoder,
-        ui_type=UIType.T5EncoderModel,
-        input=Input.Direct,
-    )
-
-    def invoke(self, context: InvocationContext) -> FluxModelLoaderOutput:
-        model_key = self.model.key
-
-        if not context.models.exists(model_key):
-            raise ValueError(f"Unknown model: {model_key}")
-        transformer = self._get_model(context, SubModelType.Transformer)
-        tokenizer = self._get_model(context, SubModelType.Tokenizer)
-        tokenizer2 = self._get_model(context, SubModelType.Tokenizer2)
-        clip_encoder = self._get_model(context, SubModelType.TextEncoder)
-        t5_encoder = self._get_model(context, SubModelType.TextEncoder2)
-        vae = self._get_model(context, SubModelType.VAE)
-        transformer_config = context.models.get_config(transformer)
-        assert isinstance(transformer_config, CheckpointConfigBase)
-
-        return FluxModelLoaderOutput(
-            transformer=TransformerField(transformer=transformer),
-            clip=CLIPField(tokenizer=tokenizer, text_encoder=clip_encoder, loras=[], skipped_layers=0),
-            t5_encoder=T5EncoderField(tokenizer=tokenizer2, text_encoder=t5_encoder),
-            vae=VAEField(vae=vae),
-            max_seq_len=max_seq_lengths[transformer_config.config_path],
-        )
-
-    def _get_model(self, context: InvocationContext, submodel: SubModelType) -> ModelIdentifierField:
-        match submodel:
-            case SubModelType.Transformer:
-                return self.model.model_copy(update={"submodel_type": SubModelType.Transformer})
-            case SubModelType.VAE:
-                return self._pull_model_from_mm(
-                    context,
-                    SubModelType.VAE,
-                    "FLUX.1-schnell_ae",
-                    ModelType.VAE,
-                    BaseModelType.Flux,
-                )
-            case submodel if submodel in [SubModelType.Tokenizer, SubModelType.TextEncoder]:
-                return self._pull_model_from_mm(
-                    context,
-                    submodel,
-                    "clip-vit-large-patch14",
-                    ModelType.CLIPEmbed,
-                    BaseModelType.Any,
-                )
-            case submodel if submodel in [SubModelType.Tokenizer2, SubModelType.TextEncoder2]:
-                return self._pull_model_from_mm(
-                    context,
-                    submodel,
-                    self.t5_encoder.name,
-                    ModelType.T5Encoder,
-                    BaseModelType.Any,
-                )
-            case _:
-                raise Exception(f"{submodel.value} is not a supported submodule for a flux model")
-
-    def _pull_model_from_mm(
-        self,
-        context: InvocationContext,
-        submodel: SubModelType,
-        name: str,
-        type: ModelType,
-        base: BaseModelType,
-    ):
-        if models := context.models.search_by_attrs(name=name, base=base, type=type):
-            if len(models) != 1:
-                raise Exception(f"Multiple models detected for selected model with name {name}")
-            return ModelIdentifierField.from_config(models[0]).model_copy(update={"submodel_type": submodel})
-        else:
-            raise ValueError(f"Please install the {base}:{type} model named {name} via starter models")
-
-
 @invocation(
    "main_model_loader",
    title="Main Model",
--- a/invokeai/app/invocations/primitives.py
+++ b/invokeai/app/invocations/primitives.py
@@ -12,7 +12,6 @@ from invokeai.app.invocations.fields import (
    ConditioningField,
    DenoiseMaskField,
    FieldDescriptions,
-    FluxConditioningField,
    ImageField,
    Input,
    InputField,
@@ -415,17 +414,6 @@ class MaskOutput(BaseInvocationOutput):
    height: int = OutputField(description="The height of the mask in pixels.")


-@invocation_output("flux_conditioning_output")
-class FluxConditioningOutput(BaseInvocationOutput):
-    """Base class for nodes that output a single conditioning tensor"""
-
-    conditioning: FluxConditioningField = OutputField(description=FieldDescriptions.cond)
-
-    @classmethod
-    def build(cls, conditioning_name: str) -> "FluxConditioningOutput":
-        return cls(conditioning=FluxConditioningField(conditioning_name=conditioning_name))
-
-
 @invocation_output("conditioning_output")
 class ConditioningOutput(BaseInvocationOutput):
    """Base class for nodes that output a single conditioning tensor"""
--- a/invokeai/app/services/events/events_common.py
+++ b/invokeai/app/services/events/events_common.py
@@ -88,7 +88,6 @@ class QueueItemEventBase(QueueEventBase):

    item_id: int = Field(description="The ID of the queue item")
    batch_id: str = Field(description="The ID of the queue batch")
-    origin: str | None = Field(default=None, description="The origin of the batch")


 class InvocationEventBase(QueueItemEventBase):
@ -96,6 +95,8 @@ class InvocationEventBase(QueueItemEventBase):

    session_id: str = Field(description="The ID of the session (aka graph execution state)")
    queue_id: str = Field(description="The ID of the queue")
+    item_id: int = Field(description="The ID of the queue item")
+    batch_id: str = Field(description="The ID of the queue batch")
    session_id: str = Field(description="The ID of the session (aka graph execution state)")
    invocation: AnyInvocation = Field(description="The ID of the invocation")
    invocation_source_id: str = Field(description="The ID of the prepared invocation's source node")
@ -113,7 +114,6 @@ class InvocationStartedEvent(InvocationEventBase):
            queue_id=queue_item.queue_id,
            item_id=queue_item.item_id,
            batch_id=queue_item.batch_id,
-            origin=queue_item.origin,
            session_id=queue_item.session_id,
            invocation=invocation,
            invocation_source_id=queue_item.session.prepared_source_mapping[invocation.id],
@ -147,7 +147,6 @@ class InvocationDenoiseProgressEvent(InvocationEventBase):
            queue_id=queue_item.queue_id,
            item_id=queue_item.item_id,
            batch_id=queue_item.batch_id,
-            origin=queue_item.origin,
            session_id=queue_item.session_id,
            invocation=invocation,
            invocation_source_id=queue_item.session.prepared_source_mapping[invocation.id],
@ -185,7 +184,6 @@ class InvocationCompleteEvent(InvocationEventBase):
            queue_id=queue_item.queue_id,
            item_id=queue_item.item_id,
            batch_id=queue_item.batch_id,
-            origin=queue_item.origin,
            session_id=queue_item.session_id,
            invocation=invocation,
            invocation_source_id=queue_item.session.prepared_source_mapping[invocation.id],
@ -218,7 +216,6 @@ class InvocationErrorEvent(InvocationEventBase):
            queue_id=queue_item.queue_id,
            item_id=queue_item.item_id,
            batch_id=queue_item.batch_id,
-            origin=queue_item.origin,
            session_id=queue_item.session_id,
            invocation=invocation,
            invocation_source_id=queue_item.session.prepared_source_mapping[invocation.id],
@ -256,7 +253,6 @@ class QueueItemStatusChangedEvent(QueueItemEventBase):
            queue_id=queue_item.queue_id,
            item_id=queue_item.item_id,
            batch_id=queue_item.batch_id,
-            origin=queue_item.origin,
            session_id=queue_item.session_id,
            status=queue_item.status,
            error_type=queue_item.error_type,
@ -283,14 +279,12 @@ class BatchEnqueuedEvent(QueueEventBase):
        description="The number of invocations initially requested to be enqueued (may be less than enqueued if queue was full)"
    )
    priority: int = Field(description="The priority of the batch")
-    origin: str | None = Field(default=None, description="The origin of the batch")

    @classmethod
    def build(cls, enqueue_result: EnqueueBatchResult) -> "BatchEnqueuedEvent":
        return cls(
            queue_id=enqueue_result.queue_id,
            batch_id=enqueue_result.batch.batch_id,
-            origin=enqueue_result.batch.origin,
            enqueued=enqueue_result.enqueued,
            requested=enqueue_result.requested,
            priority=enqueue_result.priority,
--- a/invokeai/app/services/model_install/model_install_default.py
+++ b/invokeai/app/services/model_install/model_install_default.py
@ -783,9 +783,8 @@ class ModelInstallService(ModelInstallServiceBase):
        # So what we do is to synthesize a folder named "sdxl-turbo_vae" here.
        if subfolder:
            top = Path(remote_files[0].path.parts[0])  # e.g. "sdxl-turbo/"
-            path_to_remove = top / subfolder  # sdxl-turbo/vae/
-            subfolder_rename = subfolder.name.replace("/", "_").replace("\\", "_")
-            path_to_add = Path(f"{top}_{subfolder_rename}")
+            path_to_remove = top / subfolder.parts[-1]  # sdxl-turbo/vae/
+            path_to_add = Path(f"{top}_{subfolder}")
        else:
            path_to_remove = Path(".")
            path_to_add = Path(".")
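Note on the hunk above: the two sides agree for a single-component subfolder such as `vae`, but they appear to diverge when the subfolder is nested. The sketch below replays both variants with `PurePosixPath` so the result is the same on every platform. The function names `minus_side` and `plus_side` and the nested example `models/vae` are illustrative assumptions, not names or inputs from the repository.

```python
from pathlib import PurePosixPath


def minus_side(top: PurePosixPath, subfolder: PurePosixPath):
    """Mirror of the '-' lines: remove the full subfolder, rename by its last component."""
    path_to_remove = top / subfolder
    subfolder_rename = subfolder.name.replace("/", "_").replace("\\", "_")
    path_to_add = PurePosixPath(f"{top}_{subfolder_rename}")
    return path_to_remove, path_to_add


def plus_side(top: PurePosixPath, subfolder: PurePosixPath):
    """Mirror of the '+' lines: remove only the last component, rename by the whole subfolder."""
    path_to_remove = top / subfolder.parts[-1]
    path_to_add = PurePosixPath(f"{top}_{subfolder}")
    return path_to_remove, path_to_add


top = PurePosixPath("sdxl-turbo")
single = PurePosixPath("vae")         # the case described in the code comment
nested = PurePosixPath("models/vae")  # hypothetical nested subfolder

single_minus = minus_side(top, single)
single_plus = plus_side(top, single)
nested_minus = minus_side(top, nested)
nested_plus = plus_side(top, nested)

for label, result in [
    ("single, '-' side", single_minus),
    ("single, '+' side", single_plus),
    ("nested, '-' side", nested_minus),
    ("nested, '+' side", nested_plus),
]:
    print(f"{label}: remove={result[0]} add={result[1]}")
```

For the nested input, the `+` side yields a `path_to_add` that still contains a path separator, which is worth keeping in mind when reviewing this change.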
--- a/invokeai/app/services/model_records/model_records_base.py
+++ b/invokeai/app/services/model_records/model_records_base.py
@ -77,7 +77,6 @@ class ModelRecordChanges(BaseModelExcludeNull):
    type: Optional[ModelType] = Field(description="Type of model", default=None)
    key: Optional[str] = Field(description="Database ID for this model", default=None)
    hash: Optional[str] = Field(description="hash of model file", default=None)
-    format: Optional[str] = Field(description="format of model file", default=None)
    trigger_phrases: Optional[set[str]] = Field(description="Set of trigger phrases for this model", default=None)
    default_settings: Optional[MainModelDefaultSettings | ControlAdapterDefaultSettings] = Field(
        description="Default settings for this model", default=None
--- a/invokeai/app/services/session_queue/session_queue_base.py
+++ b/invokeai/app/services/session_queue/session_queue_base.py
@ -6,7 +6,6 @@ from invokeai.app.services.session_queue.session_queue_common import (
    Batch,
    BatchStatus,
    CancelByBatchIDsResult,
-    CancelByOriginResult,
    CancelByQueueIDResult,
    ClearResult,
    EnqueueBatchResult,
@ -96,11 +95,6 @@ class SessionQueueBase(ABC):
        """Cancels all queue items with matching batch IDs"""
        pass

-    @abstractmethod
-    def cancel_by_origin(self, queue_id: str, origin: str) -> CancelByOriginResult:
-        """Cancels all queue items with the given batch origin"""
-        pass
-
    @abstractmethod
    def cancel_by_queue_id(self, queue_id: str) -> CancelByQueueIDResult:
        """Cancels all queue items with matching queue ID"""
--- a/invokeai/app/services/session_queue/session_queue_common.py
+++ b/invokeai/app/services/session_queue/session_queue_common.py
@ -77,7 +77,6 @@ BatchDataCollection: TypeAlias = list[list[BatchDatum]]

 class Batch(BaseModel):
    batch_id: str = Field(default_factory=uuid_string, description="The ID of the batch")
-    origin: str | None = Field(default=None, description="The origin of this batch.")
    data: Optional[BatchDataCollection] = Field(default=None, description="The batch data collection.")
    graph: Graph = Field(description="The graph to initialize the session with")
    workflow: Optional[WorkflowWithoutID] = Field(
@ -196,7 +195,6 @@ class SessionQueueItemWithoutGraph(BaseModel):
    status: QUEUE_ITEM_STATUS = Field(default="pending", description="The status of this queue item")
    priority: int = Field(default=0, description="The priority of this queue item")
    batch_id: str = Field(description="The ID of the batch associated with this queue item")
-    origin: str | None = Field(default=None, description="The origin of this queue item. ")
    session_id: str = Field(
        description="The ID of the session associated with this queue item. The session doesn't exist in graph_executions until the queue item is executed."
    )
@ -296,7 +294,6 @@ class SessionQueueStatus(BaseModel):
 class BatchStatus(BaseModel):
    queue_id: str = Field(..., description="The ID of the queue")
    batch_id: str = Field(..., description="The ID of the batch")
-    origin: str | None = Field(..., description="The origin of the batch")
    pending: int = Field(..., description="Number of queue items with status 'pending'")
    in_progress: int = Field(..., description="Number of queue items with status 'in_progress'")
    completed: int = Field(..., description="Number of queue items with status 'complete'")
@ -331,12 +328,6 @@ class CancelByBatchIDsResult(BaseModel):
    canceled: int = Field(..., description="Number of queue items canceled")


-class CancelByOriginResult(BaseModel):
-    """Result of canceling by list of batch ids"""
-
-    canceled: int = Field(..., description="Number of queue items canceled")
-
-
 class CancelByQueueIDResult(CancelByBatchIDsResult):
    """Result of canceling by queue id"""

@ -442,7 +433,6 @@ class SessionQueueValueToInsert(NamedTuple):
    field_values: Optional[str]  # field_values json
    priority: int  # priority
    workflow: Optional[str]  # workflow json
-    origin: str | None


 ValuesToInsert: TypeAlias = list[SessionQueueValueToInsert]
@ -463,7 +453,6 @@ def prepare_values_to_insert(queue_id: str, batch: Batch, priority: int, max_new
                json.dumps(field_values, default=to_jsonable_python) if field_values else None,  # field_values (json)
                priority,  # priority
                json.dumps(workflow, default=to_jsonable_python) if workflow else None,  # workflow (json)
-                batch.origin,  # origin
            )
        )
    return values_to_insert
--- a/invokeai/app/services/session_queue/session_queue_sqlite.py
+++ b/invokeai/app/services/session_queue/session_queue_sqlite.py
@ -10,7 +10,6 @@ from invokeai.app.services.session_queue.session_queue_common import (
    Batch,
    BatchStatus,
    CancelByBatchIDsResult,
-    CancelByOriginResult,
    CancelByQueueIDResult,
    ClearResult,
    EnqueueBatchResult,
@ -128,8 +127,8 @@ class SqliteSessionQueue(SessionQueueBase):

            self.__cursor.executemany(
                """--sql
-                INSERT INTO session_queue (queue_id, session, session_id, batch_id, field_values, priority, workflow, origin)
-                VALUES (?, ?, ?, ?, ?, ?, ?, ?)
+                INSERT INTO session_queue (queue_id, session, session_id, batch_id, field_values, priority, workflow)
+                VALUES (?, ?, ?, ?, ?, ?, ?)
                """,
                values_to_insert,
            )
@ -418,7 +417,11 @@ class SqliteSessionQueue(SessionQueueBase):
            )
            self.__conn.commit()
            if current_queue_item is not None and current_queue_item.batch_id in batch_ids:
-                self._set_queue_item_status(current_queue_item.item_id, "canceled")
+                batch_status = self.get_batch_status(queue_id=queue_id, batch_id=current_queue_item.batch_id)
+                queue_status = self.get_queue_status(queue_id=queue_id)
+                self.__invoker.services.events.emit_queue_item_status_changed(
+                    current_queue_item, batch_status, queue_status
+                )
        except Exception:
            self.__conn.rollback()
            raise
@ -426,46 +429,6 @@ class SqliteSessionQueue(SessionQueueBase):
            self.__lock.release()
        return CancelByBatchIDsResult(canceled=count)

-    def cancel_by_origin(self, queue_id: str, origin: str) -> CancelByOriginResult:
-        try:
-            current_queue_item = self.get_current(queue_id)
-            self.__lock.acquire()
-            where = """--sql
-                WHERE
-                  queue_id == ?
-                  AND origin == ?
-                  AND status != 'canceled'
-                  AND status != 'completed'
-                  AND status != 'failed'
-                """
-            params = (queue_id, origin)
-            self.__cursor.execute(
-                f"""--sql
-                SELECT COUNT(*)
-                FROM session_queue
-                {where};
-                """,
-                params,
-            )
-            count = self.__cursor.fetchone()[0]
-            self.__cursor.execute(
-                f"""--sql
-                UPDATE session_queue
-                SET status = 'canceled'
-                {where};
-                """,
-                params,
-            )
-            self.__conn.commit()
-            if current_queue_item is not None and current_queue_item.origin == origin:
-                self._set_queue_item_status(current_queue_item.item_id, "canceled")
-        except Exception:
-            self.__conn.rollback()
-            raise
-        finally:
-            self.__lock.release()
-        return CancelByOriginResult(canceled=count)
-
    def cancel_by_queue_id(self, queue_id: str) -> CancelByQueueIDResult:
        try:
            current_queue_item = self.get_current(queue_id)
@ -578,8 +541,7 @@ class SqliteSessionQueue(SessionQueueBase):
                    started_at,
                    session_id,
                    batch_id,
-                    queue_id,
-                    origin
+                    queue_id
                FROM session_queue
                WHERE queue_id = ?
            """
@ -659,7 +621,7 @@ class SqliteSessionQueue(SessionQueueBase):
            self.__lock.acquire()
            self.__cursor.execute(
                """--sql
-                SELECT status, count(*), origin
+                SELECT status, count(*)
                FROM session_queue
                WHERE
                  queue_id = ?
@ -671,7 +633,6 @@ class SqliteSessionQueue(SessionQueueBase):
            result = cast(list[sqlite3.Row], self.__cursor.fetchall())
            total = sum(row[1] for row in result)
            counts: dict[str, int] = {row[0]: row[1] for row in result}
-            origin = result[0]["origin"] if result else None
        except Exception:
            self.__conn.rollback()
            raise
@ -680,7 +641,6 @@ class SqliteSessionQueue(SessionQueueBase):

        return BatchStatus(
            batch_id=batch_id,
-            origin=origin,
            queue_id=queue_id,
            pending=counts.get("pending", 0),
            in_progress=counts.get("in_progress", 0),
--- a/invokeai/app/services/shared/sqlite/sqlite_util.py
+++ b/invokeai/app/services/shared/sqlite/sqlite_util.py
@ -17,7 +17,6 @@ from invokeai.app.services.shared.sqlite_migrator.migrations.migration_11 import
 from invokeai.app.services.shared.sqlite_migrator.migrations.migration_12 import build_migration_12
 from invokeai.app.services.shared.sqlite_migrator.migrations.migration_13 import build_migration_13
 from invokeai.app.services.shared.sqlite_migrator.migrations.migration_14 import build_migration_14
-from invokeai.app.services.shared.sqlite_migrator.migrations.migration_15 import build_migration_15
 from invokeai.app.services.shared.sqlite_migrator.sqlite_migrator_impl import SqliteMigrator


@ -52,7 +51,6 @@ def init_db(config: InvokeAIAppConfig, logger: Logger, image_files: ImageFileSto
    migrator.register_migration(build_migration_12(app_config=config))
    migrator.register_migration(build_migration_13())
    migrator.register_migration(build_migration_14())
-    migrator.register_migration(build_migration_15())
    migrator.run_migrations()

    return db
--- a/invokeai/app/services/shared/sqlite_migrator/migrations/migration_15.py
+++ b/invokeai/app/services/shared/sqlite_migrator/migrations/migration_15.py
@ -1,31 +0,0 @@
-import sqlite3
-
-from invokeai.app.services.shared.sqlite_migrator.sqlite_migrator_common import Migration
-
-
-class Migration15Callback:
-    def __call__(self, cursor: sqlite3.Cursor) -> None:
-        self._add_origin_col(cursor)
-
-    def _add_origin_col(self, cursor: sqlite3.Cursor) -> None:
-        """
-        - Adds `origin` column to the session queue table.
-        """
-
-        cursor.execute("ALTER TABLE session_queue ADD COLUMN origin TEXT;")
-
-
-def build_migration_15() -> Migration:
-    """
-    Build the migration from database version 14 to 15.
-
-    This migration does the following:
-        - Adds `origin` column to the session queue table.
-    """
-    migration_15 = Migration(
-        from_version=14,
-        to_version=15,
-        callback=Migration15Callback(),
-    )
-
-    return migration_15
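Note on the hunk above: `Migration15Callback` runs a single `ALTER TABLE ... ADD COLUMN` statement. A minimal sketch of what that statement does, using an in-memory SQLite database, might look like the following. The table definition here is a heavily simplified, hypothetical stand-in for the real `session_queue` schema, and the helper `column_names` is likewise an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified stand-in for the real session_queue table.
cur.execute(
    "CREATE TABLE session_queue ("
    "item_id INTEGER PRIMARY KEY, queue_id TEXT, batch_id TEXT);"
)
cur.execute(
    "INSERT INTO session_queue (queue_id, batch_id) VALUES ('default', 'batch-1');"
)


def column_names(cursor: sqlite3.Cursor) -> list[str]:
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in cursor.execute("PRAGMA table_info(session_queue);")]


before = column_names(cur)

# The statement taken from the migration callback in the diff.
cur.execute("ALTER TABLE session_queue ADD COLUMN origin TEXT;")

after = column_names(cur)

# Rows that existed before the migration get NULL in the new column.
existing_origin = cur.execute("SELECT origin FROM session_queue;").fetchone()[0]

conn.close()
```

Because the column is added without a default, pre-existing rows read back `None` for `origin`, which matches the `str | None` typing used for the field elsewhere in this diff.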
--- a/invokeai/app/services/style_preset_records/style_preset_records_common.py
+++ b/invokeai/app/services/style_preset_records/style_preset_records_common.py
@ -32,7 +32,6 @@ class PresetType(str, Enum, metaclass=MetaEnum):
 class StylePresetChanges(BaseModel, extra="forbid"):
    name: Optional[str] = Field(default=None, description="The style preset's new name.")
    preset_data: Optional[PresetData] = Field(default=None, description="The updated data for style preset.")
-    type: Optional[PresetType] = Field(description="The updated type of the style preset")


 class StylePresetWithoutId(BaseModel):
--- a/invokeai/app/services/workflow_records/default_workflows/Flux
+++ b/invokeai/app/services/workflow_records/default_workflows/Flux
@ -1,266 +0,0 @@
-{
-  "name": "FLUX Text to Image",
-  "author": "InvokeAI",
-  "description": "A simple text-to-image workflow using FLUX dev or schnell models. Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend 4 steps for FLUX schnell models and 30 steps for FLUX dev models.",
-  "version": "1.0.0",
-  "contact": "",
-  "tags": "text2image, flux",
-  "notes": "Prerequisite model downloads: T5 Encoder, CLIP-L Encoder, and FLUX VAE. Quantized and un-quantized versions can be found in the starter models tab within your Model Manager. We recommend 4 steps for FLUX schnell models and 30 steps for FLUX dev models.",
-  "exposedFields": [
-    {
-      "nodeId": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "fieldName": "model"
-    },
-    {
-      "nodeId": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "fieldName": "prompt"
-    },
-    {
-      "nodeId": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-      "fieldName": "num_steps"
-    },
-    {
-      "nodeId": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "fieldName": "t5_encoder"
-    }
-  ],
-  "meta": {
-    "version": "3.0.0",
-    "category": "default"
-  },
-  "nodes": [
-    {
-      "id": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "type": "invocation",
-      "data": {
-        "id": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-        "type": "flux_model_loader",
-        "version": "1.0.3",
-        "label": "",
-        "notes": "",
-        "isOpen": true,
-        "isIntermediate": true,
-        "useCache": false,
-        "inputs": {
-          "model": {
-            "name": "model",
-            "label": "Model (Starter Models can be found in Model Manager)",
-            "value": {
-              "key": "f04a7a2f-c74d-4538-8d5e-879a53501662",
-              "hash": "random:4875da7a9508444ffa706f61961c260d0c6729f6181a86b31fad06df1277b850",
-              "name": "FLUX Dev (Quantized)",
-              "base": "flux",
-              "type": "main"
-            }
-          },
-          "t5_encoder": {
-            "name": "t5_encoder",
-            "label": "T 5 Encoder (Starter Models can be found in Model Manager)",
-            "value": {
-              "key": "20dcd9ec-5fbb-4012-8401-049e707da5e5",
-              "hash": "random:f986be43ff3502169e4adbdcee158afb0e0a65a1edc4cab16ae59963630cfd8f",
-              "name": "t5_bnb_int8_quantized_encoder",
-              "base": "any",
-              "type": "t5_encoder"
-            }
-          }
-        }
-      },
-      "position": {
-        "x": 337.09365228062825,
-        "y": 40.63469521079861
-      }
-    },
-    {
-      "id": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "type": "invocation",
-      "data": {
-        "id": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-        "type": "flux_text_encoder",
-        "version": "1.0.0",
-        "label": "",
-        "notes": "",
-        "isOpen": true,
-        "isIntermediate": true,
-        "useCache": true,
-        "inputs": {
-          "clip": {
-            "name": "clip",
-            "label": ""
-          },
-          "t5_encoder": {
-            "name": "t5_encoder",
-            "label": ""
-          },
-          "t5_max_seq_len": {
-            "name": "t5_max_seq_len",
-            "label": "T5 Max Seq Len",
-            "value": 256
-          },
-          "prompt": {
-            "name": "prompt",
-            "label": "",
-            "value": "a cat"
-          }
-        }
-      },
-      "position": {
-        "x": 824.1970602278849,
-        "y": 146.98251001061735
-      }
-    },
-    {
-      "id": "4754c534-a5f3-4ad0-9382-7887985e668c",
-      "type": "invocation",
-      "data": {
-        "id": "4754c534-a5f3-4ad0-9382-7887985e668c",
-        "type": "rand_int",
-        "version": "1.0.1",
-        "label": "",
-        "notes": "",
-        "isOpen": true,
-        "isIntermediate": true,
-        "useCache": false,
-        "inputs": {
-          "low": {
-            "name": "low",
-            "label": "",
-            "value": 0
-          },
-          "high": {
-            "name": "high",
-            "label": "",
-            "value": 2147483647
-          }
-        }
-      },
-      "position": {
-        "x": 822.9899179655476,
-        "y": 360.9657214885052
-      }
-    },
-    {
-      "id": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-      "type": "invocation",
-      "data": {
-        "id": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-        "type": "flux_text_to_image",
-        "version": "1.0.0",
-        "label": "",
-        "notes": "",
-        "isOpen": true,
-        "isIntermediate": false,
-        "useCache": true,
-        "inputs": {
-          "board": {
-            "name": "board",
-            "label": ""
-          },
-          "metadata": {
-            "name": "metadata",
-            "label": ""
-          },
-          "transformer": {
-            "name": "transformer",
-            "label": ""
-          },
-          "vae": {
-            "name": "vae",
-            "label": ""
-          },
-          "positive_text_conditioning": {
-            "name": "positive_text_conditioning",
-            "label": ""
-          },
-          "width": {
-            "name": "width",
-            "label": "",
-            "value": 1024
-          },
-          "height": {
-            "name": "height",
-            "label": "",
-            "value": 1024
-          },
-          "num_steps": {
-            "name": "num_steps",
-            "label": "Steps (Recommend 30 for Dev, 4 for Schnell)",
-            "value": 30
-          },
-          "guidance": {
-            "name": "guidance",
-            "label": "",
-            "value": 4
-          },
-          "seed": {
-            "name": "seed",
-            "label": "",
-            "value": 0
-          }
-        }
-      },
-      "position": {
-        "x": 1216.3900791301849,
-        "y": 5.500841807102248
-      }
-    }
-  ],
-  "edges": [
-    {
-      "id": "reactflow__edge-4f0207c2-ff40-41fd-b047-ad33fbb1c33amax_seq_len-01f674f8-b3d1-4df1-acac-6cb8e0bfb63ct5_max_seq_len",
-      "type": "default",
-      "source": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "sourceHandle": "max_seq_len",
-      "targetHandle": "t5_max_seq_len"
-    },
-    {
-      "id": "reactflow__edge-4f0207c2-ff40-41fd-b047-ad33fbb1c33avae-159bdf1b-79e7-4174-b86e-d40e646964c8vae",
-      "type": "default",
-      "source": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-      "sourceHandle": "vae",
-      "targetHandle": "vae"
-    },
-    {
-      "id": "reactflow__edge-4f0207c2-ff40-41fd-b047-ad33fbb1c33atransformer-159bdf1b-79e7-4174-b86e-d40e646964c8transformer",
-      "type": "default",
-      "source": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-      "sourceHandle": "transformer",
-      "targetHandle": "transformer"
-    },
-    {
-      "id": "reactflow__edge-4f0207c2-ff40-41fd-b047-ad33fbb1c33at5_encoder-01f674f8-b3d1-4df1-acac-6cb8e0bfb63ct5_encoder",
-      "type": "default",
-      "source": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "sourceHandle": "t5_encoder",
-      "targetHandle": "t5_encoder"
-    },
-    {
-      "id": "reactflow__edge-4f0207c2-ff40-41fd-b047-ad33fbb1c33aclip-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cclip",
-      "type": "default",
-      "source": "4f0207c2-ff40-41fd-b047-ad33fbb1c33a",
-      "target": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "sourceHandle": "clip",
-      "targetHandle": "clip"
-    },
-    {
-      "id": "reactflow__edge-01f674f8-b3d1-4df1-acac-6cb8e0bfb63cconditioning-159bdf1b-79e7-4174-b86e-d40e646964c8positive_text_conditioning",
-      "type": "default",
-      "source": "01f674f8-b3d1-4df1-acac-6cb8e0bfb63c",
-      "target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-      "sourceHandle": "conditioning",
-      "targetHandle": "positive_text_conditioning"
-    },
-    {
-      "id": "reactflow__edge-4754c534-a5f3-4ad0-9382-7887985e668cvalue-159bdf1b-79e7-4174-b86e-d40e646964c8seed",
-      "type": "default",
-      "source": "4754c534-a5f3-4ad0-9382-7887985e668c",
-      "target": "159bdf1b-79e7-4174-b86e-d40e646964c8",
-      "sourceHandle": "value",
-      "targetHandle": "seed"
-    }
-  ]
-}
--- a/invokeai/backend/assets/sd_base_conf_files/controlnet_sd15/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/controlnet_sd15/config.json
@ -0,0 +1,42 @@
+{
+  "_class_name": "ControlNetModel",
+  "_diffusers_version": "0.16.0.dev0",
+  "_name_or_path": "/home/patrick/controlnet_v1_1/control_v11p_sd15_canny",
+  "act_fn": "silu",
+  "attention_head_dim": 8,
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "class_embed_type": null,
+  "conditioning_embedding_out_channels": [
+    16,
+    32,
+    96,
+    256
+  ],
+  "controlnet_conditioning_channel_order": "rgb",
+  "cross_attention_dim": 768,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_scale_factor": 1,
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "projection_class_embeddings_input_dim": null,
+  "resnet_time_scale_shift": "default",
+  "upcast_attention": false,
+  "use_linear_projection": false
+}
--- a/invokeai/backend/assets/sd_base_conf_files/controlnet_sdxl/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/controlnet_sdxl/config.json
@ -0,0 +1,56 @@
+{
+  "_class_name": "ControlNetModel",
+  "_diffusers_version": "0.19.3",
+  "act_fn": "silu",
+  "addition_embed_type": "text_time",
+  "addition_embed_type_num_heads": 64,
+  "addition_time_embed_dim": 256,
+  "attention_head_dim": [
+    5,
+    10,
+    20
+  ],
+  "block_out_channels": [
+    320,
+    640,
+    1280
+  ],
+  "class_embed_type": null,
+  "conditioning_channels": 3,
+  "conditioning_embedding_out_channels": [
+    16,
+    32,
+    96,
+    256
+  ],
+  "controlnet_conditioning_channel_order": "rgb",
+  "cross_attention_dim": 2048,
+  "down_block_types": [
+    "DownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "global_pool_conditions": false,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_scale_factor": 1,
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "projection_class_embeddings_input_dim": 2816,
+  "resnet_time_scale_shift": "default",
+  "transformer_layers_per_block": [
+    1,
+    2,
+    10
+  ],
+  "upcast_attention": null,
+  "use_linear_projection": true
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/feature_extractor/preprocessor_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/feature_extractor/preprocessor_config.json
@ -0,0 +1,20 @@
+{
+  "crop_size": 224,
+  "do_center_crop": true,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "CLIPFeatureExtractor",
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "resample": 3,
+  "size": 224
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/model_index.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/model_index.json
@ -0,0 +1,32 @@
+{
+  "_class_name": "StableDiffusionPipeline",
+  "_diffusers_version": "0.6.0",
+  "feature_extractor": [
+    "transformers",
+    "CLIPImageProcessor"
+  ],
+  "safety_checker": [
+    "stable_diffusion",
+    "StableDiffusionSafetyChecker"
+  ],
+  "scheduler": [
+    "diffusers",
+    "PNDMScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/safety_checker/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/safety_checker/config.json
@ -0,0 +1,175 @@
+{
+  "_commit_hash": "4bb648a606ef040e7685bde262611766a5fdd67b",
+  "_name_or_path": "CompVis/stable-diffusion-safety-checker",
+  "architectures": [
+    "StableDiffusionSafetyChecker"
+  ],
+  "initializer_factor": 1.0,
+  "logit_scale_init_value": 2.6592,
+  "model_type": "clip",
+  "projection_dim": 768,
+  "text_config": {
+    "_name_or_path": "",
+    "add_cross_attention": false,
+    "architectures": null,
+    "attention_dropout": 0.0,
+    "bad_words_ids": null,
+    "bos_token_id": 0,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.0,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "quick_gelu",
+    "hidden_size": 768,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "initializer_factor": 1.0,
+    "initializer_range": 0.02,
+    "intermediate_size": 3072,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-05,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 77,
+    "min_length": 0,
+    "model_type": "clip_text_model",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 12,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_hidden_layers": 12,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 1,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": null,
+    "torchscript": false,
+    "transformers_version": "4.22.0.dev0",
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "vocab_size": 49408
+  },
+  "text_config_dict": {
+    "hidden_size": 768,
+    "intermediate_size": 3072,
+    "num_attention_heads": 12,
+    "num_hidden_layers": 12
+  },
+  "torch_dtype": "float32",
+  "transformers_version": null,
+  "vision_config": {
+    "_name_or_path": "",
+    "add_cross_attention": false,
+    "architectures": null,
+    "attention_dropout": 0.0,
+    "bad_words_ids": null,
+    "bos_token_id": null,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.0,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": null,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "quick_gelu",
+    "hidden_size": 1024,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "image_size": 224,
+    "initializer_factor": 1.0,
+    "initializer_range": 0.02,
+    "intermediate_size": 4096,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-05,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "clip_vision_model",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 16,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_channels": 3,
+    "num_hidden_layers": 24,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": null,
+    "patch_size": 14,
+    "prefix": null,
+    "problem_type": null,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": null,
+    "torchscript": false,
+    "transformers_version": "4.22.0.dev0",
+    "typical_p": 1.0,
+    "use_bfloat16": false
+  },
+  "vision_config_dict": {
+    "hidden_size": 1024,
+    "intermediate_size": 4096,
+    "num_attention_heads": 16,
+    "num_hidden_layers": 24,
+    "patch_size": 14
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/scheduler/scheduler_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/scheduler/scheduler_config.json
@ -0,0 +1,13 @@
+{
+  "_class_name": "PNDMScheduler",
+  "_diffusers_version": "0.6.0",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "num_train_timesteps": 1000,
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "trained_betas": null,
+  "clip_sample": false
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/text_encoder/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/text_encoder/config.json
@ -0,0 +1,25 @@
+{
+  "_name_or_path": "openai/clip-vit-large-patch14",
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "quick_gelu",
+  "hidden_size": 768,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "projection_dim": 768,
+  "torch_dtype": "float32",
+  "transformers_version": "4.22.0.dev0",
+  "vocab_size": 49408
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/merges.txt
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/merges.txt
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/special_tokens_map.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/special_tokens_map.json
@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/tokenizer_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/tokenizer_config.json
@ -0,0 +1,34 @@
+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "name_or_path": "openai/clip-vit-large-patch14",
+  "pad_token": "<|endoftext|>",
+  "special_tokens_map_file": "./special_tokens_map.json",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/vocab.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/tokenizer/vocab.json
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/unet/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/unet/config.json
@ -0,0 +1,36 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.6.0",
+  "act_fn": "silu",
+  "attention_head_dim": 8,
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "cross_attention_dim": 768,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_scale_factor": 1,
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "out_channels": 4,
+  "sample_size": 64,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/vae/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-epsilon/vae/config.json
@ -0,0 +1,29 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.6.0",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 512,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/feature_extractor/preprocessor_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/feature_extractor/preprocessor_config.json
@ -0,0 +1,28 @@
+{
+  "crop_size": {
+    "height": 224,
+    "width": 224
+  },
+  "do_center_crop": true,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_rescale": true,
+  "do_resize": true,
+  "feature_extractor_type": "CLIPFeatureExtractor",
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_processor_type": "CLIPFeatureExtractor",
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "resample": 3,
+  "rescale_factor": 0.00392156862745098,
+  "size": {
+    "shortest_edge": 224
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/model_index.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/model_index.json
@ -0,0 +1,33 @@
+{
+  "_class_name": "StableDiffusionPipeline",
+  "_diffusers_version": "0.18.0.dev0",
+  "feature_extractor": [
+    "transformers",
+    "CLIPFeatureExtractor"
+  ],
+  "requires_safety_checker": true,
+  "safety_checker": [
+    "stable_diffusion",
+    "StableDiffusionSafetyChecker"
+  ],
+  "scheduler": [
+    "diffusers",
+    "DPMSolverMultistepScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/safety_checker/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/safety_checker/config.json
@ -0,0 +1,168 @@
+{
+  "_commit_hash": "cb41f3a270d63d454d385fc2e4f571c487c253c5",
+  "_name_or_path": "CompVis/stable-diffusion-safety-checker",
+  "architectures": [
+    "StableDiffusionSafetyChecker"
+  ],
+  "initializer_factor": 1.0,
+  "logit_scale_init_value": 2.6592,
+  "model_type": "clip",
+  "projection_dim": 768,
+  "text_config": {
+    "_name_or_path": "",
+    "add_cross_attention": false,
+    "architectures": null,
+    "attention_dropout": 0.0,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": 0,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.0,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": 2,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "quick_gelu",
+    "hidden_size": 768,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "initializer_factor": 1.0,
+    "initializer_range": 0.02,
+    "intermediate_size": 3072,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-05,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "max_position_embeddings": 77,
+    "min_length": 0,
+    "model_type": "clip_text_model",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 12,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_hidden_layers": 12,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": 1,
+    "prefix": null,
+    "problem_type": null,
+    "projection_dim": 512,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": null,
+    "torchscript": false,
+    "transformers_version": "4.30.2",
+    "typical_p": 1.0,
+    "use_bfloat16": false,
+    "vocab_size": 49408
+  },
+  "torch_dtype": "float16",
+  "transformers_version": null,
+  "vision_config": {
+    "_name_or_path": "",
+    "add_cross_attention": false,
+    "architectures": null,
+    "attention_dropout": 0.0,
+    "bad_words_ids": null,
+    "begin_suppress_tokens": null,
+    "bos_token_id": null,
+    "chunk_size_feed_forward": 0,
+    "cross_attention_hidden_size": null,
+    "decoder_start_token_id": null,
+    "diversity_penalty": 0.0,
+    "do_sample": false,
+    "dropout": 0.0,
+    "early_stopping": false,
+    "encoder_no_repeat_ngram_size": 0,
+    "eos_token_id": null,
+    "exponential_decay_length_penalty": null,
+    "finetuning_task": null,
+    "forced_bos_token_id": null,
+    "forced_eos_token_id": null,
+    "hidden_act": "quick_gelu",
+    "hidden_size": 1024,
+    "id2label": {
+      "0": "LABEL_0",
+      "1": "LABEL_1"
+    },
+    "image_size": 224,
+    "initializer_factor": 1.0,
+    "initializer_range": 0.02,
+    "intermediate_size": 4096,
+    "is_decoder": false,
+    "is_encoder_decoder": false,
+    "label2id": {
+      "LABEL_0": 0,
+      "LABEL_1": 1
+    },
+    "layer_norm_eps": 1e-05,
+    "length_penalty": 1.0,
+    "max_length": 20,
+    "min_length": 0,
+    "model_type": "clip_vision_model",
+    "no_repeat_ngram_size": 0,
+    "num_attention_heads": 16,
+    "num_beam_groups": 1,
+    "num_beams": 1,
+    "num_channels": 3,
+    "num_hidden_layers": 24,
+    "num_return_sequences": 1,
+    "output_attentions": false,
+    "output_hidden_states": false,
+    "output_scores": false,
+    "pad_token_id": null,
+    "patch_size": 14,
+    "prefix": null,
+    "problem_type": null,
+    "projection_dim": 512,
+    "pruned_heads": {},
+    "remove_invalid_values": false,
+    "repetition_penalty": 1.0,
+    "return_dict": true,
+    "return_dict_in_generate": false,
+    "sep_token_id": null,
+    "suppress_tokens": null,
+    "task_specific_params": null,
+    "temperature": 1.0,
+    "tf_legacy_loss": false,
+    "tie_encoder_decoder": false,
+    "tie_word_embeddings": true,
+    "tokenizer_class": null,
+    "top_k": 50,
+    "top_p": 1.0,
+    "torch_dtype": null,
+    "torchscript": false,
+    "transformers_version": "4.30.2",
+    "typical_p": 1.0,
+    "use_bfloat16": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/scheduler/scheduler_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/scheduler/scheduler_config.json
@ -0,0 +1,26 @@
+{
+  "_class_name": "DPMSolverMultistepScheduler",
+  "_diffusers_version": "0.18.0.dev0",
+  "algorithm_type": "dpmsolver++",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "clip_sample_range": 1.0,
+  "dynamic_thresholding_ratio": 0.995,
+  "lambda_min_clipped": -Infinity,
+  "lower_order_final": true,
+  "num_train_timesteps": 1000,
+  "prediction_type": "v_prediction",
+  "rescale_betas_zero_snr": false,
+  "sample_max_value": 1.0,
+  "set_alpha_to_one": false,
+  "solver_order": 2,
+  "solver_type": "midpoint",
+  "steps_offset": 1,
+  "thresholding": false,
+  "timestep_spacing": "leading",
+  "trained_betas": null,
+  "use_karras_sigmas": false,
+  "variance_type": null
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/text_encoder/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/text_encoder/config.json
@ -0,0 +1,25 @@
+{
+  "_name_or_path": "openai/clip-vit-large-patch14",
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "quick_gelu",
+  "hidden_size": 768,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "projection_dim": 768,
+  "torch_dtype": "float16",
+  "transformers_version": "4.30.2",
+  "vocab_size": 49408
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/merges.txt
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/merges.txt
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/special_tokens_map.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/special_tokens_map.json
@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/tokenizer_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/tokenizer_config.json
@ -0,0 +1,33 @@
+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/vocab.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/tokenizer/vocab.json
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/unet/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/unet/config.json
@ -0,0 +1,62 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.18.0.dev0",
+  "act_fn": "silu",
+  "addition_embed_type": null,
+  "addition_embed_type_num_heads": 64,
+  "attention_head_dim": 8,
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 768,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": null,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 96,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D"
+  ],
+  "upcast_attention": null,
+  "use_linear_projection": false
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/vae/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-1.5-v_prediction/vae/config.json
@ -0,0 +1,30 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.18.0.dev0",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 768,
+  "scaling_factor": 0.18215,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/feature_extractor/preprocessor_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/feature_extractor/preprocessor_config.json
@ -0,0 +1,20 @@
+{
+  "crop_size": 224,
+  "do_center_crop": true,
+  "do_convert_rgb": true,
+  "do_normalize": true,
+  "do_resize": true,
+  "feature_extractor_type": "CLIPFeatureExtractor",
+  "image_mean": [
+    0.48145466,
+    0.4578275,
+    0.40821073
+  ],
+  "image_std": [
+    0.26862954,
+    0.26130258,
+    0.27577711
+  ],
+  "resample": 3,
+  "size": 224
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/model_index.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/model_index.json
@ -0,0 +1,33 @@
+{
+  "_class_name": "StableDiffusionPipeline",
+  "_diffusers_version": "0.8.0",
+  "feature_extractor": [
+    "transformers",
+    "CLIPImageProcessor"
+  ],
+  "requires_safety_checker": false,
+  "safety_checker": [
+    null,
+    null
+  ],
+  "scheduler": [
+    "diffusers",
+    "DDIMScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/scheduler/scheduler_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/scheduler/scheduler_config.json
@ -0,0 +1,14 @@
+{
+  "_class_name": "DDIMScheduler",
+  "_diffusers_version": "0.8.0",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "num_train_timesteps": 1000,
+  "prediction_type": "v_prediction",
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "trained_betas": null
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/text_encoder/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/text_encoder/config.json
@ -0,0 +1,25 @@
+{
+  "_name_or_path": "hf-models/stable-diffusion-v2-768x768/text_encoder",
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_size": 1024,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 23,
+  "pad_token_id": 1,
+  "projection_dim": 512,
+  "torch_dtype": "float32",
+  "transformers_version": "4.25.0.dev0",
+  "vocab_size": 49408
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/merges.txt
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/merges.txt
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/special_tokens_map.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/special_tokens_map.json
@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "!",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/tokenizer_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/tokenizer_config.json
@ -0,0 +1,34 @@
+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "name_or_path": "hf-models/stable-diffusion-v2-768x768/tokenizer",
+  "pad_token": "<|endoftext|>",
+  "special_tokens_map_file": "./special_tokens_map.json",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/vocab.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/tokenizer/vocab.json
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/unet/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/unet/config.json
@ -0,0 +1,46 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.10.0.dev0",
+  "act_fn": "silu",
+  "attention_head_dim": [
+    5,
+    10,
+    20,
+    20
+  ],
+  "block_out_channels": [
+    320,
+    640,
+    1280,
+    1280
+  ],
+  "center_input_sample": false,
+  "cross_attention_dim": 1024,
+  "down_block_types": [
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_scale_factor": 1,
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "sample_size": 96,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D"
+  ],
+  "use_linear_projection": true,
+  "upcast_attention": true
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/vae/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-2.0-v_prediction/vae/config.json
@ -0,0 +1,30 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.8.0",
+  "_name_or_path": "hf-models/stable-diffusion-v2-768x768/vae",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 768,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/model_index.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/model_index.json
@ -0,0 +1,34 @@
+{
+  "_class_name": "StableDiffusionXLPipeline",
+  "_diffusers_version": "0.19.0.dev0",
+  "force_zeros_for_empty_prompt": true,
+  "add_watermarker": null,
+  "scheduler": [
+    "diffusers",
+    "EulerDiscreteScheduler"
+  ],
+  "text_encoder": [
+    "transformers",
+    "CLIPTextModel"
+  ],
+  "text_encoder_2": [
+    "transformers",
+    "CLIPTextModelWithProjection"
+  ],
+  "tokenizer": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "tokenizer_2": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/scheduler/scheduler_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/scheduler/scheduler_config.json
@ -0,0 +1,18 @@
+{
+  "_class_name": "EulerDiscreteScheduler",
+  "_diffusers_version": "0.19.0.dev0",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "interpolation_type": "linear",
+  "num_train_timesteps": 1000,
+  "prediction_type": "epsilon",
+  "sample_max_value": 1.0,
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "timestep_spacing": "leading",
+  "trained_betas": null,
+  "use_karras_sigmas": false
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/text_encoder/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/text_encoder/config.json
@ -0,0 +1,24 @@
+{
+  "architectures": [
+    "CLIPTextModel"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "quick_gelu",
+  "hidden_size": 768,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "projection_dim": 768,
+  "torch_dtype": "float16",
+  "transformers_version": "4.32.0.dev0",
+  "vocab_size": 49408
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/text_encoder_2/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/text_encoder_2/config.json
@ -0,0 +1,24 @@
+{
+  "architectures": [
+    "CLIPTextModelWithProjection"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_size": 1280,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 5120,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 20,
+  "num_hidden_layers": 32,
+  "pad_token_id": 1,
+  "projection_dim": 1280,
+  "torch_dtype": "float16",
+  "transformers_version": "4.32.0.dev0",
+  "vocab_size": 49408
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/merges.txt
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/merges.txt
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/special_tokens_map.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/special_tokens_map.json
@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<|endoftext|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/tokenizer_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/tokenizer_config.json
@ -0,0 +1,33 @@
+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "<|endoftext|>",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/vocab.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer/vocab.json
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/merges.txt
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/merges.txt
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/special_tokens_map.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/special_tokens_map.json
@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "!",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/tokenizer_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/tokenizer_config.json
@ -0,0 +1,33 @@
+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "!",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/vocab.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/tokenizer_2/vocab.json
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/unet/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/unet/config.json
@ -0,0 +1,69 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.19.0.dev0",
+  "act_fn": "silu",
+  "addition_embed_type": "text_time",
+  "addition_embed_type_num_heads": 64,
+  "addition_time_embed_dim": 256,
+  "attention_head_dim": [
+    5,
+    10,
+    20
+  ],
+  "block_out_channels": [
+    320,
+    640,
+    1280
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 2048,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "DownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": 2816,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 128,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "transformer_layers_per_block": [
+    1,
+    2,
+    10
+  ],
+  "up_block_types": [
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "UpBlock2D"
+  ],
+  "upcast_attention": null,
+  "use_linear_projection": true
+}
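Review note: the SDXL base UNet values above can be cross-checked against the two text encoder configs added earlier in this diff. The sketch below is only a sanity check of the arithmetic. It assumes the usual SDXL conditioning layout: the hidden states of both text encoders are concatenated for cross-attention, and the pooled output of `text_encoder_2` is concatenated with the embedded time ids. The count of six time ids is an assumption and is not stated anywhere in this diff.

```python
# Values copied from the config files added in this diff.
text_encoder_hidden = 768          # text_encoder/config.json: hidden_size
text_encoder_2_hidden = 1280       # text_encoder_2/config.json: hidden_size
text_encoder_2_projection = 1280   # text_encoder_2/config.json: projection_dim
addition_time_embed_dim = 256      # unet/config.json: addition_time_embed_dim

# Assumption (not in the diff): the base model conditions on six time ids
# (original size, crop top-left, target size; two numbers each).
num_time_ids = 6

# Cross-attention sees both encoders' hidden states concatenated.
cross_attention_dim = text_encoder_hidden + text_encoder_2_hidden

# The added-conditioning embedding sees the pooled projection plus
# one embedding of width addition_time_embed_dim per time id.
projection_input_dim = text_encoder_2_projection + num_time_ids * addition_time_embed_dim

print(cross_attention_dim)   # 2048, matches "cross_attention_dim"
print(projection_input_dim)  # 2816, matches "projection_class_embeddings_input_dim"
```

If either printed number stops matching the UNet config, the template directory is probably paired with the wrong text encoder configs.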
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae/config.json
@ -0,0 +1,32 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.20.0.dev0",
+  "_name_or_path": "../sdxl-vae/",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae_1_0/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae_1_0/config.json
@ -0,0 +1,31 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.19.0.dev0",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae_decoder/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae_decoder/config.json
@ -0,0 +1,31 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.19.0.dev0",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae_encoder/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-base-1.0/vae_encoder/config.json
@ -0,0 +1,31 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.19.0.dev0",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/model_index.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/model_index.json
@ -0,0 +1,35 @@
+{
+  "_class_name": "StableDiffusionXLImg2ImgPipeline",
+  "_diffusers_version": "0.19.0.dev0",
+  "force_zeros_for_empty_prompt": false,
+  "add_watermarker": null,
+  "requires_aesthetics_score": true,
+  "scheduler": [
+    "diffusers",
+    "EulerDiscreteScheduler"
+  ],
+  "text_encoder": [
+    null,
+    null
+  ],
+  "text_encoder_2": [
+    "transformers",
+    "CLIPTextModelWithProjection"
+  ],
+  "tokenizer": [
+    null,
+    null
+  ],
+  "tokenizer_2": [
+    "transformers",
+    "CLIPTokenizer"
+  ],
+  "unet": [
+    "diffusers",
+    "UNet2DConditionModel"
+  ],
+  "vae": [
+    "diffusers",
+    "AutoencoderKL"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/scheduler/scheduler_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/scheduler/scheduler_config.json
@ -0,0 +1,18 @@
+{
+  "_class_name": "EulerDiscreteScheduler",
+  "_diffusers_version": "0.19.0.dev0",
+  "beta_end": 0.012,
+  "beta_schedule": "scaled_linear",
+  "beta_start": 0.00085,
+  "clip_sample": false,
+  "interpolation_type": "linear",
+  "num_train_timesteps": 1000,
+  "prediction_type": "epsilon",
+  "sample_max_value": 1.0,
+  "set_alpha_to_one": false,
+  "skip_prk_steps": true,
+  "steps_offset": 1,
+  "timestep_spacing": "leading",
+  "trained_betas": null,
+  "use_karras_sigmas": false
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/text_encoder_2/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/text_encoder_2/config.json
@ -0,0 +1,24 @@
+{
+  "architectures": [
+    "CLIPTextModelWithProjection"
+  ],
+  "attention_dropout": 0.0,
+  "bos_token_id": 0,
+  "dropout": 0.0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_size": 1280,
+  "initializer_factor": 1.0,
+  "initializer_range": 0.02,
+  "intermediate_size": 5120,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 77,
+  "model_type": "clip_text_model",
+  "num_attention_heads": 20,
+  "num_hidden_layers": 32,
+  "pad_token_id": 1,
+  "projection_dim": 1280,
+  "torch_dtype": "float16",
+  "transformers_version": "4.32.0.dev0",
+  "vocab_size": 49408
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/merges.txt
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/merges.txt
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/special_tokens_map.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/special_tokens_map.json
@ -0,0 +1,24 @@
+{
+  "bos_token": {
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "!",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/tokenizer_config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/tokenizer_config.json
@ -0,0 +1,33 @@
+{
+  "add_prefix_space": false,
+  "bos_token": {
+    "__type": "AddedToken",
+    "content": "<|startoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "clean_up_tokenization_spaces": true,
+  "do_lower_case": true,
+  "eos_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "errors": "replace",
+  "model_max_length": 77,
+  "pad_token": "!",
+  "tokenizer_class": "CLIPTokenizer",
+  "unk_token": {
+    "__type": "AddedToken",
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/vocab.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/tokenizer_2/vocab.json
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/unet/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/unet/config.json
@ -0,0 +1,69 @@
+{
+  "_class_name": "UNet2DConditionModel",
+  "_diffusers_version": "0.19.0.dev0",
+  "act_fn": "silu",
+  "addition_embed_type": "text_time",
+  "addition_embed_type_num_heads": 64,
+  "addition_time_embed_dim": 256,
+  "attention_head_dim": [
+    6,
+    12,
+    24,
+    24
+  ],
+  "block_out_channels": [
+    384,
+    768,
+    1536,
+    1536
+  ],
+  "center_input_sample": false,
+  "class_embed_type": null,
+  "class_embeddings_concat": false,
+  "conv_in_kernel": 3,
+  "conv_out_kernel": 3,
+  "cross_attention_dim": 1280,
+  "cross_attention_norm": null,
+  "down_block_types": [
+    "DownBlock2D",
+    "CrossAttnDownBlock2D",
+    "CrossAttnDownBlock2D",
+    "DownBlock2D"
+  ],
+  "downsample_padding": 1,
+  "dual_cross_attention": false,
+  "encoder_hid_dim": null,
+  "encoder_hid_dim_type": null,
+  "flip_sin_to_cos": true,
+  "freq_shift": 0,
+  "in_channels": 4,
+  "layers_per_block": 2,
+  "mid_block_only_cross_attention": null,
+  "mid_block_scale_factor": 1,
+  "mid_block_type": "UNetMidBlock2DCrossAttn",
+  "norm_eps": 1e-05,
+  "norm_num_groups": 32,
+  "num_attention_heads": null,
+  "num_class_embeds": null,
+  "only_cross_attention": false,
+  "out_channels": 4,
+  "projection_class_embeddings_input_dim": 2560,
+  "resnet_out_scale_factor": 1.0,
+  "resnet_skip_time_act": false,
+  "resnet_time_scale_shift": "default",
+  "sample_size": 128,
+  "time_cond_proj_dim": null,
+  "time_embedding_act_fn": null,
+  "time_embedding_dim": null,
+  "time_embedding_type": "positional",
+  "timestep_post_act": null,
+  "transformer_layers_per_block": 4,
+  "up_block_types": [
+    "UpBlock2D",
+    "CrossAttnUpBlock2D",
+    "CrossAttnUpBlock2D",
+    "UpBlock2D"
+  ],
+  "upcast_attention": null,
+  "use_linear_projection": true
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/vae/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/vae/config.json
@ -0,0 +1,32 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.20.0.dev0",
+  "_name_or_path": "../sdxl-vae/",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/vae_1_0/config.json
+++ b/invokeai/backend/assets/sd_base_conf_files/stable-diffusion-xl-refiner-1.0/vae_1_0/config.json
@ -0,0 +1,31 @@
+{
+  "_class_name": "AutoencoderKL",
+  "_diffusers_version": "0.19.0.dev0",
+  "act_fn": "silu",
+  "block_out_channels": [
+    128,
+    256,
+    512,
+    512
+  ],
+  "down_block_types": [
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D",
+    "DownEncoderBlock2D"
+  ],
+  "force_upcast": true,
+  "in_channels": 3,
+  "latent_channels": 4,
+  "layers_per_block": 2,
+  "norm_num_groups": 32,
+  "out_channels": 3,
+  "sample_size": 1024,
+  "scaling_factor": 0.13025,
+  "up_block_types": [
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D",
+    "UpDecoderBlock2D"
+  ]
+}
--- a/invokeai/backend/flux/math.py
+++ b/invokeai/backend/flux/math.py
@ -1,32 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-import torch
-from einops import rearrange
-from torch import Tensor
-
-
-def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor) -> Tensor:
-    q, k = apply_rope(q, k, pe)
-
-    x = torch.nn.functional.scaled_dot_product_attention(q, k, v)
-    x = rearrange(x, "B H L D -> B L (H D)")
-
-    return x
-
-
-def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
-    assert dim % 2 == 0
-    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
-    omega = 1.0 / (theta**scale)
-    out = torch.einsum("...n,d->...nd", pos, omega)
-    out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
-    out = rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2)
-    return out.float()
-
-
-def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor) -> tuple[Tensor, Tensor]:
-    xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
-    xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
-    xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
-    xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
-    return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)
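Review note: for readers checking what the removed `rope` / `apply_rope` pair computes, here is a minimal pure-Python sketch for a single vector at a single position. It is not the removed implementation (that one works on batched tensors with `torch` and `einops`). The helper names `rope_angles` and `apply_rope_1d` and the default `theta=10000.0` are hypothetical and chosen for illustration only.

```python
import math


def rope_angles(pos: float, dim: int, theta: float) -> list[float]:
    """Rotation angle for each of the dim // 2 channel pairs at position `pos`."""
    assert dim % 2 == 0
    # Same frequencies as the removed code: omega_k = 1 / theta ** (2k / dim).
    return [pos / (theta ** (2 * k / dim)) for k in range(dim // 2)]


def apply_rope_1d(x: list[float], pos: float, theta: float = 10000.0) -> list[float]:
    """Rotate each consecutive pair (x[2k], x[2k+1]) by the k-th angle.

    `theta` defaults to a hypothetical value; the diff does not state one.
    """
    out: list[float] = []
    for k, angle in enumerate(rope_angles(pos, len(x), theta)):
        x0, x1 = x[2 * k], x[2 * k + 1]
        c, s = math.cos(angle), math.sin(angle)
        # 2x2 rotation matrix [[c, -s], [s, c]] applied to (x0, x1).
        out += [c * x0 - s * x1, s * x0 + c * x1]
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


q = [0.3, -1.2, 0.7, 2.0]
k = [1.5, 0.4, -0.6, 0.9]
# The attention score depends only on the relative offset (3 in both cases).
near = dot(apply_rope_1d(q, 5.0), apply_rope_1d(k, 2.0))
far = dot(apply_rope_1d(q, 13.0), apply_rope_1d(k, 10.0))
print(math.isclose(near, far, abs_tol=1e-9))
```

Because each pair is rotated rather than scaled, the vector norm is unchanged and the query-key dot product depends only on the difference between the two positions.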
--- a/invokeai/backend/flux/model.py
+++ b/invokeai/backend/flux/model.py
@ -1,117 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-from dataclasses import dataclass
-
-import torch
-from torch import Tensor, nn
-
-from invokeai.backend.flux.modules.layers import (
-    DoubleStreamBlock,
-    EmbedND,
-    LastLayer,
-    MLPEmbedder,
-    SingleStreamBlock,
-    timestep_embedding,
-)
-
-
-@dataclass
-class FluxParams:
-    in_channels: int
-    vec_in_dim: int
-    context_in_dim: int
-    hidden_size: int
-    mlp_ratio: float
-    num_heads: int
-    depth: int
-    depth_single_blocks: int
-    axes_dim: list[int]
-    theta: int
-    qkv_bias: bool
-    guidance_embed: bool
-
-
-class Flux(nn.Module):
-    """
-    Transformer model for flow matching on sequences.
-    """
-
-    def __init__(self, params: FluxParams):
-        super().__init__()
-
-        self.params = params
-        self.in_channels = params.in_channels
-        self.out_channels = self.in_channels
-        if params.hidden_size % params.num_heads != 0:
-            raise ValueError(f"Hidden size {params.hidden_size} must be divisible by num_heads {params.num_heads}")
-        pe_dim = params.hidden_size // params.num_heads
-        if sum(params.axes_dim) != pe_dim:
-            raise ValueError(f"Got {params.axes_dim} but expected positional dim {pe_dim}")
-        self.hidden_size = params.hidden_size
-        self.num_heads = params.num_heads
-        self.pe_embedder = EmbedND(dim=pe_dim, theta=params.theta, axes_dim=params.axes_dim)
-        self.img_in = nn.Linear(self.in_channels, self.hidden_size, bias=True)
-        self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size)
-        self.vector_in = MLPEmbedder(params.vec_in_dim, self.hidden_size)
-        self.guidance_in = (
-            MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size) if params.guidance_embed else nn.Identity()
-        )
-        self.txt_in = nn.Linear(params.context_in_dim, self.hidden_size)
-
-        self.double_blocks = nn.ModuleList(
-            [
-                DoubleStreamBlock(
-                    self.hidden_size,
-                    self.num_heads,
-                    mlp_ratio=params.mlp_ratio,
-                    qkv_bias=params.qkv_bias,
-                )
-                for _ in range(params.depth)
-            ]
-        )
-
-        self.single_blocks = nn.ModuleList(
-            [
-                SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio)
-                for _ in range(params.depth_single_blocks)
-            ]
-        )
-
-        self.final_layer = LastLayer(self.hidden_size, 1, self.out_channels)
-
-    def forward(
-        self,
-        img: Tensor,
-        img_ids: Tensor,
-        txt: Tensor,
-        txt_ids: Tensor,
-        timesteps: Tensor,
-        y: Tensor,
-        guidance: Tensor | None = None,
-    ) -> Tensor:
-        if img.ndim != 3 or txt.ndim != 3:
-            raise ValueError("Input img and txt tensors must have 3 dimensions.")
-
-        # running on sequences img
-        img = self.img_in(img)
-        vec = self.time_in(timestep_embedding(timesteps, 256))
-        if self.params.guidance_embed:
-            if guidance is None:
-                raise ValueError("Didn't get guidance strength for guidance distilled model.")
-            vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
-        vec = vec + self.vector_in(y)
-        txt = self.txt_in(txt)
-
-        ids = torch.cat((txt_ids, img_ids), dim=1)
-        pe = self.pe_embedder(ids)
-
-        for block in self.double_blocks:
-            img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
-
-        img = torch.cat((txt, img), 1)
-        for block in self.single_blocks:
-            img = block(img, vec=vec, pe=pe)
-        img = img[:, txt.shape[1] :, ...]
-
-        img = self.final_layer(img, vec)  # (N, T, patch_size ** 2 * out_channels)
-        return img
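Review note: the two `ValueError` checks at the top of the removed `Flux.__init__` constrain how `hidden_size`, `num_heads`, and `axes_dim` relate. A small standalone sketch of the same checks follows. The function name `check_flux_dims` is hypothetical, and the sample values are illustrative assumptions that do not come from this diff.

```python
def check_flux_dims(hidden_size: int, num_heads: int, axes_dim: list[int]) -> int:
    """Mirror the constructor's two checks and return the per-head positional dim."""
    if hidden_size % num_heads != 0:
        raise ValueError(f"Hidden size {hidden_size} must be divisible by num_heads {num_heads}")
    pe_dim = hidden_size // num_heads
    # The per-axis rotary dims must add up to exactly one attention head.
    if sum(axes_dim) != pe_dim:
        raise ValueError(f"Got {axes_dim} but expected positional dim {pe_dim}")
    return pe_dim


# Illustrative values only (assumed, not taken from this diff).
print(check_flux_dims(hidden_size=3072, num_heads=24, axes_dim=[16, 56, 56]))  # 128
```

Any replacement for this module would likely need to satisfy the same relationship, since the positional embedding is applied per attention head.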
--- a/invokeai/backend/flux/modules/autoencoder.py
+++ b/invokeai/backend/flux/modules/autoencoder.py
@ -1,310 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-from dataclasses import dataclass
-
-import torch
-from einops import rearrange
-from torch import Tensor, nn
-
-
-@dataclass
-class AutoEncoderParams:
-    resolution: int
-    in_channels: int
-    ch: int
-    out_ch: int
-    ch_mult: list[int]
-    num_res_blocks: int
-    z_channels: int
-    scale_factor: float
-    shift_factor: float
-
-
-class AttnBlock(nn.Module):
-    def __init__(self, in_channels: int):
-        super().__init__()
-        self.in_channels = in_channels
-
-        self.norm = nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)
-
-        self.q = nn.Conv2d(in_channels, in_channels, kernel_size=1)
-        self.k = nn.Conv2d(in_channels, in_channels, kernel_size=1)
-        self.v = nn.Conv2d(in_channels, in_channels, kernel_size=1)
-        self.proj_out = nn.Conv2d(in_channels, in_channels, kernel_size=1)
-
-    def attention(self, h_: Tensor) -> Tensor:
-        h_ = self.norm(h_)
-        q = self.q(h_)
-        k = self.k(h_)
-        v = self.v(h_)
-
-        b, c, h, w = q.shape
-        q = rearrange(q, "b c h w -> b 1 (h w) c").contiguous()
-        k = rearrange(k, "b c h w -> b 1 (h w) c").contiguous()
-        v = rearrange(v, "b c h w -> b 1 (h w) c").contiguous()
-        h_ = nn.functional.scaled_dot_product_attention(q, k, v)
-
-        return rearrange(h_, "b 1 (h w) c -> b c h w", h=h, w=w, c=c, b=b)
-
-    def forward(self, x: Tensor) -> Tensor:
-        return x + self.proj_out(self.attention(x))
-
-
-class ResnetBlock(nn.Module):
-    def __init__(self, in_channels: int, out_channels: int):
-        super().__init__()
-        self.in_channels = in_channels
-        out_channels = in_channels if out_channels is None else out_channels
-        self.out_channels = out_channels
-
-        self.norm1 = nn.GroupNorm(num_groups=32, num_channels=in_channels, eps=1e-6, affine=True)
-        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
-        self.norm2 = nn.GroupNorm(num_groups=32, num_channels=out_channels, eps=1e-6, affine=True)
-        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
-        if self.in_channels != self.out_channels:
-            self.nin_shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0)
-
-    def forward(self, x):
-        h = x
-        h = self.norm1(h)
-        h = torch.nn.functional.silu(h)
-        h = self.conv1(h)
-
-        h = self.norm2(h)
-        h = torch.nn.functional.silu(h)
-        h = self.conv2(h)
-
-        if self.in_channels != self.out_channels:
-            x = self.nin_shortcut(x)
-
-        return x + h
-
-
-class Downsample(nn.Module):
-    def __init__(self, in_channels: int):
-        super().__init__()
-        # no asymmetric padding in torch conv, must do it ourselves
-        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=2, padding=0)
-
-    def forward(self, x: Tensor):
-        pad = (0, 1, 0, 1)
-        x = nn.functional.pad(x, pad, mode="constant", value=0)
-        x = self.conv(x)
-        return x
-
-
-class Upsample(nn.Module):
-    def __init__(self, in_channels: int):
-        super().__init__()
-        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=1, padding=1)
-
-    def forward(self, x: Tensor):
-        x = nn.functional.interpolate(x, scale_factor=2.0, mode="nearest")
-        x = self.conv(x)
-        return x
-
-
-class Encoder(nn.Module):
-    def __init__(
-        self,
-        resolution: int,
-        in_channels: int,
-        ch: int,
-        ch_mult: list[int],
-        num_res_blocks: int,
-        z_channels: int,
-    ):
-        super().__init__()
-        self.ch = ch
-        self.num_resolutions = len(ch_mult)
-        self.num_res_blocks = num_res_blocks
-        self.resolution = resolution
-        self.in_channels = in_channels
-        # downsampling
-        self.conv_in = nn.Conv2d(in_channels, self.ch, kernel_size=3, stride=1, padding=1)
-
-        curr_res = resolution
-        in_ch_mult = (1,) + tuple(ch_mult)
-        self.in_ch_mult = in_ch_mult
-        self.down = nn.ModuleList()
-        block_in = self.ch
-        for i_level in range(self.num_resolutions):
-            block = nn.ModuleList()
-            attn = nn.ModuleList()
-            block_in = ch * in_ch_mult[i_level]
-            block_out = ch * ch_mult[i_level]
-            for _ in range(self.num_res_blocks):
-                block.append(ResnetBlock(in_channels=block_in, out_channels=block_out))
-                block_in = block_out
-            down = nn.Module()
-            down.block = block
-            down.attn = attn
-            if i_level != self.num_resolutions - 1:
-                down.downsample = Downsample(block_in)
-                curr_res = curr_res // 2
-            self.down.append(down)
-
-        # middle
-        self.mid = nn.Module()
-        self.mid.block_1 = ResnetBlock(in_channels=block_in, out_channels=block_in)
-        self.mid.attn_1 = AttnBlock(block_in)
-        self.mid.block_2 = ResnetBlock(in_channels=block_in, out_channels=block_in)
-
-        # end
-        self.norm_out = nn.GroupNorm(num_groups=32, num_channels=block_in, eps=1e-6, affine=True)
-        self.conv_out = nn.Conv2d(block_in, 2 * z_channels, kernel_size=3, stride=1, padding=1)
-
-    def forward(self, x: Tensor) -> Tensor:
-        # downsampling
-        hs = [self.conv_in(x)]
-        for i_level in range(self.num_resolutions):
-            for i_block in range(self.num_res_blocks):
-                h = self.down[i_level].block[i_block](hs[-1])
-                if len(self.down[i_level].attn) > 0:
-                    h = self.down[i_level].attn[i_block](h)
-                hs.append(h)
-            if i_level != self.num_resolutions - 1:
-                hs.append(self.down[i_level].downsample(hs[-1]))
-
-        # middle
-        h = hs[-1]
-        h = self.mid.block_1(h)
-        h = self.mid.attn_1(h)
-        h = self.mid.block_2(h)
-        # end
-        h = self.norm_out(h)
-        h = torch.nn.functional.silu(h)
-        h = self.conv_out(h)
-        return h
-
-
-class Decoder(nn.Module):
-    def __init__(
-        self,
-        ch: int,
-        out_ch: int,
-        ch_mult: list[int],
-        num_res_blocks: int,
-        in_channels: int,
-        resolution: int,
-        z_channels: int,
-    ):
-        super().__init__()
-        self.ch = ch
-        self.num_resolutions = len(ch_mult)
-        self.num_res_blocks = num_res_blocks
-        self.resolution = resolution
-        self.in_channels = in_channels
-        self.ffactor = 2 ** (self.num_resolutions - 1)
-
-        # compute in_ch_mult, block_in and curr_res at lowest res
-        block_in = ch * ch_mult[self.num_resolutions - 1]
-        curr_res = resolution // 2 ** (self.num_resolutions - 1)
-        self.z_shape = (1, z_channels, curr_res, curr_res)
-
-        # z to block_in
-        self.conv_in = nn.Conv2d(z_channels, block_in, kernel_size=3, stride=1, padding=1)
-
-        # middle
-        self.mid = nn.Module()
-        self.mid.block_1 = ResnetBlock(in_channels=block_in, out_channels=block_in)
-        self.mid.attn_1 = AttnBlock(block_in)
-        self.mid.block_2 = ResnetBlock(in_channels=block_in, out_channels=block_in)
-
-        # upsampling
-        self.up = nn.ModuleList()
-        for i_level in reversed(range(self.num_resolutions)):
-            block = nn.ModuleList()
-            attn = nn.ModuleList()
-            block_out = ch * ch_mult[i_level]
-            for _ in range(self.num_res_blocks + 1):
-                block.append(ResnetBlock(in_channels=block_in, out_channels=block_out))
-                block_in = block_out
-            up = nn.Module()
-            up.block = block
-            up.attn = attn
-            if i_level != 0:
-                up.upsample = Upsample(block_in)
-                curr_res = curr_res * 2
-            self.up.insert(0, up)  # prepend to get consistent order
-
-        # end
-        self.norm_out = nn.GroupNorm(num_groups=32, num_channels=block_in, eps=1e-6, affine=True)
-        self.conv_out = nn.Conv2d(block_in, out_ch, kernel_size=3, stride=1, padding=1)
-
-    def forward(self, z: Tensor) -> Tensor:
-        # z to block_in
-        h = self.conv_in(z)
-
-        # middle
-        h = self.mid.block_1(h)
-        h = self.mid.attn_1(h)
-        h = self.mid.block_2(h)
-
-        # upsampling
-        for i_level in reversed(range(self.num_resolutions)):
-            for i_block in range(self.num_res_blocks + 1):
-                h = self.up[i_level].block[i_block](h)
-                if len(self.up[i_level].attn) > 0:
-                    h = self.up[i_level].attn[i_block](h)
-            if i_level != 0:
-                h = self.up[i_level].upsample(h)
-
-        # end
-        h = self.norm_out(h)
-        h = torch.nn.functional.silu(h)
-        h = self.conv_out(h)
-        return h
-
-
-class DiagonalGaussian(nn.Module):
-    def __init__(self, sample: bool = True, chunk_dim: int = 1):
-        super().__init__()
-        self.sample = sample
-        self.chunk_dim = chunk_dim
-
-    def forward(self, z: Tensor) -> Tensor:
-        mean, logvar = torch.chunk(z, 2, dim=self.chunk_dim)
-        if self.sample:
-            std = torch.exp(0.5 * logvar)
-            return mean + std * torch.randn_like(mean)
-        else:
-            return mean
-
-
-class AutoEncoder(nn.Module):
-    def __init__(self, params: AutoEncoderParams):
-        super().__init__()
-        self.encoder = Encoder(
-            resolution=params.resolution,
-            in_channels=params.in_channels,
-            ch=params.ch,
-            ch_mult=params.ch_mult,
-            num_res_blocks=params.num_res_blocks,
-            z_channels=params.z_channels,
-        )
-        self.decoder = Decoder(
-            resolution=params.resolution,
-            in_channels=params.in_channels,
-            ch=params.ch,
-            out_ch=params.out_ch,
-            ch_mult=params.ch_mult,
-            num_res_blocks=params.num_res_blocks,
-            z_channels=params.z_channels,
-        )
-        self.reg = DiagonalGaussian()
-
-        self.scale_factor = params.scale_factor
-        self.shift_factor = params.shift_factor
-
-    def encode(self, x: Tensor) -> Tensor:
-        z = self.reg(self.encoder(x))
-        z = self.scale_factor * (z - self.shift_factor)
-        return z
-
-    def decode(self, z: Tensor) -> Tensor:
-        z = z / self.scale_factor + self.shift_factor
-        return self.decoder(z)
-
-    def forward(self, x: Tensor) -> Tensor:
-        return self.decode(self.encode(x))
--- a/invokeai/backend/flux/modules/conditioner.py
+++ b/invokeai/backend/flux/modules/conditioner.py
@ -1,33 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-from torch import Tensor, nn
-from transformers import PreTrainedModel, PreTrainedTokenizer
-
-
-class HFEncoder(nn.Module):
-    def __init__(self, encoder: PreTrainedModel, tokenizer: PreTrainedTokenizer, is_clip: bool, max_length: int):
-        super().__init__()
-        self.max_length = max_length
-        self.is_clip = is_clip
-        self.output_key = "pooler_output" if self.is_clip else "last_hidden_state"
-        self.tokenizer = tokenizer
-        self.hf_module = encoder
-        self.hf_module = self.hf_module.eval().requires_grad_(False)
-
-    def forward(self, text: list[str]) -> Tensor:
-        batch_encoding = self.tokenizer(
-            text,
-            truncation=True,
-            max_length=self.max_length,
-            return_length=False,
-            return_overflowing_tokens=False,
-            padding="max_length",
-            return_tensors="pt",
-        )
-
-        outputs = self.hf_module(
-            input_ids=batch_encoding["input_ids"].to(self.hf_module.device),
-            attention_mask=None,
-            output_hidden_states=False,
-        )
-        return outputs[self.output_key]
--- a/invokeai/backend/flux/modules/layers.py
+++ b/invokeai/backend/flux/modules/layers.py
@ -1,253 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-import math
-from dataclasses import dataclass
-
-import torch
-from einops import rearrange
-from torch import Tensor, nn
-
-from invokeai.backend.flux.math import attention, rope
-
-
-class EmbedND(nn.Module):
-    def __init__(self, dim: int, theta: int, axes_dim: list[int]):
-        super().__init__()
-        self.dim = dim
-        self.theta = theta
-        self.axes_dim = axes_dim
-
-    def forward(self, ids: Tensor) -> Tensor:
-        n_axes = ids.shape[-1]
-        emb = torch.cat(
-            [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
-            dim=-3,
-        )
-
-        return emb.unsqueeze(1)
-
-
-def timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):
-    """
-    Create sinusoidal timestep embeddings.
-    :param t: a 1-D Tensor of N indices, one per batch element.
-                      These may be fractional.
-    :param dim: the dimension of the output.
-    :param max_period: controls the minimum frequency of the embeddings.
-    :return: an (N, D) Tensor of positional embeddings.
-    """
-    t = time_factor * t
-    half = dim // 2
-    freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half).to(t.device)
-
-    args = t[:, None].float() * freqs[None]
-    embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
-    if dim % 2:
-        embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
-    if torch.is_floating_point(t):
-        embedding = embedding.to(t)
-    return embedding
-
-
-class MLPEmbedder(nn.Module):
-    def __init__(self, in_dim: int, hidden_dim: int):
-        super().__init__()
-        self.in_layer = nn.Linear(in_dim, hidden_dim, bias=True)
-        self.silu = nn.SiLU()
-        self.out_layer = nn.Linear(hidden_dim, hidden_dim, bias=True)
-
-    def forward(self, x: Tensor) -> Tensor:
-        return self.out_layer(self.silu(self.in_layer(x)))
-
-
-class RMSNorm(torch.nn.Module):
-    def __init__(self, dim: int):
-        super().__init__()
-        self.scale = nn.Parameter(torch.ones(dim))
-
-    def forward(self, x: Tensor):
-        x_dtype = x.dtype
-        x = x.float()
-        rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + 1e-6)
-        return (x * rrms).to(dtype=x_dtype) * self.scale
-
-
-class QKNorm(torch.nn.Module):
-    def __init__(self, dim: int):
-        super().__init__()
-        self.query_norm = RMSNorm(dim)
-        self.key_norm = RMSNorm(dim)
-
-    def forward(self, q: Tensor, k: Tensor, v: Tensor) -> tuple[Tensor, Tensor]:
-        q = self.query_norm(q)
-        k = self.key_norm(k)
-        return q.to(v), k.to(v)
-
-
-class SelfAttention(nn.Module):
-    def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False):
-        super().__init__()
-        self.num_heads = num_heads
-        head_dim = dim // num_heads
-
-        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)
-        self.norm = QKNorm(head_dim)
-        self.proj = nn.Linear(dim, dim)
-
-    def forward(self, x: Tensor, pe: Tensor) -> Tensor:
-        qkv = self.qkv(x)
-        q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
-        q, k = self.norm(q, k, v)
-        x = attention(q, k, v, pe=pe)
-        x = self.proj(x)
-        return x
-
-
-@dataclass
-class ModulationOut:
-    shift: Tensor
-    scale: Tensor
-    gate: Tensor
-
-
-class Modulation(nn.Module):
-    def __init__(self, dim: int, double: bool):
-        super().__init__()
-        self.is_double = double
-        self.multiplier = 6 if double else 3
-        self.lin = nn.Linear(dim, self.multiplier * dim, bias=True)
-
-    def forward(self, vec: Tensor) -> tuple[ModulationOut, ModulationOut | None]:
-        out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1)
-
-        return (
-            ModulationOut(*out[:3]),
-            ModulationOut(*out[3:]) if self.is_double else None,
-        )
-
-
-class DoubleStreamBlock(nn.Module):
-    def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False):
-        super().__init__()
-
-        mlp_hidden_dim = int(hidden_size * mlp_ratio)
-        self.num_heads = num_heads
-        self.hidden_size = hidden_size
-        self.img_mod = Modulation(hidden_size, double=True)
-        self.img_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
-        self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
-
-        self.img_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
-        self.img_mlp = nn.Sequential(
-            nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
-            nn.GELU(approximate="tanh"),
-            nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
-        )
-
-        self.txt_mod = Modulation(hidden_size, double=True)
-        self.txt_norm1 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
-        self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias)
-
-        self.txt_norm2 = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
-        self.txt_mlp = nn.Sequential(
-            nn.Linear(hidden_size, mlp_hidden_dim, bias=True),
-            nn.GELU(approximate="tanh"),
-            nn.Linear(mlp_hidden_dim, hidden_size, bias=True),
-        )
-
-    def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor) -> tuple[Tensor, Tensor]:
-        img_mod1, img_mod2 = self.img_mod(vec)
-        txt_mod1, txt_mod2 = self.txt_mod(vec)
-
-        # prepare image for attention
-        img_modulated = self.img_norm1(img)
-        img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
-        img_qkv = self.img_attn.qkv(img_modulated)
-        img_q, img_k, img_v = rearrange(img_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
-        img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
-
-        # prepare txt for attention
-        txt_modulated = self.txt_norm1(txt)
-        txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
-        txt_qkv = self.txt_attn.qkv(txt_modulated)
-        txt_q, txt_k, txt_v = rearrange(txt_qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
-        txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
-
-        # run actual attention
-        q = torch.cat((txt_q, img_q), dim=2)
-        k = torch.cat((txt_k, img_k), dim=2)
-        v = torch.cat((txt_v, img_v), dim=2)
-
-        attn = attention(q, k, v, pe=pe)
-        txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
-
-        # calculate the img blocks
-        img = img + img_mod1.gate * self.img_attn.proj(img_attn)
-        img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)
-
-        # calculate the txt blocks
-        txt = txt + txt_mod1.gate * self.txt_attn.proj(txt_attn)
-        txt = txt + txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
-        return img, txt
-
-
-class SingleStreamBlock(nn.Module):
-    """
-    A DiT block with parallel linear layers as described in
-    https://arxiv.org/abs/2302.05442 and adapted modulation interface.
-    """
-
-    def __init__(
-        self,
-        hidden_size: int,
-        num_heads: int,
-        mlp_ratio: float = 4.0,
-        qk_scale: float | None = None,
-    ):
-        super().__init__()
-        self.hidden_dim = hidden_size
-        self.num_heads = num_heads
-        head_dim = hidden_size // num_heads
-        self.scale = qk_scale or head_dim**-0.5
-
-        self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
-        # qkv and mlp_in
-        self.linear1 = nn.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim)
-        # proj and mlp_out
-        self.linear2 = nn.Linear(hidden_size + self.mlp_hidden_dim, hidden_size)
-
-        self.norm = QKNorm(head_dim)
-
-        self.hidden_size = hidden_size
-        self.pre_norm = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
-
-        self.mlp_act = nn.GELU(approximate="tanh")
-        self.modulation = Modulation(hidden_size, double=False)
-
-    def forward(self, x: Tensor, vec: Tensor, pe: Tensor) -> Tensor:
-        mod, _ = self.modulation(vec)
-        x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
-        qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
-
-        q, k, v = rearrange(qkv, "B L (K H D) -> K B H L D", K=3, H=self.num_heads)
-        q, k = self.norm(q, k, v)
-
-        # compute attention
-        attn = attention(q, k, v, pe=pe)
-        # compute activation in mlp stream, cat again and run second linear layer
-        output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
-        return x + mod.gate * output
-
-
-class LastLayer(nn.Module):
-    def __init__(self, hidden_size: int, patch_size: int, out_channels: int):
-        super().__init__()
-        self.norm_final = nn.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6)
-        self.linear = nn.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True)
-        self.adaLN_modulation = nn.Sequential(nn.SiLU(), nn.Linear(hidden_size, 2 * hidden_size, bias=True))
-
-    def forward(self, x: Tensor, vec: Tensor) -> Tensor:
-        shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1)
-        x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :]
-        x = self.linear(x)
-        return x
--- a/invokeai/backend/flux/sampling.py
+++ b/invokeai/backend/flux/sampling.py
@ -1,176 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-import math
-from typing import Callable
-
-import torch
-from einops import rearrange, repeat
-from torch import Tensor
-from tqdm import tqdm
-
-from invokeai.backend.flux.model import Flux
-from invokeai.backend.flux.modules.conditioner import HFEncoder
-
-
-def get_noise(
-    num_samples: int,
-    height: int,
-    width: int,
-    device: torch.device,
-    dtype: torch.dtype,
-    seed: int,
-):
-    # We always generate noise on the same device and dtype then cast to ensure consistency across devices/dtypes.
-    rand_device = "cpu"
-    rand_dtype = torch.float16
-    return torch.randn(
-        num_samples,
-        16,
-        # allow for packing
-        2 * math.ceil(height / 16),
-        2 * math.ceil(width / 16),
-        device=rand_device,
-        dtype=rand_dtype,
-        generator=torch.Generator(device=rand_device).manual_seed(seed),
-    ).to(device=device, dtype=dtype)
-
-
-def prepare(t5: HFEncoder, clip: HFEncoder, img: Tensor, prompt: str | list[str]) -> dict[str, Tensor]:
-    bs, c, h, w = img.shape
-    if bs == 1 and not isinstance(prompt, str):
-        bs = len(prompt)
-
-    img = rearrange(img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
-    if img.shape[0] == 1 and bs > 1:
-        img = repeat(img, "1 ... -> bs ...", bs=bs)
-
-    img_ids = torch.zeros(h // 2, w // 2, 3)
-    img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2)[:, None]
-    img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2)[None, :]
-    img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
-
-    if isinstance(prompt, str):
-        prompt = [prompt]
-    txt = t5(prompt)
-    if txt.shape[0] == 1 and bs > 1:
-        txt = repeat(txt, "1 ... -> bs ...", bs=bs)
-    txt_ids = torch.zeros(bs, txt.shape[1], 3)
-
-    vec = clip(prompt)
-    if vec.shape[0] == 1 and bs > 1:
-        vec = repeat(vec, "1 ... -> bs ...", bs=bs)
-
-    return {
-        "img": img,
-        "img_ids": img_ids.to(img.device),
-        "txt": txt.to(img.device),
-        "txt_ids": txt_ids.to(img.device),
-        "vec": vec.to(img.device),
-    }
-
-
-def time_shift(mu: float, sigma: float, t: Tensor):
-    return math.exp(mu) / (math.exp(mu) + (1 / t - 1) ** sigma)
-
-
-def get_lin_function(x1: float = 256, y1: float = 0.5, x2: float = 4096, y2: float = 1.15) -> Callable[[float], float]:
-    m = (y2 - y1) / (x2 - x1)
-    b = y1 - m * x1
-    return lambda x: m * x + b
-
-
-def get_schedule(
-    num_steps: int,
-    image_seq_len: int,
-    base_shift: float = 0.5,
-    max_shift: float = 1.15,
-    shift: bool = True,
-) -> list[float]:
-    # extra step for zero
-    timesteps = torch.linspace(1, 0, num_steps + 1)
-
-    # shifting the schedule to favor high timesteps for higher signal images
-    if shift:
-        # estimate mu based on linear estimation between two points
-        mu = get_lin_function(y1=base_shift, y2=max_shift)(image_seq_len)
-        timesteps = time_shift(mu, 1.0, timesteps)
-
-    return timesteps.tolist()
-
-
-def denoise(
-    model: Flux,
-    # model input
-    img: Tensor,
-    img_ids: Tensor,
-    txt: Tensor,
-    txt_ids: Tensor,
-    vec: Tensor,
-    # sampling parameters
-    timesteps: list[float],
-    step_callback: Callable[[], None],
-    guidance: float = 4.0,
-):
-    dtype = model.txt_in.bias.dtype
-
-    # TODO(ryand): This shouldn't be necessary if we manage the dtypes properly in the caller.
-    img = img.to(dtype=dtype)
-    img_ids = img_ids.to(dtype=dtype)
-    txt = txt.to(dtype=dtype)
-    txt_ids = txt_ids.to(dtype=dtype)
-    vec = vec.to(dtype=dtype)
-
-    # this is ignored for schnell
-    guidance_vec = torch.full((img.shape[0],), guidance, device=img.device, dtype=img.dtype)
-    for t_curr, t_prev in tqdm(list(zip(timesteps[:-1], timesteps[1:], strict=True))):
-        t_vec = torch.full((img.shape[0],), t_curr, dtype=img.dtype, device=img.device)
-        pred = model(
-            img=img,
-            img_ids=img_ids,
-            txt=txt,
-            txt_ids=txt_ids,
-            y=vec,
-            timesteps=t_vec,
-            guidance=guidance_vec,
-        )
-
-        img = img + (t_prev - t_curr) * pred
-        step_callback()
-
-    return img
-
-
-def unpack(x: Tensor, height: int, width: int) -> Tensor:
-    return rearrange(
-        x,
-        "b (h w) (c ph pw) -> b c (h ph) (w pw)",
-        h=math.ceil(height / 16),
-        w=math.ceil(width / 16),
-        ph=2,
-        pw=2,
-    )
-
-
-def prepare_latent_img_patches(latent_img: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
-    """Convert an input image in latent space to patches for diffusion.
-
-    This implementation was extracted from:
-    https://github.com/black-forest-labs/flux/blob/c00d7c60b085fce8058b9df845e036090873f2ce/src/flux/sampling.py#L32
-
-    Returns:
-        tuple[Tensor, Tensor]: (img, img_ids), as defined in the original flux repo.
-    """
-    bs, c, h, w = latent_img.shape
-
-    # Pixel unshuffle with a scale of 2, and flatten the height/width dimensions to get an array of patches.
-    img = rearrange(latent_img, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
-    if img.shape[0] == 1 and bs > 1:
-        img = repeat(img, "1 ... -> bs ...", bs=bs)
-
-    # Generate patch position ids.
-    img_ids = torch.zeros(h // 2, w // 2, 3, device=img.device)
-    img_ids[..., 1] = img_ids[..., 1] + torch.arange(h // 2, device=img.device)[:, None]
-    img_ids[..., 2] = img_ids[..., 2] + torch.arange(w // 2, device=img.device)[None, :]
-    img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
-
-    return img, img_ids
--- a/invokeai/backend/flux/util.py
+++ b/invokeai/backend/flux/util.py
@ -1,71 +0,0 @@
-# Initially pulled from https://github.com/black-forest-labs/flux
-
-from dataclasses import dataclass
-from typing import Dict, Literal
-
-from invokeai.backend.flux.model import FluxParams
-from invokeai.backend.flux.modules.autoencoder import AutoEncoderParams
-
-
-@dataclass
-class ModelSpec:
-    params: FluxParams
-    ae_params: AutoEncoderParams
-    ckpt_path: str | None
-    ae_path: str | None
-    repo_id: str | None
-    repo_flow: str | None
-    repo_ae: str | None
-
-
-max_seq_lengths: Dict[str, Literal[256, 512]] = {
-    "flux-dev": 512,
-    "flux-schnell": 256,
-}
-
-
-ae_params = {
-    "flux": AutoEncoderParams(
-        resolution=256,
-        in_channels=3,
-        ch=128,
-        out_ch=3,
-        ch_mult=[1, 2, 4, 4],
-        num_res_blocks=2,
-        z_channels=16,
-        scale_factor=0.3611,
-        shift_factor=0.1159,
-    )
-}
-
-
-params = {
-    "flux-dev": FluxParams(
-        in_channels=64,
-        vec_in_dim=768,
-        context_in_dim=4096,
-        hidden_size=3072,
-        mlp_ratio=4.0,
-        num_heads=24,
-        depth=19,
-        depth_single_blocks=38,
-        axes_dim=[16, 56, 56],
-        theta=10_000,
-        qkv_bias=True,
-        guidance_embed=True,
-    ),
-    "flux-schnell": FluxParams(
-        in_channels=64,
-        vec_in_dim=768,
-        context_in_dim=4096,
-        hidden_size=3072,
-        mlp_ratio=4.0,
-        num_heads=24,
-        depth=19,
-        depth_single_blocks=38,
-        axes_dim=[16, 56, 56],
-        theta=10_000,
-        qkv_bias=True,
-        guidance_embed=False,
-    ),
-}
--- a/invokeai/backend/model_manager/config.py
+++ b/invokeai/backend/model_manager/config.py
@ -52,7 +52,6 @@ class BaseModelType(str, Enum):
    StableDiffusion2 = "sd-2"
    StableDiffusionXL = "sdxl"
    StableDiffusionXLRefiner = "sdxl-refiner"
-    Flux = "flux"
    # Kandinsky2_1 = "kandinsky-2.1"


@ -67,9 +66,7 @@ class ModelType(str, Enum):
    TextualInversion = "embedding"
    IPAdapter = "ip_adapter"
    CLIPVision = "clip_vision"
-    CLIPEmbed = "clip_embed"
    T2IAdapter = "t2i_adapter"
-    T5Encoder = "t5_encoder"
    SpandrelImageToImage = "spandrel_image_to_image"


@ -77,7 +74,6 @@ class SubModelType(str, Enum):
    """Submodel type."""

    UNet = "unet"
-    Transformer = "transformer"
    TextEncoder = "text_encoder"
    TextEncoder2 = "text_encoder_2"
    Tokenizer = "tokenizer"
@ -108,9 +104,6 @@ class ModelFormat(str, Enum):
    EmbeddingFile = "embedding_file"
    EmbeddingFolder = "embedding_folder"
    InvokeAI = "invokeai"
-    T5Encoder = "t5_encoder"
-    BnbQuantizedLlmInt8b = "bnb_quantized_int8b"
-    BnbQuantizednf4b = "bnb_quantized_nf4b"


 class SchedulerPredictionType(str, Enum):
@ -193,9 +186,7 @@ class ModelConfigBase(BaseModel):
 class CheckpointConfigBase(ModelConfigBase):
    """Model config for checkpoint-style models."""

-    format: Literal[ModelFormat.Checkpoint, ModelFormat.BnbQuantizednf4b] = Field(
-        description="Format of the provided checkpoint model", default=ModelFormat.Checkpoint
-    )
+    format: Literal[ModelFormat.Checkpoint] = ModelFormat.Checkpoint
    config_path: str = Field(description="path to the checkpoint model config file")
    converted_at: Optional[float] = Field(
        description="When this model was last converted to diffusers", default_factory=time.time
@ -214,26 +205,6 @@ class LoRAConfigBase(ModelConfigBase):
    trigger_phrases: Optional[set[str]] = Field(description="Set of trigger phrases for this model", default=None)


-class T5EncoderConfigBase(ModelConfigBase):
-    type: Literal[ModelType.T5Encoder] = ModelType.T5Encoder
-
-
-class T5EncoderConfig(T5EncoderConfigBase):
-    format: Literal[ModelFormat.T5Encoder] = ModelFormat.T5Encoder
-
-    @staticmethod
-    def get_tag() -> Tag:
-        return Tag(f"{ModelType.T5Encoder.value}.{ModelFormat.T5Encoder.value}")
-
-
-class T5EncoderBnbQuantizedLlmInt8bConfig(T5EncoderConfigBase):
-    format: Literal[ModelFormat.BnbQuantizedLlmInt8b] = ModelFormat.BnbQuantizedLlmInt8b
-
-    @staticmethod
-    def get_tag() -> Tag:
-        return Tag(f"{ModelType.T5Encoder.value}.{ModelFormat.BnbQuantizedLlmInt8b.value}")
-
-
 class LoRALyCORISConfig(LoRAConfigBase):
    """Model config for LoRA/Lycoris models."""

@ -258,6 +229,7 @@ class VAECheckpointConfig(CheckpointConfigBase):
    """Model config for standalone VAE models."""

    type: Literal[ModelType.VAE] = ModelType.VAE
+    format: Literal[ModelFormat.Checkpoint] = ModelFormat.Checkpoint

    @staticmethod
    def get_tag() -> Tag:
@ -296,6 +268,7 @@ class ControlNetCheckpointConfig(CheckpointConfigBase, ControlAdapterConfigBase)
    """Model config for ControlNet models (diffusers version)."""

    type: Literal[ModelType.ControlNet] = ModelType.ControlNet
+    format: Literal[ModelFormat.Checkpoint] = ModelFormat.Checkpoint

    @staticmethod
    def get_tag() -> Tag:
@ -344,21 +317,6 @@ class MainCheckpointConfig(CheckpointConfigBase, MainConfigBase):
        return Tag(f"{ModelType.Main.value}.{ModelFormat.Checkpoint.value}")


-class MainBnbQuantized4bCheckpointConfig(CheckpointConfigBase, MainConfigBase):
-    """Model config for main checkpoint models."""
-
-    prediction_type: SchedulerPredictionType = SchedulerPredictionType.Epsilon
-    upcast_attention: bool = False
-
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        self.format = ModelFormat.BnbQuantizednf4b
-
-    @staticmethod
-    def get_tag() -> Tag:
-        return Tag(f"{ModelType.Main.value}.{ModelFormat.BnbQuantizednf4b.value}")
-
-
 class MainDiffusersConfig(DiffusersConfigBase, MainConfigBase):
    """Model config for main diffusers models."""

@ -392,17 +350,6 @@ class IPAdapterCheckpointConfig(IPAdapterBaseConfig):
        return Tag(f"{ModelType.IPAdapter.value}.{ModelFormat.Checkpoint.value}")


-class CLIPEmbedDiffusersConfig(DiffusersConfigBase):
-    """Model config for Clip Embeddings."""
-
-    type: Literal[ModelType.CLIPEmbed] = ModelType.CLIPEmbed
-    format: Literal[ModelFormat.Diffusers] = ModelFormat.Diffusers
-
-    @staticmethod
-    def get_tag() -> Tag:
-        return Tag(f"{ModelType.CLIPEmbed.value}.{ModelFormat.Diffusers.value}")
-
-
 class CLIPVisionDiffusersConfig(DiffusersConfigBase):
    """Model config for CLIPVision."""

@ -461,15 +408,12 @@ AnyModelConfig = Annotated[
    Union[
        Annotated[MainDiffusersConfig, MainDiffusersConfig.get_tag()],
        Annotated[MainCheckpointConfig, MainCheckpointConfig.get_tag()],
-        Annotated[MainBnbQuantized4bCheckpointConfig, MainBnbQuantized4bCheckpointConfig.get_tag()],
        Annotated[VAEDiffusersConfig, VAEDiffusersConfig.get_tag()],
        Annotated[VAECheckpointConfig, VAECheckpointConfig.get_tag()],
        Annotated[ControlNetDiffusersConfig, ControlNetDiffusersConfig.get_tag()],
        Annotated[ControlNetCheckpointConfig, ControlNetCheckpointConfig.get_tag()],
        Annotated[LoRALyCORISConfig, LoRALyCORISConfig.get_tag()],
        Annotated[LoRADiffusersConfig, LoRADiffusersConfig.get_tag()],
-        Annotated[T5EncoderConfig, T5EncoderConfig.get_tag()],
-        Annotated[T5EncoderBnbQuantizedLlmInt8bConfig, T5EncoderBnbQuantizedLlmInt8bConfig.get_tag()],
        Annotated[TextualInversionFileConfig, TextualInversionFileConfig.get_tag()],
        Annotated[TextualInversionFolderConfig, TextualInversionFolderConfig.get_tag()],
        Annotated[IPAdapterInvokeAIConfig, IPAdapterInvokeAIConfig.get_tag()],
@ -477,7 +421,6 @@ AnyModelConfig = Annotated[
        Annotated[T2IAdapterConfig, T2IAdapterConfig.get_tag()],
        Annotated[SpandrelImageToImageConfig, SpandrelImageToImageConfig.get_tag()],
        Annotated[CLIPVisionDiffusersConfig, CLIPVisionDiffusersConfig.get_tag()],
-        Annotated[CLIPEmbedDiffusersConfig, CLIPEmbedDiffusersConfig.get_tag()],
    ],
    Discriminator(get_model_discriminator_value),
 ]
--- a/invokeai/backend/model_manager/load/model_loaders/controlnet.py
+++ b/invokeai/backend/model_manager/load/model_loaders/controlnet.py
@ -1,10 +1,12 @@
 # Copyright (c) 2024, Lincoln D. Stein and the InvokeAI Development Team
 """Class for ControlNet model loading in InvokeAI."""

+from pathlib import Path
 from typing import Optional

 from diffusers import ControlNetModel

+import invokeai.backend.assets.sd_base_conf_files as conf_file_cache
 from invokeai.backend.model_manager import (
    AnyModel,
    AnyModelConfig,
@ -27,9 +29,20 @@ class ControlNetLoader(GenericDiffusersLoader):
        config: AnyModelConfig,
        submodel_type: Optional[SubModelType] = None,
    ) -> AnyModel:
+        config_dirs = {
+            BaseModelType.StableDiffusion1: "controlnet_sd15",
+            BaseModelType.StableDiffusionXL: "controlnet_sdxl",
+        }
+        try:
+            config_dir = config_dirs[config.base]
+        except KeyError:
+            raise Exception(f"No configuration template known for controlnet model with base={config.base}")
+
        if isinstance(config, ControlNetCheckpointConfig):
            return ControlNetModel.from_single_file(
                config.path,
+                config=Path(conf_file_cache.__path__[0], config_dir).as_posix(),
+                local_files_only=True,
                torch_dtype=self._torch_dtype,
            )
        else:
--- a/invokeai/backend/model_manager/load/model_loaders/flux.py
+++ b/invokeai/backend/model_manager/load/model_loaders/flux.py
@ -1,234 +0,0 @@
-# Copyright (c) 2024, Brandon W. Rising and the InvokeAI Development Team
-"""Class for Flux model loading in InvokeAI."""
-
-from pathlib import Path
-from typing import Optional
-
-import accelerate
-import torch
-from safetensors.torch import load_file
-from transformers import AutoConfig, AutoModelForTextEncoding, CLIPTextModel, CLIPTokenizer, T5EncoderModel, T5Tokenizer
-
-from invokeai.app.services.config.config_default import get_config
-from invokeai.backend.flux.model import Flux
-from invokeai.backend.flux.modules.autoencoder import AutoEncoder
-from invokeai.backend.flux.util import ae_params, params
-from invokeai.backend.model_manager import (
-    AnyModel,
-    AnyModelConfig,
-    BaseModelType,
-    ModelFormat,
-    ModelType,
-    SubModelType,
-)
-from invokeai.backend.model_manager.config import (
-    CheckpointConfigBase,
-    CLIPEmbedDiffusersConfig,
-    MainBnbQuantized4bCheckpointConfig,
-    MainCheckpointConfig,
-    T5EncoderBnbQuantizedLlmInt8bConfig,
-    T5EncoderConfig,
-    VAECheckpointConfig,
-)
-from invokeai.backend.model_manager.load.load_default import ModelLoader
-from invokeai.backend.model_manager.load.model_loader_registry import ModelLoaderRegistry
-from invokeai.backend.util.silence_warnings import SilenceWarnings
-
-try:
-    from invokeai.backend.quantization.bnb_llm_int8 import quantize_model_llm_int8
-    from invokeai.backend.quantization.bnb_nf4 import quantize_model_nf4
-
-    bnb_available = True
-except ImportError:
-    bnb_available = False
-
-app_config = get_config()
-
-
-@ModelLoaderRegistry.register(base=BaseModelType.Flux, type=ModelType.VAE, format=ModelFormat.Checkpoint)
-class FluxVAELoader(ModelLoader):
-    """Class to load VAE models."""
-
-    def _load_model(
-        self,
-        config: AnyModelConfig,
-        submodel_type: Optional[SubModelType] = None,
-    ) -> AnyModel:
-        if not isinstance(config, VAECheckpointConfig):
-            raise ValueError("Only VAECheckpointConfig models are currently supported here.")
-        model_path = Path(config.path)
-
-        with SilenceWarnings():
-            model = AutoEncoder(ae_params[config.config_path])
-            sd = load_file(model_path)
-            model.load_state_dict(sd, assign=True)
-            model.to(dtype=self._torch_dtype)
-
-        return model
-
-
-@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.CLIPEmbed, format=ModelFormat.Diffusers)
-class ClipCheckpointModel(ModelLoader):
-    """Class to load main models."""
-
-    def _load_model(
-        self,
-        config: AnyModelConfig,
-        submodel_type: Optional[SubModelType] = None,
-    ) -> AnyModel:
-        if not isinstance(config, CLIPEmbedDiffusersConfig):
-            raise ValueError("Only CLIPEmbedDiffusersConfig models are currently supported here.")
-
-        match submodel_type:
-            case SubModelType.Tokenizer:
-                return CLIPTokenizer.from_pretrained(Path(config.path) / "tokenizer")
-            case SubModelType.TextEncoder:
-                return CLIPTextModel.from_pretrained(Path(config.path) / "text_encoder")
-
-        raise ValueError(
-            f"Only Tokenizer and TextEncoder submodels are currently supported. Received: {submodel_type.value if submodel_type else 'None'}"
-        )
-
-
-@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.T5Encoder, format=ModelFormat.BnbQuantizedLlmInt8b)
-class BnbQuantizedLlmInt8bCheckpointModel(ModelLoader):
-    """Class to load main models."""
-
-    def _load_model(
-        self,
-        config: AnyModelConfig,
-        submodel_type: Optional[SubModelType] = None,
-    ) -> AnyModel:
-        if not isinstance(config, T5EncoderBnbQuantizedLlmInt8bConfig):
-            raise ValueError("Only T5EncoderBnbQuantizedLlmInt8bConfig models are currently supported here.")
-        if not bnb_available:
-            raise ImportError(
-                "The bnb modules are not available. Please install bitsandbytes if available on your platform."
-            )
-        match submodel_type:
-            case SubModelType.Tokenizer2:
-                return T5Tokenizer.from_pretrained(Path(config.path) / "tokenizer_2", max_length=512)
-            case SubModelType.TextEncoder2:
-                te2_model_path = Path(config.path) / "text_encoder_2"
-                model_config = AutoConfig.from_pretrained(te2_model_path)
-                with accelerate.init_empty_weights():
-                    model = AutoModelForTextEncoding.from_config(model_config)
-                    model = quantize_model_llm_int8(model, modules_to_not_convert=set())
-
-                state_dict_path = te2_model_path / "bnb_llm_int8_model.safetensors"
-                state_dict = load_file(state_dict_path)
-                self._load_state_dict_into_t5(model, state_dict)
-
-                return model
-
-        raise ValueError(
-            f"Only Tokenizer and TextEncoder submodels are currently supported. Received: {submodel_type.value if submodel_type else 'None'}"
-        )
-
-    @classmethod
-    def _load_state_dict_into_t5(cls, model: T5EncoderModel, state_dict: dict[str, torch.Tensor]):
-        # There is a shared reference to a single weight tensor in the model.
-        # Both "encoder.embed_tokens.weight" and "shared.weight" refer to the same tensor, so only the latter should
-        # be present in the state_dict.
-        missing_keys, unexpected_keys = model.load_state_dict(state_dict, strict=False, assign=True)
-        assert len(unexpected_keys) == 0
-        assert set(missing_keys) == {"encoder.embed_tokens.weight"}
-        # Assert that the layers we expect to be shared are actually shared.
-        assert model.encoder.embed_tokens.weight is model.shared.weight
-
-
-@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.T5Encoder, format=ModelFormat.T5Encoder)
-class T5EncoderCheckpointModel(ModelLoader):
-    """Class to load T5 tokenizer and text encoder submodels."""
-
-    def _load_model(
-        self,
-        config: AnyModelConfig,
-        submodel_type: Optional[SubModelType] = None,
-    ) -> AnyModel:
-        if not isinstance(config, T5EncoderConfig):
-            raise ValueError("Only T5EncoderConfig models are currently supported here.")
-
-        match submodel_type:
-            case SubModelType.Tokenizer2:
-                return T5Tokenizer.from_pretrained(Path(config.path) / "tokenizer_2", max_length=512)
-            case SubModelType.TextEncoder2:
-                return T5EncoderModel.from_pretrained(Path(config.path) / "text_encoder_2")
-
-        raise ValueError(
-            f"Only Tokenizer and TextEncoder submodels are currently supported. Received: {submodel_type.value if submodel_type else 'None'}"
-        )
-
-
-@ModelLoaderRegistry.register(base=BaseModelType.Flux, type=ModelType.Main, format=ModelFormat.Checkpoint)
-class FluxCheckpointModel(ModelLoader):
-    """Class to load main models."""
-
-    def _load_model(
-        self,
-        config: AnyModelConfig,
-        submodel_type: Optional[SubModelType] = None,
-    ) -> AnyModel:
-        if not isinstance(config, CheckpointConfigBase):
-            raise ValueError("Only CheckpointConfigBase models are currently supported here.")
-
-        match submodel_type:
-            case SubModelType.Transformer:
-                return self._load_from_singlefile(config)
-
-        raise ValueError(
-            f"Only Transformer submodels are currently supported. Received: {submodel_type.value if submodel_type else 'None'}"
-        )
-
-    def _load_from_singlefile(
-        self,
-        config: AnyModelConfig,
-    ) -> AnyModel:
-        assert isinstance(config, MainCheckpointConfig)
-        model_path = Path(config.path)
-
-        with SilenceWarnings():
-            model = Flux(params[config.config_path])
-            sd = load_file(model_path)
-            model.load_state_dict(sd, assign=True)
-        return model
-
-
-@ModelLoaderRegistry.register(base=BaseModelType.Flux, type=ModelType.Main, format=ModelFormat.BnbQuantizednf4b)
-class FluxBnbQuantizednf4bCheckpointModel(ModelLoader):
-    """Class to load main models."""
-
-    def _load_model(
-        self,
-        config: AnyModelConfig,
-        submodel_type: Optional[SubModelType] = None,
-    ) -> AnyModel:
-        if not isinstance(config, CheckpointConfigBase):
-            raise ValueError("Only CheckpointConfigBase models are currently supported here.")
-
-        match submodel_type:
-            case SubModelType.Transformer:
-                return self._load_from_singlefile(config)
-
-        raise ValueError(
-            f"Only Transformer submodels are currently supported. Received: {submodel_type.value if submodel_type else 'None'}"
-        )
-
-    def _load_from_singlefile(
-        self,
-        config: AnyModelConfig,
-    ) -> AnyModel:
-        assert isinstance(config, MainBnbQuantized4bCheckpointConfig)
-        if not bnb_available:
-            raise ImportError(
-                "The bnb modules are not available. Please install bitsandbytes if available on your platform."
-            )
-        model_path = Path(config.path)
-
-        with SilenceWarnings():
-            with accelerate.init_empty_weights():
-                model = Flux(params[config.config_path])
-                model = quantize_model_nf4(model, modules_to_not_convert=set(), compute_dtype=torch.bfloat16)
-            sd = load_file(model_path)
-            model.load_state_dict(sd, assign=True)
-        return model
--- a/invokeai/backend/model_manager/load/model_loaders/generic_diffusers.py
+++ b/invokeai/backend/model_manager/load/model_loaders/generic_diffusers.py
@ -78,12 +78,7 @@ class GenericDiffusersLoader(ModelLoader):

    # TO DO: Add exception handling
    def _hf_definition_to_type(self, module: str, class_name: str) -> ModelMixin:  # fix with correct type
-        if module in [
-            "diffusers",
-            "transformers",
-            "invokeai.backend.quantization.fast_quantized_transformers_model",
-            "invokeai.backend.quantization.fast_quantized_diffusion_model",
-        ]:
+        if module in ["diffusers", "transformers"]:
            res_type = sys.modules[module]
        else:
            res_type = sys.modules["diffusers"].pipelines
--- a/invokeai/backend/model_manager/load/model_loaders/stable_diffusion.py
+++ b/invokeai/backend/model_manager/load/model_loaders/stable_diffusion.py
@ -11,6 +11,7 @@ from diffusers import (
    StableDiffusionXLPipeline,
 )

+import invokeai.backend.assets.sd_base_conf_files as conf_file_cache
 from invokeai.backend.model_manager import (
    AnyModel,
    AnyModelConfig,
@ -18,6 +19,7 @@ from invokeai.backend.model_manager import (
    ModelFormat,
    ModelType,
    ModelVariantType,
+    SchedulerPredictionType,
    SubModelType,
 )
 from invokeai.backend.model_manager.config import (
@ -36,18 +38,8 @@ VARIANT_TO_IN_CHANNEL_MAP = {
 }


-@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion1, type=ModelType.Main, format=ModelFormat.Diffusers)
-@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion2, type=ModelType.Main, format=ModelFormat.Diffusers)
-@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusionXL, type=ModelType.Main, format=ModelFormat.Diffusers)
-@ModelLoaderRegistry.register(
-    base=BaseModelType.StableDiffusionXLRefiner, type=ModelType.Main, format=ModelFormat.Diffusers
-)
-@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion1, type=ModelType.Main, format=ModelFormat.Checkpoint)
-@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusion2, type=ModelType.Main, format=ModelFormat.Checkpoint)
-@ModelLoaderRegistry.register(base=BaseModelType.StableDiffusionXL, type=ModelType.Main, format=ModelFormat.Checkpoint)
-@ModelLoaderRegistry.register(
-    base=BaseModelType.StableDiffusionXLRefiner, type=ModelType.Main, format=ModelFormat.Checkpoint
-)
+@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.Main, format=ModelFormat.Diffusers)
+@ModelLoaderRegistry.register(base=BaseModelType.Any, type=ModelType.Main, format=ModelFormat.Checkpoint)
 class StableDiffusionDiffusersModel(GenericDiffusersLoader):
    """Class to load main models."""

@ -112,13 +104,34 @@ class StableDiffusionDiffusersModel(GenericDiffusersLoader):
                ModelVariantType.Normal: StableDiffusionXLPipeline,
            },
        }
+        config_dirs = {
+            BaseModelType.StableDiffusion1: {
+                SchedulerPredictionType.Epsilon: "stable-diffusion-1.5-epsilon",
+                SchedulerPredictionType.VPrediction: "stable-diffusion-1.5-v_prediction",
+            },
+            BaseModelType.StableDiffusion2: {
+                SchedulerPredictionType.VPrediction: "stable-diffusion-2.0-v_prediction",
+            },
+            BaseModelType.StableDiffusionXL: {
+                SchedulerPredictionType.Epsilon: "stable-diffusion-xl-base-1.0",
+            },
+            BaseModelType.StableDiffusionXLRefiner: {
+                SchedulerPredictionType.Epsilon: "stable-diffusion-xl-refiner-1.0",
+            },
+        }
+
        assert isinstance(config, MainCheckpointConfig)
        try:
            load_class = load_classes[config.base][config.variant]
        except KeyError as e:
            raise Exception(f"No diffusers pipeline known for base={config.base}, variant={config.variant}") from e
-        prediction_type = config.prediction_type.value
-        upcast_attention = config.upcast_attention
+
+        try:
+            config_dir = config_dirs[config.base][config.prediction_type]
+        except KeyError as e:
+            raise Exception(
+                f"No configuration template known for base={config.base}, prediction_type={config.prediction_type}"
+            ) from e

        # Without SilenceWarnings we get log messages like this:
        # site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
@ -128,13 +141,16 @@ class StableDiffusionDiffusersModel(GenericDiffusersLoader):
        # Some weights of the model checkpoint were not used when initializing CLIPTextModelWithProjection:
        # ['text_model.embeddings.position_ids']

+        original_config_file = self._app_config.legacy_conf_path / config.config_path
+
        with SilenceWarnings():
            pipeline = load_class.from_single_file(
                config.path,
+                config=Path(conf_file_cache.__path__[0], config_dir).as_posix(),
+                original_config=original_config_file,
                torch_dtype=self._torch_dtype,
-                prediction_type=prediction_type,
-                upcast_attention=upcast_attention,
-                load_safety_checker=False,
+                local_files_only=True,
+                kwargs={"load_safety_checker": False},
            )

        if not submodel_type:
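A note on the `config_dirs` lookup added in the hunk above: the following is a minimal sketch of the two-level table lookup, not the loader itself. `Base`, `Prediction`, `CONFIG_DIRS`, and `template_dir` are hypothetical stand-ins for illustration; only the directory-name strings are taken from the patch, and only a subset of the table is reproduced.

```python
from enum import Enum


# Hypothetical stand-ins for InvokeAI's BaseModelType and SchedulerPredictionType.
class Base(Enum):
    SD1 = "sd-1"
    SD2 = "sd-2"
    SDXL = "sdxl"


class Prediction(Enum):
    EPSILON = "epsilon"
    V_PREDICTION = "v_prediction"


# Subset of the directory names from the config_dirs table in the hunk above.
CONFIG_DIRS = {
    Base.SD1: {
        Prediction.EPSILON: "stable-diffusion-1.5-epsilon",
        Prediction.V_PREDICTION: "stable-diffusion-1.5-v_prediction",
    },
    Base.SD2: {Prediction.V_PREDICTION: "stable-diffusion-2.0-v_prediction"},
    Base.SDXL: {Prediction.EPSILON: "stable-diffusion-xl-base-1.0"},
}


def template_dir(base: Base, prediction: Prediction) -> str:
    """Return the template directory name, or raise if the pair is not in the table."""
    try:
        return CONFIG_DIRS[base][prediction]
    except KeyError as e:
        # The patch raises a bare Exception here; ValueError is used only in this sketch.
        raise ValueError(
            f"No configuration template known for base={base}, prediction_type={prediction}"
        ) from e
```

With the table as written in the patch, a lookup for an SD 2.x model with epsilon prediction has no entry and takes the error path, since only a v-prediction template is listed for that base.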
--- a/invokeai/backend/model_manager/load/model_util.py
+++ b/invokeai/backend/model_manager/load/model_util.py
@ -9,7 +9,7 @@ from typing import Optional
 import torch
 from diffusers.pipelines.pipeline_utils import DiffusionPipeline
 from diffusers.schedulers.scheduling_utils import SchedulerMixin
-from transformers import CLIPTokenizer, T5Tokenizer, T5TokenizerFast
+from transformers import CLIPTokenizer

 from invokeai.backend.image_util.depth_anything.depth_anything_pipeline import DepthAnythingPipeline
 from invokeai.backend.image_util.grounding_dino.grounding_dino_pipeline import GroundingDinoPipeline
@ -50,17 +50,6 @@ def calc_model_size_by_data(logger: logging.Logger, model: AnyModel) -> int:
        ),
    ):
        return model.calc_size()
-    elif isinstance(
-        model,
-        (
-            T5TokenizerFast,
-            T5Tokenizer,
-        ),
-    ):
-        # HACK(ryand): len(model) just returns the vocabulary size, so this is blatantly wrong. It should be small
-        # relative to the text encoder that it's used with, so shouldn't matter too much, but we should fix this at some
-        # point.
-        return len(model)
    else:
        # TODO(ryand): Promote this from a log to an exception once we are confident that we are handling all of the
        # supported model types.
--- a/invokeai/backend/model_manager/probe.py
+++ b/invokeai/backend/model_manager/probe.py
@ -95,7 +95,6 @@ class ModelProbe(object):
    }

    CLASS2TYPE = {
-        "FluxPipeline": ModelType.Main,
        "StableDiffusionPipeline": ModelType.Main,
        "StableDiffusionInpaintPipeline": ModelType.Main,
        "StableDiffusionXLPipeline": ModelType.Main,
@ -107,7 +106,6 @@ class ModelProbe(object):
        "ControlNetModel": ModelType.ControlNet,
        "CLIPVisionModelWithProjection": ModelType.CLIPVision,
        "T2IAdapter": ModelType.T2IAdapter,
-        "CLIPModel": ModelType.CLIPEmbed,
    }

    @classmethod
@ -163,7 +161,7 @@ class ModelProbe(object):
        fields["description"] = (
            fields.get("description") or f"{fields['base'].value} {model_type.value} model {fields['name']}"
        )
-        fields["format"] = ModelFormat(fields.get("format")) if "format" in fields else probe.get_format()
+        fields["format"] = fields.get("format") or probe.get_format()
        fields["hash"] = fields.get("hash") or ModelHash(algorithm=hash_algo).hash(model_path)

        fields["default_settings"] = fields.get("default_settings")
@ -178,10 +176,10 @@ class ModelProbe(object):
            fields["repo_variant"] = fields.get("repo_variant") or probe.get_repo_variant()

        # additional fields needed for main and controlnet models
-        if fields["type"] in [ModelType.Main, ModelType.ControlNet, ModelType.VAE] and fields["format"] in [
-            ModelFormat.Checkpoint,
-            ModelFormat.BnbQuantizednf4b,
-        ]:
+        if (
+            fields["type"] in [ModelType.Main, ModelType.ControlNet, ModelType.VAE]
+            and fields["format"] is ModelFormat.Checkpoint
+        ):
            ckpt_config_path = cls._get_checkpoint_config_path(
                model_path,
                model_type=fields["type"],
@ -224,8 +222,7 @@ class ModelProbe(object):
        ckpt = ckpt.get("state_dict", ckpt)

        for key in [str(k) for k in ckpt.keys()]:
-            if key.startswith(("cond_stage_model.", "first_stage_model.", "model.diffusion_model.", "double_blocks.")):
-                # Keys starting with double_blocks are associated with Flux models
+            if key.startswith(("cond_stage_model.", "first_stage_model.", "model.diffusion_model.")):
                return ModelType.Main
            elif key.startswith(("encoder.conv_in", "decoder.conv_in")):
                return ModelType.VAE
@ -324,27 +321,10 @@ class ModelProbe(object):
            return possible_conf.absolute()

        if model_type is ModelType.Main:
-            if base_type == BaseModelType.Flux:
-                # TODO: Decide between dev/schnell
-                checkpoint = ModelProbe._scan_and_load_checkpoint(model_path)
-                state_dict = checkpoint.get("state_dict") or checkpoint
-                if "guidance_in.out_layer.weight" in state_dict:
-                    # For flux, this is a key in invokeai.backend.flux.util.params
-                    #   Due to model type and format being the discriminator for model configs this
-                    #   is used rather than attempting to support flux with separate model types and format
-                    #   If changed in the future, please fix me
-                    config_file = "flux-dev"
-                else:
-                    # For flux, this is a key in invokeai.backend.flux.util.params
-                    #   Due to model type and format being the discriminator for model configs this
-                    #   is used rather than attempting to support flux with separate model types and format
-                    #   If changed in the future, please fix me
-                    config_file = "flux-schnell"
-            else:
-                config_file = LEGACY_CONFIGS[base_type][variant_type]
-                if isinstance(config_file, dict):  # need another tier for sd-2.x models
-                    config_file = config_file[prediction_type]
-                config_file = f"stable-diffusion/{config_file}"
+            config_file = LEGACY_CONFIGS[base_type][variant_type]
+            if isinstance(config_file, dict):  # need another tier for sd-2.x models
+                config_file = config_file[prediction_type]
+            config_file = f"stable-diffusion/{config_file}"
        elif model_type is ModelType.ControlNet:
            config_file = (
                "controlnet/cldm_v15.yaml"
@ -353,13 +333,7 @@ class ModelProbe(object):
            )
        elif model_type is ModelType.VAE:
            config_file = (
-                # For flux, this is a key in invokeai.backend.flux.util.ae_params
-                #   Due to model type and format being the discriminator for model configs this
-                #   is used rather than attempting to support flux with separate model types and format
-                #   If changed in the future, please fix me
-                "flux"
-                if base_type is BaseModelType.Flux
-                else "stable-diffusion/v1-inference.yaml"
+                "stable-diffusion/v1-inference.yaml"
                if base_type is BaseModelType.StableDiffusion1
                else "stable-diffusion/sd_xl_base.yaml"
                if base_type is BaseModelType.StableDiffusionXL
@ -442,15 +416,11 @@ class CheckpointProbeBase(ProbeBase):
        self.checkpoint = ModelProbe._scan_and_load_checkpoint(model_path)

    def get_format(self) -> ModelFormat:
-        state_dict = self.checkpoint.get("state_dict") or self.checkpoint
-        if "double_blocks.0.img_attn.proj.weight.quant_state.bitsandbytes__nf4" in state_dict:
-            return ModelFormat.BnbQuantizednf4b
        return ModelFormat("checkpoint")

    def get_variant_type(self) -> ModelVariantType:
        model_type = ModelProbe.get_model_type_from_checkpoint(self.model_path, self.checkpoint)
-        base_type = self.get_base_type()
-        if model_type != ModelType.Main or base_type == BaseModelType.Flux:
+        if model_type != ModelType.Main:
            return ModelVariantType.Normal
        state_dict = self.checkpoint.get("state_dict") or self.checkpoint
        in_channels = state_dict["model.diffusion_model.input_blocks.0.0.weight"].shape[1]
@ -470,8 +440,6 @@ class PipelineCheckpointProbe(CheckpointProbeBase):
    def get_base_type(self) -> BaseModelType:
        checkpoint = self.checkpoint
        state_dict = self.checkpoint.get("state_dict") or checkpoint
-        if "double_blocks.0.img_attn.norm.key_norm.scale" in state_dict:
-            return BaseModelType.Flux
        key_name = "model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight"
        if key_name in state_dict and state_dict[key_name].shape[-1] == 768:
            return BaseModelType.StableDiffusion1
@ -514,7 +482,6 @@ class VaeCheckpointProbe(CheckpointProbeBase):
            (r"xl", BaseModelType.StableDiffusionXL),
            (r"sd2", BaseModelType.StableDiffusion2),
            (r"vae", BaseModelType.StableDiffusion1),
-            (r"FLUX.1-schnell_ae", BaseModelType.Flux),
        ]:
            if re.search(regexp, self.model_path.name, re.IGNORECASE):
                return basetype
@ -746,11 +713,6 @@ class TextualInversionFolderProbe(FolderProbeBase):
        return TextualInversionCheckpointProbe(path).get_base_type()


-class T5EncoderFolderProbe(FolderProbeBase):
-    def get_format(self) -> ModelFormat:
-        return ModelFormat.T5Encoder
-
-
 class ONNXFolderProbe(PipelineFolderProbe):
    def get_base_type(self) -> BaseModelType:
        # Due to the way the installer is set up, the configuration file for safetensors
@ -843,11 +805,6 @@ class CLIPVisionFolderProbe(FolderProbeBase):
        return BaseModelType.Any


-class CLIPEmbedFolderProbe(FolderProbeBase):
-    def get_base_type(self) -> BaseModelType:
-        return BaseModelType.Any
-
-
 class SpandrelImageToImageFolderProbe(FolderProbeBase):
    def get_base_type(self) -> BaseModelType:
        raise NotImplementedError()
@ -878,10 +835,8 @@ ModelProbe.register_probe("diffusers", ModelType.Main, PipelineFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.VAE, VaeFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.LoRA, LoRAFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.TextualInversion, TextualInversionFolderProbe)
-ModelProbe.register_probe("diffusers", ModelType.T5Encoder, T5EncoderFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.ControlNet, ControlNetFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.IPAdapter, IPAdapterFolderProbe)
-ModelProbe.register_probe("diffusers", ModelType.CLIPEmbed, CLIPEmbedFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.CLIPVision, CLIPVisionFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.T2IAdapter, T2IAdapterFolderProbe)
 ModelProbe.register_probe("diffusers", ModelType.SpandrelImageToImage, SpandrelImageToImageFolderProbe)
--- a/invokeai/backend/model_manager/starter_models.py
+++ b/invokeai/backend/model_manager/starter_models.py
@ -2,7 +2,7 @@ from typing import Optional

 from pydantic import BaseModel

-from invokeai.backend.model_manager.config import BaseModelType, ModelFormat, ModelType
+from invokeai.backend.model_manager.config import BaseModelType, ModelType


 class StarterModelWithoutDependencies(BaseModel):
@ -11,7 +11,6 @@ class StarterModelWithoutDependencies(BaseModel):
    name: str
    base: BaseModelType
    type: ModelType
-    format: Optional[ModelFormat] = None
    is_installed: bool = False


@ -52,76 +51,10 @@ cyberrealistic_negative = StarterModel(
    type=ModelType.TextualInversion,
 )

-t5_base_encoder = StarterModel(
-    name="t5_base_encoder",
-    base=BaseModelType.Any,
-    source="InvokeAI/t5-v1_1-xxl::bfloat16",
-    description="T5-XXL text encoder (used in FLUX pipelines). ~8GB",
-    type=ModelType.T5Encoder,
-)
-
-t5_8b_quantized_encoder = StarterModel(
-    name="t5_bnb_int8_quantized_encoder",
-    base=BaseModelType.Any,
-    source="InvokeAI/t5-v1_1-xxl::bnb_llm_int8",
-    description="T5-XXL text encoder with bitsandbytes LLM.int8() quantization (used in FLUX pipelines). ~5GB",
-    type=ModelType.T5Encoder,
-    format=ModelFormat.BnbQuantizedLlmInt8b,
-)
-
-clip_l_encoder = StarterModel(
-    name="clip-vit-large-patch14",
-    base=BaseModelType.Any,
-    source="InvokeAI/clip-vit-large-patch14-text-encoder::bfloat16",
-    description="CLIP-L text encoder (used in FLUX pipelines). ~250MB",
-    type=ModelType.CLIPEmbed,
-)
-
-flux_vae = StarterModel(
-    name="FLUX.1-schnell_ae",
-    base=BaseModelType.Flux,
-    source="black-forest-labs/FLUX.1-schnell::ae.safetensors",
-    description="FLUX VAE compatible with both schnell and dev variants.",
-    type=ModelType.VAE,
-)
-
-
 # List of starter models, displayed on the frontend.
 # The order/sort of this list is not changed by the frontend - set it how you want it here.
 STARTER_MODELS: list[StarterModel] = [
    # region: Main
-    StarterModel(
-        name="FLUX Schnell (Quantized)",
-        base=BaseModelType.Flux,
-        source="InvokeAI/flux_schnell::transformer/bnb_nf4/flux1-schnell-bnb_nf4.safetensors",
-        description="FLUX schnell transformer quantized to bitsandbytes NF4 format. Total size with dependencies: ~12GB",
-        type=ModelType.Main,
-        dependencies=[t5_8b_quantized_encoder, flux_vae, clip_l_encoder],
-    ),
-    StarterModel(
-        name="FLUX Dev (Quantized)",
-        base=BaseModelType.Flux,
-        source="InvokeAI/flux_dev::transformer/bnb_nf4/flux1-dev-bnb_nf4.safetensors",
-        description="FLUX dev transformer quantized to bitsandbytes NF4 format. Total size with dependencies: ~12GB",
-        type=ModelType.Main,
-        dependencies=[t5_8b_quantized_encoder, flux_vae, clip_l_encoder],
-    ),
-    StarterModel(
-        name="FLUX Schnell",
-        base=BaseModelType.Flux,
-        source="InvokeAI/flux_schnell::transformer/base/flux1-schnell.safetensors",
-        description="FLUX schnell transformer in bfloat16. Total size with dependencies: ~33GB",
-        type=ModelType.Main,
-        dependencies=[t5_base_encoder, flux_vae, clip_l_encoder],
-    ),
-    StarterModel(
-        name="FLUX Dev",
-        base=BaseModelType.Flux,
-        source="InvokeAI/flux_dev::transformer/base/flux1-dev.safetensors",
-        description="FLUX dev transformer in bfloat16. Total size with dependencies: ~33GB",
-        type=ModelType.Main,
-        dependencies=[t5_base_encoder, flux_vae, clip_l_encoder],
-    ),
    StarterModel(
        name="CyberRealistic v4.1",
        base=BaseModelType.StableDiffusion1,
@ -192,7 +125,6 @@ STARTER_MODELS: list[StarterModel] = [
    # endregion
    # region VAE
    sdxl_fp16_vae_fix,
-    flux_vae,
    # endregion
    # region LoRA
    StarterModel(
@ -518,11 +450,6 @@ STARTER_MODELS: list[StarterModel] = [
        type=ModelType.SpandrelImageToImage,
    ),
    # endregion
-    # region TextEncoders
-    t5_base_encoder,
-    t5_8b_quantized_encoder,
-    clip_l_encoder,
-    # endregion
 ]

 assert len(STARTER_MODELS) == len({m.source for m in STARTER_MODELS}), "Duplicate starter models"
--- a/invokeai/backend/model_manager/util/select_hf_files.py
+++ b/invokeai/backend/model_manager/util/select_hf_files.py
@ -54,7 +54,6 @@ def filter_files(
                "lora_weights.safetensors",
                "weights.pb",
                "onnx_data",
-                "spiece.model",  # Added for `black-forest-labs/FLUX.1-schnell`.
            )
        ):
            paths.append(file)
@ -63,13 +62,13 @@ def filter_files(
        # downloading random checkpoints that might also be in the repo. However there is no guarantee
        # that a checkpoint doesn't contain "model" in its name, and no guarantee that future diffusers models
        # will adhere to this naming convention, so this is an area to be careful of.
-        elif re.search(r"model.*\.(safetensors|bin|onnx|xml|pth|pt|ckpt|msgpack)$", file.name):
+        elif re.search(r"model(\.[^.]+)?\.(safetensors|bin|onnx|xml|pth|pt|ckpt|msgpack)$", file.name):
            paths.append(file)

    # limit search to subfolder if requested
    if subfolder:
        subfolder = root / subfolder
-        paths = [x for x in paths if Path(subfolder) in x.parents]
+        paths = [x for x in paths if x.parent == Path(subfolder)]

    # _filter_by_variant uniquifies the paths and returns a set
    return sorted(_filter_by_variant(paths, variant))
@ -98,9 +97,7 @@ def _filter_by_variant(files: List[Path], variant: ModelRepoVariant) -> Set[Path
            if variant == ModelRepoVariant.Flax:
                result.add(path)

-        # Note: '.model' was added to support:
-        # https://huggingface.co/black-forest-labs/FLUX.1-schnell/blob/768d12a373ed5cc9ef9a9dea7504dc09fcc14842/tokenizer_2/spiece.model
-        elif path.suffix in [".json", ".txt", ".model"]:
+        elif path.suffix in [".json", ".txt"]:
            result.add(path)

        elif variant in [
@ -143,23 +140,6 @@ def _filter_by_variant(files: List[Path], variant: ModelRepoVariant) -> Set[Path
            continue

    for candidate_list in subfolder_weights.values():
-        # Check if at least one of the files has the explicit fp16 variant.
-        at_least_one_fp16 = False
-        for candidate in candidate_list:
-            if len(candidate.path.suffixes) == 2 and candidate.path.suffixes[0] == ".fp16":
-                at_least_one_fp16 = True
-                break
-
-        if not at_least_one_fp16:
-            # If none of the candidates in this candidate_list have the explicit fp16 variant label, then this
-            # candidate_list probably doesn't adhere to the variant naming convention that we expected. In this case,
-            # we'll simply keep all the candidates. An example of a model that hits this case is
-            # `black-forest-labs/FLUX.1-schnell` (as of commit 012d2fd).
-            for candidate in candidate_list:
-                result.add(candidate.path)
-
-        # The candidate_list seems to have the expected variant naming convention. We'll select the highest scoring
-        # candidate.
        highest_score_candidate = max(candidate_list, key=lambda candidate: candidate.score)
        if highest_score_candidate:
            result.add(highest_score_candidate.path)
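A note on the two `filter_files()` changes above: the sketch below compares the removed and added forms of the file-name regex and of the subfolder check. The two patterns are copied from the hunk; the file names, the `repo` directory layout, and the variable names are made-up assumptions chosen only to show where the behavior differs.

```python
import re
from pathlib import Path

# Patterns copied from the removed (-) and added (+) lines of the hunk above.
OLD = re.compile(r"model.*\.(safetensors|bin|onnx|xml|pth|pt|ckpt|msgpack)$")
NEW = re.compile(r"model(\.[^.]+)?\.(safetensors|bin|onnx|xml|pth|pt|ckpt|msgpack)$")

# Hypothetical file names, chosen only to exercise the patterns.
names = [
    "model.safetensors",
    "model.fp16.safetensors",
    "model-00001-of-00002.safetensors",
    "diffusion_pytorch_model.fp16.bin",
]
old_hits = [n for n in names if OLD.search(n)]
new_hits = [n for n in names if NEW.search(n)]

# Hypothetical repo layout for the subfolder check.
root = Path("repo")
subfolder = root / "text_encoder"
paths = [
    root / "text_encoder" / "model.safetensors",
    root / "text_encoder" / "nested" / "model.safetensors",
    root / "unet" / "model.safetensors",
]
# Removed form: keeps anything located anywhere below the subfolder.
old_selected = [p for p in paths if subfolder in p.parents]
# Added form: keeps only files that sit directly inside the subfolder.
new_selected = [p for p in paths if p.parent == subfolder]
```

Under these assumptions the added regex accepts at most one dotted segment between `model` and the extension, so the sharded name `model-00001-of-00002.safetensors` matches only the removed pattern, and the added subfolder check drops the file in `text_encoder/nested`.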
--- a/invokeai/backend/quantization/__init__.py
+++ b/invokeai/backend/quantization/__init__.py
--- a/invokeai/backend/quantization/bnb_llm_int8.py
+++ b/invokeai/backend/quantization/bnb_llm_int8.py
@ -1,125 +0,0 @@
-import bitsandbytes as bnb
-import torch
-
-# This file contains utils for working with models that use bitsandbytes LLM.int8() quantization.
-# The utils in this file are partially inspired by:
-# https://github.com/Lightning-AI/pytorch-lightning/blob/1551a16b94f5234a4a78801098f64d0732ef5cb5/src/lightning/fabric/plugins/precision/bitsandbytes.py
-
-
-# NOTE(ryand): All of the custom state_dict manipulation logic in this file is pretty hacky. This could be made much
-# cleaner by re-implementing bnb.nn.Linear8bitLt with proper use of buffers and less magic. But, for now, we try to
-# stick close to the bitsandbytes classes to make interoperability easier with other models that might use bitsandbytes.
-
-
-class InvokeInt8Params(bnb.nn.Int8Params):
-    """We override cuda() to avoid re-quantizing the weights in the following cases:
-    - We loaded quantized weights from a state_dict on the cpu, and then moved the model to the gpu.
-    - We are moving the model back-and-forth between the cpu and gpu.
-    """
-
-    def cuda(self, device):
-        if self.has_fp16_weights:
-            return super().cuda(device)
-        elif self.CB is not None and self.SCB is not None:
-            self.data = self.data.cuda()
-            self.CB = self.data
-            self.SCB = self.SCB.cuda()
-        else:
-            # we store the 8-bit row-major weight
-            # we convert this weight to the Turing/Ampere weight during the first inference pass
-            B = self.data.contiguous().half().cuda(device)
-            CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
-            del CBt
-            del SCBt
-            self.data = CB
-            self.CB = CB
-            self.SCB = SCB
-
-        return self
-
-
-class InvokeLinear8bitLt(bnb.nn.Linear8bitLt):
-    def _load_from_state_dict(
-        self,
-        state_dict: dict[str, torch.Tensor],
-        prefix: str,
-        local_metadata,
-        strict,
-        missing_keys,
-        unexpected_keys,
-        error_msgs,
-    ):
-        weight = state_dict.pop(prefix + "weight")
-        bias = state_dict.pop(prefix + "bias", None)
-
-        # See `bnb.nn.Linear8bitLt._save_to_state_dict()` for the serialization logic of SCB and weight_format.
-        scb = state_dict.pop(prefix + "SCB", None)
-        # weight_format is unused, but we pop it so we can validate that there are no unexpected keys.
-        _weight_format = state_dict.pop(prefix + "weight_format", None)
-
-        # TODO(ryand): Technically, we should be using `strict`, `missing_keys`, `unexpected_keys`, and `error_msgs`
-        # rather than raising an exception to correctly implement this API.
-        assert len(state_dict) == 0
-
-        if scb is not None:
-            # We are loading a pre-quantized state dict.
-            self.weight = InvokeInt8Params(
-                data=weight,
-                requires_grad=self.weight.requires_grad,
-                has_fp16_weights=False,
-                # Note: After quantization, CB is the same as weight.
-                CB=weight,
-                SCB=scb,
-            )
-            self.bias = bias if bias is None else torch.nn.Parameter(bias)
-        else:
-            # We are loading a non-quantized state dict.
-
-            # We could simply call the `super()._load_from_state_dict()` method here, but then we wouldn't be able to
-            # load from a state_dict into a model on the "meta" device. Attempting to load into a model on the "meta"
-            # device requires setting `assign=True`. Doing this with the default `super()._load_from_state_dict()`
-            # implementation causes `Int8Params` to be replaced by a `torch.nn.Parameter`. By initializing a new
-            # `InvokeInt8Params` object, we work around this issue. It's a bit hacky, but it gets the job done.
-            self.weight = InvokeInt8Params(
-                data=weight,
-                requires_grad=self.weight.requires_grad,
-                has_fp16_weights=False,
-                CB=None,
-                SCB=None,
-            )
-            self.bias = bias if bias is None else torch.nn.Parameter(bias)
-
-
-def _convert_linear_layers_to_llm_8bit(
-    module: torch.nn.Module, ignore_modules: set[str], outlier_threshold: float, prefix: str = ""
-) -> None:
-    """Convert all linear layers in the module to bnb.nn.Linear8bitLt layers."""
-    for name, child in module.named_children():
-        fullname = f"{prefix}.{name}" if prefix else name
-        if isinstance(child, torch.nn.Linear) and not any(fullname.startswith(s) for s in ignore_modules):
-            has_bias = child.bias is not None
-            replacement = InvokeLinear8bitLt(
-                child.in_features,
-                child.out_features,
-                bias=has_bias,
-                has_fp16_weights=False,
-                threshold=outlier_threshold,
-            )
-            replacement.weight.data = child.weight.data
-            if has_bias:
-                replacement.bias.data = child.bias.data
-            replacement.requires_grad_(False)
-            module.__setattr__(name, replacement)
-        else:
-            _convert_linear_layers_to_llm_8bit(
-                child, ignore_modules, outlier_threshold=outlier_threshold, prefix=fullname
-            )
-
-
-def quantize_model_llm_int8(model: torch.nn.Module, modules_to_not_convert: set[str], outlier_threshold: float = 6.0):
-    """Apply bitsandbytes LLM.int8() quantization to the model."""
-    _convert_linear_layers_to_llm_8bit(
-        module=model, ignore_modules=modules_to_not_convert, outlier_threshold=outlier_threshold
-    )
-
-    return model
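
Note on `CB` and `SCB`: the removed `InvokeInt8Params` class above moves these two tensors between devices without re-quantizing. As a rough illustration of what they represent, here is a minimal, dependency-free sketch of row-wise absmax int8 quantization. It assumes `CB` holds the int8 codes, `SCB` holds the per-row absolute maxima, and dequantization is `CB * SCB / 127`. The names `quantize_rowwise` and `dequantize_rowwise` are hypothetical, and the sketch leaves out the outlier handling controlled by `outlier_threshold`; it is not the bitsandbytes implementation.

```python
def quantize_rowwise(weight: list[list[float]]) -> tuple[list[list[int]], list[float]]:
    """Quantize each row to int8 codes; return (codes, per-row absmax)."""
    codes: list[list[int]] = []
    scales: list[float] = []
    for row in weight:
        absmax = max(abs(v) for v in row)
        # Guard against an all-zero row so we never divide by zero.
        divisor = absmax if absmax > 0 else 1.0
        codes.append([round(127 * v / divisor) for v in row])
        scales.append(absmax)
    return codes, scales


def dequantize_rowwise(codes: list[list[int]], scales: list[float]) -> list[list[float]]:
    """Approximate the original weights from the codes and per-row scales."""
    return [[c * s / 127 for c in row] for row, s in zip(codes, scales)]


weight = [[0.5, -1.0, 0.25], [2.0, 0.0, -4.0]]
cb, scb = quantize_rowwise(weight)   # stand-ins for CB and SCB
restored = dequantize_rowwise(cb, scb)
```

Because the codes and scales fully determine the approximate weights, a state dict that already contains both can be loaded and moved to another device as-is, which appears to be the case the `cuda()` override is written for.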
--- a/invokeai/backend/quantization/bnb_nf4.py
+++ b/invokeai/backend/quantization/bnb_nf4.py
@ -1,156 +0,0 @@
-import bitsandbytes as bnb
-import torch
-
-# This file contains utils for working with models that use bitsandbytes NF4 quantization.
-# The utils in this file are partially inspired by:
-# https://github.com/Lightning-AI/pytorch-lightning/blob/1551a16b94f5234a4a78801098f64d0732ef5cb5/src/lightning/fabric/plugins/precision/bitsandbytes.py
-
-# NOTE(ryand): All of the custom state_dict manipulation logic in this file is pretty hacky. This could be made much
-# cleaner by re-implementing bnb.nn.LinearNF4 with proper use of buffers and less magic. But, for now, we try to stick
-# close to the bitsandbytes classes to make interoperability easier with other models that might use bitsandbytes.
-
-
-class InvokeLinearNF4(bnb.nn.LinearNF4):
-    """A class that extends `bnb.nn.LinearNF4` to add the following functionality:
-    - Ability to load Linear NF4 layers from a pre-quantized state_dict.
-    - Ability to load Linear NF4 layers from a state_dict when the model is on the "meta" device.
-    """
-
-    def _load_from_state_dict(
-        self,
-        state_dict: dict[str, torch.Tensor],
-        prefix: str,
-        local_metadata,
-        strict,
-        missing_keys,
-        unexpected_keys,
-        error_msgs,
-    ):
-        """This method is based on the logic in the bitsandbytes serialization unit tests for `Linear4bit`:
-        https://github.com/bitsandbytes-foundation/bitsandbytes/blob/6d714a5cce3db5bd7f577bc447becc7a92d5ccc7/tests/test_linear4bit.py#L52-L71
-        """
-        weight = state_dict.pop(prefix + "weight")
-        bias = state_dict.pop(prefix + "bias", None)
-        # We expect the remaining keys to be quant_state keys.
-        quant_state_sd = state_dict
-
-        # During serialization, the quant_state is stored as subkeys of "weight." (See
-        # `bnb.nn.LinearNF4._save_to_state_dict()`). We validate that they at least have the correct prefix.
-        # TODO(ryand): Technically, we should be using `strict`, `missing_keys`, `unexpected_keys`, and `error_msgs`
-        # rather than raising an exception to correctly implement this API.
-        assert all(k.startswith(prefix + "weight.") for k in quant_state_sd.keys())
-
-        if len(quant_state_sd) > 0:
-            # We are loading a pre-quantized state dict.
-            self.weight = bnb.nn.Params4bit.from_prequantized(
-                data=weight, quantized_stats=quant_state_sd, device=weight.device
-            )
-            self.bias = bias if bias is None else torch.nn.Parameter(bias, requires_grad=False)
-        else:
-            # We are loading a non-quantized state dict.
-
-            # We could simply call the `super()._load_from_state_dict()` method here, but then we wouldn't be able to
-            # load from a state_dict into a model on the "meta" device. Attempting to load into a model on the "meta"
-            # device requires setting `assign=True`. Doing this with the default `super()._load_from_state_dict()`
-            # implementation causes `Params4bit` to be replaced by a `torch.nn.Parameter`. By initializing a new
-            # `Params4bit` object, we work around this issue. It's a bit hacky, but it gets the job done.
-            self.weight = bnb.nn.Params4bit(
-                data=weight,
-                requires_grad=self.weight.requires_grad,
-                compress_statistics=self.weight.compress_statistics,
-                quant_type=self.weight.quant_type,
-                quant_storage=self.weight.quant_storage,
-                module=self,
-            )
-            self.bias = bias if bias is None else torch.nn.Parameter(bias)
-
-
-def _replace_param(
-    param: torch.nn.Parameter | bnb.nn.Params4bit,
-    data: torch.Tensor,
-) -> torch.nn.Parameter:
-    """A helper function to replace the data of a model parameter with new data in a way that allows replacing params on
-    the "meta" device.
-
-    Supports both `torch.nn.Parameter` and `bnb.nn.Params4bit` parameters.
-    """
-    if param.device.type == "meta":
-        # Doing `param.data = data` raises a RuntimeError if param.data was on the "meta" device, so we need to
-        # re-create the param instead of overwriting the data.
-        if isinstance(param, bnb.nn.Params4bit):
-            return bnb.nn.Params4bit(
-                data,
-                requires_grad=data.requires_grad,
-                quant_state=param.quant_state,
-                compress_statistics=param.compress_statistics,
-                quant_type=param.quant_type,
-            )
-        return torch.nn.Parameter(data, requires_grad=data.requires_grad)
-
-    param.data = data
-    return param
-
-
-def _convert_linear_layers_to_nf4(
-    module: torch.nn.Module,
-    ignore_modules: set[str],
-    compute_dtype: torch.dtype,
-    compress_statistics: bool = False,
-    prefix: str = "",
-) -> None:
-    """Convert all linear layers in the model to NF4 quantized linear layers.
-
-    Args:
-        module: All linear layers in this module will be converted.
-        ignore_modules: A set of module prefixes to ignore when converting linear layers.
-        compute_dtype: The dtype to use for computation in the quantized linear layers.
-        compress_statistics: Whether to enable nested quantization (aka double quantization) where the quantization
-           constants from the first quantization are quantized again.
-        prefix: The prefix of the current module in the model. Used to call this function recursively.
-    """
-    for name, child in module.named_children():
-        fullname = f"{prefix}.{name}" if prefix else name
-        if isinstance(child, torch.nn.Linear) and not any(fullname.startswith(s) for s in ignore_modules):
-            has_bias = child.bias is not None
-            replacement = InvokeLinearNF4(
-                child.in_features,
-                child.out_features,
-                bias=has_bias,
-                compute_dtype=compute_dtype,
-                compress_statistics=compress_statistics,
-            )
-            if has_bias:
-                replacement.bias = _replace_param(replacement.bias, child.bias.data)
-            replacement.weight = _replace_param(replacement.weight, child.weight.data)
-            replacement.requires_grad_(False)
-            module.__setattr__(name, replacement)
-        else:
-            _convert_linear_layers_to_nf4(
-                child,
-                ignore_modules,
-                compute_dtype=compute_dtype,
-                compress_statistics=compress_statistics,
-                prefix=fullname,
-            )
-
-
-def quantize_model_nf4(model: torch.nn.Module, modules_to_not_convert: set[str], compute_dtype: torch.dtype):
-    """Apply bitsandbytes nf4 quantization to the model.
-
-    You likely want to call this function inside an `accelerate.init_empty_weights()` context.
-
-    Example usage:
-    ```
-    # Initialize the model from a config on the meta device.
-    with accelerate.init_empty_weights():
-        model = ModelClass.from_config(...)
-
-    # Add NF4 quantization linear layers to the model - still on the meta device.
-    with accelerate.init_empty_weights():
-        model = quantize_model_nf4(model, modules_to_not_convert=set(), compute_dtype=torch.float16)
-
-    # Load a state_dict into the model. (Could be either a prequantized or non-quantized state_dict.)
-    model.load_state_dict(state_dict, strict=True, assign=True)
-
-    # Move the model to the "cuda" device. If the model was non-quantized, this is where the weight quantization takes
-    # place.
-    model.to("cuda")
-    ```
-    """
-    _convert_linear_layers_to_nf4(module=model, ignore_modules=modules_to_not_convert, compute_dtype=compute_dtype)
-
-    return model
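
Note on `modules_to_not_convert`: both removed converters walk the module tree recursively and skip any linear layer whose dotted name starts with an ignored prefix. The sketch below reproduces only that traversal, using a hypothetical `Node` dataclass as a stand-in for `torch.nn.Module` so that it runs without PyTorch or bitsandbytes. `Node`, `convert_linear_nodes`, and the `"quantized"` marker are illustrative assumptions, not part of the removed code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for torch.nn.Module: a kind plus named children."""

    kind: str = "container"
    children: dict[str, "Node"] = field(default_factory=dict)


def convert_linear_nodes(node: Node, ignore_prefixes: set[str], prefix: str = "") -> list[str]:
    """Replace every non-ignored "linear" child and return the converted names."""
    converted: list[str] = []
    for name, child in node.children.items():
        fullname = f"{prefix}.{name}" if prefix else name
        is_ignored = any(fullname.startswith(p) for p in ignore_prefixes)
        if child.kind == "linear" and not is_ignored:
            # Swap the child in place, as the converters do via __setattr__.
            node.children[name] = Node(kind="quantized")
            converted.append(fullname)
        else:
            # Recurse, carrying the dotted prefix down the tree.
            converted.extend(convert_linear_nodes(child, ignore_prefixes, prefix=fullname))
    return converted


model = Node(
    children={
        "proj": Node(kind="linear"),
        "proj_out": Node(kind="linear"),
        "block": Node(children={"fc": Node(kind="linear"), "norm": Node(kind="norm")}),
    }
)
converted = convert_linear_nodes(model, ignore_prefixes={"proj"})
```

Since the match is a plain `startswith`, ignoring `"proj"` in this sketch also skips `"proj_out"`. The same would hold for the prefix check in the removed functions, so entries in `modules_to_not_convert` behave as string prefixes, not exact module names.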
Author	SHA1	Message	Date
Lincoln Stein	4556343fa6	use correct controlnet config file	2024-08-27 11:39:34 -04:00
Lincoln Stein	5daaaa3b70	Merge remote-tracking branch 'refs/remotes/origin/lstein/feat/diffusers-v0.30' into lstein/feat/diffusers-v0.30	2024-08-17 15:58:58 -04:00
Lincoln Stein	7a9a1694a4	pass configuration templates to from_single_file() using the config option	2024-08-17 15:57:02 -04:00
Lincoln Stein	5b296d3c87	Merge branch 'main' into lstein/feat/diffusers-v0.30	2024-08-17 14:13:33 -04:00
Lincoln Stein	6af84434e0	enable offline loading of main sd-1, sd-2 and sdxl models	2024-08-17 14:06:55 -04:00
Lincoln Stein	5bde4eaa7a	renamed deprecated original_config_file argument	2024-08-13 23:14:38 -04:00
Lincoln Stein	b5ec04f10c	pass original_config_file to load_single_file()	2024-08-13 21:12:40 -04:00