feat(api): chore: pydantic & fastapi upgrade

Upgrade pydantic and fastapi to latest. - pydantic~=2.4.2 - fastapi~=103.2 - fastapi-events~=0.9.1 **Big Changes** There are a number of logic changes needed to support pydantic v2. Most changes are very simple, like using the new methods to serialized and deserialize models, but there are a few more complex changes. **Invocations** The biggest change relates to invocation creation, instantiation and validation. Because pydantic v2 moves all validation logic into the rust pydantic-core, we may no longer directly stick our fingers into the validation pie. Previously, we (ab)used models and fields to allow invocation fields to be optional at instantiation, but required when `invoke()` is called. We directly manipulated the fields and invocation models when calling `invoke()`. With pydantic v2, this is much more involved. Changes to the python wrapper do not propagate down to the rust validation logic - you have to rebuild the model. This causes problem with concurrent access to the invocation classes and is not a free operation. This logic has been totally refactored and we do not need to change the model any more. The details are in `baseinvocation.py`, in the `InputField` function and `BaseInvocation.invoke_internal()` method. In the end, this implementation is cleaner. **Invocation Fields** In pydantic v2, you can no longer directly add or remove fields from a model. Previously, we did this to add the `type` field to invocations. **Invocation Decorators** With pydantic v2, we instead use the imperative `create_model()` API to create a new model with the additional field. This is done in `baseinvocation.py` in the `invocation()` wrapper. A similar technique is used for `invocation_output()`. **Minor Changes** There are a number of minor changes around the pydantic v2 models API. **Protected `model_` Namespace** All models' pydantic-provided methods and attributes are prefixed with `model_` and this is considered a protected namespace. This causes some conflict, because "model" means something to us, and we have a ton of pydantic models with attributes starting with "model_". Forunately, there are no direct conflicts. However, in any pydantic model where we define an attribute or method that starts with "model_", we must tell set the protected namespaces to an empty tuple. ```py class IPAdapterModelField(BaseModel): model_name: str = Field(description="Name of the IP-Adapter model") base_model: BaseModelType = Field(description="Base model") model_config = ConfigDict(protected_namespaces=()) ``` **Model Serialization** Pydantic models no longer have `Model.dict()` or `Model.json()`. Instead, we use `Model.model_dump()` or `Model.model_dump_json()`. **Model Deserialization** Pydantic models no longer have `Model.parse_obj()` or `Model.parse_raw()`, and there are no `parse_raw_as()` or `parse_obj_as()` functions. Instead, you need to create a `TypeAdapter` object to parse python objects or JSON into a model. ```py adapter_graph = TypeAdapter(Graph) deserialized_graph_from_json = adapter_graph.validate_json(graph_json) deserialized_graph_from_dict = adapter_graph.validate_python(graph_dict) ``` **Field Customisation** Pydantic `Field`s no longer accept arbitrary args. Now, you must put all additional arbitrary args in a `json_schema_extra` arg on the field. **Schema Customisation** FastAPI and pydantic schema generation now follows the OpenAPI version 3.1 spec. This necessitates two changes: - Our schema customization logic has been revised - Schema parsing to build node templates has been revised The specific aren't important, but this does present additional surface area for bugs. **Performance Improvements** Pydantic v2 is a full rewrite with a rust backend. This offers a substantial performance improvement (pydantic claims 5x to 50x depending on the task). We'll notice this the most during serialization and deserialization of sessions/graphs, which happens very very often - a couple times per node. I haven't done any benchmarks, but anecdotally, graph execution is much faster. Also, very larges graphs - like with massive iterators - are much, much faster.
2024-08-30 20:32:17 +00:00 · 2023-09-24 18:11:07 +10:00
parent 19c5435332
commit c238a7f18b
74 changed files with 2788 additions and 3116 deletions
--- a/invokeai/app/invocations/latent.py
+++ b/invokeai/app/invocations/latent.py
@ -19,7 +19,7 @@ from diffusers.models.attention_processor import (
 )
 from diffusers.schedulers import DPMSolverSDEScheduler
 from diffusers.schedulers import SchedulerMixin as Scheduler
-from pydantic import validator
+from pydantic import field_validator
 from torchvision.transforms.functional import resize as tv_resize

 from invokeai.app.invocations.ip_adapter import IPAdapterField
@ -84,12 +84,20 @@ class SchedulerOutput(BaseInvocationOutput):
    scheduler: SAMPLER_NAME_VALUES = OutputField(description=FieldDescriptions.scheduler, ui_type=UIType.Scheduler)


-@invocation("scheduler", title="Scheduler", tags=["scheduler"], category="latents", version="1.0.0")
+@invocation(
+    "scheduler",
+    title="Scheduler",
+    tags=["scheduler"],
+    category="latents",
+    version="1.0.0",
+)
 class SchedulerInvocation(BaseInvocation):
    """Selects a scheduler."""

    scheduler: SAMPLER_NAME_VALUES = InputField(
-        default="euler", description=FieldDescriptions.scheduler, ui_type=UIType.Scheduler
+        default="euler",
+        description=FieldDescriptions.scheduler,
+        ui_type=UIType.Scheduler,
    )

    def invoke(self, context: InvocationContext) -> SchedulerOutput:
@ -97,7 +105,11 @@ class SchedulerInvocation(BaseInvocation):


@invocation(
-    "create_denoise_mask", title="Create Denoise Mask", tags=["mask", "denoise"], category="latents", version="1.0.0"
+    "create_denoise_mask",
+    title="Create Denoise Mask",
+    tags=["mask", "denoise"],
+    category="latents",
+    version="1.0.0",
 )
 class CreateDenoiseMaskInvocation(BaseInvocation):
    """Creates mask for denoising model run."""
@ -106,7 +118,11 @@ class CreateDenoiseMaskInvocation(BaseInvocation):
    image: Optional[ImageField] = InputField(default=None, description="Image which will be masked", ui_order=1)
    mask: ImageField = InputField(description="The mask to use when pasting", ui_order=2)
    tiled: bool = InputField(default=False, description=FieldDescriptions.tiled, ui_order=3)
-    fp32: bool = InputField(default=DEFAULT_PRECISION == "float32", description=FieldDescriptions.fp32, ui_order=4)
+    fp32: bool = InputField(
+        default=DEFAULT_PRECISION == "float32",
+        description=FieldDescriptions.fp32,
+        ui_order=4,
+    )

    def prep_mask_tensor(self, mask_image):
        if mask_image.mode != "L":
@ -134,7 +150,7 @@ class CreateDenoiseMaskInvocation(BaseInvocation):

        if image is not None:
            vae_info = context.services.model_manager.get_model(
-                **self.vae.vae.dict(),
+                **self.vae.vae.model_dump(),
                context=context,
            )

@ -167,7 +183,7 @@ def get_scheduler(
 ) -> Scheduler:
    scheduler_class, scheduler_extra_config = SCHEDULER_MAP.get(scheduler_name, SCHEDULER_MAP["ddim"])
    orig_scheduler_info = context.services.model_manager.get_model(
-        **scheduler_info.dict(),
+        **scheduler_info.model_dump(),
        context=context,
    )
    with orig_scheduler_info as orig_scheduler:
@ -209,34 +225,64 @@ class DenoiseLatentsInvocation(BaseInvocation):
    negative_conditioning: ConditioningField = InputField(
        description=FieldDescriptions.negative_cond, input=Input.Connection, ui_order=1
    )
-    noise: Optional[LatentsField] = InputField(description=FieldDescriptions.noise, input=Input.Connection, ui_order=3)
+    noise: Optional[LatentsField] = InputField(
+        default=None,
+        description=FieldDescriptions.noise,
+        input=Input.Connection,
+        ui_order=3,
+    )
    steps: int = InputField(default=10, gt=0, description=FieldDescriptions.steps)
    cfg_scale: Union[float, List[float]] = InputField(
        default=7.5, ge=1, description=FieldDescriptions.cfg_scale, title="CFG Scale"
    )
-    denoising_start: float = InputField(default=0.0, ge=0, le=1, description=FieldDescriptions.denoising_start)
+    denoising_start: float = InputField(
+        default=0.0,
+        ge=0,
+        le=1,
+        description=FieldDescriptions.denoising_start,
+    )
    denoising_end: float = InputField(default=1.0, ge=0, le=1, description=FieldDescriptions.denoising_end)
    scheduler: SAMPLER_NAME_VALUES = InputField(
-        default="euler", description=FieldDescriptions.scheduler, ui_type=UIType.Scheduler
+        default="euler",
+        description=FieldDescriptions.scheduler,
+        ui_type=UIType.Scheduler,
    )
-    unet: UNetField = InputField(description=FieldDescriptions.unet, input=Input.Connection, title="UNet", ui_order=2)
-    control: Union[ControlField, list[ControlField]] = InputField(
+    unet: UNetField = InputField(
+        description=FieldDescriptions.unet,
+        input=Input.Connection,
+        title="UNet",
+        ui_order=2,
+    )
+    control: Optional[Union[ControlField, list[ControlField]]] = InputField(
        default=None,
        input=Input.Connection,
        ui_order=5,
    )
    ip_adapter: Optional[Union[IPAdapterField, list[IPAdapterField]]] = InputField(
-        description=FieldDescriptions.ip_adapter, title="IP-Adapter", default=None, input=Input.Connection, ui_order=6
+        description=FieldDescriptions.ip_adapter,
+        title="IP-Adapter",
+        default=None,
+        input=Input.Connection,
+        ui_order=6,
    )
-    t2i_adapter: Union[T2IAdapterField, list[T2IAdapterField]] = InputField(
-        description=FieldDescriptions.t2i_adapter, title="T2I-Adapter", default=None, input=Input.Connection, ui_order=7
+    t2i_adapter: Optional[Union[T2IAdapterField, list[T2IAdapterField]]] = InputField(
+        description=FieldDescriptions.t2i_adapter,
+        title="T2I-Adapter",
+        default=None,
+        input=Input.Connection,
+        ui_order=7,
+    )
+    latents: Optional[LatentsField] = InputField(
+        default=None, description=FieldDescriptions.latents, input=Input.Connection
    )
-    latents: Optional[LatentsField] = InputField(description=FieldDescriptions.latents, input=Input.Connection)
    denoise_mask: Optional[DenoiseMaskField] = InputField(
-        default=None, description=FieldDescriptions.mask, input=Input.Connection, ui_order=8
+        default=None,
+        description=FieldDescriptions.mask,
+        input=Input.Connection,
+        ui_order=8,
    )

-    @validator("cfg_scale")
+    @field_validator("cfg_scale")
    def ge_one(cls, v):
        """validate that all cfg_scale values are >= 1"""
        if isinstance(v, list):
@ -259,7 +305,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
        stable_diffusion_step_callback(
            context=context,
            intermediate_state=intermediate_state,
-            node=self.dict(),
+            node=self.model_dump(),
            source_node_id=source_node_id,
            base_model=base_model,
        )
@ -451,9 +497,10 @@ class DenoiseLatentsInvocation(BaseInvocation):
            # models are needed in memory. This would help to reduce peak memory utilization in low-memory environments.
            with image_encoder_model_info as image_encoder_model:
                # Get image embeddings from CLIP and ImageProjModel.
-                image_prompt_embeds, uncond_image_prompt_embeds = ip_adapter_model.get_image_embeds(
-                    input_image, image_encoder_model
-                )
+                (
+                    image_prompt_embeds,
+                    uncond_image_prompt_embeds,
+                ) = ip_adapter_model.get_image_embeds(input_image, image_encoder_model)
                conditioning_data.ip_adapter_conditioning.append(
                    IPAdapterConditioningInfo(image_prompt_embeds, uncond_image_prompt_embeds)
                )
@ -628,7 +675,10 @@ class DenoiseLatentsInvocation(BaseInvocation):
            # TODO(ryand): I have hard-coded `do_classifier_free_guidance=True` to mirror the behaviour of ControlNets,
            # below. Investigate whether this is appropriate.
            t2i_adapter_data = self.run_t2i_adapters(
-                context, self.t2i_adapter, latents.shape, do_classifier_free_guidance=True
+                context,
+                self.t2i_adapter,
+                latents.shape,
+                do_classifier_free_guidance=True,
            )

            # Get the source node id (we are invoking the prepared node)
@ -641,7 +691,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
            def _lora_loader():
                for lora in self.unet.loras:
                    lora_info = context.services.model_manager.get_model(
-                        **lora.dict(exclude={"weight"}),
+                        **lora.model_dump(exclude={"weight"}),
                        context=context,
                    )
                    yield (lora_info.context.model, lora.weight)
@ -649,7 +699,7 @@ class DenoiseLatentsInvocation(BaseInvocation):
                return

            unet_info = context.services.model_manager.get_model(
-                **self.unet.unet.dict(),
+                **self.unet.unet.model_dump(),
                context=context,
            )
            with (
@ -700,7 +750,10 @@ class DenoiseLatentsInvocation(BaseInvocation):
                    denoising_end=self.denoising_end,
                )

-                result_latents, result_attention_map_saver = pipeline.latents_from_embeddings(
+                (
+                    result_latents,
+                    result_attention_map_saver,
+                ) = pipeline.latents_from_embeddings(
                    latents=latents,
                    timesteps=timesteps,
                    init_timestep=init_timestep,
@ -728,7 +781,11 @@ class DenoiseLatentsInvocation(BaseInvocation):


@invocation(
-    "l2i", title="Latents to Image", tags=["latents", "image", "vae", "l2i"], category="latents", version="1.0.0"
+    "l2i",
+    title="Latents to Image",
+    tags=["latents", "image", "vae", "l2i"],
+    category="latents",
+    version="1.0.0",
 )
 class LatentsToImageInvocation(BaseInvocation):
    """Generates an image from latents."""
@ -743,7 +800,7 @@ class LatentsToImageInvocation(BaseInvocation):
    )
    tiled: bool = InputField(default=False, description=FieldDescriptions.tiled)
    fp32: bool = InputField(default=DEFAULT_PRECISION == "float32", description=FieldDescriptions.fp32)
-    metadata: CoreMetadata = InputField(
+    metadata: Optional[CoreMetadata] = InputField(
        default=None,
        description=FieldDescriptions.core_metadata,
        ui_hidden=True,
@ -754,7 +811,7 @@ class LatentsToImageInvocation(BaseInvocation):
        latents = context.services.latents.get(self.latents.latents_name)

        vae_info = context.services.model_manager.get_model(
-            **self.vae.vae.dict(),
+            **self.vae.vae.model_dump(),
            context=context,
        )

@ -816,7 +873,7 @@ class LatentsToImageInvocation(BaseInvocation):
            node_id=self.id,
            session_id=context.graph_execution_state_id,
            is_intermediate=self.is_intermediate,
-            metadata=self.metadata.dict() if self.metadata else None,
+            metadata=self.metadata.model_dump() if self.metadata else None,
            workflow=self.workflow,
        )

@ -830,7 +887,13 @@ class LatentsToImageInvocation(BaseInvocation):
 LATENTS_INTERPOLATION_MODE = Literal["nearest", "linear", "bilinear", "bicubic", "trilinear", "area", "nearest-exact"]


-@invocation("lresize", title="Resize Latents", tags=["latents", "resize"], category="latents", version="1.0.0")
+@invocation(
+    "lresize",
+    title="Resize Latents",
+    tags=["latents", "resize"],
+    category="latents",
+    version="1.0.0",
+)
 class ResizeLatentsInvocation(BaseInvocation):
    """Resizes latents to explicit width/height (in pixels). Provided dimensions are floor-divided by 8."""

@ -876,7 +939,13 @@ class ResizeLatentsInvocation(BaseInvocation):
        return build_latents_output(latents_name=name, latents=resized_latents, seed=self.latents.seed)


-@invocation("lscale", title="Scale Latents", tags=["latents", "resize"], category="latents", version="1.0.0")
+@invocation(
+    "lscale",
+    title="Scale Latents",
+    tags=["latents", "resize"],
+    category="latents",
+    version="1.0.0",
+)
 class ScaleLatentsInvocation(BaseInvocation):
    """Scales latents by a given factor."""

@ -915,7 +984,11 @@ class ScaleLatentsInvocation(BaseInvocation):


@invocation(
-    "i2l", title="Image to Latents", tags=["latents", "image", "vae", "i2l"], category="latents", version="1.0.0"
+    "i2l",
+    title="Image to Latents",
+    tags=["latents", "image", "vae", "i2l"],
+    category="latents",
+    version="1.0.0",
 )
 class ImageToLatentsInvocation(BaseInvocation):
    """Encodes an image into latents."""
@ -979,7 +1052,7 @@ class ImageToLatentsInvocation(BaseInvocation):
        image = context.services.images.get_pil_image(self.image.image_name)

        vae_info = context.services.model_manager.get_model(
-            **self.vae.vae.dict(),
+            **self.vae.vae.model_dump(),
            context=context,
        )

@ -1007,7 +1080,13 @@ class ImageToLatentsInvocation(BaseInvocation):
        return vae.encode(image_tensor).latents


-@invocation("lblend", title="Blend Latents", tags=["latents", "blend"], category="latents", version="1.0.0")
+@invocation(
+    "lblend",
+    title="Blend Latents",
+    tags=["latents", "blend"],
+    category="latents",
+    version="1.0.0",
+)
 class BlendLatentsInvocation(BaseInvocation):
    """Blend two latents using a given alpha. Latents must have same size."""