InvokeAI/invokeai/backend/model_management
psychedelicious c238a7f18b feat(api): chore: pydantic & fastapi upgrade
Upgrade pydantic and fastapi to latest.

- pydantic~=2.4.2
- fastapi~=0.103.2
- fastapi-events~=0.9.1

**Big Changes**

There are a number of logic changes needed to support pydantic v2. Most changes are very simple, like using the new methods to serialize and deserialize models, but there are a few more complex changes.

**Invocations**

The biggest change relates to invocation creation, instantiation and validation.

Because pydantic v2 moves all validation logic into the rust pydantic-core, we may no longer directly stick our fingers into the validation pie.

Previously, we (ab)used models and fields to allow invocation fields to be optional at instantiation, but required when `invoke()` is called. We directly manipulated the fields and invocation models when calling `invoke()`.

With pydantic v2, this is much more involved. Changes to the python wrapper do not propagate down to the rust validation logic - you have to rebuild the model. This causes problems with concurrent access to the invocation classes, and rebuilding is not a free operation.

This logic has been totally refactored and we do not need to change the model any more. The details are in `baseinvocation.py`, in the `InputField` function and `BaseInvocation.invoke_internal()` method.

In the end, this implementation is cleaner.
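
To illustrate the general idea (a simplified sketch only, not the actual InvokeAI code - the real logic lives in the `InputField` function and `BaseInvocation.invoke_internal()`): declare the field as optional so instantiation succeeds without it, then verify presence when the invocation actually runs.

```py
from typing import Optional

from pydantic import BaseModel


class ExampleInvocation(BaseModel):
    """Hypothetical invocation used only for illustration."""

    # Optional at instantiation time...
    image_name: Optional[str] = None

    def invoke(self) -> str:
        return f"processing {self.image_name}"

    def invoke_internal(self) -> str:
        # ...but required when the invocation is actually executed.
        missing = [name for name in ("image_name",) if getattr(self, name) is None]
        if missing:
            raise ValueError(f"Missing required fields: {missing}")
        return self.invoke()
```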

**Invocation Fields**

In pydantic v2, you can no longer directly add or remove fields from a model.

Previously, we did this to add the `type` field to invocations.

**Invocation Decorators**

With pydantic v2, we instead use the imperative `create_model()` API to create a new model with the additional field. This is done in `baseinvocation.py` in the `invocation()` wrapper.

A similar technique is used for `invocation_output()`.
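
As a rough sketch of the technique (the helper and model names below are illustrative; the real implementation in `baseinvocation.py` does considerably more):

```py
from typing import Literal

from pydantic import BaseModel, create_model


def with_type_field(cls: type[BaseModel], invocation_type: str) -> type[BaseModel]:
    """Illustrative helper: derive a new model that adds a literal `type` field."""
    return create_model(
        cls.__name__,
        __base__=cls,
        # A field definition is (annotation, default); the Literal pins the value.
        type=(Literal[invocation_type], invocation_type),
    )


class ResizeInvocation(BaseModel):
    width: int = 512


ResizeWithType = with_type_field(ResizeInvocation, "resize")
print(ResizeWithType().type)  # "resize"
```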

**Minor Changes**

There are a number of minor changes around the pydantic v2 models API.

**Protected `model_` Namespace**

All models' pydantic-provided methods and attributes are prefixed with `model_` and this is considered a protected namespace. This causes some conflict, because "model" means something to us, and we have a ton of pydantic models with attributes starting with "model_".

Fortunately, there are no direct conflicts. However, in any pydantic model where we define an attribute or method that starts with `model_`, we must set the protected namespaces to an empty tuple.

```py
class IPAdapterModelField(BaseModel):
    model_name: str = Field(description="Name of the IP-Adapter model")
    base_model: BaseModelType = Field(description="Base model")

    model_config = ConfigDict(protected_namespaces=())
```

**Model Serialization**

Pydantic models no longer have `Model.dict()` or `Model.json()`.

Instead, we use `Model.model_dump()` or `Model.model_dump_json()`.
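
For example (the model here is just a stand-in):

```py
from pydantic import BaseModel


class ImageField(BaseModel):
    image_name: str


field = ImageField(image_name="example.png")

as_dict = field.model_dump()       # {'image_name': 'example.png'}
as_json = field.model_dump_json()  # '{"image_name":"example.png"}'
```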

**Model Deserialization**

Pydantic models no longer have `Model.parse_obj()` or `Model.parse_raw()`, and there are no `parse_raw_as()` or `parse_obj_as()` functions.

Instead, you need to create a `TypeAdapter` object to parse python objects or JSON into a model.

```py
adapter_graph = TypeAdapter(Graph)
deserialized_graph_from_json = adapter_graph.validate_json(graph_json)
deserialized_graph_from_dict = adapter_graph.validate_python(graph_dict)
```

**Field Customisation**

Pydantic `Field`s no longer accept arbitrary args.

Now, you must put all additional arbitrary args in a `json_schema_extra` arg on the field.
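
For example (the `ui_hidden` key below is purely illustrative):

```py
from pydantic import BaseModel, Field


class ExampleModel(BaseModel):
    # pydantic v1 allowed arbitrary kwargs: Field(1, ui_hidden=True)
    # pydantic v2 requires extras to go in json_schema_extra:
    value: int = Field(default=1, json_schema_extra={"ui_hidden": True})
```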

**Schema Customisation**

FastAPI and pydantic schema generation now follows the OpenAPI version 3.1 spec.

This necessitates two changes:
- Our schema customization logic has been revised
- Schema parsing to build node templates has been revised

The specifics aren't important, but this does present additional surface area for bugs.
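
As a minimal illustration of one way to attach extra data to a model's generated schema in pydantic v2 (not the project's actual customization logic):

```py
from pydantic import BaseModel, ConfigDict


class ExampleOutput(BaseModel):
    # Extra keys merged into the model's generated JSON schema.
    model_config = ConfigDict(json_schema_extra={"category": "outputs"})

    value: int = 1


print(ExampleOutput.model_json_schema())  # includes "category": "outputs"
```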

**Performance Improvements**

Pydantic v2 is a full rewrite with a rust backend. This offers a substantial performance improvement (pydantic claims 5x to 50x depending on the task). We'll notice this the most during serialization and deserialization of sessions/graphs, which happens very very often - a couple times per node.

I haven't done any benchmarks, but anecdotally, graph execution is much faster. Also, very large graphs - like those with massive iterators - are much, much faster.
2023-10-17 14:59:25 +11:00
| File | Last commit | Date |
| --- | --- | --- |
| `models` | feat(api): chore: pydantic & fastapi upgrade | 2023-10-17 14:59:25 +11:00 |
| `__init__.py` | isort wip 3 | 2023-09-12 13:01:58 -04:00 |
| `convert_ckpt_to_diffusers.py` | enable v_prediction for sd-1 models | 2023-09-24 12:22:29 -04:00 |
| `libc_util.py` | (minor) clean up typos. | 2023-10-03 15:00:03 -04:00 |
| `lora.py` | isort wip 3 | 2023-09-12 13:01:58 -04:00 |
| `memory_snapshot.py` | fix(backend): handle systems with glibc < 2.33 | 2023-10-15 07:56:55 +11:00 |
| `model_cache.py` | Demote model cache logs from warning to debug based on the conversation here: https://discord.com/channels/1020123559063990373/1049495067846524939/1161647290189090816 | 2023-10-11 12:02:46 -04:00 |
| `model_load_optimizations.py` | Fix bug in skip_torch_weight_init() where the original behavior of torch.nn.Conv*d modules wasn't being restored correctly. | 2023-10-10 10:05:50 -04:00 |
| `model_manager.py` | feat(api): chore: pydantic & fastapi upgrade | 2023-10-17 14:59:25 +11:00 |
| `model_merge.py` | isort wip 3 | 2023-09-12 13:01:58 -04:00 |
| `model_probe.py` | Add support for T2I-Adapter in node workflows (#4612) | 2023-10-05 16:29:16 +11:00 |
| `model_search.py` | fix probing for ip_adapter folders | 2023-09-23 22:32:03 -04:00 |
| `README.md` | Add README with info about glib memory fragmentation caused by the model cache. | 2023-10-03 14:25:34 -04:00 |
| `seamless.py` | feat(api): chore: pydantic & fastapi upgrade | 2023-10-17 14:59:25 +11:00 |
| `util.py` | Format by black | 2023-08-11 03:20:56 +03:00 |

**Model Cache**

**glibc Memory Allocator Fragmentation**

Python (and PyTorch) relies on the memory allocator from the C Standard Library (libc). On Linux, with the GNU C Standard Library implementation (glibc), our memory access patterns have been observed to cause severe memory fragmentation. This fragmentation results in large amounts of memory that has been freed but can't be released back to the OS. Loading models from disk and moving them between CPU/CUDA seem to be the operations that contribute most to the fragmentation. This memory fragmentation issue can result in OOM crashes during frequent model switching, even if `max_cache_size` is set to a reasonable value (e.g. an OOM crash with `max_cache_size=16` on a system with 32GB of RAM).

This problem may also exist on other OSes and with other libc implementations, but at the time of writing it has only been investigated on Linux with glibc.

To better understand how the glibc memory allocator works, see these references:

Note the differences between memory allocated as chunks in an arena vs. memory allocated with mmap. Under glibc's default configuration, most model tensors get allocated as chunks in an arena, making them vulnerable to the problem of fragmentation.

We can work around this memory fragmentation issue by setting the following env var:

```sh
# Force blocks >1MB to be allocated with `mmap` so that they are released to the system immediately when they are freed.
MALLOC_MMAP_THRESHOLD_=1048576
```

See the following references for more information about the malloc tunable parameters:

The model cache emits debug logs that provide visibility into the state of the libc memory allocator. See the `LibcUtil` class for more info on how these libc malloc stats are collected.
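
As a rough sketch of how such stats can be read (a minimal illustration using glibc's `mallinfo2()` via `ctypes`, which requires glibc >= 2.33; the actual `LibcUtil` implementation may differ):

```py
import ctypes


class Mallinfo2(ctypes.Structure):
    """Field layout of glibc's `struct mallinfo2` (glibc >= 2.33)."""

    _fields_ = [
        ("arena", ctypes.c_size_t),     # non-mmapped bytes allocated from the OS
        ("ordblks", ctypes.c_size_t),   # number of free chunks
        ("smblks", ctypes.c_size_t),    # number of free fastbin blocks
        ("hblks", ctypes.c_size_t),     # number of mmapped regions
        ("hblkhd", ctypes.c_size_t),    # bytes in mmapped regions
        ("usmblks", ctypes.c_size_t),
        ("fsmblks", ctypes.c_size_t),
        ("uordblks", ctypes.c_size_t),  # allocated bytes in use
        ("fordblks", ctypes.c_size_t),  # free bytes held in arenas (the fragmentation culprit)
        ("keepcost", ctypes.c_size_t),
    ]


libc = ctypes.CDLL("libc.so.6")
libc.mallinfo2.restype = Mallinfo2

info = libc.mallinfo2()
print(f"bytes in use:            {info.uordblks}")
print(f"free bytes in arenas:    {info.fordblks}")
print(f"bytes in mmapped blocks: {info.hblkhd}")
```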