Commit Graph

25 Commits

Author SHA1 Message Date
psychedelicious
2f9ebdec69 fix(app): openapi schema generation
Some tech debt related to dynamic pydantic schemas for invocations became problematic. Including the invocations and results in the event schemas was breaking pydantic's handling of ref schemas. I don't really understand why - I think it's a pydantic bug in a remote edge case that we are hitting.

After many failed attempts I landed on this implementation, which is actually much tidier than what was in there before.

- Create pydantic-enabled types for `AnyInvocation` and `AnyInvocationOutput` and use these in place of the janky dynamic unions. Actually, they are kinda the same, but better encapsulated. Use these in `Graph`, `GraphExecutionState`, `InvocationEventBase` and `InvocationCompleteEvent`.
- Revise the custom openapi function to work with the new models.
- Split out the custom openapi function to a separate file. Add a `post_transform` callback so consumers can customize the output schema.
- Update makefile scripts.
2024-05-30 12:03:03 +10:00
psychedelicious
25b9c19eed feat(app): handle preparation errors as node errors
We were not handling node preparation errors as node errors before. Here's the explanation, copied from a comment that is no longer required:

---

TODO(psyche): Sessions only support errors on nodes, not on the session itself. When an error occurs outside
node execution, it bubbles up to the processor where it is treated as a queue item error.

Nodes are pydantic models. When we prepare a node in `session.next()`, we set its inputs. This can cause a
pydantic validation error. For example, consider a resize image node which has a constraint on its `width`
input field - it must be greater than zero. During preparation, if the width is set to zero, pydantic will
raise a validation error.

When this happens, it breaks the flow before `invocation` is set. We can't set an error on the invocation
because we didn't get far enough to get it - we don't know its id. Hence, we just set it as a queue item error.

---

This change wraps the node preparation step with exception handling. A new `NodeInputError` exception is raised when there is a validation error. This error has the node (in the state it was in just prior to the error) and an identifier of the input that failed.

This allows us to mark the node that failed preparation as errored, correctly making such errors _node_ errors and not _processor_ errors. It's much easier to diagnose these situations. The error messages look like this:

> Node b5ac87c6-0678-4b8c-96b9-d215aee12175 has invalid incoming input for height

Some of the exception handling logic is cleaned up.
2024-05-24 20:02:24 +10:00
Joe Kubler
83b3828b55 prioritize iterate in _get_next_node 2024-03-26 09:18:46 +11:00
psychedelicious
7e71effa17 tidy(nodes): remove no-op model_config
Because we now customize the JSON Schema creation for GraphExecutionState, the model_config did nothing.
2024-03-01 10:42:33 +11:00
psychedelicious
e93bd15392 tidy(nodes): remove LibraryGraphs
The workflow library supersedes this unused feature.
2024-03-01 10:42:33 +11:00
psychedelicious
641d235102 tidy(nodes): remove GraphInvocation
`GraphInvocation` is a node that can contain a whole graph. It is removed for a number of reasons:

1. This feature was unused (the UI doesn't support it) and there is no plan for it to be used.

The use-case it served is known in other node execution engines as "node groups" or "blocks" - a self-contained group of nodes, which has group inputs and outputs. This is a planned feature that will be handled client-side.

2. It adds substantial complexity to the graph processing logic. It's probably not enough to have a measurable performance impact but it does make it harder to work in the graph logic.

3. It allows for graphs to be recursive, and the improved invocations union handling does not play well with it. Actually, it works fine within `graph.py` but not in the tests for some reason. I do not understand why. There's probably a workaround, but I took this as encouragement to remove `GraphInvocation` from the app since we don't use it.
2024-03-01 10:42:33 +11:00
psychedelicious
b79ae3a101 fix(nodes): fix OpenAPI schema generation
The change to `Graph.nodes` and `GraphExecutionState.results` validation requires some fanagling to get the OpenAPI schema generation to work. See new comments for a details.
2024-03-01 10:42:33 +11:00
psychedelicious
731860c332 feat(nodes): JIT graph nodes validation
We use pydantic to validate a union of valid invocations when instantiating a graph.

Previously, we constructed the union while creating the `Graph` class. This introduces a dependency on the order of imports.

For example, consider a setup where we have 3 invocations in the app:

- Python executes the module where `FirstInvocation` is defined, registering `FirstInvocation`.
- Python executes the module where `SecondInvocation` is defined, registering `SecondInvocation`.
- Python executes the module where `Graph` is defined. A union of invocations is created and used to define the `Graph.nodes` field. The union contains `FirstInvocation` and `SecondInvocation`.
- Python executes the module where `ThirdInvocation` is defined, registering `ThirdInvocation`.
- A graph is created that includes `ThirdInvocation`. Pydantic validates the graph using the union, which does not know about `ThirdInvocation`, raising a `ValidationError` about an unknown invocation type.

This scenario has been particularly problematic in tests, where we may create invocations dynamically. The test files have to be structured in such a way that the imports happen in the right order. It's a major pain.

This PR refactors the validation of graph nodes to resolve this issue:

- `BaseInvocation` gets a new method `get_typeadapter`. This builds a pydantic `TypeAdapter` for the union of all registered invocations, caching it after the first call.
- `Graph.nodes`'s type is widened to `dict[str, BaseInvocation]`. This actually is a nice bonus, because we get better type hints whenever we reference `some_graph.nodes`.
- A "plain" field validator takes over the validation logic for `Graph.nodes`. "Plain" validators totally override pydantic's own validation logic. The validator grabs the `TypeAdapter` from `BaseInvocation`, then validates each node with it. The validation is identical to the previous implementation - we get the same errors.

`BaseInvocationOutput` gets the same treatment.
2024-03-01 10:42:33 +11:00
psychedelicious
4ce21087d3 fix(nodes): restore type annotations for InvocationContext 2024-03-01 10:42:33 +11:00
psychedelicious
8637c40661 feat(nodes): update all invocations to use new invocation context
Update all invocations to use the new context. The changes are all fairly simple, but there are a lot of them.

Supporting minor changes:
- Patch bump for all nodes that use the context
- Update invocation processor to provide new context
- Minor change to `EventServiceBase` to accept a node's ID instead of the dict version of a node
- Minor change to `ModelManagerService` to support the new wrapped context
- Fanagling of imports to avoid circular dependencies
2024-03-01 10:42:33 +11:00
psychedelicious
992b02aa65 tidy(nodes): move all field things to fields.py
Unfortunately, this is necessary to prevent circular imports at runtime.
2024-03-01 10:42:33 +11:00
psychedelicious
3726293258 feat(nodes): improve types in graph.py
Methods `get_node` and `complete` were typed as returning a dynamically created unions `InvocationsUnion` and `InvocationOutputsUnion`, respectively.

Static type analysers cannot work with dynamic objects, so these methods end up as effectively un-annotated, returning `Unknown`.

They now return `BaseInvocation` and `BaseInvocationOutput`, respectively, which are the superclasses of all members of each union. This gives us the best type annotation that is possible.

Note: the return types of these methods are never introspected, so it doesn't really matter what they are at runtime.
2024-02-14 07:56:10 +11:00
psychedelicious
d20f98fb4f fix(nodes): deep copy graph inputs
The change to memory session storage brings a subtle behaviour change.

Previously, we serialized and deserialized everything (e.g. field state, invocation outputs, etc) constantly. The meant we were effectively working with deep-copied objects at all time. We could mutate objects freely without worrying about other references to the object.

With memory storage, objects are now passed around by reference, and we cannot handle them in the same way.

This is problematic for nodes that mutate their own inputs. There are two ways this causes a problem:

- An output is used as input for multiple nodes. If the first node mutates the output object while `invoke`ing, the next node will get the mutated object.
- The invocation cache stores live python objects. When a node mutates an output pulled from the cache, the next node that uses the cached object will get the mutated object.

The solution is to deep-copy a node's inputs as they are set, effectively reproducing the same behaviour as we had with the SQLite session storage. Nodes can safely mutate their inputs and those changes never leave the node's scope.

Closes  #5665
2024-02-09 21:17:32 +11:00
psychedelicious
0fdcc0af65 feat(nodes): add index and total to iterate output 2023-12-04 14:11:32 +11:00
psychedelicious
514c49d946 feat(nodes): warn if node has no version specified; fall back on 1.0.0 2023-11-29 10:49:31 +11:00
psychedelicious
858bcdd3ff feat(nodes): improve docstrings in baseinvocation, disambiguate method names 2023-11-29 10:49:31 +11:00
psychedelicious
86a74e929a feat(ui): add support for custom field types
Node authors may now create their own arbitrary/custom field types. Any pydantic model is supported.

Two notes:
1. Your field type's class name must be unique.

Suggest prefixing fields with something related to the node pack as a kind of namespace.

2. Custom field types function as connection-only fields.

For example, if your custom field has string attributes, you will not get a text input for that attribute when you give a node a field with your custom type.

This is the same behaviour as other complex fields that don't have custom UIs in the workflow editor - like, say, a string collection.

feat(ui): fix tooltips for custom types

We need to hold onto the original type of the field so they don't all just show up as "Unknown".

fix(ui): fix ts error with custom fields

feat(ui): custom field types connection validation

In the initial commit, a custom field's original type was added to the *field templates* only as `originalType`. Custom fields' `type` property was `"Custom"`*. This allowed for type safety throughout the UI logic.

*Actually, it was `"Unknown"`, but I changed it to custom for clarity.

Connection validation logic, however, uses the *field instance* of the node/field. Like the templates, *field instances* with custom types have their `type` set to `"Custom"`, but they didn't have an `originalType` property. As a result, all custom fields could be connected to all other custom fields.

To resolve this, we need to add `originalType` to the *field instances*, then switch the validation logic to use this instead of `type`.

This ended up needing a bit of fanagling:

- If we make `originalType` a required property on field instances, existing workflows will break during connection validation, because they won't have this property. We'd need a new layer of logic to migrate the workflows, adding the new `originalType` property.

While this layer is probably needed anyways, typing `originalType` as optional is much simpler. Workflow migration logic can come layer.

(Technically, we could remove all references to field types from the workflow files, and let the templates hold all this information. This feels like a significant change and I'm reluctant to do it now.)

- Because `originalType` is optional, anywhere we care about the type of a field, we need to use it over `type`. So there are a number of `field.originalType ?? field.type` expressions. This is a bit of a gotcha, we'll need to remember this in the future.

- We use `Array.prototype.includes()` often in the workflow editor, e.g. `COLLECTION_TYPES.includes(type)`. In these cases, the const array is of type `FieldType[]`, and `type` is is `FieldType`.

Because we now support custom types, the arg `type` is now widened from `FieldType` to `string`.

This causes a TS error. This behaviour is somewhat controversial (see https://github.com/microsoft/TypeScript/issues/14520). These expressions are now rewritten as `COLLECTION_TYPES.some((t) => t === type)` to satisfy TS. It's logically equivalent.

fix(ui): typo

feat(ui): add CustomCollection and CustomPolymorphic field types

feat(ui): add validation for CustomCollection & CustomPolymorphic types

- Update connection validation for custom types
- Use simple string parsing to determine if a field is a collection or polymorphic type.
- No longer need to keep a list of collection and polymorphic types.
- Added runtime checks in `baseinvocation.py` to ensure no fields are named in such a way that it could mess up the new parsing

chore(ui): remove errant console.log

fix(ui): rename 'nodes.currentConnectionFieldType' -> 'nodes.connectionStartFieldType'

This was confusingly named and kept tripping me up. Renamed to be consistent with the `reactflow` `ConnectionStartParams` type.

fix(ui): fix ts error

feat(nodes): add runtime check for custom field names

"Custom", "CustomCollection" and "CustomPolymorphic" are reserved field names.

chore(ui): add TODO for revising field type names

wip refactor fieldtype structured

wip refactor field types

wip refactor types

wip refactor types

fix node layout

refactor field types

chore: mypy

organisation

organisation

organisation

fix(nodes): fix field orig_required, field_kind and input statuses

feat(nodes): remove broken implementation of default_factory on InputField

Use of this could break connection validation due to the difference in node schemas required fields and invoke() required args.

Removed entirely for now. It wasn't ever actually used by the system, because all graphs always had values provided for fields where default_factory was used.

Also, pydantic is smart enough to not reuse the same object when specifying a default value - it clones the object first. So, the common pattern of `default_factory=list` is extraneous. It can just be `default=[]`.

fix(nodes): fix InputField name validation

workflow validation

validation

chore: ruff

feat(nodes): fix up baseinvocation comments

fix(ui): improve typing & logic of buildFieldInputTemplate

improved error handling in parseFieldType

fix: back compat for deprecated default_factory and UIType

feat(nodes): do not show node packs loaded log if none loaded

chore(ui): typegen
2023-11-29 10:49:31 +11:00
psychedelicious
6494e8e551 chore: ruff format 2023-11-11 10:55:40 +11:00
psychedelicious
99a8ebe3a0 chore: ruff check - fix flake8-bugbear 2023-11-11 10:55:28 +11:00
psychedelicious
3a136420d5 chore: ruff check - fix flake8-comprensions 2023-11-11 10:55:23 +11:00
psychedelicious
86c3acf184 fix(nodes): revert optional graph 2023-10-20 12:05:13 +11:00
psychedelicious
bbae4045c9 fix(nodes): GraphInvocation should use InputField 2023-10-20 12:05:13 +11:00
psychedelicious
f0db4d36e4 feat: metadata refactor
- Refactor how metadata is handled to support a user-defined metadata in graphs
- Update workflow embed handling
- Update UI to work with these changes
- Update tests to support metadata/workflow changes
2023-10-20 12:05:13 +11:00
psychedelicious
c238a7f18b feat(api): chore: pydantic & fastapi upgrade
Upgrade pydantic and fastapi to latest.

- pydantic~=2.4.2
- fastapi~=103.2
- fastapi-events~=0.9.1

**Big Changes**

There are a number of logic changes needed to support pydantic v2. Most changes are very simple, like using the new methods to serialized and deserialize models, but there are a few more complex changes.

**Invocations**

The biggest change relates to invocation creation, instantiation and validation.

Because pydantic v2 moves all validation logic into the rust pydantic-core, we may no longer directly stick our fingers into the validation pie.

Previously, we (ab)used models and fields to allow invocation fields to be optional at instantiation, but required when `invoke()` is called. We directly manipulated the fields and invocation models when calling `invoke()`.

With pydantic v2, this is much more involved. Changes to the python wrapper do not propagate down to the rust validation logic - you have to rebuild the model. This causes problem with concurrent access to the invocation classes and is not a free operation.

This logic has been totally refactored and we do not need to change the model any more. The details are in `baseinvocation.py`, in the `InputField` function and `BaseInvocation.invoke_internal()` method.

In the end, this implementation is cleaner.

**Invocation Fields**

In pydantic v2, you can no longer directly add or remove fields from a model.

Previously, we did this to add the `type` field to invocations.

**Invocation Decorators**

With pydantic v2, we instead use the imperative `create_model()` API to create a new model with the additional field. This is done in `baseinvocation.py` in the `invocation()` wrapper.

A similar technique is used for `invocation_output()`.

**Minor Changes**

There are a number of minor changes around the pydantic v2 models API.

**Protected `model_` Namespace**

All models' pydantic-provided methods and attributes are prefixed with `model_` and this is considered a protected namespace. This causes some conflict, because "model" means something to us, and we have a ton of pydantic models with attributes starting with "model_".

Forunately, there are no direct conflicts. However, in any pydantic model where we define an attribute or method that starts with "model_", we must tell set the protected namespaces to an empty tuple.

```py
class IPAdapterModelField(BaseModel):
    model_name: str = Field(description="Name of the IP-Adapter model")
    base_model: BaseModelType = Field(description="Base model")

    model_config = ConfigDict(protected_namespaces=())
```

**Model Serialization**

Pydantic models no longer have `Model.dict()` or `Model.json()`.

Instead, we use `Model.model_dump()` or `Model.model_dump_json()`.

**Model Deserialization**

Pydantic models no longer have `Model.parse_obj()` or `Model.parse_raw()`, and there are no `parse_raw_as()` or `parse_obj_as()` functions.

Instead, you need to create a `TypeAdapter` object to parse python objects or JSON into a model.

```py
adapter_graph = TypeAdapter(Graph)
deserialized_graph_from_json = adapter_graph.validate_json(graph_json)
deserialized_graph_from_dict = adapter_graph.validate_python(graph_dict)
```

**Field Customisation**

Pydantic `Field`s no longer accept arbitrary args.

Now, you must put all additional arbitrary args in a `json_schema_extra` arg on the field.

**Schema Customisation**

FastAPI and pydantic schema generation now follows the OpenAPI version 3.1 spec.

This necessitates two changes:
- Our schema customization logic has been revised
- Schema parsing to build node templates has been revised

The specific aren't important, but this does present additional surface area for bugs.

**Performance Improvements**

Pydantic v2 is a full rewrite with a rust backend. This offers a substantial performance improvement (pydantic claims 5x to 50x depending on the task). We'll notice this the most during serialization and deserialization of sessions/graphs, which happens very very often - a couple times per node.

I haven't done any benchmarks, but anecdotally, graph execution is much faster. Also, very larges graphs - like with massive iterators - are much, much faster.
2023-10-17 14:59:25 +11:00
psychedelicious
402cf9b0ee feat: refactor services folder/module structure
Refactor services folder/module structure.

**Motivation**

While working on our services I've repeatedly encountered circular imports and a general lack of clarity regarding where to put things. The structure introduced goes a long way towards resolving those issues, setting us up for a clean structure going forward.

**Services**

Services are now in their own folder with a few files:

- `services/{service_name}/__init__.py`: init as needed, mostly empty now
- `services/{service_name}/{service_name}_base.py`: the base class for the service
- `services/{service_name}/{service_name}_{impl_type}.py`: the default concrete implementation of the service - typically one of `sqlite`, `default`, or `memory`
- `services/{service_name}/{service_name}_common.py`: any common items - models, exceptions, utilities, etc

Though it's a bit verbose to have the service name both as the folder name and the prefix for files, I found it is _extremely_ confusing to have all of the base classes just be named `base.py`. So, at the cost of some verbosity when importing things, I've included the service name in the filename.

There are some minor logic changes. For example, in `InvocationProcessor`, instead of assigning the model manager service to a variable to be used later in the file, the service is used directly via the `Invoker`.

**Shared**

Things that are used across disparate services are in `services/shared/`:

- `default_graphs.py`: previously in `services/`
- `graphs.py`: previously in `services/`
- `paginatation`: generic pagination models used in a few services
- `sqlite`: the `SqliteDatabase` class, other sqlite-specific things
2023-10-12 12:15:06 -04:00