diff --git a/invokeai/frontend/web/docs/WORKFLOWS_DESIGN_IMPLEMENTATION.md b/invokeai/frontend/web/docs/WORKFLOWS_DESIGN_IMPLEMENTATION.md index 150f06b45d..70013499d0 100644 --- a/invokeai/frontend/web/docs/WORKFLOWS_DESIGN_IMPLEMENTATION.md +++ b/invokeai/frontend/web/docs/WORKFLOWS_DESIGN_IMPLEMENTATION.md @@ -5,29 +5,47 @@ - [Workflows - Design and Implementation](#workflows---design-and-implementation) - - [Linear UI](#linear-ui) - - [Workflow Editor](#workflow-editor) - - [Workflows](#workflows) - - [Workflow -> reactflow state -> InvokeAI graph](#workflow---reactflow-state---invokeai-graph) - - [Nodes vs Invocations](#nodes-vs-invocations) - - [Workflow Linear View](#workflow-linear-view) - - [OpenAPI Schema Parsing](#openapi-schema-parsing) - - [Field Instances and Templates](#field-instances-and-templates) - - [Stateful vs Stateless Fields](#stateful-vs-stateless-fields) - - [Collection and Polymorphic Fields](#collection-and-polymorphic-fields) + - [Design](#design) + - [Linear UI](#linear-ui) + - [Workflow Editor](#workflow-editor) + - [Workflows](#workflows) + - [Workflow -\> reactflow state -\> InvokeAI graph](#workflow---reactflow-state---invokeai-graph) + - [Nodes vs Invocations](#nodes-vs-invocations) + - [Workflow Linear View](#workflow-linear-view) + - [OpenAPI Schema](#openapi-schema) + - [Field Instances and Templates](#field-instances-and-templates) + - [Stateful vs Stateless Fields](#stateful-vs-stateless-fields) + - [Collection and Polymorphic Fields](#collection-and-polymorphic-fields) - [Implementation](#implementation) + - [zod Schemas and Types](#zod-schemas-and-types) + - [OpenAPI Schema Parsing](#openapi-schema-parsing) + - [Parsing Field Types](#parsing-field-types) + - [Primitive Types](#primitive-types) + - [Complex Types](#complex-types) + - [Collection Types](#collection-types) + - [Polymorphic Types](#polymorphic-types) + - [Optional Fields](#optional-fields) + - [Building Field Input Templates](#building-field-input-templates) + 
- [Building Field Output Templates](#building-field-output-templates) + - [Workflow Migrations](#workflow-migrations) InvokeAI's backend uses graphs, composed of **nodes** and **edges**, to process data and generate images. -Nodes have any number of **input fields** and one **output field**. Edges connect nodes together via their inputs and outputs. +Nodes have any number of **input fields** and **output fields**. Edges connect nodes together via their inputs and outputs. Fields have data types which dictate how they may be connected. -During execution, a nodes' output may be passed along to any number of other nodes' inputs. +During execution, a node's outputs may be passed along to any number of other nodes' inputs. -We provide two ways to build graphs in the frontend: the [Linear UI](#linear-ui) and [Workflow Editor](#workflow-editor). +Workflows are an enriched abstraction over a graph. -## Linear UI +## Design + +InvokeAI provides two ways to build graphs in the frontend: the [Linear UI](#linear-ui) and [Workflow Editor](#workflow-editor). + +To better understand the use case and challenges related to workflows, we will review both of these modes. + +### Linear UI This includes the **Text to Image**, **Image to Image** and **Unified Canvas** tabs. @@ -42,17 +60,15 @@ There are many other graph builders in the same folder for different tabs or bas In the Linear UI, we go straight from **simple application state** to **graph** via these builders. -## Workflow Editor +### Workflow Editor The Workflow Editor is a visual graph editor, allowing users to draw edges from node to node to construct a graph. This _far_ more approachable way to create complex graphs. InvokeAI uses the [reactflow](https://github.com/xyflow/xyflow) library to power the Workflow Editor. It provides both a graph editor UI and manages its own internal graph state. 
-### Workflows +#### Workflows -So far, we've described two different graph representations used by InvokeAI - the InvokeAI execution graph and the reactflow state. - -Neither of these is sufficient to represent a _workflow_, though. A workflow must have a representation of a its graph's nodes and edges, but it also has other data: +A workflow is a representation of a graph plus additional metadata: - Name - Description @@ -69,7 +85,7 @@ Workflows should have other qualities: To support these qualities, workflows are serializable, have a versioned schemas, and represent graphs as minimally as possible. Fortunately, the reactflow state for nodes and edges works perfectly for this.. -#### Workflow -> reactflow state -> InvokeAI graph +##### Workflow -> reactflow state -> InvokeAI graph Given a workflow, we need to be able to derive reactflow state and/or an InvokeAI graph from it. @@ -78,7 +94,7 @@ The first step - workflow to reactflow state - is very simple. The logic is in ` The reactflow state is, however, structurally incompatible with our backend's graph structure. When a user invokes on a Workflow, we need to convert the reactflow state into an InvokeAI graph. This is far simpler than the graph building logic from the Linear UI: `invokeai/frontend/web/src/features/nodes/util/graphBuilders/buildNodesGraph.ts` -#### Nodes vs Invocations +##### Nodes vs Invocations We often use the terms "node" and "invocation" interchangeably, but they may refer to different things in the frontend. @@ -88,7 +104,7 @@ reactflow [has its own definitions](https://reactflow.dev/learn/concepts/terms-a - A reactflow edge is roughly equivalent to an InvokeAI edge. - A reactflow handle is roughly equivalent to an InvokeAI input or output field. -#### Workflow Linear View +##### Workflow Linear View Graphs are very capable data structures, but not everyone wants to work with them all the time. 
@@ -96,7 +112,7 @@ To allow less technical users - or anyone who wants a less visually noisy worksp A workflow input field can be added to this Linear View, and its input component can be presented similarly to the Linear UI tabs. Internally, we add the field to the workflow's list of exposed fields. -### OpenAPI Schema Parsing +#### OpenAPI Schema OpenAPI is a schema specification that can represent complex data structures and relationships. The backend is capable of generating an OpenAPI schema for all invocations. @@ -106,7 +122,7 @@ Invocation and field templates are the "source of truth" for graphs, because the When a user adds a new node to their workflow, these templates are used to instantiate a node with fields instantiated from the input and output field templates. -#### Field Instances and Templates +##### Field Instances and Templates Field templates consist of: @@ -121,7 +137,7 @@ The type of the field determines the UI components that are rendered for it. A field instance's name associates it with its template. -#### Stateful vs Stateless Fields +##### Stateful vs Stateless Fields **Stateful** fields store their value in the frontend graph. Think primitives, model identifiers, images, etc. Fields are only stateful if the frontend allows the user to directly input a value for them. @@ -131,7 +147,7 @@ Stateless fields do not store their value in the node, so their field instances "Custom" fields will always be treated as stateless fields. -#### Collection and Polymorphic Fields +##### Collection and Polymorphic Fields Field types have a name and two flags which may identify it as a **collection** or **polymorphic** field. @@ -143,18 +159,103 @@ If it is annotated as a union of a type and list, the type will be flagged as a The majority of data structures in the backend are [pydantic](https://github.com/pydantic/pydantic) models. Pydantic provides OpenAPI schemas for all models and we then generate TypeScript types from those. 
+The OpenAPI schema is parsed at runtime into our invocation templates. + Workflows and all related data are modeled in the frontend using [zod](https://github.com/colinhacks/zod). Related types are inferred from the zod schemas. -### Schemas and Types +> In Python, invocations are pydantic models with fields. These fields become inputs. The invocation's `invoke()` function returns a pydantic model - its output. Like the invocation itself, the output model has any number of fields, which become outputs. -The schemas, inferred types, type guards and related constants are in `invokeai/frontend/web/src/features/nodes/types/`. +### zod Schemas and Types -Roughly in order from lowest-level to highest: +The zod schemas, inferred types, and type guards are in `invokeai/frontend/web/src/features/nodes/types/`. + +Roughly in order from lowest-level to highest: - `common.ts`: stateful field data, and couple other misc types - `field.ts`: fields - types, values, instances, templates -- `metadata.ts`: core metadata - `invocation.ts`: invocations and other node types - `workflow.ts`: workflows and constituents +We customize the OpenAPI schema to include additional properties on invocation and field schemas. To facilitate parsing this schema into templates, we modify/wrap the types from [openapi-types](https://github.com/kogosoftwarellc/open-api/tree/main/packages/openapi-types) in `openapi.ts`. + +### OpenAPI Schema Parsing + +The entrypoint for the OpenAPI schema parsing is `invokeai/frontend/web/src/features/nodes/util/parseSchema.ts`. + +General logic flow: + +- Iterate over all invocation schema objects + - Extract relevant invocation-level attributes (e.g. 
title, type, version, etc) + - Iterate over the invocation's input fields + - [Parse each field's type](#parsing-field-types) + - [Build a field input template](#building-field-input-templates) from the type - either a stateful template or "generic" stateless template + - Iterate over the invocation's output fields + - Parse the field's type (same as inputs) + - [Build a field output template](#building-field-output-templates) + - Assemble the attributes and fields into an invocation template + +Most of these involve very straightforward `reduce`s, but the less intuitive steps are detailed below. + +#### Parsing Field Types + +Field types are represented as structured objects: + +```ts +type FieldType = { + name: string; + isCollection: boolean; + isPolymorphic: boolean; +}; +``` + +The parsing logic is in `invokeai/frontend/web/src/features/nodes/util/parseFieldType.ts`. + +There are 4 general cases for field type parsing. + +##### Primitive Types + +When a field is annotated as a primitive value (e.g. `int`, `str`, `float`), the field type parsing is fairly straightforward. The field is represented by a simple OpenAPI **schema object**, which has a `type` property. + +We create a field type name from this `type` string (e.g. `string` -> `StringField`). + +##### Complex Types + +When a field is annotated as a pydantic model (e.g. `ImageField`, `MainModelField`, `ControlField`), it is represented as a **reference object**. Reference objects are pointers to another schema or reference object within the schema. + +We need to **dereference**[^dereference] the schema to pull these out. Dereferencing may require recursion. We use the reference object's name directly for the field type name. + +##### Collection Types + +When a field is annotated as a list of a single type, the schema object has an `items` property. It may be a schema object or reference object and must be parsed to determine the item type. 
+ +We use the item type for the field type name, adding `isCollection: true` to the field type. + +##### Polymorphic Types + +When a field is annotated as a union of a type and list of that type, the schema object has an `anyOf` property, which holds a list of valid types for the union. + +After verifying that the union has two members (a type and list of the same type), we use the type for the field type name, adding `isPolymorphic: true` to the field type. + +##### Optional Fields + +In OpenAPI v3.1, when an object is optional, it is put into an `anyOf` along with a primitive schema object with `type: 'null'`. + +Handling this adds a fair bit of complexity, as we now must filter out the `'null'` types and work with the remaining types as described above. + +If there is a single remaining schema object, we must recursively call `parseFieldType()` to parse it. + +[^dereference]: Unfortunately, at this time, we've had limited success using external libraries to dereference at runtime, so we do this ourselves. + +#### Building Field Input Templates + +Now that we have a field type, we can build an input template for the field. This logic is in `invokeai/frontend/web/src/features/nodes/util/buildFieldInputTemplate.ts`. + +Stateful fields all get a function to build their template, while stateless fields are constructed directly. This is possible because stateless fields have no default value or constraints. + +#### Building Field Output Templates + +Field outputs are similar to stateless fields - they do not have any value in the frontend. When building their templates, we don't need a special function for each field type. + +The logic is in `invokeai/frontend/web/src/features/nodes/util/buildFieldOutputTemplate.ts`. + ### Workflow Migrations