wip(docs): ELI5 Tutorial For Invocations

2024-08-30 20:32:17 +00:00 · 2023-06-09 22:32:16 +12:00 · 2023-06-09 22:32:16 +12:00 · 48258c4bb8
commit 48258c4bb8
parent 267f0408bb
3 changed files with 516 additions and 3 deletions
--- a/docs/assets/contributing/resize_invocation.png
+++ b/docs/assets/contributing/resize_invocation.png
--- a/docs/assets/contributing/resize_node_editor.png
+++ b/docs/assets/contributing/resize_node_editor.png
--- a/docs/contributing/INVOCATIONS.md
+++ b/docs/contributing/INVOCATIONS.md
@@ -1,8 +1,521 @@
 # Invocations

-Invocations represent a single operation, its inputs, and its outputs. These
-operations and their outputs can be chained together to generate and modify
-images.
+Features in InvokeAI are added in the form of modular, node-like building blocks
+called **Invocations**.
+
+An Invocation is simply a single operation that takes in some inputs and gives
+out some outputs. We can then chain multiple Invocations together to create more
+complex functionality.
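+
+To make chaining concrete, here is a minimal, framework-free sketch. The two
+classes and their `invoke()` methods are hypothetical stand-ins, not InvokeAI's
+API (real Invocations derive from `BaseInvocation` and receive a context). The
+point is only that one node's output becomes the next node's input.
+
+```python
+from dataclasses import dataclass
+
+
+# Hypothetical stand-ins for two Invocations (not InvokeAI classes).
+@dataclass
+class AddInvocation:
+    '''Adds two integers'''
+    a: int
+    b: int
+
+    def invoke(self) -> int:
+        return self.a + self.b
+
+
+@dataclass
+class MultiplyInvocation:
+    '''Multiplies two integers'''
+    a: int
+    b: int
+
+    def invoke(self) -> int:
+        return self.a * self.b
+
+
+def run_chain(x: int, y: int, factor: int) -> int:
+    """Feed the output of the first node into an input of the second."""
+    total = AddInvocation(a=x, b=y).invoke()
+    return MultiplyInvocation(a=total, b=factor).invoke()
+
+
+print(run_chain(2, 3, 4))  # 20
+```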
+
+## Invocations Directory
+
+InvokeAI Invocations can be found in the `invokeai/app/invocations` directory.
+
+You can add your new functionality to one of the existing Invocations in this
+directory or create a new file in this directory as per your needs.
+
+**Note:** _All Invocations must be inside this directory for InvokeAI to
+recognize them as valid Invocations._
+
+## Creating A New Invocation
+
+In order to understand the process of creating a new Invocation, let us actually
+create one.
+
+In our example, let us create an Invocation that will take in an image, resize
+it, and output the resized image.
+
+When creating a new Invocation, there are a few rules to follow:
+
+- Create a new class that derives from a predefined parent class called
+  `BaseInvocation`.
+- The name of every Invocation must end with the word `Invocation` in order for
+  it to be recognized as an Invocation.
+- Every Invocation must have a `docstring` that describes what this Invocation
+  does.
+- Every Invocation must have a unique `type` field defined, which becomes its
+  identifier.
+- Invocations are strictly typed. We use Python's built-in
+  [typing](https://docs.python.org/3/library/typing.html) module and the
+  [pydantic](https://pydantic-docs.helpmanual.io/) library for validation.
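+
+The rules above can be restated as a small checker. This is a hypothetical,
+stdlib-only sketch for illustration. It is not how InvokeAI discovers or
+validates Invocations, and `check_invocation_rules` is not part of InvokeAI.
+
+```python
+from typing import List, Literal, get_args, get_type_hints
+
+
+def check_invocation_rules(classes: list) -> List[str]:
+    """Return the rule violations found in `classes` (an empty list means all pass)."""
+    problems: List[str] = []
+    seen_types = {}  # type id -> name of the class that first used it
+    for cls in classes:
+        name = cls.__name__
+        if not name.endswith("Invocation"):
+            problems.append(f"{name}: name must end with 'Invocation'")
+        if not (cls.__doc__ or "").strip():
+            problems.append(f"{name}: missing docstring")
+        # Literal['resize'] -> ('resize',); a missing annotation gives ().
+        values = get_args(get_type_hints(cls).get("type"))
+        if len(values) != 1:
+            problems.append(f"{name}: 'type' must be a single Literal value")
+        elif values[0] in seen_types:
+            problems.append(f"{name}: type '{values[0]}' already used by {seen_types[values[0]]}")
+        else:
+            seen_types[values[0]] = name
+    return problems
+
+
+class ResizeInvocation:
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+
+class Resizer:  # wrong suffix, no docstring, and reuses the 'resize' id
+    type: Literal['resize'] = 'resize'
+
+
+for problem in check_invocation_rules([ResizeInvocation, Resizer]):
+    print(problem)
+# Resizer: name must end with 'Invocation'
+# Resizer: missing docstring
+# Resizer: type 'resize' already used by ResizeInvocation
+```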
+
+So let us do that.
+
+```python
+from typing import Literal
+from .baseinvocation import BaseInvocation
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+```
+
+That's great.
+
+Now we have set up the base of our new Invocation. Let us think about what
+inputs our Invocation takes.
+
+- We need an `image` that we are going to resize.
+- We need the new `width` and `height` values to resize the image to.
+
+### **Inputs**
+
+Every Invocation input is a pydantic `Field` and, like everything else, should
+be strictly typed and defined.
+
+So let us create these inputs for our Invocation, starting with the `image`
+input. For most inputs we can use standard Python types, but InvokeAI already
+provides a custom `ImageField` type that handles everything needed for image
+inputs.
+
+But what is this `ImageField`? It is a special class written specifically to
+handle how images are dealt with in InvokeAI. We will cover how to create your
+own custom field types later in this guide. For now, let's go ahead and use it.
+
+```python
+from typing import Literal, Union
+from pydantic import Field
+
+from .baseinvocation import BaseInvocation
+from ..models.image import ImageField
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+    # Inputs
+    image: Union[ImageField, None] = Field(description="The input image", default=None)
+```
+
+Let us break down our input code.
+
+```python
+image: Union[ImageField, None] = Field(description="The input image", default=None)
+```
+
+| Part      | Value                                                | Description                                                                                        |
+| --------- | ---------------------------------------------------- | -------------------------------------------------------------------------------------------------- |
+| Name      | `image`                                              | The variable that will hold our image                                                              |
+| Type Hint | `Union[ImageField, None]`                            | The types for our field. Indicates that the image can either be an `ImageField` type or `None`     |
+| Field     | `Field(description="The input image", default=None)` | The image variable is a field which needs a description and a default value that we set to `None`. |
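+
+In Python's `typing` module, `Union[ImageField, None]` is exactly equivalent to
+`Optional[ImageField]`, so either spelling works as the type hint. Here is a quick
+check. It uses a hypothetical stand-in class because the real `ImageField`
+lives inside InvokeAI.
+
+```python
+from typing import Optional, Union, get_args
+
+
+class ImageField:  # hypothetical stand-in for InvokeAI's ImageField
+    pass
+
+
+hint = Union[ImageField, None]
+print(hint == Optional[ImageField])  # True
+# The hint allows exactly two things: an ImageField, or None.
+print(get_args(hint) == (ImageField, type(None)))  # True
+```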
+
+Great. Now let us create our other inputs for `width` and `height`.
+
+```python
+from typing import Literal, Union
+from pydantic import Field
+
+from .baseinvocation import BaseInvocation
+from ..models.image import ImageField
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+    # Inputs
+    image: Union[ImageField, None] = Field(description="The input image", default=None)
+    width: int = Field(default=512, ge=64, le=2048, description="Width of the new image")
+    height: int = Field(default=512, ge=64, le=2048, description="Height of the new image")
+```
+
+As you might have noticed, we added two new parameters to the field type for
+`width` and `height` called `ge` and `le`. These stand for _greater than or
+equal to_ and _less than or equal to_. Pydantic's `Field` accepts various other
+constraint parameters, which you can find in the **pydantic** documentation.
+
+**Note:** _Any time it is possible to define constraints for our field, we
+should do it so the frontend has more information on how to parse this field._
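+
+pydantic's `ge` and `le` are inclusive bounds (the exclusive versions are `gt`
+and `lt`). In the real Invocation, pydantic performs this validation for you.
+The sketch below spells out the same check with a stdlib dataclass.
+`ResizeParams` is a hypothetical name used only for illustration.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class ResizeParams:
+    """Stdlib stand-in for Field(ge=64, le=2048) on width and height (illustration only)."""
+    width: int = 512
+    height: int = 512
+
+    def __post_init__(self) -> None:
+        for name in ("width", "height"):
+            value = getattr(self, name)
+            # ge=64 means value >= 64 and le=2048 means value <= 2048 (both inclusive).
+            if not 64 <= value <= 2048:
+                raise ValueError(f"{name}={value} is outside [64, 2048]")
+
+
+ResizeParams(width=64, height=2048)  # accepted: both bounds are inclusive
+try:
+    ResizeParams(width=32)
+except ValueError as err:
+    print(err)  # width=32 is outside [64, 2048]
+```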
+
+Perfect. We now have our inputs. Let us do something with these.
+
+### **Invoke Function**
+
+The `invoke` function is where all the magic happens. It receives a `context`
+parameter of type `InvocationContext`, which gives you access to the current
+context of the generation and to all the services InvokeAI provides.
+
+Let us create this function first.
+
+```python
+from typing import Literal, Union
+from pydantic import Field
+
+from .baseinvocation import BaseInvocation, InvocationContext
+from ..models.image import ImageField
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+    # Inputs
+    image: Union[ImageField, None] = Field(description="The input image", default=None)
+    width: int = Field(default=512, ge=64, le=2048, description="Width of the new image")
+    height: int = Field(default=512, ge=64, le=2048, description="Height of the new image")
+
+    def invoke(self, context: InvocationContext):
+        pass
+```
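+
+The `context` argument means an Invocation never constructs InvokeAI's services
+itself. It only uses what it is handed. The sketch below mimics that shape with
+hypothetical fake classes. `FakeContext`, `FakeImageService` and its
+`load_size`/`save_size` methods are invented for illustration and are not
+InvokeAI's API. This design also lets you exercise `invoke` without a running
+server.
+
+```python
+from dataclasses import dataclass, field
+from typing import Dict, Tuple
+
+
+# Hypothetical stand-ins that only mirror the shape context.services.images.
+@dataclass
+class FakeImageService:
+    sizes: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # name -> (width, height)
+
+    def load_size(self, name: str) -> Tuple[int, int]:
+        return self.sizes[name]  # raises KeyError for unknown images
+
+    def save_size(self, name: str, size: Tuple[int, int]) -> None:
+        self.sizes[name] = size
+
+
+@dataclass
+class FakeServices:
+    images: FakeImageService = field(default_factory=FakeImageService)
+
+
+@dataclass
+class FakeContext:
+    services: FakeServices = field(default_factory=FakeServices)
+
+
+@dataclass
+class FakeResizeInvocation:
+    '''Pretends to resize an image by recording its new size'''
+    image_name: str
+    width: int
+    height: int
+
+    def invoke(self, context: FakeContext) -> str:
+        # The node only talks to services reached through the context it was given.
+        context.services.images.load_size(self.image_name)
+        new_name = f"{self.image_name}_resized"
+        context.services.images.save_size(new_name, (self.width, self.height))
+        return new_name
+
+
+ctx = FakeContext()
+ctx.services.images.save_size("cat.png", (1024, 768))
+result = FakeResizeInvocation("cat.png", 512, 384).invoke(ctx)
+print(result, ctx.services.images.sizes[result])  # cat.png_resized (512, 384)
+```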
+
+### **Outputs**
+
+The output of our Invocation will be whatever is returned by this `invoke`
+function. Like with our inputs, we need to strongly type and define our outputs
+too.
+
+What is our output going to be? Another image. Normally you'd have to create a
+type for this but InvokeAI already offers you an `ImageOutput` type that handles
+all the necessary info related to image outputs. So let us use that.
+
+We will cover how to create your own output types later in this guide.
+
+```python
+from typing import Literal, Union
+from pydantic import Field
+
+from .baseinvocation import BaseInvocation, InvocationContext
+from ..models.image import ImageField
+from .image import ImageOutput
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+    # Inputs
+    image: Union[ImageField, None] = Field(description="The input image", default=None)
+    width: int = Field(default=512, ge=64, le=2048, description="Width of the new image")
+    height: int = Field(default=512, ge=64, le=2048, description="Height of the new image")
+
+    def invoke(self, context: InvocationContext) -> ImageOutput:
+        pass
+```
+
+Perfect. Now that we have our Invocation set up, let us implement the actual
+resizing.
+
+- We will first load the image. Normally we would do this with the `PIL`
+  library, but we can use one of the services provided by InvokeAI instead.
+- We will resize the image with `PIL` to the input `width` and `height`.
+- We will output this image in the format we set above.
+
+So let's do that.
+
+```python
+from typing import Literal, Union
+from pydantic import Field
+
+from .baseinvocation import BaseInvocation, InvocationContext
+from ..models.image import ImageField, ResourceOrigin, ImageCategory
+from .image import ImageOutput
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+    # Inputs
+    image: Union[ImageField, None] = Field(description="The input image", default=None)
+    width: int = Field(default=512, ge=64, le=2048, description="Width of the new image")
+    height: int = Field(default=512, ge=64, le=2048, description="Height of the new image")
+
+    def invoke(self, context: InvocationContext) -> ImageOutput:
+        # Load the image using InvokeAI's predefined Image Service.
+        image = context.services.images.get_pil_image(self.image.image_origin, self.image.image_name)
+
+        # Resizing the image
+        # Because we used the above service, we already have a PIL image. So we can simply resize.
+        resized_image = image.resize((self.width, self.height))
+
+        # Preparing the image for output using InvokeAI's predefined Image Service.
+        output_image = context.services.images.create(
+            image=resized_image,
+            image_origin=ResourceOrigin.INTERNAL,
+            image_category=ImageCategory.GENERAL,
+            node_id=self.id,
+            session_id=context.graph_execution_state_id,
+            is_intermediate=self.is_intermediate,
+        )
+
+        # Returning the Image
+        return ImageOutput(
+            image=ImageField(
+                image_name=output_image.image_name,
+                image_origin=output_image.image_origin,
+            ),
+            width=output_image.width,
+            height=output_image.height,
+        )
+```
+
+**Note:** Do not be overwhelmed by the `ImageOutput` process. InvokeAI expects
+images to be dispatched in a specific way so that they are stored and read
+correctly. In most cases when dealing with an image output, you can simply
+copy-paste the template above.
+
+That's it. You made your own **Resize Invocation**.
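+
+One detail in the code above is easy to get backwards. PIL's `Image.resize`
+takes a `(width, height)` tuple, while grid-like data is usually indexed row
+(height) first. The stdlib-only sketch below uses nearest-neighbour sampling as
+a stand-in for PIL to show how the two orderings line up. PIL's own default
+resampling filter may differ, and `resize_nearest` is a hypothetical helper.
+
+```python
+from typing import List, Sequence, Tuple, TypeVar
+
+T = TypeVar("T")
+
+
+def resize_nearest(pixels: Sequence[Sequence[T]], size: Tuple[int, int]) -> List[List[T]]:
+    """Nearest-neighbour resize of a grid indexed as pixels[y][x].
+
+    `size` is (width, height), the same order PIL's Image.resize expects,
+    even though the grid itself is stored height (rows) first.
+    """
+    new_width, new_height = size
+    old_height, old_width = len(pixels), len(pixels[0])
+    return [
+        [pixels[y * old_height // new_height][x * old_width // new_width] for x in range(new_width)]
+        for y in range(new_height)
+    ]
+
+
+grid = [["a", "b"],
+        ["c", "d"]]  # 2 rows (height 2), 2 columns (width 2)
+print(resize_nearest(grid, (4, 2)))  # [['a', 'a', 'b', 'b'], ['c', 'c', 'd', 'd']]
+```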
+
+## Result
+
+Once you make your Invocation correctly, the rest of the process is fully
+automated for you.
+
+When you launch InvokeAI, you can go to `http://localhost:9090/docs` and see
+your new Invocation show up there with all the relevant info.
+
+![resize invocation](../assets/contributing/resize_invocation.png)
+
+When you launch the frontend UI, you can go to the Node Editor tab and find your
+new Invocation ready to be used.
+
+![resize node editor](../assets/contributing/resize_node_editor.png)
+
+# Advanced
+
+## Custom Input Fields
+
+Now that you know how to create your own Invocations, let us dive into slightly
+more advanced topics.
+
+While creating your own Invocations, you might run into a scenario where the
+existing input types in InvokeAI do not meet your requirements. In such cases,
+you can create your own input types.
+
+Let us create one as an example. Say we want to create a color input field that
+represents a color code. Before we start, here are some general good practices
+to keep in mind.
+
+**Good Practices**
+
+- There is no enforced naming convention for input fields, but we highly
+  recommend an appropriate name such as `ColorField`.
+- It is not mandatory but it is heavily recommended to add a relevant
+  `docstring` to describe your input field.
+- Keep your field in the same file as the Invocation that it is made for or in
+  another file where it is relevant.
+
+All input types are classes that derive from the `BaseModel` type from
+`pydantic`. So let's create one.
+
+```python
+from pydantic import BaseModel
+
+class ColorField(BaseModel):
+    '''A field that holds the rgba values of a color'''
+    pass
+```
+
+Perfect. Now let us create the inputs for our field. This works exactly like
+creating input fields for your Invocation, and all the same rules apply. Let us
+create four fields representing the _red (r)_, _green (g)_, _blue (b)_ and
+_alpha (a)_ channels of the color.
+
+```python
+from pydantic import BaseModel, Field
+
+class ColorField(BaseModel):
+    '''A field that holds the rgba values of a color'''
+    r: int = Field(ge=0, le=255, description="The red channel")
+    g: int = Field(ge=0, le=255, description="The green channel")
+    b: int = Field(ge=0, le=255, description="The blue channel")
+    a: int = Field(ge=0, le=255, description="The alpha channel")
+```
+
+That's it. We now have a new input field type that we can use in our Invocations
+like this.
+
+```python
+color: ColorField = Field(default=ColorField(r=0, g=0, b=0, a=0), description='Background color of an image')
+```
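+
+Because a custom field is just a nested model, it travels as a nested JSON
+object. pydantic handles this for you (via `.dict()` and `.json()` in pydantic
+v1). The sketch below uses a stdlib dataclass as a hypothetical stand-in for
+`ColorField` to show the shape of the data. `BackgroundInput` is likewise
+invented for illustration.
+
+```python
+import json
+from dataclasses import asdict, dataclass
+
+
+@dataclass
+class ColorField:
+    '''Stdlib stand-in for the pydantic ColorField (illustration only)'''
+    r: int
+    g: int
+    b: int
+    a: int
+
+
+@dataclass
+class BackgroundInput:
+    '''Hypothetical holder for a ColorField, like an Invocation input'''
+    color: ColorField
+
+
+payload = json.dumps(asdict(BackgroundInput(color=ColorField(r=0, g=0, b=0, a=0))))
+print(payload)  # {"color": {"r": 0, "g": 0, "b": 0, "a": 0}}
+
+# Going the other way: rebuild the nested object from the JSON payload.
+data = json.loads(payload)
+restored = BackgroundInput(color=ColorField(**data["color"]))
+print(restored.color.a)  # 0
+```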
+
+**Extra Config**
+
+All input fields can also define an inner `Config` class that you can use for
+various advanced things, such as marking parameters as required.
+
+Let us do that for our _ColorField_ and mark all four values as required in the
+generated schema, since we did not define defaults for any of them.
+
+```python
+from pydantic import BaseModel, Field
+
+class ColorField(BaseModel):
+    '''A field that holds the rgba values of a color'''
+    r: int = Field(ge=0, le=255, description="The red channel")
+    g: int = Field(ge=0, le=255, description="The green channel")
+    b: int = Field(ge=0, le=255, description="The blue channel")
+    a: int = Field(ge=0, le=255, description="The alpha channel")
+
+    class Config:
+        schema_extra = {"required": ["r", "g", "b", "a"]}
+```
+
+Now it becomes mandatory for the user to supply all the values required by our
+input field.
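+
+In pydantic v1, a dict given as `schema_extra` is merged into the model's
+generated JSON schema with a shallow `dict.update()`. This is how the `required`
+list reaches the frontend. The sketch below mimics that merge on a simplified,
+hand-written schema. The schema contents are illustrative, not pydantic's exact
+output, and `apply_schema_extra` is a hypothetical helper.
+
+```python
+import copy
+
+
+def apply_schema_extra(schema: dict, schema_extra: dict) -> dict:
+    """Return a copy of `schema` with `schema_extra` merged in, pydantic-v1 style
+    (a shallow dict.update on top of the generated schema)."""
+    merged = copy.deepcopy(schema)
+    merged.update(schema_extra)
+    return merged
+
+
+# Simplified, hand-written schema for ColorField (illustrative only).
+generated = {
+    "title": "ColorField",
+    "type": "object",
+    "properties": {c: {"type": "integer", "minimum": 0, "maximum": 255} for c in "rgba"},
+}
+
+final = apply_schema_extra(generated, {"required": ["r", "g", "b", "a"]})
+print(final["required"])  # ['r', 'g', 'b', 'a']
+print("required" in generated)  # False: the original dict is left untouched
+```
+
+Because the update is shallow, a top-level key in `schema_extra` replaces the
+generated value wholesale rather than being merged into it.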
+
+We will discuss the `Config` class in extra detail later in this guide and how
+you can use it to make your Invocations more robust.
+
+## Custom Output Types
+
+Like with custom inputs, sometimes you might find yourself needing custom
+outputs that InvokeAI does not provide. We can easily set one up.
+
+Now that you are familiar with Invocations and Inputs, let us use that knowledge
+to put together a custom output type for an Invocation that returns the
+_width_, _height_ and _background color_ needed to create a blank image.
+
+- A custom output type is a class that derives from the parent class of
+  `BaseInvocationOutput`.
+- It is not mandatory, but we recommend ending output type names with `Output`.
+  So we'll call our class `BlankImageOutput`.
+- It is not mandatory, but we highly recommend adding a `docstring` to describe
+  what your output type is for.
+- Like Invocations, each output type should have a `type` variable that is
+  **unique**.
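+
+Why must `type` be unique? It lets serialized data name the class it should be
+rebuilt as, so two classes sharing one `type` would be ambiguous. The sketch
+below shows the idea with a hypothetical stdlib registry. It is not InvokeAI's
+code, and `bg_color` is omitted for brevity.
+
+```python
+from dataclasses import dataclass
+from typing import Dict, Type
+
+# Hypothetical registry: type id -> output class.
+OUTPUT_TYPES: Dict[str, Type] = {}
+
+
+def register(cls: Type) -> Type:
+    """Class decorator that refuses to register a `type` id twice."""
+    type_id = cls.type
+    if type_id in OUTPUT_TYPES:
+        raise ValueError(f"type '{type_id}' is already used by {OUTPUT_TYPES[type_id].__name__}")
+    OUTPUT_TYPES[type_id] = cls
+    return cls
+
+
+@register
+@dataclass
+class BlankImageOutput:
+    '''Stand-in for the BlankImageOutput type (bg_color omitted for brevity)'''
+    width: int
+    height: int
+    type: str = 'blank_image_output'
+
+
+def load_output(data: dict):
+    """Use the payload's `type` to pick the class, then build an instance from it."""
+    return OUTPUT_TYPES[data["type"]](**data)
+
+
+out = load_output({"type": "blank_image_output", "width": 512, "height": 512})
+print(out)  # BlankImageOutput(width=512, height=512, type='blank_image_output')
+```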
+
+Now that we know the basic rules for creating a new output type, let us go ahead
+and make it.
+
+```python
+from typing import Literal
+from pydantic import Field
+
+from .baseinvocation import BaseInvocationOutput
+
+class BlankImageOutput(BaseInvocationOutput):
+    '''Base output type for creating a blank image'''
+    type: Literal['blank_image_output'] = 'blank_image_output'
+
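+    # Output fields
+    width: int = Field(description='Width of blank image')
+    height: int = Field(description='Height of blank image')
+    # ColorField is the custom field from the previous section; define or import it in this file
+    bg_color: ColorField = Field(description='Background color of blank image')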
+
+    class Config:
+        schema_extra = {"required": ["type", "width", "height", "bg_color"]}
+```
+
+All set. We now have an output type with everything we need to create a blank
+image. Notice that we also used the `Config` class to ensure the fields are
+marked as required.
+
+## Custom Configuration
+
+As you might have noticed when making inputs and outputs, we used an inner class
+called `Config`, a _pydantic_ convention, to further customize them. Because our
+inputs and outputs essentially inherit from _pydantic_'s `BaseModel` class, all
+[configuration options](https://docs.pydantic.dev/latest/usage/schema/#schema-customization)
+that are valid for _pydantic_ classes are also valid for our inputs and outputs.
+You can do the same for your Invocations too, but InvokeAI makes our life a
+little bit easier on that end.
+
+InvokeAI provides a custom configuration class called `InvocationConfig`
+specifically for configuring Invocations. It works like the regular _pydantic_
+`Config` class, with some extras on top to help the frontend UI parse the
+schema.
+
+At the moment, this `InvocationConfig` class supports the following options
+related to the `ui`.
+
+| Config Option | Type                                                                                                          | Example                                                                                                                                |
+| ------------- | ------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------- |
+| type_hints    | `Dict[str, Literal["integer", "float", "boolean", "string", "enum", "image", "latents", "model", "control"]]` | `type_hints: {"model": "model"}` marks the `model` field as a model, so the UI can display a list of available models.                 |
+| tags          | `List[str]`                                                                                                   | `tags: ["resize", "image"]` classifies your Invocation under the tags `resize` and `image`.                                            |
+| title         | `str`                                                                                                         | `title: "Resize Image"` renames your Invocation to this custom title instead of inferring it from the name of the Invocation class.     |
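+
+The table above can be read as a small contract. The hypothetical checker below
+encodes the expected type of each `ui` option. It is not part of InvokeAI, which
+may not validate these options this way. With it, a mistake such as passing
+`title` as a list instead of a string is easy to spot.
+
+```python
+from typing import List, Literal, get_args
+
+# Allowed hint values, copied from the type_hints row of the table.
+TypeHint = Literal["integer", "float", "boolean", "string", "enum", "image", "latents", "model", "control"]
+
+
+def check_ui_config(ui: dict) -> List[str]:
+    """Return a list of problems with a `ui` options dict (empty if it looks valid)."""
+    problems: List[str] = []
+    allowed = set(get_args(TypeHint))
+    for field_name, hint in ui.get("type_hints", {}).items():
+        if hint not in allowed:
+            problems.append(f"type_hints[{field_name!r}]: unknown hint {hint!r}")
+    tags = ui.get("tags", [])
+    if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
+        problems.append("tags must be a list of strings")
+    if "title" in ui and not isinstance(ui["title"], str):
+        problems.append("title must be a string")
+    return problems
+
+
+ui = {"type_hints": {"model": "model"}, "tags": ["resize", "image"], "title": "My Custom Resize"}
+print(check_ui_config(ui))  # []
+print(check_ui_config({"title": ["My Custom Resize"]}))  # ['title must be a string']
+```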
+
+So let us update our `ResizeInvocation` with some extra configuration and see
+how that works.
+
+```python
+from typing import Literal, Union
+from pydantic import Field
+
+from .baseinvocation import BaseInvocation, InvocationContext, InvocationConfig
+from ..models.image import ImageField, ResourceOrigin, ImageCategory
+from .image import ImageOutput
+
+class ResizeInvocation(BaseInvocation):
+    '''Resizes an image'''
+    type: Literal['resize'] = 'resize'
+
+    # Inputs
+    image: Union[ImageField, None] = Field(description="The input image", default=None)
+    width: int = Field(default=512, ge=64, le=2048, description="Width of the new image")
+    height: int = Field(default=512, ge=64, le=2048, description="Height of the new image")
+
+    class Config(InvocationConfig):
+        # schema_extra must be assigned (=), and its keys must be strings
+        schema_extra = {
+            "ui": {
+                "tags": ["resize", "image"],
+                "title": "My Custom Resize",
+            },
+        }
+
+    def invoke(self, context: InvocationContext) -> ImageOutput:
+        # Load the image using InvokeAI's predefined Image Service.
+        image = context.services.images.get_pil_image(self.image.image_origin, self.image.image_name)
+
+        # Resizing the image
+        # Because we used the above service, we already have a PIL image. So we can simply resize.
+        resized_image = image.resize((self.width, self.height))
+
+        # Preparing the image for output using InvokeAI's predefined Image Service.
+        output_image = context.services.images.create(
+            image=resized_image,
+            image_origin=ResourceOrigin.INTERNAL,
+            image_category=ImageCategory.GENERAL,
+            node_id=self.id,
+            session_id=context.graph_execution_state_id,
+            is_intermediate=self.is_intermediate,
+        )
+
+        # Returning the Image
+        return ImageOutput(
+            image=ImageField(
+                image_name=output_image.image_name,
+                image_origin=output_image.image_origin,
+            ),
+            width=output_image.width,
+            height=output_image.height,
+        )
+```
+
+We now customized our code to let the frontend know that our Invocation falls
+under `resize` and `image` categories. So when the user searches for these
+particular words, our Invocation will show up too.
+
+We also set a custom title for our Invocation. So instead of being called
+`Resize`, it will be called `My Custom Resize`.
+
+As simple as that.
+
+As time goes by, InvokeAI will further improve and add more customizability for
+Invocation configuration. We will have more documentation regarding this at a
+later time.
+
+# **[TODO]**
+
+## Custom Components For Frontend
+
+Every backend input type should have a corresponding frontend component so the
+UI knows what to render when you use a particular field type.
+
+If you are using existing field types, we already have components for those. So
+you don't have to worry about creating anything new. But this might not always
+be the case. Sometimes you might want to create new field types and have the
+frontend UI deal with it in a different way.
+
+This is where we venture into the world of React and JavaScript and create our
+own new components for our Invocations. Do not fear the world of JS. It's
+actually pretty straightforward.
+
+Let us create a new component for our custom color field we created above. When
+we use a color field, let us say we want the UI to display a color picker for
+the user to pick from rather than entering values. That is what we will build
+now.
+
+---
+
+# OLD -- TO BE DELETED OR MOVED LATER
+
+---

 ## Creating a new invocation