InvokeAI/docs/features/CONTROLNET.md
2023-07-19 12:16:03 -04:00

6.9 KiB

title
ControlNet

:material-loupe: ControlNet

ControlNet

ControlNet

ControlNet is a powerful set of features developed by the open-source community (notably, Stanford researcher @ilyasviel) that allows you to apply a secondary neural network model to your image generation process in Invoke.

With ControlNet, you can get more control over the output of your image generation, providing you with a way to direct the network towards generating images that better fit your desired style or outcome.

How it works

ControlNet works by analyzing an input image, pre-processing that image to identify relevant information that can be interpreted by each specific ControlNet model, and then inserting that control information into the generation process. This can be used to adjust the style, composition, or other aspects of the image to better achieve a specific result.

Models

InvokeAI provides access to a series of ControlNet models that provide different effects or styles in your generated images. Currently InvokeAI only supports "diffuser" style ControlNet models. These are directories that contain the files config.json and either diffusion_pytorch_model.safetensors or diffusion_pytorch_model.fp16.safetensors. The name of the directory is the same as the name of the model. We do not support checkpoint-format ControlNets, which come in a single file with the extension .safetensors.

Diffuser-style ControlNet models are available at HuggingFace (http://huggingface.co) and accessed via their repo IDs (identifiers in the format "author/modelname"). The easiest way to install them is to use the InvokeAI model installer application. If you use the invoke.sh/invoke.bat launcher, select item [5] and navigate to the CONTROLNETS section. From the command line, the model installer can be started with invokeai-model-install.

You can select as many ControlNet models from the checkbox list as you like, or enter additional HuggingFace repo_ids in the "Additional models" textbox:

Model Installer - Controlnetl{:width="640px"}

Be aware that some ControlNet models require additional code functionality in order to work properly, so just installing a third-party ControlNet model may not have the desired effect. Please read and follow the documentation for installing a third party model not currently included among InvokeAI's default list.

The models currently supported include:

Canny:

When the Canny model is used in ControlNet, Invoke will attempt to generate images that match the edges detected.

Canny edge detection works by detecting the edges in an image by looking for abrupt changes in intensity. It is known for its ability to detect edges accurately while reducing noise and false edges, and the preprocessor can identify more information by decreasing the thresholds.

M-LSD:

M-LSD is another edge detection algorithm used in ControlNet. It stands for Multi-Scale Line Segment Detector.

It detects straight line segments in an image by analyzing the local structure of the image at multiple scales. It can be useful for architectural imagery, or anything where straight-line structural information is needed for the resulting output.

Lineart:

The Lineart model in ControlNet generates line drawings from an input image. The resulting pre-processed image is a simplified version of the original, with only the outlines of objects visible.The Lineart model in ControlNet is known for its ability to accurately capture the contours of the objects in an input sketch.

Lineart Anime:

A variant of the Lineart model that generates line drawings with a distinct style inspired by anime and manga art styles.

Depth: A model that generates depth maps of images, allowing you to create more realistic 3D models or to simulate depth effects in post-processing.

Normal Map (BAE): A model that generates normal maps from input images, allowing for more realistic lighting effects in 3D rendering.

Image Segmentation: A model that divides input images into segments or regions, each of which corresponds to a different object or part of the image. (More details coming soon)

Openpose: The OpenPose control model allows for the identification of the general pose of a character by pre-processing an existing image with a clear human structure. With advanced options, Openpose can also detect the face or hands in the image.

Mediapipe Face:

The MediaPipe Face identification processor is able to clearly identify facial features in order to capture vivid expressions of human faces.

Tile (experimental):

The Tile model fills out details in the image to match the image, rather than the prompt. The Tile Model is a versatile tool that offers a range of functionalities. Its primary capabilities can be boiled down to two main behaviors:

  • It can reinterpret specific details within an image and create fresh, new elements.
  • It has the ability to disregard global instructions if there's a discrepancy between them and the local context or specific parts of the image. In such cases, it uses the local context to guide the process.

The Tile Model can be a powerful tool in your arsenal for enhancing image quality and details. If there are undesirable elements in your images, such as blurriness caused by resizing, this model can effectively eliminate these issues, resulting in cleaner, crisper images. Moreover, it can generate and add refined details to your images, improving their overall quality and appeal.

Pix2Pix (experimental)

With Pix2Pix, you can input an image into the controlnet, and then "instruct" the model to change it using your prompt. For example, you can say "Make it winter" to add more wintry elements to a scene.

Inpaint: Coming Soon - Currently this model is available but not functional on the Canvas. An upcoming release will provide additional capabilities for using this model when inpainting.

Each of these models can be adjusted and combined with other ControlNet models to achieve different results, giving you even more control over your image generation process.

Using ControlNet

To use ControlNet, you can simply select the desired model and adjust both the ControlNet and Pre-processor settings to achieve the desired result. You can also use multiple ControlNet models at the same time, allowing you to achieve even more complex effects or styles in your generated images.

Each ControlNet has two settings that are applied to the ControlNet.

Weight - Strength of the Controlnet model applied to the generation for the section, defined by start/end.

Start/End - 0 represents the start of the generation, 1 represents the end. The Start/end setting controls what steps during the generation process have the ControlNet applied.

Additionally, each ControlNet section can be expanded in order to manipulate settings for the image pre-processor that adjusts your uploaded image before using it in when you Invoke.