Nodes-FaceTools (FaceIdentifier, FaceOff, FaceMask) (#4576)

* node-FaceTools

* Added more documentation for facetools

* invert FaceMask masking

- Previously, FaceMask protected the face and changed the surroundings by default (face white, everything else black)
- Changed to match how FaceOff and the other nodes work: the surroundings are protected and the face changes by default (face black, everything else white)

* reflect changed facemask behaviour in docs

* add FaceOff+FaceMask workflows

- Add FaceOff and FaceMask example workflows to docs/workflows

* add FaceMask+FaceOff workflows to exampleworkflows.md

- Used InvokeAI repository URL paths that mimic the other workflow URLs; hopefully they resolve correctly when/if merged

* inheriting, typehints, black/isort/flake8

- Modified the FaceMask and FaceOff output classes to inherit the base image, height, and width fields from ImageOutput
- Added type annotations to helper functions, which required some reworking of the code's stored data

* remove credit header

- This was in my personal repo's copy; it isn't necessary once merged.

* Optionals & image declaration duplication

- Added Optional[] to optional outputs and types
- Removed duplication of the `image = context.services.images.get_pil_image(self.image.image_name)` declaration
- Still need to find a way to deal with mask_pil None typing errors

* fix(facetools): fix typing issues, add validation, clean up structure

* feat(facetools): update field descriptions

* Update FaceOff_FaceScale2x.json

- Updated the FaceOff workflow after the Bounded Image field was removed in favor of inheriting the image output field from ImageOutput

* feat(facetools): pass through original image on facemask if invalid face ids requested

* feat(facetools): tidy variable names & fn calls

* feat(facetools): bundle inter font, draw ids with it

Inter is licensed under the SIL Open Font License. The license is included and is fully permissive. Inter is the same font the UI and commercial application already use.

Only the "regular" version is bundled.

* chore(facetools): isort & fix mypy issues

* docs(facetools): update and format docs

---------

Co-authored-by: Millun Atluri <millun.atluri@gmail.com>
Co-authored-by: Millun Atluri <Millu@users.noreply.github.com>
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
ymgenesis 2023-09-29 09:54:13 +02:00 committed by GitHub
parent 5f4eb0c3b3
commit 95fd2ee6ff
10 changed files with 3380 additions and 18 deletions


@@ -10,4 +10,4 @@ When updating or creating documentation, please keep in mind InvokeAI is a tool
## Help & Questions
-Please ping @imic1 or @hipsterusername in the [Discord](https://discord.com/channels/1020123559063990373/1049495067846524939) if you have any questions.
+Please ping @imic or @hipsterusername in the [Discord](https://discord.com/channels/1020123559063990373/1049495067846524939) if you have any questions.


@@ -10,18 +10,6 @@ To use a community workflow, download the the `.json` node graph file and load i
---------------------------------
-### FaceTools
-**Description:** FaceTools is a collection of nodes created to manipulate faces as you would in Unified Canvas. It includes FaceMask, FaceOff, and FacePlace. FaceMask autodetects a face in the image using MediaPipe and creates a mask from it. FaceOff similarly detects a face, then takes the face off of the image by adding a square bounding box around it and cropping/scaling it. FacePlace puts the bounded face image from FaceOff back onto the original image. Using these nodes with other inpainting node(s), you can put new faces on existing things, put new things around existing faces, and work closer with a face as a bounded image. Additionally, you can supply X and Y offset values to scale/change the shape of the mask for finer control on FaceMask and FaceOff. See GitHub repository below for usage examples.
-**Node Link:** https://github.com/ymgenesis/FaceTools/
-**FaceMask Output Examples**
-![5cc8abce-53b0-487a-b891-3bf94dcc8960](https://github.com/invoke-ai/InvokeAI/assets/25252829/43f36d24-1429-4ab1-bd06-a4bedfe0955e)
-![b920b710-1882-49a0-8d02-82dff2cca907](https://github.com/invoke-ai/InvokeAI/assets/25252829/7660c1ed-bf7d-4d0a-947f-1fc1679557ba)
-![71a91805-fda5-481c-b380-264665703133](https://github.com/invoke-ai/InvokeAI/assets/25252829/f8f6a2ee-2b68-4482-87da-b90221d5c3e2)
--------------------------------
### Ideal Size


@@ -1,6 +1,6 @@
# List of Default Nodes
The table below contains a list of the default nodes shipped with InvokeAI and their descriptions.
| Node <img width=160 align="right"> | Function |
|: ---------------------------------- | :--------------------------------------------------------------------------------------|
@@ -17,11 +17,12 @@ The table below contains a list of the default nodes shipped with InvokeAI and t
|Conditioning Primitive | A conditioning tensor primitive value|
|Content Shuffle Processor | Applies content shuffle processing to image|
|ControlNet | Collects ControlNet info to pass to other nodes|
-|OpenCV Inpaint | Simple inpaint using opencv.|
|Denoise Latents | Denoises noisy latents to decodable images|
|Divide Integers | Divides two numbers|
|Dynamic Prompt | Parses a prompt using adieyal/dynamicprompts' random or combinatorial generator|
-|Upscale (RealESRGAN) | Upscales an image using RealESRGAN.|
+|[FaceMask](./detailedNodes/faceTools.md#facemask) | Generates masks for faces in an image to use with Inpainting|
+|[FaceIdentifier](./detailedNodes/faceTools.md#faceidentifier) | Identifies and labels faces in an image|
+|[FaceOff](./detailedNodes/faceTools.md#faceoff) | Creates a new image that is a scaled bounding box with a mask on the face for Inpainting|
|Float Math | Perform basic math operations on two floats|
|Float Primitive Collection | A collection of float primitive values|
|Float Primitive | A float primitive value|
@@ -76,6 +77,7 @@ The table below contains a list of the default nodes shipped with InvokeAI and t
|ONNX Prompt (Raw) | A node to process inputs and produce outputs. May use dependency injection in __init__ to receive providers.|
|ONNX Text to Latents | Generates latents from conditionings.|
|ONNX Model Loader | Loads a main model, outputting its submodels.|
+|OpenCV Inpaint | Simple inpaint using opencv.|
|Openpose Processor | Applies Openpose processing to image|
|PIDI Processor | Applies PIDI processing to image|
|Prompts from File | Loads prompts from a text file|
@@ -97,5 +99,6 @@ The table below contains a list of the default nodes shipped with InvokeAI and t
|String Primitive | A string primitive value|
|Subtract Integers | Subtracts two numbers|
|Tile Resample Processor | Tile resampler processor|
+|Upscale (RealESRGAN) | Upscales an image using RealESRGAN.|
|VAE Loader | Loads a VAE model, outputting a VaeLoaderOutput|
|Zoe (Depth) Processor | Applies Zoe depth processing to image|


@@ -0,0 +1,154 @@
# Face Nodes
## FaceOff
FaceOff mimics a user finding a face in an image and resizing the bounding box
around the head in Canvas.
Enter a face ID (found with FaceIdentifier) to choose which face to mask.
Just as you would add more context inside the bounding box by making it larger
in Canvas, the node gives you a padding input (in pixels) which will
simultaneously add more context and increase the resolution of the bounding box
so the face remains the same size inside it.
The "Minimum Confidence" input defaults to 0.5 (50%), and represents a pass/fail
threshold a detected face must reach for it to be processed. Lowering this value
may help if detection is failing. If the detected masks are imperfect and stray
too far outside/inside of faces, the node gives you X & Y offsets to shrink/grow
the masks by a multiplier.
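To make the offset behavior concrete, here is a rough sketch (with hypothetical
landmark coordinates) of the scaling applied under the hood, mirroring the
implementation included in this commit:

```python
import numpy as np

# Each landmark point is pushed away from (positive offset) or pulled toward
# (negative offset) the face centre by a fraction of the offset value.
scale_multiplier = 0.2
points = np.array([[100.0, 120.0], [180.0, 115.0], [140.0, 200.0]])  # hypothetical landmark pixels
x_center, y_center = points[:, 0].mean(), points[:, 1].mean()

x_offset, y_offset = 1.0, -0.5  # grow the mask on X, shrink it on Y
scaled = np.column_stack((
    points[:, 0] + scale_multiplier * x_offset * (points[:, 0] - x_center),
    points[:, 1] + scale_multiplier * y_offset * (points[:, 1] - y_center),
))
print(scaled)  # the convex hull of the scaled points becomes the face mask
```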
FaceOff will output the face in a bounded image, taking the face off of the
original image for input into any node that accepts image inputs. The node also
outputs a face mask with the dimensions of the bounded image. The X & Y outputs
are for connecting to the X & Y inputs of the Paste Image node, which will place
the bounded image back on the original image using these coordinates.
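Outside of the node graph, the paste-back step is conceptually just a PIL paste
at those coordinates. A minimal sketch with hypothetical file names and values
(in a real workflow the Paste Image node does this for you):

```python
from PIL import Image

# Hypothetical inputs: the original image and the (inpainted) FaceOff bounded image.
original = Image.open("original.png").convert("RGBA")
inpainted_face = Image.open("inpainted_bounded_face.png").convert("RGBA")

x, y = 142, 88           # example values of FaceOff's X and Y outputs
box_size = (256, 256)    # size of the bounding box before any scaling for inpainting

# Undo any upscaling that was applied for inpainting, then paste back in place.
inpainted_face = inpainted_face.resize(box_size)
original.paste(inpainted_face, (x, y))
original.save("result.png")
```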
###### Inputs/Outputs
| Input | Description |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Image | Image for face detection |
| Face ID | The face ID to process, numbered from 0. Multiple faces not supported. Find a face's ID with FaceIdentifier node. |
| Minimum Confidence | Minimum confidence for face detection (lower if detection is failing) |
| X Offset | X-axis offset of the mask |
| Y Offset | Y-axis offset of the mask |
| Padding | All-axis padding around the mask in pixels |
| Chunk | Chunk (or divide) the image into sections to greatly improve face detection success. Defaults to off, but will activate if no faces are detected normally. Activate to chunk by default. |

| Output | Description |
| ------------- | ------------------------------------------------ |
| Bounded Image | Original image bound, cropped, and resized |
| Width | The width of the bounded image in pixels |
| Height | The height of the bounded image in pixels |
| Mask | The output mask |
| X | The x coordinate of the bounding box's left side |
| Y | The y coordinate of the bounding box's top side |
## FaceMask
FaceMask mimics a user drawing masks on faces in an image in Canvas.
The "Face IDs" input allows the user to select specific faces to be masked.
Leave empty to detect and mask all faces, or a comma-separated list for a
specific combination of faces (ex: `1,2,4`). A single integer will detect and
mask that specific face. Find face IDs with the FaceIdentifier node.
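A minimal sketch of how this comma-separated format is validated and parsed,
mirroring the validator in the implementation further down this page:

```python
import re

COMMA_SEPARATED_INTS = re.compile(r"^\d*(,\d+)*$")

def parse_face_ids(value: str) -> list[int]:
    """Validate and parse a Face IDs string such as '1,2,4' ('' selects all faces)."""
    if COMMA_SEPARATED_INTS.match(value) is None:
        raise ValueError('Face IDs must be a comma-separated list of integers (e.g. "1,2,3")')
    return [int(v) for v in value.split(",")] if value else []

print(parse_face_ids("1,2,4"))  # [1, 2, 4]
print(parse_face_ids(""))       # [] -> mask all detected faces
```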
The "Minimum Confidence" input defaults to 0.5 (50%), and represents a pass/fail
threshold a detected face must reach for it to be processed. Lowering this value
may help if detection is failing.
If the detected masks are imperfect and stray too far outside/inside of faces,
the node gives you X & Y offsets to shrink/grow the masks by a multiplier. All
masks shrink/grow together by the X & Y offset values.
By default, masks are created to change faces. When masks are inverted, they
change surrounding areas, protecting faces.
###### Inputs/Outputs
| Input | Description |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Image | Image for face detection |
| Face IDs | Comma-separated list of face ids to mask eg '0,2,7'. Numbered from 0. Leave empty to mask all. Find face IDs with FaceIdentifier node. |
| Minimum Confidence | Minimum confidence for face detection (lower if detection is failing) |
| X Offset | X-axis offset of the mask |
| Y Offset | Y-axis offset of the mask |
| Chunk | Chunk (or divide) the image into sections to greatly improve face detection success. Defaults to off, but will activate if no faces are detected normally. Activate to chunk by default. |
| Invert Mask | Toggle to invert the face mask |

| Output | Description |
| ------ | --------------------------------- |
| Image | The original image |
| Width | The width of the image in pixels |
| Height | The height of the image in pixels |
| Mask | The output face mask |
## FaceIdentifier
FaceIdentifier outputs an image with detected face IDs printed in white numbers
onto each face.
Face IDs can then be used in FaceMask and FaceOff to selectively mask all, a
specific combination, or single faces.
The FaceIdentifier output image is generated for user reference, and isn't meant
to be passed on to other image-processing nodes.
The "Minimum Confidence" input defaults to 0.5 (50%), and represents a pass/fail
threshold a detected face must reach for it to be processed. Lowering this value
may help if detection is failing. If an image is changed in the slightest, run
it through FaceIdentifier again to get updated Face IDs.
###### Inputs/Outputs
| Input | Description |
| ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Image | Image for face detection |
| Minimum Confidence | Minimum confidence for face detection (lower if detection is failing) |
| Chunk | Chunk (or divide) the image into sections to greatly improve face detection success. Defaults to off, but will activate if no faces are detected normally. Activate to chunk by default. |

| Output | Description |
| ------ | ------------------------------------------------------------------------------------------------ |
| Image | The original image with small face ID numbers printed in white onto each face for user reference |
| Width | The width of the original image in pixels |
| Height | The height of the original image in pixels |
## Tips
- If not all target faces are being detected, activate Chunk to bypass full
image face detection and greatly improve detection success (a simplified sketch
of the chunking pass follows these tips).
- Final results will vary between full-image detection and chunking for faces
that are detectable by both due to the nature of the process. Try either to
your taste.
- Be sure Minimum Confidence is set the same when using FaceIdentifier with
FaceOff/FaceMask.
- For FaceOff, use the color correction node before pasting the face back onto
the original image to correct edges being noticeable in the final image (see
example screenshot).
- Non-inpainting models may struggle to paint/generate correctly around faces.
- If your face won't change the way you want it to no matter what you change,
consider that the change you're trying to make is too much at that resolution.
For example, if an image is only 512x768 total, the face might only be 128x128
or 256x256, much smaller than the 512x512 your SD1.5 model was probably
trained on. Try increasing the resolution of the image by upscaling or
resizing, add padding to increase the bounding box's resolution, or use an
image where the face takes up more pixels.
- If the resulting face seems out of place pasted back on the original image
(i.e. too large, not proportional), add more padding on the FaceOff node to
give inpainting more context. Context and good prompting are important to
keeping things proportional.
- If you find the mask is too big/small and going too far outside/inside the
area you want to affect, adjust the X & Y offsets to shrink/grow the mask area.
- Use a higher denoise start value to resemble aspects of the original face or
surroundings. Denoise start = 0 & denoise end = 1 will make something new,
while denoise start = 0.50 & denoise end = 1 will be 50% old and 50% new.
- MediaPipe isn't good at detecting faces with lots of face paint, hair covering
the face, etc. Anything that obstructs the face will likely result in no faces
being detected.
- If you find your face isn't being detected, try lowering the minimum
confidence value from 0.5. This could result in false positives, however
(random areas being detected as faces and masked).
- After altering an image and wanting to process a different face in the newly
altered image, run the altered image through FaceIdentifier again to see the
new Face IDs. MediaPipe will most likely detect faces in a different order
after an image has been changed in the slightest.
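For the curious, here is a simplified sketch of the chunking pass mentioned in
the first tip, mirroring the implementation included in this commit: a
non-square image is sliced into overlapping square chunks, and detections from
each chunk are mapped back to full-image coordinates using the chunk offsets.

```python
from PIL import Image

def square_chunks(image: Image.Image) -> list[tuple[Image.Image, int, int]]:
    """Slice a non-square image into overlapping square chunks.

    Returns (chunk, x_offset, y_offset) tuples so that face coordinates found
    in a chunk can be shifted back into the full image. A square image yields
    no chunks (there is nothing to slice).
    """
    width, height = image.size
    chunks: list[tuple[Image.Image, int, int]] = []
    if width > height:  # landscape: slide a height-sized square along the x-axis
        steps = int(width * 2 / height)
        fx = 0.0
        while fx <= (width - height):
            x = int(fx)
            chunks.append((image.crop((x, 0, x + height - 1, height - 1)), x, 0))
            fx += (width - height) / steps
    elif height > width:  # portrait: slide a width-sized square along the y-axis
        steps = int(height * 2 / width)
        fy = 0.0
        while fy <= (height - width):
            y = int(fy)
            chunks.append((image.crop((0, y, width - 1, y + width - 1)), 0, y))
            fy += (height - width) / steps
    return chunks
```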


@@ -9,5 +9,6 @@ If you're interested in finding more workflows, checkout the [#share-your-workfl
* [SD1.5 / SD2 Text to Image](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/Text_to_Image.json)
* [SDXL Text to Image](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/SDXL_Text_to_Image.json)
* [SDXL (with Refiner) Text to Image](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/SDXL_Text_to_Image.json)
-* [Tiled Upscaling with ControlNet](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/ESRGAN_img2img_upscale w_Canny_ControlNet.json
+* [Tiled Upscaling with ControlNet](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/ESRGAN_img2img_upscale w_Canny_ControlNet.json)
+* [FaceMask](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/FaceMask.json)
+* [FaceOff with 2x Face Scaling](https://github.com/invoke-ai/InvokeAI/blob/main/docs/workflows/FaceOff_FaceScale2x.json)

docs/workflows/FaceMask.json: new file, 1041 lines (diff too large to display)
docs/workflows/FaceOff_FaceScale2x.json: diff too large to display


@@ -0,0 +1,686 @@
import math
import re
from typing import Optional, TypedDict

import cv2
import numpy as np
from mediapipe.python.solutions.face_mesh import FaceMesh  # type: ignore[import]
from PIL import Image, ImageDraw, ImageFilter, ImageFont, ImageOps
from PIL.Image import Image as ImageType
from pydantic import validator

from invokeai.app.invocations.baseinvocation import (
    BaseInvocation,
    InputField,
    InvocationContext,
    OutputField,
    invocation,
    invocation_output,
)
from invokeai.app.invocations.primitives import ImageField, ImageOutput
from invokeai.app.models.image import ImageCategory, ResourceOrigin


@invocation_output("face_mask_output")
class FaceMaskOutput(ImageOutput):
    """Base class for FaceMask output"""

    mask: ImageField = OutputField(description="The output mask")


@invocation_output("face_off_output")
class FaceOffOutput(ImageOutput):
    """Base class for FaceOff Output"""

    mask: ImageField = OutputField(description="The output mask")
    x: int = OutputField(description="The x coordinate of the bounding box's left side")
    y: int = OutputField(description="The y coordinate of the bounding box's top side")


class FaceResultData(TypedDict):
    image: ImageType
    mask: ImageType
    x_center: float
    y_center: float
    mesh_width: int
    mesh_height: int


class FaceResultDataWithId(FaceResultData):
    face_id: int


class ExtractFaceData(TypedDict):
    bounded_image: ImageType
    bounded_mask: ImageType
    x_min: int
    y_min: int
    x_max: int
    y_max: int


class FaceMaskResult(TypedDict):
    image: ImageType
    mask: ImageType


def create_white_image(w: int, h: int) -> ImageType:
    return Image.new("L", (w, h), color=255)


def create_black_image(w: int, h: int) -> ImageType:
    return Image.new("L", (w, h), color=0)


FONT_SIZE = 32
FONT_STROKE_WIDTH = 4

font = ImageFont.truetype("invokeai/assets/fonts/inter/Inter-Regular.ttf", FONT_SIZE)


def prepare_faces_list(
    face_result_list: list[FaceResultData],
) -> list[FaceResultDataWithId]:
    """Deduplicates a list of faces, adding IDs to them."""
    deduped_faces: list[FaceResultData] = []

    if len(face_result_list) == 0:
        return list()

    for candidate in face_result_list:
        should_add = True
        candidate_x_center = candidate["x_center"]
        candidate_y_center = candidate["y_center"]
        for face in deduped_faces:
            face_center_x = face["x_center"]
            face_center_y = face["y_center"]
            face_radius_w = face["mesh_width"] / 2
            face_radius_h = face["mesh_height"] / 2
            # Determine if the center of the candidate_face is inside the ellipse of the added face
            # p < 1 -> Inside
            # p = 1 -> Exactly on the ellipse
            # p > 1 -> Outside
            p = (math.pow((candidate_x_center - face_center_x), 2) / math.pow(face_radius_w, 2)) + (
                math.pow((candidate_y_center - face_center_y), 2) / math.pow(face_radius_h, 2)
            )

            if p < 1:  # Inside of the already-added face's radius
                should_add = False
                break

        if should_add is True:
            deduped_faces.append(candidate)

    sorted_faces = sorted(deduped_faces, key=lambda x: x["y_center"])
    sorted_faces = sorted(sorted_faces, key=lambda x: x["x_center"])

    # add face_id for reference
    sorted_faces_with_ids: list[FaceResultDataWithId] = []
    face_id_counter = 0
    for face in sorted_faces:
        sorted_faces_with_ids.append(
            FaceResultDataWithId(
                **face,
                face_id=face_id_counter,
            )
        )
        face_id_counter += 1

    return sorted_faces_with_ids


def generate_face_box_mask(
    context: InvocationContext,
    minimum_confidence: float,
    x_offset: float,
    y_offset: float,
    pil_image: ImageType,
    chunk_x_offset: int = 0,
    chunk_y_offset: int = 0,
    draw_mesh: bool = True,
) -> list[FaceResultData]:
    result = []
    mask_pil = None

    # Convert the PIL image to a NumPy array.
    np_image = np.array(pil_image, dtype=np.uint8)

    # Check if the input image has four channels (RGBA).
    if np_image.shape[2] == 4:
        # Convert RGBA to RGB by removing the alpha channel.
        np_image = np_image[:, :, :3]

    # Create a FaceMesh object for face landmark detection and mesh generation.
    face_mesh = FaceMesh(
        max_num_faces=999,
        min_detection_confidence=minimum_confidence,
        min_tracking_confidence=minimum_confidence,
    )

    # Detect the face landmarks and mesh in the input image.
    results = face_mesh.process(np_image)

    # Check if any face is detected.
    if results.multi_face_landmarks:  # type: ignore # this are via protobuf and not typed
        # Search for the face_id in the detected faces.
        for face_id, face_landmarks in enumerate(results.multi_face_landmarks):  # type: ignore #this are via protobuf and not typed
            # Get the bounding box of the face mesh.
            x_coordinates = [landmark.x for landmark in face_landmarks.landmark]
            y_coordinates = [landmark.y for landmark in face_landmarks.landmark]
            x_min, x_max = min(x_coordinates), max(x_coordinates)
            y_min, y_max = min(y_coordinates), max(y_coordinates)

            # Calculate the width and height of the face mesh.
            mesh_width = int((x_max - x_min) * np_image.shape[1])
            mesh_height = int((y_max - y_min) * np_image.shape[0])

            # Get the center of the face.
            x_center = np.mean([landmark.x * np_image.shape[1] for landmark in face_landmarks.landmark])
            y_center = np.mean([landmark.y * np_image.shape[0] for landmark in face_landmarks.landmark])

            face_landmark_points = np.array(
                [
                    [landmark.x * np_image.shape[1], landmark.y * np_image.shape[0]]
                    for landmark in face_landmarks.landmark
                ]
            )

            # Apply the scaling offsets to the face landmark points with a multiplier.
            scale_multiplier = 0.2
            x_center = np.mean(face_landmark_points[:, 0])
            y_center = np.mean(face_landmark_points[:, 1])

            if draw_mesh:
                x_scaled = face_landmark_points[:, 0] + scale_multiplier * x_offset * (
                    face_landmark_points[:, 0] - x_center
                )
                y_scaled = face_landmark_points[:, 1] + scale_multiplier * y_offset * (
                    face_landmark_points[:, 1] - y_center
                )

                convex_hull = cv2.convexHull(np.column_stack((x_scaled, y_scaled)).astype(np.int32))

                # Generate a binary face mask using the face mesh.
                mask_image = np.ones(np_image.shape[:2], dtype=np.uint8) * 255
                cv2.fillConvexPoly(mask_image, convex_hull, 0)

                # Convert the binary mask image to a PIL Image.
                init_mask_pil = Image.fromarray(mask_image, mode="L")
                w, h = init_mask_pil.size
                mask_pil = create_white_image(w + chunk_x_offset, h + chunk_y_offset)
                mask_pil.paste(init_mask_pil, (chunk_x_offset, chunk_y_offset))

            left_side = x_center - mesh_width
            right_side = x_center + mesh_width
            top_side = y_center - mesh_height
            bottom_side = y_center + mesh_height
            im_width, im_height = pil_image.size
            over_w = im_width * 0.1
            over_h = im_height * 0.1
            if (
                (left_side >= -over_w)
                and (right_side < im_width + over_w)
                and (top_side >= -over_h)
                and (bottom_side < im_height + over_h)
            ):
                x_center = float(x_center)
                y_center = float(y_center)
                face = FaceResultData(
                    image=pil_image,
                    mask=mask_pil or create_white_image(*pil_image.size),
                    x_center=x_center + chunk_x_offset,
                    y_center=y_center + chunk_y_offset,
                    mesh_width=mesh_width,
                    mesh_height=mesh_height,
                )

                result.append(face)
            else:
                context.services.logger.info("FaceTools --> Face out of bounds, ignoring.")

    return result


def extract_face(
    context: InvocationContext,
    image: ImageType,
    face: FaceResultData,
    padding: int,
) -> ExtractFaceData:
    mask = face["mask"]
    center_x = face["x_center"]
    center_y = face["y_center"]
    mesh_width = face["mesh_width"]
    mesh_height = face["mesh_height"]

    # Determine the minimum size of the square crop
    min_size = min(mask.width, mask.height)

    # Calculate the crop boundaries for the output image and mask.
    mesh_width += 128 + padding  # add pixels to account for mask variance
    mesh_height += 128 + padding  # add pixels to account for mask variance
    crop_size = min(
        max(mesh_width, mesh_height, 128), min_size
    )  # Choose the smaller of the two (given value or face mask size)
    if crop_size > 128:
        crop_size = (crop_size + 7) // 8 * 8  # Ensure crop side is multiple of 8

    # Calculate the actual crop boundaries within the bounds of the original image.
    x_min = int(center_x - crop_size / 2)
    y_min = int(center_y - crop_size / 2)
    x_max = int(center_x + crop_size / 2)
    y_max = int(center_y + crop_size / 2)

    # Adjust the crop boundaries to stay within the original image's dimensions
    if x_min < 0:
        context.services.logger.warning("FaceTools --> -X-axis padding reached image edge.")
        x_max -= x_min
        x_min = 0
    elif x_max > mask.width:
        context.services.logger.warning("FaceTools --> +X-axis padding reached image edge.")
        x_min -= x_max - mask.width
        x_max = mask.width

    if y_min < 0:
        context.services.logger.warning("FaceTools --> +Y-axis padding reached image edge.")
        y_max -= y_min
        y_min = 0
    elif y_max > mask.height:
        context.services.logger.warning("FaceTools --> -Y-axis padding reached image edge.")
        y_min -= y_max - mask.height
        y_max = mask.height

    # Ensure the crop is square and adjust the boundaries if needed
    if x_max - x_min != crop_size:
        context.services.logger.warning("FaceTools --> Limiting x-axis padding to constrain bounding box to a square.")
        diff = crop_size - (x_max - x_min)
        x_min -= diff // 2
        x_max += diff - diff // 2

    if y_max - y_min != crop_size:
        context.services.logger.warning("FaceTools --> Limiting y-axis padding to constrain bounding box to a square.")
        diff = crop_size - (y_max - y_min)
        y_min -= diff // 2
        y_max += diff - diff // 2

    context.services.logger.info(f"FaceTools --> Calculated bounding box (8 multiple): {crop_size}")

    # Crop the output image to the specified size with the center of the face mesh as the center.
    mask = mask.crop((x_min, y_min, x_max, y_max))
    bounded_image = image.crop((x_min, y_min, x_max, y_max))

    # blur mask edge by small radius
    mask = mask.filter(ImageFilter.GaussianBlur(radius=2))

    return ExtractFaceData(
        bounded_image=bounded_image,
        bounded_mask=mask,
        x_min=x_min,
        y_min=y_min,
        x_max=x_max,
        y_max=y_max,
    )


def get_faces_list(
    context: InvocationContext,
    image: ImageType,
    should_chunk: bool,
    minimum_confidence: float,
    x_offset: float,
    y_offset: float,
    draw_mesh: bool = True,
) -> list[FaceResultDataWithId]:
    result = []

    # Generate the face box mask and get the center of the face.
    if not should_chunk:
        context.services.logger.info("FaceTools --> Attempting full image face detection.")
        result = generate_face_box_mask(
            context=context,
            minimum_confidence=minimum_confidence,
            x_offset=x_offset,
            y_offset=y_offset,
            pil_image=image,
            chunk_x_offset=0,
            chunk_y_offset=0,
            draw_mesh=draw_mesh,
        )

    if should_chunk or len(result) == 0:
        context.services.logger.info("FaceTools --> Chunking image (chunk toggled on, or no face found in full image).")
        width, height = image.size
        image_chunks = []
        x_offsets = []
        y_offsets = []
        result = []

        # If width == height, there's nothing more we can do... otherwise...
        if width > height:
            # Landscape - slice the image horizontally
            fx = 0.0
            steps = int(width * 2 / height)
            while fx <= (width - height):
                x = int(fx)
                image_chunks.append(image.crop((x, 0, x + height - 1, height - 1)))
                x_offsets.append(x)
                y_offsets.append(0)
                fx += (width - height) / steps
                context.services.logger.info(f"FaceTools --> Chunk starting at x = {x}")
        elif height > width:
            # Portrait - slice the image vertically
            fy = 0.0
            steps = int(height * 2 / width)
            while fy <= (height - width):
                y = int(fy)
                image_chunks.append(image.crop((0, y, width - 1, y + width - 1)))
                x_offsets.append(0)
                y_offsets.append(y)
                fy += (height - width) / steps
                context.services.logger.info(f"FaceTools --> Chunk starting at y = {y}")

        for idx in range(len(image_chunks)):
            context.services.logger.info(f"FaceTools --> Evaluating faces in chunk {idx}")
            result = result + generate_face_box_mask(
                context=context,
                minimum_confidence=minimum_confidence,
                x_offset=x_offset,
                y_offset=y_offset,
                pil_image=image_chunks[idx],
                chunk_x_offset=x_offsets[idx],
                chunk_y_offset=y_offsets[idx],
                draw_mesh=draw_mesh,
            )

        if len(result) == 0:
            # Give up
            context.services.logger.warning(
                "FaceTools --> No face detected in chunked input image. Passing through original image."
            )

    all_faces = prepare_faces_list(result)

    return all_faces


@invocation("face_off", title="FaceOff", tags=["image", "faceoff", "face", "mask"], category="image", version="1.0.0")
class FaceOffInvocation(BaseInvocation):
    """Bound, extract, and mask a face from an image using MediaPipe detection"""

    image: ImageField = InputField(description="Image for face detection")
    face_id: int = InputField(
        default=0,
        ge=0,
        description="The face ID to process, numbered from 0. Multiple faces not supported. Find a face's ID with FaceIdentifier node.",
    )
    minimum_confidence: float = InputField(
        default=0.5, description="Minimum confidence for face detection (lower if detection is failing)"
    )
    x_offset: float = InputField(default=0.0, description="X-axis offset of the mask")
    y_offset: float = InputField(default=0.0, description="Y-axis offset of the mask")
    padding: int = InputField(default=0, description="All-axis padding around the mask in pixels")
    chunk: bool = InputField(
        default=False,
        description="Whether to bypass full image face detection and default to image chunking. Chunking will occur if no faces are found in the full image.",
    )

    def faceoff(self, context: InvocationContext, image: ImageType) -> Optional[ExtractFaceData]:
        all_faces = get_faces_list(
            context=context,
            image=image,
            should_chunk=self.chunk,
            minimum_confidence=self.minimum_confidence,
            x_offset=self.x_offset,
            y_offset=self.y_offset,
            draw_mesh=True,
        )

        if len(all_faces) == 0:
            context.services.logger.warning("FaceOff --> No faces detected. Passing through original image.")
            return None

        if self.face_id > len(all_faces) - 1:
            context.services.logger.warning(
                f"FaceOff --> Face ID {self.face_id} is outside of the number of faces detected ({len(all_faces)}). Passing through original image."
            )
            return None

        face_data = extract_face(context=context, image=image, face=all_faces[self.face_id], padding=self.padding)
        # Convert the input image to RGBA mode to ensure it has an alpha channel.
        face_data["bounded_image"] = face_data["bounded_image"].convert("RGBA")

        return face_data

    def invoke(self, context: InvocationContext) -> FaceOffOutput:
        image = context.services.images.get_pil_image(self.image.image_name)
        result = self.faceoff(context=context, image=image)

        if result is None:
            result_image = image
            result_mask = create_white_image(*image.size)
            x = 0
            y = 0
        else:
            result_image = result["bounded_image"]
            result_mask = result["bounded_mask"]
            x = result["x_min"]
            y = result["y_min"]

        image_dto = context.services.images.create(
            image=result_image,
            image_origin=ResourceOrigin.INTERNAL,
            image_category=ImageCategory.GENERAL,
            node_id=self.id,
            session_id=context.graph_execution_state_id,
            is_intermediate=self.is_intermediate,
            workflow=self.workflow,
        )

        mask_dto = context.services.images.create(
            image=result_mask,
            image_origin=ResourceOrigin.INTERNAL,
            image_category=ImageCategory.MASK,
            node_id=self.id,
            session_id=context.graph_execution_state_id,
            is_intermediate=self.is_intermediate,
        )

        output = FaceOffOutput(
            image=ImageField(image_name=image_dto.image_name),
            width=image_dto.width,
            height=image_dto.height,
            mask=ImageField(image_name=mask_dto.image_name),
            x=x,
            y=y,
        )

        return output


@invocation("face_mask_detection", title="FaceMask", tags=["image", "face", "mask"], category="image", version="1.0.0")
class FaceMaskInvocation(BaseInvocation):
    """Face mask creation using mediapipe face detection"""

    image: ImageField = InputField(description="Image to face detect")
    face_ids: str = InputField(
        default="",
        description="Comma-separated list of face ids to mask eg '0,2,7'. Numbered from 0. Leave empty to mask all. Find face IDs with FaceIdentifier node.",
    )
    minimum_confidence: float = InputField(
        default=0.5, description="Minimum confidence for face detection (lower if detection is failing)"
    )
    x_offset: float = InputField(default=0.0, description="Offset for the X-axis of the face mask")
    y_offset: float = InputField(default=0.0, description="Offset for the Y-axis of the face mask")
    chunk: bool = InputField(
        default=False,
        description="Whether to bypass full image face detection and default to image chunking. Chunking will occur if no faces are found in the full image.",
    )
    invert_mask: bool = InputField(default=False, description="Toggle to invert the mask")

    @validator("face_ids")
    def validate_comma_separated_ints(cls, v) -> str:
        comma_separated_ints_regex = re.compile(r"^\d*(,\d+)*$")
        if comma_separated_ints_regex.match(v) is None:
            raise ValueError('Face IDs must be a comma-separated list of integers (e.g. "1,2,3")')
        return v

    def facemask(self, context: InvocationContext, image: ImageType) -> FaceMaskResult:
        all_faces = get_faces_list(
            context=context,
            image=image,
            should_chunk=self.chunk,
            minimum_confidence=self.minimum_confidence,
            x_offset=self.x_offset,
            y_offset=self.y_offset,
            draw_mesh=True,
        )

        mask_pil = create_white_image(*image.size)

        id_range = list(range(0, len(all_faces)))
        ids_to_extract = id_range
        if self.face_ids != "":
            parsed_face_ids = [int(id) for id in self.face_ids.split(",")]
            # get requested face_ids that are in range
            intersected_face_ids = set(parsed_face_ids) & set(id_range)

            if len(intersected_face_ids) == 0:
                id_range_str = ",".join([str(id) for id in id_range])
                context.services.logger.warning(
                    f"Face IDs must be in range of detected faces - requested {self.face_ids}, detected {id_range_str}. Passing through original image."
                )
                return FaceMaskResult(
                    image=image,  # original image
                    mask=mask_pil,  # white mask
                )

            ids_to_extract = list(intersected_face_ids)

        for face_id in ids_to_extract:
            face_data = extract_face(context=context, image=image, face=all_faces[face_id], padding=0)
            face_mask_pil = face_data["bounded_mask"]
            x_min = face_data["x_min"]
            y_min = face_data["y_min"]
            x_max = face_data["x_max"]
            y_max = face_data["y_max"]

            mask_pil.paste(
                create_black_image(x_max - x_min, y_max - y_min),
                box=(x_min, y_min),
                mask=ImageOps.invert(face_mask_pil),
            )

        if self.invert_mask:
            mask_pil = ImageOps.invert(mask_pil)

        # Create an RGBA image with transparency
        image = image.convert("RGBA")

        return FaceMaskResult(
            image=image,
            mask=mask_pil,
        )

    def invoke(self, context: InvocationContext) -> FaceMaskOutput:
        image = context.services.images.get_pil_image(self.image.image_name)
        result = self.facemask(context=context, image=image)

        image_dto = context.services.images.create(
            image=result["image"],
            image_origin=ResourceOrigin.INTERNAL,
            image_category=ImageCategory.GENERAL,
            node_id=self.id,
            session_id=context.graph_execution_state_id,
            is_intermediate=self.is_intermediate,
            workflow=self.workflow,
        )

        mask_dto = context.services.images.create(
            image=result["mask"],
            image_origin=ResourceOrigin.INTERNAL,
            image_category=ImageCategory.MASK,
            node_id=self.id,
            session_id=context.graph_execution_state_id,
            is_intermediate=self.is_intermediate,
        )

        output = FaceMaskOutput(
            image=ImageField(image_name=image_dto.image_name),
            width=image_dto.width,
            height=image_dto.height,
            mask=ImageField(image_name=mask_dto.image_name),
        )

        return output


@invocation(
    "face_identifier", title="FaceIdentifier", tags=["image", "face", "identifier"], category="image", version="1.0.0"
)
class FaceIdentifierInvocation(BaseInvocation):
    """Outputs an image with detected face IDs printed on each face. For use with other FaceTools."""

    image: ImageField = InputField(description="Image to face detect")
    minimum_confidence: float = InputField(
        default=0.5, description="Minimum confidence for face detection (lower if detection is failing)"
    )
    chunk: bool = InputField(
        default=False,
        description="Whether to bypass full image face detection and default to image chunking. Chunking will occur if no faces are found in the full image.",
    )

    def faceidentifier(self, context: InvocationContext, image: ImageType) -> ImageType:
        image = image.copy()

        all_faces = get_faces_list(
            context=context,
            image=image,
            should_chunk=self.chunk,
            minimum_confidence=self.minimum_confidence,
            x_offset=0,
            y_offset=0,
            draw_mesh=False,
        )

        # Paste face IDs on the output image
        draw = ImageDraw.Draw(image)
        for face in all_faces:
            x_coord = face["x_center"]
            y_coord = face["y_center"]
            text = str(face["face_id"])
            # get bbox of the text so we can center the id on the face
            _, _, bbox_w, bbox_h = draw.textbbox(xy=(0, 0), text=text, font=font, stroke_width=FONT_STROKE_WIDTH)
            x = x_coord - bbox_w / 2
            y = y_coord - bbox_h / 2
            draw.text(
                xy=(x, y),
                text=str(text),
                fill=(255, 255, 255, 255),
                font=font,
                stroke_width=FONT_STROKE_WIDTH,
                stroke_fill=(0, 0, 0, 255),
            )

        # Create an RGBA image with transparency
        image = image.convert("RGBA")

        return image

    def invoke(self, context: InvocationContext) -> ImageOutput:
        image = context.services.images.get_pil_image(self.image.image_name)
        result_image = self.faceidentifier(context=context, image=image)

        image_dto = context.services.images.create(
            image=result_image,
            image_origin=ResourceOrigin.INTERNAL,
            image_category=ImageCategory.GENERAL,
            node_id=self.id,
            session_id=context.graph_execution_state_id,
            is_intermediate=self.is_intermediate,
            workflow=self.workflow,
        )

        return ImageOutput(
            image=ImageField(image_name=image_dto.image_name),
            width=image_dto.width,
            height=image_dto.height,
        )

Binary file not shown (the bundled Inter-Regular.ttf font).


@@ -0,0 +1,94 @@
Copyright (c) 2016-2020 The Inter Project Authors.
"Inter" is trademark of Rasmus Andersson.
https://github.com/rsms/inter
This Font Software is licensed under the SIL Open Font License, Version 1.1.
This license is copied below, and is also available with a FAQ at:
http://scripts.sil.org/OFL
-----------------------------------------------------------
SIL OPEN FONT LICENSE Version 1.1 - 26 February 2007
-----------------------------------------------------------
PREAMBLE
The goals of the Open Font License (OFL) are to stimulate worldwide
development of collaborative font projects, to support the font creation
efforts of academic and linguistic communities, and to provide a free and
open framework in which fonts may be shared and improved in partnership
with others.
The OFL allows the licensed fonts to be used, studied, modified and
redistributed freely as long as they are not sold by themselves. The
fonts, including any derivative works, can be bundled, embedded,
redistributed and/or sold with any software provided that any reserved
names are not used by derivative works. The fonts and derivatives,
however, cannot be released under any other type of license. The
requirement for fonts to remain under this license does not apply
to any document created using the fonts or their derivatives.
DEFINITIONS
"Font Software" refers to the set of files released by the Copyright
Holder(s) under this license and clearly marked as such. This may
include source files, build scripts and documentation.
"Reserved Font Name" refers to any names specified as such after the
copyright statement(s).
"Original Version" refers to the collection of Font Software components as
distributed by the Copyright Holder(s).
"Modified Version" refers to any derivative made by adding to, deleting,
or substituting -- in part or in whole -- any of the components of the
Original Version, by changing formats or by porting the Font Software to a
new environment.
"Author" refers to any designer, engineer, programmer, technical
writer or other person who contributed to the Font Software.
PERMISSION AND CONDITIONS
Permission is hereby granted, free of charge, to any person obtaining
a copy of the Font Software, to use, study, copy, merge, embed, modify,
redistribute, and sell modified and unmodified copies of the Font
Software, subject to the following conditions:
1) Neither the Font Software nor any of its individual components,
in Original or Modified Versions, may be sold by itself.
2) Original or Modified Versions of the Font Software may be bundled,
redistributed and/or sold with any software, provided that each copy
contains the above copyright notice and this license. These can be
included either as stand-alone text files, human-readable headers or
in the appropriate machine-readable metadata fields within text or
binary files as long as those fields can be easily viewed by the user.
3) No Modified Version of the Font Software may use the Reserved Font
Name(s) unless explicit written permission is granted by the corresponding
Copyright Holder. This restriction only applies to the primary font name as
presented to the users.
4) The name(s) of the Copyright Holder(s) or the Author(s) of the Font
Software shall not be used to promote, endorse or advertise any
Modified Version, except to acknowledge the contribution(s) of the
Copyright Holder(s) and the Author(s) or with their explicit written
permission.
5) The Font Software, modified or unmodified, in part or in whole,
must be distributed entirely under this license, and must not be
distributed under any other license. The requirement for fonts to
remain under this license does not apply to any document created
using the Font Software.
TERMINATION
This license becomes null and void if any of the above conditions are
not met.
DISCLAIMER
THE FONT SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT
OF COPYRIGHT, PATENT, TRADEMARK, OR OTHER RIGHT. IN NO EVENT SHALL THE
COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
INCLUDING ANY GENERAL, SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF THE USE OR INABILITY TO USE THE FONT SOFTWARE OR FROM
OTHER DEALINGS IN THE FONT SOFTWARE.