# Update Transformers to 4.35 and fix pad_to_multiple_of (#4817)
## What type of PR is this? (check all applicable)

- [ ] Refactor
- [ ] Feature
- [X] Bug Fix
- [X] Optimization
- [ ] Documentation Update
- [ ] Community Node Submission

## Have you discussed this change with the InvokeAI team?

- [X] Yes, with @blessedcoolant
- [ ] No, because:

## Have you updated all relevant documentation?

- [ ] Yes
- [ ] No

## Description

This PR updates Transformers to the most recent version and sets the `pad_to_multiple_of` argument of `text_encoder.resize_token_embeddings`, which was introduced with https://github.com/huggingface/transformers/pull/25088 in Transformers 4.32.0 (a short illustrative sketch follows the template below).

According to the [Nvidia Documentation](https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc), "Performance is better when equivalent matrix dimensions M, N, and K are aligned to multiples of 8 bytes (or 64 bytes on A100) for FP16."

This fixes the following warning, which has been printed before every invocation since Transformers 4.32.0:

`You are resizing the embedding layer without providing a pad_to_multiple_of parameter. This means that the new embedding dimension will be None. This might induce some performance reduction as Tensor Cores will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc`

This is my first "real" fix PR, so I hope this is fine. Please let me know if anything is wrong with it; I am glad to help. Have a nice day and thank you!

## Related Tickets & Documents

<!--
For pull requests that relate or close an issue, please include them below.
For example having the text: "closes #1234" would connect the current pull request to issue 1234. And when we merge the pull request, GitHub will automatically close the issue.
-->

- Related Issue: https://github.com/huggingface/transformers/issues/26303
- Related Discord discussion: https://discord.com/channels/1020123559063990373/1154152783579197571
- Closes #

## QA Instructions, Screenshots, Recordings

<!--
Please provide steps on how to test changes, any hardware or software specifications as well as any other pertinent information.
-->

## Added/updated tests?

- [ ] Yes
- [ ] No : _please replace this line with details on why tests have not been included_

## [optional] Are there any post deployment tasks we need to perform?
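To make the Description above concrete, here is a minimal sketch of the call this PR changes. This is not InvokeAI code; the model id and token string are illustrative placeholders.

```python
# Sketch only: illustrates resize_token_embeddings with pad_to_multiple_of
# (available since Transformers 4.32). The model id below is a placeholder.
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "openai/clip-vit-large-patch14"  # placeholder checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id)

pad_to_multiple_of = 8  # multiples of 8 keep FP16 matmuls Tensor Core friendly

# Passing new_num_tokens=None together with pad_to_multiple_of pads the
# embedding matrix up to the next multiple of 8 and silences the warning.
embeddings = text_encoder.resize_token_embeddings(None, pad_to_multiple_of)
init_tokens_count = embeddings.num_embeddings

# After adding tokens (e.g. textual-inversion triggers), resize with the same
# padding so the row count stays aligned.
new_tokens_added = tokenizer.add_tokens(["<ti-trigger-0>"])  # placeholder token
text_encoder.resize_token_embeddings(init_tokens_count + new_tokens_added, pad_to_multiple_of)
```

The same `pad_to_multiple_of` value has to be used for every resize (the initial one, the one after adding tokens, and the restore), which is exactly what the diff below does.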
Commit d63a614b8b
@@ -166,6 +166,15 @@ class ModelPatcher:
         init_tokens_count = None
         new_tokens_added = None
 
+        # TODO: This is required since Transformers 4.32 see
+        # https://github.com/huggingface/transformers/pull/25088
+        # More information by NVIDIA:
+        # https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
+        # This value might need to be changed in the future and take the GPUs model into account as there seem
+        # to be ideal values for different GPUS. This value is temporary!
+        # For references to the current discussion please see https://github.com/invoke-ai/InvokeAI/pull/4817
+        pad_to_multiple_of = 8
+
         try:
             # HACK: The CLIPTokenizer API does not include a way to remove tokens after calling add_tokens(...). As a
             # workaround, we create a full copy of `tokenizer` so that its original behavior can be restored after
@@ -175,7 +184,7 @@ class ModelPatcher:
             # but a pickle roundtrip was found to be much faster (1 sec vs. 0.05 secs).
             ti_tokenizer = pickle.loads(pickle.dumps(tokenizer))
             ti_manager = TextualInversionManager(ti_tokenizer)
-            init_tokens_count = text_encoder.resize_token_embeddings(None).num_embeddings
+            init_tokens_count = text_encoder.resize_token_embeddings(None, pad_to_multiple_of).num_embeddings
 
             def _get_trigger(ti_name, index):
                 trigger = ti_name
@@ -190,7 +199,7 @@ class ModelPatcher:
                     new_tokens_added += ti_tokenizer.add_tokens(_get_trigger(ti_name, i))
 
             # modify text_encoder
-            text_encoder.resize_token_embeddings(init_tokens_count + new_tokens_added)
+            text_encoder.resize_token_embeddings(init_tokens_count + new_tokens_added, pad_to_multiple_of)
             model_embeddings = text_encoder.get_input_embeddings()
 
             for ti_name, ti in ti_list:
@@ -222,7 +231,7 @@ class ModelPatcher:
 
         finally:
             if init_tokens_count and new_tokens_added:
-                text_encoder.resize_token_embeddings(init_tokens_count)
+                text_encoder.resize_token_embeddings(init_tokens_count, pad_to_multiple_of)
 
     @classmethod
     @contextmanager
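For readers who do not want to trace the hunks above, here is a stripped-down sketch of the patched pattern (simplified names, no `TextualInversionManager` wiring; not the actual InvokeAI implementation): pad on the first resize, resize again after adding trigger tokens, and restore the original padded count in `finally` with the same value so the warning stays gone.

```python
import pickle
from contextlib import contextmanager

PAD_TO_MULTIPLE_OF = 8  # temporary value; see the discussion in PR #4817


@contextmanager
def apply_trigger_tokens(tokenizer, text_encoder, trigger_tokens):
    """Sketch: temporarily add trigger tokens on a copied tokenizer and grow the
    text encoder's embedding table, restoring the original size on exit."""
    init_tokens_count = None
    new_tokens_added = None
    try:
        # Copy the tokenizer via a pickle round trip so the caller's instance stays untouched.
        tmp_tokenizer = pickle.loads(pickle.dumps(tokenizer))
        init_tokens_count = text_encoder.resize_token_embeddings(None, PAD_TO_MULTIPLE_OF).num_embeddings
        new_tokens_added = tmp_tokenizer.add_tokens(trigger_tokens)
        text_encoder.resize_token_embeddings(init_tokens_count + new_tokens_added, PAD_TO_MULTIPLE_OF)
        yield tmp_tokenizer
    finally:
        # Shrink back to the initial (already padded) count, keeping the same padding
        # so the Tensor Core warning does not come back.
        if init_tokens_count and new_tokens_added:
            text_encoder.resize_token_embeddings(init_tokens_count, PAD_TO_MULTIPLE_OF)
```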
@@ -82,7 +82,7 @@ dependencies = [
     "torchvision~=0.16",
     "torchmetrics~=0.11.0",
     "torchsde~=0.2.5",
-    "transformers~=4.31.0",
+    "transformers~=4.35.0",
     "uvicorn[standard]~=0.21.1",
     "windows-curses; sys_platform=='win32'",
 ]
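A quick, optional way to verify that an environment picked up the new pin (a small sketch; adjust as needed):

```python
# Confirm the installed Transformers release is new enough for pad_to_multiple_of.
import transformers

major, minor = (int(part) for part in transformers.__version__.split(".")[:2])
print(transformers.__version__)
assert (major, minor) >= (4, 32), "pad_to_multiple_of requires Transformers >= 4.32"
```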