* attention maps saving to /tmp
* tidy up diffusers branch backporting of cross attention refactoring
* base64-encoding the attention maps image for generationResult
* cleanup/refactor conditioning.py
* attention maps and tokens being sent to web UI
* attention maps: restrict count to actual token count and improve robustness
* add argument type hint to image_to_dataURL function
Co-authored-by: psychedelicious <4822129+psychedelicious@users.noreply.github.com>
Co-authored-by: damian <git@damianstewart.com>
When no `init_mask` is given and `invert_mask` is set to True, the script raises the following error:
```bash
AttributeError: 'NoneType' object has no attribute 'mode'
```
The new implementation only runs the inversion when an `init_mask` has actually been provided and `invert_mask` is True.
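
A minimal sketch of the kind of guard described above, not the actual patch. The helper name `invert_if_requested` and its `invert` parameter are hypothetical, and the mask is a plain list here only to keep the example dependency-free:

```python
from typing import Callable, Optional, TypeVar

Mask = TypeVar("Mask")


def invert_if_requested(
    init_mask: Optional[Mask],
    invert_mask: bool,
    invert: Callable[[Mask], Mask],
) -> Optional[Mask]:
    """Invert the mask only when there is a mask and inversion was requested.

    Hypothetical helper: returns the mask unchanged (possibly None) otherwise,
    so a missing mask never reaches the inversion code.
    """
    if init_mask is None or not invert_mask:
        return init_mask
    return invert(init_mask)


def flip(mask):
    # Stand-in inversion for an 8-bit greyscale mask stored as a flat list.
    return [255 - value for value in mask]


no_mask = invert_if_requested(None, True, flip)        # no AttributeError
inverted = invert_if_requested([0, 255], True, flip)   # mask is inverted
untouched = invert_if_requested([0, 255], False, flip) # inversion not requested
```

With Pillow images, `PIL.ImageOps.invert` could be passed as `invert`.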
prompt token sequences begin with a "beginning-of-sequence" marker <bos> and end with a repeated "end-of-sequence" marker <eos>, giving a default prompt length of <bos> + 75 prompt tokens + <eos>. the .swap() code was failing to take the column for <bos> at index 0 into account. the changes here do that, and also add extra handling for a single <eos> (which may be redundant but which is included for completeness).
based on my understanding and some assumptions about how this all works, the reason .swap() nevertheless seemed to do the right thing, to some extent, is that over multiple steps the conditioning process in Stable Diffusion operates as a feedback loop. a change to token n-1 has flow-on effects on how the [1x4x64x64] latent tensor is modified by all the tokens after it and, as the next step is processed, by all the tokens before it as well. intuitively, a token's conditioning effects "echo" throughout the whole length of the prompt. so even though the token at n-1 was being edited when what the user actually wanted was to edit the token at n, it nevertheless still had some non-negligible effect, in roughly the right direction, often enough that it seemed like it was working properly.
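
to illustrate the off-by-one described above, here is a minimal sketch of the index shift. it is not the actual .swap() code: the names `BOS_COLUMNS`, `MAX_PROMPT_TOKENS` and `prompt_span_to_columns` are hypothetical, and it assumes a single <bos> column at index 0.

```python
BOS_COLUMNS = 1         # assumption: exactly one <bos> column, at index 0
MAX_PROMPT_TOKENS = 75  # default prompt length quoted above


def prompt_span_to_columns(start: int, end: int) -> range:
    """Map a half-open span of prompt-token indices to token-sequence columns.

    Prompt-token indices count from the first real token, so every index has
    to be shifted right past the <bos> column.
    """
    if not 0 <= start <= end <= MAX_PROMPT_TOKENS:
        raise ValueError(
            f"span {start}:{end} is outside the {MAX_PROMPT_TOKENS}-token prompt"
        )
    return range(start + BOS_COLUMNS, end + BOS_COLUMNS)


# Token sequence for the prompt "a cat sitting" (padding <eos> omitted).
columns = ["<bos>", "a", "cat", "sitting", "<eos>"]

# The user wants to edit prompt token 1, i.e. "cat".
span = prompt_span_to_columns(1, 2)
selected = [columns[i] for i in span]  # with the shift: the "cat" column
unshifted = columns[1]                 # without the shift: the "a" column
```

without the shift, the edit lands one column early, which matches the n-1 behaviour described above.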
Some users have been complaining that the CLI "freezes" for a while
before the invoke> prompt appears. I believe this is due to internet
delay while the concepts library names are downloaded by the autocompleter.
I have changed the logic so that the concepts are downloaded the first time
the user types a < and presses tab.
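
A minimal sketch of the deferred download, assuming a completer object that is
handed a fetch function. The class `LazyConceptCompleter` and the
`fetch_concepts` parameter are hypothetical names, not the actual autocompleter
API:

```python
from typing import Callable, List, Optional


class LazyConceptCompleter:
    """Fetch the concept names only when a completion first needs them."""

    def __init__(self, fetch_concepts: Callable[[], List[str]]):
        self._fetch = fetch_concepts
        self._concepts: Optional[List[str]] = None  # nothing downloaded yet

    def complete(self, text: str) -> List[str]:
        # Only text starting with "<" is treated as a concept reference.
        if not text.startswith("<"):
            return []
        # First "<" + tab: pay the network cost now, then reuse the result.
        if self._concepts is None:
            self._concepts = self._fetch()
        prefix = text[1:]
        return [f"<{name}>" for name in self._concepts if name.startswith(prefix)]
```

Constructing the completer does no network work, so the prompt can appear
immediately; the delay moves to the first concept completion.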
- make the warnings about patchmatch less redundant
- only warn about being unable to load concepts from Hugging Face
library once
- do not crash when unable to load concepts from Hugging Face
due to network connectivity issues
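
A minimal sketch of the "warn once, do not crash" behaviour listed above. The
class `ConceptLoader`, its `fetch` parameter and the warning text are
hypothetical, and it assumes that connectivity failures surface as `OSError`
(which `ConnectionError` subclasses in Python 3):

```python
from typing import Callable, List


class ConceptLoader:
    """Load concept names, tolerating network failures and warning only once."""

    def __init__(self, fetch: Callable[[], List[str]]):
        self._fetch = fetch
        self.messages: List[str] = []  # warnings emitted so far

    def load(self) -> List[str]:
        try:
            return self._fetch()
        except OSError as err:
            # Report the failure the first time only, then stay quiet.
            if not self.messages:
                message = f"** unable to load concepts library: {err}"
                self.messages.append(message)
                print(message)
            # Fall back to an empty list instead of crashing the CLI.
            return []


def offline_fetch() -> List[str]:
    # Stand-in for a download attempt made without network connectivity.
    raise ConnectionError("network unreachable")


loader = ConceptLoader(offline_fetch)
first = loader.load()
second = loader.load()
```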