Merge branch 'development' of github.com:lstein/stable-diffusion into development

commit d8685ad66b
Author: Lincoln Stein
Date: 2022-09-17 13:29:21 -04:00
2 changed files with 122 additions and 140 deletions

README.md (130 changed lines)

````diff
@@ -1,21 +1,32 @@
-<h1 align='center'><b>Stable Diffusion Dream Script</b></h1>
-<p align='center'>
-<img src="docs/assets/logo.png"/>
-</p>
-<p align="center">
-<a href="https://github.com/lstein/stable-diffusion/releases"><img src="https://flat.badgen.net/github/release/lstein/stable-diffusion/development?icon=github" alt="release"/></a>
-<a href="https://github.com/lstein/stable-diffusion/stargazers"><img src="https://flat.badgen.net/github/stars/lstein/stable-diffusion?icon=github" alt="stars"/></a>
-<a href="https://useful-forks.github.io/?repo=lstein%2Fstable-diffusion"><img src="https://flat.badgen.net/github/forks/lstein/stable-diffusion?icon=github" alt="forks"/></a>
-<br />
-<a href="https://github.com/lstein/stable-diffusion/actions/workflows/test-dream-conda.yml"><img src="https://flat.badgen.net/github/checks/lstein/stable-diffusion/main?label=CI%20status%20on%20main&cache=900&icon=github" alt="CI status on main"/></a>
-<a href="https://github.com/lstein/stable-diffusion/actions/workflows/test-dream-conda.yml"><img src="https://flat.badgen.net/github/checks/lstein/stable-diffusion/development?label=CI%20status%20on%20dev&cache=900&icon=github" alt="CI status on dev"/></a>
-<a href="https://github.com/lstein/stable-diffusion/commits/development"><img src="https://flat.badgen.net/github/last-commit/lstein/stable-diffusion/development?icon=github&color=yellow&label=last%20dev%20commit&cache=900" alt="last-dev-commit"/></a>
-<br />
-<a href="https://github.com/lstein/stable-diffusion/issues?q=is%3Aissue+is%3Aopen"><img src="https://flat.badgen.net/github/open-issues/lstein/stable-diffusion?icon=github" alt="open-issues"/></a>
-<a href="https://github.com/lstein/stable-diffusion/pulls?q=is%3Apr+is%3Aopen"><img src="https://flat.badgen.net/github/open-prs/lstein/stable-diffusion?icon=github" alt="open-prs"/></a>
-</p>
+<div align="center">
+
+# Stable Diffusion Dream Script
+
+![project logo](docs/assets/logo.png)
+
+[![latest release badge]][latest release link] [![github stars badge]][github stars link] [![github forks badge]][github forks link]
+[![CI checks on main badge]][CI checks on main link] [![CI checks on dev badge]][CI checks on dev link] [![latest commit to dev badge]][latest commit to dev link]
+[![github open issues badge]][github open issues link] [![github open prs badge]][github open prs link]
+
+[CI checks on dev badge]: https://flat.badgen.net/github/checks/lstein/stable-diffusion/development?label=CI%20status%20on%20dev&cache=900&icon=github
+[CI checks on dev link]: https://github.com/lstein/stable-diffusion/actions/workflows/test-dream-conda.yml
+[CI checks on main badge]: https://flat.badgen.net/github/checks/lstein/stable-diffusion/main?label=CI%20status%20on%20main&cache=900&icon=github
+[CI checks on main link]: https://github.com/lstein/stable-diffusion/actions/workflows/test-dream-conda.yml
+[github forks badge]: https://flat.badgen.net/github/forks/lstein/stable-diffusion?icon=github
+[github forks link]: https://useful-forks.github.io/?repo=lstein%2Fstable-diffusion
+[github open issues badge]: https://flat.badgen.net/github/open-issues/lstein/stable-diffusion?icon=github
+[github open issues link]: https://github.com/lstein/stable-diffusion/issues?q=is%3Aissue+is%3Aopen
+[github open prs badge]: https://flat.badgen.net/github/open-prs/lstein/stable-diffusion?icon=github
+[github open prs link]: https://github.com/lstein/stable-diffusion/pulls?q=is%3Apr+is%3Aopen
+[github stars badge]: https://flat.badgen.net/github/stars/lstein/stable-diffusion?icon=github
+[github stars link]: https://github.com/lstein/stable-diffusion/stargazers
+[latest commit to dev badge]: https://flat.badgen.net/github/last-commit/lstein/stable-diffusion/development?icon=github&color=yellow&label=last%20dev%20commit&cache=900
+[latest commit to dev link]: https://github.com/lstein/stable-diffusion/commits/development
+[latest release badge]: https://flat.badgen.net/github/release/lstein/stable-diffusion/development?icon=github
+[latest release link]: https://github.com/lstein/stable-diffusion/releases
+
+</div>
 
 This is a fork of [CompVis/stable-diffusion](https://github.com/CompVis/stable-diffusion), the open
 source text-to-image generator. It provides a streamlined process with various new features and
@@ -26,7 +37,7 @@ _Note: This fork is rapidly evolving. Please use the
 [Issues](https://github.com/lstein/stable-diffusion/issues) tab to report bugs and make feature
 requests. Be sure to use the provided templates. They will help us diagnose issues faster._
 
-**Table of Contents**
+## Table of Contents
 
 1. [Installation](#installation)
 2. [Hardware Requirements](#hardware-requirements)
@@ -38,38 +49,38 @@ requests. Be sure to use the provided templates. They will help us diagnose iss
 8. [Support](#support)
 9. [Further Reading](#further-reading)
 
-## Installation
+### Installation
 
 This fork is supported across multiple platforms. You can find individual installation instructions
 below.
 
-- ### [Linux](docs/installation/INSTALL_LINUX.md)
-- ### [Windows](docs/installation/INSTALL_WINDOWS.md)
-- ### [Macintosh](docs/installation/INSTALL_MAC.md)
+- #### [Linux](docs/installation/INSTALL_LINUX.md)
+- #### [Windows](docs/installation/INSTALL_WINDOWS.md)
+- #### [Macintosh](docs/installation/INSTALL_MAC.md)
 
-## Hardware Requirements
+### Hardware Requirements
 
-**System**
+#### System
 
 You will need one of the following:
 
 - An NVIDIA-based graphics card with 4 GB or more VRAM memory.
 - An Apple computer with an M1 chip.
 
-**Memory**
+#### Memory
 
 - At least 12 GB Main Memory RAM.
 
-**Disk**
+#### Disk
 
 - At least 6 GB of free disk space for the machine learning model, Python, and all its dependencies.
 
-**Note**
-
-If you are have a Nvidia 10xx series card (e.g. the 1080ti), please run the dream script in
-full-precision mode as shown below.
+> Note
+>
+> If you have an Nvidia 10xx series card (e.g. the 1080ti), please run the dream script in
+> full-precision mode as shown below.
 
 Similarly, specify full-precision mode on Apple M1 hardware.
@@ -79,43 +90,30 @@ To run in full-precision mode, start `dream.py` with the `--full_precision` flag
 (ldm) ~/stable-diffusion$ python scripts/dream.py --full_precision
 ```
 
-## Features
+### Features
 
-### Major Features
+#### Major Features
 
-- #### [Interactive Command Line Interface](docs/features/CLI.md)
-- #### [Image To Image](docs/features/IMG2IMG.md)
-- #### [Inpainting Support](docs/features/INPAINTING.md)
-- #### [GFPGAN and Real-ESRGAN Support](docs/features/UPSCALE.md)
-- #### [Seamless Tiling](docs/features/OTHER.md#seamless-tiling)
-- #### [Google Colab](docs/features/OTHER.md#google-colab)
-- #### [Web Server](docs/features/WEB.md)
-- #### [Reading Prompts From File](docs/features/OTHER.md#reading-prompts-from-a-file)
-- #### [Shortcut: Reusing Seeds](docs/features/OTHER.md#shortcuts-reusing-seeds)
-- #### [Weighted Prompts](docs/features/OTHER.md#weighted-prompts)
-- #### [Variations](docs/features/VARIATIONS.md)
-- #### [Personalizing Text-to-Image Generation](docs/features/TEXTUAL_INVERSION.md)
-- #### [Simplified API for text to image generation](docs/features/OTHER.md#simplified-api)
+- [Interactive Command Line Interface](docs/features/CLI.md)
+- [Image To Image](docs/features/IMG2IMG.md)
+- [Inpainting Support](docs/features/INPAINTING.md)
+- [GFPGAN and Real-ESRGAN Support](docs/features/UPSCALE.md)
+- [Seamless Tiling](docs/features/OTHER.md#seamless-tiling)
+- [Google Colab](docs/features/OTHER.md#google-colab)
+- [Web Server](docs/features/WEB.md)
+- [Reading Prompts From File](docs/features/OTHER.md#reading-prompts-from-a-file)
+- [Shortcut: Reusing Seeds](docs/features/OTHER.md#shortcuts-reusing-seeds)
+- [Weighted Prompts](docs/features/OTHER.md#weighted-prompts)
+- [Variations](docs/features/VARIATIONS.md)
+- [Personalizing Text-to-Image Generation](docs/features/TEXTUAL_INVERSION.md)
+- [Simplified API for text to image generation](docs/features/OTHER.md#simplified-api)
 
-### Other Features
+#### Other Features
 
-- #### [Creating Transparent Regions for Inpainting](docs/features/INPAINTING.md#creating-transparent-regions-for-inpainting)
-- #### [Preload Models](docs/features/OTHER.md#preload-models)
+- [Creating Transparent Regions for Inpainting](docs/features/INPAINTING.md#creating-transparent-regions-for-inpainting)
+- [Preload Models](docs/features/OTHER.md#preload-models)
 
-## Latest Changes
+### Latest Changes
 
 - v1.14 (11 September 2022)
@@ -147,12 +145,12 @@ To run in full-precision mode, start `dream.py` with the `--full_precision` flag
 For older changelogs, please visit the **[CHANGELOG](docs/features/CHANGELOG.md)**.
 
-## Troubleshooting
+### Troubleshooting
 
 Please check out our **[Q&A](docs/help/TROUBLESHOOT.md)** to get solutions for common installation
 problems and other issues.
 
-## Contributing
+### Contributing
 
 Anyone who wishes to contribute to this project, whether documentation, features, bug fixes, code
 cleanup, testing, or code reviews, is very much encouraged to do so. If you are unfamiliar with how
@@ -164,13 +162,13 @@ important thing is to **make your pull request against the "development" branch*
 "main". This will help keep public breakage to a minimum and will allow you to propose more radical
 changes.
 
-## Contributors
+### Contributors
 
 This fork is a combined effort of various people from across the world.
 [Check out the list of all these amazing people](docs/other/CONTRIBUTORS.md). We thank them for
 their time, hard work and effort.
 
-## Support
+### Support
 
 For support, please use this repository's GitHub Issues tracking service. Feel free to send me an
 email if you use and like the script.
@@ -178,7 +176,7 @@ email if you use and like the script.
 Original portions of the software are Copyright (c) 2020
 [Lincoln D. Stein](https://github.com/lstein)
 
-## Further Reading
+### Further Reading
 
 Please see the original README for more information on this software and underlying algorithm,
 located in the file [README-CompViz.md](docs/other/README-CompViz.md).
````

The second changed file refactors `CrossAttention`'s einsum path: constructor-time backend selection and the caller-allocated `r1` buffer are replaced by an `einsum_op` dispatcher that picks a strategy per device and slices the attention einsum to fit a memory budget.

```diff
@@ -168,100 +168,84 @@ class CrossAttention(nn.Module):
             nn.Dropout(dropout)
         )
 
-        if torch.cuda.is_available():
-            self.einsum_op = self.einsum_op_cuda
-        else:
-            self.mem_total = psutil.virtual_memory().total / (1024**3)
-            self.einsum_op = self.einsum_op_mps_v1 if self.mem_total >= 32 else self.einsum_op_mps_v2
+        self.mem_total_gb = psutil.virtual_memory().total // (1 << 30)
 
-    def einsum_op_compvis(self, q, k, v, r1):
-        s1 = einsum('b i d, b j d -> b i j', q, k) * self.scale # faster
-        s2 = s1.softmax(dim=-1, dtype=q.dtype)
-        del s1
-        r1 = einsum('b i j, b j d -> b i d', s2, v)
-        del s2
-        return r1
+    def einsum_op_compvis(self, q, k, v):
+        s = einsum('b i d, b j d -> b i j', q, k)
+        s = s.softmax(dim=-1, dtype=s.dtype)
+        return einsum('b i j, b j d -> b i d', s, v)
```
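Why the `* self.scale` multiply disappears from `einsum_op_compvis`: `forward` (last chunk of this diff) now folds the scale into `k` once, so the sliced einsum ops no longer rescale scores themselves. Below is a minimal sketch of why that is safe, using made-up shapes; nothing in it comes from the commit itself.

```python
import torch
from torch import einsum

# Illustrative shapes; scale = dim_head ** -0.5, the usual attention scaling.
q = torch.randn(2, 16, 8)
k = torch.randn(2, 16, 8)
v = torch.randn(2, 16, 8)
scale = 8 ** -0.5

# Old path: scale the similarity scores after the einsum.
old_scores = einsum('b i d, b j d -> b i j', q, k) * scale
# New path: pre-scale k once; the einsum is linear in k, so the scores match.
new_scores = einsum('b i d, b j d -> b i j', q, k * scale)
assert torch.allclose(old_scores, new_scores, atol=1e-6)

# Identical scores mean identical softmax and identical attention output.
old_out = einsum('b i j, b j d -> b i d', old_scores.softmax(dim=-1), v)
new_out = einsum('b i j, b j d -> b i d', new_scores.softmax(dim=-1), v)
assert torch.allclose(old_out, new_out, atol=1e-6)
```

The diff continues with the new slicing helpers: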
```diff
-    def einsum_op_mps_v1(self, q, k, v, r1):
-        if q.shape[1] <= 4096: # (512x512) max q.shape[1]: 4096
-            r1 = self.einsum_op_compvis(q, k, v, r1)
-        else:
-            slice_size = math.floor(2**30 / (q.shape[0] * q.shape[1]))
-            for i in range(0, q.shape[1], slice_size):
-                end = i + slice_size
-                s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k) * self.scale
-                s2 = s1.softmax(dim=-1, dtype=r1.dtype)
-                del s1
-                r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
-                del s2
-        return r1
-
-    def einsum_op_mps_v2(self, q, k, v, r1):
-        if self.mem_total >= 8 and q.shape[1] <= 4096:
-            r1 = self.einsum_op_compvis(q, k, v, r1)
-        else:
-            slice_size = 1
-            for i in range(0, q.shape[0], slice_size):
-                end = min(q.shape[0], i + slice_size)
-                s1 = einsum('b i d, b j d -> b i j', q[i:end], k[i:end])
-                s1 *= self.scale
-                s2 = s1.softmax(dim=-1, dtype=r1.dtype)
-                del s1
-                r1[i:end] = einsum('b i j, b j d -> b i d', s2, v[i:end])
-                del s2
-        return r1
+    def einsum_op_slice_0(self, q, k, v, slice_size):
+        r = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
+        for i in range(0, q.shape[0], slice_size):
+            end = i + slice_size
+            r[i:end] = self.einsum_op_compvis(q[i:end], k[i:end], v[i:end])
+        return r
+
+    def einsum_op_slice_1(self, q, k, v, slice_size):
+        r = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
+        for i in range(0, q.shape[1], slice_size):
+            end = i + slice_size
+            r[:, i:end] = self.einsum_op_compvis(q[:, i:end], k, v)
+        return r
+
+    def einsum_op_mps_v1(self, q, k, v):
+        if q.shape[1] <= 4096: # (512x512) max q.shape[1]: 4096
+            return self.einsum_op_compvis(q, k, v)
+        else:
+            slice_size = math.floor(2**30 / (q.shape[0] * q.shape[1]))
+            return self.einsum_op_slice_1(q, k, v, slice_size)
+
+    def einsum_op_mps_v2(self, q, k, v):
+        if self.mem_total_gb > 8 and q.shape[1] <= 4096:
+            return self.einsum_op_compvis(q, k, v)
+        else:
+            return self.einsum_op_slice_0(q, k, v, 1)
+
+    def einsum_op_tensor_mem(self, q, k, v, max_tensor_mb):
+        size_mb = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size() // (1 << 20)
+        if size_mb <= max_tensor_mb:
+            return self.einsum_op_compvis(q, k, v)
+        div = 1 << int((size_mb - 1) / max_tensor_mb).bit_length()
+        if div <= q.shape[0]:
+            return self.einsum_op_slice_0(q, k, v, q.shape[0] // div)
+        return self.einsum_op_slice_1(q, k, v, max(q.shape[1] // div, 1))
```
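`einsum_op_tensor_mem` is the new core: it estimates the full attention matrix's size in MB from the operand shapes and, when that exceeds the budget, rounds the overshoot ratio up to a power of two and slices along the batch-heads dimension if possible, otherwise along the query dimension. A worked sketch of that arithmetic follows; the shapes and the 128 MB budget are illustrative assumptions, not values from the commit.

```python
# Standalone recomputation of the slicing arithmetic in einsum_op_tensor_mem.
# Assumed scenario: q of shape (16, 4096, 40) in float16 attending to 4096 keys.
b, n_q, n_k, element_size = 16, 4096, 4096, 2  # (b*h, queries, keys, fp16 bytes)
max_tensor_mb = 128

size_mb = b * n_q * n_k * element_size // (1 << 20)  # full attention matrix
print(size_mb)  # 512

# Round the overshoot ratio up to the next power of two to pick a divisor:
# (512 - 1) / 128 = 3.99... -> int 3 -> bit_length 2 -> div = 1 << 2 = 4
div = 1 << int((size_mb - 1) / max_tensor_mb).bit_length()
print(div)  # 4

# div <= b, so dimension 0 is sliced into chunks of b // div = 4 rows,
# keeping each partial attention matrix at size_mb / div = 128 MB.
print(b // div, size_mb // div)  # 4 128
```

The CUDA path now just turns measured free memory into such a budget, and a new `einsum_op` dispatches per device: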
```diff
-    def einsum_op_cuda(self, q, k, v, r1):
+    def einsum_op_cuda(self, q, k, v):
         stats = torch.cuda.memory_stats(q.device)
         mem_active = stats['active_bytes.all.current']
         mem_reserved = stats['reserved_bytes.all.current']
-        mem_free_cuda, _ = torch.cuda.mem_get_info(torch.cuda.current_device())
+        mem_free_cuda, _ = torch.cuda.mem_get_info(q.device)
         mem_free_torch = mem_reserved - mem_active
         mem_free_total = mem_free_cuda + mem_free_torch
-
-        gb = 1024 ** 3
-        tensor_size = q.shape[0] * q.shape[1] * k.shape[1] * 4
-        mem_required = tensor_size * 2.5
-        steps = 1
-
-        if mem_required > mem_free_total:
-            steps = 2**(math.ceil(math.log(mem_required / mem_free_total, 2)))
-
-        if steps > 64:
-            max_res = math.floor(math.sqrt(math.sqrt(mem_free_total / 2.5)) / 8) * 64
-            raise RuntimeError(f'Not enough memory, use lower resolution (max approx. {max_res}x{max_res}). '
-                               f'Need: {mem_required/64/gb:0.1f}GB free, Have:{mem_free_total/gb:0.1f}GB free')
-
-        slice_size = q.shape[1] // steps if (q.shape[1] % steps) == 0 else q.shape[1]
-        for i in range(0, q.shape[1], slice_size):
-            end = min(q.shape[1], i + slice_size)
-            s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k) * self.scale
-            s2 = s1.softmax(dim=-1, dtype=r1.dtype)
-            del s1
-            r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
-            del s2
-        return r1
+        # Divide factor of safety as there's copying and fragmentation
+        return self.einsum_op_tensor_mem(q, k, v, mem_free_total / 3.3 / (1 << 20))
+
+    def einsum_op(self, q, k, v):
+        if q.device.type == 'cuda':
+            return self.einsum_op_cuda(q, k, v)
+
+        if q.device.type == 'mps':
+            if self.mem_total_gb >= 32:
+                return self.einsum_op_mps_v1(q, k, v)
+            return self.einsum_op_mps_v2(q, k, v)
+
+        # Smaller slices are faster due to L2/L3/SLC caches.
+        # Tested on i7 with 8MB L3 cache.
+        return self.einsum_op_tensor_mem(q, k, v, 32)
```
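On CUDA the budget is measured rather than guessed: free driver memory plus memory PyTorch has reserved but is not actively using, divided by 3.3 as a safety factor against copies and fragmentation. Here is a hedged sketch of that measurement in isolation; it needs a CUDA device, and the printed number is machine-dependent.

```python
import torch

if torch.cuda.is_available():
    dev = torch.device('cuda')
    stats = torch.cuda.memory_stats(dev)
    mem_active = stats['active_bytes.all.current']
    mem_reserved = stats['reserved_bytes.all.current']
    mem_free_cuda, _ = torch.cuda.mem_get_info(dev)
    # Memory the caching allocator has reserved but is not actively using
    # can be reused without asking the driver for more.
    mem_free_torch = mem_reserved - mem_active
    mem_free_total = mem_free_cuda + mem_free_torch
    budget_mb = mem_free_total / 3.3 / (1 << 20)
    print(f'attention-matrix budget: {budget_mb:.0f} MB')
else:
    print('no CUDA device; nothing to measure')
```

The last chunk simplifies `forward` accordingly: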
```diff
     def forward(self, x, context=None, mask=None):
         h = self.heads
 
-        q_in = self.to_q(x)
+        q = self.to_q(x)
         context = default(context, x)
-        k_in = self.to_k(context)
-        v_in = self.to_v(context)
-        device_type = 'mps' if x.device.type == 'mps' else 'cuda'
+        k = self.to_k(context) * self.scale
+        v = self.to_v(context)
         del context, x
 
-        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q_in, k_in, v_in))
-        del q_in, k_in, v_in
-
-        r1 = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
-        r1 = self.einsum_op(q, k, v, r1)
-        del q, k, v
-
-        r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
-        del r1
-        return self.to_out(r2)
+        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v))
+        r = self.einsum_op(q, k, v)
+        return self.to_out(rearrange(r, '(b h) n d -> b n (h d)', h=h))
 
 class BasicTransformerBlock(nn.Module):
```
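With the dispatcher in place, `forward` drops the preallocated `r1` buffer, the unused `device_type`, and the `q_in`/`k_in`/`v_in` temporaries; result buffers are now created inside the slice helpers only when slicing is actually needed. As a sanity check on the whole scheme, here is a self-contained sketch (not from the commit) showing that query-dimension slicing in the style of `einsum_op_slice_1` reproduces unsliced attention exactly, because each query row's softmax is independent of the others.

```python
import torch
from torch import einsum

def attn_full(q, k, v):
    # Unsliced attention over all queries at once.
    s = einsum('b i d, b j d -> b i j', q, k).softmax(dim=-1)
    return einsum('b i j, b j d -> b i d', s, v)

def attn_slice_1(q, k, v, slice_size):
    # Process queries in chunks; every chunk still sees all keys and values,
    # so each row's softmax is computed over the same scores as the full op.
    r = torch.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)
    for i in range(0, q.shape[1], slice_size):
        end = i + slice_size
        r[:, i:end] = attn_full(q[:, i:end], k, v)
    return r

q, k, v = (torch.randn(4, 64, 32) for _ in range(3))
assert torch.allclose(attn_full(q, k, v), attn_slice_1(q, k, v, slice_size=16), atol=1e-5)
print('sliced attention matches the full computation')
```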