Commit Graph

20 Commits

Author SHA1 Message Date
Kyle Schouviller
8dc7f119e5 Fix performance issue introduced by torch cuda cache clear during generation 2022-11-10 23:01:32 -08:00
damian0815
178f0c78d8 Fix #1362 by improving VRAM usage patterns when doing .swap()
commit ef3f7a26e242b73c2beb0195c7fd8f654ef47f55
Author: damian0815 <null@damianstewart.com>
Date:   Tue Nov 8 12:18:37 2022 +0100

    remove log spam

commit 7189d649622d4668b120b0dd278388ad672142c4
Author: damian0815 <null@damianstewart.com>
Date:   Tue Nov 8 12:10:28 2022 +0100

    change the way saved slicing strategy is applied

commit 01c40f751ab72955140165c16f95ae411732265b
Author: damian0815 <null@damianstewart.com>
Date:   Tue Nov 8 12:04:43 2022 +0100

    fix slicing_strategy_getter callsite

commit f8cfe25150a346958903316bc710737d99839923
Author: damian0815 <null@damianstewart.com>
Date:   Tue Nov 8 11:56:22 2022 +0100

    cleanup, consistent dim=0 also tested

commit 5bf9b1e890d48e962afd4a668a219b68271e5dc1
Author: damian0815 <null@damianstewart.com>
Date:   Tue Nov 8 11:34:09 2022 +0100

    refactored context, tested with non-sliced cross attention control

commit d58a46e39bf562e7459290d2444256e8c08ad0b6
Author: damian0815 <null@damianstewart.com>
Date:   Sun Nov 6 00:41:52 2022 +0100

    cleanup

commit 7e2c658b4c06fe239311b65b9bb16fa3adec7fd7
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:57:31 2022 +0100

    disable logs

commit 20ee89d93841b070738b3d8a4385c93b097d92eb
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:36:58 2022 +0100

    slice saved attention if necessary

commit 0a7684a22c880ec0f48cc22bfed4526358f71546
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:32:38 2022 +0100

    raise instead of asserting

commit 7083104c7f3a0d8fd96e94a2f391de50a3c942e4
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:31:00 2022 +0100

    store dim when saving slices

commit f7c0808ed383ec1dc70645288a798ed2aa4fa85c
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:27:16 2022 +0100

    don't retry on exception

commit 749a721e939b3fe7c1741e7998dab6bd2c85a0cb
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:24:50 2022 +0100

    stuff

commit 032ab90e9533be8726301ec91b97137e2aadef9a
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:20:17 2022 +0100

    more logging

commit 3dc34b387f033482305360e605809d95a40bf6f8
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:16:47 2022 +0100

    logs

commit 901c4c1aa4b9bcef695a6551867ec8149e6e6a93
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:12:39 2022 +0100

    actually set save_slicing_strategy to True

commit f780e0a0a7c6b6a3db320891064da82589358c8a
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 22:10:35 2022 +0100

    store slicing strategy

commit 93bb6d566fd18c5c69ef7dacc8f74ba2cf671cb7
Author: damian <git@damianstewart.com>
Date:   Sat Nov 5 20:43:48 2022 +0100

    still not it

commit 5e3a9541f8ae00bde524046963910323e20c40b7
Author: damian <git@damianstewart.com>
Date:   Sat Nov 5 17:20:02 2022 +0100

    wip offloading attention slices on-demand

commit 4c2966aa856b6f3b446216da3619ae931552ef08
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 15:47:40 2022 +0100

    pre-emptive offloading, idk if it works

commit 572576755e9f0a878d38e8173e485126c0efbefb
Author: root <you@example.com>
Date:   Sat Nov 5 11:25:32 2022 +0000

    push attention slices to cpu. slow but saves memory.

commit b57c83a68f2ac03976ebc89ce2ff03812d6d185f
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 12:04:22 2022 +0100

    verbose logging

commit 3a5dae116f110a96585d9eb71d713b5ed2bc3d2b
Author: damian0815 <null@damianstewart.com>
Date:   Sat Nov 5 11:50:48 2022 +0100

    wip fixing mem strategy crash (4 test on runpod)

commit 3cf237db5fae0c7b0b4cc3c47c81830bdb2ae7de
Author: damian0815 <null@damianstewart.com>
Date:   Fri Nov 4 09:02:40 2022 +0100

    wip, only works on cuda
2022-11-09 07:21:21 -05:00
Lincoln Stein
3033331f65 remove unneeded warnings from attention.py 2022-10-27 22:50:06 -04:00
Damian at mba
056cb0d8a8 sliced cross-attention wrangler works 2022-10-19 21:08:03 +02:00
Damian at mba
37a204324b go back to using InvokeAI attention 2022-10-19 21:08:03 +02:00
Damian at mba
1fc1f8bf05 cross-attention working with placeholder {} syntax 2022-10-19 21:06:42 +02:00
Damian at mba
8ff507b03b runs but doesn't work properly - see below for test prompt
test prompt:
"a cat sitting on a car {a dog sitting on a car}" -W 384 -H 256 -s 10 -S 12346 -A k_euler
note that substition of dog for cat is currently hard-coded (ksampler.py
	line 43-44)
2022-10-19 21:06:42 +02:00
Mihail Dumitrescu
e0951f28cf Refactor attention.CrossAttention to remove duplicate code and apply optimizations
Apply ~6% speedup by moving * self.scale to earlier on a smaller tensor.
When we have enough VRAM don't make a useless zeros tensor.
Switch between cuda/mps/cpu based on q.device.type to allow cleaner per architecture future optimizations.
For cuda and cpu keep VRAM usage and faster slicing consistent.
For cpu use smaller slices. Tested ~20% faster on i7, 9.8 to 7.7 s/it.
Fix = typo to self.mem_total >= 8 in einsum_op_mps_v2 as per #582 discussion.
2022-09-17 20:19:21 +03:00
Mihai
dd3fff1d3e
~7% speedup by switch to += in ldm.modules.attention. (#569)
Tested on 8GB eGPU nvidia setup so YMMV.
Re-land with .clone() fix, context #508
2022-09-14 18:10:33 -04:00
Any-Winter-4079
d0a71dc361
Update attention.py for 16-32GB M1 performance (#540)
Code cleanup and attention.py einsum_ops update for M1 16-32GB performance.
Expected: On par with fastest ever from 8 to 128GB for 512x512. Allows large images.
2022-09-13 10:53:45 -04:00
Lincoln Stein
9fa1f31bf2 fix opencv and realesrgan dependencies in mac install 2022-09-12 07:07:05 -04:00
Any-Winter-4079
9cdf3aca7d Update attention.py
Performance improvements to generate larger images in M1 #431

Update attention.py

Added dtype=r1.dtype to softmax
2022-09-11 22:36:58 -04:00
Lincoln Stein
7708f4fb98 slight efficiency gain by using += in attention.py 2022-09-11 16:03:54 -04:00
Lincoln Stein
70aa674e9e merge PR #495 - keep using float16 in ldm.modules.attention 2022-09-11 10:34:06 -04:00
Lincoln Stein
10db192cc4 changes to dogettx optimizations to run on m1
* Author @any-winter-4079
* Author @dogettx
Thanks to many individuals who contributed time and hardware to
benchmarking and debugging these changes.
2022-09-09 09:51:41 -04:00
Lincoln Stein
29ab3c2028
disable neonpixel optimizations on M1 hardware (#414)
* disable neonpixel optimizations on M1 hardware

* fix typo that was causing random noise images on m1
2022-09-07 13:28:11 -04:00
Lincoln Stein
720e5cd651
Refactoring simplet2i (#387)
* start refactoring -not yet functional

* first phase of refactor done - not sure weighted prompts working

* Second phase of refactoring. Everything mostly working.
* The refactoring has moved all the hard-core inference work into
ldm.dream.generator.*, where there are submodules for txt2img and
img2img. inpaint will go in there as well.
* Some additional refactoring will be done soon, but relatively
minor work.

* fix -save_orig flag to actually work

* add @neonsecret attention.py memory optimization

* remove unneeded imports

* move token logging into conditioning.py

* add placeholder version of inpaint; porting in progress

* fix crash in img2img

* inpainting working; not tested on variations

* fix crashes in img2img

* ported attention.py memory optimization #117 from basujindal branch

* added @torch_no_grad() decorators to img2img, txt2img, inpaint closures

* Final commit prior to PR against development
* fixup crash when generating intermediate images in web UI
* rename ldm.simplet2i to ldm.generate
* add backward-compatibility simplet2i shell with deprecation warning

* add back in mps exception, addresses @vargol comment in #354

* replaced Conditioning class with exported functions

* fix wrong type of with_variations attribute during intialization

* changed "image_iterator()" to "get_make_image()"

* raise NotImplementedError for calling get_make_image() in parent class

* Update ldm/generate.py

better error message

Co-authored-by: Kevin Gibbons <bakkot@gmail.com>

* minor stylistic fixes and assertion checks from code review

* moved get_noise() method into img2img class

* break get_noise() into two methods, one for txt2img and the other for img2img

* inpainting works on non-square images now

* make get_noise() an abstract method in base class

* much improved inpainting

Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
2022-09-05 20:40:10 -04:00
Lincoln Stein
bdb0651eb2 add support for Apple hardware using MPS acceleration 2022-08-31 00:33:23 -04:00
Lincoln Stein
4f02b72c9c prettified all the code using "blue" at the urging of @tildebyte 2022-08-26 03:15:42 -04:00
ablattmann
e66308c7f2 add code 2021-12-21 03:23:41 +01:00