Get a ~6% speedup by applying * self.scale earlier, on a smaller tensor.
When there is enough VRAM, don't allocate a useless zeros tensor.
Switch between cuda/mps/cpu based on q.device.type to allow cleaner per-architecture optimizations in the future.
For cuda and cpu keep VRAM usage and faster slicing consistent.
For cpu use smaller slices. Tested ~20% faster on i7, 9.8 to 7.7 s/it.
Fix comparison typo in einsum_op_mps_v2 so that it reads self.mem_total >= 8, as per the #582 discussion.
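
The reasoning behind the scale change can be sketched in plain Python. This is only an illustration under assumed toy sizes, not the code in attention.py: the helpers matmul_t and scale_rows and the 6x2 shapes are hypothetical. It shows that scaling q before the product gives the same result as scaling the larger product afterwards, while touching fewer values.

```python
import math

def matmul_t(a, b):
    """Return a @ b^T for matrices stored as nested lists of rows."""
    return [[sum(x * y for x, y in zip(row_a, row_b)) for row_b in b] for row_a in a]

def scale_rows(m, s):
    """Multiply every element of m by the scalar s."""
    return [[x * s for x in row] for row in m]

# Hypothetical toy sizes: 6 tokens, head dimension 2.
tokens, dim = 6, 2
q = [[float(i + j) for j in range(dim)] for i in range(tokens)]
k = [[float(i * j + 1) for j in range(dim)] for i in range(tokens)]
scale = dim ** -0.5

# Scaling after the product touches tokens * tokens values.
late = scale_rows(matmul_t(q, k), scale)
# Scaling q first touches only tokens * dim values.
early = matmul_t(scale_rows(q, scale), k)

same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
    for row_late, row_early in zip(late, early)
    for a, b in zip(row_late, row_early)
)
```

The saving grows with the token count, since the product has tokens * tokens entries while q has only tokens * dim.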
Code cleanup and attention.py einsum_ops update for M1 16-32GB performance.
Expected: On par with fastest ever from 8 to 128GB for 512x512. Allows large images.
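
A rough sketch of the device and memory based selection described above. The function name pick_einsum_op and the returned strategy labels are hypothetical, and which branch the 8 GB comparison selects is an assumption; only the q.device.type switch and the >= 8 comparison come from the notes above.

```python
def pick_einsum_op(device_type: str, mem_total_gb: float) -> str:
    """Return a label for a hypothetical attention strategy.

    device_type mirrors q.device.type ("cuda", "mps" or "cpu").
    The labels are placeholders, not real function names.
    """
    if device_type == "cuda":
        return "cuda_sliced"
    if device_type == "mps":
        # Inclusive comparison: a machine with exactly 8 GB takes this branch.
        if mem_total_gb >= 8:
            return "mps_high_mem"
        return "mps_low_mem"
    # Anything else is treated as cpu, which uses smaller slices.
    return "cpu_small_slices"
```

With a strict > comparison in place of >=, a machine with exactly 8 GB would fall through to the low-memory branch, which is the kind of off-by-one the typo fix is about.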
When running on CPU only (Intel), a call to torch.layer_norm would fail with RuntimeError: expected scalar type BFloat16 but found Float.
Fix buggy device handling in model.py.
Tested with scripts/dream.py --full_precision on CPU only on an Intel laptop. Works but slow at ~10s/it.
* start refactoring - not yet functional
* first phase of refactor done - not sure weighted prompts are working
* Second phase of refactoring. Everything mostly working.
* The refactoring has moved all the hard-core inference work into
ldm.dream.generator.*, where there are submodules for txt2img and
img2img. inpaint will go in there as well.
* Some additional refactoring will be done soon, but relatively
minor work.
* fix -save_orig flag to actually work
* add @neonsecret attention.py memory optimization
* remove unneeded imports
* move token logging into conditioning.py
* add placeholder version of inpaint; porting in progress
* fix crash in img2img
* inpainting working; not tested on variations
* fix crashes in img2img
* ported attention.py memory optimization #117 from basujindal branch
* added @torch.no_grad() decorators to img2img, txt2img, inpaint closures
* Final commit prior to PR against development
* fixup crash when generating intermediate images in web UI
* rename ldm.simplet2i to ldm.generate
* add backward-compatibility simplet2i shell with deprecation warning
* add back in mps exception, addresses @vargol comment in #354
* replaced Conditioning class with exported functions
* fix wrong type of with_variations attribute during initialization
* changed "image_iterator()" to "get_make_image()"
* raise NotImplementedError for calling get_make_image() in parent class
* Update ldm/generate.py
better error message
Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
* minor stylistic fixes and assertion checks from code review
* moved get_noise() method into img2img class
* break get_noise() into two methods, one for txt2img and the other for img2img
* inpainting works on non-square images now
* make get_noise() an abstract method in base class
* much improved inpainting
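
The base-class changes listed above (NotImplementedError in get_make_image(), abstract get_noise()) might look roughly like the following. It is a minimal sketch under assumptions, not the code in ldm.dream.generator: the class names, method signatures, and the nested-list noise are hypothetical stand-ins.

```python
import random
from abc import ABC, abstractmethod

class Generator(ABC):
    """Hypothetical parent class for txt2img, img2img and inpaint."""

    def get_make_image(self, prompt):
        # Subclasses are expected to override this.
        raise NotImplementedError(
            "get_make_image() must be implemented in a subclass"
        )

    @abstractmethod
    def get_noise(self, width, height):
        """Return the starting noise; each subclass supplies its own."""

class Txt2Img(Generator):
    """Overrides get_noise() only, so get_make_image() still raises."""

    def get_noise(self, width, height):
        rng = random.Random(0)  # fixed seed keeps the sketch reproducible
        return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
```

Marking get_noise() with @abstractmethod makes Generator() itself fail with TypeError at construction time, while the NotImplementedError only surfaces when the method is actually called.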
Co-authored-by: Kevin Gibbons <bakkot@gmail.com>
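
For reference, a backward-compatibility shell with a deprecation warning, as mentioned for simplet2i above, could be sketched like this. The classes below are hypothetical stand-ins, not the actual contents of ldm.simplet2i or ldm.generate.

```python
import warnings

class Generate:
    """Stand-in for the class that lives under the new module name."""

    def __init__(self, **kwargs):
        self.options = kwargs

class T2I(Generate):
    """Hypothetical legacy shell kept so that old imports keep working."""

    def __init__(self, **kwargs):
        warnings.warn(
            "this legacy class is deprecated; use Generate instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this shell
        )
        super().__init__(**kwargs)
```

Old code that constructs T2I(...) keeps working and gets the same behavior as Generate(...), with a DeprecationWarning attributed to the calling line.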