Mirror of https://github.com/invoke-ai/InvokeAI (synced 2024-08-30 20:32:17 +00:00)
Commit e0951f28cf
Apply a ~6% speedup by moving the `* self.scale` multiplication earlier, onto a smaller tensor. When we have enough VRAM, don't allocate a useless zeros tensor. Switch between cuda/mps/cpu based on `q.device.type` to allow cleaner per-architecture optimizations in the future. For cuda and cpu, keep VRAM usage and the faster slicing consistent. For cpu, use smaller slices. Tested ~20% faster on an i7 (9.8 to 7.7 s/it). Fix the `=` typo to `self.mem_total >= 8` in `einsum_op_mps_v2`, as per the #582 discussion.
Directory contents:

diffusionmodules/
distributions/
encoders/
image_degradation/
losses/
attention.py
ema.py
embedding_manager.py
x_transformer.py
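The commit message above describes changes to the cross-attention code in `attention.py`. The following is a minimal sketch of those ideas, not the actual InvokeAI implementation: the class name `SlicedAttention`, the helper names other than `einsum_op_mps_v2`, and the VRAM-based slice heuristic in `einsum_op_cuda` are illustrative assumptions.

```python
# Sketch of the optimizations described in the commit message (assumed names).
import torch
from torch import einsum


class SlicedAttention:
    def __init__(self, scale: float, mem_total_gb: float):
        self.scale = scale
        self.mem_total = mem_total_gb  # total system/GPU memory in GB, assumed precomputed

    def einsum_op_compvis(self, q, k, v):
        # Full-size attention: softmax(q @ k^T) @ v, no intermediate buffer.
        s = einsum('b i d, b j d -> b i j', q, k)
        s = s.softmax(dim=-1, dtype=s.dtype)
        return einsum('b i j, b j d -> b i d', s, v)

    def einsum_op_slice_batch(self, q, k, v, slice_size):
        # Slice along the batch dimension; the zeros buffer is only created
        # when slicing is actually needed, not unconditionally.
        r = torch.zeros(q.shape[0], q.shape[1], v.shape[2],
                        device=q.device, dtype=q.dtype)
        for i in range(0, q.shape[0], slice_size):
            end = i + slice_size
            r[i:end] = self.einsum_op_compvis(q[i:end], k[i:end], v[i:end])
        return r

    def einsum_op_mps_v2(self, q, k, v):
        # Corrected condition from the commit: >= 8 (GB), not = 8.
        if self.mem_total >= 8 and q.shape[1] <= 4096:
            return self.einsum_op_compvis(q, k, v)
        return self.einsum_op_slice_batch(q, k, v, 1)

    def einsum_op_cpu(self, q, k, v):
        # CPU path: smaller slices, per the commit message.
        return self.einsum_op_slice_batch(q, k, v, max(1, q.shape[0] // 4))

    def einsum_op_cuda(self, q, k, v):
        # CUDA path: choose a slice size from free VRAM (illustrative heuristic).
        free_mem = torch.cuda.mem_get_info(q.device)[0]
        needed = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size()
        if needed <= free_mem:
            return self.einsum_op_compvis(q, k, v)  # enough VRAM: no zeros buffer
        slice_size = max(1, int(q.shape[0] * free_mem // needed))
        return self.einsum_op_slice_batch(q, k, v, slice_size)

    def attention(self, q, k, v):
        # ~6% speedup: apply * self.scale to q (the small tensor) up front,
        # instead of scaling the much larger q @ k^T similarity matrix.
        q = q * self.scale
        # Dispatch on q.device.type so each architecture can be tuned separately.
        if q.device.type == 'cuda':
            return self.einsum_op_cuda(q, k, v)
        if q.device.type == 'mps':
            return self.einsum_op_mps_v2(q, k, v)
        return self.einsum_op_cpu(q, k, v)
```

A quick usage check under these assumptions: `SlicedAttention(scale=0.125, mem_total_gb=16).attention(q, k, v)` with `q`, `k`, `v` of shape `(batch, tokens, dim)` runs the cpu path on a machine without cuda/mps and returns the same result as the unsliced attention.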