Fixes:
File "stable-diffusion/ldm/modules/diffusionmodules/model.py", line 37, in nonlinearity
return x*torch.sigmoid(x)
RuntimeError: CUDA out of memory. Tried to allocate 1.56 GiB [..]
Images up to 1536x1280 can now be generated with 8 GB of VRAM.
Also remove unused SiLU class.
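For context, the failing expression x*torch.sigmoid(x) is the SiLU (swish) activation. Written that way it materializes sigmoid(x) as a full-size temporary before the multiply, so peak memory for that step is roughly twice the tensor size. The sketch below shows one way to compute the same values without the extra temporary, using torch.nn.functional.silu. It is an illustration only and not necessarily how this commit changes model.py; the function names nonlinearity_reference and nonlinearity_fused are hypothetical.

```python
import torch
import torch.nn.functional as F


def nonlinearity_reference(x: torch.Tensor) -> torch.Tensor:
    # Original form: torch.sigmoid(x) allocates a temporary the size of x,
    # and the multiply then allocates the result tensor as well.
    return x * torch.sigmoid(x)


def nonlinearity_fused(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical replacement: F.silu computes x * sigmoid(x) in one op,
    # so only the output tensor is allocated.
    return F.silu(x)


if __name__ == "__main__":
    x = torch.linspace(-4.0, 4.0, steps=9)
    # Both forms should agree up to floating point rounding.
    print(torch.allclose(nonlinearity_reference(x), nonlinearity_fused(x), atol=1e-6))
```

Whether this alone is enough to reach a given resolution depends on the rest of the model, so treat it as a starting point for measuring peak memory rather than a guaranteed fix.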
When running on CPU only (Intel), a call to torch.layer_norm failed with: RuntimeError: expected scalar type BFloat16 but found Float
Fix buggy device handling in model.py.
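One common cause of a BFloat16/Float mismatch on CPU is entering torch.autocast on a machine without CUDA, since CPU autocast defaults to bfloat16. The sketch below shows a device-aware pattern that only enables autocast on CUDA and otherwise runs in plain float32. It assumes that this is the kind of device handling involved; the helpers choose_device and precision_scope are hypothetical names and are not taken from model.py.

```python
from contextlib import nullcontext

import torch
import torch.nn.functional as F


def choose_device() -> torch.device:
    # Prefer CUDA when present, otherwise fall back to the CPU.
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def precision_scope(device: torch.device, full_precision: bool):
    # Only use mixed precision on CUDA and only when not asked for full
    # precision. On CPU, return a no-op context so tensors stay float32.
    if device.type == "cuda" and not full_precision:
        return torch.autocast("cuda")
    return nullcontext()


if __name__ == "__main__":
    device = choose_device()
    x = torch.randn(2, 4, device=device)
    with precision_scope(device, full_precision=True):
        y = F.layer_norm(x, (4,))
    print(device, y.dtype)
```

Creating tensors on the device returned by choose_device, instead of hard-coding "cuda", keeps the same code path usable on machines without a GPU.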
Tested with scripts/dream.py --full_precision on CPU only, on an Intel laptop. It works, but slowly, at ~10 s/it.