When running on just cpu (intel), a call to torch.layer_norm would error with RuntimeError: expected scalar type BFloat16 but found Float
Fix buggy device handling in model.py.
Tested with scripts/dream.py --full_precision on just cpu on intel laptop. Works but slow at ~10s/it.