Lincoln Stein 5c43988862 reduce VRAM memory usage by half during model loading
* This moves the call to half() before model.to(device) so that the
full-precision model is never copied to the GPU. Improves speed and reduces memory usage dramatically.

* This fix contributed by @mh-dm (Mihai)
2022-09-10 10:02:43 -04:00
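
The effect of the reordering described in this commit can be illustrated with a rough estimate. With `model.to(device).half()` the full fp32 weights are copied to the GPU and only then cast, whereas with `model.half().to(device)` the cast happens on the CPU and only fp16 weights are transferred. The sketch below is not code from the repository: the helper `bytes_copied_to_device` and the parameter count are hypothetical, and the estimate counts weights only, ignoring buffers, activations, and allocator overhead.

```python
# Back-of-envelope estimate of how many bytes of weights reach the GPU
# under each ordering. Illustrative only; not taken from the repository.
BYTES_FP32 = 4  # size of one float32 parameter
BYTES_FP16 = 2  # size of one float16 parameter


def bytes_copied_to_device(num_params: int, half_first: bool) -> int:
    """Return the bytes of weights transferred to the device.

    half_first=True  models model.half().to(device): cast on CPU, copy fp16.
    half_first=False models model.to(device).half(): copy fp32, cast on GPU.
    """
    per_param = BYTES_FP16 if half_first else BYTES_FP32
    return num_params * per_param


num_params = 1_000_000_000  # hypothetical parameter count, for illustration
before = bytes_copied_to_device(num_params, half_first=False)
after = bytes_copied_to_device(num_params, half_first=True)

print(f"to(device) then half(): {before / 2**30:.2f} GiB copied")
print(f"half() then to(device): {after / 2**30:.2f} GiB copied")
print(f"ratio: {after / before:.2f}")
```

Under these assumptions the transfer is halved, which matches the "by half" figure in the commit title. Actual savings depend on the model and on what else is resident on the device.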