* Explicitly assert that neither of the requested dimensions for an
image is 0. (I think this used to fail later on anyway.)
* Don't show the UI alpha premultiply pass in GPU timings in the HUD
debug info display, since the pass doesn't run every frame and would
only appear very transiently.
even faster.
* The GPU-based version was started in the previous commit; this commit
fixes errors and bugs and gets it actually compiling and running.
* Add a way to batch together images to use the same render pass for GPU
premultiplication if they all target the same texture.
* Pending premultiplication uploads are automatically done when calling
`Drawer::third_pass`.
* `fast-srgb8` dep removed; we no longer convert to `f32`s to do the
premultiplication. Two `[u16; 256]` tables are combined to compute the
alpha premultiplied color within the same error bounds used by the
`fast-srgb8` crate. We also no longer use explicit SIMD.
* Remove explicit lifetimes from `PlayState::render` since `&self` and
`Drawer<'_>` don't need to have the same lifetime.
* Fix existing bug where invalidated cache entries were never set to
valid when reusing them.
* `prepare_graphic` now runs some heuristics to determine whether
premultiplication should be executed CPU side or GPU side and then
returns a bool indicating if GPU premultiplication is needed.
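
For illustration, here is a minimal sketch of how two `[u16; 256]` tables
could be combined to premultiply an sRGB-encoded channel without going
through `f32`. This is not the code from this commit: the
`PremultiplyTables` type, its fields, and the threshold-search encoding
are all assumptions made for the example, and it makes no claim about
matching the error bounds of `fast-srgb8`. The tables are built at
startup with `f64` only to keep the sketch short.

```rust
/// Decode one sRGB-encoded byte to linear light in 0.0..=1.0.
fn srgb_to_linear(c: u8) -> f64 {
    let c = f64::from(c) / 255.0;
    if c <= 0.04045 {
        c / 12.92
    } else {
        ((c + 0.055) / 1.055).powf(2.4)
    }
}

/// Hypothetical pair of lookup tables (an assumption for this sketch,
/// not the tables used by the commit).
pub struct PremultiplyTables {
    /// sRGB byte -> linear value scaled to 0..=65535.
    to_linear: [u16; 256],
    /// thresholds[i] is the smallest scaled linear value that encodes
    /// to sRGB byte i.
    thresholds: [u16; 256],
}

impl PremultiplyTables {
    pub fn new() -> Self {
        let mut to_linear = [0u16; 256];
        let mut thresholds = [0u16; 256];
        for i in 0..256usize {
            let lin = srgb_to_linear(i as u8);
            to_linear[i] = (lin * 65535.0).round() as u16;
            if i > 0 {
                // Midpoint between neighbouring codes, so encoding picks
                // the nearest code in linear space.
                let mid = (srgb_to_linear((i - 1) as u8) + lin) / 2.0;
                thresholds[i] = (mid * 65535.0).ceil() as u16;
            }
        }
        Self { to_linear, thresholds }
    }

    /// Premultiply one sRGB-encoded channel by a linear alpha byte,
    /// returning the sRGB-encoded result.
    pub fn premultiply(&self, channel: u8, alpha: u8) -> u8 {
        let linear = u32::from(self.to_linear[usize::from(channel)]);
        // Rounded integer multiply by alpha / 255.
        let scaled = ((linear * u32::from(alpha) + 127) / 255) as u16;
        // `thresholds` is sorted and thresholds[0] == 0, so the count of
        // entries <= scaled is always at least 1.
        (self.thresholds.partition_point(|&t| t <= scaled) - 1) as u8
    }
}
```

With tables like these, `premultiply(c, 255)` returns `c` unchanged and
`premultiply(c, 0)` returns 0. The binary search over thresholds trades a
few comparisons per channel for not needing a large linear-to-sRGB table.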