even faster.
* The GPU-based version was started in the previous commit; this commit
fixes errors and bugs and gets it actually compiling and running.
* Add a way to batch together images to use the same render pass for GPU
premultiplication if they all target the same texture.
* Pending premultiplication uploads are automatically done when calling
`Drawer::third_pass`.
* `fast-srgb8` dep removed, we no longer convert to `f32`s to do the
premultiplication. Two `[u16; 256]` tables are combined to compute the
alpha-premultiplied color within the same error bounds used by the
`fast-srgb8` crate. We also no longer use explicit SIMD.
* Remove explicit lifetimes from `PlayState::render` since `&self` and
`Drawer<'_>` don't need to have the same lifetime.
* Fix existing bug where invalidated cache entries were never set to
valid when reusing them.
* `prepare_graphic` now runs some heuristics to determine whether
premultiplication should be executed CPU side or GPU side and then
returns a bool indicating if GPU premultiplication is needed.
* General progress in setting up code paths to support GPU
premultiplication.
* Created `PremultiplyUpload` type to represent an initiated image
upload where the premultiply pass needs to be run to complete it.
* Converted from compute pass to render pass since current limitations
make it difficult to write directly to an sRGB image from a compute
shader.
* Replace `CachedDetails::Immutable` with keeping track of the
parameters used to create the texture (i.e. the border color).
* Create `TextureRequirements`, `TextureParameters`, and `CacheKey` types
to encode parameters that go into texture creation and image caching
and to determine when the space in texture memory should be reused
when replacing a graphic.
* Add custom texture creation logic for the UI textures since those need
certain usage combinations.
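
For illustration, here is a minimal sketch of table-driven sRGB8 alpha
premultiplication that uses only integer arithmetic per pixel. It is not
the code from this commit: the message does not spell out how the two
`[u16; 256]` tables are laid out, so this sketch assumes a single decode
table plus a binary search for the encode step. All function names are
hypothetical, and floats are used only once, while building the table.

```rust
/// Decode one sRGB8 channel value to linear light in 0.0..=1.0
/// (standard sRGB transfer function). Only used to build the table.
fn srgb_to_linear(c: u8) -> f64 {
    let x = c as f64 / 255.0;
    if x <= 0.04045 {
        x / 12.92
    } else {
        ((x + 0.055) / 1.055).powf(2.4)
    }
}

/// Build a table mapping each sRGB8 value to linear light scaled to
/// 0..=65535. The entries are strictly increasing.
fn build_linear_table() -> [u16; 256] {
    let mut table = [0u16; 256];
    for (i, entry) in table.iter_mut().enumerate() {
        *entry = (srgb_to_linear(i as u8) * 65535.0).round() as u16;
    }
    table
}

/// Encode a 16-bit linear value back to the sRGB8 value whose table
/// entry is nearest to it (assumption: nearest-in-linear-space rounding).
fn linear_to_srgb8(table: &[u16; 256], linear: u16) -> u8 {
    // Index of the first entry that is >= `linear`.
    let hi = table.partition_point(|&v| v < linear).min(255);
    if hi == 0 {
        return 0;
    }
    let lo = hi - 1;
    if linear - table[lo] <= table[hi] - linear {
        lo as u8
    } else {
        hi as u8
    }
}

/// Premultiply one sRGB8 color channel by a linear 8-bit alpha value.
fn premultiply_channel(table: &[u16; 256], c: u8, a: u8) -> u8 {
    let linear = table[c as usize] as u32;
    // Alpha is linear, so the scaling happens in linear space, with rounding.
    let scaled = (linear * a as u32 + 127) / 255;
    linear_to_srgb8(table, scaled as u16)
}

/// Premultiply a tightly packed RGBA8 buffer in place.
/// Trailing bytes that do not form a full pixel are left untouched.
fn premultiply_rgba8(table: &[u16; 256], pixels: &mut [u8]) {
    for px in pixels.chunks_exact_mut(4) {
        let a = px[3];
        for c in &mut px[..3] {
            *c = premultiply_channel(table, *c, a);
        }
    }
}
```

The table would be built once with `build_linear_table()` and reused for
every image. A second table for the encode direction would avoid the
binary search, which may be closer to what the commit does, but that is a
guess.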