Previously, voxels in sparsely populated chunks were stored in a `HashMap`.
However, during usage oftentimes block accesses are followed by subsequent
nearby voxel accesses. Therefore it's possible to provide cache friendliness,
but not with `HashMap`.
The previous merge request [!469](https://gitlab.com/veloren/veloren/merge_requests/469)
proposed to order voxels by their morton order (see https://en.wikipedia.org/wiki/Z-order_curve ).
This provided excellent cache friendliness. However, benchmarks showed that
the required indexing calculations are quite expensive. Particular results
on my _Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz_ were:
| Benchmark | Before this commit @ d322384bec | Morton Order @ ec8a7caf42 | This commit |
| ---------------------------------------- | --------------------------------- | --------------------------- | -------------------- |
| `full read` (81920 voxels) | 17.7ns per voxel | 8.9ns per voxel | **3.6ns** per voxel |
| `constrained read` (4913 voxels) | 67.0ns per voxel | 40.1ns per voxel | **14.1ns** per voxel |
| `local read` (125 voxels) | 17.5ns per voxel | 14.7ns per voxel | **3.8ns** per voxel |
| `X-direction read` (17 voxels) | 17.8ns per voxel | 25.9ns per voxel | **4.2ns** per voxel |
| `Y-direction read` (17 voxels) | 18.4ns per voxel | 33.3ns per voxel | **4.5ns** per voxel |
| `Z-direction read` (17 voxels) | 18.6ns per voxel | 38.2ns per voxel | **5.4ns** per voxel |
| `long Z-direction read` (65 voxels) | 18.0ns per voxel | 37.7ns per voxel | **5.1ns** per voxel |
| `full write (dense)` (81920 voxels) | 17.9ns per voxel | **10.3ns** per voxel | 12.4ns per voxel |
This commit (instead of utilizing morton order) replaces `HashMap` in the
`Chunk` implementation by the following data structure:
The volume is spatially subdivided into groups of `4*4*4` blocks. Since a
`Chunk` is of total size `32*32*16`, this implies that there are `8*8*4`
groups. (These numbers are generic in the actual code such that there are
always `256` groups. I.e. the group size is chosen depending on the desired
total size of the `Chunk`.)
There's a single vector `self.vox` which consecutively stores these groups.
Each group might or might not be contained in `self.vox`. A group that is
not contained represents that the full group consists only of `self.default`
voxels. This saves a lot of memory because oftentimes a `Chunk` consists of
either a lot of air or a lot of stone.
To track whether a group is contained in `self.vox`, there's an index buffer
`self.indices : [u8; 256]`. It contains for each group
* (a) the order in which it has been inserted into `self.vox`, if the group
is contained in `self.vox` or
* (b) 255, otherwise. That case represents that the whole group consists
only of `self.default` voxels.
(Note that 255 is a valid insertion order for case (a) only if `self.vox` is
full and then no other group has the index 255. Therefore there's no
ambiguity.)
Rationale:
The index buffer should be small because:
* Small size increases the probability that it will always be in cache.
* The index buffer is allocated for every `Chunk` and an almost empty `Chunk`
shall not consume too much memory.
The number of 256 groups is particularly nice because it means that the index
buffer can consist of `u8`s. This keeps the space requirement for the index
buffer as low as 4 cache lines.
See the doc comments in `common/src/vol.rs` for more information on
the API itself.
The changes include:
* Consistent `Err`/`Error` naming.
* Types are named `...Error`.
* `enum` variants are named `...Err`.
* Rename `VolMap{2d, 3d}` -> `VolGrid{2d, 3d}`. This is in preparation
to an upcoming change where a “map” in the game related sense will
be added.
* Add volume iterators. There are two types of them:
* _Position_ iterators obtained from the trait `IntoPosIterator`
using the method
`fn pos_iter(self, lower_bound: Vec3<i32>, upper_bound: Vec3<i32>) -> ...`
which returns an iterator over `Vec3<i32>`.
* _Volume_ iterators obtained from the trait `IntoVolIterator`
using the method
`fn vol_iter(self, lower_bound: Vec3<i32>, upper_bound: Vec3<i32>) -> ...`
which returns an iterator over `(Vec3<i32>, &Self::Vox)`.
Those traits will usually be implemented by references to volume
types (i.e. `impl IntoVolIterator<'a> for &'a T` where `T` is some
type which usually implements several volume traits, such as `Chunk`).
* _Position_ iterators iterate over the positions valid for that
volume.
* _Volume_ iterators do the same but return not only the position
but also the voxel at that position, in each iteration.
* Introduce trait `RectSizedVol` for the use case which we have with
`Chonk`: A `Chonk` is sized only in x and y direction.
* Introduce traits `RasterableVol`, `RectRasterableVol`
* `RasterableVol` represents a volume that is compile-time sized and has
its lower bound at `(0, 0, 0)`. The name `RasterableVol` was chosen
because such a volume can be used with `VolGrid3d`.
* `RectRasterableVol` represents a volume that is compile-time sized at
least in x and y direction and has its lower bound at `(0, 0, z)`.
There's no requirement on he lower bound or size in z direction.
The name `RectRasterableVol` was chosen because such a volume can be
used with `VolGrid2d`.