haslersn b26043b0e6 common: Rework Chunk and Chonk implementation
Previously, voxels in sparsely populated chunks were stored in a `HashMap`.
However, in practice a block access is often followed by accesses to nearby
voxels, so there is cache friendliness to be gained, but a `HashMap` cannot
provide it.

The previous merge request [!469](https://gitlab.com/veloren/veloren/merge_requests/469)
proposed to store voxels in Morton order (see https://en.wikipedia.org/wiki/Z-order_curve).
This provided excellent cache friendliness, but benchmarks showed that the
required indexing calculations are quite expensive. Results on my
_Intel(R) Core(TM) i7-7500U CPU @ 2.70 GHz_:

| Benchmark                                | Before this commit @ d322384bec | Morton Order @ ec8a7caf42 | This commit          |
| ---------------------------------------- | --------------------------------- | --------------------------- | -------------------- |
| `full read` (81920 voxels)               | 17.7ns per voxel                  | 8.9ns per voxel             | **3.6ns** per voxel  |
| `constrained read` (4913 voxels)         | 67.0ns per voxel                  | 40.1ns per voxel            | **14.1ns** per voxel |
| `local read` (125 voxels)                | 17.5ns per voxel                  | 14.7ns per voxel            | **3.8ns** per voxel  |
| `X-direction read` (17 voxels)           | 17.8ns per voxel                  | 25.9ns per voxel            | **4.2ns** per voxel  |
| `Y-direction read` (17 voxels)           | 18.4ns per voxel                  | 33.3ns per voxel            | **4.5ns** per voxel  |
| `Z-direction read` (17 voxels)           | 18.6ns per voxel                  | 38.2ns per voxel            | **5.4ns** per voxel  |
| `long Z-direction read` (65 voxels)      | 18.0ns per voxel                  | 37.7ns per voxel            | **5.1ns** per voxel  |
| `full write (dense)` (81920 voxels)      | 17.9ns per voxel                  | **10.3ns** per voxel        | 12.4ns per voxel     |
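
The indexing cost referred to above is the bit interleaving a Morton-order
layout has to perform on every access. The sketch below is a generic
illustration of such an index calculation (assumed helper names, not code
from !469):

```rust
/// Morton (Z-order) index for a 3D position: interleave the bits of x, y, z
/// so that bit i of each coordinate ends up at bit 3*i (+0, +1, +2).
fn morton_index(x: u32, y: u32, z: u32) -> u32 {
    // Spread the low 10 bits of `v` so that two zero bits separate
    // consecutive bits (the classic "part by 2" trick).
    fn spread(mut v: u32) -> u32 {
        v &= 0x0000_03ff;
        v = (v ^ (v << 16)) & 0xff00_00ff;
        v = (v ^ (v << 8)) & 0x0300_f00f;
        v = (v ^ (v << 4)) & 0x030c_30c3;
        v = (v ^ (v << 2)) & 0x0924_9249;
        v
    }
    spread(x) | (spread(y) << 1) | (spread(z) << 2)
}
```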

This commit (instead of utilizing Morton order) replaces the `HashMap` in the
`Chunk` implementation with the following data structure:

The volume is spatially subdivided into groups of `4*4*4` blocks. Since a
`Chunk` has a total size of `32*32*16`, this yields `8*8*4` groups. (These
numbers are generic in the actual code such that there are always `256`
groups, i.e. the group size is chosen depending on the desired total size of
the `Chunk`.)
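
As a rough illustration of how the group size falls out of the chunk size
(the constant names here are assumptions, not the actual generic code):

```rust
const CHUNK_VOLUME: usize = 32 * 32 * 16; // 16384 voxels per `Chunk`
const GROUP_COUNT: usize = 256;           // fixed number of groups
const GROUP_VOLUME: usize = CHUNK_VOLUME / GROUP_COUNT; // 64 voxels, i.e. a 4*4*4 cube
// 32/4 * 32/4 * 16/4 = 8*8*4 = 256 groups along the three axes.
```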

A single vector `self.vox` stores these groups consecutively. Each group may
or may not be contained in `self.vox`. A group that is not contained
represents a group consisting entirely of `self.default` voxels. This saves a
lot of memory, because a `Chunk` often consists largely of air or stone.
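
A minimal sketch of the storage described so far (assumed names; the real
code is generic over the voxel type and chunk size):

```rust
struct Chunk<V> {
    /// Groups that contain at least one non-default voxel, stored back to back.
    vox: Vec<[V; 64]>,
    /// The voxel that every group absent from `vox` implicitly consists of
    /// (e.g. air or stone).
    default: V,
    // ...plus the index buffer described below.
}
```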

To track whether a group is contained in `self.vox`, there's an index buffer
`self.indices: [u8; 256]`. It stores, for each group, either

* (a) the order in which the group was inserted into `self.vox`, if the group
    is contained in `self.vox`, or
* (b) `255` otherwise, meaning that the whole group consists only of
    `self.default` voxels.

(Note that `255` is a valid insertion order for case (a) only if `self.vox`
is full, and in that case no group is absent, so there is no ambiguity.)
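
Putting the pieces together, a read could resolve roughly like this (again an
illustrative sketch with assumed names and signatures, not the actual API):

```rust
struct Chunk<V> {
    indices: [u8; 256], // insertion order per group, or 255 for "absent"
    vox: Vec<[V; 64]>,  // present groups, in insertion order
    default: V,         // the voxel every absent group consists of
}

impl<V: Copy> Chunk<V> {
    fn get(&self, grp: usize, rel: usize) -> V {
        // `grp` is the group number (0..256) and `rel` the voxel's offset
        // inside its 4*4*4 group (0..64); both are derived from the position.
        let idx = self.indices[grp] as usize;
        if idx < self.vox.len() {
            // Present group: `idx` is its insertion order into `vox`.
            self.vox[idx][rel]
        } else {
            // Absent group: `indices[grp]` is 255 and `vox` is not full,
            // so the whole group consists of `self.default`.
            self.default
        }
    }
}
```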

Rationale:

The index buffer should be small because:

* A small size increases the probability that it will always be in cache.
* The index buffer is allocated for every `Chunk`, and an almost empty `Chunk`
    should not consume too much memory.

Choosing exactly 256 groups is particularly nice because it means that the
index buffer can consist of `u8`s. This keeps the index buffer at 256 bytes,
i.e. as low as 4 cache lines (assuming 64-byte cache lines).