Collision System Part 2: Sparse Hash-Based Grid Partitioning, Multithreading, and Future

What happened last time … performance issues

So we got collision working, but its performance is unacceptable. If I load the entire world geometry up as collision mesh, the collision routine would take up to about 100ms to process per frame, that’s insane!

One well-known technique that helps with performance is spatial partitioning. As its name implies, a spatial partitioning algorithm takes a bunch of geometries and groups them in terms of vicinity. When geometry needs to be queried for collision detection, we can only pick out the geometries that are close enough to our moving character and ignore the rest.

There are a lot of choices: octree, kd-tree, etc. I chose to go with a uniform grid, since it seems to be the easiest to implement.

Grid storage problem & wasted space

In order to partition the entire world geometry, the grid must be big enough to contain all of it. And each grid cell holds all the polygons that overlaps with that cell. To generate such a grid, the obvious first step is to compute a bounding box for the entire world geometry, then use that as the grid’s size. However, consider the following example:

This is the smallest bounding box for this scene, but it wastes space storing cells that are completely useless. Only about 60% of the grid is being used:

(the blue cells are unused)

This is a generous illustration of the problem. It could be much worse. Having uneven terrain means the grid will always be too big in one dimension. It’s almost impossible to fit a box volume onto an arbitrary game scene and not waste memory, because geometries in games are almost never distributed uniformly in space.

Virtualize the grid (sparse)

So instead of holding a physical grid in memory, we can instead have a “virtual” grid, which user can query data from. I call this grid “virtual” because we defer the allocation of each cell until a polygon is inserted into it from the user. When a cell is allocated, its physical coordinate will be hashed into an entry index, which is used to map to an entry in a pre-allocated table. So if the user query a cell, we can use the hashing process to get it from the hash table, and if the queried cell is not yet allocated, then we simply say this cell is empty. It creates the illusion of a complete grid, yet the grid itself remains sparse, meaning it has no unused cells.

Multiple cells could hash into the same table entry, so I allow each entry to hold multiple cells by storing them as a linked list.

Such a sparse grid is a much better choice, because it has no wasted memory and can be extended indefinitely along any dimension. As illustrated below, the grid nicely adapts to terrain’s unevenness and contains all of the polygons without wasting too much space.

Visualization of the sparse grid:

Generating and using a sparse grid

Since our scene is static, I don’t even have to “insert” polygons into a cell at runtime. I know from the beginning which cells to place a polygon into. So I can preallocate an array of polygons, then store the location of the overlapping polygons inside each cell.

Knowing the mechanism of a sparse grid, generating it is simple. I generated mine with the following steps:

1. Iterate over each polygons and allocate cells that overlaps with them
2. Iterate over all the cells to pre-allocate the polygon array and subdivide them in a way that each subdivision belongs to a single cell
3. Iterate over each polygons, and fill them into the pre-allocated array

Using it is also very straightforward:

1. Have an empty array of polygons
2. Tightly bound the swept ellipsoid of player’s movement into a bounding box, and use that box to check which cells are overlapped
3. For each overlapping cell, take out their polygons and append it to my empty array
4. After that, I have an array of neighbor polygons that possibly collide with the player

Despite being such a simple technique, grid gives pretty good result. It reduces the frightening 100ms per frame to only about 1ms amortized. But we are game programmers, we don’t care about amortized time! We only care about worst cases!

Worst cases are still pretty bad

This particular mesh contains thousands of faces and occupies a very small amount of space. Running the player against it requires around 15ms to process per frame. It’s the toughest edge case a grid algorithm can encounter. The only way to partition this kind of geometry is to set grid’s resolution to really high, but then grid’s fine resolution will be an overkill for the rest of the geometry and waste memory.

This exposes the weakness of grid; each cell is of equal size. It’s unable to adapt to geometries of different details. In this case, recursive partitioning techniques, such as octree and kd-tree, should performance much better.

One simple fix is to replace this kind of meshes’ collision representations with simpler shapes. The high detail of this geometry is only for rendering purposes. Player will notice nothing if I replace them with mere approximations. For the particular example shown above, I will probably slap together a couple of boxes and use them as the collision volume instead.

Further optimization … with hardware!

Ok, it’s pretty silly trying to run stuff fast if I ignore the fact that modern PCs all have multiple cores, each core with two hyperthreads. An obvious optimization is just multithreading the collision operation. However, I don’t want this multithreaded work to be beneficial to just collision code. I want it to be useful to other components of the engine as well, namely skeletal animation and asset loader. So I decided to make a generic multithreaded job system that can be shared across multiple subsystems of the game engine.

A job system is where there is a queue of jobs where multiple workers can grab work from and complete. Each of these workers are on separate threads. Whenever a job is pushed onto the job queue, each worker is woken up and it will try to grab the job and finish it.

Constructing such a job system is not too much work. All I have to do now is to partition the neighbor polygons into small batches, then submit these batches onto the job queue. Nothing has special has to be done, since each worker don’t touch the same piece of data when the work is divided this way. Doing just that sped the collision code up by about 6 times.

Conclusion

So, combining both spatial partitioning and multithreading, collision now performs at an average of 0.5ms per frame. When encountering the “worst case”, it runs at about 3ms per frame. Again, there are workarounds to the “worst cases”, so I don’t worry too much about that. I will gladly accept this as an early victory.

Future polishing work

My grid implementation isn’t yet perfect. There are still much polishing to be done. As you can see from screenshots and video, there are strange cells hovering above in the air but contains no actual polygon. That is because my triangle “voxelization” implementation is terrible. It just uses the triangle’s bounding box to test overlapping cells, but it produces a lot of false positives because triangle only takes a thin slice of that bounding box.

The implementation also possibly multi-count polygons queried from neighbor cells. I already store a unique ID in each polygon and plan on timestamping them so that no unique polygon is included in the neighbor polygons twice.

Next up, robustness

Collision is tricky because it requires numerical robustness to work well, and in a game, we have to make sure collision works a hundred percent of the time. It is not acceptable to have even one edge case or once corner that player can clip through or get stuck on. And the bad news is, in the past couple of days when I had spare time to work on Monter, I have actually gotten stuck on some geometries, and even falling through on others. My fear is that even when I think I “fixed” it, there is really no way to know if I did “fix” it, or having a robust way to reproduce edge cases where the collision algorithm fail. It is also the reason I haven’t SIMD the code yet, I want to be able to change the code around easily during the next phase, and SIMD doesn’t help with that.

I want to make a collision debug & testing infrastructure to really iron out the core of the collision code. I already know it has bugs in it, and I don’t trust the algorithm itself from the first place. After we get a reasonable framerate, it’s time to get serious and really test our code and think about how to make it robust.

And as always, a recording of my progress