Part 3.3: Minimizing cache misses

In this section of the DOTS Best Practices guide, you will:
- Learn how to minimize cache misses by arranging your data so that your code operates on large buffers rather than small ones
- Learn how to declare and manage DynamicBuffers for maximum cache efficiency
- Learn about the performance and memory implications of chunk fragmentation, and how to avoid it
1. Iterate on large sets

When your code requests a piece of data that isn’t in the CPU cache, the CPU must fetch that data from memory and fill a cache line with it. This is slow, but the benefit is that you now have the data (and the data that comes after it in memory, enough to fill the 64-byte cache line) cached for fast access. Furthermore, if your memory access patterns after this point are predictable to the CPU, it can prefetch the data it thinks you’re going to need into the cache. CPU prefetch prediction is relatively simplistic: it can detect when you’re iterating forwards or backwards through contiguous linear buffers in memory, but it treats anything more complex as random access, resulting in cache misses. This means that iterating over contiguous linear buffers is what you should be aiming for as often as possible.
To get the most out of prefetching, you should favor working on large buffers over small ones. Don’t base your data around the idea of hierarchical iteration. If you have to rearrange your data to make it linearly-accessible or make a copy of the data for processing, then do so. Duplicating data to allow for faster read access (in exchange for slower or more complex write operations) is called denormalization; for a good example of how it can work in an ECS context, see the Innogames blog post Entity Relationships in Unity DOTS/ECS. Sort and filter entity data sets by state, rather than branching on state; in other words, put your conditional tests outside of your loops whenever it’s possible to do so.
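As a hedged sketch of filtering by state rather than branching on it (the Frozen tag, the Health and RegenRate components, and the system name are hypothetical; WithNone() and Entities.ForEach() are standard Entities API), the query below excludes frozen entities up front, so the loop body never branches per entity and only matching chunks are touched:
using Unity.Entities;

// Hypothetical components for illustration
public struct Frozen : IComponentData { }                      // tag: entity is temporarily inert
public struct Health : IComponentData { public float Value; }
public struct RegenRate : IComponentData { public float PerSecond; }

public partial class HealthRegenSystem : SystemBase
{
    protected override void OnUpdate()
    {
        float deltaTime = SystemAPI.Time.DeltaTime;

        // The state test lives in the query, not inside the loop
        Entities
            .WithNone<Frozen>()
            .ForEach((ref Health health, in RegenRate regen) =>
            {
                health.Value += regen.PerSecond * deltaTime;
            })
            .ScheduleParallel();
    }
}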
By default, the Native Containers don’t allow nesting; for example, you can’t make a 2D or 3D NativeArray. Similarly, you can’t schedule a job from inside another job, or nest an Entities.ForEach() inside the lambda of another Entities.ForEach(). This is because the job safety system can only guarantee safety by only allowing you to launch jobs from the main thread. So if your application calls for the world to be divided into grid cells or voxels, don’t try to nest arrays of voxels along the x, y, and z axes, and don’t try to nest loops to iterate over each of these axes in turn. Instead, store the cell data in a linear array, and calculate the (x, y, z) coordinates of the cell from the index and the grid bounds.
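Here’s a minimal sketch of that flat layout (the grid dimensions, cell type, and struct name are assumptions for illustration):
using Unity.Collections;
using Unity.Mathematics;

public struct VoxelGrid
{
    public int3 Size;                    // grid dimensions, e.g. (64, 64, 64)
    public NativeArray<float> Cells;     // one contiguous buffer holding every cell

    // Flatten (x, y, z) into an index into the linear array
    public int ToIndex(int3 coord) => coord.x + Size.x * (coord.y + Size.y * coord.z);

    // Recover (x, y, z) from a flat index
    public int3 ToCoord(int index)
    {
        int x = index % Size.x;
        int y = (index / Size.x) % Size.y;
        int z = index / (Size.x * Size.y);
        return new int3(x, y, z);
    }
}
Iterating over Cells in index order walks the buffer linearly, which is exactly the access pattern the prefetcher can predict.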
2. Set DynamicBuffer capacity
Dynamic buffer components are a useful way to add array-like functionality to entities. As the name suggests, these buffers can be dynamically resized, for example by using the Add() method to add new items to the buffer. However, much like a C# List, a DynamicBuffer stores an internal array of components, and if the number of items in the buffer ever reaches the buffer’s capacity, it has to reallocate a new, larger array to make room.
If a DynamicBuffer is within its initial capacity, it’s stored inline in the chunk, as if the entity contained a component with an array inside it. However, if the DynamicBuffer exceeds capacity, ECS allocates memory outside the chunk and moves the entire DynamicBuffer into the newly-allocated memory. So if a DynamicBuffer’s capacity is 12 elements, and then a 13th element is added, a number of problems are introduced:
- When you add the new element, ECS allocates the new buffer storage, and copies the existing buffer elements to the new memory location. This can be time consuming.
- Every future attempt to access the DynamicBuffer results in a cache miss, because the buffer data is no longer inline in the entity’s chunk.
- This situation contributes to chunk fragmentation. Once the DynamicBuffer has exceeded initial capacity and moved, there is always a 12-element empty space in the chunk that your code is no longer accessing but which persists for as long as the DynamicBuffer exists.
Setting internal buffer capacity
The default capacity of all DynamicBuffers is calculated using TypeManager.DefaultBufferCapacityNumerator. This defaults to 128 bytes, or (for example) 32 integers. If you know in advance how many elements a given DynamicBuffer is likely to contain, you should declare it when you declare the buffer, using the [InternalBufferCapacity] Attribute. As long as the buffer never grows past this initial capacity, it never needs to reallocate.
// My buffer can contain up to 42 elements inline in the chunk
// If I add any more then ECS will reallocate the buffer onto a heap
[InternalBufferCapacity(42)]
public struct MyBufferElement : IBufferElementData
{
public int Value;
}
Setting buffer capacity dynamically
Sometimes it’s not possible or practical to make reasonable predictions at compile time about how much capacity a DynamicBuffer might need. These situations can be problematic when you add items to a DynamicBuffer one at a time, because by default the buffer grows by one element every time, which means that every Add() that increases the Capacity causes an allocation.
It’s possible to dynamically control the capacity to avoid such situations. You can use DynamicBuffer.EnsureCapacity() to forcibly reallocate the buffer into an area of memory that’s big enough to accommodate the specified capacity without needing to reallocate every time a new element is added. If dynamic buffers end up taking up too much memory due to capacity padding that you no longer need, you can shrink them back to size by calling DynamicBuffer.TrimExcess().
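As a hedged sketch (the system name and element count are assumptions; MyBufferElement is the element type declared above), reserving capacity once before a batch of Add() calls avoids paying for a reallocation on each one:
using Unity.Entities;

public partial class FillBufferSystem : SystemBase
{
    protected override void OnUpdate()
    {
        // Hypothetical setup: create an entity and give it a dynamic buffer
        Entity entity = EntityManager.CreateEntity();
        DynamicBuffer<MyBufferElement> buffer = EntityManager.AddBuffer<MyBufferElement>(entity);

        // Reserve space once, instead of reallocating every time Add() exceeds capacity
        buffer.EnsureCapacity(500);
        for (int i = 0; i < 500; i++)
        {
            buffer.Add(new MyBufferElement { Value = i });
        }

        // Later, if the spare capacity is no longer needed, shrink the buffer back down
        buffer.TrimExcess();
    }
}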
3. Understand chunks
The Entities package manual page about Archetypes Concepts describes the structure and organization of components in memory. Here’s a useful recap:
- Every query-based foreach, IJobEntity, IJobChunk or Entities.ForEach() uses an EntityQuery to filter which entities and components are involved in the data transformation.
- An EntityQuery contains a list of EntityArchetypes. These describe which groups of entities match the query.
- An EntityArchetype contains a list of ArchetypeChunks (commonly referred to as “chunks”). These are 16KB buffers in unmanaged memory.
- A chunk contains the components for a number of entities that match a specific archetype.
An Entity is simply an index into a structure inside an EntityManager that points to a specific chunk, and a specific index within that chunk where the entity’s component data resides.
For another good explanation, see the Innogames blog post Unity’s “Performance by Default” under the hood section ECS Memory Layout.
Chunk fragmentation
One problem that can arise from storing components in 16KB chunks is chunk fragmentation. Chunk fragmentation simply means that archetype chunks are not being used efficiently.
ECS guarantees that the components within a chunk will be stored contiguously, but this can waste memory if chunks are not full. Additionally, cache misses occur every time a system has to jump from one chunk to another in order to process the next entity.
To take an extreme example, if you have 100,000 entities that all have their own unique archetype, each one of them keeps its component data in a different chunk, meaning that ECS will allocate more than 1.5GB of chunk data, most of it empty. This is more than enough to generate out-of-memory crashes on some platforms. Furthermore, even if your application doesn’t run out of memory, every job that operates on a number of these entities will encounter a cache miss between each and every entity.
Shared components and chunk fragmentation
A common cause of chunk fragmentation is incorrect usage of shared components. You can use shared components to group a large number of entities into a comparatively small number of subgroups that don’t change often. This removes the increased memory footprint of storing a copy of the component data for each entity, and can make certain kinds of data processing more efficient.
As a rule, you should only use shared components if the following statements are all true:
- It’s useful for your systems to operate on individual subgroups
- There is a comparatively small number of these subgroups
- The memory saved by using shared components rather than standard components is greater than the memory that is lost by creating more archetypes.
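As a hedged illustration of these rules (the Team component, its Id value, and the system are hypothetical; ISharedComponentData and SetSharedComponentFilter() are standard API), a small, rarely-changing key such as a team ID splits entities into a few subgroups that a filtered EntityQuery can process one at a time:
using Unity.Entities;

// Every entity with the same Team value is grouped into the same set of chunks
public struct Team : ISharedComponentData
{
    public int Id;
}

public partial class TeamScoringSystem : SystemBase
{
    private EntityQuery _query;

    protected override void OnCreate()
    {
        _query = GetEntityQuery(ComponentType.ReadOnly<Team>());
    }

    protected override void OnUpdate()
    {
        // Operate on one subgroup at a time: the filter selects only the chunks
        // whose shared Team value matches, without touching any other chunks
        _query.SetSharedComponentFilter(new Team { Id = 1 });
        int teamOneCount = _query.CalculateEntityCount();
        // ...do something with teamOneCount...
        _query.ResetFilter();
    }
}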
Prefabs and chunk fragmentation
Another cause of chunk fragmentation is prefabs. You can instantiate prefab entities to dynamically create new entities at runtime, but the prefabs themselves aren’t the same as the entities which are instantiated from them. Prefab entities have a Prefab component which causes EntityQueries to implicitly ignore them. The EntityManager strips this component from the new copies during instantiation, meaning that ECS systems operate on the instanced entities but leave the original prefab alone. This means that prefabs have a different archetype to the entities they instantiate, and if you load several prefabs with different archetypes, each prefab occupies its own 16KB chunk.
The memory overheads of these prefabs can add up quickly, so if you’re making (for example) a procedurally-generated game that uses lots of different prefabs, make sure you know, understand, and can manage how many of them are loaded into memory at any one time.
Heavy entities and chunk fragmentation
The number of entities that can fit into a chunk varies according to the archetype. This is because a chunk is always 16KB, but each entity archetype is represented by a different set of components containing different amounts of data. A heavy entity is one which has a large number of components, or components that contain a lot of data. Not many of these entities can fit into a chunk. Consequently, iterating over a large set of these entities means the application encounters a higher number of chunk boundaries and subsequent cache misses.
It’s important to remember that, unlike in OOP (where a class typically represents a particular type of object or item in your simulation world), there are no objects or items in ECS at all. There are only components, grouped in various ways. Although it can be convenient to think of an Entity as analogous to an OOP object, in reality an Entity is little more than an index to a data structure that provides access to one specific collection of components. This means that there’s no particular reason that you need to represent (for example) a character in your game as a single entity.
If you have large numbers of heavy entities, the solution is to break them down into a larger number of lighter entities that pack into chunks more efficiently. For example, if you divide your character’s components into separate entities according to which SystemGroup processes them (for example, AI, pathfinding, physics, animation, rendering, and so on), those SystemGroups can iterate over chunks filled with components that contain only the data they need (for example, the chunks that the physics simulation accesses have room for more physics components because they aren’t also filled with AI state machine data).
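Here’s a minimal sketch of that split (all of the component, struct, and method names are hypothetical): one logical character becomes a physics entity and an AI entity, with a component on the AI entity holding an Entity reference back to its physics counterpart:
using Unity.Entities;
using Unity.Mathematics;

// Hypothetical per-character data, split by the SystemGroup that processes it
public struct CharacterVelocity : IComponentData { public float3 Value; }
public struct AIState : IComponentData { public int CurrentState; }

// Stored on the AI entity so AI systems can look up the physics entity when needed
public struct PhysicsEntityRef : IComponentData { public Entity Value; }

public static class CharacterFactory
{
    // One logical character becomes two lighter entities: physics systems iterate
    // chunks holding only physics data, AI systems iterate chunks holding only AI data
    public static void CreateCharacter(EntityManager entityManager)
    {
        Entity physicsEntity = entityManager.CreateEntity(typeof(CharacterVelocity));
        Entity aiEntity = entityManager.CreateEntity(typeof(AIState), typeof(PhysicsEntityRef));

        entityManager.SetComponentData(aiEntity, new PhysicsEntityRef { Value = physicsEntity });
    }
}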
4. Not everything has to be an Entity
As discussed above, the purpose of an Entity is simply to provide a way to identify a particular group of components so that it can be referenced elsewhere. For many of the entities you create, you never need to keep hold of the resulting Entity handle: your systems simply operate on the components and give you the desired behavior.
In fact, ECS and the Entities package are an entirely optional part of DOTS. The EntityManager is essentially just a specialized memory manager that helps to pack blittable data (in the form of components) into NativeContainers inside memory pages called chunks. EntityQueries are just an efficient way to select chunks to operate on. Systems are just a way to launch jobs that operate on the data in a scheduled, parallelized way. It’s entirely possible to structure parts of your codebase to use NativeContainers of blittable data that isn’t organized into components, and to schedule Burst-compiled jobs from methods that don’t belong to systems at all.
It might be that parts of the simulation you want to build already lend themselves to nicely packed contiguous buffers of blittable data, but those data structures are not necessarily a good fit for becoming entities or components. If you’re making a Minecraft-style voxel game, should every voxel be an entity? Probably not. If you did, the size of the Entity handle for each voxel might well end up taking up more memory than the voxel data itself. Don’t be afraid to step outside ECS if your data structures work better without it.
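For example, here’s a hedged sketch of a Burst-compiled job scheduled from an ordinary static method, operating on a plain NativeArray rather than on entities or components (the job, field, and method names are assumptions):
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;

[BurstCompile]
public struct ScaleVoxelsJob : IJobParallelFor
{
    public NativeArray<float> Densities;
    public float Scale;

    public void Execute(int index) => Densities[index] *= Scale;
}

public static class VoxelProcessing
{
    // Scheduled from a plain static method, not from a system
    public static void ScaleDensities(NativeArray<float> densities, float scale)
    {
        new ScaleVoxelsJob { Densities = densities, Scale = scale }
            .Schedule(densities.Length, 64)   // 64 elements per batch
            .Complete();
    }
}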