Part 1: Understand data-oriented design

Tutorial · Advanced · 15 mins · Unity Technologies

In this section of the DOTS Best Practices guide, you will:

  • Learn how data-oriented design (DOD) differs from object-oriented programming.
  • Find links to a number of written and video primers that introduce DOD concepts.
  • Learn some key principles to remember when implementing DOTS functionality in Unity.

1. Understanding DOD

Unity’s Entities package (also known as the Entity Component System, or ECS) is not an API that you can add to your project and expect to deliver great performance automatically. To get that performance, you need to adopt data-oriented design (DOD), which is a fundamentally different approach from the object-oriented programming (OOP) that many developers use as their main (or only) programming paradigm.


Object-oriented programming involves structuring your code into classes that represent types of things you might find in the real world. An instance of a class represents a single object. Generally, all of the object’s data is hidden behind private access modifiers. Methods operate on that data, and inheritance hierarchies express objects that are similar to other objects but differ in certain ways. The result is lots of individual objects scattered throughout memory. OOP might be intuitive for humans to understand, but it’s not efficient for modern CPUs to process, because jumping between objects at unrelated memory addresses defeats the CPU caches.


By contrast, data-oriented design focuses on the data. Developers consider what data is needed, and how best to structure it in memory so that the CPU can access it efficiently while running the systems that process it. Rather than representing single encapsulated objects driven by inheritance, DOD uses composition to break those objects down into components, and then groups the components into arrays. Systems then iterate across the arrays to transform the data as required by the project’s algorithms. The most common case in DOD is to process many components at once, rather than one object at a time.
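To make the idea concrete, here is a minimal sketch in C++ (Unity’s Entities package itself uses C#, and its real API looks different). Everything in it is hypothetical: `Position` and `Velocity` are plain component structs, `World` stores one array per component type so that an entity is just an index into those arrays, and `MoveSystem` is a system that transforms the data by iterating linearly over the arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Components are plain data: no methods, no inheritance, no hidden state.
struct Position { float x, y, z; };
struct Velocity { float x, y, z; };

// Hypothetical storage: one array per component type. Entity i's data
// lives at index i in every array, so an "entity" is just an index.
struct World {
    std::vector<Position> positions;
    std::vector<Velocity> velocities;

    std::size_t CreateEntity(Position p, Velocity v) {
        positions.push_back(p);
        velocities.push_back(v);
        return positions.size() - 1;
    }
};

// A "system": a function that walks the component arrays linearly and
// transforms the data, instead of asking each object to update itself.
void MoveSystem(World& world, float dt) {
    assert(world.positions.size() == world.velocities.size());
    for (std::size_t i = 0; i < world.positions.size(); ++i) {
        world.positions[i].x += world.velocities[i].x * dt;
        world.positions[i].y += world.velocities[i].y * dt;
        world.positions[i].z += world.velocities[i].z * dt;
    }
}
```

For example, after creating two entities and calling `MoveSystem(world, 0.5f)`, every position has moved by half of its velocity in a single linear pass. Because `MoveSystem` reads only the `Position` and `Velocity` arrays, adding further component arrays to `World` does not change the memory this loop touches.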


To work successfully with DOD, you need to ignore a lot of OOP concepts that might seem like absolutely fundamental aspects of programming. Forget encapsulation and data hiding. Forget inheritance and polymorphism. To a large extent, forget about reference types. These concepts will not help you.


Let’s look at the difference between OOP and DOD approaches with an example. Here’s a screenshot from an imaginary game, “Beach Ball Simulator: Special Edition”. The player has activated a power-up to move all of the green balls. We’ll explore the different ways the game will access data once the player starts moving those green balls around.





In OOP, the code iterates over an array of Sphere objects, checks the Color of each one, and sets the Position of the green ones. Although the array itself is contiguous, it contains only references to Sphere objects, and the actual Sphere data can be scattered throughout memory, which causes cache misses. In DOD, Spheres are decomposed into Color and Position components that are packed into separate buffers, so the loop reads memory sequentially, incurs fewer cache misses, and runs much faster.
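The two layouts in the Beach Ball Simulator example can be sketched in C++ as an illustration only (Unity code is C#). The names `Sphere`, `Color`, `Position`, `SphereComponents`, `MoveGreenSpheresOOP`, and `MoveGreenSpheresDOD` are hypothetical and chosen to mirror the example. The OOP version keeps an array of pointers to individually heap-allocated objects; the DOD version stores each component in its own packed array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

enum class Color : std::uint8_t { Red, Green, Blue };
struct Position { float x, y, z; };

// --- OOP layout: each Sphere is a separate heap allocation. ---
class Sphere {
public:
    Sphere(Color c, Position p) : color_(c), position_(p) {}
    Color GetColor() const { return color_; }
    Position GetPosition() const { return position_; }
    void SetPosition(Position p) { position_ = p; }
private:
    Color color_;        // data hidden behind accessors
    Position position_;
};

// The vector is contiguous, but it only holds pointers: every GetColor()
// call follows a pointer to wherever that Sphere was allocated.
void MoveGreenSpheresOOP(std::vector<std::unique_ptr<Sphere>>& spheres, Position offset) {
    for (auto& s : spheres) {
        if (s->GetColor() == Color::Green) {
            Position p = s->GetPosition();
            s->SetPosition({p.x + offset.x, p.y + offset.y, p.z + offset.z});
        }
    }
}

// --- DOD layout: components packed into parallel buffers. ---
struct SphereComponents {
    std::vector<Color> colors;        // 1 byte per sphere, tightly packed
    std::vector<Position> positions;  // 12 bytes per sphere on typical platforms
};

// The color check streams through a dense byte array; positions are
// only written for green spheres.
void MoveGreenSpheresDOD(SphereComponents& s, Position offset) {
    assert(s.colors.size() == s.positions.size());
    for (std::size_t i = 0; i < s.colors.size(); ++i) {
        if (s.colors[i] == Color::Green) {
            s.positions[i].x += offset.x;
            s.positions[i].y += offset.y;
            s.positions[i].z += offset.z;
        }
    }
}
```

Both functions produce the same result; what differs is the memory access pattern. The DOD loop reads one tightly packed `Color` array from start to finish, whereas the OOP loop dereferences a pointer per sphere to reach data that may sit anywhere on the heap.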


2. Learning resources

If you’re new to DOD, take the time to learn its fundamental principles before you start writing DOTS code in Unity. Understanding DOD up front will save you a lot of time compared to diving straight in without a full understanding and then getting stuck. There are lots of great publicly available resources:


Data-oriented design primers

DOD in Unity

Advanced reading

Unity’s DOD packages


The package documentation contains vital information on how to use DOD in Unity. It doesn’t go into as much detail about DOD theory as the links above, but it explains how the APIs work, which helps you design and implement your data.


The Data-Oriented Technology Stack (DOTS) is made up of several packages. It’s important to understand what each package does, and how each one contributes to the overall technology stack.

  • Burst is a compiler that uses LLVM to translate IL/.NET bytecode into highly optimized native code.

  • The Entities package (also known as the Entity Component System, or ECS) lets you create and manage entities and components, unlocking the power of the Collections package, the job system, and Burst. See the Installation page of the documentation for information on how to get started.

  • The Entities Graphics package is a system that collects the data needed to render ECS entities, and sends this data to Unity's existing rendering architecture.

In addition, get familiar with the ECS Samples repository, which contains examples of how to use the DOTS packages.


3. Key principles

The following key principles are fundamental to understanding how to approach DOD:


  • Design before you code. All computer code is essentially a series of data transformations. Work out what data you need, how you need to format and group that data for efficient CPU access at runtime, which systems should transform the data, and how they should do it. If you get the data design right, it should be very straightforward to write the code to transform that data.

  • Design for efficient memory cache usage. Pack your data into contiguous buffers with no gaps (such as padding between fields) and no fields that the processing system doesn’t need. Systems and jobs should iterate linearly over this data, which lets the CPU fill each cache line with useful data and predict which memory it will need next. If this isn’t what’s happening in your project, redesign your data.

  • Design for blittable data. Blittable data is data that the job system can copy into jobs without any additional processing of pointers or references. Design your data so that it contains no managed types wherever possible: no classes, no references, and ideally no strings. Schedule as much of your DOTS code as possible as jobs, and parallelize as much of that scheduled code as possible. Blittable data makes this possible. It’s also the core of HPC# (High Performance C#), the subset of C# that the Burst compiler can turn into efficient code.

  • Design for the common case. You can’t make every data structure and every piece of code fully efficient all of the time. Optimizing for edge cases at best wastes your time, and at worst introduces inefficiencies where they really matter, because generalizing code to cover the edge cases slows down the common case. When you make decisions, consider whether a given piece of code is going to run 100,000 times per frame, once per frame, once every few seconds, or only during initialization. Focus on the frequent operations.

  • Embrace iteration. Much more so than OOP, DOD lends itself to iterative development. Initial assumptions about data might be incorrect. Requirements might change. DOD code tends to be much more modular than OOP code, so you can remove and replace entire systems and the components they operate on if you find more efficient algorithms. Embrace this iterative process. Make time for it in your development schedule.
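The “Design for efficient memory cache usage” principle can be sketched in C++ (an illustration only; Unity code is C#). The sketch shows two kinds of waste: gaps created by field order, and rarely used fields that share cache lines with the data a hot loop needs. All struct and field names are hypothetical, and exact sizes depend on the compiler and platform ABI, so the comments describe typical 64-bit results.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field order creates gaps: on typical 64-bit ABIs the compiler pads after
// each bool so that the double stays 8-byte aligned (24 bytes in total).
struct Padded {
    bool active;
    double speed;
    bool visible;
};

// Same fields, largest first: the two bools share the tail (16 bytes typical).
struct Reordered {
    double speed;
    bool active;
    bool visible;
};

// Hot/cold split: the movement loop only needs position and velocity, so
// rarely used data lives in a separate array instead of in the same struct.
struct MoveData  { float px, py, vx, vy; };                      // hot: 16 bytes
struct DebugInfo { char name[32]; std::uint32_t spawnFrame; };   // cold: 36 bytes

struct Agents {
    std::vector<MoveData> move;    // iterated linearly every frame
    std::vector<DebugInfo> debug;  // same index, read only by tools
};

void Integrate(Agents& a, float dt) {
    for (MoveData& m : a.move) {   // linear, predictable access
        m.px += m.vx * dt;
        m.py += m.vy * dt;
    }
}
```

With a typical 64-byte cache line, each line fetched by `Integrate` can hold four `MoveData` entries. If `DebugInfo` were embedded in the same struct, each entity would take 52 bytes, and more than two thirds of every line the loop loads would be data it never reads.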
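For the “Design for blittable data” principle, C++ offers a loose analogy rather than an exact equivalent: `std::is_trivially_copyable` identifies types whose bytes can be copied with `memcpy` without running code to fix up owned pointers. The sketch below uses that trait as a stand-in for C#’s blittable rule (note that a struct holding a raw pointer would still pass the check, so the analogy is imperfect). `Health`, `NamedUnit`, `FixedName`, and `CopyToJobBuffer` are hypothetical names, not part of Unity’s job system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Plain values only: safe to copy as raw bytes.
struct Health { float current; float max; };

// Holds a std::string, which owns a heap buffer: copying its bytes would
// make two objects own the same buffer, so it is not trivially copyable.
struct NamedUnit { std::string name; float hp; };

// A fixed-size character buffer instead of a string keeps the type flat.
struct FixedName { char text[16]; };

static_assert(std::is_trivially_copyable<Health>::value, "Health is flat data");
static_assert(std::is_trivially_copyable<FixedName>::value, "FixedName is flat data");
static_assert(!std::is_trivially_copyable<NamedUnit>::value, "NamedUnit owns heap memory");

// Hypothetical "job input" copy: because T is trivially copyable, a single
// memcpy moves the whole array, with no per-element work.
template <typename T>
std::vector<T> CopyToJobBuffer(const std::vector<T>& src) {
    static_assert(std::is_trivially_copyable<T>::value, "job data must be flat");
    std::vector<T> dst(src.size());
    if (!src.empty()) {
        std::memcpy(dst.data(), src.data(), src.size() * sizeof(T));
    }
    return dst;
}
```

A flat type copies in one `memcpy`. A type like `NamedUnit` would instead need each element’s copy constructor to run so that every string gets its own heap buffer, which is the kind of per-element pointer processing that blittable data avoids.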
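A back-of-the-envelope calculation shows why “Design for the common case” weighs how often code runs. The per-call costs below are hypothetical round numbers, not measurements, and the calculation assumes a 60 frames-per-second target:

```latex
\begin{aligned}
\text{Frame budget at 60 fps} &= \tfrac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms} \\
\text{Hot path: } 100{,}000\ \text{calls} \times 100\ \text{ns} &= 10\ \text{ms per frame} \\
\text{Save } 20\ \text{ns per call: } 100{,}000 \times 20\ \text{ns} &= 2\ \text{ms saved per frame} \\
\text{Initialization: save } 2\ \text{ms in a one-off call} &= 2\ \text{ms saved, once}
\end{aligned}
```

Under these assumptions, the hot-path saving recovers 2 ms in every frame (about 12% of the budget), while an equal saving in initialization code pays off only once.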
