next up previous contents
Next: Design Patterns Up: Data-Oriented Design Previous: In Practice   Contents

Beta release of Data-Oriented Design :
Expect errors, spelling and factual. Expect out of date data, or missing stuff. Expect to be bored stiff in some sections, and rushed in others, but most of all, please send any feedback on any of these and any other things that you spot, to support@dataorienteddesign.com

Subsections

Maintenance and reuse

When object-oriented design was first promoted, it advertised that it was easier to modify and extend existing code bases than the more traditional procedural approach. Though this is not true in practice, partially due to the language used by games developers and partially because there is an inherent coupling between objects with respect to their interfaces, it still remains a consideration of anyone who uses it when entering into another programming paradigm. Regardless of their level of expertise, an object-oriented programmer will cite the extensible, encapsulating nature of object-oriented development as a boon when it comes to working on larger projects.

Highly experienced but more objective developers have admitted or even written about how C++ is not highly suited to large scale development, but can be used in it as long as you follow strict guidelines. [LargeScaleC++] But for those that cannot immediately see the benefit of the data-oriented development paradigm with respect to maintenance, and evolutionary development, this chapter covers why it is easier than working with objects.

Debugging

The prime causes of bugs are the unexpected side effects of a transform, or an unexpected corner case where a conditional didn't return the correct value. In object-oriented programming, this can manifest in many ways, from an exception caused by de-referencing a null, to ignoring the interactions of the player because the game logic hadn't noticed it was meant to be interactive.

Lifetimes

One of the most common causes of the null dereference is when an object's lifetime is handled by a separate object to the one manipulating it. For example, if you are playing a game where the badguys can die, you have to be careful to update all the objects that are using them whenever the badguy gets deleted, otherwise you can end up dereferencing invalid memory which can lead to dereferencing null pointers because the class has destructed. data-oriented development tends towards this being impossible as the existence of an entity in a table implies its processability, and if you leave part of an entity around in a table, you haven't deleted the entity fully. This is a different kind of bug, but it's not a crash bug, and it's easier to find and kill as it's just making sure that when an entity is destroyed, all the tables it can be part of also destroy their elements too.

Avoiding pointers

When looking for data-oriented solutions to programming problems, we often find that pointers aren't required, and often make the solution harder to scale. Using pointers where null values are possible implies that each pointer doesn't only have the value of the object being pointed at, but also implies a boolean value for whether or not that instance exists. Removing this unnecessary extra feature can remove bugs, save time, and reduce complexity.

Bad State

Sometimes a bug is more to do with a game not being in the right state. Debugging then becomes a case of finding out how the game got into its current, broken state.

When you encapsulate your state, you hide internal changes. This quickly leads to adding lots of debugging logs. Instead of hiding, data-oriented suggests keeping data in simple forms, and potentially leaving it around longer than required can lead to highly simplified transform inspection. If you have a transform that appears to work, but for one odd case it doesn't, the simplicity of adding an assert and not deleting the input data reducing the amount of guesswork and toil required to generate the bug fix. If you keep most of your transforms as one-way, that is to say they take from one source, and produce or update another, but even if you run the code multiple times it will still produce the same results as it would the first time. The transform is idempotent. This useful property allows you to find a bug, then rewind and trace through without having to attempt to rebuild the initial state.

One way of keeping your code idempotent is to write your transforms in a single assignment style. If you operate with multiple transforms but all leading to predicated join points, you can guarantee yourself some timings, and you can look back at what caused the final state to turn out like it did without even rewinding. If your conditions are condition tables, just leave the inputs around until validity checks have been completed then you have the ability to go into any live system and check how it arrived at that state. This alone should reduce any investigation time to a minimum.

Reusability

A feature commonly cited by the object-oriented developers that seems to be missing from data-oriented development is reusability. The idea that you won't be able to take already written libraries of code and use them again, or on multiple projects, because the design is partially within the implementation. To be sure, once you start optimising your code to the particular features of a software project, you do end up with unreusable code. While developing data-oriented projects, the assumed inability to reuse source code would be significant, but it is also highly unlikely. The truth is found when considering the true meaning of reusability.

Reusability is not fundamentally concerned in reusing source files or libraries. Reusability is the ability to maintain an investment in information. A wealth of knowledge for the entity that owns the development IP.

Copyright law has made it hard to see what resources have value in reuse, as it maintains the source as the object of it's discussion rather than the intellectual property represented by the source. The reason for this is that ideas cannot be copyrighted, so by maintaining this stance, the copyrighter keeps hold of this tenuous link to a right to withhold information. Reusability comes from being aware of the information contained within the medium it is stored. In our case, it is normally stored as source code, but the information is not the source code. With Object-Oriented development, the source can be adapted (adapter pattern) to any project we wish to venture. However, the source is not the information. The information is the order and existence of tasks that can and will be performed on the data. Viewing the information this way leads to an understanding that any reusability that a programming technique can provide comes down to it's mutability of inputs and outputs. It's willingness to adapt a set of temporally coupled tasks into a new usage framework is how you can find out how well it functions reusably.

In object-oriented development, you apply the information inherent in the code by adapting a class that does the job, or wrapper it, or use an agent. In data-oriented development, you copy the functions and schema and transform into and out of the input and output data structures around the time you apply the information contained in the data-oriented transform.

Even though, at first sight, data-oriented code doesn't appear as resuable on the outside, the fact is that it maintains the same amount of information in a simpler form, so it's more reusable as it doesn't carry the baggage of related data or functions like object-oriented programming, and doesn't require complex transforms to generate the input and extract from the output like procedural progamming tends to generate due to the normalising.

Duck typing, not normally available in object-oriented programming due to a stricter set of rules on how to interface between data, can be implemented with templates to great effect, turning code that might not be obviously reusable into a simple strategy, or a sequence of transforms that can be applied to data or structures of any type, as long as they maintain a naming convention.

The object-oriented C++ idea of reusability is a mixture of information and architecture. Developing from a data-oriented transform centric viewpoint, architecture just seems like a lot of fluff code. The only good architecture that's worth saving is the actualisation of data-flow and transform. There are situations where an object-oriented module can be used again, but they are few and far between because of the inherent difficulty interfacing object-oriented projects with each other.

The most reusable object-oriented code appears as interfaces to agents into a much more complex system. The best example of an object-oriented approach that made everything easier to handle, was highly reusable, and was fully encapsulated was the FILE type from stdio.h that is used as an agent into whatever the platform and OS would need to open, access, write, and read to and from a file on the system.

Unit Testing

Unit testing can be very helpful when developing games, but because of the object-oriented paradigm making programmers think about code as representations of objects, and not as data transforms, it's hard to see what can be tested. Linking together unrelated concepts into the same object and requiring complex setup state before a test can be carried out, has given unit testing a stunted start in games as object-oriented programming caused simple tests to be hard to write. Making tests is further complicated by the addition of the non-obvious nature of how objects are transformed when they represent entities in a game world. It can be very hard to write unit tests unless you've been working with them for a while, and the main point of unit tests is that someone that doesn't fully grok the system can make changes without falling foul of making things worse.

Unit testing is mostly useful during refactorings, taking a game or engine from one code and data layout into another one, ready for future changes. Usually this is done because the data is in the wrong shape, which in itself is harder to do if you normalise your data as you're more likely to have left the data in an unconfigured form. There will obviously be times when even normalised data is not sufficient, such as when the design of the game changes sufficient to render the original data-analysis incorrect, or at the very least, ineffective or inefficient.

Unit testing is simple with data-oriented technique because you are already concentrating on the transform. Generating tables of test data would be part of your development, so leaving some in as unit tests would be simple, if not part of the process of developing the game. Using unit tests to help guide the code could be considered to be partial following the test-driven development technique, a proven good way to generate efficient and clear code.

Remember, when you're doing data-oriented development your game is entirely driven by stateful data and stateless transforms. It is very simple to produce unit tests for your transforms. You don't even need a framework, just an input and output table and then a comparison function to check that the transform produced the right data.

Refactoring

During refactoring, it's always important to know that you've not broken anything by changing the code. Allowing for such simple unit testing gets you halfway there. Another advantage of data-orieted development is that, at every turn, it peels away the unnecessary elements, so you might find that refactoring is more a case of switching out the order of transforms more than changing how things are represented. Refactoring normally involves some new data representation, but as long as you normalise your data, there's going to be little need of that. When it is needed, tools for converting from one schema to another can be written once and used many times.


next up previous contents
Next: Design Patterns Up: Data-Oriented Design Previous: In Practice   Contents Beta release of Data-Oriented Design :
Expect errors, spelling and factual. Expect out of date data, or missing stuff. Expect to be bored stiff in some sections, and rushed in others, but most of all, please send any feedback on any of these and any other things that you spot, to support@dataorienteddesign.com

Richard Fabian 2013-06-25