\meta{("subjects-and-entities" "subjects and entities" ("programming") 1458077247)}


Object-oriented programming is very bad for high-performance programs
like graphics- and simulation-intensive video games. Or, at least,
traditional object-oriented design is.

The main culprit is the cache. If I have some \em{thing} in a game (say,
an enemy character), then it needs a lot of data associated with it: its
3D meshes and textures, its physics information, its current position, its
inventory, its AI state, and so on. But not all that information is necessarily
being used at any given time. For example, when I am running the
AI processing step, or calculating physical forces acting on
characters, or drawing things in the game, then I'm only using a small
part of all the information associated with a thing.

In a proper object-oriented design, I would take all the information relevant
to a thing and package it up in an object: for example, in a \tt{Game_Unit}
object. That means, however, that in order
to operate on \em{just a subset} of that information, I need to pull
the object into the cache, which puts \em{all} the object's information
in the cache. This means the cache gets full of extraneous data,
and consequently cache misses become much more common, making the
program slower overall.

One way of avoiding this problem is using so-called
\link{http://gamesfromwithin.com/data-oriented-design|Data-Oriented Design},
which involves pulling apart the data stored in objects and storing it
all separately. This has the advantage of being much more cache-friendly,
but it generally doesn't have language support, and so has to be done
manually. The advantages of typical object-oriented design—like encapsulation
and data hiding—are much more difficult to retain, and the language you
use might actively fight against you in these cases.
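
As a rough illustration of that manual pulling-apart, here is a minimal Python sketch. The field names (\tt{xs}, \tt{ys}, \tt{healths}) and the \tt{GameUnit} class are hypothetical, and Python will not actually demonstrate the cache effect; the sketch only shows the change in layout, using the standard \tt{array} module for packed storage.

```python
from array import array

# Object-style layout: each unit carries all of its fields together.
class GameUnit:
    def __init__(self, x, y, health):
        self.x = x
        self.y = y
        self.health = health

units = [GameUnit(float(i), 0.0, 100) for i in range(4)]

# Data-oriented layout: one packed array per field, indexed by unit number.
xs = array("d", (u.x for u in units))
ys = array("d", (u.y for u in units))
healths = array("i", (u.health for u in units))

def move_right(xs, dx):
    # Touches only the x coordinates; ys and healths are never read.
    for i in range(len(xs)):
        xs[i] += dx

move_right(xs, 1.5)
```

Note that nothing ties \tt{xs[2]} to \tt{healths[2]} except the shared index, which is the bookkeeping the language does not help with.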

Following a similar path, some programmers have started using and advocating
\link{http://www.gamedev.net/page/resources/_/technical/game-programming/understanding-component-entity-systems-r3013|
  Component-Entity Systems},
which are a few steps beyond Data-Oriented Design in roughly the same direction,
but also a radically different way of structuring programs. At a high level,
Component-Entity Systems work like this:

\ul{
  \li{An \em{entity} is an abstract identifier. It has no
      other data \em{directly} associated with it.}
  \li{A \em{component} is a table which maps entities to a collection
      of data, ideally to pieces of simple scalar data. Not
      every entity needs to appear in a given component.}
  \li{An entity can be associated with several components. That
      means the entity can be used as an index into the component,
      allowing the programmer to retrieve or modify the associated
      data stored in the component.}
  \li{Operations are written in terms of one or more components.
      These are sometimes called \em{systems}.}
}
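
The four points above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: plain dictionaries stand in for whatever packed storage a real implementation would use, and the \tt{health} and \tt{label} components and the \tt{poison_system} function are hypothetical names invented for the example.

```python
import itertools

_ids = itertools.count()

def new_entity():
    # An entity is only an identifier; it carries no data itself.
    return next(_ids)

# Each component is a table from entity to data (hypothetical examples).
health = {}     # entity -> hit points
label = {}      # entity -> display name

rock = new_entity()
orc = new_entity()
label[rock] = "rock"             # the rock has no entry in `health`
label[orc] = "orc"
health[orc] = 10

# The entity works as an index into a component.
health[orc] -= 3

def poison_system(health):
    # A system is written against a component: it visits every entity
    # that appears in `health` and never looks at `label`.
    for e in health:
        health[e] = max(0, health[e] - 1)

poison_system(health)
```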

This all sounds very abstract, so what does this look like in practice? Let's
describe a basic game. For the sake of simplicity, let's assume
we have four salient pieces of information for an in-game unit:
its position, its current health, its appearance, and the state of
its AI (including current goals and actions):

\code
{\ttkw{component} Position(int x, int y);
\ttkw{component} Health(int hp);
\ttkw{component} Appearance(Image img);
\ttkw{component} AI_State(State st);
}

Every entity in the game will be associated with one or more
of these components: scenery will have \tt{Position} and
\tt{Appearance} data; invulnerable characters might have \tt{Position},
\tt{Appearance}, and \tt{AI_State}, but no \tt{Health}; a player
character will have \tt{Position}, \tt{Health}, and \tt{Appearance},
but no \tt{AI_State}; and so forth.
Well: I've been saying that entities \em{have} this data, but it's more
appropriate to say they're \em{associated with} that data. All
the relevant data is stored in the components, and the entity
is only the index used to access that data.
\ref{db}
\sidenote{If this is hard to visualize, think of components as database
tables, and your entity as the primary key used in all your tables,
which you can use to access or update that data.
You of course wouldn't want to \em{actually} implement a game like
that, but it's similar in spirit.}

Now, to write the salient \em{operations} of our game, we can
write them in terms of one or more components: when we draw
our game, we write the draw operation in terms of \tt{Appearance},
which allows us to loop over everything that has image data
associated with it. On the other hand, when we want to move units
around, we'll write an operation in terms of both the \tt{AI_State}
\em{and} \tt{Position} components, because we need to know what
the unit plans to do in order to update its position.
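
A minimal sketch of those two operations, again using Python dictionaries as stand-ins for the components declared earlier. The entity numbers, image names, and plan strings are made up for illustration, and \tt{draw_system} just returns what it would draw rather than rendering anything.

```python
# Components as entity -> data tables.
position = {1: (0, 0), 2: (3, 4), 3: (9, 9)}
appearance = {1: "tree.png", 2: "orc.png", 3: "hero.png"}
ai_state = {2: "walk_east"}          # only entity 2 is AI-controlled

def draw_system(appearance):
    # Loops over everything with image data; returns what would be drawn.
    return [(e, img) for e, img in sorted(appearance.items())]

def ai_move_system(ai_state, position):
    # Needs both components: the plan says where to go, and the
    # position is what gets updated.
    steps = {"walk_east": (1, 0), "walk_north": (0, 1)}
    for e, plan in ai_state.items():
        if e in position:
            dx, dy = steps.get(plan, (0, 0))
            x, y = position[e]
            position[e] = (x + dx, y + dy)

ai_move_system(ai_state, position)
drawn = draw_system(appearance)
```

Entities 1 and 3 are untouched by \tt{ai_move_system} simply because they have no row in \tt{ai_state}; no type check or special case is involved.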

The advantages of Data-Oriented Design that I described earlier are
still in effect, because all the data associated with a component
can be stored packed together: looping over the \tt{Appearance}
component won't bring any non-\tt{Appearance} data into the cache.
But Component-Entity Systems have an extra advantage
over pure Data-Oriented Design: you gain \em{compositionality} in
a way that's not necessarily present in other designs.\ref{bom}
\sidenote{
\link{http://t-machine.org/index.php/2013/05/30/designing-bomberman-with-an-entity-system-which-components/|
  This blog post about applying component-entity design to Bomberman}
has a section called
\em{Consider the possibilities of your new Components}, in which
the author explores the compositions of components as interesting avenues
of discovering new gameplay. It also goes into a lot more detail
about what a component-entity approach would look like.
}

For example:
you might design a game with a \tt{Health_Pickup} component for
items that restore health when a player interacts with them. An
entity that is associated with both the \tt{Health_Pickup} and
\tt{AI_State} components will act as a mobile health powerup
that can choose how to move around using some kind of AI routine.
On the other hand, an entity associated with both
the \tt{Health_Pickup} and \tt{Health} components is a health pickup
which can be destroyed, perhaps so that it cannot be used by an
opposing player. In both those cases, no extra implementation work
would need to be done for these conjunctions of components: the new,
interesting functionality falls out naturally from implementing
each feature in isolation.
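
To make that concrete, here is a hedged Python sketch of the health pickup example. Everything in it is an assumption made for illustration: the entity names, the numbers, the \tt{"walk_west"} plan, and the rule that a pickup is collected when it shares a position with the player. The point is that the three systems are each written once, against single components, and the roaming and destructible pickups need no code of their own.

```python
# Components as entity -> data tables.
position = {}        # entity -> (x, y)
health = {}          # entity -> hit points
health_pickup = {}   # entity -> hit points restored when collected
ai_state = {}        # entity -> current plan
TABLES = (position, health, health_pickup, ai_state)

PLAYER, STATIC_KIT, ROAMING_KIT, FRAGILE_KIT = range(4)
position.update({PLAYER: (0, 0), STATIC_KIT: (0, 0),
                 ROAMING_KIT: (2, 0), FRAGILE_KIT: (5, 5)})
health[PLAYER] = 50
health_pickup.update({STATIC_KIT: 10, ROAMING_KIT: 25, FRAGILE_KIT: 40})
ai_state[ROAMING_KIT] = "walk_west"   # Health_Pickup + AI_State
health[FRAGILE_KIT] = 1               # Health_Pickup + Health

def destroy(entity):
    # Remove the entity's row from every component.
    for table in TABLES:
        table.pop(entity, None)

def ai_move_system():
    for e, plan in ai_state.items():
        if plan == "walk_west" and e in position:
            x, y = position[e]
            position[e] = (x - 1, y)

def damage_system(target, amount):
    # Anything with Health can be damaged, pickups included.
    health[target] -= amount
    if health[target] <= 0:
        destroy(target)

def pickup_system(player):
    # Any pickup sharing the player's position heals the player.
    here = [e for e in health_pickup if position.get(e) == position[player]]
    for e in here:
        health[player] += health_pickup[e]
        destroy(e)

pickup_system(PLAYER)            # collects the static kit
ai_move_system()
ai_move_system()                 # the roaming kit walks onto the player
pickup_system(PLAYER)            # collects the roaming kit
damage_system(FRAGILE_KIT, 1)    # the fragile kit is destroyed unused
```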

While Component-Entity Systems are interesting,
there are no \em{languages} that are inherently component-entity-oriented.\ref{lng}
\sidenote{At least, if there are, I don't know about them.}
Component-Entity Systems are usually implemented using existing
object-oriented languages.
So my pseudocode examples above which used the \tt{\ttkw{component}} keyword
were all pure fiction. But what \em{would} a component-entity language
look like?

I'm going to change pace a bit and discuss an old and sadly
mostly-forgotten paper:
\link{http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.4805&rep=rep1&type=pdf|
Harrison and Ossher's 1993 OOPSLA paper,
\em{Subject-Oriented Programming: A Critique of Pure Objects}}.
The primary motivation behind the paper is addressing
what they see as a flaw in object-oriented design: most
object-oriented languages include inheritance, and therefore involve
a tree of \em{is-a} relationships. However, objects don't exist in
just a single place in a single ontology: a set of objects can be seen
as occupying multiple places in multiple ontologies.
As a concrete but slightly fanciful example: do we choose a culinary
ontology for our program and write \tt{Tomato extends Vegetable},
or do we choose a biological ontology and write \tt{Tomato extends Fruit}?
What if one part of the program needs one and the other needs
the other?

More concisely, as Jorge Luis Borges\ref{borg} said:
\sidenote{From the Jorge Luis Borges essay
\link{http://www.crockford.com/wrrrld/wilkins.html|\em{The Analytical Language of John Wilkins}}.}

\blockquote
{It is clear that there is no classification of the Universe that is
not arbitrary and full of conjectures. The reason for this is very
simple: we do not know what kind of thing the universe is.}

Harrison and Ossher propose that, instead of dealing with objects
that are instances of classes, we deal with \em{subjects}, which
\em{can be seen as} instances of classes. Any time an object is
interacted with, it is interacted with in a subjective context,
in which the data and operations associated with it can be different
depending on the subjective context. The only thing shared between
contexts is the identity of the objects:

\blockquote
{The essential characteristic of subject-oriented programming is
that different subjects can separately define and operate upon
shared objects, without any subject needing to know the details
associated with those objects by other subjects. Only object
identity is necessarily shared.
}

Harrison and Ossher's proposed system is in some respects very
similar to component-entity systems, but is in other respects quite
different. There is certainly commonality: what a component-entity
system calls an \em{entity}, a subject-oriented language calls an
\em{object-identifier} or an \em{oid}, and both consider entities
or oids to be abstract identifiers with no directly associated
information.

In a subject-oriented language, an operation must exist within a
given \em{subject activation}, which exposes pieces of information
associated with the entity and a set of operations. An entity can,
within a given subject activation, have fields and methods, and
those fields and methods can be entirely distinct from the fields
and methods exposed by a different subject activation. Every
method invocation, therefore, has to exist within a subject
activation, so that we know the actions and fields available
within that subjective frame.

A salient \em{difference} is the way that subject-oriented programming allows
certain pieces of information, or certain operations, to be shared
among different subject activations. Harrison and Ossher's repeated
example involves a \tt{Tree} object shared by, among others,
a \tt{Woodcutter} subject and a \tt{Bird} subject. A \tt{Woodcutter}'s
view of the tree has an estimated value, which the woodcutter
might use to determine whether the tree is worth cutting down. On
the other hand, a \tt{Bird}'s view of the tree involves its
suitability for building a nest in. In both cases, though, they might
care about a piece of information like the tree's height.

However, Harrison and Ossher's approach to this issue seems awkward:
they suggest that,
rather than straightforwardly sharing the height between the subjects,
the \tt{Bird} subject and the \tt{Woodcutter}
subject should both have \em{their own copy} of a field representing
the tree's height, and
that the two must be \em{made to} agree: they must return
the same value, or some compatible value (for example, by returning
some value which is commutative). If they \em{fail} to agree, the
program throws an exception. This is almost \em{certain} to be a
source of frustration in practice, or at least a major source of
gotchas.
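
The following Python sketch is a loose rendering of the scheme as described in the paragraph above, not the paper's actual mechanism. The table names, the \tt{SubjectDisagreement} exception, and the \tt{height} function are all hypothetical, and "agree" is simplified here to strict equality.

```python
class SubjectDisagreement(Exception):
    pass

# Each subject keeps its own copy of the tree heights, keyed by oid.
woodcutter_height = {"tree-1": 12.0}
bird_height = {"tree-1": 12.0}

def height(oid):
    # Ask every subject that knows the oid and insist the answers agree.
    answers = [t[oid] for t in (woodcutter_height, bird_height) if oid in t]
    if len(set(answers)) > 1:
        raise SubjectDisagreement(f"{oid}: subjects report {answers}")
    return answers[0]

ok = height("tree-1")
bird_height["tree-1"] = 11.5     # one copy drifts out of sync
try:
    height("tree-1")
    failed = False
except SubjectDisagreement:
    failed = True
```

Any code path that updates one copy without the other turns a later read into a runtime failure, which is the kind of gotcha I have in mind.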

The Harrison and Ossher approach also describes how to mediate
two distinct object hierarchies, so that different subject
activations can use inheritance over the same set of classes
in very different ways. (The \tt{Cook} subject, for example,
could use \tt{Tomato extends Vegetable}, while the \tt{Botanist}
subject could use \tt{Tomato extends Fruit}, so a given
oid can be seen by both as being situated within different
hierarchies.) They then go on to describe how one subject's
class hierarchy might be incomplete with respect to another
subject's hierarchy, and describe how to match those hierarchies
together, or infer class hierarchies based on interfaces or other
mechanisms.

I would argue that the best thing to do is to combine the high-level
details of Harrison and Ossher's subject-oriented language design
with the specific mechanisms used in component-entity systems.
They clearly have a common starting
point and a similar approach to modeling the world, in which abstract
entities \em{can be viewed in some context} as having associated operations
and information. The Harrison and Ossher approach unfortunately gets caught
in a quagmire of hierarchies and modeling, but much of that complexity can
be alleviated if we treat subject activations like
sets of components: suddenly, the \tt{Bird} and the \tt{Woodcutter}
subjects/systems can simply \em{share} the \tt{TreeHeight} component,
without having to resort to awkward and complicated agreement
strategies on the hierarchies or results involved.
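
A sketch of what that sharing could look like, again with Python dictionaries standing in for components. Only the \tt{TreeHeight} idea comes from the discussion above; the \tt{timber_value} and \tt{nest_suitability} components, the thresholds, and the tree names are invented for the example.

```python
# Shared component: one table, so one source of truth per tree.
tree_height = {"oak": 20.0, "sapling": 1.5}

# Components private to each subject/system.
timber_value = {"oak": 300}                        # woodcutter only
nest_suitability = {"oak": 0.9, "sapling": 0.8}    # bird only

def woodcutter_worth_cutting(tree, min_height=10.0):
    return tree_height[tree] >= min_height and timber_value.get(tree, 0) > 100

def bird_will_nest(tree, min_height=5.0):
    return tree_height[tree] >= min_height and nest_suitability.get(tree, 0) > 0.5

before = bird_will_nest("sapling")
tree_height["sapling"] = 12.0    # the sapling grows; every view sees it at once
after = bird_will_nest("sapling")
```

There is no second copy of the height to fall out of step, so nothing needs to be reconciled.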

As for the specifics of what a component-entity language might
look like, I leave that as a creative exercise for the reader.
\ref{exc}\sidenote{I \em{do} have ideas. Someday I will implement them.}