The Future of Game Programming


The way we program computers, even after 50 years of doing it, is still slow, exacting, and error-prone. Almost all the efforts to devise better kinds of programming languages and better ways of programming have been ignored by the industry in general and game programmers in particular, who usually trade convenience and even reliability in exchange for execution speed without a second thought. Game programmers are notoriously conservative; it took many years to persuade them to program in high-level languages rather than "down on the bare metal," and it took many more to use object-oriented techniques and project-management tools. Even today there's still something of a cult of machismo among the more hard-core programmers, each trying to outdo the others to squeeze a few more instructions per millisecond out of the hardware.

Most game machines contain a single general-purpose central processing unit and one or more dedicated graphics processing units. Their CPUs follow the traditional one-instruction-at-a-time model that goes all the way back to the Jacquard loom. This design was originally intended for computing ballistics tables for artillery guns, not for simulating complex scenes or looking five moves ahead in a chess game. However, it seems firmly entrenched, and we believe that until there is a completely new paradigm in computer hardware design, software will continue to be programmed in much the same way that it has been for the last 20 years. Because computers are designed to perform mathematical calculations, our models are all still mathematical rather than, say, neurological. Mathematics has the advantage that it's highly abstract and can be applied, with varying degrees of accuracy, to nearly anything.

However, although the near future doesn't look as if it will offer any major changes to the way computers are built and programmed, we do expect to see a number of advances in computer programming, and these will have a distinct effect on game development as well.

Scene Representation

We've separated scene representation from animation because they represent two different kinds of programming problems. The current state of the art is to represent a scene as a set of textured surfaces whose shapes are defined by a database of polygons. 3D graphics-acceleration hardware is essential for doing this quickly. It works very well for the simple problem of displaying objects in a room, so now most of the graphics programming effort has turned to harder challenges: creating lighting effects, particle effects, fog, and so on.
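
To make that concrete, here is a minimal sketch of how such a polygon database is commonly organized: a pool of points in 3-space plus triangles that index into it. The structure names are illustrative rather than taken from any particular engine.

    // A minimal sketch of polygonal scene storage: a vertex pool plus
    // triangles that index into it. Names are illustrative only.
    #include <cstdio>
    #include <vector>

    struct Vertex   { float x, y, z; float u, v; };  // position + texture coords
    struct Triangle { int v[3]; };                   // indices into the vertex pool

    struct Mesh {
        std::vector<Vertex>   vertices;
        std::vector<Triangle> triangles;
    };

    int main() {
        // A single textured quad (two triangles), the simplest "surface."
        Mesh quad;
        quad.vertices  = { {0,0,0, 0,0}, {1,0,0, 1,0}, {1,1,0, 1,1}, {0,1,0, 0,1} };
        quad.triangles = { {{0,1,2}}, {{0,2,3}} };
        std::printf("%zu vertices, %zu triangles\n",
                    quad.vertices.size(), quad.triangles.size());
        return 0;
    }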

As for the future, there could come a time when hardware support for drawing polygons seems as antiquated as hardware support for drawing lines does now. There are at least three other ways in which scenes can be displayed, and there might be more that we haven't envisioned.

Mathematical Representations

At the moment, scenes are represented as data: thousands or millions of points in 3-space that define the corners of the polygons that make up the surfaces that we see. This data is normally created in a 3D scene editor such as Maya, and it takes up a lot of memory. It's a particularly expensive way of representing curved surfaces because each curve has to be broken down into a large number of straight lines.

Another way of representing a scene is as a series of mathematical equations that describe the surfaces, whether curved or flat. Two techniques currently under investigation are nonuniform rational B-splines (NURBS) and Bézier patches. Program code uses the equations to calculate the shapes of the surfaces and project them onto the screen. This takes more computing power than drawing polygons, but it allows the artist to represent more curves in less memory.
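
To give a feel for the approach, here is a minimal sketch of evaluating a bicubic Bézier patch: sixteen control points stand in for what might otherwise be hundreds of stored polygon corners, and the code can sample the surface as finely or as coarsely as the renderer can afford. The Bernstein-polynomial formulation is standard; the structure and function names are ours.

    // A minimal sketch of a bicubic Bezier patch: 16 control points
    // define a curved surface that can be sampled at any resolution.
    #include <cstdio>

    struct Vec3 { float x, y, z; };

    // Cubic Bernstein basis functions B0..B3 at parameter t.
    static void bernstein(float t, float b[4]) {
        float s = 1.0f - t;
        b[0] = s * s * s;
        b[1] = 3.0f * t * s * s;
        b[2] = 3.0f * t * t * s;
        b[3] = t * t * t;
    }

    // Evaluate the patch at (u, v), both in [0, 1].
    static Vec3 evalPatch(const Vec3 ctrl[4][4], float u, float v) {
        float bu[4], bv[4];
        bernstein(u, bu);
        bernstein(v, bv);
        Vec3 p = {0, 0, 0};
        for (int i = 0; i < 4; ++i)
            for (int j = 0; j < 4; ++j) {
                float w = bu[i] * bv[j];   // weight of this control point
                p.x += w * ctrl[i][j].x;
                p.y += w * ctrl[i][j].y;
                p.z += w * ctrl[i][j].z;
            }
        return p;
    }

    int main() {
        Vec3 ctrl[4][4];
        for (int i = 0; i < 4; ++i)        // a gently domed surface
            for (int j = 0; j < 4; ++j)
                ctrl[i][j] = { (float)i, (float)j,
                               ((i == 1 || i == 2) && (j == 1 || j == 2)) ? 1.0f : 0.0f };
        Vec3 mid = evalPatch(ctrl, 0.5f, 0.5f);
        std::printf("patch center: (%.2f, %.2f, %.2f)\n", mid.x, mid.y, mid.z);
        return 0;
    }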

Procedural Scene Representation

Instead of storing a database of points or mathematical equations, you could "paint" a scene algorithmically by writing program code that generates an image on the fly; a chair subroutine would draw a picture of a chair, for example. This extends traditional object-oriented programming, in which programmers write code that determines how objects behave, to include the concept of code that determines how they look as well. Unlike raw image data, a drawing algorithm can be given new parameters to tell it to draw things in a variety of ways. A program called AARON has already used this technique to create 2D paintings of people and objects.
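
Here is a minimal sketch of what such a chair subroutine might look like, assuming a hypothetical emitBox() primitive standing in for a real rendering call; the parameters show how one piece of code can draw many different chairs.

    // A minimal sketch of a procedural "chair subroutine": instead of
    // storing a chair model, code emits its geometry on the fly.
    // emitBox() stands in for whatever primitive the renderer accepts.
    #include <cstdio>

    static void emitBox(float x, float y, float z, float w, float h, float d) {
        std::printf("box at (%.2f, %.2f, %.2f) size %.2f x %.2f x %.2f\n",
                    x, y, z, w, h, d);
    }

    // Different parameters draw different chairs from the same code.
    static void drawChair(float seatHeight, float seatSize, float backHeight) {
        float leg = seatSize * 0.1f;                  // leg thickness
        for (int i = 0; i < 4; ++i) {                 // four legs at the corners
            float lx = (i % 2) * (seatSize - leg);
            float lz = (i / 2) * (seatSize - leg);
            emitBox(lx, 0.0f, lz, leg, seatHeight, leg);
        }
        emitBox(0, seatHeight, 0, seatSize, leg, seatSize);         // seat
        emitBox(0, seatHeight + leg, 0, seatSize, backHeight, leg); // backrest
    }

    int main() {
        drawChair(0.45f, 0.40f, 0.50f);   // a dining chair
        drawChair(0.30f, 0.35f, 0.25f);   // a child's chair, same subroutine
        return 0;
    }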

Real-Time Ray Tracing

Ray tracing is an extremely slow but powerful technique in which the color of each pixel on the screen is computed, one by one, from a three-dimensional model of a scene, including its light sources. The idea is that each pixel is hypothetically "lit" by a ray of light coming from somewhere within the scene, and the process computes where that ray originated and what happened to it along the way. Because it computes each ray of light individually, ray tracing can display the effects of mirrors, lenses, translucent surfaces, and anything else that affects light as it travels. The process normally takes many hours to generate a single still frame. Ray tracing is often used to create special effects in movies because each frame of the movie has to be rendered only once. In a computer game, in which things are changing on the fly 30 times a second or more, it's much too slow, at least for now. But there could come a time when real-time ray tracing is made possible by hardware accelerators.
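
The heart of the technique is small enough to sketch. The following minimal example traces one ray per pixel against a single sphere, applies diffuse lighting, and prints the result as ASCII art; the scene, the light, and the shading are all illustrative assumptions, and a real tracer adds reflection, refraction, shadows, and vastly more geometry.

    // A minimal sketch of ray tracing: for each pixel, fire a ray into
    // the scene, find what it hits, and light that point.
    #include <cmath>
    #include <cstdio>

    struct Vec3 {
        float x, y, z;
        Vec3 operator-(Vec3 b) const { return {x - b.x, y - b.y, z - b.z}; }
        Vec3 operator*(float s) const { return {x * s, y * s, z * s}; }
    };
    static float dot(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
    static Vec3 normalize(Vec3 v) { float l = std::sqrt(dot(v, v)); return v * (1.0f / l); }

    // Ray-sphere intersection: solve |orig + t*dir - center|^2 = r^2 for t.
    static bool hitSphere(Vec3 orig, Vec3 dir, Vec3 center, float r, float& t) {
        Vec3 oc = orig - center;
        float b = dot(oc, dir);
        float c = dot(oc, oc) - r * r;
        float disc = b * b - c;
        if (disc < 0) return false;
        t = -b - std::sqrt(disc);
        return t > 0;
    }

    int main() {
        Vec3 sphere = {0, 0, 3};                    // one sphere in front of the eye
        Vec3 light  = normalize({-1, 1, -1});       // direction toward the light
        const char* shades = " .:-=+*#%@";
        for (int py = 0; py < 20; ++py) {
            for (int px = 0; px < 40; ++px) {
                // One ray per pixel from the eye at the origin.
                Vec3 dir = normalize({(px - 20) / 20.0f, (10 - py) / 10.0f, 1.0f});
                float t;
                char ch = ' ';
                if (hitSphere({0, 0, 0}, dir, sphere, 1.0f, t)) {
                    Vec3 n = normalize(dir * t - sphere);   // surface normal
                    float lum = dot(n, light);              // diffuse lighting
                    ch = shades[(int)((lum < 0 ? 0 : lum) * 9)];
                }
                std::putchar(ch);
            }
            std::putchar('\n');
        }
        return 0;
    }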

It's impossible to predict which lines of research will prove fruitful. One thing is certain: Whatever is the hottest, most exciting state-of-the-art technique today will be yesterday's news a few years from now. This is one of the reasons a designer should avoid creating designs dependent on a specific piece of hardware; they age too quickly.

Animation

In film and television, animation is pre-rendered and can be refined in the studio until it looks right in every scene. In computer games, however, animation must be displayed on the fly, often without any way to adjust it to account for differences between one scene and another. This is fairly easy when animating rigid mechanical objects such as machines, and it is very difficult with soft, deformable objects such as people and animals. When the action being animated is self-contained and has a natural cycle, such as a person running in a straight line along a flat surface, it looks pretty good. However, there are a number of ways in which computer game animation can be improved in the future.

Facial Animation and Speech Generation

The human brain contains special neurological wiring that responds specifically to faces, and the face transmits a huge amount of data about the emotional state of the speaker. Obviously, prescripted facial animation performing in conjunction with fixed audio clips of the character speaking is already commonplace, but the holy grail of facial animation is on-the-fly lip synchronization with artificially generated speech. We've reached the point at which we can do this with rather wooden, robotic-seeming characters, but truly natural-sounding speech will come in time. The other aspect of facial animation, emotional display, also shows great promise. At the moment, a few games are doing this in a fairly clumsy way (a raised eyebrow or a frown), but before long we will be able to do more subtle expressions.

Inverse Kinematics

Computer animation involving interactions between two objects often doesn't look right. Consider the simple motion of walking uphill. With each step, the walker's forward foot should stop descending at a point higher than his rear foot because the ground is higher there. The angle of his ankles should also be different from what it is on flat ground because his feet are sloping upward from back to front. If you use the same walk cycle that you would use on flat ground, the walker's forward foot will appear to descend into the earth, and his ankles will be at the wrong angle. To correct these errors, it's possible to use a programming technique called inverse kinematics to compute where the heel and toes should stop based on the height of the ground the foot will contact. The data for the position and orientation of the legs is then modified to account for the different height of the surface. This has to be done at every step, to compensate for changes in the angle of the surface. If the terrain flattens out or starts to slope downhill, the positions of the legs must change to reflect that.

Inverse kinematics have a great many uses besides walking. In reaching out to pick up an object, for example, the distance the character's arm extends naturally depends on how far away the object is. If the animation for extending the arm is fixed, the model must be the same distance from every object it is going to pick up. If the model is too far away, its arm will stop moving before the hand reaches the object, which will then appear to float up in midair. If the model is too near, its hand will appear to pass through the object. By using inverse kinematics, the model's arm can be made to stop extending at the point at which the hand touches the object.
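
To make the technique concrete, here is a minimal sketch of an analytic two-bone IK solver in two dimensions, the kind of calculation that could stop a hand at an object or place a heel on sloping ground. It uses the law of cosines to recover the joint angles; the function and parameter names are ours, and a production solver would also respect joint limits.

    // A minimal sketch of two-bone inverse kinematics in 2D: given a
    // target point, solve for shoulder and elbow angles so the hand
    // lands exactly on it.
    #include <cmath>
    #include <cstdio>

    static const float kPi = 3.14159265f;

    // Solve joint angles for a two-segment limb rooted at the origin.
    // len1/len2: upper/lower segment lengths; (tx, ty): target for the tip.
    // Returns false if the target is out of reach.
    static bool solveTwoBoneIK(float len1, float len2, float tx, float ty,
                               float& shoulder, float& elbow) {
        float dist = std::sqrt(tx * tx + ty * ty);
        if (dist > len1 + len2 || dist < std::fabs(len1 - len2))
            return false;                       // can't reach: don't fake it
        // Law of cosines on the triangle (root, elbow, target).
        float cosElbow = (len1*len1 + len2*len2 - dist*dist) / (2*len1*len2);
        elbow = std::acos(cosElbow) - kPi;      // bend relative to a straight limb
        float cosInner = (len1*len1 + dist*dist - len2*len2) / (2*len1*dist);
        shoulder = std::atan2(ty, tx) + std::acos(cosInner);
        return true;
    }

    int main() {
        float shoulder, elbow;
        // Upper arm 0.30 m, forearm 0.25 m, object 0.4 m away and slightly up.
        if (solveTwoBoneIK(0.30f, 0.25f, 0.40f, 0.10f, shoulder, elbow))
            std::printf("shoulder %.1f deg, elbow %.1f deg\n",
                        shoulder * 180 / kPi, elbow * 180 / kPi);
        else
            std::printf("target out of reach: step closer first\n");
        return 0;
    }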

Inverse kinematics are computationally expensive compared to using fixed animation, especially when there are large numbers of animated people moving around in complex environments such as a cocktail party. But research is underway, and as processors become more powerful, we can expect to see more of this technique.

True Locomotion

At the moment, most computer animation moves a 3D model rather in the way a marionette moves. A marionette wiggles its legs back and forth to look as if it's walking, but the puppet isn't really moving by pushing its feet against the ground. The same is true of computer models: The movement of the model through an environment is actually computed by a mathematical formula unrelated to the movement of its legs. Typically, the speed of the model either is fixed or varies according to a straight-line acceleration, as with a rocket. But if the movement of the legs doesn't actually match the speed of the model over the ground, it produces a visual anomaly: The character looks as if she is ice skating. This often appears in sports games because different athletes run at different speeds depending on their ability ratings, but they all use the same animation cycle for running. In most other games, all characters move at the same speed or at a few fixed speeds, each of which has its own properly tuned animation cycle. For now, only sports games have a wide and continuous range of speeds for all their athletes.
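
One common mitigation, well short of the full physical simulation described next, is simply to key the walk cycle's playback to the distance the model actually covers, so that faster athletes take faster strides. A minimal sketch follows; the Walker structure, strideLength, and the per-frame call are our own illustrative assumptions.

    // A minimal sketch of keeping feet and ground speed in sync: advance
    // the walk-cycle phase by distance covered, not by wall-clock time.
    #include <cstdio>

    struct Walker {
        float strideLength;   // ground covered by one full cycle of the legs
        float cyclePhase;     // 0..1 position within the walk animation
    };

    // Called once per frame with the distance the model moved this frame.
    static void advanceWalkCycle(Walker& w, float distanceMoved) {
        w.cyclePhase += distanceMoved / w.strideLength;  // fast runner, fast legs
        while (w.cyclePhase >= 1.0f) w.cyclePhase -= 1.0f;
    }

    int main() {
        Walker slow = {1.4f, 0.0f}, fast = {1.4f, 0.0f};
        for (int frame = 0; frame < 30; ++frame) {        // one simulated second
            advanceWalkCycle(slow, 2.0f / 30.0f);         // 2 m/s jogger
            advanceWalkCycle(fast, 8.0f / 30.0f);         // 8 m/s sprinter
        }
        std::printf("after 1 s: slow phase %.2f, fast phase %.2f\n",
                    slow.cyclePhase, fast.cyclePhase);
        return 0;
    }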

The solution to this problem, even more processor-intensive than inverse kinematics, is true locomotion; that is, simulating the movement of a body according to real physics acting on the body, involving its mass, strength, traction on the ground, and many other factors. If done properly, calculations should also take into account such features as the swaying of the person's body as weight shifts and the flexing and deformation of the feet under the changing load conditions of walking. True locomotion is common in pre-rendered animations such as the dinosaurs in the film Jurassic Park, but it has yet to be seen in computer games because we just don't have the processing power to do it in real time, especially for a whole field full of athletes. But it won't be long before we do. It's another thing to look out for in the coming years.

Natural Language Processing

Not long after computers were invented, early artificial intelligence researchers confidently predicted that they would have programs speaking and understanding English within 10 years. Fifty years later, we're not significantly closer to that goal. Computers do use language to communicate with their users, but it's almost entirely by means of prescripted sentences. Few programs have been devised that can express meaning by generating sentences from individual words, and those few usually do so over a very limited domain.

It turns out that generating and understanding natural language is an exceedingly hard problem. Large areas of the human brain are devoted to it. Language comprehension involves much more than understanding the dictionary definitions of the words and the rules of grammar; it also takes into account the relationships between the speakers, their physical circumstances, the sorts of routine conversational scripts that we follow, and many other variables. To give an extremely simple example, a person who is drowning might shout "Help!" to those on shore, and a person on shore who can't offer help himself for some reason might also shout "Help!" The first person obviously means "help me," while the second means "help him." It's up to the listener to observe the situation and draw the correct conclusion about who needs help. Most of us could do this in a fraction of a second, but at the moment, no computer program can do so at all. A great deal of natural language comprehension is tied into something called "common sense," but common sense is so enormous and illogical that we don't even know how to start to teach it to computers.

Nevertheless, natural language processing will be extremely significant in the games of the future. There are two problems to solve: language recognition and language generation.

Language Recognition

Language recognition isn't the same as voice recognition, which we've already dealt with in the section on gaming hardware. Language recognition is the process of breaking down sentences to decode their meaning, and it is also called parsing. Computers aren't too bad at parsing sentences that refer to a tightly restricted subject. This is what compilers do as the first step in processing program source code. Source code, however, has extremely rigid rules and an unambiguous meaning for everything. English is much more complex, fluid, and illogical. Consider the sentence, "Alice told Betty that she would have to leave." Who would have to leave?

Giving orders in English will be a lot of fun, as long as it doesn't prove to be less efficient than doing it by other means. Most games provide a fairly restricted domain, so orders such as "Attack," "Hold your ground," and even "Start a diversion on the west side of the enemy base" won't be too difficult to interpret. But the real challenge for language recognition will be in games with simulated characters, with whom the player wants to have conversations. Early text-based adventure games did a certain amount of this, but most of that work was abandoned with the arrival of graphical adventure games and scripted conversations. For the moment, most programs that try to do language recognition sort of fake it, guessing what the player means from keywords in the input and responding more or less appropriately depending on how good the guess was. It will probably require several more decades of AI research before we can do language recognition well.
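
As an illustration of that keyword-guessing approach, here is a minimal sketch; the vocabulary and the command names are invented for the example.

    // A minimal sketch of keyword-based "recognition": scan the input
    // for known command words and guess the player's intent.
    #include <cstdio>
    #include <string>

    static const char* guessCommand(const std::string& input) {
        struct Rule { const char* keyword; const char* command; };
        static const Rule rules[] = {
            {"attack",    "ATTACK"},
            {"hold",      "HOLD_GROUND"},
            {"diversion", "CREATE_DIVERSION"},
        };
        for (const Rule& r : rules)
            if (input.find(r.keyword) != std::string::npos)
                return r.command;             // first keyword hit wins
        return "UNKNOWN";                     // ask the player to rephrase
    }

    int main() {
        std::printf("%s\n", guessCommand("start a diversion on the west side"));
        std::printf("%s\n", guessCommand("hold your ground"));
        std::printf("%s\n", guessCommand("alice told betty she must leave"));
        return 0;
    }

Note how crude the guess is: the third input has no keyword at all, and even the first two succeed only because the domain of possible orders is so tightly restricted.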

Language Generation

Simple language generation, assembling prerecorded phrases into sentences, is less difficult than language recognition. Unlike parsing user input, which could be anything, as designers we can limit the scope of what a game character says and guarantee that it's grammatically correct. We're already starting to get good at assembling such speech and playing it back smoothly, and this will continue to improve.

In the near future, we can't expect to have wide-ranging conversations with artificial characters, but we ought to be able to simulate reasonable interactions in stereotypical sorts of situations: bartenders, gas-station attendants, invading aliens, and so on. For now, these will probably remain scripted conversations, but we might be able to replace the current mechanism, in which the game just delivers a canned piece of dialogue, with a sentence assembled from semantic fragments that vary somewhat depending on the character's state of mind. To give a trivial example, take the sentence "I don't know" as a response to a question to which the character doesn't have the answer. If the character feels sympathetic to the player, the software could add "I'm sorry, but" before the sentence; if the character feels unsympathetic, it could add "and I don't care" at the end.
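
A minimal sketch of that fragment-assembly idea follows; the Mood states and the exact phrasing are illustrative assumptions.

    // A minimal sketch of fragment assembly: the same core response,
    // wrapped differently depending on the speaker's state of mind.
    #include <cstdio>
    #include <string>

    enum class Mood { Sympathetic, Neutral, Unsympathetic };

    static std::string sayDontKnow(Mood mood) {
        std::string core = "I don't know";
        switch (mood) {
            case Mood::Sympathetic:   return "I'm sorry, but " + core + ".";
            case Mood::Unsympathetic: return core + ", and I don't care.";
            default:                  return core + ".";
        }
    }

    int main() {
        std::printf("%s\n", sayDontKnow(Mood::Sympathetic).c_str());
        std::printf("%s\n", sayDontKnow(Mood::Neutral).c_str());
        std::printf("%s\n", sayDontKnow(Mood::Unsympathetic).c_str());
        return 0;
    }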

Real language use, in the sense of converting a character's mental desire to make a "speech act," along with the semantic content of that act, into an actual utterance, is a far harder problem. Games will undoubtedly be able to do it someday, but, as with language recognition, this is primarily a subject for AI research at the moment.


