Approach to Learning | AI Game Development: Synthetic Creatures with Learning and Reactive Behaviors

With the purpose of learning now clear, this section looks at the different ways to achieve it.

Two Game Phases

Learning can take place during development (offline) or after the game has shipped, while it is being played (online).

Offline

When learning is used during development, it can be considered a preprocessing step, on the same terms as compacting graphical data structures or compressing sound files. Offline learning generally assumes that the problem is static and that an optimal solution can be found automatically. The developer can assist and manage the learning by using tools and methodologies.

This approach is used by cunning and efficient developers who have the experience to let the AI do the work for them faster than they could do it themselves. This is possible because the goal is well identified and the system does not change once it is in the game.
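As a rough illustration of offline learning as a preprocess, the sketch below estimates at build time which weapon hits most often at each distance, then bakes the result into a JSON string that the shipped game only reads. The weapon names, the hit-rate formulas in the hypothetical `simulated_hit` function, and the trial count are all assumptions standing in for runs of a real game simulation.

```python
import json
import random

WEAPONS = ["shotgun", "rifle", "rocket"]

def simulated_hit(weapon, distance, rng):
    """Hypothetical stand-in for one offline run of the game simulation."""
    chance = {"shotgun": 0.9 - 0.09 * distance,
              "rifle": 0.3 + 0.06 * distance,
              "rocket": 0.6}[weapon]
    return rng.random() < chance

def learn_weapon_table(distances, trials=5000, seed=0):
    """Offline learning: estimate each weapon's hit rate per distance from many
    simulated trials, and keep only the best weapon for each distance."""
    rng = random.Random(seed)  # fixed seed so the baked data is reproducible
    table = {}
    for d in distances:
        rates = {w: sum(simulated_hit(w, d, rng) for _ in range(trials)) / trials
                 for w in WEAPONS}
        table[d] = max(rates, key=rates.get)
    return table

# Build time: learn once, then bake the result into a data file (here, a string).
baked = json.dumps(learn_weapon_table(range(0, 11, 2)))

# Run time: the shipped game only loads and looks up the table; it never relearns.
WEAPON_TABLE = {int(d): w for d, w in json.loads(baked).items()}
print(WEAPON_TABLE)
```

All of the learning cost is paid before shipping; at run time the table is as static as any other baked asset, which is exactly why the problem must not change once the game is running.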

Online

Online learning happens after the game has shipped, whenever the player launches the simulation. In many cases, this form of adaptation is used as part of the design to provide a more immersive world and believable creatures.

This approach requires much more thought, because the developer must foresee many possible situations and outcomes. In many cases, the AI engineers are responsible for guaranteeing that their design is safe, because no amount of beta testing will cover every situation. We'll discuss the issues that arise in more detail in Chapter 48.

Misidentification

Although online learning can be a justifiable design feature, developers sometimes believe they need it when, in fact, they are simply unable to define a good solution offline (for instance, because the problem is too big or too complex). This is a perfectly valid justification for online learning, as long as the AI engineer is aware of the alternative! When searching for a solution, online and offline approaches should be given equal consideration.

For example, most simple NPCs (for instance, cannon fodder) do not need to learn online; they can be trained offline. Their behavior is simple to identify, because the problem is mostly static. It's also very good practice to use offline learning even when online adaptation is required. This provides near-optimal default behaviors, so the AI looks reasonable from the start.
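Here is a minimal sketch of that practice, under hypothetical assumptions: an NPC picks the action with the highest value estimate, the estimates start from numbers assumed to have been learned offline, and a simple exponential moving average (chosen here for brevity, not taken from the text) adapts them online to the current player. The class name, action names, and adaptation rate are all illustrative.

```python
class AdaptiveChoice:
    """Chooses among actions using value estimates that start from offline-trained
    defaults and are nudged online by observed rewards (hypothetical sketch)."""

    def __init__(self, defaults, rate=0.1):
        self.values = dict(defaults)  # copy, so the shipped defaults stay untouched
        self.rate = rate              # hypothetical online adaptation rate

    def choose(self):
        # Greedy choice: the action currently believed to be best.
        return max(self.values, key=self.values.get)

    def observe(self, action, reward):
        # Move the estimate a small step toward the observed reward.
        v = self.values[action]
        self.values[action] = v + self.rate * (reward - v)

# Defaults assumed to come from offline training (hypothetical numbers).
OFFLINE_DEFAULTS = {"charge": 0.7, "take_cover": 0.5, "flank": 0.4}

npc = AdaptiveChoice(OFFLINE_DEFAULTS)
first_choice = npc.choose()
print(first_choice)          # charge: sensible behavior from the very first frame

for _ in range(20):          # this particular player keeps punishing charges
    npc.observe("charge", 0.0)
print(npc.choose())          # take_cover
```

Without the offline defaults, every estimate would start at the same arbitrary value and the NPC would behave randomly until enough rewards had been observed.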

Two Fundamental Techniques

Technically speaking, there are two ways of learning information: batch and incremental.

Batch

Batch learning processes a large data set as a whole to extract relevant facts, relying on vast quantities of information to reveal general trends. This is the principle behind data mining, for which decision trees and neural networks are commonly used. Decision trees are batch learners because they process all the samples to build the tree recursively.
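To make the batch nature of decision trees concrete, here is a compact ID3-style sketch: every split is chosen by computing information gain over all the samples that reach a node, and the tree is built recursively. The weapon-selection data set and attribute names are hypothetical, and a production version would add pruning and handle attribute values never seen during training.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def build_tree(samples, attributes):
    """ID3-style batch learner: each split is chosen by looking at ALL samples."""
    labels = [label for _, label in samples]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label

    def gain(attr):
        # Information gain of splitting the whole sample set on `attr`.
        remainder = 0.0
        for value in {feats[attr] for feats, _ in samples}:
            subset = [label for feats, label in samples if feats[attr] == value]
            remainder += len(subset) / len(samples) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    branches = {}
    for value in {feats[best] for feats, _ in samples}:
        subset = [(f, l) for f, l in samples if f[best] == value]
        branches[value] = build_tree(subset, rest)  # recurse on each partition
    return (best, branches)

def classify(tree, feats):
    # Walk down the (attribute, branches) nodes until a leaf label is reached.
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[feats[attr]]
    return tree

# Hypothetical weapon-selection samples: (situation, best weapon).
DATA = [
    ({"distance": "near", "armored": False}, "shotgun"),
    ({"distance": "near", "armored": True}, "rocket"),
    ({"distance": "far", "armored": False}, "rifle"),
    ({"distance": "far", "armored": True}, "rocket"),
]
tree = build_tree(DATA, ["distance", "armored"])
print(tree[0])  # armored: the split with the highest gain over the full data set
print(classify(tree, {"distance": "far", "armored": False}))  # rifle
```

Because the root split depends on statistics over the whole data set, adding one new sample could in principle change the entire tree, which is why this algorithm is naturally suited to offline use.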

Batch algorithms are generally very efficient and can be easily optimized, because they operate on the entire data set at once. They also provide high-quality solutions, because they consider such a wide variety of examples. For this reason, batch algorithms should always be preferred, or at least attempted first as a proof of concept.

Incremental

The incremental method takes samples one by one and tweaks the internal representation by a small amount each time. Eventually, a good solution emerges from the learning.

Incremental learning uses much less memory than batch algorithms, because the data is discarded immediately after it has been learned. However, the quality of the results can suffer as a consequence, because it's much harder to learn incrementally without "forgetting" what was learned earlier.
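A classic perceptron illustrates the incremental approach: each sample nudges the weights, and nothing is kept afterward. This sketch uses the simple perceptron learning rule rather than back propagation, and the attack/retreat data stream, learning rate, and number of passes are hypothetical.

```python
class Perceptron:
    """Incremental learner: each sample nudges the weights, then is discarded."""

    def __init__(self, n_inputs, rate=0.1):
        self.w = [0.0] * n_inputs
        self.b = 0.0
        self.rate = rate  # hypothetical step size for each small tweak

    def predict(self, x):
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s > 0 else 0

    def learn(self, x, target):
        # Perceptron rule: adjust the weights slightly only when the prediction is wrong.
        error = target - self.predict(x)
        if error:
            self.w = [wi + self.rate * error * xi for wi, xi in zip(self.w, x)]
            self.b += self.rate * error

def sample_stream(passes):
    # Hypothetical stream: (own_health, enemy_health) -> 1 if the NPC should attack.
    data = [((0.9, 0.2), 1), ((0.2, 0.8), 0), ((0.7, 0.4), 1), ((0.3, 0.6), 0)]
    for _ in range(passes):
        yield from data

p = Perceptron(2)
for x, target in sample_stream(100):
    p.learn(x, target)  # the sample is not stored anywhere after this call
print(p.w, p.b)
```

Memory use stays constant no matter how long the stream runs; the flip side is that nothing prevents later samples from overwriting what earlier ones taught, which is the "forgetting" problem mentioned above.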

Interoperability

Generally, incremental algorithms are used online and batch algorithms offline. However, these two design decisions are independent; either form of algorithm could easily be used in both contexts.

It's certainly feasible for an incremental algorithm to learn from a full data set (for instance, a perceptron trained by back propagation to learn target selection), and for a batch algorithm to update a decision tree online (for instance, learning weapon selection). Both combinations are somewhat inefficient computationally, especially the second. However, they have their uses on a case-by-case basis: the first when low memory usage is needed, the second when better-quality behaviors are required. Naturally, there are a few precautions to take in each case, but once again we'll discuss these in Chapter 48.
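One way to run a batch learner online, sketched here under hypothetical parameters, is to buffer incoming samples and periodically rebuild the model from the whole buffer. To keep the example short, the hypothetical `fit_table` function is a simple majority vote per situation standing in for a real decision-tree builder; the rebuild interval and buffer size are arbitrary.

```python
from collections import Counter, defaultdict, deque

def fit_table(samples):
    """Batch learner stand-in: for every situation, pick the label seen most often
    across ALL buffered samples. (A decision-tree builder would slot in here.)"""
    votes = defaultdict(Counter)
    for situation, label in samples:
        votes[situation][label] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

class OnlineBatchLearner:
    """Uses a batch learner online by buffering samples and periodically
    rebuilding the model from scratch (hypothetical parameters)."""

    def __init__(self, rebuild_every=4, max_samples=1000):
        self.buffer = deque(maxlen=max_samples)  # the memory cost of batch learning
        self.rebuild_every = rebuild_every
        self.pending = 0
        self.model = {}

    def observe(self, situation, label):
        self.buffer.append((situation, label))
        self.pending += 1
        if self.pending >= self.rebuild_every:
            self.model = fit_table(self.buffer)  # expensive: reprocesses every sample
            self.pending = 0

learner = OnlineBatchLearner(rebuild_every=4)
stream = [("near", "shotgun"), ("near", "shotgun"), ("far", "rifle"), ("near", "rocket")]
for situation, label in stream:
    learner.observe(situation, label)
print(learner.model)  # {'near': 'shotgun', 'far': 'rifle'}
```

The cost is visible in the code: memory grows with the buffer, and every rebuild reprocesses all stored samples, which is why this combination is the more expensive of the two.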