Training Procedure

The training procedure uses weight optimization to produce the desired neural network. Effectively, its aim is to satisfy an objective function. The objective function determines the quality of a network based on a high-level metric: measure the performance over numerous examples, or simply compare the total weight adjustments to a threshold. The objective function primarily determines when the training process is complete.

Data Sets

The training of a perceptron requires example data, namely an existing collection of input data with the desired output. Each of these input/output pairs is known as a sample, a single training case. Together, these samples form the data set.

Typically, not all of the data set is used for training; it is split into two or three subsets. Only one of these is used for training. The other two can be used for validation (checking the results of the training in order to improve it) and testing (the final performance analysis of the network). Because of the simplicity of single-layer perceptrons, this is not always necessary, but the method comes in handy for more complex problems.
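The split described above can be sketched in Python as follows; the function name and the exact fractions are illustrative assumptions, not fixed rules:

```python
import random

def split_data_set(samples, train_frac=0.7, validate_frac=0.15):
    """Shuffle the samples and split them into training,
    validation, and test subsets. The 70/15/15 split is a
    common convention, not a requirement."""
    samples = list(samples)
    random.shuffle(samples)
    n_train = int(len(samples) * train_frac)
    n_validate = int(len(samples) * validate_frac)
    train = samples[:n_train]
    validate = samples[n_train:n_train + n_validate]
    test = samples[n_train + n_validate:]
    return train, validate, test
```

Shuffling before splitting matters when the samples are ordered (for instance, all wins logged before all losses), because each subset should be representative of the whole.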

Further Information

The management of data sets is an extremely common problem in pattern recognition. Perceptrons can also be used for classification and feature recognition, but they do not require any measures to split the data set (because the training always converges on linearly separable problems). Chapter 26, "Classification and Regression Trees," covers the handling of data sets in more detail.


Training Algorithms

Training a perceptron adjusts the weights using the optimization techniques presented. The key difference between training algorithms is how the samples are processed, and there are two different approaches:

  • Each case can be treated individually, in an incremental fashion. The weights of the network are updated every time a sample is processed.

  • All the samples can be treated as a batch. The weights are updated only after the entire set has been processed. One full pass through the data set is known as an epoch (a unit commonly used to measure the performance of learning algorithms).

Regardless of the approach used, the aim of the training process is to adjust the weights into a near optimal configuration, which will allow the network to perform well when evaluated.

Perceptron Training

The perceptron training algorithm is an incremental approach, but makes use of gradient information for better convergence (see Listing 17.2). This is done using the steepest descent technique, which computes the necessary adjustment Δwi for each weight wi:

Δwi = η (t − y) xi


This equation expresses the necessary change to a weight in terms of the learning rate η, the difference between the desired target t and the actual output y, and the current value of the input xi. The learning rate η is a small constant, usually chosen by the AI engineer, as discussed for the gradient methods. Formally, this is gradient descent on the error surface.
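As a concrete check of the update rule, a single weight adjustment might be computed as follows (the function name and the numeric values are illustrative):

```python
def weight_delta(learning_rate, target, output, x_i):
    """Steepest-descent adjustment for one weight:
    delta_w_i = eta * (t - y) * x_i."""
    return learning_rate * (target - output) * x_i

# With eta = 0.1, a desired output of 1, an actual output of 0,
# and an input of 1, the weight is nudged up by 0.1; when the
# output already matches the target, the adjustment is zero.
```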

Listing 17.2 The Perceptron Training Algorithm
 initialize weights randomly
 while the objective function is unsatisfied
      for each sample
           simulate the perceptron
           if result is invalid
                delta = desired output - actual output
                for all inputs i
                     weights[i] += learning_rate * delta * inputs[i]
                end for
           end if
      end for
 end while

Testing the validity of the result is usually based on Boolean logic, with the inputs and outputs also set to 0 or 1. The interesting point to notice is that only the misclassified patterns are used to update the weights of the network.
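Listing 17.2 might be fleshed out in Python as follows; this is a sketch only, and the step activation, the AND-function data set, the learning rate, and the epoch limit are all illustrative assumptions:

```python
def step(x):
    """Binary threshold activation: 1 if the net sum is positive."""
    return 1 if x > 0 else 0

def train_perceptron(samples, learning_rate=0.1, max_epochs=100):
    """Incremental perceptron training: weights are updated only
    for misclassified samples, as in Listing 17.2."""
    n = len(samples[0][0])
    weights = [0.0] * (n + 1)          # last weight acts as the bias
    for _ in range(max_epochs):
        errors = 0
        for inputs, target in samples:
            x = list(inputs) + [1.0]   # append constant bias input
            y = step(sum(w * xi for w, xi in zip(weights, x)))
            if y != target:            # only misclassified patterns
                errors += 1
                for i in range(len(weights)):
                    weights[i] += learning_rate * (target - y) * x[i]
        if errors == 0:                # objective function satisfied
            return weights
    return weights

# Learning the binary AND function, a linearly separable problem:
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_perceptron(and_samples)
```

Because AND is linearly separable, the loop terminates once every sample is classified correctly.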

Delta Rule

The delta rule is the equation expressing the gradient of the error with respect to each weight, but it has also given its name to a training algorithm (see Listing 17.3). (It is also the basis of the preceding solution.) The batch approach processes all the training samples before updating the weights.

Listing 17.3 The Delta Rule Applied as a Batch Learning Algorithm
 while termination condition is not verified
      reset steps array to 0
      for each training sample
           compute the output of the perceptron
           delta = desired output - actual output
           for each weight i
                steps[i] += delta * inputs[i]
           end for
      end for
      for each weight i
           weights[i] += learning_rate * steps[i]
      end for
 end while

Mathematically, this corresponds to gradient descent on the quadratic error surface. In practice, it means the error is minimized globally over the entire data set, and provably so: the quadratic error surface has a single minimum, which is always reached given a suitable learning rate, so no validation is needed.
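Listing 17.3 might be sketched in Python as follows. Note the delta rule operates on a continuous (linear) output rather than a thresholded one; the sample data fitting y = 2x + 1, the learning rate, and the epoch count are illustrative assumptions:

```python
def train_delta_batch(samples, learning_rate=0.05, epochs=200):
    """Batch delta rule: accumulate per-weight steps over the
    whole data set, then apply them once per epoch, as in
    Listing 17.3. Uses a linear output unit."""
    n = len(samples[0][0])
    weights = [0.0] * (n + 1)              # last weight acts as the bias
    for _ in range(epochs):
        steps = [0.0] * len(weights)       # reset steps array to 0
        for inputs, target in samples:
            x = list(inputs) + [1.0]       # append constant bias input
            y = sum(w * xi for w, xi in zip(weights, x))
            delta = target - y             # continuous error signal
            for i in range(len(weights)):
                steps[i] += delta * x[i]
        for i in range(len(weights)):
            weights[i] += learning_rate * steps[i]
    return weights

# Fitting the linear relation y = 2x + 1 from three samples:
line_samples = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
w = train_delta_batch(line_samples)
```

Because the quadratic error surface for a linear unit has a single minimum, the weights converge toward the slope 2 and intercept 1 for any sufficiently small learning rate.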

Synopsis

Perceptrons are an incredibly simple model providing a solution to linear problems. As such, there are very straightforward and efficient ways to train them. The main decision is between the perceptron training algorithm and the batched delta rule.

Both methods are proven to find solutions if they exist, given a small enough learning rate η. Perceptron training will just make sure that all the outputs are correct in binary terms. On the other hand, the delta rule will minimize the error over all the training samples in continuous space. This guarantees that there is one single global minimum, and that the learning will converge (given a suitable η). This has many advantages, including the ability to deal with noise and to provide a good linear approximation of nonlinear functions.

As such, the delta rule used in batch mode should be chosen whenever possible. The main requirement is to have the full data set available for training (for instance, a log of wins and losses from the game). If this is not the case, and the perceptron needs to learn from a stream of incoming data samples, the only option is an incremental one (for instance, learning tactics from experience during the game). Once again, a simple application of the delta rule suffices; discarding samples that are already classified correctly, as in perceptron training, can be useful in this case to prevent favoring recently learned samples.
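The incremental variant for streaming data can be sketched as a single update function applied to each arriving sample (the function name and signature are assumptions for illustration):

```python
def delta_update(weights, inputs, target, learning_rate=0.05):
    """One incremental delta-rule step for a streaming sample.
    Every sample adjusts the weights, whether or not it is
    already handled correctly. The weights list is expected
    to hold one entry per input plus a trailing bias weight."""
    x = list(inputs) + [1.0]               # append constant bias input
    y = sum(w * xi for w, xi in zip(weights, x))
    delta = target - y                     # continuous error signal
    return [w + learning_rate * delta * xi
            for w, xi in zip(weights, x)]
```

In a game, this would be called once per observed sample as play unfolds, with no need to store the history of past samples.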



AI Game Development: Synthetic Creatures with Learning and Reactive Behaviors
ISBN: 1592730043
Year: 2003
Pages: 399
