Advanced Issues | AI Game Development: Synthetic Creatures with Learning and Reactive Behaviors

When the policies, operators, and other parameters of the genetic algorithm combine together, unexpected phenomena can arise when solving problems.

Premature Convergence

Really successful individuals tend to see their genes spread throughout the population. If a particular allele is much better than all the others, it will start taking over. Eventually, there may be only one allele left for that gene. This is known as genetic drift.

Isn't keeping the fittest alleles of all genes desirable? If the solution is optimal, such convergence is desirable. However, like other forms of optimization, convergence can happen too soon; the solution may not actually be the best. In this case, genetic drift is a problem because diversity in the gene pool is lost. Once particular alleles have taken over, it is difficult to get evolution back on track.

Genetic drift is especially problematic for small populations. In this case, it will take few generations for the genetic code of successful individuals to take over. With larger populations, the risk of premature convergence is reduced.

An easy solution to genetic drift involves using a much higher mutation rate, especially toward the end when convergence is detected. This will change the values of genes with a higher probability, and attempt to reintroduce some new allele. This is an appealing alternative to premature convergence, but the high levels of randomness don't help the genetic algorithm.

A better way lies in preventing genetic drift rather than curing it. The trick is to impose constraints on the population, allowing them to breed with only a subset of the population:

Multiple populations can be introduced, which can be relatively small. They exchange individuals among each other to allow the spreading of genes, while still preventing premature convergence.
Single populations with spatial structure are a popular alternative. Here, individuals live on a 2D grid, and are only allowed to breed with others nearby. Offspring are reinserted locally, too. The spreading of genetic code is constrained by space, which introduces fitness islands: places where the genetic code is similar.

These techniques have proven very powerful in preventing premature convergence. Multiple populations are likely to use more memory, but this is an improvement over the huge populations required to cure genetic drift. Using spatial organization in populations is an easy alternative.

Slow Convergence

It can also be difficult to get the algorithm to converge at all. This is an issue when the fitness function gives few hints about the solution. (That is, all fitness values are similar.) Noise in the fitness can also make it difficult to converge to a result.

In this case, redesigning the fitness function would be strongly recommended (see Chapter 35). As an alternative, however, we can turn up the elitism to really encourage fitter individuals to spread their genes. This runs the risk of premature convergence, but it's a trade-off that often has to be made!

An adaptive elitist policy can improve the situation slightly. Starting off with high elitism will encourage the results to improve at the start. Then, decreasing elitism as the fitness increases often manages to prevent premature convergence. Together with this, increasingly random mutation can be used to explore the search space better.

Parameters, Parameters

Genetic algorithms suffer from having lots of parameters. These are mutation rate, crossover type and parameters, population size, and fitness scaling policy just to name a few. There are good values to start experimenting with, but often the ideal values will depend on the problem itself.

There are alternatives to manual experimentation, but these generally cost computation time:

It's possible to use a hierarchy of genetic algorithms. One meta-genetic algorithm sits on top of the ordinary genetic algorithm and optimizes the parameters mentioned, which is particularly slow.
There is also a so-called parameterless genetic algorithm. It actually does have parameters, but they are chosen so that no human input is needed. This uses a combination of suitable parameter choices for the crossover rates and mutation rates, but then uses brute force by simulating many different population sizes [Harik99].

In practice, the default parameters work well. Although it wouldn't take long to implement either of these strategies, it seems like overkill for the kind of problems that are likely to be encountered. It's far more constructive to spend time on the design instead.

Domain Knowledge

Genetic algorithms can be relatively naive; they do not exploit knowledge of the problem to help solve it quicker. It's possible for the AI engineer to give hints to the genetic algorithm by designing specific policies and genetic operators.

For example, during the initialization there can be a bias for certain genotypes. If an expert can tell which ones are likely to do well, the algorithm can generate more similar solutions randomly.

As for the genetic operators, custom crossover and mutation routines can be designed. These will take into account the good places to split the genetic code, use gene-dependant mutation rates, or even prevent fruitless manipulation of genes before they happen.

Finally, the representation itself can help the genetic algorithm. It may be possible to choose a representation that minimizes the size of the search space, or even its complexity. Interdependent genes can be placed close to each other so that they will be kept together when transmitted to the offspring.