Modeling Movement

Different types of movement can be used in deathmatch situations. A Q-learning component is responsible for mapping the situation to the right type of movement, with the reward signal based on the moods and success in the game.

Actions

The actions correspond to the different types of movement. These include stand, explore, and gather when no enemy is present, and pursue and evade during combat. Each movement type may be executed at a different speed, so walking and running variations could be made available. This increases the size of the action space, however, so speeds will only be included if there are practical benefits.
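As a rough illustration only (not taken from the book's source code), the action set could be represented as a simple enumeration. The names follow the text; the walk/run variants are noted in a comment because their inclusion is conditional on practical benefits.

    from enum import Enum, auto

    class Movement(Enum):
        """Hypothetical movement actions; names follow the text, not the book's code."""
        STAND = auto()     # no enemy: hold position
        EXPLORE = auto()   # no enemy: roam the level
        GATHER = auto()    # no enemy: collect items
        PURSUE = auto()    # combat: close in on the enemy
        EVADE = auto()     # combat: break away from the enemy

    # Walk/run speed variants would double the action space; they are only
    # worth adding if playtests show a practical benefit, as discussed above.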

States

The definition of states consists mostly of high-level features representing the state of play. For combat, features include the predicted outcome, the current trend, and the physical capabilities for continuing the fight. These are designed by experts to take into account multiple details of the current situation. The state also includes information about the environment when a fight is not active (for instance, nearby sounds).
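To make this concrete, the sketch below shows one possible way to bundle such expert-designed features into a discrete key for a tabular learner. The feature names, value ranges, and discretization are assumptions for illustration, not the book's design.

    from dataclasses import dataclass

    @dataclass
    class MovementState:
        """Illustrative high-level features; names and ranges are assumptions."""
        predicted_outcome: int   # e.g. -1 losing, 0 even, +1 winning
        trend: int               # is the fight going better or worse than before?
        fitness: int             # coarse bucket of capability to keep fighting
        nearby_sound: bool       # environment cue used when no fight is active

        def key(self) -> tuple:
            """Discretize the features into a hashable key for a Q-table."""
            return (self.predicted_outcome, self.trend, self.fitness, self.nearby_sound)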

Reward Signal

The evaluative feedback is based on the outcome of fights (for instance, death, survival, or a kill). Various other events are also considered, depending on the mood (for instance, damage inflicted, fight time, and terrain explored). The reward signal is accumulated over time, but is discounted at regular intervals. A small discount (around 10 percent) is applied at each interval to emphasize recent experiences.
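A minimal sketch of this accumulate-and-discount scheme follows. The interval length is an assumption; the 10 percent discount is the figure mentioned above.

    class RewardAccumulator:
        """Accumulates event rewards and fades them at fixed intervals so that
        recent experiences dominate. Interval length is an assumed value."""

        def __init__(self, discount: float = 0.10, interval: float = 1.0):
            self.discount = discount      # roughly 10 percent per interval
            self.interval = interval      # seconds between discount steps (assumed)
            self.total = 0.0
            self._elapsed = 0.0

        def add(self, reward: float) -> None:
            # e.g. damage inflicted, a kill, or newly explored terrain
            self.total += reward

        def update(self, dt: float) -> None:
            self._elapsed += dt
            while self._elapsed >= self.interval:
                self._elapsed -= self.interval
                self.total *= (1.0 - self.discount)   # fade older rewards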

Learning Policy

Unlike the gathering behaviors, the learning for movement must cope with discounted rewards. To achieve this, the reward is propagated from one state back to another as time goes by. (This is known as a backup of depth 1, which is common in Q-learning.) Because Q-learning uses existing estimates of return values to compute the value of the current state, it relies on bootstrapping.

The reward signal is discounted and accumulated over time until the state changes. Then, the value of the previous state is updated based on the current state value.
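The following sketch illustrates this depth-1 backup with a tabular Q-function indexed by (state, action) pairs; the learning rate and discount factor are placeholder values, not the book's.

    from collections import defaultdict

    class QTableLearner:
        """Minimal sketch of the depth-1 backup described above; the learning
        rate, discount factor, and table layout are assumptions."""

        def __init__(self, alpha: float = 0.3, gamma: float = 0.9):
            self.alpha = alpha            # learning rate (assumed)
            self.gamma = gamma            # discount factor (assumed)
            self.q = defaultdict(float)   # (state, action) -> estimated return

        def backup(self, prev_state, prev_action, reward, new_state, actions) -> None:
            """When the state changes, update the value of the previous
            state-action pair, bootstrapping from the new state's best estimate."""
            best_next = max(self.q[(new_state, a)] for a in actions)
            old = self.q[(prev_state, prev_action)]
            self.q[(prev_state, prev_action)] = old + self.alpha * (
                reward + self.gamma * best_next - old
            )

Here the accumulated, discounted reward gathered since the last state change would be passed in as reward, matching the update described above.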


