Learning Shooting Styles | AI Game Development: Synthetic Creatures with Learning and Reactive Behaviors

Weapon selection and, to a lesser extent, the shooting behaviors can be adjusted to satisfy different requirements (for instance, based on effectiveness or the moods of the animats). This is achieved with an RL approach based on statistics and learning episodes (a.k.a. Monte Carlo; see Chapter 46, "Reinforcement Learning").

Actions

The actions available to the learning algorithm correspond to styles of shooting and weapon selection. Common requirements include high damage per second, better hit probabilities, prolonged fights, and highly lethal first shots. The RL algorithm learns the most suitable style for each situation, and the chosen style is then passed to the respective capabilities.
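As a rough illustration, the action set can be written as an enumeration whose members mirror the requirements listed above. This is only a sketch: the names ShootingStyle, Capability, set_style, and apply_style are hypothetical and do not come from the book's code.

```python
from enum import Enum, auto


class ShootingStyle(Enum):
    """Hypothetical action set; members mirror the requirements in the text."""
    HIGH_DAMAGE_PER_SECOND = auto()
    HIGH_HIT_PROBABILITY = auto()
    PROLONGED_FIGHT = auto()
    LETHAL_FIRST_SHOT = auto()


class Capability:
    """Stand-in for the weapon selection or shooting capability (assumed API)."""

    def __init__(self, name):
        self.name = name
        self.style = None

    def set_style(self, style):
        self.style = style


def apply_style(style, capabilities):
    """Pass the style chosen by the learner to every capability that uses it."""
    for capability in capabilities:
        capability.set_style(style)


capabilities = [Capability("weapon_selection"), Capability("shooting")]
apply_style(ShootingStyle.LETHAL_FIRST_SHOT, capabilities)
```

Keeping the actions as a small discrete set means the learner only has to estimate one return per style and state, rather than tuning the capabilities' parameters directly.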

States

The choice of states reflects the reward signal, because we want the states to have accurate value estimates. The moods are therefore the primary parameters of the state, because the reward, and hence the policy, changes from mood to mood. However, some aspects of the reward may not be subject to moods (for instance, death), so other strategic features of the situation are also included in the model of the state, as they were for the movement component. This allows the RL to find the correct styles based on both moods and other general trends.
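One possible encoding of such a state is a small hashable record with the mood first and a few discretized strategic features after it. The text does not say which features are used, so health_low, enemy_close, and both thresholds below are purely illustrative assumptions.

```python
from typing import NamedTuple


class FightState(NamedTuple):
    mood: str          # primary parameter; the label strings are assumed
    health_low: bool   # assumed strategic feature
    enemy_close: bool  # assumed strategic feature


def make_state(mood, health, enemy_distance,
               low_health=30.0, close_range=256.0):
    """Discretize raw inputs so that each state is a small hashable key."""
    return FightState(mood, health < low_health, enemy_distance < close_range)


state = make_state("aggressive", health=25.0, enemy_distance=512.0)
```

Because the record is hashable, it can index a table of value estimates directly, and two situations that differ only in mood map to different entries.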

Reward Signal

The reward signal is only active during fights, specifically when the animat is firing projectiles. This allows the animat to distinguish between a poor reward from inactivity and a poor reward from unsatisfactory performance, which helps the learning. The reward signal is usually determined by the moods, but some aspects of the reward are independent of emotions (such as the basic desire for survival).
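A minimal sketch of such a reward function follows, assuming that inactivity is reported as None rather than as a low number, and that each mood simply weights damage dealt against damage taken. The mood names, the weights, and the death penalty are made-up values for illustration, not figures from the book.

```python
def reward(firing, mood, damage_dealt, damage_taken, died):
    """Return None while inactive so that idle steps are never scored."""
    if not firing:
        return None
    # Mood-dependent weights: (damage dealt, damage taken). Illustrative only.
    weights = {"aggressive": (1.0, 0.2), "cautious": (0.4, 1.0)}
    w_dealt, w_taken = weights.get(mood, (0.7, 0.7))
    value = w_dealt * damage_dealt - w_taken * damage_taken
    # Mood-independent term: the basic desire for survival.
    if died:
        value -= 100.0
    return value
```

Returning None instead of zero keeps "nothing happened" separate from "the fight went badly", which is the distinction the paragraph above relies on.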

Learning Policy

Shooting styles are selected at regular intervals when a fight is active. The learning is episodic, so the estimates are only adjusted when the fight is over (and the fight usually finishes with extreme reward values, either high or low). All the states and rewards gathered during the fight are processed, and the estimated return for each action is updated accordingly.
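The episodic update can be sketched as every-visit Monte Carlo with an incremental sample average. The class below is an illustration under stated assumptions, not the book's implementation: the epsilon-greedy selection rule, the discount factor gamma, and all names (MonteCarloStyleLearner, select, record, end_fight) are hypothetical.

```python
import random
from collections import defaultdict


class MonteCarloStyleLearner:
    def __init__(self, actions, epsilon=0.1, gamma=1.0, seed=None):
        self.actions = list(actions)
        self.epsilon = epsilon          # exploration rate (assumed rule)
        self.gamma = gamma              # discount factor (assumed)
        self.q = defaultdict(float)     # (state, action) -> estimated return
        self.visits = defaultdict(int)  # (state, action) -> sample count
        self.episode = []               # (state, action, reward) for one fight
        self.rng = random.Random(seed)

    def select(self, state):
        """Epsilon-greedy choice, called at regular intervals during a fight."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def record(self, state, action, reward):
        """Store one step; no estimate changes until the fight is over."""
        self.episode.append((state, action, reward))

    def end_fight(self):
        """Walk the fight backward, accumulating the return for each step."""
        g = 0.0
        for state, action, reward in reversed(self.episode):
            g = reward + self.gamma * g
            key = (state, action)
            self.visits[key] += 1
            # Incremental average of all returns observed for this pair.
            self.q[key] += (g - self.q[key]) / self.visits[key]
        self.episode.clear()


learner = MonteCarloStyleLearner(["dps", "accuracy"], epsilon=0.0)
learner.record("angry", "dps", 1.0)
learner.record("angry", "accuracy", 0.0)
learner.record("angry", "dps", 10.0)  # fight ends on an extreme reward
learner.end_fight()
```

In this usage example the final, extreme reward propagates back to every earlier decision in the fight, so a style chosen early is credited with the eventual outcome even though its immediate reward was small.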