Test Automation in Action: The Sims Online

In their 2003 presentation at the Game Developers Conference in San Jose, Greg Kearney, Larry Mellon, and Darrin West detailed their implementation of game test automation in the development of The Sims Online (http://serious-code.net/moin.cgi/AutomatedTestingInMmpGames). The problem they identified was that developing and deploying massively multiplayer Persistent State Worlds (PSWs) had proven very difficult due to the distributed nature and large scale of such systems.

Their goal was to significantly reduce costs, appreciably stabilize the game, and reduce errors in The Sims Online (TSO) by automating their testing program. They wanted the tools they created to increase the efficiency of the initial development of the game, provide early load testing, and carry over to central company operations as long-term aids in the maintenance and testing of extensions of TSO and future games.

The TSO test team focused its automation efforts on game aspects involving highly repetitive actions and scenarios that would require large numbers of connected players. For this reason, the team identified regression and load testing as the ideal candidates for automation, because a roadblock in the client connection process could make the entire game grind to a halt. Other less mission-critical elements could be tested by hand and would not have as great an effect on the gameplay. Additionally, the team wanted to add further automated code to assist in game development, game tuning, and marketing.

SimScript

The TSO team used a subset of the main game client to create a test client in which the GUI is mimicked via a script-driven control system. To the main game server, these test clients look identical to actual connected game player clients. Test scripts produce in-game actions in place of human players. This is enabled by a Presentation Layer inserted between the TSO client's GUI and the supporting client-side portions of the game code. Their scripting system, which they dubbed SimScript, is attached to this Presentation Layer. Simulated user play sessions are generated by a series of scripted actions such as "Create Avatar," "Use Object," or "Buy House."
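
The presentation describes this layer only at a high level. As a rough sketch of the idea (in Python, with every class and function name invented for illustration, not taken from the TSO code base), the GUI and the script engine can be seen as two producers of the same command stream, so nothing below the layer, including the server, can tell a test client from a real one:

# A minimal sketch of a presentation layer: the real GUI and the
# scripted test driver both emit the same abstract commands, so the
# client-side game code beneath the layer sees identical traffic.
# All names here are illustrative, not from the actual TSO code base.
from dataclasses import dataclass


@dataclass
class Command:
    """One user-level action, e.g. Command("use_object", ("chair", "sit"))."""
    verb: str
    args: tuple


class PresentationLayer:
    def __init__(self, client_core):
        self.client_core = client_core  # client-side game code

    def submit(self, command: Command) -> None:
        # The GUI calls submit() from its input handlers; the script
        # engine calls it from parsed script lines. The client core
        # cannot tell the two apart.
        self.client_core.execute(command.verb, *command.args)


class ScriptDriver:
    """Feeds scripted actions through the same layer the GUI uses."""
    def __init__(self, layer: PresentationLayer):
        self.layer = layer

    def run(self, script: list[Command]) -> None:
        for command in script:
            self.layer.submit(command)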

SimScript is an extremely simple scripting system. As it is intended to mimic or record a series of user actions, it supports no conditional statements, loop constructs, or arithmetic operations. Stored procedures and const-style parameters are used to support reusable scripts for common functionality across multiple tests.
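
The talk does not show the syntax of these stored procedures. One way to picture the mechanism, sketched here in Python with invented names, is a named action sequence into which constant parameters are substituted when a test invokes it:

# Illustrative sketch of "stored procedures" with const-style
# parameters: a named action sequence that many test scripts can
# reuse with different constant arguments. Not actual SimScript.
PROCEDURES = {
    "enter_house": [
        ("wait_until", "game_state", "selectasim"),
        ("pick_avatar", "$AVATAR"),
        ("wait_until", "game_state", "inlot"),
    ],
}


def expand(procedure_name: str, consts: dict) -> list:
    """Substitute constant parameters into a stored procedure."""
    steps = []
    for step in PROCEDURES[procedure_name]:
        steps.append(tuple(consts.get(part, part) for part in step))
    return steps


# Usage: every regression script can share the same entry sequence.
# expand("enter_house", {"$AVATAR": "$alpha_chimp"})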

Two basic flow control statements exist in SimScript: WaitFor and WaitUntil. These provide the ability to simulate the time gaps between mimicked user actions ("wait_for 5 seconds"), to block until a process or piece of game state reaches a particular condition ("wait_until reading_skill: 100", "wait_until client_state: in_a_house"), and to synchronize actions across the distributed system ("wait_until avatar_two: arrives"). A WaitUntil command simply blocks script execution until the condition is met or a timeout value is exceeded.
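
The implementation of WaitUntil is not shown in the presentation, but its block-until-condition-or-timeout semantics can be sketched in a few lines of Python (names assumed, not from the TSO code):

import time


def wait_until(condition, timeout_seconds: float,
               poll_interval: float = 0.25) -> bool:
    """Block until condition() is true or the timeout expires.

    Returns True if the condition was met, False on timeout,
    mirroring the SimScript behavior described above.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(poll_interval)
    return False


# Example: the equivalent of "wait_until avatar reading_skill: 100",
# assuming a client object exposing the avatar's current stats.
# ok = wait_until(lambda: client.avatar.reading_skill >= 100,
#                 timeout_seconds=60)

In this framing, "wait_for 5 seconds" is just the degenerate case: a fixed sleep with no condition to poll.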

Here's an example of SimScript code:

# this script brings an avatar directly into a testable condition
# inside the game; a quick skill test is then performed
wait_until      game_state     selectasim
pick_avatar     $alpha_chimp
wait_until      game_state     inlot
chat            Hi. I'm in and running.
log_message     Testing object placement
log_objects
place_object    chair          10 10
log_objects
# invoke a command on a different client
remote_command  $monkey_bo     use_object chair sit
# and do some skill increase for self
set_data        avatar         reading_skill 0
use_object      bookshelf      read
wait_until      avatar         reading_skill 100

Load Testing

Load testing the TSO system was a significant challenge because the team wanted to simulate realistic loads accurately, on a continual basis, to stabilize the system prior to going live. To this end, they created a series of test clients. Each client acts independently and is controlled either by a scripted series of individual user interface actions or by an event generation algorithm. The system was set up to collect data from all clients and from the main server cluster and to start up and shut down the system automatically, while also fully automating metrics collection. To control the load testing they used LoadRunner, a commercially available load generation system. Bridge code hooks LoadRunner into the TSO test client and enables LoadRunner to drive thousands of simulated users against a given candidate server cluster. Configuring the client to run without a GUI (NullView) significantly reduces the memory footprint and allows many more test clients per load generation box.
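
Neither the bridge code nor LoadRunner's API appears in the presentation, so the following is only a generic sketch of the pattern described: many GUI-less clients running scripted sessions in parallel while results are collected centrally. TestClient and its methods are stand-ins invented for this sketch:

# A rough sketch of the load-generation pattern described above:
# many GUI-less (NullView) test clients run scripted sessions in
# parallel against a candidate server cluster, with metrics gathered
# centrally. TestClient is a stand-in, not the real TSO client.
import threading


class TestClient:
    """Stand-in for the scripted, GUI-less TSO test client."""

    def __init__(self, server: str, gui: bool = False):
        self.server = server
        self.gui = gui  # False = NullView: far smaller memory footprint

    def run_script(self, script: list) -> dict:
        # The real client would feed each action through the
        # presentation layer; here we return placeholder metrics.
        return {"actions": len(script), "errors": 0}


def run_simulated_user(client_id, server, script, results, lock):
    client = TestClient(server, gui=False)
    metrics = client.run_script(script)
    with lock:
        results.append((client_id, metrics))  # central collection


def generate_load(server: str, script: list, num_users: int) -> list:
    results, lock = [], threading.Lock()
    threads = [threading.Thread(target=run_simulated_user,
                                args=(i, server, script, results, lock))
               for i in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results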

Regular Regression

Because of the nature of distributed systems, new code will regularly break old code. For this reason, the TSO automated test system had to be able to validate every game feature to ensure the stability of the released game. Its regression engine was linked closely to the outcome of normal QA procedures, ensuring that whenever a new defect was observed, the regression test client would focus on suspected trouble spots and generate useful data to completely eradicate the bug. Thus, critical path roadblocks were quickly identified and removed. Interestingly, the team noted that the system generated a large number of false positives and false negatives, so human review of test result data remained essential, both to avoid wasting time chasing phantom defects and to catch real ones the system might otherwise overlook.
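
Given those false positives and negatives, one plausible shape for the review step, sketched in Python with invented names (the presentation shows no such code), is to route every failure, plus a sample of passes, into a human review queue rather than trusting raw verdicts:

# Sketch of a regression loop with a human-review safety net. Every
# failure is queued for review; a small sample of passes is also
# spot-checked, since the automated verdicts were not fully trusted.
import random


def run_regression(tests: dict, review_queue: list,
                   pass_sample_rate: float = 0.05) -> None:
    for name, test in tests.items():
        passed, log = test()  # each test returns (bool, log_text)
        if not passed:
            review_queue.append((name, "FAIL", log))  # always reviewed
        elif random.random() < pass_sample_rate:
            review_queue.append((name, "PASS", log))  # spot-check passes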

New Use for Old Code

Surprisingly, the team reported that relatively little new C++ code was required to support their testing system. This was largely achieved by utilizing the existing game code, the game client code in particular, and reformatting it as test code. Similarly, reusing the actual game GUI and reconfiguring it for test purposes kept new coding to a minimum. Indeed, the basis for the test code was found in existing code already in the game that had been put there to enable normal testers and programmers to "cheat."

All for One and One for All

Many of the TSO test scripts are relatively short. Scripting all required test conditions by hand was not realistic, so an extensible series of algorithms was used to generate events wherever possible. Random and deterministic strategies were used to generate scripts on the fly, replacing the otherwise lengthy hand coding that would have been required. For instance, the TSO team introduced an algorithm they called TestAll that would walk through the list of all objects currently in a location, build a list of all possible interactions, and then execute every possible action using a deterministic strategy. They were then able to generalize the results by having the system place the objects in various locations throughout the game world and retest them in a host of new terrain or location configurations. Using this approach, the team had 230 game objects under regression and about 40% of the game's supporting infrastructure under automated regression within a month of starting.
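
The presentation names TestAll but does not show its code; a sketch of the strategy as described, using an object model invented for illustration, might look like this:

# A sketch of the TestAll strategy: walk every object at a location,
# build the full list of possible interactions, then run them in a
# fixed, deterministic order. The object model (location, obj, and
# their methods) is invented; the real interaction data lived in the
# game itself.
def test_all(location, log):
    interactions = []
    for obj in location.objects():              # all objects at this lot
        for action in obj.available_actions():  # e.g. "sit", "read"
            interactions.append((obj, action))
    # Sort for a deterministic, repeatable execution order.
    interactions.sort(key=lambda pair: (pair[0].name, pair[1]))
    for obj, action in interactions:
        try:
            obj.perform(action)
            log.append((location.name, obj.name, action, "ok"))
        except Exception as err:                # record failure, keep going
            log.append((location.name, obj.name, action, f"error: {err}"))

Generalizing, as described above, then amounts to placing the same objects in other locations and calling test_all again for each new terrain configuration.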

Lessons Learned

The TSO team learned a number of important lessons from automating the testing of their game. See which ones you can apply to your own projects:

  • Use automated regression tests to ensure that once a feature works, it is never allowed to stop working.

  • Get solid support from management to prevent even minor deviations from the test procedures; such deviations could defeat much of what you are trying to accomplish with your automation.

  • Identify the critical path elements and protect them. Basic infrastructure tests need to precede any specific scripted feature tests.

  • Build testability into the core design of a game. The team soon discovered that many elements of TSO were not well suited to automated testing, but they could have been if testing had been a concern from the genesis of the design process.

  • Develop or acquire new tools to help organize and track the information the original test tools generate. Creating and maintaining hundreds of automated test cases can quickly become unmanageable.

  • Automate the noise reduction aspect of the system, filtering out data that the system doesn't need to process; otherwise, information overload can quickly become a problem (see the sketch after this list).

  • Run a basic low-level "sniff test" of the code before it gets into the main release of the game. This is immensely useful in avoiding substantial roadblocks in the development process.

  • Automate full-scale load testing, running constant tests of thousands of active clients. These tests significantly reduce problems that you would otherwise face at launch.

  • Abstract test commands to maximize their reuse. The SimScript language can be used to automate single-player or multiplayer modes and titles, and is independent of the platform on which the game is being tested.
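
As promised in the noise reduction bullet above, here is a small sketch of that idea in Python; the filter patterns are invented for illustration, not taken from the TSO tool chain:

# Filter raw client/server log lines down to the events the test
# system actually needs, before anyone (or anything) reads them.
import re

NOISE_PATTERNS = [
    re.compile(r"^HEARTBEAT"),          # routine keepalives
    re.compile(r"^DEBUG"),              # developer chatter
    re.compile(r"frame time: \d+ms$"),  # per-frame timing spam
]


def filter_noise(lines):
    """Yield only the log lines worth keeping."""
    for line in lines:
        if not any(p.search(line) for p in NOISE_PATTERNS):
            yield line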

The TSO team succeeded by not trying to automate everything: they focused automation where it was of the most use and, where possible, reused game code and existing game clients, minimizing the need to create new code that would itself need to be debugged.



