Test Plans | Game Coding Complete

The test plan is an exhaustive restatement of the design document. It goes without saying that if you don't have a design document you simply can't create a reasonable test plan. I can already hear people out there tapping away nasty emails to me saying, "Design documents? We don't need no steenkin design documents!" They probably never had to write one, and they are writing fantastic games from the seat of their pants and are quite happy about it. More power to you. I wish I were that good, or perhaps I should say I wish I were that lucky. Perhaps those same people sending me smug emails are also the people who've worked every weekend for the last year on a project that is "still in Beta, but tracking well toward the ship date." I can truthfully say that every project I've worked on that had a detailed design document and an associated test plan shipped on time and on budget. As of this writing, that is five projects over the last five years, a consecutive on-time ship record. I'm not bragging. I'm trying to save you from interminable crunch mode, lost weekends, and buggy games.

I should also mention that my product development teams worked little overtime, too. I attribute that success to the talent of the team, surely, but also to their hard-line organization. Test plans are a big part of that organization. We use five different kinds of formal testing in our projects:

Functional Tests: These are checklists that touch each feature of the game as it was intended by the designers to be touched.
Stress Tests: These tests are written by truly the most evil and imaginative testers on the team and include crazy things like setting Ethernet cables on fire to see how the application will react.
Playability Tests: These tests focus on the attitude of the player about game features (sound, graphics, design, and so on) and how they compare to other games.
Usability Tests: These tests focus primarily on the user interface and the accessibility of the features and controls of the game by the intended audience.
Configuration Tests: These tests focus on running the games under all possible hardware and operating system configurations.

Functional Tests

If the design document is a blueprint for the developers to create a game, the functional test is a series of checklists to determine how close the resulting application comes to the construction ideal. If you ever build a house you'll be presented with a set of blueprints and perhaps some architectural sketches to give you an idea of what the real house will look like and how its subsystems will function when it is complete. As the house is built, the builder will ask you to perform a "walk-through" from time to time so you catch any misinterpretations as they happen. As you check each part of the house, you perform a functional test against the blueprints. If the blueprints specified a three-car garage, you'll probably object to finding a two-car garage during your inspection. The same process works for evaluating the various construction phases of software.

Functional tests are customized heavily for each game, but they do have a common format. You should remember that functional tests are going to be handed out to testers who have never seen the game. The test document should include exhaustive details on what is being tested, how to begin the test, exactly what steps to follow to complete the test, and how to record the results. If you want to see the entire functional test for a game, we need to focus on a really simple game such as Roulette. You may not know every detail, but you know that you bet on a number, and if the ball lands on the number you get paid. Sounds simple, right? It's more complicated than you'd think, so let's look at an example of the functional test for a game of roulette.

Description:
Wagers are made by placing your chips on the table in positions correlating with where you expect the ball to land on the wheel. You can bet on color, odd/even, single numbers (including zero/double zero), combinations of numbers, or groups of numbers.
The Roulette operator, known as the banker, croupier, or dealer, spins the wheel after wagers have been placed and releases an ivory marble into the bowl in the opposite direction of the spinning wheel. When the spinning wheel slows, the marble will come to rest in one of the compartments, which determines the winning bet. The marker is placed on the number on the Roulette table that corresponds to where the ball landed, and wagers are paid appropriately before the marker is removed and the next game begins.
Gameflow/Interface:
- SPIN: Spins the roulette wheel (see payoff table below)
  - Notice that the maximum pause between pushing the spin button and the release of the ball is less than 1.5 seconds.
  - Notice that the ball always makes at least 4 orbits before beginning to bounce around.
  - All animations are synced properly with each other; there is no jumping or skipping.
  - Notice that the ball lands in the correct pocket.
  - Speech plays and informs player of spin result.
- REPEAT BET: After a spin and bets are resolved, pressing this button will duplicate the bets made in the previous round.
- EXIT: Returns player to appropriate casino floor.
- Roulette Wheel Close up
  - Displays the cup where the ball eventually lands.
  - Displays the total of all bets on the table.
  - Displays winnings after a spin.
  - The close up image matches the wheel shown on the table.
- Number Tracking (chart located on the top right corner) keeps track of number and color of past spins.

Betting:

Bet placed before spin.
Bankroll is decreased by the appropriate amount of the bet.
Maximum total of all bets on the table is $1,000.00.
Able to bet by left-clicking on chip denomination and left-clicking betting spot.
Able to remove bet by right-clicking on chip.
Tool tip displays bet amount and payout odds for ANY chip placed on the betting table.
The table clearing animation happens in the following way:
The losing bets are drawn translucently and begin to disappear from left to right.
The winning bets are paid off individually, with a sound effect.
Each player's winning chips are slid over the felt towards them.
At each stage of the animation, the OK button can be hit and the animation will abort, clearing the table instantly.

Bet	Win condition	Payoff	Bankroll
1 to 18	Win: number is 1 through 18	Payoff is 1-1 (e.g., $1 bet = $2 return)	Bankroll is adjusted correctly
19 to 36	Win: number is 19 through 36	Payoff is 1-1 (e.g., $1 bet = $2 return)	Bankroll is adjusted correctly
Even	Win: number is EVEN	Payoff is 1-1 (e.g., $1 bet = $2 return)	Bankroll is adjusted correctly
Odd	Win: number is ODD	Payoff is 1-1 (e.g., $1 bet = $2 return)	Bankroll is adjusted correctly
RED	Win: number is RED	Payoff is 1-1 (e.g., $1 bet = $2 return)	Bankroll is adjusted correctly
BLACK	Win: number is BLACK	Payoff is 1-1 (e.g., $1 bet = $2 return)	Bankroll is adjusted correctly
1st 12	Win: number is 1–12	Payoff is 2-1 (e.g., $1 bet = $3 return)	Bankroll is adjusted correctly
2nd 12	Win: number is 13–24	Payoff is 2-1 (e.g., $1 bet = $3 return)	Bankroll is adjusted correctly
3rd 12	Win: number is 25–36	Payoff is 2-1 (e.g., $1 bet = $3 return)	Bankroll is adjusted correctly
Column Bet 1 (1–34)	Win: number is in column beginning with 1 ending with 34	Payoff is 2-1 (e.g., $1 bet = $3 return)	Bankroll is adjusted correctly
Column Bet 2 (2–35)	Win: number is in column beginning with 2 ending with 35	Payoff is 2-1 (e.g., $1 bet = $3 return)	Bankroll is adjusted correctly
Column Bet 3 (3–36)	Win: number is in column beginning with 3 ending with 36	Payoff is 2-1 (e.g., $1 bet = $3 return)	Bankroll is adjusted correctly
Line Bet: placed on the top edge of betting table between 2 numbers (e.g., corner of 21 and 24 line bet covers #19–24)	Win: number is in either row touching chip (e.g., corner of 21 and 24 line bet wins on #19–24)	Payoff is 5-1 (e.g., $1 bet = $6 return)	Bankroll is adjusted correctly
Five Number Bet (only 1 spot that is a 5 number bet on the table): placed on the top edge of betting table at the corner of 3 and 00	Win: number is 1, 2, 3, 0, 00	Payoff is 6-1 (e.g., $1 bet = $7 return)	Bankroll is adjusted correctly
Corner Bet: placed at the corner of 4 numbers	Win: number is one of the 4 numbers the chip touches	Payoff is 8-1 (e.g., $1 bet = $9 return)	Bankroll is adjusted correctly
Street Bet (Trio Bet): placed on the top edge of betting table along the side of a single number (e.g., edge of 15 covers numbers 13, 14, 15)	Win: number is in the same row as chip (e.g., edge of 15 covers 13, 14, 15)	Payoff is 11-1 (e.g., $1 bet = $12 return)	Bankroll is adjusted correctly
Split Bet: placed on the edge between 2 numbers	Win: number is one of the 2 numbers that the chip touches	Payoff is 17-1 (e.g., $1 bet = $18 return)	Bankroll is adjusted correctly
Straight Bet: placed on a single number	Win: exact number	Payoff is 35-1 (e.g., $1 bet = $36 return)	Bankroll is adjusted correctly
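
The "Bankroll is adjusted correctly" column is easier to check if the tester has the expected numbers in hand. Here is a minimal sketch of how those numbers could be worked out from the payoff table above. The function names and the dictionary are my own illustration, not part of the game's code, and the sketch assumes a win returns the original stake plus the winnings, which is how the "$1 bet = $2 return" examples read.

```python
# Payoff odds ("N to 1") copied from the payoff table above.
# The keys are illustrative labels, not identifiers from the game.
PAYOFF_ODDS = {
    "1 to 18": 1, "19 to 36": 1, "Even": 1, "Odd": 1, "Red": 1, "Black": 1,
    "1st 12": 2, "2nd 12": 2, "3rd 12": 2,
    "Column 1": 2, "Column 2": 2, "Column 3": 2,
    "Line": 5, "Five Number": 6, "Corner": 8,
    "Street": 11, "Split": 17, "Straight": 35,
}


def expected_return(bet_type, wager, won):
    """Chips handed back to the player: stake plus winnings on a win, nothing on a loss."""
    if not won:
        return 0
    return wager * (PAYOFF_ODDS[bet_type] + 1)


def expected_bankroll(bankroll_before_bet, bet_type, wager, won):
    """Bankroll the tester should see once the bet has been placed and resolved."""
    return bankroll_before_bet - wager + expected_return(bet_type, wager, won)
```

For example, a player with $100 who wins a $1 split bet should end the round with `expected_bankroll(100, "Split", 1, True)` dollars, and the same call with `False` gives the bankroll after a loss.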

Shortcut Keys:
- Testing Keys
  - 'H' brings up a dialog that will set the next winning number. NOTE: Activate this dialog only after all bets have been placed on the table.
  - 'W' tests the wheel balance. See the results in Roulette Test.txt.
  - 'B' places $1 on each betting location on the table. This is useful for checking correct payouts. NOTE: There is a significant pause (6–7 seconds) while waiting for all the bets to be placed.
- Enter - Clear Table
- S - Spin Wheel
- Up Arrow - Same Bet
- Down Arrow - Retract Last Bet

The functional test starts with some user education, again making no assumptions regarding the background of the tester. The person performing the test has perhaps heard of roulette, but has never played it. Even if they are a roulette expert, they've never before played your implementation of it and will need some basic instruction to get the game started.

The gameflow/interface section goes into detail about how the game is played and what to expect. The checklist describes animations, sound effects, and speech the tester should experience. Many functional tests are like this one and are order dependent, which means that the tester must complete each test in a specific order. It would be a good idea to specify that fact for the tester somewhere in the document; if a subsystem is broken, the tester might be able to continue the functional test on a different subsystem. It is extremely important that this is made absolutely clear in the document, since bad test results on one subsystem may invalidate any later tests. The tester will usually have a good idea that something is wrong when all the tests fail one after the other and will likely abort the remaining checklists.

Take a look at the next section containing tests for bets and win results. This is a good example of a complete test of all the betting and winning combinations possible on a roulette table. There are 18 sections corresponding to the different payouts on a roulette table. If you are a roulette expert you'll remember that there are over 200 places to bet on a roulette table. Should each betting location be exposed in the functional test? I'm pretty sure that if a tester were forced to place a bet on each betting location in turn and wait the 15 seconds or so for the roulette ball to land in a pocket over 200 times, there would certainly be a killing spree. The tester's time is extremely valuable, and functional tests should always be constructed to save them time while maximizing the testing coverage.

The goal of a functional test is to verify the development plan against the bits created by the development team. This kind of verification takes a human mind since it is frequently subjective. If the development plan calls for "realistic animations of spinning roulette wheel," a human being must make that judgement. I'd expect that when automated testing gets good enough to make those calls we'll probably have automated software development as well, and both automated systems could argue at 10GHz over whether the animation is realistic enough. You and I will be sitting on the beach somewhere either collecting royalties or hunting for food.

Stress Testing

Stress tests try to push the application to run at or past the edge of reasonable operating limits. Many issues found after running all the stress tests are fixed, but some are relegated to the "readme" file or a troubleshooting FAQ on a web site somewhere. You can assume that if your game sells a few million copies (lucky you), the one crazy bug that happens after a million hours of gameplay will happen to some poor sod every few hours. Hopefully, not to the same person! If it's going to happen, you should certainly understand how bad it can be and whether a workaround will exist.

Gotcha

Stress tests should test the application under limited system memory. Low memory is trickier on Windows applications, since the virtual memory manager swaps underused memory pages out to the hard disk. Many games wisely pre-allocate all memory requirements when the game initializes, or perhaps at the beginning of a mission or level. A role-playing game uses memory in a much more dynamic fashion. Continuous worlds and freedom of character movement require sophisticated resource caching, which can result in unpredictable memory usage. Any memory leaks quickly become a problem in this type of game. Cracking the computer case and removing memory is the very best test of any low memory configuration. Developers have a little more trouble debugging memory issues, since the development environment takes significantly more memory than the standalone application. For them, the best alternative is to find or write a little application that simply "eats" a specified block of memory, and then run the application.
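
A memory "eater" of the kind described above can be tiny. The following is only a sketch: the function names are hypothetical, the 4 KB page size is an assumption, and the command you launch is whatever your own game's executable happens to be. Keep in mind that the eaten block can itself be paged out to disk, so this creates memory pressure rather than a hard limit; pulling RAM out of the machine remains the more honest test.

```python
import subprocess
import sys


def eat_memory(megabytes):
    """Allocate roughly `megabytes` MB and touch every page so the OS really commits it."""
    page_size = 4096  # assumed page size; adjust for your platform
    block = bytearray(megabytes * 1024 * 1024)
    for offset in range(0, len(block), page_size):
        block[offset] = 1  # writing to each page forces it to be backed by real memory
    return block


def run_under_memory_pressure(megabytes, command):
    """Hold the eaten block for as long as `command` (the game under test) is running."""
    hog = eat_memory(megabytes)
    try:
        return subprocess.call(command)  # returns the game's exit code
    finally:
        del hog  # release the memory once the game exits
```

A tester might call `run_under_memory_pressure(512, ["mygame.exe"])`, where `mygame.exe` stands in for the real executable, and raise the number until the game starts to misbehave.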

Similar to the low system RAM tests, you can also run stress tests in low VRAM situations. That is, of course, except for all you Xbox developers out there because the Xbox has a unified memory architecture (you lucky bastards).

Gotcha

The Windows Task Manager might not be telling you the truth about how much memory you are using. Under some memory management schemes, MFC included, freed memory is not reflected in the Task Manager immediately. If you want to see how much memory your game is really using, minimize it. Any freed memory will then be reflected in the Task Manager.

Stress tests should include any use of secondary storage, whether it is PC disk space or console memory units. Tests should include initializing the game with little or no extra secondary storage space. You should also test to see what happens if secondary storage space is exhausted while the game is running. PC game developers should also include tests of other applications sucking up the hard drive space while your game is running.
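
The code being stressed here usually boils down to a free-space check before the game commits to writing anything. A minimal sketch, assuming a hypothetical `required_mb` figure that you would take from your own save-file budget:

```python
import shutil


def free_megabytes(path):
    """Free space, in whole megabytes, on the volume that holds `path`."""
    return shutil.disk_usage(path).free // (1024 * 1024)


def check_save_space(path, required_mb):
    """Return None when there is enough room, otherwise a message for the player."""
    available = free_megabytes(path)
    if available < required_mb:
        return (f"Only {available} MB free on {path}; the game needs "
                f"{required_mb} MB. Free some disk space and try again.")
    return None
```

Running the same check again right before each save, not just at startup, covers the case where another application fills the drive while the game is running.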

Most games expect certain hardware, such as input devices and sound cards. Stress tests should always include a suite where expected hardware requirements are not available when the game initializes. Some programmers forget that fact when writing code, and simply assume that everyone is going to have a sound card installed in their PC. While it may be rare to find a PC without a sound card, it is probably a lot less rare to find that someone simply disabled his or her sound card in the current hardware configuration. Doing that is equivalent to removing the hardware, and any game should at least make an effort to detect missing hardware during initialization.

Properly handling the case where the user changes the hardware configuration while the game is running is, I believe, a little too much to ask of any developer, and would certainly fall under the category of "Doctor, doctor, it hurts when I disable my video card while the game is running." No kidding, you doofus.

One caveat to this for PC developers is properly handling someone changing the video resolution or bit depth during game play. This can be a real pain in the ass, but it's important to support because what's happening in the background is that a lost surface (or texture) is detected during the render loop. If a surface is lost, you must write code to restore it, because there are tons of situations in which this can occur. A screen saver that uses Direct3D to display psychedelic pictures will cause any other DirectX-based game to lose surfaces if it ever takes control. If the programmers didn't properly detect and restore those lost surfaces, the game will crash. It's the kind of thing that makes PC programmers flock to console development. Tests to cover this should include running DirectX-based screen savers in the background, minimizing the game and changing the desktop bit depth or resolution, and flipping back and forth between windowed mode (if you attempt to support running in windowed mode, which can make your life much harder) and full screen mode.
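
The shape of the code these tests exercise is a check-and-restore step inside the render loop. In DirectDraw, for example, the calls involved are `IDirectDrawSurface7::IsLost` and `IDirectDrawSurface7::Restore`, and a restored surface gets its memory back but not its contents. The sketch below only imitates that behavior: `FakeSurface`, `reload_art`, and `render_frame` are hypothetical stand-ins written for illustration, not bindings to any real DirectX API.

```python
class FakeSurface:
    """Hypothetical stand-in for a video-memory surface; not a real DirectX binding."""

    def __init__(self, name):
        self.name = name
        self.lost = False
        self.pixels = f"art for {name}"

    def is_lost(self):
        return self.lost

    def restore(self):
        # Restoring hands the memory back, but whatever was drawn there is gone.
        self.lost = False
        self.pixels = None


def reload_art(surface):
    """Reload the surface contents from their original source (faked here)."""
    surface.pixels = f"art for {surface.name}"


def render_frame(surfaces):
    """Draw every surface, first restoring and reloading any that were lost."""
    restored = []
    for surface in surfaces:
        if surface.is_lost():
            surface.restore()
            reload_art(surface)
            restored.append(surface.name)
        # ... the real draw call for surface.pixels would go here ...
    return restored
```

The point of the sketch is that the check happens every frame for every surface, so it does not matter which of the scenarios above caused the loss.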

Gotcha

Video bit depth is serious business for Windows games that run in windowed mode. DirectX applications will not run in 8-bit or 24-bit windowed modes on some video cards. Most games simply default to full screen when they detect they can't run in a window. This also goes for game windows that use a full 800x600 window. Customers running in 640x480 or 800x600 do not have the screen real estate to run in windowed mode, requiring your application to detect it and default to full screen.

The last topic in stressing limited resource availability could clearly fill an entire book all by itself—network connectivity. I spoke earlier of setting the Ethernet cable on fire, and while that is a valid test it would perhaps be easier for your IT group if you simply unplugged it before and during game play. Other tests should include anything you can do to force limits on the size and quality of the bandwidth.

All applications should detect limited resources when they initialize. Every stress test should establish the application's ability to detect a lack of any kind of resource: secondary storage, CPU speed, or other hardware. If a needed resource is lacking when the game initializes, the application should notify the user and perhaps even give them a solution. "The system is low on memory—close some applications and try again." This is the bare minimum and every game should do this—no excuses.

Detecting and handling dwindling resources while the game is running is much, much harder. It is much rarer for a game to elegantly handle all out of memory conditions or file I/O errors. Stress tests can and should test these scenarios, especially since it is important to understand how your game will fail. Most likely the user will have to go back to a saved game, resulting in some lost time. Certainly if the programming team is going to write code to handle these issues properly, specific stress tests should be added to the test plan to ensure that they actually succeed.

Stress testing can also push the game interface. Going bananas on the keyboard, mouse, or joystick buttons can expose weaknesses in the code or game design. Imagine a game that can support only a few simultaneous sound effects, but also plays a short sound effect when a key is pressed. Mashing tons of keys at the same time may expose a weakness if the programmer wasn't thoughtful enough to check how many sound effects are currently playing before launching a new one. This is a case where stress testing will expose a weakness that is important to fix.
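
The fix for that particular weakness is a one-line check before a new effect is launched. Here is a minimal sketch with a hypothetical `SoundMixer` class; a real mixer would also need a policy for which sounds are worth dropping, which this example ignores.

```python
class SoundMixer:
    """Hypothetical mixer that refuses new effects once every channel is busy."""

    def __init__(self, max_channels):
        self.max_channels = max_channels
        self.playing = []

    def play(self, effect):
        """Start `effect` if a channel is free; return False if it had to be dropped."""
        if len(self.playing) >= self.max_channels:
            return False
        self.playing.append(effect)
        return True

    def finished(self, effect):
        """Call when an effect ends so that its channel becomes available again."""
        self.playing.remove(effect)


def mash_keys(mixer, key_count):
    """Simulate `key_count` keys pressed at once; return how many sounds actually started."""
    return sum(mixer.play(f"click-{i}") for i in range(key_count))
```

With a four-channel mixer, mashing fifty keys starts four sounds and quietly drops the rest, which is the behavior the stress test is looking for.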

Playability Testing

A game that is playable is one that captures and keeps the attention of the target audience and gives them great entertainment. One way to perform this test is to find people who are hard core players and knowledgeable about the game genre and have them play the game for an hour and answer questions about it and themselves. By the way, there is such a thing as a hard core solitaire player. I've met them.

A typical questionnaire will ask the person about their profession, how many games they play, which games in particular they've played lately, and their hardware. The results of the questionnaire allow the proctors of the test to tabulate the results based on similarities of player profile. Imagine you are working on a game with a wide audience such as a trivia game. If your playability tests measured the age brackets of your players you may very well find that older players enjoyed the game but gave the user interface low marks. You might use this information to increase your font size to make your user interface easier to read.
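
Tabulating by player profile does not need anything fancier than a grouped average. The sketch below assumes each questionnaire has been typed in as a dictionary with an `age_bracket` entry and numeric scores; those field names and the scoring scale are my own invention for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_bracket(responses, question):
    """Average the score for `question` within each age bracket."""
    scores = defaultdict(list)
    for response in responses:
        scores[response["age_bracket"]].append(response[question])
    return {bracket: mean(values) for bracket, values in scores.items()}
```

Calling `average_by_bracket(responses, "ui")` and `average_by_bracket(responses, "fun")` side by side is enough to spot the pattern described above, where one age group likes the game but marks down the interface.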

After the personal demographics, a playability test will ask users what they thought about the game design: the graphics, sound, tutorials, online help, in short, about every major subsystem. It is common to have playability tests performed not only on your game in production, but also on published versions of your competitors' games. It can give you excellent ideas about how to improve upon your design.

Playability testing is a deep exploration of the game. Each person playing the game may spend a few hours with it and some extra time filling out the questionnaire. It is pretty common for the developers to help people find the features by giving hints and making themselves available for questions. Remember that the idea is to expose the testers to as much of the game as possible, so they can compare it to other games they've played.

Console companies will also do playability testing on your product. Since most console companies fear an overall weak library of games will destroy their product, they are careful to evaluate all games for their systems. They will rigorously test your game for quality and playability ideas.

Usability Testing

Usability testing specifically tests the discoverability of the user interface and the mechanics of the game. The people taking part in the test are given a list of tasks to perform in the game and absolutely no instructions on how to go about completing those tasks. It may sound odd, but it accurately recreates the exact conditions of someone playing the game for the first time. Absolutely no interaction between the developers and testers is allowed, since that would invalidate the test. Watching someone attempt to perform a task in the game, and failing to do so, is extremely educational for developers.

A Tale from the Pixel Mines

The first leisure title I ever worked on was Microsoft Casino. I thought this would be a piece of cake after working on the Ultima series at Origin Systems; boy was I ever wrong. Creating a discoverable game interface for my target audience, which turned out to be people my parents' age, was much more challenging than I thought it would be. Microsoft has fantastic usability labs and working with them was my first experience with this kind of testing. When the first usability tests came back negative, I thought surely something had gone wrong in the test. I simply couldn't believe that people couldn't figure out how to place bets in Blackjack, our first working game. All you had to do was click on the table! Frustrated, I took the weekend off and went to see my parents in North Texas. I was so proud to show my Mom the game I made, because she loves casino games. Sure enough I watched her struggle to figure out how to place the initial bet. I had to tell her how to do it, and I never doubted the usability labs again.

You can learn a lot by quietly watching people play your game. Even if you don't have formal usability testing in your company you can perform some simple experiments yourself by grabbing a passerby on their way back from the bathroom. Invite them into your office and tell them to perform some task in your game, and watch them do it. Their first choice of how they think it is supposed to work will usually surprise you, but there is almost always a subconscious reason they make that choice. It may be impossible for them to tell you why they thought it should work that way, but after sampling five or six people it may very well become clear that you've either designed something brilliant or you've gone down the wrong track.

Configuration Testing

Configuration testing is critical in PC products because of the wide range of processors, BIOSes, video cards, sound cards, and every other kind of wacky peripheral people choose to install in their systems. Some of these peripherals can be quite widespread and yet have odd failure cases that you'll only find in configuration testing. Before you console developers skip to the next section, remember that there are many aftermarket peripheral companies making all kinds of interesting things for consoles: fishing poles, gloves, dance pads, and even stationary bicycles. You'd be wise to take some of these peripherals into account when you test your game.

Generally the configuration tests will expose particular weaknesses in compatibility between your application and drivers that were shipped stock with video and sound cards. That's why it's important to test with drivers that shipped with the original equipment and the most recent drivers, so that certain problems can be identified and solved by upgrading the drivers. This information is valuable for customer service and for FAQs on your company Web site.

A Tale from the Pixel Mines

Occasionally, you'll find a real incompatibility that needs to be addressed. One of the games I worked on was having odd display problems on 3dfx cards (may they rest in peace). You'd look at an animating sprite and see it appear as if the vertical sync had gone out of a TV set. After consulting with the 3dfx engineers we discovered that there was a limitation on VRAM surface dimensions; they couldn't have a height-to-width ratio higher than 8:1. Go figure. We reworked the organization of our animating sprites and all was good. The scary part of this story is that none of the developers on the team had a Voodoo card installed because (to put it mildly) they preferred other video cards. Voodoo cards were installed in 20% to 30% of our user base, which would have resulted in crushing return rates had the flaw escaped the configuration testing.

Another part of configuration testing is measuring performance, which can be extremely difficult to solve if it's a problem, so early detection is very important. When our development team was working on Bicycle Casino and Bicycle Card, performance was a huge issue on our minimum specification machine, mostly due to the size of the animations we were cramming through only 2MB of VRAM. The CPU target was a 166MHz Pentium, a machine that was 5 to 6 years old at that point in time. Our competition, Sierra's Hoyle games, ran fine on 133MHz machines because they had lighter 8-bit animations. Throughout the entire development cycle we made incremental improvements to the performance. Had we delayed performance testing to the end of the development cycle we would not have had time to make as many improvements, and our game would have been unplayable on many machines for our projected target audience.

Take great pains to measure performance carefully, and measure it exactly the same way following an exact checklist. This is especially true on the PC platform, where many things that have nothing to do with your game can adversely affect performance. In our performance testing we create big spreadsheets of exact timing information for each section of the game. As programmers make modifications to fix performance problems, we make sure to re-run these tests in exactly the same way so the programmers will know if their efforts were successful. Keep a running record of the changes and the resulting changes in performance. That record may give the programmers additional clues as to what they can do to enhance performance. It's not enough to just figure you'll optimize this or that. One can actually over-optimize by pursuing optimizations that, while welcome, don't add anything to the overall performance of the product.
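
If the sections you are timing can be driven from a script, the spreadsheet can be fed automatically. The following is a rough sketch rather than our actual tooling: the function names, the CSV layout, and the choice to keep the best of several runs are all assumptions made for illustration.

```python
import csv
import time


def time_section(func, repeats=5):
    """Run `func` several times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def append_result(csv_path, build, section, seconds):
    """Add one row to the running record so every build is measured the same way."""
    with open(csv_path, "a", newline="") as log:
        csv.writer(log).writerow([build, section, f"{seconds:.6f}"])


def percent_change(before, after):
    """Change in timing between two builds; a negative number means the game got faster."""
    return (after - before) / before * 100.0
```

Taking the best of several runs is one way to filter out background noise from other processes on a PC; whichever rule you pick, the important part is to use the same one for every build so the numbers stay comparable.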

Testing your game against Beta versions of operating systems is also a really good idea. While a game is in development it is common for programmers (and pretty much everyone else) to forgo installing the latest Beta operating system release because it takes time and might cause weird system behavior. Since most developers will work on an established operating system release, it is unlikely that they will find incompatibilities during development. Make sure you have one machine in your test lab that has the latest pre-release OS from Microsoft.

When Microsoft released Beta versions of Windows XP Home it was quickly discovered that when the OS was installed with the default configuration it was illegal for an application to write to the hard drive under the Program Files directory. Application developers had to store their saved game data under the Documents and Settings/userid/Application Data directory tree. Anyone who was developing an application for a Christmas 2001 release and who didn't test against Windows XP Home would have seen it crash under that operating system.
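
The lesson translates into a small piece of code: ask the operating system where per-user data belongs instead of assuming the install directory is writable. Here is a minimal sketch. On Windows the `APPDATA` environment variable points at the user's Application Data folder; the function name, the `SavedGames` subfolder, and the fallback used when `APPDATA` is not set are assumptions of mine.

```python
import os
from pathlib import Path


def save_game_dir(app_name, environ=None):
    """Pick a per-user, writable directory for saved games instead of the install directory."""
    environ = os.environ if environ is None else environ
    base = environ.get("APPDATA")  # set by Windows to the user's Application Data folder
    # Assumed fallback for systems without APPDATA; choose whatever suits your platform.
    root = Path(base) if base else Path.home() / ".local" / "share"
    return root / app_name / "SavedGames"
```

The `environ` parameter exists only so a test can pass in a fake environment; the game itself would call `save_game_dir("Roulette")` and create the directory before its first save.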