The Life Cycle of a Build


A basic game testing process consists of the following steps:

  1. Plan and design the test. Although much of this is done early on during the planning phase, planning and design should be revisited with every build. What has changed in the design spec since the last build? What additional test cases have been added? What new configurations will the game support? What features have been cut? The scope of testing should ensure that no new issues were introduced in the process of fixing bugs prior to this release.

  2. Prepare for testing. Code, tests, documents, and the test environment are updated by their respective owners and aligned with one another. By this time the development team should have marked the bugs fixed for this build in the defect database so the QA or test team can subsequently verify those fixes and close the bugs.

  3. Perform the test. Run the test suites against the new build. If you find a defect, test "around" the bug to make certain you have all the details necessary to write as specific and concise a bug report as possible. The more research you do in this step, the easier and more useful the bug report will be.

  4. Report the results. Log the completed test suite and report any defects you found.

  5. Repair the bug. The test team participates in this step by being available to discuss the bug with the development team and to provide any directed testing they may require to track it down.

  6. Return to step 1 and re-test. With new bugs and new test results comes a new build.

These steps not only apply to black box testing, but they also describe white box testing, configuration testing, compatibility testing, and any other type of QA. These steps are identical no matter what their scale. If you substitute the word "game" or "project" for the word "build" in the preceding steps, you will see that they can also apply to the entire game, a phase of development (Alpha, Beta, and so on), or an individual module or feature within a build. In this manner, the software testing process can be considered fractal: the smaller system is structurally identical to the larger system, and vice versa.

As illustrated in Figure 8.3, the testing process itself is a feedback loop between the tester and the developer. The tester plans and executes tests on the code, then reports the bugs to the developer, who fixes them and compiles a new build, which the tester plans and executes tests on, and so on.


Figure 8.3: The testing process feedback loop.

A comfortable scale at which to examine this process is the level of an individual build. Even a relatively small game project may consist of dozens of builds over its development cycle.

Test Cases and Test Suites

As discussed in the previous chapter, a single test performed to answer a single question is a test case; a collection of test cases is a test suite. The lead tester, primary tester, or any other tester tasked with test creation should draft these documents prior to the distribution of the build. Each tester will take his or her assigned test suites and perform them on the build. Any anomalies should be noted and checked against the defect database. Any anomalies not already present in the database should be written up as new bugs.

In its simplest form, a test suite is a series of incremental steps that the tester can perform sequentially. Subsequent chapters in this book discuss in depth the skillful design of test cases and suites through such methods as combinatorial tables and test flow diagrams. For the purposes of this discussion, consider a short test suite you would execute on Minesweeper, a simple game available with most versions of Microsoft Windows. A portion of this suite is shown in Figure 8.4. You will find a sample test suite in Appendix E.


Figure 8.4: Portion of a test suite for Minesweeper.

This is a very small portion of a very simple test suite for a very small and simple game. The first section (steps one through seven) tests launching the game, ensuring that the default display is correct, and exiting. Each step either gives the tester an incremental instruction or asks the tester a simple question. Ideally, these questions are binary and unambiguous. The tester performs each test case and records the result.

Because the testers will inevitably observe results that the test designer hadn't planned for, the Notes field allows the tester to elaborate on a Yes/No answer, if necessary. The lead or primary tester who receives the completed test suite can then scan the Notes field and make adjustments to the test suite as needed for the next build.
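If your team ever moves its suites from paper or spreadsheets into a lightweight script, the same structure (incremental steps, binary answers, and a Notes field) maps directly onto a simple record. The following Python sketch is only an illustration; the field names and the sample Minesweeper-style steps are assumptions, not part of any standard tool.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class TestStep:
        instruction: str                 # an incremental instruction or a simple, binary question
        passed: Optional[bool] = None    # True = "yes"/pass, False = "no"/fail, None = not yet run
        notes: str = ""                  # room to elaborate on results the designer hadn't planned for

    # A tiny, hypothetical slice of a suite:
    suite: List[TestStep] = [
        TestStep("Launch the game."),
        TestStep("Is the default display correct?"),
        TestStep("Exit the game. Does it close without errors?"),
    ]

    def needs_attention(steps: List[TestStep]) -> List[TestStep]:
        """Return every step that failed or has not yet been recorded."""
        return [s for s in steps if s.passed is not True]

A suite for which needs_attention() returns an empty list corresponds to the clean, all-passes suite described in the next paragraph.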

Where possible, the questions in the test suite should be written in such a way that a "yes" answer indicates a "pass" condition: the software is working as designed and no defect is observed. "No" answers, in turn, indicate that there is a problem and a defect should be reported. There are several reasons for this. It's more intuitive, since we tend to group "yes" and "pass" (both positives) together in our minds, in the same way we group "no" and "fail." Further, with all passes grouped in the same column, the completed test suite can be easily scanned by both the tester and test managers to determine quickly whether there were any fails. A clean test suite will have all the checks in the Pass column.

For example, consider a test case covering the display of a tool tip, a small window with instructional text incorporated into many interfaces. A fundamental test case would be to determine whether the tool tip text contains any typographical errors. The most intuitive question to ask in that test case is

 Does the text contain typographical errors? 

The problem with this question is that a pass (no typos) would be recorded as a "no." It would be very easy for a hurried (or tired) tester to mistakenly mark the Fail column. It is far better to express the question so that a "yes" answer indicates a "pass" condition:

 Is the text free of typographical errors? 
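The same convention applies if a check is ever automated: write the predicate so that True always means "pass." The helper below is a hedged sketch; the function name and the crude word-list check are illustrative, not a real spell-checking API.

    def is_free_of_typos(text: str, known_words: set) -> bool:
        """Return True (pass) when every word in the tool tip is recognized."""
        words = [w.strip(".,!?\"'").lower() for w in text.split()]
        return all(w in known_words for w in words)

    # Phrased positively, a clean result set is simply "all True":
    results = {
        "Is the text free of typographical errors?": is_free_of_typos(
            "Click a square to reveal it.",
            {"click", "a", "square", "to", "reveal", "it"},
        ),
    }
    print("PASS" if all(results.values()) else "FAIL")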

As you can see, directed testing is very structured and methodical. After the directed testing has concluded, or concurrently with directed testing, a less structured, more intuitive form of testing, known as ad hoc testing, takes place.

Entry Criteria

It's advisable to require that any code release meet some criteria for being fit to test before you do any testing on it. This is similar to the checklists that astronauts and pilots run through to evaluate the fitness of their vehicle systems before attempting flight. Builds submitted to testing that don't meet the basic entry criteria are likely to waste the time of both testers and programmers. The countdown to testing should stop until the test "launch" criteria are sufficiently met.

The following is a list of suggestions for entry criteria to use. Don't keep these a secret from the rest of the development team. Make the team aware of the purpose, which is to prevent waste, and work with them to produce a set of criteria that the whole team can commit to. (A sketch of how such a checklist might be scripted follows the list.)

  • The game code should be built without compiler errors. Any new compiler warnings that occur are analyzed and discussed with the test team.

  • The code release notes should be complete and provide the detail that testers need to plan which tests to run or re-run for this build.

  • Defect records for any bugs closed in the new release should be updated so they can be used by testers to make decisions about how much to test in the new build.

  • Tests and builds should be properly version controlled, as described in the following sidebar.

  • When you are sufficiently close to the end of the project, you also want to receive the game on the media that it will ship on. Check that the media provided contains all of the files that would be provided to your customer.
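As noted above, here is one way such a checklist might be scripted. The criteria strings simply mirror the suggestions in the list; nothing here is a standard format, and a shared spreadsheet works just as well.

    # Hold the "countdown" if any entry criterion is unmet (criterion names are illustrative).
    entry_criteria = {
        "Built without compiler errors": True,
        "New compiler warnings reviewed with the test team": True,
        "Code release notes complete": True,
        "Defect records updated for bugs fixed in this build": False,
        "Tests and build properly version controlled": True,
    }

    unmet = [name for name, met in entry_criteria.items() if not met]
    if unmet:
        print("Stop the countdown. Unmet entry criteria:")
        for name in unmet:
            print(" -", name)
    else:
        print("Entry criteria met; accept the build into test.")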

Version Control: Not Just for Developers

A fundamental principle of software development is that every build of an application should be treated as a separate and discrete version. Inadvertent blending of old code with new is one of the most common (and most preventable) causes of software defects. The process of tracking builds and ensuring that all members of a development team are checking current code and assets into the current version is known as version control.

Test teams must practice their own version control. Few things waste more time than a test team reporting a stack of bugs against an old build. This is not only a waste of time, but it can also cause panic on the part of the programmers and the project manager.

Proper version control for the test team includes the following steps:

  1. Collect all prior versions from the test team before distributing the new build. The prior versions should be stacked together and archived until the project is complete.

  2. Archive all paperwork. This includes not only any build notes you received from the development team, but also any completed test suites, old test plans, screen shots, saved games, notes, .AVIs, and any other material generated during the course of testing a build. It is sometimes important to retrace steps along the paper trail, whether to assist in isolating a new defect or determining in what version an old bug was introduced.

  3. Verify the build number with the developer prior to duplicating it.

  4. In cases where builds are transmitted electronically, verify the byte count, file dates, and directory structure before duplicating it. It's vital in situations where builds are sent via FTP or email that the test team makes certain they are testing a version identical to the version the developers uploaded. Confirm the integrity of the transmitted build before giving it to the testers (a sketch of one way to script this check follows the list).

  5. Renumber all test suites and any other build-specific paperwork with the current version number.

  6. Distribute the new build for smoke testing.
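For step 4, a small script can take the tedium (and the guesswork) out of confirming that a build received by FTP or email matches what the developers uploaded. This is a hedged sketch: the directory paths are hypothetical, and in practice the development team might instead send a manifest of expected hashes alongside the build.

    import hashlib
    import os

    def build_manifest(root: str) -> dict:
        """Map each file's path (relative to root) to its SHA-256 hash."""
        manifest = {}
        for folder, _, files in os.walk(root):
            for name in files:
                path = os.path.join(folder, name)
                rel = os.path.relpath(path, root)
                with open(path, "rb") as fh:
                    manifest[rel] = hashlib.sha256(fh.read()).hexdigest()
        return manifest

    received = build_manifest(r"D:\incoming\build_042")          # what the test team downloaded
    expected = build_manifest(r"\\devserver\outgoing\build_042")  # what the developers uploaded

    if received == expected:
        print("Build verified; safe to duplicate and distribute.")
    else:
        print("Mismatch detected; do NOT distribute this build.")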

 

Configuration Preparation

Before the test team can work with the new build, some housekeeping is in order. The test equipment must be readied for a new round of testing. The test lead must communicate the appropriate hardware configuration to each tester for this build. Configurations typically change little over the course of testing. To test a single-player-only console game, you need the game console, a controller, and a memory card. That hardware configuration typically will not change for the life of the project. If, however, the new build is the first in which network play is enabled, or in which a new input device or PC video card is supported, you may need to augment the testers' hardware configurations to perform the tests on that new code.

Perhaps the most important step in this preparation is eliminating any trace of the prior build from the hardware. "Wiping" the old build on a Nintendo GameCube is simple because the only recordable media for that system is a memory card. All you have to do is remove and archive the saved game you created with the old build. More careful test leads will ask their testers to go the extra step of reformatting the memory card, which completely erases the card, to ensure that not a trace of the old build's data will carry forward during the testing of the new build.

Tip:

Save your saves! Always archive your old user-created data, including game saves, options files, custom characters, and custom levels.

Not surprisingly, configuration preparation can be much more complicated for PC games. The cleanest possible testing configuration for a PC game is

  • A fresh installation of the latest version of the operating system, including any patches or security updates.

  • The latest drivers for all components of the computer. This not only includes the obvious video card and sound card drivers, but also chipset drivers, motherboard drivers, ethernet card drivers, and so on.

  • The latest versions of any "helper apps" or middleware the game requires to run. These can range from Microsoft's DirectX multimedia libraries to multiplayer matchmaking software such as GameSpy Arcade.

The only other software on the computer should be the new build.

"Bob" once walked into a QA lab that was testing a very cutting-edge 3D PC game. Testing of the game had fallen behind, and he was sent from the company's corporate headquarters to investigate. Bob arrived late in the morning, and at noon he was appalled to see the testers exit the game they were testing and fire up email, IRC, Web browsers, and file sharing programs ‚ a host of applications that were installed on their test computers. Some even jumped into a game of Unreal Tournament . Bob asked the assistant test manager why he thought it was a good idea for all the testers to have these extraneous programs on their test configurations. "It simulates real-world conditions," he shrugged, annoyed by Bob's question.

As you may have already guessed, this lab's failure to wipe their test computers clean before each build led to a lot of wasted time chasing false defects: symptoms testers thought were defects in the game, but which were in fact problems brought about by, for example, email or file sharing programs running in the background, taxing the system's resources and network bandwidth. This wasted tester time also meant a lot of wasted programmer time, as the development team tried to figure out what in the game code might be causing such (false) defects.

The problem was solved by reformatting each test PC, freshly installing the operating system and latest drivers, and then using a drive image program to create a system restore file. From that point forward, testers merely had to reformat their hard drive and copy the system restore file over from a CD.

Whatever protocol is established, config prep is crucial prior to the distribution of a new build.

Smoke Testing

The next step after accepting a new build and preparing to test it is to certify that the build is worthwhile to formally test. This process is sometimes called performing a smoke test on the build, because it's used to determine whether a build "smokes" (malfunctions) when run. At a minimum, it should consist of a "load & launch," that is, the lead or primary tester should launch the game, enter each module from the main menu, and spend a minute or two playing each module. If the game launches with no obvious performance problems and each module implemented so far loads with no obvious problems, it is safe to certify the build, log it, and duplicate it for the test team.
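Entering each module still takes a human being, but if the build can be started from the command line, even a crude scripted launch check can flag a build that "smokes" before anyone's time is spent on it. The sketch below is an assumption-laden illustration: the executable path is hypothetical, and the only thing it verifies is that the game starts and stays running.

    import subprocess
    import time

    BUILD_EXE = r"C:\builds\0.042\game.exe"   # hypothetical path to the new build

    def load_and_launch(exe: str, settle_seconds: int = 120) -> bool:
        """Launch the build and confirm it neither crashes nor exits immediately."""
        proc = subprocess.Popen([exe])
        time.sleep(settle_seconds)            # give the game time to reach the main menu
        if proc.poll() is not None:           # the process already exited: the build "smokes"
            print("Build exited early with code", proc.returncode)
            return False
        proc.terminate()                      # clean up; a human still plays each module
        return True

    if __name__ == "__main__":
        print("Certify and duplicate" if load_and_launch(BUILD_EXE) else "Reject the build")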

So the build is distributed. Time to test for new bugs, right? Not just yet. Before testing can take a step forward, you must take a step backward and verify that the bugs the development team claims to have fixed in this build are indeed fixed. This process is known as regression testing .

Regression Testing

Fix verification can be at once very satisfying and very frustrating. It gives the test team a good sense of accomplishment to see the defects they report disappear one by one. It can be very frustrating, however, when a fix of one defect creates another defect elsewhere in the game, as can often happen.

The test suite for regression testing is the list of bugs claimed to be fixed by the development team. This list, sometimes called the knockdown list, is ideally communicated through the bug database. When the programmer or artist fixes the defect, all they have to do is change the value of the Developer Status field to "Fixed." This allows the project manager to track the progress on a minute-to-minute basis. It also allows the lead tester to sort the regression set (by bug author or by level, for example). At a minimum, the knockdown list can take the form of a list of bug numbers sent from the development team to the lead tester.
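However the knockdown list arrives, it is easy to pull out of the defect database once the Developer Status field is kept up to date. The snippet below is a minimal sketch using made-up records and field names rather than any real bug-tracker's API.

    # Hypothetical defect records exported from the bug database.
    defects = [
        {"id": 101, "author": "Kim", "level": "Docks",  "developer_status": "Fixed"},
        {"id": 117, "author": "Raj", "level": "Sewers", "developer_status": "Open"},
        {"id": 120, "author": "Kim", "level": "Docks",  "developer_status": "Fixed"},
    ]

    # The knockdown list: every bug the developers claim to have fixed in this build.
    knockdown = [d for d in defects if d["developer_status"] == "Fixed"]

    # The lead tester can sort the regression set by bug author or by level.
    for bug in sorted(knockdown, key=lambda d: d["level"]):
        print("Regress bug", bug["id"], "in", bug["level"], "- reported by", bug["author"])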

Tip:

Don't accept a build into test unless it is accompanied by a knockdown list. It is a waste of the test team's time to regress every open bug in the database every time a new build enters test.

Each tester will take the bugs they've been assigned and perform the steps in the bug write-up to verify that the defect is indeed fixed. The fixes to many defects are easily verified (typos, missing features, and so on). Some defects, such as hard-to-reproduce crashes, may seem fixed, but the lead tester may want to err on the side of caution before he closes the bug. Flagging the defect as verify fix keeps the bug in the regression set for the next build (or two), but out of the open set that the development team is still working on. Once the bug has been verified as fixed in two or three builds, the lead tester can then close the bug with more confidence.

At the end of regression testing, the lead tester and project manager can get a very good sense of how the project is progressing. A high fix rate (number of bugs closed divided by the number of bugs claimed to have been fixed) means the developers are working efficiently. A low fix rate is cause for concern. Are the programmers arbitrarily marking bugs as fixed if they think they've implemented new code that may address the defect, rather than troubleshooting the defect itself? Are the testers not writing clear bugs? Is there a version control problem? Are the test systems configured properly? While the lead tester and project manager mull over these questions, it's time for you to move on to the next step in the testing process: performing structured tests and reporting the test results.
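As a quick worked example of the fix rate (the numbers are invented):

    claimed_fixed = 40     # bugs the developers marked "Fixed" on the knockdown list
    verified_closed = 34   # bugs the testers confirmed fixed and closed during regression

    fix_rate = verified_closed / claimed_fixed
    print("Fix rate: {:.0%}".format(fix_rate))   # 85% -- most claimed fixes held up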

Test "Around" the Bug

The old saying in carpentry is "measure twice, cut once." Good testers thoroughly investigate a defect before they write it up, anticipating any questions the development team may have.

Before you begin to write a defect report, ask yourself questions such as the following:

  1. Is this the only location or level where the bug occurs?

  2. Does the bug occur while using other characters?

  3. Does the bug occur in other game modes (for example, multiplayer as well as single player, skirmish as well as campaign)?

  4. Can I eliminate any steps along the path to reproducing the bug?

  5. Does the bug occur across all platforms (for example, PlayStation2 and Xbox)?

  6. Is the bug machine-specific (for example, does it occur only on PCs with a certain hardware configuration)?

These are the types of questions you will be asked by the lead tester, project manager, or developer. Try to develop the habit of anticipating such questions by performing some quick additional testing before you write the bug. Test to see if the defect occurs in other areas. Test to determine whether the bug happens when you choose a different character. Test to check which other game modes contain the issue. This practice is known as testing "around" the bug.
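When a defect involves several variables (mode, character, platform, and so on), it can help to jot down the combinations you still need to try so that none are skipped. The sketch below simply enumerates them; the values are illustrative only.

    from itertools import product

    modes      = ["single player", "multiplayer", "skirmish", "campaign"]
    characters = ["original character", "other character"]
    platforms  = ["PlayStation 2", "Xbox"]

    # Re-run the repro steps in each combination and note where the bug does (and doesn't) occur.
    for mode, character, platform in product(modes, characters, platforms):
        print("Retry repro steps:", mode, "/", character, "/", platform)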

Once you are satisfied that you have anticipated any questions the development team may ask, and you have your facts ready, you are ready to write the bug report.

Report the Results

Good bug writing is one of the most important skills a tester must learn. A defect can only be fixed if it is communicated clearly and effectively. One of the oldest jokes in software development goes something like this:

Q:

How many programmers does it take to screw in a light bulb?


A:

None. It's not dark where they're sitting.

Good bug report writing gets programmers to "see the light" of the bug. But programmers are by no means the only people who will read your bug. The audience may include

  • The lead tester or primary tester, who may wish to review the bug before they give it an "open" status in the bug database.

  • The project manager, who will read the bug and assign it to the appropriate member of the project team.

  • Marketing and other business executives, who may be asked to weigh in on the possible commercial impact of fixing (or not fixing) the bug.

  • Third parties, such as middleware vendors, who may be asked to review a bug that may be related to a product they supply to the project team.

  • Customer service representatives, who may be asked to devise workarounds for the bug.

  • Other testers, who will reproduce the steps if they are asked to verify a fix during regression testing.

Because you never know exactly who will be reading your bug report, you must always write in as clear, objective, and dispassionate a manner as possible. You can't assume that everyone reading your bug report will be as familiar with the game as you are. Testers spend more time in the game, exploring every hidden path and closely examining each asset, than almost anyone else on the entire project team. A well-written bug will give a reader who is not familiar with the game a good sense of the type and severity of defect it describes.

Just the Facts, Ma'am

The truth is that defects stress out development teams, especially during "crunch time." Each new bug added to the database means more work still has to be done. An average-sized project can have hundreds or thousands of defects reported before it is completed. Developers can feel very overwhelmed and will, in turn, get very hostile if they feel their time is being wasted by frivolous or arbitrary bugs. That's why good bug writing is fact-based and unbiased.

 The guard's hat should be blue. 

This is neither a defect nor a fact; it's an unsolicited and arbitrary opinion about design. There are forums for such opinions (discussions with the lead tester, team meetings, play testing feedback), but the bug database isn't one of them.

A common complaint in many games is that the AI (artificial intelligence) is somehow lacking. (AI is a catch-all term used to mean any opponents or NPCs controlled by the game code.)

 The AI is weak. 

This may indeed be a fact, but it is written in such a vague and general way that it is likely to be considered an opinion. A much better way to convey the same information is to isolate and describe a specific example of AI behavior and write up that specific defect. By boiling issues down to specific facts, you can turn them into defects that have a good chance of being fixed.

But before you begin to write a bug report, you have to be certain that you have all your facts.

Brief Description

Larger databases may contain two description fields: Brief Description and Full Description. The Brief Description field is used as a quick reference to identify the bug. This should not be a cute nickname, but a one-sentence description that allows team members to identify and discuss defects without having to read the longer full description each time. Think of the brief description as the headline of the defect report.

 Crash to desktop. 

This is not a complete sentence, nor is it specific enough for a brief description. It could apply to one of dozens of defects in a database. The brief description must be brief enough to be read easily and quickly, but long enough to describe the bug.

 The saving system is broken. 

This is a complete sentence, but it is not specific enough. What did the tester experience? Did the game not save? Did a saved game not load? Does saving cause a crash?

 Crash to desktop when choosing "Options" from Main Menu. 

This is a complete sentence, and it is specific enough so that anyone reading it will have some idea of the location and severity of the defect.

 Game crashed after I killed all the guards and doubled back through the level to get all the pick-ups and killed the first re-spawned guard. 

This is a run-on sentence that contains far too much detail. A good way to boil it down might be

 Game crashed after guards respawned. 

The TV listings in your newspaper can provide excellent examples of a brief description: they boil down an entire half-hour sitcom or two-hour movie into one or two sentences.

Tip:

Write the full description first, and then write the brief description. Spending some time polishing the full description will help you understand the most important details to include in the brief description.

Full Description

If the brief description is the headline of a bug report, the Full Description field provides the gory details. Rather than a prose discussion of the defect, the full description should be written as a series of brief instructions so that anyone can follow the steps and reproduce the bug. The steps should be written in second person imperative, as though you were telling someone what to do. The last step is a sentence (or two) describing the bad result.

  1. Launch the game.
  2. Watch the animated logos. Do not press ESC to skip through them.
  > Notice the bad strobing effect at the end of the Developer logo.

The fewer steps, the better, and the fewer words, the better. Remember Brad Pitt's warning to Matt Damon in Ocean's Eleven: "Don't use seven words when four will do." Likewise, don't use seven steps when four will do. Time is a precious resource when developing a game. The less time it takes a programmer to read and understand the bug, the more time he has left over to fix it.

  1. Launch game.
  2. Choose Multiplayer.
  3. Choose Skirmish.
  4. Choose "Sorrowful Shoals" map.
  5. Choose two players.
  6. Start game.

These are very clear steps, but for the sake of brevity they should be boiled down to

  1. Start a two-player skirmish game on "Sorrowful Shoals."

Sometimes, however, you need several steps. The following bug describes a problem with a power-up called "mugging," which steals any other power-up from any other unit.

  1. Create a game against one human player. Choose Serpent tribe.
  2. Send a Swordsman into a Thieves Guild to get the Mugging power-up.
  3. Have your opponent create any unit and give that unit any power-up.
  4. Have your Swordsman meet his unit somewhere neutral on the map.
  5. Activate the Mugging battle gear.
  6. Attack your opponent's unit.
  > Crash to desktop as Swordsman strikes.

This may seem like a lot of steps, but it is the quickest way to reproduce the bug. Every step is important to isolate the behavior of the mugging code. Even small details, like meeting in a neutral place, are important, since meeting in occupied territory might bring allied units from one side or another into the fight, and the test might then be impossible to perform.

Great Expectations

Oftentimes, the defect itself may not be obvious from the steps in the full description. When the steps produce a result that deviates from user expectation, but does not produce a crash or other severe symptom, it is sometimes necessary to add two additional lines to your full description: Expected Result and Actual Result.

Expected Result describes the behavior that a normal player would reasonably expect from the game if the steps in the bug were followed. This expectation is based on the tester's knowledge of the design specification, the target audience, and precedents set (or broken) in other games, especially games in the same genre.

Actual Result describes the defective behavior. Here's an example:

  1. Create a multiplayer game.
  2. Click Game Settings.
  3. Using your mouse, click any map on the map list. Remember the map you clicked on.
  4. Press up or down directional keys on your keyboard.
  5. Notice the highlight changes. Highlight any other map.
  6. Click Back.
  7. Click Start Game.
  Expected Result: Game loads map you chose with the keyboard.
  Actual Result: Game loads map you chose with the mouse.

Although the game loaded a map, it wasn't the map the tester chose. That's a bug, albeit a subtle one.

Use the Expected/Actual Result steps sparingly. Most of the time, the defect is obvious.

 4.  Click "Next" to continue. Expected Result:  You continue. Actual Result:  Game locks up. You must reboot the console. 

It is understood by all members of the project team that the game shouldn't crash. Don't waste time pointing out the obvious.
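Pulling the pieces together, a defect report boils down to a handful of fields. The record below is a minimal sketch; the field names are assumptions rather than those of any particular defect-tracking product, and the sample values reuse the "Options" crash from earlier in this section.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class DefectReport:
        brief_description: str          # the one-sentence "headline"
        steps: List[str]                # second-person imperative repro steps
        result: str                     # the bad result observed on the last step
        expected_result: str = ""       # use sparingly, for subtle defects only
        actual_result: str = ""

    bug = DefectReport(
        brief_description='Crash to desktop when choosing "Options" from Main Menu.',
        steps=["Launch the game.", 'Choose "Options" from the Main Menu.'],
        result="Game crashes to desktop.",
    )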

Things to Avoid

For the sake of clarity, effective communication, and harmony among members of the project team, try to avoid a couple of common bug-writing pitfalls: humor and jargon.

Although humor is welcome in high-stress situations, it is not welcome in the bug database. Ever. There are too many chances for misinterpretation and confusion. During crunch time tempers are short and nerves are frayed. The defect database may already be a point of contention. Don't make the problem worse with attempts at humor (even if you think your joke is hilarious).

It may seem counterintuitive to want to avoid jargon in such a specialized form of technical writing, but it is wise to do so. Although some jargon is unavoidable, and each development team quickly develops its own slang specific to the project they're working on, testers should avoid using (or misusing) too many obscure technical terms. Remember that your audience ranges from programmers to financial executives, so use plain language as much as possible.



