The Test Process | Developing Online Games: An Insiders Guide (Nrg-Programming)

Know this going in: You have probably already underestimated the amount of time and number of people you need for testing your PW. Because these games are so complex and have such different yet interlocking technology, there is a lot that has to be stressed, broken, and fixed. Teams that haven't been through this at least once don't really understand this. "They underestimate the testing and scaling challenges," according to Gordon Walton, "and end up with a fire drill when their service scales up."

There is a process you can use to minimize the risks, but it requires some redefinition of what the standard test phases are testing for compared to their common use today. The process involves time and patience.

As discussed in Chapter 1, "The Market," the process also involves spending money because you'll need to add bandwidth and server hardware to anticipate the increased number of testers you'll need and ramp up your customer service (CS) and player relations staff at the same time. Over a 6- to 12-month period, and assuming you'll need to handle at least 20,000 simultaneous users at launch, you'll buy and install enough bandwidth and hardware to handle that load and hire and train somewhere between 20 and 50 player relations staff to service them at launch.

Develop, Test, Fix, Scale Up, Wash, Rinse, Repeat

"You can never start testing too soon. We've now instituted the process of monthly test builds, which go through a rigorous testing cycle. We then stabilize the build and check for performance. This is giving us a big edge as we prepare for testing this time around. Likewise, you will always try to cheat at the end and give up testing time for new features. Don't do it. The problem just gets worse with every feature that you add."

- Jeff Anderson, CEO of Turbine Entertainment

What Jeff remarks on is a process that is coming into use more and more among experienced teams, though few yet go to the quality control extent that Turbine now uses. In the past, teams tended to do no or only minimal testing of their code before it went into a version build, and then depended on the QA department to test the build while they moved on to the next build.

Do you see the problem here? Developers were (and in many cases, still are) working on the next build before the current one was debugged. That means more content was added before the debug was finished, which can and usually does mean a lot of wasted work as the new code becomes inoperative after the fixes are deployed. I know this sounds utterly insane and self-defeating, but this is exactly how it was and is done today by most online game development teams.

Thankfully, it is changing, as the industry as a whole watches high-profile failures, gains experience in online games, and begins to understand that more moving parts mean more and better testing procedures. The key elements to keep in mind are time, numbers, and patience.

Rarely does the testing process for a PW take enough time or involve enough testers. As Gordon Walton has noted, almost every team underestimates the scaling and testing challenges for a PW. Up to early 2002, for example, the average announced time for full Beta testing of a PW was about three months.

This may seem adequate on the face of it, until you understand that what the MMO industry refers to as a Beta test is what most other industries call Alpha and Beta combined. This is one of the bad holdovers from the early days of MMO development for the old online services; teams were totally inexperienced with the whole process of quality testing procedures and just made them up as they went along. They knew Alpha and Beta tests were standard, so they just figured that the testing they did themselves was Alpha and the testing with players involved was Beta. What none of us understood at the time, of course, was that the distinction should have been made by what was being tested, not who was involved in the testing.

Now that many MMO products are being developed in (relatively) more professional settings, combined with managers having experience with more than one MMOG development project, this is starting to change. The test periods are segmented by content to be tested, no matter who is involved, and are starting to stretch, although if 2001's launches were any indication, not by nearly enough. To give credit where it is due, most of the bad launches in this industry have had more to do with short-term financial pressures than development teams believing buggy products are ready for paying customers. This is being penny-wise and pound-foolish; more on that in the later section entitled "Patience."

Time and Numbers

Until you've done it once, it can be difficult to grasp the concept that some bugs happen at 2,000 simultaneous players, but not at 1,000, and vice versa, or just how many network and server-related problems can be revealed simply by scaling up from 500 to 1,000 simultaneous testers. Teams also tend to greatly underestimate how long it will take to fix and retest bugs, balance problems, and flawed mechanics and systems.

In addition, you may find that your game and social mechanics, balancing, non-player character (NPC) and monster populations, and overall world size and design work quite well at 500 simultaneous players at the end of the Alpha test series, but become completely inadequate for 1,000 simultaneous players. For an industry that designs server cluster technology to hold 2,000–3,000 simultaneous players on average, this can be terrifying.

It all starts with a proper test plan that emphasizes enough time to actually do the job correctly, notes specific testing targets for each phase (what systems, mechanics and/or load are to be tested), allots the time to truly fix each major test build before moving on to the next, and scales up the simultaneous player numbers far beyond the norm.

As you can see from Table 9.1, you shouldn't be skimpy on planning test phase durations. It is a good idea to schedule longer periods than you believe you need; you can always cut back the duration if things go well, but adding more test time tends to upset everyone from management right through to the players. As a general rule of thumb, a test period that lasts two weeks will take at least another two weeks to debug and retest. At an absolute minimum, the Alpha and Beta tests should be planned at no less than six months; for a major, highly anticipated game, they should be planned to last at least one year.
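The rule of thumb above is easy to turn into a quick planning check. The following is only a minimal sketch: the phase names and durations are made up for illustration (they are not the figures from Table 9.1), and `TestPhase` and `schedule_weeks` are hypothetical helpers. It budgets each phase as its test time plus an equal amount of fix-and-retest time, then compares the total against the six-month floor.

```python
from dataclasses import dataclass


@dataclass
class TestPhase:
    name: str
    test_weeks: int


def schedule_weeks(phases, fix_ratio=1.0):
    """Budget each phase as test time plus fix_ratio * test time for debug and retest.

    Returns a list of (phase name, budgeted weeks) and the overall total in weeks.
    """
    rows = [(p.name, p.test_weeks + p.test_weeks * fix_ratio) for p in phases]
    total = sum(weeks for _, weeks in rows)
    return rows, total


# Hypothetical phase list; durations are illustrative only.
phases = [
    TestPhase("Alpha: core systems", 4),
    TestPhase("Alpha: load and stress", 2),
    TestPhase("Closed Beta", 6),
    TestPhase("Open Beta", 4),
]

MIN_WEEKS = 26  # roughly six months, the stated absolute minimum

rows, total = schedule_weeks(phases)
for name, weeks in rows:
    print(f"{name}: {weeks:.0f} weeks")
print(f"Total: {total:.0f} weeks; meets six-month minimum: {total >= MIN_WEEKS}")
```

Raising `fix_ratio` above 1.0 is a simple way to model the "at least" in the rule of thumb; a plan that only clears the minimum at a ratio of 1.0 has no slack.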

Table 9.1. Testing Time and Numbers

This table represents an idealistic progression of a one-year test period for a highly anticipated PW. "QA dept." represents the QA testers, "all hands" is the entire company, if the development team is part of a larger organization, and "PT" stands for player-testers, or outside players who are brought in to assist at various stages. The various phases represent major systems and stress loads to be tested. The player-tester numbers represent total testers in the program, not simultaneous testers.

[Image: Table 9.1, Testing Time and Numbers]

Internal, Closed Testing

What Jeff Anderson refers to in the quote at the start of this section is what you should be doing for pretty much all your milestones: build, test, fix, repeat until completely fixed, new build, repeat until you hate your life. The testing will be easy for early milestones, will get somewhat elusive and nonlinear for middle milestones as the code and content become more complicated, and will smooth out again approaching Alpha.

This kind of testing takes place within the team and QA group, with the occasional all-hands testing involving the company as a whole. No one really expects to find all the hidden bugs or flaws, although you will find many of them. (Some bugs won't show up until hundreds of people interact with the code and each other simultaneously.) The purpose is to sterilize the apparent bugs and anomalies from the build before complicating matters with new content and code.

Most teams have some sort of procedure in place for this, however informal; where the teams often fail is in not formalizing the procedure and completing the testing before moving on.

Alpha Testing

Alpha testing is your first attempt to start really testing distinct systems, such as the combat system or magic, for balance, utility, and functionality. You'll also be testing how systems work together to eliminate conflicts and fix flaws in design or balance. The purpose of the Alpha test is to move the game to a "feature-complete" status in preparation for the Beta phase. By the time you're finished with Alpha testing, you should have in hand a game that you believe is feature-complete and ready to be fully played by outsiders, with no more features or content to be added, and which has had some stress and load testing.

If you are going to add features or systems to the design, the Alpha test phase should be considered the last chance to do so, and the attitude and work should be aimed more at completing the set design than thinking up new bells and whistles to go in. There is always the chance someone forgot a critical feature or tweak, however, so the team managers shouldn't necessarily close their minds completely to the possibility.

In general, early Alpha tests should include inside testers, not outside testers. In the past, Alpha testing, or the first rounds of major tests, took the place of the closed Beta, with a few interested players invited to pound on certain mechanics, systems, or features, while other systems, features, and mechanics were still being finished for testing.

While it is enticing and sometimes necessary to invite potential players in to an early Alpha test, the primary purpose is to test and debug distinct systems, mechanics, and features. Players, on the other hand, are notorious for playing builds, not testing builds. It takes careful selection of outside testers and even more careful management of them to ensure that meaningful testing is done. Left to themselves, outside testers will play the game to the extent possible and ignore such niceties as bug reports. This may be acceptable when it comes to open load/stress testing; it is not acceptable when the game isn't feature-complete.

At some point in the Alpha phase or later in the Beta phase, however, you will want to scale up the simultaneous player load, and this will require you to bring a select group of outsiders into the fold. Most teams accept applications for testers online and sift through the thousands they receive to choose 50 or 100 who seem to understand the process. Actually, finding outside volunteers experienced at game testing isn't difficult; managing them and getting them to actually report bugs can be. It is helpful to have someone on your QA team as the main point of contact for test volunteers, tell him/her specifically what needs to be tested, and charge him/her with monitoring and compiling the reports. This person should be utterly ruthless about dropping testers who aren't making reports. Regardless of how well your team sifted through applications to get what appeared to be a worthwhile group of 50–100 testers, you'll probably find that some of them are just taking up server space and bandwidth for the notoriety of being in the test and leaking information to their buddies. They are expendable and replaceable. You are selecting a jury of sorts, so you should select alternates in case some of your jurors develop human traits, like laziness, the inability to keep a bargain, and so forth.
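The "jury with alternates" idea can be kept honest with even a very small script. What follows is a rough sketch under stated assumptions: `prune_testers` is a hypothetical helper, the tester names are invented, and the threshold of one report per review period is an arbitrary choice your QA contact would tune. It drops anyone below the threshold and fills the empty seats from the alternates list, in order.

```python
def prune_testers(active, alternates, report_counts, min_reports=1):
    """Drop testers who filed fewer than min_reports reports; promote alternates in order.

    active, alternates: lists of tester names.
    report_counts: dict of tester name -> reports filed this review period.
    Returns (new roster, dropped testers, remaining alternates).
    """
    kept = [t for t in active if report_counts.get(t, 0) >= min_reports]
    dropped = [t for t in active if report_counts.get(t, 0) < min_reports]
    promoted = alternates[:len(dropped)]
    remaining = alternates[len(dropped):]
    return kept + promoted, dropped, remaining


# Hypothetical data for one review period.
active = ["ann", "bob", "cy"]
alternates = ["dee", "eli", "fay"]
report_counts = {"ann": 3, "cy": 0}  # bob filed nothing and is absent from the log

roster, dropped, alternates_left = prune_testers(active, alternates, report_counts)
print("Roster:", roster)
print("Dropped:", dropped)
print("Alternates left:", alternates_left)
```

If the alternates list runs out, the roster simply shrinks, which is itself a signal to go back to the application pile.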

Finally, you'll want to make sure that at least minimal load and stress-testing is done to ensure that the simultaneous user numbers don't break your network code or the game's mechanics. You'll want a minimum of 200 simultaneous testers, as anything less won't put a serious strain on the servers. This might be a good time to pull out some of those old tester applications, pick another 300–500 for Beta, and invite them in to the last Alpha stress and load tests as a bonus.

Beta Testing

Beta testing is not a design phase! Burn that phrase into your brain.

Beta testing should mean: "We're feature-complete, there is no system or feature in the game or technical design document left to add, and no more original design work will be done on this game until after launch, period. Now we're going to find the bugs and flaws that we missed in Alpha, fix them, scale up the load, and then do it all over again until we have a stable, balanced game fit for paying customers."

That's what Beta should be; what actually happens is that most online games use the Beta test process to finish the execution of features and systems that were part of the final design, add entirely new features and systems, and make major code changes. In fact, most online games are finishing up the game design document's features and systems right up to launch day, which is why so many of them launch with balance problems, technical instabilities, and major bugs. Every feature, system, or other type of game element requires extensive testing; trying to add things at the last minute guarantees that they won't be tested properly and won't work right. In this context, there is no such thing as a small feature or change; anything you do is liable to affect one or more other moving parts in ways you never dreamed possible. If you've burned the opening phrase into your brain, you won't have as many problems during Beta as other games have had. It is a simple rule; enforcing it will save splitting headaches in both the short and long runs.

Moreover, understand that this is not as short a process as most people think. The Alpha test may have gone swimmingly, but adding numbers tends to break things. In fact, that is the whole reason to scale up numbers and stress-test; break things, fix, scale up, break them again, fix again, repeat as necessary.

Closed Beta

Before starting Beta, you're supposed to be feature-complete. That means everything you expect the players to have access to at launch is in the game and tested through Alpha. Don't make the mistake of rationalizing the meaning of "feature-complete" as, "We're almost there, except for combat and that magic stuff" just to get into the Beta phase. The time to add (or cut) features is before Beta begins.

Now is when you'll really start ramping up the number of testers available, tracking the simultaneous user load, and gearing up some test sessions specifically to get as many people as possible online, until the technology breaks. The process of "scale up, stress, break, fix, and repeat" is a vital one if your servers are going to be stable for launch, and it is one that you can't skimp on. Too many teams make the mistake of thinking that load stressing and balancing can be completed at the end of the process, just before launch, instead of stabilizing at each step before proceeding. This kind of deferred maintenance will compound the problems and workload you'll face at the end of the process, when you should be making a final polish before launch.

Although you've just come through the Alpha test and the game is feature-complete, don't get complacent. With more people involved in the testing process, this is where the bugs you didn't find in Alpha will start to show up and any flaw in your network, mechanics, or overall design that slipped through previous tests will become painfully obvious.

The first objective of the closed Beta phase should be to scale up the tester numbers to equal the maximum planned simultaneous player load for one or two server clusters/world iterations. This is the first highly critical chokepoint of the Beta; you'll find out if your technology can handle the physical load and if your design is balanced enough and has enough flexibility to keep the players occupied when a server cluster is full. You'll also be looking for exploit holes you missed in Alpha. As Kathy Schoback, Director of External Development and Publishing, Sega of America, put it to us in an interview:

Beta testing can reveal many detrimental player behaviors before launch.

For example, NFL 2K2 for Dreamcast was widely hailed as extremely player friendly, and the experience of playing against a real person completely exceeded the artificial intelligence (AI) experience, that is, until players began to realize that they could just "hang up" on their opponent when their beloved Raiders were getting whupped 49–0 in the first quarter. This could have been identified by extensive and truly "public" Beta testing, and rectified by more aggressive community management.

Your community management team will probably be the first to spot these kinds of activities, either through postings on tester forums or through direct observation. The problem is not so much in identifying them as it is in fixing them; some, like the problem noted by Kathy Schoback, are tough to solve. How do you tell the difference between a random disconnection and someone bailing out to keep from losing? If you can't, what kind of stop-gaps or penalties can you reasonably program that won't unduly penalize the poor person who was randomly disconnected? And this is only one problem of many you'll encounter.
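There is no clean answer to those questions, but one stop-gap is to look at patterns rather than single events: a player with a flaky connection drops out whether winning or losing, while a "hang-up" artist drops out mostly when behind. The sketch below is an illustration only, not how any shipped game handled it. The function name, the match-log format, and every threshold (`min_matches`, `ratio`, the 0.05 floor) are assumptions to be tuned against real Beta data.

```python
from collections import defaultdict


def suspicious_quitters(matches, min_matches=10, ratio=3.0):
    """Flag players who disconnect far more often when losing than otherwise.

    matches: iterable of (player, was_losing, disconnected) tuples from match logs.
    A player is flagged when the disconnect rate while losing is at least
    `ratio` times the disconnect rate in other games.
    """
    # Per player: [disconnects, games] for losing games and for all other games.
    stats = defaultdict(lambda: {"lose": [0, 0], "other": [0, 0]})
    for player, was_losing, disconnected in matches:
        bucket = stats[player]["lose" if was_losing else "other"]
        bucket[0] += int(disconnected)
        bucket[1] += 1

    flagged = []
    for player, s in stats.items():
        games = s["lose"][1] + s["other"][1]
        if games < min_matches or s["lose"][1] == 0 or s["other"][1] == 0:
            continue  # not enough evidence either way
        lose_rate = s["lose"][0] / s["lose"][1]
        other_rate = s["other"][0] / s["other"][1]
        # Floor the baseline so a zero rate in other games does not drop the threshold to zero.
        if lose_rate >= ratio * max(other_rate, 0.05):
            flagged.append(player)
    return sorted(flagged)


# Hypothetical log: "quitter" always drops when losing; "flaky" drops at random.
log = (
    [("quitter", True, True)] * 5 + [("quitter", False, False)] * 5
    + [("flaky", True, True)] + [("flaky", True, False)] * 4
    + [("flaky", False, True)] + [("flaky", False, False)] * 4
)
print(suspicious_quitters(log))
```

A flag from a heuristic like this is best treated as a reason for community management to take a closer look, not as grounds for an automatic penalty, since the poor person with a bad connection and a losing streak will occasionally trip it.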

There will also be problems with both technology scaling and game design. Beyond just finding bugs, fixing them, and retesting, you will be fine-tuning class, skill, and mechanics balancing issues as flaws become apparent and retesting those, too, and they can be far more difficult to fix.

The final objective of the closed Beta phase is to stabilize the technology and design to the point that they work consistently and well. This means no known but unfixed client or server crash bugs ("We'll get it in the next phase, really; time to move on now, we're wwaaaalllking, we're wwaaaalllking"), a stable login server, and mechanics and gameplay that are as balanced as you can reasonably make them.

Security Issues

"Players love to cheat, especially in online games. Based on this information, it is important to properly Beta-test online games before they go out to market; this extra exposure will give you valuable feedback and may also help identify items that you may not have anticipated. Also be ready to add server-side support to prevent user cheating with methods that you were not able to predict."

- Scott Hawkins, consultant for Sega of America

Some players are just lame and like to cheat for bragging rights. It may seem weird that a significant portion of the player base is willing to do anything to win, but that's the reality of the situation. Worse, some of the outside testers you bring in for this phase are there for one reason only: to find bugs and exploits and not report them. What they are hoping is that no one else finds them and they can reserve them for their own use after the game goes live to the public. In these days, when players can auction off characters and items for hundreds or thousands of dollars, there is great incentive to use bugs and exploits to cut the time necessary for character development or object acquisition to improve cash flow from sales.

This has worked in the past because developers haven't understood the need for monitoring, tracking, and logging tools to identify and stop this activity. From experience, the players willing to indulge in this kind of thing understand that their chances of being caught are slim because most online games have no or only rudimentary, hard-to-use tools and because developers initially place little emphasis on finding and stopping the activity.

To save yourself a mountain of headaches later, take the time to build the tools, assign one or more people to be "security specialists," and use the testing period to debug and refine the tools, the process, and the people. See the following sections for more detail.
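Even a crude tool beats no tool. As a rough illustration of the kind of check a security specialist might run over server logs during testing, the sketch below flags characters whose earning rate is wildly out of line with everyone else's. The function name, the "gold per hour" metric, the sample data, and the tenfold threshold are all assumptions made for the example; a real game would log whatever resources matter in its own economy.

```python
from statistics import median


def flag_outliers(gold_per_hour, multiple=10.0):
    """Return characters earning at least `multiple` times the median rate.

    gold_per_hour: dict of character name -> gold earned per hour played,
    as computed from server-side logs.
    """
    if not gold_per_hour:
        return []
    typical = median(gold_per_hour.values())
    if typical <= 0:
        return []  # no meaningful baseline to compare against
    return sorted(c for c, rate in gold_per_hour.items() if rate >= multiple * typical)


# Hypothetical rates pulled from one day of test-server logs.
rates = {"Aldra": 100, "Borin": 120, "Cyth": 90, "Dupe_King": 5000}
print(flag_outliers(rates))
```

A flagged character is only a lead. The value of running this during the test period is that the tool, the thresholds, and the people reading the output all get debugged before there is real money on the line.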

Open Beta

By this time, if you've been diligent about the test, fix, retest procedure, your technology is probably fairly stable up to 2,500–5,000 simultaneous players and the feature set is working adequately. Now is the time to really put stress on the product by adding several thousand more testers to the mix.

Stability is the goal here; the games that failed the launch process in 2001 lost the bulk of their customers due to unstable technology. There were certainly content and feature problems, but most players are willing to work through those kinds of problems, for a while at least. What they have little patience for is not being able to log in, getting disconnected constantly, server-side latency, and bugs or crashes that interrupt gameplay and/or negate hours of "work." These things interrupt immersion and socializing and dilute the very permanence of character that helps attract and retain players for the game.

To reach that goal, you're going to need to load the servers with far more bodies than you have previously. Yes, you'll still be finding bugs and doing some rebalancing, but these activities should be minimal if closed Beta was successful. The real objective is to constantly "break" the technology, fix it, add more warm bodies, break the technology again, fix it again, add more bodies. Repeat this process until you can not only clearly demonstrate that the technology can handle the simultaneous player loads you expect at launch, but also feel confident that it will be capable of handling the growth of the first three to six months post-launch.

Stressing Out

The goal here is numbers; recruit as many testers as you can get for your servers and aim for as much testing up-time as possible between fixes. If you refer back to Table 9.1, you'll note that the number of outside testers for the first round of open Beta takes a huge leap from 2,500–5,000 to 10,000. This may strike you as a difficult number to achieve, but it won't be; these days, even relatively unknown MMOs receive Beta applications in the tens of thousands. The real problem will be in deciding which ones to allow in. Since you'll be looking less for formal bug reports from these people and more just to get their warm bodies into the game, the easiest method is just to select the additional 5,000 at random and email them sign-up codes for the test.
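Selecting at random and issuing codes takes only a few lines. The sketch below assumes the applications are already available as a list of email addresses; `pick_testers`, the address format, and the 16-character code length are all hypothetical choices for illustration, and actually sending the email is left out.

```python
import random
import secrets


def pick_testers(applicants, count, seed=None):
    """Pick `count` applicants at random and assign each a sign-up code.

    applicants: list of unique email addresses from the application pool.
    seed: optional, only to make the draw repeatable for auditing.
    Returns a dict of email -> sign-up code.
    """
    rng = random.Random(seed)
    chosen = rng.sample(applicants, min(count, len(applicants)))
    # token_hex(8) yields 16 hex characters that are hard to guess.
    return {email: secrets.token_hex(8) for email in chosen}


# Hypothetical application pool of 20,000 addresses.
applicants = [f"player{i}@example.com" for i in range(20000)]
invites = pick_testers(applicants, 5000, seed=1)
print(len(invites), "invitations ready")
```

The seed controls only which applicants are drawn. The codes come from the `secrets` module so that they cannot be predicted from the seed, which matters once sign-up codes start getting traded among players.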

Now it is time to repeat the closed Beta process of test, break, fix, retest, repeat as necessary. The only significant difference to the process is that you'll be adding testers in numbers you probably didn't consider before starting development.

What you're liable to experience will be scary; your technology will break down constantly. If it isn't the login servers choking on the number of simultaneous players trying to get into the game, it'll be one or more servers in a cluster crashing unexpectedly because too many players tried to crowd on to one physical machine, or all the clusters going down because the database tried to do too much, too quickly, and choked on its own output, or a zombie process eating up the system memory until a server grinds to a halt, or, well, you get the idea.

It may be scary, as well as incredibly frustrating, and you may take some very vocal beatings from testers carping in public about your "bad" technology. (Warning: This does tend to cause executives to get nervous and worry about the company's reputation.) However, problems like these are exactly what you want in this phase. If you don't find them now, imagine how bad a public relations beating you'll take after launch, when they will surely show up (probably during a four-day holiday weekend, with the five critical people you need to fix the problems out of cell phone range in the Alaskan wilderness, and with three-quarters of the player relations staff under quarantine from a freak outbreak of some unknown virus).

There will be incredible pressure from executives to fix things now, which translates as, "I don't care if you have to use chewing gum and chicken wire; just stop the screaming!" If the technology is particularly unstable under load and the volume of complaints and nasty comments from players rises to unbelievable levels, you can expect morale on the team to plummet. In other words, things are going to get hairy.

Patience

All of which brings us to patience. With all that pressure and screaming going on, there is going to be the temptation to rush through the process with stop-gap fixes that stem the problem but don't actually provide a permanent solution. If the pressure gets too high, you may be ordered to do it by some higher-up with no foresight but tender sensibilities and a highly evolved "fight or flight" survival instinct. This may solve the immediate problem, but it's like using duct tape on a leaky pipe instead of welding the hole closed; there will be a constant dripping until the duct tape is thoroughly soaked, the gum fails, and the water explodes out again. Better to bite the bullet, cultivate patience, and provide the right repair the first time.

Doing a fix when you should have performed a repair postpones the problem until the worst possible time: when paying customers are coming in the door. The idea is to find and solve problems so the launch is as smooth and trouble-free as you can make it. A stable, trouble-free launch is the single best method you have of creating good word of mouth and getting a lot of customers in the door quickly.

Some people in the company may not understand this, however, especially if this is the company's first PW and their first real exposure to just how vocal (and downright mean and nasty) online gamers can be. If they haven't been properly prepared beforehand for how superficially ugly the situation can get, that first exposure can be a real shocker. The lesson to learn here is: Patience begins with preparing everyone before the worst happens.

That means the expectations of everyone involved, from senior management to the players, have to be properly managed throughout the development and test process. Proper management of expectations for this means talking about them long before you get to the open Beta phase:

The producer(s) needs to brief senior management and the company as a whole on the process and expectations, beginning from the very first day all the way through the Beta process, including the possible reactions from the players when you start stressing the technology.
The team leaders have to educate their people during development on what the Beta process is all about and why it is a good thing that the technology is being made to break so much.
QA and community relations have to continually brief the players involved in open Beta that one of the specific goals of the phase is to continually break the technology, and encourage them to leap into that wholeheartedly.
Community relations and public relations have to use the web site and interviews to brief the press and general public on the purpose of open Beta testing and why it is a good thing that the game is breaking.

Why do all this? Without this kind of management of expectations on the front end, a development team can expect their lives to be a living hell all through the open Beta test phase. Unless you're a card-carrying sadomasochist with a desire for gratuitous pain, why would you want to go through that?