I've been in tons of arguments about which bugs get fixed and which get left behind. After a few years of this I've noticed a few trends that will perhaps save you from the same arguments. They may also save you and some of your programming buddies some time at the desk. Mind you, I'm not going to talk here about design issues, just bugs.
Best Practice
Crashes can be a nightmare, especially the ones that don't seem to have any reproducible steps. When your game crashes a lot, your customers will wonder why they paid hard-earned money for something that doesn't work right or, even worse, erases their hard drive. Here's a good rule of thumb: If your game crashes more than once every two to three hours of solid play, you should fix it. Beyond 5 to 6 hours of continuous play, things get pretty fuzzy. If it takes that long to find the bug, it will be extremely difficult for programmers and testers to verify the fix. Give that one a low priority. If the crash destroys anything more than the last saved game, you should fix it regardless of how often it happens. Do everything you can to find those reproducible steps to nail that bug. Have every tester and programmer spend an entire day on it and try to find them by brute force if that's all you have left.
Memory leaks can be the hardest or easiest bugs to find, depending on how you've engineered your memory management. The one certain thing is that you simply cannot tolerate any memory leaks in your game, period. This is especially true for console games, but PC games should set just as high a quality bar as far as leaking resources is concerned.
The reason memory leaks are insidious has a lot to do with how people really play your games. Casual players sometimes leave them up and running for long periods of time even when they aren't playing. Hardcore players will invest themselves in 36-hour marathon sessions without stopping. Either way your game will eventually run out of resources and melt down, even if there is only a small leak somewhere.
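To put a rough number on "a small leak," here is a minimal back-of-the-envelope sketch. Only the 36-hour session comes from the text; the leak size (64 bytes per frame) and the frame rate (60 frames per second) are hypothetical values chosen for illustration.

```python
# Illustrative numbers: only the 36-hour session comes from the text.
LEAK_BYTES_PER_FRAME = 64   # hypothetical: one small allocation never freed
FRAMES_PER_SECOND = 60      # hypothetical frame rate
SESSION_HOURS = 36          # the marathon session described above


def leaked_bytes(bytes_per_frame, fps, hours):
    """Total bytes leaked if the same amount leaks on every frame."""
    frames = fps * 3600 * hours
    return bytes_per_frame * frames


total = leaked_bytes(LEAK_BYTES_PER_FRAME, FRAMES_PER_SECOND, SESSION_HOURS)
print(f"{total / 2**20:.0f} MiB leaked after {SESSION_HOURS} hours")
```

Under these assumptions a leak too small to notice in a short test session grows to several hundred megabytes over a marathon session.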
Bugs that point to hardware incompatibility or unsupported operating systems can cut your sales and increase your returns. Sometimes it can be hard to tell if a certain video card, sound card, or particular flavor of Windows is important to your game's market. Do whatever you can to find out and try to make it work. Don't assume anything about these bugs; they can be trivial to fix or impossible depending on your architecture and some unwise choices you may have made in the past. When you get this kind of bug the first thing you should do is reproduce the bug on a different machine. Eliminate the possibility that the problem is driver related by installing the latest driver. Eliminate the chance that the card is a bad one by using a different card of the exact same make and model. Don't use a different operating system, though. Sometimes drivers differ from OS to OS. Once you've done all these things you can be sure you have a bona fide hardware issue on your hands.
Don't go about solving hardware problems alone. You can get help from the manufacturer. Call up the developer relations group (every hardware company has one) and tell them about your trouble. They'll usually ask you to ship a CD of your application to them, or they may have you run your application against a special set of hardware drivers that are used to find problems. Hardware companies are very motivated to help developers find problems, so get their help as soon as you've figured out you have a real problem.
Best Practice
DirectX developers can actually use the DirectX samples to identify operating system incompatibilities. DirectX is not 100% reliable under every possible operating system configuration (such as running under terminal services with a dual monitor setup on the remote machine). If the test team has found some strange issue with your game, try running a DirectX sample under the same configuration. Odds are the DirectX sample will fail in the same way. You can be pretty sure that whatever the issue is, it's not something you did and it will probably be impossible for you to fix.
The irony about hardware related bugs is that it takes you and your team forever to find out that the bug isn't your problem after all. Try your best to stay out of a never-ending bug hunt with these kinds of issues by eliminating your code as a culprit early on.
Bad performance can turn a fun game into an annoying nightmare. I'd almost rather have the game crash outright than suffer through a slideshow. At least the pain and suffering is time-limited! Performance issues are probably the hardest issues to fix, especially if you wait until every last piece of code has been written. Look at it this way: you won't speed your game up by adding features. Performance issues are difficult to fix because generally you are turning some naïve algorithm (just load all the data into memory, and everything will be fine!) into a more complex solution. I've never seen a performance enhancement that made the code simpler, unless you call cutting an entire subsystem a performance enhancement.
Establish your CPU budget early and keep your code to that budget. If you've implemented 60% of the subsystems and you are already down to your target framerate, you've got a serious problem, because the remaining 40% have nothing left to run on. Perhaps most people get into this kind of trouble because they don't have a CPU budget in place, or because no one was watching the shop. The bottom line is to catch performance problems early; they always take forever to fix.
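One way to keep someone "watching the shop" is to have the game check its own budget. The following is a minimal sketch, not a prescribed implementation: the class `FrameProfiler`, the subsystem names, the 30 frames-per-second target, and the per-subsystem millisecond budgets are all hypothetical.

```python
import time
from contextlib import contextmanager

TARGET_FPS = 30                        # hypothetical target framerate
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS  # about 33.3 ms per frame

# Hypothetical per-subsystem budgets in milliseconds. They deliberately sum
# to less than FRAME_BUDGET_MS to leave headroom for unwritten subsystems.
BUDGET_MS = {"render": 14.0, "physics": 6.0, "ai": 4.0, "audio": 2.0}


class FrameProfiler:
    """Accumulates time spent per subsystem during one frame."""

    def __init__(self, budget_ms):
        self.budget_ms = dict(budget_ms)
        self.spent_ms = {name: 0.0 for name in budget_ms}

    def record(self, name, elapsed_ms):
        """Add an already-measured duration to a subsystem's total."""
        self.spent_ms[name] += elapsed_ms

    @contextmanager
    def measure(self, name):
        """Time the enclosed block and charge it to the named subsystem."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, (time.perf_counter() - start) * 1000.0)

    def over_budget(self):
        """Return {subsystem: (spent_ms, budget_ms)} for every overrun."""
        return {
            name: (self.spent_ms[name], self.budget_ms[name])
            for name in self.budget_ms
            if self.spent_ms[name] > self.budget_ms[name]
        }


profiler = FrameProfiler(BUDGET_MS)
with profiler.measure("physics"):
    sum(range(10_000))  # stand-in for real subsystem work
for name, (spent, budget) in profiler.over_budget().items():
    print(f"{name} over budget: {spent:.2f} ms spent, {budget:.2f} ms allowed")
```

Reporting overruns per subsystem, rather than only the total frame time, makes it obvious which code blew the budget on the day it happens instead of months later.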
The last kind of bug that needs to get fixed at all costs is the embarrassing kind. Everyone on the team should be amenable to fixing these issues, usually because the fixes are low risk and they mean a lot to the team.
A Tale from the Pixel Mines
If anyone out there remembers Ultima VIII, you'll recall that it was the first Ultima to include jumping puzzles. I hate jumping puzzles, especially when you can't control where you land. Oh, and I should mention that the penalty for missing any jump was death! What a horrible design flaw (for which we were crucified by more than a few reviewers). By the time Ultima VIII was finishing up, everyone on the team had gotten pretty good at jumping, especially because you died when you missed, and almost no one was aware how hard it was for new players. It turned out that fixing the jumping bug took only a few hours; the fix was released in a patch a few months later.
There is definitely a class of bugs you shouldn't spend any more time on than it takes to get a good laugh out of them and mark them "Won't Fix" or "By Design." You have to be careful, though, and you should remember that some tester wrote each one up because they thought it was important enough to write. Still, there are some funny ones to be had in every bug database, and others that just belong on the trash heap. Sometimes testers are really game designers in disguise, and good ones at that. Some of them just think they are game designers. On the first Casino game I did, one of the testers wrote up a bug saying that the game needed a "Luck Slider" in the game options. That might be the action/adventure equivalent of unlimited ammo, but I wasn't buying it.
Test tools are fantastic for getting a lot of testing done in a hurry, but they can be used for evil purposes. Make sure that testers know that when they use test tools or cheats they can break the game, and it is a dumb idea to write up bugs caused in this manner.
Archaic hardware and operating systems are a great source for impossible bugs. Granted, it's good to know how things will fail on these older systems, but you shouldn't be responsible for this stuff unless somehow your target market still has this old garbage installed in their computers. If an old video or sound card driver is causing problems with your game, check the latest version. Clearly if the most recent driver works, put something in the README file and move on. Don't waste any time trying to find a workaround for an outdated driver.
Hallucinations are my favorite. Sometimes I wonder if the test team really is smoking better stuff than any human deserves. Anyway, take a look at this one:
Description: CTRL Z is not sensitive enough.
Steps to Repro:
Open any game.
Make one move.
Use CTRL Z to undo (quickly once, slowly once).
Compare that to using a very quick ALT ENTER to reduce screen size.
Results: CTRL Z takes a very deliberate keystroke to perform whereas ALT ENTER and the like can be a very quick keystroke combo.
Expected Results: Should be more sensitive along the vein of ALT ENTER and other such hotkeys.
I think it is just as likely that this tester spilled a little Dr. Pepper in his keyboard, and the Z key was a little sticky. Any engineer knows that it isn't the application that controls the sensitivity of the keyboard. What you want to watch out for here is assigning this bug to a junior programmer without reading it fully and having the poor sod spend a few days writing new keyboard drivers to "solve" the problem. I get nightmares just thinking about it. Watch out for strange bugs like these, and don't forget to point them out to junior team members. They won't know to throw these bugs back to the testers; it is more likely they'll want to fix everything that comes their way and try to make everyone happy. It's a shame that testers sometimes write these issues up, but hey, there is such a thing as junior testers, right? Senior testers would never write this stuff into the bug database.
That's another good reason to take a close look at the name of the tester who wrote the bug. Sometimes you'll catch things a little earlier if you know what, and who, you are looking for.
There's an interesting exercise that illustrates the cost of writing a single bug in terms of time and money. This example shows the shortest possible lifecycle of a bug, something that is fixable in a short time by a programmer. We'll put the economics under the spotlight and set the average cost of a developer or tester at $100,000 per year ($0.84/minute). I can already hear the laughing, so stop and do your own math. I like mine easy.
Tester sees the bug, checks the reproducible steps, and writes it up in the bug database: 2 minutes.
Team lead assigns the bug to a team member: 10 seconds.
Developer reads the bug and decides to fix it: 60 seconds.
Developer finds the code to make the fix, and happens to find it in an isolated section of code that can be changed without recompiling the entire application: 60 seconds.
Developer compiles, links, and tests the fix: 60 seconds.
Developer records the change in the bug database and checks in the source code: 60 seconds.
Team tester checks the bug fix before it is sent to the publisher: 60 seconds.
Publisher's tester checks the bug fix and closes the bug: 60 seconds.
Total time: 4 minutes for testing, 4 minutes 10 seconds for development. Total cost: $6.86
That may not seem like a lot of time, but it adds up. For every 100 issues that are this trivial, the team spends about 13.6 hours and $686. I could have a nice trip to a sandy beach for less than that.
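The arithmetic for the trivial bug can be re-added with a few lines of code. This is a minimal sketch: the step durations and the $0.84 per minute rate come from the list above, while the function names and the decision to count the team lead's 10 seconds as development time are assumptions made for illustration.

```python
RATE_PER_MINUTE = 0.84  # dollars per minute, the rounded rate quoted above

# (role, description, seconds) for each step of the trivial bug.
# Counting the team lead's time under "dev" is an assumption.
TRIVIAL_BUG = [
    ("test", "tester writes up the bug", 120),
    ("dev", "team lead assigns the bug", 10),
    ("dev", "developer reads the bug", 60),
    ("dev", "developer finds the code", 60),
    ("dev", "developer compiles, links, and tests", 60),
    ("dev", "developer records the change", 60),
    ("test", "team tester checks the fix", 60),
    ("test", "publisher's tester closes the bug", 60),
]


def seconds_by_role(steps):
    """Sum the step durations for each role."""
    totals = {}
    for role, _description, seconds in steps:
        totals[role] = totals.get(role, 0) + seconds
    return totals


def cost_in_dollars(steps, rate_per_minute=RATE_PER_MINUTE):
    """Cost of all steps at a flat per-minute rate, rounded to cents."""
    total_seconds = sum(seconds for _role, _description, seconds in steps)
    return round(total_seconds * rate_per_minute / 60, 2)


print(seconds_by_role(TRIVIAL_BUG))
print(f"one bug: ${cost_in_dollars(TRIVIAL_BUG):.2f}")
print(f"100 bugs: ${100 * cost_in_dollars(TRIVIAL_BUG):.2f}")
```

Keeping the steps as data makes it easy to plug in your own salary figures and step times and see how quickly the total moves.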
Let's look at something more typical, and throw a little human error in to boot:
Tester sees the bug, forgets to check the reproducible steps, and writes it up in the bug database: 1 minute.
Team lead assigns the bug to a team member: 10 seconds.
Developer realizes that the bug got assigned to the wrong place. She reassigns it back to the team lead: 25 seconds.
Team lead talks to a few developers and finds the right person to fix it: 15 minutes.
Developer reads the bug and decides to fix it: 60 seconds.
Developer finds that the bug can't be reproduced as the tester suggested, and tries a few other things to get the bug to happen, and fails: 5 minutes.
Developer assigns the bug to internal test to try to find repro steps: 10 seconds.
Internal test tries to reproduce the bug, and fails: 15 minutes.
Internal test reassigns the bug back to original tester, marked "Not Repro": 10 seconds.
Publisher's tester reproduces the bug instantly and bounces the bug back to the developer: 1 minute.
Developer gets the latest code, does a complete rebuild: 45 minutes.
Developer again attempts to reproduce the bug and fails: 30 minutes.
Developer calls the tester at the publisher and they try to find the problem: 45 minutes.
Developer FTPs the entire image of the tester's machine for on-site analysis: 20 minutes. (I'm assuming the developer doesn't just wait for the FTP.)
Developer reproduces the bug and finds that the save game file is corrupted: 15 minutes.
Developer records the findings in the bug database: 60 seconds.
Total time: 17 minutes 10 seconds for testing, 177 minutes 45 seconds for development. Total cost: $163.73
Clearly this is a more extreme case, and it does pick on the testing group. I could have just as easily come up with a case where development submitted a bad build causing all kinds of lost time, but that's a different subject.
What's so tragic about this case is that it caused nearly 3 hours of development time to be lost to a non-issue. If you have 100 issues like this on your project, you've just burned sixteen thousand dollars and change. That's enough money to take your whole team to a sandy beach. Mistakes like this can and will happen on every project. It is up to the entire team, development and test, to try their best to minimize these problems.
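To see where the time actually went in the longer example, it helps to rank the steps rather than just total them. The sketch below is illustrative only: the durations are taken from the step list above, the short descriptions are paraphrases, and the helper names are hypothetical.

```python
# (description, seconds) for each step of the longer example, in order.
STEPS = [
    ("tester writes up the bug without checking repro steps", 60),
    ("team lead assigns the bug", 10),
    ("developer reassigns the bug to the team lead", 25),
    ("team lead finds the right person", 15 * 60),
    ("developer reads the bug", 60),
    ("developer fails to reproduce the bug", 5 * 60),
    ("developer assigns the bug to internal test", 10),
    ("internal test fails to reproduce the bug", 15 * 60),
    ("internal test marks the bug Not Repro", 10),
    ("publisher's tester reproduces and bounces the bug", 60),
    ("developer does a complete rebuild", 45 * 60),
    ("developer fails to reproduce the bug again", 30 * 60),
    ("developer and tester work on the phone", 45 * 60),
    ("developer transfers the machine image", 20 * 60),
    ("developer finds the corrupted save game", 15 * 60),
    ("developer records the findings", 60),
]


def biggest_time_sinks(steps, count=3):
    """Return the `count` longest steps, longest first."""
    return sorted(steps, key=lambda step: step[1], reverse=True)[:count]


def share_of_total(steps, selected):
    """Fraction of the total time taken up by the selected steps."""
    total = sum(seconds for _description, seconds in steps)
    return sum(seconds for _description, seconds in selected) / total


top = biggest_time_sinks(STEPS)
for description, seconds in top:
    print(f"{seconds // 60:3d} min  {description}")
print(f"share of total time: {share_of_total(STEPS, top):.0%}")
```

With these step times, the full rebuild, the phone call, and the second failed repro attempt together account for a bit over 60% of the total, all of it downstream of the one minute saved by not checking the repro steps.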