Source Code Repositories and Version Control

When you compare game development with other kinds of software development projects, what really stands out is the sheer number of parts that are required. Typically, hundreds of source code files of every description are used. Every sound effect comes from a source WAV or MP3. Every texture has a source PSD or TGA and a companion JPG or BMP after it's been compressed. Every model has a MAX file (if you use 3D Studio) and has multiple source textures. You might also have HTML files for online help or strategy guides. The list goes on and on. Even small games have hundreds, if not thousands, of individual files that all have to be created, checked, fixed, rechecked, tracked, and installed into the game.

Back in the old days the source files for a big project were typically spread all over the place. Some files were stored on a network (if you knew where to look) but most were scattered in various places on different local desktop computers, never to be seen again after the project finished. Unfortunately, these files were frequently lost or destroyed while the project was in production. The artist or programmer would have to grudgingly recreate their work, a hateful task at best.

A Tale from the Pixel Mines

When I first arrived at Origin Systems I noticed some odd labels taped to people's monitors. One said, "The Flame of the Map" and another "The Flame of Conversation." I thought these phrases were Origin's version of Employee of the Month but I was wrong. This was source control in the days of SneakerNet, when Origin didn't have a LAN. If someone wanted to work on something, they physically walked to the machine that was the "Flame of Such and Such" and copied the relevant files onto a floppy disk, stole the flame label, and went back to their machine. Then they became the "Flame." When a build was assembled for QA, everyone carried their floppy disks to the build computer and copied all the flames to one place. Believe it or not, this system worked just fine.

Source control management is a common process used by programmers throughout the game industry. Team programming is simply too hard and too risky to manage without it. Source control can also track media assets such as sound and art, but non-programmers find source control systems unwieldy and will complain bitterly until they get a better solution.

Most companies choose to track these bits and pieces with the help of a database. The tool you use doesn't have to be anything more than an Excel spreadsheet to keep a list of each file, who touched it last, what's in the file, and why it is important to your game. If you're really cool you might even write a little PHP/MySQL portal site and put a complete content management intranet up on your local network to track files.
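To make the spreadsheet idea concrete, here is a minimal sketch in Python that writes such a list as a CSV file Excel can open. The column names, the AssetRecord class, and the helper functions are all hypothetical; they simply mirror the four things mentioned above (the file, who touched it last, what's in it, and why it matters).

```python
import csv
from dataclasses import dataclass, asdict, fields


@dataclass
class AssetRecord:
    """One row of the tracking sheet (hypothetical column names)."""
    path: str             # where the file lives
    last_touched_by: str  # who touched it last
    description: str      # what's in the file
    purpose: str          # why it is important to the game


def write_asset_sheet(records, stream):
    """Write the records to an open text stream as CSV with a header row."""
    columns = [f.name for f in fields(AssetRecord)]
    writer = csv.DictWriter(stream, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))


def assets_touched_by(records, person):
    """Return the paths of every asset last touched by the given person."""
    return [r.path for r in records if r.last_touched_by == person]
```

Calling write_asset_sheet with a file opened for writing (use newline="" when opening it, as the csv module recommends) produces a sheet anyone on the team can sort and filter. A real tracking database would want more than this, but even four columns beat files scattered across desktops.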

To help you put your own version control process in place, I'll introduce you to some of the more popular version control tools that professional game developers use in their practice. I'll also tell you which ones to avoid. Of course, keep in mind that there is no perfect, one-size-fits-all tool or solution. The important thing is that you put some type of process together and that you do it at the beginning of any project.

AlienBrain from NXN

For those of you with really serious asset tracking problems and a good budget to blow, there's a pretty good solution out there that will track your source code and other assets: AlienBrain from NXN. NXN has a client list of over sixty companies that looks like a who's who of the PC and console game industry. Their software integrates with nearly every programming and artist tool out there: CodeWarrior, Visual C++, 3D Studio Max, Maya, Photoshop, and many others. An example of AlienBrain is shown in Figure 4.1.

Figure 4.1: AlienBrain from NXN.

The downside to AlienBrain is that it's expensive; however, the company, to its credit, has been bringing the price down. The way I view the cost issue is that if a tool like this saves each person on your team ten hours or more of work by keeping good assets from getting wiped out, it will easily pay for itself.

Programmers and "build gurus" will like the fact that AlienBrain has sophisticated branching and pinning mechanisms just like the more advanced source code repositories on the market. (I'll discuss the importance of branching a little later in this chapter.) Artists and other contributors will actually use this product, unlike others that are mainly designed to integrate well with Visual Studio and not creative applications such as Photoshop and 3D Studio Max. One of the big drawbacks to other products is their rather naive treatment of non-text files. AlienBrain was written with these files in mind.

Visual SourceSafe from Microsoft

Visual SourceSafe is the source repository distributed with Microsoft's Visual Studio and it is an excellent example of "You get what you pay for." What attracts most people to this product is an easy-to-use GUI and an extremely simple setup. You can be up and running on SourceSafe in ten minutes if you don't type slowly.

The biggest problem with SourceSafe is how it stores the source repository. If you dig a bit into the shared files where the repository is stored you'll find a data directory with a huge tree of files with odd names like AAAAAAAB.AAA and AAACCCAA.AAB. The contents of these files are clear text, or nearly, so this wacky naming scheme couldn't have been for security reasons. If anyone out there knows why they did it this way drop me an email; I'm completely stumped.

Each file stores a reverse delta of previous revisions of a file in the repository, that is, the differences needed to step back from a newer revision to an older one. Every revision of a file creates a new SourceSafe file with one of those wacky names. If you've been paying attention, you'll realize that many of these files will be tiny, since a source change can be as simple as a single character. On most file systems even a tiny file occupies at least one full allocation unit on disk, so thousands of them add up. The amount of network drive space taken up by SourceSafe is pretty unacceptable in my humble opinion.
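If the term "reverse delta" is new to you, a small Python sketch may help. It illustrates the general idea only and is not SourceSafe's actual file format, which I can't vouch for: the newest revision is kept whole, and each older revision is stored as the instructions needed to rebuild it from the newer one. The function names and the ("copy", ...) and ("lines", ...) tuples are my own invention for this example.

```python
from difflib import SequenceMatcher


def make_reverse_delta(new_lines, old_lines):
    """Describe how to rebuild old_lines from new_lines.

    Unchanged runs are stored as ("copy", start, end) index ranges into
    new_lines. Only lines that exist solely in the old revision are
    stored in full, as ("lines", [...]).
    """
    delta = []
    matcher = SequenceMatcher(a=new_lines, b=old_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": the old revision has its own lines here.
            delta.append(("lines", old_lines[j1:j2]))
        # "delete": lines found only in the new revision, nothing to store.
    return delta


def apply_reverse_delta(new_lines, delta):
    """Rebuild the older revision from the newer one plus its delta."""
    old_lines = []
    for op in delta:
        if op[0] == "copy":
            _, start, end = op
            old_lines.extend(new_lines[start:end])
        else:
            old_lines.extend(op[1])
    return old_lines
```

Notice how little the delta holds when only one line changed: a couple of index ranges and a single line of text. That is exactly why so many of the stored files end up so small.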

There's also a serious problem with speed. Even small projects grow to a few hundred files, and large projects can run to tens or even hundreds of thousands of files. Because SourceSafe uses the network directory structure to store its data, opening and closing all these files takes a long time, and programmers can wait forever while simply checking to see if they have the most recent files. SourceSafe also doesn't support branching in any practical sense (see my discussion on branching a little later): the only way to branch is to make a complete copy of the entire tree you are branching.

Forget attempting to access SourceSafe remotely. Searching thousands of files over a pokey internet connection is murder. Don't even try it. Finally, SourceSafe's file index database can break down, and even the little analyzer utility will throw up its hands and tell you to start over. I've finished projects under a corrupted database before; it just happened that the corruption was affecting a previous version of a file that I didn't need. I was lucky.

If I haven't convinced you to try something other than SourceSafe, let me just say it: Don't use SourceSafe. I've heard rumors that Microsoft doesn't use it either. I guess they don't eat their own dogfood, huh?

The Free Ones: CVS, RCS, ...

Those hard core folks in the Linux community will be happy to know I fully support the use of RCS, CVS, and the other free source code repositories out there. These tools are command line repositories, although the industrious web surfer can find GUI utilities and integration utilities for those of us who have forgotten what a command line looks like.

The great thing about these applications is that they have hundreds of thousands of users and an enormous following. If you have any problems using these tools or getting them set up (which is usually the most difficult part), you'll find lots of help quickly. Probably the most important thing about using these tools is that they are free. You can't argue with that.

Perforce by Perforce Software

If AlienBrain is just out of reach for you financially, you should get Perforce. I've used this product for years and it's never let me down. For any of you lucky enough to move from SourceSafe to Perforce, the first thing you'll notice is its speed. It's damn fast.

Perforce uses a client/server architecture and a Btrieve-based database for storing the repository. That architecture simply blows the pants off anything that uses the network directory hierarchy. More than storing the current status of each version of each file, it even stores the status of each file for everyone who has a client connection. That's why most SourceSafe slaves freak out when they use Perforce the first time; it's so fast they don't believe it's actually doing anything. Of course, this makes remote access as fast as it can possibly be.

Best Practice

Since Perforce "knows" the status of any file on your system, you have to be careful to inform it if you do anything to a file outside of the Perforce utilities, such as changing a file you don't have checked out or opened for edit. Perforce will assume you know what you are doing and happily ignore the change. SourceSafe actually does local date/time comparisons, so it will tell you that the local file is different from the network copy. That check comes at a horrible cost in speed.
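If you want a safety net for changes made behind the tool's back, the check itself is easy to sketch. The Python below is not how Perforce or SourceSafe works internally; it is a hypothetical stand-alone helper that records a content hash for every file when you sync, then later reports files whose contents changed even though they were never opened for edit. All function names here are made up for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return a SHA-256 hash of the file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def snapshot(root):
    """Map each file under root (as a relative path) to its content hash."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_outside_source_control(synced, current, opened_for_edit):
    """List files that differ from the synced snapshot but were never opened.

    synced and current are snapshots taken at sync time and now.
    Files that are new since the sync are reported too.
    """
    return sorted(
        name
        for name, digest in current.items()
        if name not in opened_for_edit and synced.get(name) != digest
    )
```

Taking a snapshot means reading every file in the tree, which is the same kind of price SourceSafe pays for its comparisons. Run something like this before a big submit, not every time you touch a file.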

Perforce has a nice GUI for anyone who doesn't want to use the command line. The GUI will perform about 95% of the tasks you ever need to perform, so you can leave the command line to someone who knows what they're doing.

The branching mechanisms are extremely efficient. If you make a branch from your main line of development to a test line, Perforce only keeps the deltas from branch to branch. Network space is saved, and remerging branches is also very fast.

You'll find almost as many third party tools that work with Perforce as with some of the free repositories. Free downloads are available including tools that perform graphical merges, C++ APIs, conversion tools from other products like SourceSafe and PVCS, and tons of others.

The Really Expensive Tools: StarTeam and ClearCase

You know these products are expensive when they don't even list the cost per seat on their web site. Instead, you have to send your contact information to a salesman, who is bound to call you more than once.

Starbase's StarTeam and Rational's ClearCase are used at "serious" software companies that have equally serious budgets. These tools have some fantastic features that make all the programmers raise their eyebrows and go, "ohhhhh..." One in particular I found was a branch merge utility that graphically depicted the changes from one branch to another. Nice.

The reality of it is that if you can afford these packages you can afford AlienBrain, which is used by all the serious game studios. If you've really got that much money to burn, buy AlienBrain for everyone on your team and take the rest of the money and give everyone bigger bonuses.

Using Source Control Branches

I freely admit that until just last year I didn't use branching. I also admit that I didn't really know what it was for, but it also wasn't my fault. I blame Microsoft. Their Visual SourceSafe tool is distributed with Visual Studio and many engineers use it without question. Microsoft engineers don't use it, and for good reason. Microsoft Office has hundreds of thousands of source files and many hundreds of engineers. SourceSafe was never designed to handle repositories of that size, and it doesn't have some critical features, especially branching.

Branching is a process where an entire source code repository is copied so that parallel development can proceed unhindered on both copies simultaneously. Sometimes the copies are merged back into one tree. It is equally possible that after being branched, the branched versions diverge entirely and are never merged. Why is branching so important? Branches of any code imply a fundamental difference in the lifecycle of that code. You might branch source code to create a new game. You might also branch source code to perform some heavy research. Sometimes a fundamental change, such as swapping out one rendering engine for another or coding a new object culling mechanism, is too dangerous to attempt in the main line of code. If you make a new branch, you'll wall off your main line and still get the benefits of source control for the risky work.

SourceSafe's branching mechanism makes a complete copy of the entire source tree. That's slow and fat. Most decent repositories keep track of only the deltas from branch to branch. This approach is much faster and it doesn't penalize you for branching the code.
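To see why tracking deltas makes branching cheap, consider this toy model in Python. It is only a sketch of the idea, not how any particular product stores its data, and the Branch class and its methods are hypothetical. A new branch starts out storing nothing of its own; it shares the frozen history of its parent and records only the files that change afterward.

```python
class Branch:
    """A line of development that stores only what changed since it was cut."""

    def __init__(self, name, base_layers=()):
        self.name = name
        self._base_layers = tuple(base_layers)  # shared history, never modified
        self._changes = {}                      # path -> contents changed here

    def write(self, path, contents):
        """Record a new version of a file on this branch only."""
        self._changes[path] = contents

    def read(self, path):
        """Return the newest version visible from this branch."""
        for layer in (self._changes, *reversed(self._base_layers)):
            if path in layer:
                return layer[path]
        raise KeyError(path)

    def branch(self, name):
        """Cut a new branch without copying any file contents."""
        # Freeze everything written so far; both lines share it by reference.
        self._base_layers = self._base_layers + (self._changes,)
        self._changes = {}
        return Branch(name, self._base_layers)

    def files_stored_here(self):
        """Count the files this branch has stored since its last freeze."""
        return len(self._changes)
```

Cutting a branch in this model costs the same whether the tree holds ten files or a hundred thousand, because nothing is copied. Compare that with duplicating the entire source tree and you can see why a delta-based repository doesn't penalize you for branching.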

Here are the branches I use and why:

Main: Normal development branch
Research: A "playground" branch where anything goes, including trashing it entirely
Publish: The branch submitted for milestone approval

The Research and Publish branches originate from the main branch. They may or may not be merged with the main branch, depending on what happens to the code. The Main branch supports the main development effort; most of the files in your project are changed in the main branch.

The Research branch supports experimental efforts. It's a great place to make some core API changes, swap in new middleware, or make any other crazy change without damaging the main line. The Publish branch is for milestone submissions or important demos. Programmers can code fast and furious in the main line while minor tweaks and bug fixes needed for milestone approval are tucked into the Publish branch.

A Tale from the Pixel Mines

Perhaps the best evidence for branching code can be found in how a team works under research and release scenarios. Consider a programming team about to reach a major milestone. The milestone is attached to a big chunk of cash, which is only paid out if the milestone is approved. The team is old fashioned and doesn't know anything about branching. Just before the build, the lead programmer runs around and makes everyone on the team promise not to check in any code while the build is compiling. Everyone promises to keep their work to themselves, but since the build takes a long time everyone continues to work on their own machines.

The build doesn't even compile the first time. One of the programmers forgot to check in a few files. They had already started working on other things, but it was easier to revert the changes and fix the compile errors than attempt to finish their work. The programmer loses an hour of work for that, which is a big penalty. Another programmer considers making a big change to the AI code, but stops short. The AI code might have a few bugs that might halt the milestone, and it's too risky to make major changes. The programmer heads home for the weekend, a little annoyed that the build blocked productive work.

The completed build is FTP'd to the publisher's test department. Just for fun let's assume they upload the build late on Friday night. The test team downloads the new build on Monday morning and finds some heinous problem and can't even begin to run through the milestone acceptance checklist. They get on the phone and call the team; they need a new build pronto.

The problem is tracked to the sound system. The programmer working on the sound system came in over the weekend and ripped out the kitchen because he knew it was buggy, but he's halfway through and it will take two or three days to finish it. The only way a new build can happen fast is by making a band-aid change to the source code that existed on Friday, and launching a new build. Without a branch, the programmer retrieves the build code by date and time, or perhaps by label if the team was smart enough to set one. He hacks together a build and hopes it works. He can't check in the change, so the build has to happen on his machine or by hand copying code to the build machine, with the version control disabled. After the build is approved, all the changes have to be merged back into the latest code without losing anything or, even worse, losing everything.

If you don't think this is that bad, you are probably working without branches and have trained yourself to enjoy this little hellish scenario. You never lose any code because you are careful, and so is everyone on your team. I used to think that way too, and I thought branches were too much trouble until I tried them myself.

From experiences like this, I've learned that there is a much better way to manage and build projects. Here's how the same milestone goes for a team that uses branches:

The lead programmer walks around and makes sure the team has all the milestone changes checked in. She goes to the build machine and launches a milestone build by double-clicking on an icon. The build fails the same way it did for the first team. Instead of a programmer reverting their latest work, the compile error is fixed on the build machine in the publish branch, leaving the main branch to be fixed later. No one loses any work. The AI programmer and sound programmer continue working, blissfully in the zone, and get loads of excellent code in the main branch.

The finished build is checked and sent to the publisher via the same FTP site on Friday night. When the phone call comes in Monday morning saying the build is hosed, the simple tweak is made in the publish branch and the build is rerun from there. Every change is in the code repository but in two different branches. The milestone is approved and the lead programmer launches another script to merge the changes in the publish branch back to the main line. The merge script detects a merge conflict in the sound code and they fix it. All the code is back in the main line.

Every change, both the tweaks to fix the build and the ongoing development, is in the source repository. No one loses a single minute of time or line of code. Now, which approach do you like better?
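The merge step in that story is less magical than it sounds. Here is a minimal Python sketch of a three-way merge done one whole file at a time, assuming each branch is just a dictionary of file names and contents. The merge_branch function and the sample file names are hypothetical, and real tools merge line by line within a file rather than taking whole files, but the decision logic is the same: compare both branches against the version they had in common when the branch was made.

```python
def merge_branch(base, main, publish):
    """Merge publish into main, using base as the common ancestor.

    Each argument maps a file path to its contents. Returns a tuple of
    (merged files, list of conflicting paths).
    """
    merged = {}
    conflicts = []
    for path in sorted(set(base) | set(main) | set(publish)):
        b, m, p = base.get(path), main.get(path), publish.get(path)
        if m == p:        # both branches agree, changed or not
            result = m
        elif p == b:      # only the main line changed this file
            result = m
        elif m == b:      # only the publish branch changed this file
            result = p
        else:             # both changed it differently: a human must decide
            conflicts.append(path)
            result = m    # keep main's version until the conflict is resolved
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts
```

Feed it the Friday snapshot as the base, the main line with the half-finished sound rewrite, and the publish branch with the band-aid fix, and it reports the sound file as the one conflict while carrying every other change across untouched. That is the whole trick: the tool does the bookkeeping and people only look at the files that truly collide.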