Section 4.6. SCM Tools | Practical Development Environments

4.6. SCM Tools

The seven different SCM tools examined in this section are a mixture of closed and open source software. There are noticeably more usable SCM tools available than build tools (see Section 5.5), and there are certainly more tools available from commercial organizations.

What should you look for in an SCM tool? Beyond the basic saving and retrieving of different versions of files, I suggest, in order of importance:

Confidence in the integrity of your data
Fast and simple creation of tags, extraction of tagged files, and generation of diffs
Good support for branching and merging, ideally with both command-line and graphical interfaces
Integration with other existing tools such as bug tracking systems
A good web interface to let people browse the different versions of their files and also to search through earlier versions of the files
Good support from the tool vendor or the tool's community

Section 4.7, later in this chapter, summarizes the major differences between the tools discussed in this chapter.

4.6.1. CVS

CVS (http://www.cvshome.org) is by far the most commonly used open source SCM tool. The CIA project (http://cia.navi.cx), which tracks commits from hundreds of open source projects, shows that 70% of their commits come from projects using CVS. Many of the terms used by CVS, such as commit and check out, have become de facto terms used by other SCM tools. Other SCM tools such as Subversion and Arch are careful to provide a "Migration Guide for CVS Users" document and tools. CVS is licensed under the GPL.

The History of CVS

CVS, which is usually described as an acronym for Concurrent Versions System, was first released by Dick Grune in 1986 as a series of Unix shell script wrappers around RCS (Revision Control System). RCS is an even older SCM tool designed by Walter Tichy for single-user, single-machine, single-directory development environments. Tichy later went on to design another SCM tool named RCE, which stands for Revision Control Engine. By 1989, Brian Berliner had rewritten CVS in C, and Jeff Polk had made it scale better. CVS was originally intended to run on a single machine; later on, the concept of CVS clients and a CVS server were added by Jim Kingdon. CVS is an SCM tool that has evolved, rather than having an overall design.

CVS has been stable since around the end of the last century, but major releases occur about once per year, with around half a dozen minor releases per year. A number of the early contributors to CVS are now part of the Subversion project.

CVS is most commonly used over a network, with a single Unix or Windows-based server providing the repository, though some partial support for distributed servers was added with Version 1.12.10. Developers use CVS clients to check out a sandbox, which is their local working copy of the files under control of CVS. Different developers can check out the same files at the same time, since the C in CVS stands for concurrent. The opposite is true with SCM tools such as Visual SourceSafe, which let only one person at a time work on each file; this becomes a bottleneck even with medium-sized projects. After making changes, the files are checked in to the repository, along with some text comments about the changes. The first person to commit her changes forces the other developers to update their files before they can commit. CVS doesn't care how long you take between checkout and commit. CVS logs are available for each file, and these logs describe all the checkins for that file. CVS supports branches, tags, and also some basic assistance for merges.

The CVS project uses the GNU Autotools suite (see Section 5.5.3) to build executables for DEC Alpha, Cray, HP-UX, Solaris, GNU/Linux, FreeBSD, NetBSD, IRIX, OS/2, Windows, Mac OS X, and VMS, among others (see the file INSTALL in the CVS source for the complete list). The CVS source also includes an extensive set of unit tests known as the "sanity checks." CVSNT (http://www.cvsnt.org) is a well-established fork of CVS taken in 1999 by Tony Hoyle, originally to add native support for Windows NT to CVS, but the two products still interoperate well. Features that have been added to CVSNT include better support for Unicode, ACLs, and Windows authentication. WinCVS and MacCVS, which are popular GUIs for using CVS on Windows and Macintoshes, respectively, both use CVSNT under the covers.

For many years, the best documentation for CVS was "the Cederqvist," also known formally as "Version Management with CVS" (https://www.cvshome.org/docs/manual), an online manual written by Per Cederqvist that extends the manpage written by Roland Pesch and the FAQ maintained by David G. Grubbs. While the Cederqvist is still useful, and has even been published as a book by Network Theory (http://www.network-theory.co.uk), there are now a number of other good books about CVS. The best ones are Essential CVS, by Jennifer Vesperman (O'Reilly); Open Source Development with CVS, by Moshe Bar and Karl Fogel (Paraglyph), which is also available online at http://cvsbook.red-bean.com; and Pragmatic Version Control Using CVS, by Dave Thomas and Andy Hunt (Pragmatic Bookshelf). There are also numerous how-to documents and tutorials all over the Internet, with particularly good ones at http://en.wikipedia.org/wiki/Concurrent_Versions_System and http://www.devguy.com/fp/cfgmgmt/cvs.

The biggest strength of CVS is that many developers are already familiar with it. It does scale well with reasonably large projects (hundreds of users, thousands of files, millions of lines of code) and large file sizes (tens of megabytes), though the time to tag files increases linearly with the number of files and their sizes. CVS is simple to set up and maintain; most CVS servers have the longest uptimes of any machine in a company. It's secure against casual attacks, though it has been cracked in the past (see Section 4.5.5, earlier in this chapter).

Since CVS is both open source and mature, there are also dozens of separate tools to add extra functionality to CVS. A few of the most useful are:

ACLs: These allow you to control who can commit files, according to the user, the branch, and the directory name. The cvs_acls script from the contrib directory of the CVS source and the patches from http://cvsacl.sourceforge.net are examples of such add-ons.
Browsing CVS files: For web-based viewing of repositories, the Python-based ViewCVS interface (http://viewcvs.sourceforge.net) is excellent; it also supports browsing of Subversion repositories.
Graphical CVS clients: There are a number of graphical CVS clients in common use, and they all hide some of the details of the CVS command line. The oldest one is WinCVS (http://www.wincvs.org). TortoiseCVS (http://www.tortoisecvs.org) is well integrated with the Windows filesystem browser. My current favorite graphical CVS client is SmartCVS (http://www.smartcvs.com) because it runs on any platform with a JVM and provides all the add-ons of the other clients by default.
Commit email: The activitymail Perl script, available from https://activitymail.cvshome.org, has a large number of choices for sending email about commits. One particularly useful addition to email is to include links to a web-based view of the files' changes.
Change logs: The cvs2cl Perl script from http://www.red-bean.com/cvs2cl can generate change logs in HTML or XML. These change logs comply with the GNU standard for change logs, which is part of the coding standards at http://www.gnu.org/prep/standards/standards.html#Change-Logs. They can also act as a collection of "poor man's changesets" for CVS, and you can generate scripts to revert complete changesets or merge them to other branches.
Changesets: CVSps (http://www.cobite.com/cvsps) generates changesets from individual commits to a CVS repository.
Local changes: cvsdelta (http://directory.fsf.org/cvsdelta.html) creates summaries of what has changed locally in your sandbox.

Clients for CVS have been written in Java, Tcl, and C++. Most modern IDEs and many bug tracking systems have some level of integration with CVS. CVS is still the default SCM tool for many preconstructed environments, including SourceForge, which is probably the largest CVS user in the world. (The GNU project may have the largest single CVS repository.) Other products that tie all this extra information into one convenient web site for your project are the excellent FishEye (http://www.cenqua.com) and the open source CVS Monitor project (http://ali.as/devel/cvsmonitor).

The weaknesses of CVS in many ways reflect the fact that it evolved, rather than being designed as a whole. Interactions with a CVS server are atomic on only a per-directory basis, not per transaction. So if you update your local sandbox at the same time that another developer is checking in his changes, you may get only some of his changes. Alternatively, if something nasty happens to the CVS server during a commit, your commit may fail, with some files changed but with others unchanged. Try hitting Ctrl-C sometime during a CVS commit and then see which files were committed and which ones weren't. (Don't worry: another commit will catch the files that were missed by the first one.) When you create a tag, CVS doesn't let you record a message with a description of why the tag was created. Renaming a file causes a break in the recorded history of that file. Changing the name of a directory requires intervention in the repository by the CVS administrator and may not always be possible, so choose your directory names and hierarchy very carefully.

Living with branches and merging in CVS is somewhat of a headache, as described earlier in this chapter in Section 4.5.1 and Section 4.5.2; you should always tag CVS branches before merging from them. Using CVS to keep track of source code from a third party by importing it into your repository is a task to do with a clear head and a written set of notes in front of you. Be careful not to keep working in the directory that you just imported from; check out a fresh copy instead. Authentication, authorization, and accounting support in CVS is rather rudimentary, and there is no support for an internationalized version of the tool. CVS works best with text files but can handle binary files, albeit inefficiently (and don't forget to use cvs add -kb to disable keyword substitution, in order to avoid corrupting such nontext files). Once an RCS file in a CVS repository exceeds about 10 versions and 100MB on a server with 1GB RAM, you can expect to see slower checkouts of that file, especially if it is on a branch.
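Forgetting -kb is easy when adding many files at once. The following is a minimal sketch of a helper that prints a suggested cvs add command for each file. The function names is_binary and suggest_cvs_add are made up for illustration, and the test for "binary" (the file contains a NUL byte) is only a heuristic assumption, not something CVS itself does.

```sh
# Heuristic (an assumption): a file that contains a NUL byte is binary.
# Stripping NUL bytes changes such a file, so cmp reports a difference.
is_binary() {
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

# Print the "cvs add" command suited to each file; nothing is executed.
suggest_cvs_add() {
    for f in "$@"; do
        if is_binary "$f"; then
            printf 'cvs add -kb %s\n' "$f"
        else
            printf 'cvs add %s\n' "$f"
        fi
    done
}

# Example (not run here): suggest_cvs_add logo.png README
```

Review the printed commands before running them, since the heuristic can misjudge unusual text encodings.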

4.6.1.1. Making your life with CVS easier

This section contains a number of ideas that can make administering more complex installations of CVS easier:

Use modules

The name of what you ask CVS to check out for you is referred to as a module. The top-level directories in your repository are the default modules. The interesting thing about modules is how they can be used to collect different directories from the repository together into a single target for checking out. For instance, if there is a project in the directory projects/projectA and projectA also wants to use files from a directory named common/xml, then entries in the CVSROOT/modules administrative file such as:

    # The module named common refers to the top-level directory "common"
    common          common
    # The module named common_xml refers to the "xml" subdirectory
    # in "common" but it will be named src/xml when checked out
    common_xml      -d src/xml    common/xml
    # The module named projectA is a combination of the
    # projects/projectA directory and the common/xml directory
    projectA        projects/projectA &common_xml

will cause the command cvs co projectA to create a local subdirectory projectA with subdirectories src/xml and the directories from projectA. This kind of indirection is important because it can create different directory structures simply by defining new modules. Be warned, though, that you can't tell CVS to use one particular version of the modules file, so be careful not to change the module definitions that are needed for older releases of projects. Modules are an aspect of CVS that are often overlooked, perhaps because they seem complicated to configure, but understanding what you can do with them will make your life with CVS much easier.
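Since module definitions can change over time, it can help to check which modules a given copy of the modules file defines. The sketch below assumes that you have a checked-out copy of the file and that each definition sits on a single line (no backslash continuations); the function name list_modules is made up for illustration.

```sh
# List the module names defined in a CVS modules file.
# Assumption: one definition per line, no backslash continuations.
list_modules() {
    # Skip blank lines and comment lines; the module name is the first field.
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Example (not run here): list_modules CVSROOT/modules
```

For the three definitions shown earlier, this would print common, common_xml, and projectA, one per line.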

Avoid symbolic links

The temptation is so strong. You want to move a directory within the source tree and yet somehow preserve the change history of all its files. You know that just moving the directory in the repository will break your ability to go back in time, since CVS doesn't version directories, only files. But what if you moved the directory anyway and then created a symbolic link (a file that points to another file, also known as a soft link) from the old location to the new one? Yes, it works: developers will see the directory in both the old and new locations, and can commit files in either directory, though locking the directory may not work properly if you configure CVS to use LockDir to keep your locks elsewhere. But what about when the next directory move comes along three months from now? Then you'll have soft links to soft links, and so on. CVS does not keep track of different versions of soft links, so using soft links within a CVS repository always leads to extra work later on.

Sometimes the idea to use soft links arises from wanting to share a directory between two top-level directories without one group having to check out multiple modules. A better approach is to use alias and ampersand modules, as discussed in the previous item in this list.

Synchronize clocks

It's good practice, both for CVS and for build tools such as make, to synchronize the clocks on every machine that will use the tools. ntp is the most common synchronization client and server for Unix, and your local time server may well even be named something like ntp.example.com. Windows XP has its own synchronization client, and the Tardis tool works for all earlier versions of Windows.

Know which commands make immediate changes

After using CVS for a while, you may be lulled into believing that nothing you do in your sandbox can affect the rest of your team until you commit the changes. Wrong! CVS commands that modify the repository, apart from the tagging and branching ones (both the local and remote versions), include cvs add directory, which adds a directory immediately, and cvs import, which changes the head of the tree straightaway. (There is a -X argument with more recent versions of the import command to avoid this problem.) To make your life easier, pause to consider before using the tag, add directory, and import commands.

Save the output

When you are creating tags or branches with cvs tag, or merging versions with cvs update -j, or using cvs import, it's a good idea to save the lengthy output from these commands. Important information, such as existing tags not being moved or the names of files with merge conflicts in them, appears in the output and is not saved anywhere else. If you do lose the output from a command, you may be able to see which files have conflicts by running cvs -n update.
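One way to make saving the output a habit is a small wrapper that shows the output and appends it to a log file at the same time. This is only a sketch: the name run_logged and the log filenames are hypothetical, and the tag and branch names in the comments are placeholders.

```sh
# Run a command, showing its output and also appending it to a log file.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    log=$1
    shift
    {
        printf '### %s\n' "$*"   # record which command produced the output
        "$@" 2>&1                # capture warnings and errors as well
    } | tee -a "$log"
}

# Examples (not run here):
#   run_logged tag.log   cvs tag rel_1_0
#   run_logged merge.log cvs update -j rel_1_0_branch
```

Note that the exit status seen by the caller is that of tee, not of the wrapped command, so check the log itself for errors.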

Be careful with top-level directories

Since renaming directories and moving them around is hard to do well with CVS, some CVS administrators find it helpful to keep all project directories under a single top-level directory. When the time comes to change the directory structure of the project, they can create a new top-level directory and copy the subdirectories into that. One problem with this approach is that it's now more complicated to merge changes into both the old and new top-level directory structures. The neater approach to this problem is to define a module per project and then have the module refer to the directories that make up the project.

Some CVS administrators also find it convenient to make the top-level directory in their repository unwritable by people who aren't also CVS administrators, so that accidental imports don't leave their mistakes there. This does mean that new top-level directories have to be created by a CVS administrator.

Avoid keywords and strings that complicate merges

CVS has some convenient keywords such as $Date$ and $Id$ that are automatically expanded during commits to the current date or other information about the file. Unfortunately, when merging files from one branch to another, CVS does not treat the expanded versions of these variables as special, and merges can end up with hundreds of conflicts to be resolved by hand, where most of them are just changes in the date a file was modified. Many people avoid using these keywords and rely on cvs log for the same information. Still, the $Id$ keyword can be useful if you suspect that releases might escape without their source code being tagged.
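If expanded keywords are already scattered through your files, one workaround when comparing two copies by hand is to collapse the keywords first. The filter below is a sketch under that assumption: the name collapse_keywords is made up, and it handles only the Id and Date keywords mentioned above. CVS also has a -kk keyword substitution mode that can be worth trying on cvs update -j for the same purpose, though it should not be used on binary files.

```sh
# Collapse expanded keywords such as "$Id: foo.c,v 1.4 ... $" back to
# their unexpanded form so that they no longer show up as differences.
# Reads the named files, or standard input if none are given.
collapse_keywords() {
    sed -e 's/\$Id:[^$]*\$/$Id$/g' \
        -e 's/\$Date:[^$]*\$/$Date$/g' "$@"
}

# Example (not run here), comparing two sandboxes by hand:
#   collapse_keywords trunk/foo.c  > /tmp/trunk_foo.c
#   collapse_keywords branch/foo.c > /tmp/branch_foo.c
#   diff /tmp/trunk_foo.c /tmp/branch_foo.c
```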

Another tip to make merges easier is to avoid using the strings <<<<< and >>>>> in your files. These strings are inserted by CVS to mark conflicts in merged files.
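After a merge, it is worth checking that no conflict markers were left behind before committing. The following sketch lists files that contain them; it assumes the markers appear at the start of a line as seven < or > characters followed by a space, and the function name list_conflicted_files is hypothetical.

```sh
# List files under a directory (default: the current one) that contain
# lines starting with a conflict marker. CVS administrative directories
# are skipped.
list_conflicted_files() {
    find "${1:-.}" -type d -name CVS -prune -o -type f \
        -exec grep -l -E '^(<<<<<<<|>>>>>>>) ' {} +
}

# Example (not run here): list_conflicted_files src
```

A file that legitimately contains such lines will be reported too, which is one more reason to keep those strings out of your sources.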

Beware of unexpected shell expansions

If the cvs commit command is used with the -m "some comment here" argument to make a comment about a commit, then shell characters in the comment are expanded. So a comment such as cvs commit -m "Changed the default $PATH value" will have $PATH replaced by its value in the current shell, and the commit message will end up looking something like "Changed the default /usr/local/bin:/usr/bin:/bin value" in your logs. This doesn't happen if you use an editor to add the comment or if you use single quotes instead of double quotes.
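The difference between the two kinds of quotes can be seen without involving CVS at all. In this small demonstration, DEMO_PATH is a made-up stand-in for $PATH so that the result is predictable.

```sh
# DEMO_PATH is a stand-in for $PATH so that the result is predictable.
DEMO_PATH=/usr/local/bin:/usr/bin:/bin

# Double quotes: the shell substitutes the variable before the command
# (cvs commit -m, in the case above) ever sees the text.
double="Changed the default $DEMO_PATH value"

# Single quotes: the text is passed through untouched.
single='Changed the default $DEMO_PATH value'

printf '%s\n' "$double" "$single"
```

The first line printed contains the directory list; the second still contains the literal text $DEMO_PATH, which is what you want in a commit message.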

Change your shell prompt

When you have lots of different branches checked out in different sandboxes, it's easy to forget which one you're working on. Obviously, naming your local directory something suggestive helps, but you can also add the branch name to your shell prompt and even change the color of the cursor. The following incantation does this for the bash shell: just replace _branch with some text that appears in your branch names. Other shells have similar abilities.

    PS1="[\u@\h\$(\
    if [ -d CVS ]; then \
      if [ -e CVS/Tag ]; then \
        cat CVS/Tag | sed -e 's/^T/ /' | sed -e 's/^N/ /' \
        | sed -e 's/^D/ Date /' | sed -e 's/_branch/\[\033]12;blue\007\]/'; \
      else \
        echo ' \[\033]12;black\007\]MAIN' ; \
      fi; \
    else \
      echo '\[\033]12;black\007\]' ; \
    fi) \W]\\$ "
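If the one-line incantation is hard to maintain, the branch lookup can be moved into a function and the prompt kept short. The following is a simplified sketch that leaves out the cursor color changes; the function name cvs_branch_label is made up, and it relies on the same CVS/Tag convention as the prompt above (a first character of T, N, or D, followed by the tag name or date).

```sh
# Print the branch, tag, or date recorded for the current directory,
# MAIN if the sandbox is on the trunk, or nothing outside a sandbox.
cvs_branch_label() {
    if [ ! -d CVS ]; then
        return 0
    fi
    if [ -e CVS/Tag ]; then
        # First character: T = branch tag, N = non-branch tag, D = date
        sed -e 's/^[TN]//' -e 's/^D/Date /' CVS/Tag
    else
        echo MAIN
    fi
}

# Possible bash usage (not run here):
#   PS1='[\u@\h $(cvs_branch_label) \W]\$ '
```

The single quotes around the PS1 value matter: they delay the call to cvs_branch_label until each prompt is displayed, so the label follows you from one sandbox to another.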

Avoid empty directories

You can create empty directories in your CVS repository, and when you check out a tree, the directories will appear as you would expect. There is a handy -P argument to cvs update to remove, or prune, empty directories. However, if you check out a tagged version of your tree, the empty directories are automatically pruned, and you have to run cvs update -d to get them back. The easiest thing to do is avoid empty directories in your source tree and instead create them as needed with your build tool. Adding empty dummy files is an ugly workaround.

Tag CVSROOT too

When you tag some files for a release, don't forget to tag the files in CVSROOT too. These files describe how CVS is configured and can change over time. If you want to know which directories a particular module represented at the time of a release, this will help.
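To find out whether a sandbox already contains empty directories of the kind described above, something like the following sketch can help. It treats a directory as empty if it holds nothing except a CVS administrative subdirectory; the function name list_empty_dirs is hypothetical, and directory names containing newlines are not handled.

```sh
# List directories under a sandbox (default: the current directory)
# that contain nothing apart from a CVS administrative directory.
list_empty_dirs() {
    find "${1:-.}" -type d -name CVS -prune -o -type d -print |
    while IFS= read -r dir; do
        # ls -A lists every entry except . and ..; drop the CVS entry
        if [ -z "$(ls -A "$dir" | grep -v -x CVS)" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Example (not run here): list_empty_dirs ~/sandboxes/projectA
```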

CVS is the default choice for SCM for many open source and commercial projects. It is also the base standard by which other SCM tools, both commercial and open source, are measured. Subversion (described in the next section) is designed to be a replacement for CVS, but it will be a long time, if ever, before CVS goes away.

4.6.2. Subversion

Subversion (http://subversion.tigris.org) is an open source SCM tool designed as a "compelling replacement for CVS." Subversion development has been partially funded by CollabNet (http://www.collabnet.com), a commercial PDE discussed in Section 3.1.3. Subversion is released under the Apache Software Foundation license, with CollabNet given as the copyright holder.

The History of Subversion

The first release of Subversion was in October 2000, and steady monthly releases finally led to Version 1.0 being released in February 2004. Key project members include Karl Fogel (who cowrote one of the CVS books referred to in Section 4.6.1, earlier in this chapter), Jim Blandy, C. Michael Pilato, Brian Fitzpatrick, Greg Stein, Kevin Hancock, and Ben Collins-Sussman.

Subversion (also known as SVN) really is like CVS 2.0. Even typing the main command svn feels somehow similar to typing cvs. Even apart from the fact that Subversion has an order of magnitude more code, there are substantial differences between Subversion and CVS under the hood, including a default Berkeley DB database backend rather than the flat-file RCS format used by CVS. (A filesystem backend called FSFS is also available.) However, the basic client/server model used by CVS is unchanged, and you still check files out, edit them, update, and commit them.

While using a Subversion client is as easy as using a CVS client, configuring a Subversion server can be a little harder. The default network protocol used to connect a Subversion server and its clients is based on an extension to HTTP that is called WebDAV. If you already have an Apache web server running on your Subversion server machine, you can configure it to use WebDAV and then install and configure the Berkeley DB database. Alternatively, you can use the svnserve executable, which is much more like CVS's cvs server process in concept.

The major changes in Subversion compared with CVS are:

Renaming directories and files: Directories are now versioned, just like files. You can rename directories and files and still follow their commit history.
Atomic operations: All Subversion operations either succeed fully, or fail with no changes made to the repository.
Versioned metadata: Every file and directory can have arbitrary information (metadata) associated with it as key/value pairs, and this information is versioned. Recording files' owners, ACLs, and any other information needed for specific sites can be implemented using this mechanism.
Full support for binary files: Subversion is designed to fully support both binary and text files much more efficiently than CVS does.
Cheaper branching and tagging: The cost of branching and tagging need not increase with the project size.

Where Subversion and CVS Differ Most

The major difference about Subversion for most CVS users is what version numbers now mean. In CVS and many other SCM tools, version numbers are assigned per file. In Subversion, version numbers are per change to a repository. So "Version 23" of a file now means that the repository has had 23 commits to it, not that a particular file has had 23 commits to it. This also means that the version number of a file changes on every commit anywhere in the repository, even if the file hasn't changed at all.

Another difference is that tags and branches appear in the URL of the repository and look exactly like directories. For example, the URL for a subdirectory subdir in the main development of a project myproject could be http://svn.example.com/myproject/trunk/subdir, and a branch for release 1.0 could look like http://svn.example.com/myproject/branches/rel_1_0_branch/subdir.
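Because a branch is just another path in the repository, creating one is a copy from one URL to another. The sketch below only prints the svn copy command for review rather than running it; the function name svn_branch_command is made up, and it assumes the trunk and branches layout used in the example URLs above.

```sh
# Print the svn command that would create a branch from the trunk.
# Nothing is executed; the command is only written to standard output.
svn_branch_command() {
    project_url=$1   # for example http://svn.example.com/myproject
    branch=$2        # for example rel_1_0_branch
    printf 'svn copy %s/trunk %s/branches/%s -m "Create branch %s"\n' \
        "$project_url" "$project_url" "$branch" "$branch"
}

# Example (not run here):
#   svn_branch_command http://svn.example.com/myproject rel_1_0_branch
```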

Subversion can run on most Unix versions, Windows 2000 (and later for the server), and Mac OS X. Windows support is native and has always been part of the project. The limitation on the server for Windows is due to the use of Berkeley DB, which apparently doesn't run on Windows 95, 98, or ME. Using the FSFS filesystem backend should remove this limitation.

A number of tools to convert data from many other SCM tools to Subversion have been developed as part of the product. The script cvs2svn is one such useful tool; it converts existing CVS repositories to Subversion repositories. Some Apache projects have converted some of their repositories to Subversion, and GCC is in the process of doing so.

One of the most remarkable things about Subversion has been just how many other projects have sprung up around it, integrating it into existing IDEs and extending existing tools to support it. Even the effort to provide internationalized versions has been impressive. For web-based viewing of repositories, the Python-based ViewCVS (http://viewcvs.sourceforge.net) also supports browsing of Subversion repositories. TortoiseSVN (http://www.tortoisesvn.org) is one graphical client for Subversion that is well integrated with the Windows filesystem browser. Another graphical client for Subversion that can be used on Windows, Linux, and Macintosh machines is SmartSVN (http://www.smartsvn.com).

Development of all these supporting tools for Subversion has been made easier by clear documentation from the beginning of the project. One of the main sources of information is the book Version Control with Subversion, by Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato (O'Reilly), which is also available online at http://svnbook.red-bean.com. Other books about Subversion include Practical Subversion, by Garrett Rooney (Apress), which is aimed more at SCM administrators; Pragmatic Version Control Using Subversion, by Mike Mason (Pragmatic Bookshelf); and Subversion in Action, by Jeffrey Machols (Manning). Another useful source of information and discussion about Subversion is the Subversionary web site at http://www.subversionary.org.

However, Subversion has limited support for ACLs and the cvs2svn script may have some difficulties handling complex branching schemes. The known bugs in Subversion are publicly available at the Subversion home page. Subversion still has plenty of room left to grow, with a number of ideas already scheduled for later releases. One such idea is the ability to track who is editing which files. Another is the ability to lock files so only one person can edit them at a time.

In summary, Subversion set out to build a replacement for CVS while keeping its familiar parts, and for the most part it has succeeded. Expect to see Subversion become the other choice for public SCM tools in PDEs like SourceForge and the Apache Project. CollabNet already uses Subversion as the underlying SCM tool in its PDE product, and more companies are likely to follow.

4.6.3. Arch

Arch (http://www.gnu.org/software/gnu-arch) is a distributed open source SCM tool, as opposed to the centralized servers of CVS and Subversion. It's designed to scale to tens of thousands of users, in the same way that peer-to-peer (P2P) tools such as BitTorrent have scaled well for distributing large files. Arch is licensed under the GNU General Public License. Note that Arch is still changing, and the version discussed here is tla-1.3, released in December 2004.

The History of Arch

Arch began life as a collection of shell scripts by Tom Lord in 2002. Its growth since then has been rapid, driven by both the attractions of distributed SCM and some good publicity, and Arch has been rewritten in C. The personalities involved in developing Arch are definitely prickly at times, and Arch versus Subversion mudslinging has become something of a cliché in discussions about open source SCM tools.

At its simplest, using Arch is like having a repository on your own machine, one that you can make commits to, branch, and generally rearrange as you wish, even on your laptop on an airplane. Then you synchronize from other repositories when you want, and they can accept your changes at their discretion.

Arch is carefully designed to minimize server-side work, so that it can scale well. It assumes that disk space is cheap and that network communication is the most costly operation. Just like Subversion, Arch provides atomic commits across entire source trees. Practically any shared resource such as a directory, FTP server, or web server can be used as an Arch server. Different versions of the metadata such as tags are stored, in addition to the versions of the files. Arch keeps track of file and directory rename operations by using unique identifiers for everything; these don't change, even when the name of a directory changes.

Changesets are a key part of Arch and use the familiar diff format, at least for text files. The unique identifiers for each file make it possible to automatically patch files, even when their names have changed. Arch also remembers which changesets have already been applied, so the potential multiple-merge problems of CVS can be avoided. The default format used for storing files and changesets is simple in the extreme: compressed tarballs and a file formatted exactly like an email message. These tarballs have checksums and can also be cryptographically signed to help ensure their integrity. The simple format means that only a few commonly available tools are required for Arch to work properly after installation.

Arch is known to work on GNU/Linux, FreeBSD, NetBSD, AIX, and Solaris. Portability to Windows is planned for the near future, but the main focus for Arch still seems to be Unix-based platforms. Other versions of Arch have been written in languages other than C, but tla by Tom Lord seems to be the most commonly used version of Arch.

Currently, the best sources of documentation on Arch are the "Arch Meets Hello World" tutorial at http://www.gnu.org/software/gnu-arch/tutorial/arch.html and the ever-changing Wiki at http://wiki.gnuarch.org. Documentation of the rather large number of Arch commands (over a hundred) is terse, which contributes to the generally steep learning curve for Arch.

Like any newer product, Arch has its rough edges. When it was evaluated in April 2005 for use with the Linux kernel, it was felt to be too slow for such a large project. Some people feel that the filenames used to refer to particular versions are too long to type comfortably, and that the choice of special characters in the names clashes awkwardly with the same characters used by common shells such as bash and also tools such as vi and vim. Arch has not yet been internationalized, though a fork of it named ArX has been. Other problem areas, which may or may not have been fixed by the time you read this, include the lack of symbolic links, the lack of file permissions (for controlling access), spaces not being allowed in filenames, and some Unix/Windows end-of-line formatting problems. One issue that is unlikely to have changed is that Arch developers can seem arrogant in their zeal for their project.

Arch is the best open source example of a trend in SCM tools toward tools that are distributed, rather than centralized on a single server. The emphasis on changes to a project's source code being seen as a collection of separate changesets is also a distinct trend in all modern SCM tools. In terms of development, Arch is roughly where CVS was 10 years ago: definitely usable for noncritical projects, but rough around the edges, particularly with regard to ease of use and documentation. Still, it has the backing that comes with being an official GNU project, and if development continues as it has, Arch could be a strong contender among open source SCM tools.

4.6.4. Perforce

Perforce (http://www.perforce.com) is a commercial SCM tool, currently licensed for around $750 per user, which includes a year of support. There are a range of licensing options, including free use for open source projects.

The History of Perforce

Perforce Software was founded by Christopher Seiwald in 1995. Seiwald also created the Jam build tool described in Section 5.5.5. Perforce sales have grown to include thousands of companies, including Microsoft (see Section 4.6.7, later in this chapter) and Google.

Perforce, also known as P4, is a modern, centralized, fully networked SCM tool. It provides atomic commits across entire depots (repositories) and supports branching and merging well, including automatically tracking when files were merged. Concurrent access to multiple files is the normal way of using Perforce, but unlike CVS, Perforce also keeps track of who is editing each file. Depots store binary files as compressed files and use an RCS-like format for text files. Metadata about the files and changelists (changesets), such as branch information and associated bugs, are stored in a separate, proprietary, journaled database. Backups of Perforce server depots can be made without stopping the server from being used, and no separate licensing server is used, which also reduces administrative work.

Perforce is supported on a wide variety of platforms, including almost all recent Unixes; Windows NT, 2000, and later; Macintosh Classic and Mac OS X; and VMS. Windows 95 and 98 are not supported for Perforce servers. Dozens of other platforms are supported for Perforce clients. APIs to use Perforce as part of an application exist for C, C++, Java, Perl, and Python, among other languages.

Documentation for Perforce is extensive and of good quality. All the documentation is freely downloadable in convenient file formats from the company's web site. Judging by comments in newsgroups and weblogs and from what I've heard through other sources, the product support team at Perforce is excellent. Training and other consulting services are readily available.

Perforce has been carefully designed to scale well as projects grow. For instance, tagging and branching operations are fast, taking much less than the linear time seen with CVS. The Perforce web page http://www.perforce.com/perforce/reviews.html provides some useful comparisons of various SCM tools and tells how each one scales as a project grows.

Like any SCM tool that uses a database, Perforce requires attention to maintenance. Disk space allocation and tuning procedures are well documented in the Perforce System Administrator's Guide. Integrity-checking tools are provided to guard against database corruption. Renaming directories and files is a two-step process, but the history of each step is retained. Files on the client machine are read-only until the user tells Perforce that she wants to edit them. This can be awkward if you are working offline, or if an external application wants to write temporary changes to files that are stored in Perforce.

In summary, Perforce is similar in architecture to CVS but has stronger functionality and is much faster. The product is mature and well supported, and there are numerous tools that extend or integrate Perforce in various customized ways. Perforce is a good choice for larger groups of developers, especially within a company with the resources to administer it properly.

4.6.5. BitKeeper

BitKeeper (http://www.bitkeeper.com) is a commercial SCM product from BitMover. BitKeeper is licensed per person who modifies files, and licenses can either be purchased for around $1,750 or leased for around a third of the purchase cost. There is also a different license for using BitKeeper at no cost. The version described here is 3.2.3, released in August 2004.

The History of BitKeeper

BitMover, Inc., the company that produces BitKeeper, was founded by Larry McVoy and others in 1998. McVoy was also the designer of TeamWare, the SCM tool used and sold by Sun. A huge success for BitKeeper was its choice by Linus Torvalds as the SCM tool to replace CVS for developing the GNU/Linux kernel. Some open source developers found the terms of the free license unpalatable, and there was some spirited discontent back in 2002, when a few Subversion developers clashed with Larry McVoy over the changing terms of the free license. The publicity doesn't seem to have hindered BitKeeper's sales. In early 2005, reverse engineering of BitKeeper's protocol led to the free version of BitKeeper no longer being offered by BitMover. Torvalds then began developing an SCM tool named git, which was designed as a possible replacement for BitKeeper in the Linux kernel project.

BitKeeper, also known as BK, is a modern, distributed SCM tool, complete with atomic operations, changesets, file metadata, strong support for branching and merging, and a web-based graphical interface. Since BitKeeper is fully distributed, it has no central point of failure and it scales extremely well. It also helps that the bandwidth requirements for most common BitKeeper actions are relatively small. Every developer effectively has a copy of the repository on his machine, which makes working with the proverbial laptop on an airplane easy. You can make your local changes available (a push) using a wide variety of protocols from SSH to HTTP, or even using email.

BitKeeper handles all the complexity of pushing the changes in a local repository out to other developers' repositories. Renaming of files is handled well, including the tricky problem of two developers renaming the same file at the same time. You can add different comments to different files in a changeset, which is sometimes useful. The data format used by BitKeeper is based on SCCS, the original Unix SCM tool created by Marc Rochkind in 1972. SCCS files include checksums to help avoid corrupted data.
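
The value of a stored checksum can be shown with a short Python sketch. This is an illustration of the principle only, assuming a simple 16-bit additive checksum kept on a header line; it is not the actual SCCS or BitKeeper file format, and the function names are invented for the example.

```python
def checksum(body: bytes) -> int:
    """Sum of all byte values, truncated to 16 bits (illustrative only)."""
    return sum(body) & 0xFFFF


def write_record(body: bytes) -> bytes:
    """Prefix the body with a header line holding its checksum."""
    return b"%05d\n" % checksum(body) + body


def verify_record(record: bytes) -> bool:
    """Recompute the checksum and compare it with the stored one."""
    header, _, body = record.partition(b"\n")
    return header.isdigit() and int(header) == checksum(body)
```

A record written with write_record verifies cleanly, and changing a single byte of the body makes verify_record return False. An additive checksum like this one cannot detect two bytes being swapped, so it guards against accidental damage rather than every possible corruption.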

BitKeeper runs on most modern Unixes, Mac OS X, and Windows 98 and later releases. There is a long-standing offer from BitMover to support any platform for a sale of over 50 licenses, provided the platform is POSIX-compliant and not prohibitively expensive.

Documentation for BitKeeper is good, though the printable versions are available only with the product. Online documentation is extensive, and support is reportedly very responsive. There is a good demonstration of BitKeeper available at http://www.bitkeeper.com/Test.html. There is an open source BitKeeper client available from BitMover (http://www.bitmover.com/bk-client.shar), though this tool only extracts files from repositories. There is also an open source tool called SourcePuller (http://sourceforge.net/projects/sourcepuller) that can interact more generally with BitKeeper. Development of this tool is what led BitMover to withdraw the free version of BitKeeper in 2005.

BitKeeper is an attractive commercial SCM tool. The pricing scheme seems to indicate that BitKeeper is competing against ClearCase and is intended for use by large businesses, while still working closely with the open source community for the good publicity. Being chosen for GNU/Linux kernel development is a strong endorsement for any SCM tool.

4.6.6. ClearCase

ClearCase (http://www.ibm.com/software/rational) is the SCM part of a large change management environment known as the Rational Unified Process. ClearCase is licensed commercially at around $5,000 per developer, though this is negotiated on a per-site basis, and there is a "lite" version available for around $1,250.

The History of ClearCase

ClearCase grew out of the DSEE SCM tool by Apollo (which was later bought by HP) but was first developed and released by Atria in 1992. Atria merged with PureSoft, and the merged company was later bought by Rational, which in turn was bought by IBM. ClearCase is used by HP, 3Com, eBay, Cisco, and many other large computer-related companies.

ClearCase is unique among the major SCM tools in that it uses a separate, versioned, distributed filesystem on each developer's machine. Once in this filesystem, you automatically see the chosen versions of the files managed by ClearCase. So you never have to manually update your local copy of a file; the filesystem just makes it appear for you. Alternatively, you can freeze different parts of what you see at particular versions. Developers choose which versions of which sets of files they wish to see by modifying their "configuration specification" file, also known as the "config spec." These files can build on top of each other, allowing for complicated descriptions of which files you end up actually using.
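
The way an ordered set of rules picks one version of each file can be sketched in a few lines of Python. This is a simplified model and an assumption on my part, not real config spec syntax or ClearCase behavior in full: the VERSIONS table, the REL1 label, and the select_version function are hypothetical. It shows only the core idea that rules are tried in order and the first one that both matches the path and resolves to a version wins.

```python
from fnmatch import fnmatch

# Hypothetical version store: path -> ordered list of (version, labels).
VERSIONS = {
    "src/main.c": [(1, {"REL1"}), (2, set()), (3, set())],
    "src/util.c": [(1, set()), (2, {"REL1"}), (3, set())],
    "doc/readme": [(1, set()), (2, set())],
}


def select_version(path, rules):
    """Return the version chosen by the first rule that matches and resolves."""
    history = VERSIONS[path]
    for pattern, selector in rules:
        if not fnmatch(path, pattern):
            continue  # this rule does not apply to the path
        if selector == "LATEST":
            return history[-1][0]
        # Otherwise treat the selector as a label; pick the newest labeled version.
        labeled = [version for version, labels in history if selector in labels]
        if labeled:
            return labeled[-1]
        # The label is absent from this file: fall through to the next rule.
    return None  # no rule selected a version, so the file is not visible
```

With the rules [("src/*", "REL1"), ("*", "LATEST")], everything under src is frozen at the version labeled REL1 while all other files track the latest version. Reordering the rules changes what the developer sees, which is why config specs that build on each other can become complicated.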

If the ClearCase server is unavailable, not only will developers be unable to use the SCM tool, they won't even see the directories containing the ClearCase-controlled files. To ensure that the networked filesystem remains available all the time, ClearCase supports redundant servers as well as the ability to distribute source trees across multiple servers.

Directories as well as files are versioned, and the ClearCase filesystem supports soft links. The branching and merging environment provided by ClearCase has good graphical support, and the merge tools seem particularly well liked. The ClearCase make tool, ClearMake, provides extensive information about all generated objects; you can even view the precise command used to generate an object file at any time. ClearCase can also use this information to wink in object files that have already been built, rather like ccache does (see Section 5.4.1). However, ClearMake is noticeably slower than other versions of make, though the accuracy of dependency checking is much improved. ClearMake can also automatically produce a "bill of materials" (BOM) for a release, listing the specific version of each file used to construct the build. Of course, a BOM is only one part of what is needed to reproduce a release: the tools used and their versions are others.
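
The connection between recording how an object was built and being able to reuse it can be sketched in Python. This is a hypothetical model, not how ClearMake is implemented: the build and build_key functions, the in-memory cache, and the choice of a SHA-256 key are all assumptions made for illustration. An object is reused only when both the command and the exact versions of its inputs are unchanged, and the recorded inputs double as a bill of materials.

```python
import hashlib
import json

_cache = {}  # build key -> (derived object, bill of materials)


def build_key(command, inputs):
    """Hash the command line together with the exact input versions."""
    payload = json.dumps({"command": command, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build(command, inputs, compile_fn):
    """Return (object, bom, reused), reusing a prior result when nothing changed.

    inputs: path -> version of each file read by the command
    compile_fn: callable that actually runs the build step
    """
    key = build_key(command, inputs)
    if key in _cache:
        obj, bom = _cache[key]
        return obj, bom, True  # reuse the existing object instead of rebuilding
    obj = compile_fn(command, inputs)
    bom = {"command": command, "inputs": dict(inputs)}
    _cache[key] = (obj, bom)
    return obj, bom, False
```

Calling build twice with the same command and input versions runs the compile step once. Changing the version of any input produces a different key and forces a rebuild.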

ClearCase servers and clients are supported on AIX, HP-UX, IRIX, GNU/Linux, Solaris, and Windows NT, 2000, and later versions.

ClearCase comes with extensive documentation and support from IBM. Two useful books are The Art of ClearCase Deployment: The Secrets to Successful Implementation, by Darren W. Pulsipher and Christian D. Buckley (Addison-Wesley), and Software Configuration Management Strategies and Rational ClearCase: A Practical Introduction, by Brian A. White (Addison-Wesley).

The biggest drawback of ClearCase for many organizations is its cost: both the initial per-seat cost and the cost of the substantial administrative team required to keep it working. That administrative burden explains why ClearCase is rarely found in smaller companies. ClearCase can use large amounts of disk space on developers' machines, depending on how it is configured, and places substantial demands on networks. When either disk space or network bandwidth is limited, ClearCase can become very slow. For small to medium projects, ClearCase is usually seen as overkill.

4.6.7. Visual SourceSafe

Visual SourceSafe (http://msdn.microsoft.com/vstudio/productinfo) is a commercial centralized SCM tool from Microsoft. As of 2005, licenses are available for approximately $500 per seat.

The History of Visual SourceSafe

SourceSafe, originally written by Brian Harry and Kenny Felder, was purchased by Microsoft in 1994. Development continued, with the current release being Visual SourceSafe 6.0.

Interestingly, Microsoft itself used an internally developed version of RCS named SLM until 1999, when it began using a version of Perforce named SourceDepot.

Visual SourceSafe is a centralized SCM tool, usually used in a locking (pinning) manner, where only one developer can change a file at a time. It's designed to be used almost exclusively on Windows-based platforms by small groups of developers. One of its strengths is its tight integration with Visual Studio and other Microsoft tools. However, it is not unique in that respect, since Perforce, BitKeeper, and ClearCase also integrate well with Visual Studio. Commits are not atomic across a source tree.
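
The locking style of working can be modeled with a short Python sketch. It is a generic illustration of exclusive checkouts, not the Visual SourceSafe API; the LockingRepository class and its method names are hypothetical.

```python
class LockingRepository:
    """Toy model of exclusive checkouts: one editor per file at a time."""

    def __init__(self):
        self.locks = {}  # path -> user currently holding the checkout

    def check_out(self, path, user):
        """Give user the exclusive right to edit path, or refuse."""
        holder = self.locks.get(path)
        if holder is not None and holder != user:
            raise PermissionError(f"{path} is checked out by {holder}")
        self.locks[path] = user

    def check_in(self, path, user):
        """Release the checkout; only the holder may check in."""
        if self.locks.get(path) != user:
            raise PermissionError(f"{user} does not hold the checkout for {path}")
        del self.locks[path]
```

A second developer who tries to check out the same file is refused until the first checks it in. That avoids merges entirely, but it serializes work on popular files, which is one reason this style suits small groups better than large ones.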

There is one non-Microsoft book about Visual SourceSafe, Essential SourceSafe, by Ted Roche and Larry C. Whipple (Hentzenwerke Publishing), but it doesn't cover the features that many developers find hard to use, such as branching. In the end, the tool's own online help and the MSDN library have the largest amount of information about Visual SourceSafe.

Visual SourceSafe is an older product, and frankly, it's showing its age. You can find some (mostly negative) opinions about it at http://www.highprogrammer.com/alan/windev/sourcesafe.html and http://www.developsense.com/testing/VSSDefects.html, and a more balanced discussion at http://c2.com/cgi/wiki?SourceSafe. You could also pay $99 for a formal report by Forrester (http://www.forrester.com). Some people claim that they have had their stored files corrupted using the tool, while others dismiss these claims. Using branches with Visual Studio projects seems to be more complicated than usual to get right, and performance is never fast enough. Supporting multiple time zones for developers requires other add-on products.

Some of these issues may be addressed in future releases, but I don't recommend using Visual SourceSafe for any new project. If you are looking for a product that feels like Visual SourceSafe, there is Vault, a commercial SCM tool from SourceGear (http://www.sourcegear.com) that uses the same terminology as Visual SourceSafe but does everything more robustly and over larger networks. There is also a new SCM product from Microsoft, provisionally named Visual Studio 2005 Team System, that's intended for larger groups of developers than is Visual SourceSafe; it is due for release sometime in late 2005.