4.6. SCM Tools

The seven different SCM tools examined in this section are a mixture of closed and open source software. There are noticeably more usable SCM tools available than build tools (see Section 5.5), and there are certainly more tools available from commercial organizations. What should you look for in an SCM tool? Beyond the basic saving and retrieving of different versions of files, I suggest, in order of importance:
Section 4.7, later in this chapter, summarizes the major differences between the tools discussed in this chapter.

4.6.1. CVS

CVS (http://www.cvshome.org) is by far the most commonly used open source SCM tool. The CIA project (http://cia.navi.cx), which tracks commits from hundreds of open source projects, shows that 70% of its commits come from projects using CVS. Many of the terms used by CVS, such as commit and check out, have become de facto terms used by other SCM tools. Other SCM tools such as Subversion and Arch are careful to provide a "Migration Guide for CVS Users" document and tools. CVS is licensed under the GPL.
CVS is most commonly used over a network, with a single Unix or Windows-based server providing the repository, though some partial support for distributed servers was added with Version 1.12.10. Developers use CVS clients to check out a sandbox, which is their local working copy of the files under control of CVS. Different developers can check out the same files at the same time, since the C in CVS stands for concurrent. The opposite is true with SCM tools such as Visual SourceSafe, which let only one person at a time work on each file; this becomes a bottleneck even with medium-sized projects. After making changes, the files are checked in to the repository, along with some text comments about the changes. The first person to commit her changes forces the other developers to update their files before they can commit. CVS doesn't care how long you take between checkout and commit. CVS logs are available for each file, and these logs describe all the checkins for that file. CVS supports branches, tags, and also some basic assistance for merges. The CVS project uses the GNU Autotools suite (see Section 5.5.3) to build executables for DEC Alpha, Cray, HP-UX, Solaris, GNU/Linux, FreeBSD, NetBSD, IRIX, OS/2, Windows, Mac OS X, and VMS, among others (see the file INSTALL in the CVS source for the complete list). The CVS source also includes an extensive set of unit tests known as the "sanity checks." CVSNT (http://www.cvsnt.org) is a well-established fork of CVS taken in 1999 by Tony Hoyle, originally to add native support for Windows NT to CVS, but the two products still interoperate well. Features that have been added to CVSNT include better support for Unicode, ACLs, and Windows authentication. WinCVS and MacCVS, which are popular GUIs for using CVS on Windows and Macintoshes, respectively, both use CVSNT under the covers. 
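The concurrent checkout-and-commit rule described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how CVS is implemented: the repository holds a single file as a list of lines, the names `ToyRepository` and `StaleSandboxError` are invented for illustration, and the merge in `update` only handles files whose line count does not change.

```python
class StaleSandboxError(Exception):
    """Raised when a commit is based on an out-of-date revision."""


class ToyRepository:
    """Hypothetical single-file repository; revision numbers are list indexes."""

    def __init__(self, lines):
        self.revisions = [list(lines)]
        self.log = ["initial revision"]

    @property
    def head(self):
        return len(self.revisions) - 1

    def checkout(self):
        # A sandbox is a private copy plus the revision it was taken from.
        return {"base": self.head, "lines": list(self.revisions[self.head])}

    def commit(self, sandbox, message):
        # The first committer wins; everyone else must update first.
        if sandbox["base"] != self.head:
            raise StaleSandboxError("update your sandbox before committing")
        self.revisions.append(list(sandbox["lines"]))
        self.log.append(message)
        sandbox["base"] = self.head
        return self.head

    def update(self, sandbox):
        # Simplified line-by-line three-way merge (assumes equal line counts).
        base = self.revisions[sandbox["base"]]
        theirs = self.revisions[self.head]
        merged = []
        for old, new, mine in zip(base, theirs, sandbox["lines"]):
            if mine == old:
                merged.append(new)       # only the repository changed this line
            elif new == old or new == mine:
                merged.append(mine)      # only the sandbox changed this line
            else:
                raise ValueError("conflict: both sides changed the same line")
        sandbox["lines"] = merged
        sandbox["base"] = self.head
```

If two developers check out the same revision and edit different lines, the second commit raises `StaleSandboxError` until that developer calls `update`, after which both sets of changes end up in the repository. No file is ever locked, which is the point of the "concurrent" model.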
For many years, the best documentation for CVS was "the Cederqvist," also known formally as "Version Management with CVS" (https://www.cvshome.org/docs/manual), an online manual written by Per Cederqvist that extends the manpage written by Roland Pesch and the FAQ maintained by David G. Grubbs. While the Cederqvist is still useful, and has even been published as a book by Network Theory (http://www.network-theory.co.uk), there are now a number of other good books about CVS. The best ones are Essential CVS, by Jennifer Vesperman (O'Reilly); Open Source Development with CVS, by Moshe Bar and Karl Fogel (Paraglyph), which is also available online at http://cvsbook.red-bean.com; and Pragmatic Version Control Using CVS, by Dave Thomas and Andy Hunt (Pragmatic Bookshelf). There are also numerous how-to documents and tutorials all over the Internet, with particularly good ones at http://en.wikipedia.org/wiki/Concurrent_Versions_System and http://www.devguy.com/fp/cfgmgmt/cvs.

The biggest strength of CVS is that many developers are already familiar with it. It scales well to reasonably large projects (hundreds of users, thousands of files, millions of lines of code) and large file sizes (tens of megabytes), though the time to tag files increases linearly with the number of files and their sizes. CVS is simple to set up and maintain; CVS servers often have the longest uptimes of any machine in a company. It's secure against casual attacks, though it has been cracked in the past (see Section 4.5.5, earlier in this chapter). Since CVS is both open source and mature, there are also dozens of separate tools to add extra functionality to CVS. A few of the most useful are:
Clients for CVS have been written in Java, Tcl, and C++. Most modern IDEs and many bug tracking systems have some level of integration with CVS. CVS is still the default SCM tool for many preconstructed environments, including SourceForge, which is probably the largest CVS user in the world. (The GNU project may have the largest single CVS repository.) Other products that tie all this extra information into one convenient web site for your project are the excellent FishEye (http://www.cenqua.com) and the open source CVS Monitor project (http://ali.as/devel/cvsmonitor).

The weaknesses of CVS in many ways reflect the fact that it evolved, rather than being designed as a whole. Interactions with a CVS server are atomic on only a per-directory basis, not per transaction. So if you update your local sandbox at the same time that another developer is checking in his changes, you may get only some of his changes. Alternatively, if something nasty happens to the CVS server during a commit, your commit may fail, with some files changed but with others unchanged. Try hitting Ctrl-C sometime during a CVS commit and then see which files were committed and which ones weren't. (Don't worry; another commit will catch the files that were missed by the first one.) When you create a tag, CVS doesn't let you record a message with a description of why the tag was created. Renaming a file causes a break in the recorded history of that file. Changing the name of a directory requires intervention in the repository by the CVS administrator and may not always be possible, so choose your directory names and hierarchy very carefully. Living with branches and merging in CVS is something of a headache, as described earlier in this chapter in Section 4.5.1 and Section 4.5.2; you should always tag CVS branches before merging from them.
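The difference between a commit that can be interrupted partway and an atomic commit can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the repository is just a dictionary from path to contents, and `ServerCrash`, `commit_per_file`, and `commit_atomic` are hypothetical names. Real tools implement transactions quite differently; only the observable behavior is modeled.

```python
class ServerCrash(Exception):
    """Stands in for Ctrl-C or a server failure in the middle of a commit."""


def commit_per_file(repo, changes, fail_after=None):
    """Write files one at a time; an interruption leaves a partial commit."""
    for count, (path, text) in enumerate(sorted(changes.items())):
        if fail_after is not None and count == fail_after:
            raise ServerCrash(f"interrupted before writing {path}")
        repo[path] = text


def commit_atomic(repo, changes, fail_after=None):
    """Stage every change on a copy and publish it only if all writes succeed."""
    staged = dict(repo)
    commit_per_file(staged, changes, fail_after)  # may raise; repo is untouched
    repo.clear()
    repo.update(staged)


def demo():
    changes = {"a.c": "v2", "b.c": "v2", "c.c": "v2"}

    partial = {"a.c": "v1", "b.c": "v1", "c.c": "v1"}
    try:
        commit_per_file(partial, changes, fail_after=2)
    except ServerCrash:
        pass  # two files were already written

    intact = {"a.c": "v1", "b.c": "v1", "c.c": "v1"}
    try:
        commit_atomic(intact, changes, fail_after=2)
    except ServerCrash:
        pass  # nothing was published

    return partial, intact
```

After `demo()`, the first repository holds a mixture of old and new files, which is the state a second commit has to clean up, while the second repository is exactly as it was before the failed commit.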
Using CVS to keep track of source code from a third party by importing it into your repository is a task to do with a clear head and a written set of notes in front of you. Be careful not to reuse the files that you just imported from; check out a fresh copy instead. Authentication, authorization, and accounting support in CVS is rather rudimentary, and there is no support for an internationalized version of the tool. CVS works best with text files but can handle binary files, albeit inefficiently (and don't forget to use cvs add -kb to disable keyword substitution, in order to avoid corrupting such nontext files). Once an RCS file in a CVS repository exceeds about 10 versions and 100MB on a server with 1GB RAM, you can expect to see slower checkouts of that file, especially if it is on a branch.

4.6.1.1. Making your life with CVS easier

This section contains a number of ideas that can make administering more complex installations of CVS easier:
CVS is the default choice for SCM for many open source and commercial projects. It is also the base standard by which other SCM tools, both commercial and open source, are measured. Subversion (described in the next section) is designed to be a replacement for CVS, but it will be a long time, if ever, before CVS goes away.

4.6.2. Subversion

Subversion (http://subversion.tigris.org) is an open source SCM tool designed as a "compelling replacement for CVS." Subversion development has been partially funded by CollabNet (http://www.collabnet.com), a commercial PDE discussed in Section 3.1.3. Subversion is released under the Apache Software Foundation license, with CollabNet given as the copyright holder.
Subversion (also known as SVN) really is like CVS 2.0. Even typing the main command svn feels somehow similar to typing cvs. Even apart from the fact that Subversion has an order of magnitude more code, there are substantial differences between Subversion and CVS under the hood, including a default Berkeley DB database backend rather than the flat-file RCS format used by CVS. (A filesystem backend called FSFS is also available.) However, the basic client/server model used by CVS is unchanged, and you still check files out, edit them, update, and commit them. While using a Subversion client is as easy as using a CVS client, configuring a Subversion server can be a little harder. The default network protocol used to connect a Subversion server and its clients is based on an extension to HTTP that is called WebDAV. If you already have an Apache web server running on your Subversion server machine, you can configure it to use WebDAV and then install and configure the Berkeley DB database. Alternatively, you can use the svnserve executable, which is much more like CVS's cvs server process in concept. The major changes in Subversion compared with CVS are:
Subversion can run on most Unix versions, Windows 2000 (and later for the server), and Mac OS X. Windows support is native and has always been part of the project. The limitation on the server for Windows is due to the use of Berkeley DB, which apparently doesn't run on Windows 95, 98, or ME. Using the FSFS filesystem backend should remove this limitation. A number of tools to convert data from many other SCM tools to Subversion have been developed as part of the product. The script cvs2svn is one such useful tool; it converts existing CVS repositories to Subversion repositories. Some Apache projects have converted some of their repositories to Subversion, and GCC is in the process of doing so. One of the most remarkable things about Subversion has been just how many other projects have sprung up around it, integrating it into existing IDEs and extending existing tools to support it. Even the effort to provide internationalized versions has been impressive. For web-based viewing of repositories, the Python-based ViewCVS (http://viewcvs.sourceforge.net) also supports browsing of Subversion repositories. TortoiseSVN (http://www.tortoisesvn.org) is one graphical client for Subversion that is well integrated with the Windows filesystem browser. Another graphical client for Subversion that can be used on Windows, Linux, and Macintosh machines is SmartSVN (http://www.smartsvn.com). Development of all these supporting tools for Subversion has been made easier by clear documentation from the beginning of the project. One of the main sources of information is the book Version Control with Subversion, by Ben Collins-Sussman, Brian W. Fitzpatrick, and C. Michael Pilato (O'Reilly), which is also available online at http://svnbook.red-bean.com. 
Other books about Subversion include Practical Subversion, by Garrett Rooney (Apress), which is aimed more at SCM administrators; Pragmatic Version Control Using Subversion, by Mike Mason (Pragmatic Bookshelf); and Subversion in Action, by Jeffrey Machols (Manning). Another useful source of information and discussion about Subversion is the Subversionary web site at http://www.subversionary.org. However, Subversion has limited support for ACLs and the cvs2svn script may have some difficulties handling complex branching schemes. The known bugs in Subversion are publicly available at the Subversion home page. Subversion still has plenty of room left to grow, with a number of ideas already scheduled for later releases. One such idea is the ability to track who is editing which files. Another is the ability to lock files so only one person can edit them at a time. In summary, Subversion set out to build a replacement for CVS while keeping its familiar parts, and for the most part it has succeeded. Expect to see Subversion become the other choice for public SCM tools in PDEs like SourceForge and the Apache Project. CollabNet already uses Subversion as the underlying SCM tool in its PDE product, and more companies are likely to follow.

4.6.3. Arch

Arch (http://www.gnu.org/software/gnu-arch) is a distributed open source SCM tool, as opposed to the centralized servers of CVS and Subversion. It's designed to scale to tens of thousands of users, in the same way that peer-to-peer (P2P) tools such as BitTorrent have scaled well for distributing large files. Arch is licensed under the GNU General Public License. Note that Arch is still changing, and the version discussed here is tla-1.3, released in December 2004.
At its simplest, using Arch is like having a repository on your own machine, one that you can make commits to, branch, and generally rearrange as you wish, even on your laptop on an airplane. Then you synchronize from other repositories when you want, and they can accept your changes at their discretion. Arch is carefully designed to minimize server-side work, so that it can scale well. It assumes that disk space is cheap and that network communication is the most costly operation. Just like Subversion, Arch provides atomic commits across entire source trees. Practically any shared resource such as a directory, FTP server, or web server can be used as an Arch server. Different versions of the metadata such as tags are stored, in addition to the versions of the files. Arch keeps track of file and directory rename operations by using unique identifiers for everything; these don't change, even when the name of a directory changes. Changesets are a key part of Arch and use the familiar diff format, at least for text files. The unique identifiers for each file make it possible to automatically patch files, even when their names have changed. Arch also remembers which changesets have already been applied, so the potential multiple-merge problems of CVS can be avoided. The default format used for storing files and changesets is simple in the extreme: compressed tarballs and a file formatted exactly like an email message. These tarballs have checksums and can also be cryptographically signed to help ensure their integrity. The simple format means that only a few commonly available tools are required for Arch to work properly after installation. Arch is known to work on GNU/Linux, FreeBSD, NetBSD, AIX, and Solaris. Portability to Windows is planned for the near future, but the main focus for Arch still seems to be Unix-based platforms. Other versions of Arch have been written in languages other than C, but tla by Tom Lord seems to be the most commonly used version of Arch.
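Two of the ideas above, patching files by unique identifier rather than by name and remembering which changesets have already been applied, can be sketched in Python. This is only a conceptual model: Arch stores changesets as tarballs of diffs, not as dictionaries, and the `Tree` class, the changeset layout, and identifiers such as `id-1` and `patch-1` are all hypothetical.

```python
class Tree:
    """Toy source tree whose files are keyed by a permanent identifier."""

    def __init__(self):
        self.files = {}       # file id -> {"path": str, "lines": list of str}
        self.applied = set()  # names of changesets already merged

    def add(self, file_id, path, lines):
        self.files[file_id] = {"path": path, "lines": list(lines)}

    def apply(self, changeset):
        """Apply a changeset once; return False if it was already applied."""
        if changeset["name"] in self.applied:
            return False
        # Renames change only the path; the identifier stays the same.
        for file_id, new_path in changeset.get("renames", {}).items():
            self.files[file_id]["path"] = new_path
        # Edits are addressed by identifier, so they survive renames.
        for file_id, replacements in changeset.get("edits", {}).items():
            lines = self.files[file_id]["lines"]
            for old_line, new_line in replacements:
                lines[lines.index(old_line)] = new_line
        self.applied.add(changeset["name"])
        return True


def demo():
    mine = Tree()
    mine.add("id-1", "src/util.c", ["int x;"])
    # A local changeset renames the file...
    mine.apply({"name": "local-1", "renames": {"id-1": "src/helpers.c"}})
    # ...and an upstream changeset, written against the old name, edits it.
    upstream = {"name": "patch-1", "edits": {"id-1": [("int x;", "long x;")]}}
    first = mine.apply(upstream)
    second = mine.apply(upstream)  # a repeated merge is skipped
    return mine, first, second
```

Because the upstream edit names `id-1` rather than `src/util.c`, it lands in the renamed file, and applying the same changeset twice has no further effect.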
Currently, the best sources of documentation on Arch are the "Arch Meets Hello World" tutorial at http://www.gnu.org/software/gnu-arch/tutorial/arch.html and the ever-changing Wiki at http://wiki.gnuarch.org. Documentation of the rather large number of Arch commands (over a hundred) is terse, which contributes to the generally steep learning curve for Arch. Like any newer product, Arch has its rough edges. When it was evaluated in April 2005 for use with the Linux kernel, it was felt to be too slow for such a large project. Some people feel that the filenames used to refer to particular versions are too long to type comfortably, and that the choice of special characters in the names clashes awkwardly with the same characters used by common shells such as bash and also tools such as vi and vim. Arch has not yet been internationalized, though a fork of it named ArX has been. Other problem areas, which may or may not have been fixed by the time you read this, include the lack of symbolic links, the lack of file permissions (for controlling access), spaces not being allowed in filenames, and some Unix/Windows end-of-line formatting problems. One issue that is unlikely to have changed is that Arch developers can seem arrogant in their zeal for their project. Arch is the best open source example of a trend in SCM tools toward tools that are distributed, rather than centralized on a single server. The emphasis on changes to a project's source code being seen as a collection of separate changesets is also a distinct trend in all modern SCM tools. In terms of development, Arch is roughly where CVS was 10 years ago: definitely usable for noncritical projects, but rough around the edges, particularly with regard to ease of use and documentation. Still, it has the backing that comes with being an official GNU project, and if development continues as it has, Arch could be a strong contender among open source SCM tools.

4.6.4. Perforce

Perforce (http://www.perforce.com) is a commercial SCM tool, currently licensed for around $750 per user, which includes a year of support. There is a range of licensing options, including free use for open source projects.
Perforce, also known as P4, is a modern, centralized, fully networked SCM tool. It provides atomic commits across entire depots (repositories) and supports branching and merging well, including automatically tracking when files were merged. Concurrent access to multiple files is the normal way of using Perforce, but unlike CVS, Perforce also keeps track of who is editing each file. Depots store binary files as compressed files and use an RCS-like format for text files. Metadata about the files and changelists (changesets), such as branch information and associated bugs, are stored in a separate, proprietary, journaled database. Backups of Perforce server depots can be made without stopping the server from being used, and no separate licensing server is used, which also reduces administrative work. Perforce is supported on a wide variety of platforms, including almost all recent Unixes; Windows NT, 2000, and later; Macintosh Classic and Mac OS X; and VMS. Windows 95 and 98 are not supported for Perforce servers. Dozens of other platforms are supported for Perforce clients. APIs to use Perforce as part of an application exist for C, C++, Java, Perl, and Python, among other languages. Documentation for Perforce is extensive and of good quality. All the documentation is freely downloadable in convenient file formats from the company's web site. Judging by comments in newsgroups and weblogs and from what I've heard through other sources, the product support team at Perforce is excellent. Training and other consulting services are readily available. Perforce has been carefully designed to scale well as projects grow. For instance, tagging and branching operations are fast, taking much less than the linear time seen with CVS. The Perforce web page http://www.perforce.com/perforce/reviews.html provides some useful comparisons of various SCM tools and tells how each one scales as a project grows. 
Like any SCM tool that uses a database, Perforce requires attention to maintenance. Disk space allocation and tuning procedures are well documented in the Perforce System Administrator's Guide. Integrity-checking tools are provided to guard against database corruption. Renaming directories and files is a two-step process, but the history of each step is retained. Files on the client machine are read-only until the user tells Perforce that she wants to edit them. This can be awkward if you are working offline, or if an external application wants to write temporary changes to files that are stored in Perforce. In summary, Perforce is similar in architecture to CVS but has stronger functionality and is much faster. The product is mature and well supported, and there are numerous tools that extend or integrate Perforce in various customized ways. Perforce is a good choice for larger groups of developers, especially within a company with the resources to administer it properly.

4.6.5. BitKeeper

BitKeeper (http://www.bitkeeper.com) is a commercial SCM product from BitMover. BitKeeper is licensed per person who modifies files, and licenses can either be purchased for around $1,750 or leased for around a third of the purchase cost. There is also a different license for using BitKeeper at no cost. The version described here is 3.2.3, released in August 2004.
BitKeeper, also known as BK, is a modern, distributed SCM tool, complete with atomic operations, changesets, file metadata, strong support for branching and merging, and a web-based graphical interface. Since BitKeeper is fully distributed, it has no central point of failure and it scales extremely well. It also helps that the bandwidth requirements for most common BitKeeper actions are relatively small. Every developer effectively has a copy of the repository on his machine, which makes working with the proverbial laptop on an airplane easy. You can make your local changes available (a push) using a wide variety of protocols from SSH to HTTP, or even using email. BitKeeper handles all the complexity of pushing the changes in a local repository out to other developers' repositories. Renaming of files is handled well, including the tricky problem of two developers renaming the same file at the same time. You can add different comments to different files in a changeset, which is sometimes useful. The data format used by BitKeeper is based on SCCS, the original Unix SCM tool created by Marc Rochkind in 1972. SCCS files include checksums to help avoid corrupted data. BitKeeper runs on most modern Unixes, Mac OS X, and Windows 98 and later releases. There is a long-standing offer from BitMover to support any platform for a sale of over 50 licenses, provided that it is POSIX-compliant and not prohibitively expensive. Documentation for BitKeeper is good, though the printable versions are available only with the product. Online documentation is extensive, and support is reportedly very responsive. There is a good demonstration of BitKeeper available at http://www.bitkeeper.com/Test.html. There is an open source BitKeeper client available from BitMover (http://www.bitmover.com/bk-client.shar), though this tool only extracts files from repositories.
There is also an open source tool called SourcePuller (http://sourceforge.net/projects/sourcepuller) that can interact more generally with BitKeeper. Development of this tool was what led to the free version of BitKeeper ceasing in 2005. BitKeeper is an attractive commercial SCM tool. The pricing scheme seems to indicate that BitKeeper is competing against ClearCase and is intended for use by large businesses, while still working closely with the open source community for the good publicity. Being chosen for GNU/Linux kernel development is a strong endorsement for any SCM tool.

4.6.6. ClearCase

ClearCase (http://www.ibm.com/software/rational) is the SCM part of a large change management environment known as the Rational Unified Process. ClearCase is licensed commercially at around $5,000 per developer, though this is negotiated on a per-site basis, and there is a "lite" version available for around $1,250.
ClearCase is unique among the major SCM tools in that it uses a separate, versioned, distributed filesystem on each developer's machine. Once in this filesystem, you automatically see the chosen versions of the files managed by ClearCase. So you never have to manually update your local copy of a file; the filesystem just makes it appear for you. Alternatively, you can freeze different parts of what you see at particular versions. Developers choose which versions of which sets of files they wish to see by modifying their "configuration specification" file, also known as the "config spec." These files can build on top of each other, allowing for complicated descriptions of which files you end up actually using.
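The way a config spec selects versions can be approximated with an ordered list of rules in which the first usable rule wins. The Python sketch below is a simplification under stated assumptions: it does not use real config spec syntax, each rule is a hypothetical `(path pattern, version label)` pair, and `select_version`, `REL1.0`, and `LATEST` are illustrative names only.

```python
from fnmatch import fnmatchcase


def select_version(path, available, rules):
    """Return the label chosen for `path`, or None if no rule applies.

    `available` is the set of version labels that exist for this file;
    `rules` is an ordered list of (pattern, label) pairs.
    """
    for pattern, label in rules:
        # A rule is used only if it matches the path and the labeled
        # version actually exists; otherwise fall through to the next rule.
        if fnmatchcase(path, pattern) and label in available:
            return label
    return None


# Layering: rules that freeze the library sit on top of a general base rule.
base_rules = [("*", "LATEST")]
frozen_lib_rules = [("lib/*", "REL1.0")]
rules = frozen_lib_rules + base_rules

view = {
    "lib/io.c": select_version("lib/io.c", {"REL1.0", "LATEST"}, rules),
    "lib/new.c": select_version("lib/new.c", {"LATEST"}, rules),
    "app/main.c": select_version("app/main.c", {"LATEST"}, rules),
}
```

Here the library file that carries the `REL1.0` label is frozen at that version, while a library file added after the release and everything outside `lib/` fall through to the base rule. Stacking one rule list on top of another is what lets a small set of overrides reshape an otherwise standard view.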
Directories as well as files are versioned, and the ClearCase filesystem supports soft links. The branching and merging environment provided by ClearCase has good graphical support, and the merge tools seem particularly well liked. The ClearCase make tool, ClearMake, provides extensive information about all generated objects; you can even view the precise command used to generate an object file at any time. ClearCase can also use this information to wink in object files that have already been built, rather like ccache does (see Section 5.4.1). However, ClearMake is noticeably slower than other versions of make, though the accuracy of dependency checking is much improved. ClearMake can also automatically produce a "bill of materials" (BOM) for a release, listing the specific version of each file used to construct the build. Of course, a BOM is only one part of what is needed to reproduce a release: the tools used and their versions are others. ClearCase servers and clients are supported on AIX, HP-UX, IRIX, GNU/Linux, Solaris, and Windows NT, 2000, and later versions. ClearCase comes with extensive documentation and support from IBM. Two useful books are The Art of ClearCase Deployment: The Secrets to Successful Implementation, by Darren W. Pulsipher and Christian D. Buckley (Addison-Wesley), and Software Configuration Management Strategies and Rational ClearCase: A Practical Introduction, by Brian A. White (Addison-Wesley). The biggest drawback of ClearCase for many organizations is its cost, both the initial per-seat cost and the cost of the substantial administrative team required to keep ClearCase working. The large amount of administrative work needed to keep ClearCase running properly explains why it is rarely found in smaller companies. ClearCase can use large amounts of disk space on developers' machines, depending on how it is configured, and places substantial demands on networks.
When either of these resources is limited, the performance of ClearCase can become very slow. For small to medium projects, ClearCase is usually seen as overkill.

4.6.7. Visual SourceSafe

Visual SourceSafe (http://msdn.microsoft.com/vstudio/productinfo) is a commercial centralized SCM tool from Microsoft. As of 2005, licenses are available for approximately $500 per seat.
Visual SourceSafe is a centralized SCM tool, usually used in a locking (pinning) manner, where only one developer can change a file at a time. It's designed to be used almost exclusively on Windows-based platforms by small groups of developers. One of its strengths is its tight integration with Visual Studio and other Microsoft tools. However, it is not unique in that respect, since Perforce, BitKeeper, and ClearCase also integrate well with Visual Studio. Commits are not atomic across a source tree. There is one non-Microsoft book about Visual SourceSafe, Essential SourceSafe, by Ted Roche and Larry C. Whipple (Hentzenwerke Publishing), but it doesn't cover the subjects that many developers find hard to use, such as branching. In the end, the tool's own online help and the MSDN library have the largest amount of information about Visual SourceSafe. Visual SourceSafe is an older product, and frankly, it's showing its age. You can find some (mostly negative) opinions about it at http://www.highprogrammer.com/alan/windev/sourcesafe.html and http://www.developsense.com/testing/VSSDefects.html, and a more balanced discussion at http://c2.com/cgi/wiki?SourceSafe. You could also pay $99 for a formal report by Forrester (http://www.forrester.com). Some people claim that they have had their stored files corrupted using the tool, while others dismiss these claims. Using branches with Visual Studio projects seems to be more complicated than usual to get right, and performance is never fast enough. Supporting multiple time zones for developers requires other add-on products. Some of these issues may be addressed in future releases, but I don't recommend using Visual SourceSafe for any new project. If you are looking for a product that feels like Visual SourceSafe, there is Vault, a commercial SCM tool from SourceGear (http://www.sourcegear.com) that uses the same terminology as Visual SourceSafe but does everything more robustly and over larger networks.
There is also a new SCM product from Microsoft, provisionally named Visual Studio 2005 Team System, that's intended for larger groups of developers than is Visual SourceSafe; it is due for release sometime in late 2005.