2.1. Why Subversion?

Subversion may be successful, but does that mean it's right for your project? If you're currently using CVS, the answer is almost certainly yes; if you're not using CVS, the answer very well may still be yes. Of course, choosing a version control system for a project is a big decision, and making big decisions based solely on "because that book-author-guy said so" isn't usually a good idea (and could be grounds for firing if the project is your job). So, since you don't want to take my word for it, let's examine why Subversion (which is also known as SVN) may be a good choice for a software development project.

2.1.1. A Software Engineering Tool

Software development is a craft, on par with the finest woodworking or most intricate mechanical clock. Every craft, however, reaches its highest potential when it moves beyond mere craftsmanship, and becomes an engineering discipline. Although it is still in its infancy, the discipline of software engineering aims to elevate software development to the level of the other engineering practices. Part of a good engineering process, though, are tools that enable that process, and Subversion is one of those tools.

Subversion is hosted (as an open source project) by tigris.org, an online community designed to promote the development of open source tools for software engineering. As the Tigris mission statement says, open source development has contributed much to software development tools and practices, and will almost certainly continue to do so in the future. This gives Subversion a focus that promotes its viability as not just a repository for source code, but a full-blown tool for software engineering. Because Tigris is an active community, dedicated to the success of the projects hosted there, it is likely that Subversion will continue to evolve over time into a powerful tool for software engineering.
Tigris is funded by CollabNet (www.collab.net), which is responsible for funding Subversion's initial development. In fact, many of the core Subversion developers are employed full time by CollabNet, to work on further Subversion development.

Already, Subversion is a flexible tool, capable of supporting many aspects of organized software engineering. Much of this book, in fact, is dedicated to showing how Subversion can fit into a variety of different development processes.

2.1.2. Open Source Solutions

I grew up in a small town surrounded by farmland, and even though I've never worked on a farm, when you live near them long enough you can't help but learn a few things. One of those things you learn is the importance of diversity. A smart farmer never stakes the entire future of the family farm on a single source. For example, if you grow corn, you grow a variety, to minimize the chance that a disease or insect will come through and kill the whole crop. Genetic diversity in the fields makes the whole stronger and more resilient.

When I first learned about open source software, I quickly realized that it follows the same principle as the farmers who plant a variety of corn in their fields. By allowing a menagerie of software developers to contribute to a project, the project is strengthened, and becomes much more resilient to hardships over time. An open source project is significantly less likely to succumb to the economic equivalent of an insect swarm, which helps to ensure not only the quality of the product that evolves, but also its longevity.

Subversion is licensed under an open source license similar to those used for the Apache Web server and the BSD operating systems. In short, the license states that you are free to do whatever you want with it, as long as you give credit to CollabNet, which currently holds the copyright for Subversion.
Unlike the GNU General Public License, used for Subversion's predecessor CVS, Subversion's license does not require you to distribute code for any changes you make, although contributing changes back to the Subversion project helps ensure that they remain compatible with the rest of the system as new versions are released, and gives others who may have similar problems to solve the benefit of your brilliance.

So, in addition to it being free of charge, exactly what direct benefit do you as the end user gain from Subversion's open source license? I want to avoid philosophical discussion of whether you have a right to open source software or of the supposed "evilness" of closed source products. There are, however, a number of practical benefits to SVN being an open source project.

By being open, Subversion positions itself to be a standard in version control and has a good chance of succeeding at overthrowing the reigning de facto standard, CVS. Standards, of course, are generally a good thing to deal with. Standardization means that even if your other tools don't currently support SVN, the odds are that they will in the future; or if they don't, someone will develop a third-party solution to adapt your current tools to Subversion.

It also means that more people will know how to deal with Subversion. For open source projects, this increases the number of people who will know how to use Subversion, which will hopefully help increase the number of people willing to get involved in the project and contribute code. Conversely, if your project is internal or closed source, the increased number of people who know how to deal with Subversion will still help, by decreasing the likelihood that developers will have to be explicitly trained in SVN's use by their employers.

The open nature of Subversion also makes it easy to integrate with other tools. The core of Subversion is, in fact, a set of libraries with a well-documented, open application programming interface (API).
This API allows developers to write custom clients for Subversion, integrate it into another tool, provide a GUI, or even automate some of the process of interacting with an SVN repository.

2.1.3. Major Features of SVN

What, exactly, can Subversion do? What features make it stand out against other version control systems? What features foster familiarity by being similar to other version control systems? How do these work together to form an excellent version control system? To answer those questions, let's take a look at the highlights of Subversion's feature list, and how many of those features stack up against other version control systems, such as the venerable CVS, Microsoft's Visual SourceSafe, or GNU's Arch.

Basic Operation

The basic interface to Subversion is very similar to CVS. The primary method of interfacing with Subversion is through the command line (although there are some very good GUI front ends available), and by design, SVN has very similar command-line operations to CVS. Commands have not been needlessly renamed, and for the most part, if a user is familiar with how to do something in CVS, doing the same thing in SVN will have similar (if not identical) syntax. In fact, unless there was a compelling reason for changing the syntax, Subversion uses identical command syntax for all of the Subversion features that have a CVS counterpart. Of course, with Subversion's much larger feature-set, Subversion has many commands that don't have a CVS counterpart, but the basics are extremely easy to pick up and the overall command-set is still reasonably small and easy to learn.

Even if you aren't coming from a CVS background, the concepts are similar to many other version control systems, and the differences should be easy to learn. This clean, simple approach sets off Subversion from another common open source VCS, Arch, which has a complex command-set and a paradigm vastly different from that of CVS or Subversion.
Like many common version control systems, Subversion uses a client-server paradigm, where a central repository sits on a server and clients check out local working copies where they can modify things as much as they like. When a modification is complete, changes between the repository and the working copy are merged, and the modified version is committed back to the repository.

Repository Flexibility

Subversion allows for great flexibility in layout of repositories, by keeping revision histories for both files and directories across moves, copies, and renames. This may not seem like a big deal if you're not familiar with other version control systems, but copy, move, and rename functionality is a feature sorely lacking in many popular version control systems. Most notably, CVS is notoriously inflexible when trying to modify an existing repository's structure. It does not allow files to be moved, copied, or renamed without splitting the file's history (so that you need to know its old name to see its old history). Worse, CVS doesn't allow directories to be moved around (or even deleted) without editing the repository directly. Similarly, moves, renames, and copies are difficult in Microsoft's Visual SourceSafe, although not quite as bad as in CVS. On the other hand, with Subversion, you can move, rename, and copy files and directories as much as you like without any worry that you will lose (or split) your history or corrupt older revisions.

Atomic Commits

Subversion uses transactions whenever it modifies the database. When a commit starts, Subversion marks the current state of the database, then makes its modifications. That way, if a crash (or bang or boom) interrupts the commit, there is no risk that the database will be corrupted. When operation resumes, the database will automatically be restored to its state before the commit began. This is another feature sorely lacking in many older VCSs, such as CVS or Visual SourceSafe.
If a network glitch or software crash causes a commit in either of those systems to fail, the repository can be left in a corrupted, unstable state, which may require the repository to be restored from backup.

Branches and Tags

Most version control systems allow for the revision trees of individual files and directories to be branched and tagged. Subversion, on the other hand, does not explicitly support this. In fact, SVN has no built-in concept of either branches or tags. Instead, it provides cheap copies. When a developer uses the svn copy command, Subversion does not make a copy of data contained in those files. Instead, it just marks the location of the new file and links it back to the history of the original file, up to the point where the copy is made. From that point on, if changes are applied to the copy, a new path of revision is created for the copy, independent of subsequent revisions applied to the original file.

Using this paradigm, a branch can be created by simply copying the directory (or file) to be branched. Usually, this is done into a directory named branches, so that it is always clear to users that they are dealing with a branch of part of the repository's main trunk. There is no enforcement of this in Subversion, though, and in Chapter 9, "Organizing Your Repository," I talk about a variety of different approaches that you can take when deciding where to place branches, to best fit a project's style of development.

Similarly, tags are also created by making a copy, usually in a tags directory. Like branches, this makes for a wide latitude of flexibility when dealing with how tags are used. The downside is that there is no built-in enforcement to make sure that tags stay tags, and don't inadvertently become branches when someone makes a change to them.
It is possible, though, to enforce tag policies using either Subversion's support for hook scripts, or (if you are using Apache as your server) permissions on the tags directory to disallow changes to files in the tags directory after they have been created.

Binary Files

Versioning binary files is a more difficult task than versioning text files. With a text file, the file data itself has meaning to a human being, which makes it easy to merge files or examine their differences. With a binary file, though, you need an external program to interpret the file and present it in a manner that has meaning for a human. This makes it difficult for a version control system to automatically perform merges or present diffs, because it has no context for performing merges properly and no way to present the result of a diff in a meaningful manner. Instead, diffs will result in incomprehensible binary data, and merges will likely result in corrupted files that cannot be read by the proper external program. However, versioning of binary files is not hopeless, because they can at least be stored in a versioned manner that allows different versions to be retrieved and compared with external tools.

Anyone who has ever used CVS to version binary files, though, knows that it handles them quite poorly. So poorly, in fact, that it doesn't store differences to versioned binary files. Instead, it just stores an entire new copy of the file whenever a binary file is committed. Subversion improves on this by using a binary difference function for all files, which allows binary files to be versioned the same as text files.

Subversion still doesn't have any direct support for automatically merging binary files (which would be nearly impossible anyway, unless SVN could understand the binary files). It does, however, have much better support for resolution of merges that can't be handled automatically.
When a merge conflict occurs, Subversion provides complete copies of both versions of the file, which allows the user to easily use an external editor to manually merge the conflicted file.

Symbolic Links

Release 1.1 of Subversion adds the capability to version symbolic links from UNIX systems (like GNU Arch and unlike CVS). If a user is working on a UNIX-like system, he can add symbolic links to the repository, just as he would any other file, and the repository will retain the link information for any other UNIX user who checks out a working copy of the repository. (Windows users will not get symbolic links, because Windows does not support UNIX-style symbolic links.)

Conflict Resolution

Subversion and CVS both use a paradigm of making modifications and then merging them with the modifications others have made, instead of the file locking paradigm used by many other VCSs like Visual SourceSafe. Resolving conflicts in merges when using CVS can be a bit messy, though. When CVS fails to automatically merge changes between the working copy and the server, it replaces the conflicted file in the working copy with a version of the file containing diffs of the two different versions. If the conflict was large, the resulting diff can waste hours while the developer tries to sort through the changes; and because the local changes are not backed up, there is no way to revert to the pre-conflict state of the working copy without sorting through the diff.

Subversion, of course, can't prevent conflicts from happening any more than CVS can, but it does handle them better when they do happen. If a conflict occurs, SVN replaces the offending file with one containing diffs, just like CVS; however, it also adds temporary versions of the file with the local version, the server version, and the local version prior to any changes. These extra files make resolving conflicts significantly less painful, and once the conflict has been settled, a call to svn resolved removes the extra files.
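The temporary files that accompany a conflicted file can be gathered with a few lines of Python. The following is only a minimal sketch, not part of Subversion: the function name conflict_artifacts is hypothetical, and the suffixes it looks for (.mine for the local version, and .rN for the two repository revisions) are an assumption about how the Subversion 1.x command-line client names these files for a text conflict.

```python
import os
import re

# Assumed naming for the helper files left beside a conflicted file NAME:
#   NAME.mine  - the local version, including uncommitted changes
#   NAME.rOLD  - the version the local changes were based on
#   NAME.rNEW  - the version received from the server
_REV_SUFFIX = re.compile(r"\.r(\d+)$")


def conflict_artifacts(path):
    """Return the conflict helper files found beside `path`.

    The result maps "mine", "base", and "theirs" to file paths. A key is
    absent when the corresponding file does not exist.
    """
    directory = os.path.dirname(path) or "."
    name = os.path.basename(path)
    found = {}
    revisions = []
    for entry in os.listdir(directory):
        if not entry.startswith(name + "."):
            continue
        full = os.path.join(directory, entry)
        suffix = entry[len(name):]
        if suffix == ".mine":
            found["mine"] = full
        else:
            match = _REV_SUFFIX.match(suffix)
            if match:
                revisions.append((int(match.group(1)), full))
    revisions.sort()
    if revisions:
        found["base"] = revisions[0][1]     # lowest revision number
    if len(revisions) > 1:
        found["theirs"] = revisions[-1][1]  # highest revision number
    return found
```

A merge tool wrapper could pass the three paths to an external editor; once the conflict is settled, svn resolved removes the helper files as described above.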
Storage

Subversion has a flexible repository backend that allows different types of repository storage systems to be plugged in, transparently to the client. Originally, the only actual repository storage system that was available was the Berkeley DB database system. As of release 1.1 of Subversion, though, a filesystem-based backend (FSFS) is also available as part of the core Subversion system. Instead of storing the entire repository in a single monolithic database, as the Berkeley DB backend does, the Subversion FSFS storage uses individual files for each revision in the repository. So, when you commit revision 3529, there will be a file named 3529 created, which holds all of the changes for that revision, regardless of how many versioned files were changed in that revision.

Network Protocols

Subversion provides two servers for communicating with the repository via different protocols. The first server (known as svnserve) uses an SVN-specific network protocol. It requires a dedicated server process and an open port, but it allows a Subversion server to be set up quickly and easily. svnserve also supports Inetd access, or tunneled-SSH style access.

The other server is a module for the Apache Web server, and is based on the Web-based Distributed Authoring and Versioning (WebDAV) protocol, with a few extensions for version control-specific operations. By using this standard protocol, served over HTTP via Apache, there is no need to open a special port on the server. Because WebDAV support is built into a variety of file managers on different operating systems, it is also possible to get limited access to interact with a repository directly through the Gnome Nautilus file manager on Linux, the Microsoft Windows Explorer, or any other WebDAV client. If read-only access to the repository head is all that is required, you can even access the repository through a Web browser with no special clients or setup required.
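From the client's point of view, the choice of server shows up in the scheme of the repository URL. The sketch below maps each scheme to the access method described above. It is an illustration only: the function describe_access and the host names used with it are hypothetical, and the scheme names (svn, svn+ssh, http, https) come from Subversion's usual URL syntax rather than from the text of this section.

```python
from urllib.parse import urlsplit

# Access method for each repository URL scheme. The descriptions paraphrase
# the section above; the scheme names are standard Subversion URL syntax,
# which the section itself does not spell out.
ACCESS_METHODS = {
    "svn": "svnserve over its own SVN-specific protocol",
    "svn+ssh": "svnserve tunneled through SSH",
    "http": "Apache module speaking WebDAV over HTTP",
    "https": "Apache module speaking WebDAV over HTTP with SSL",
}


def describe_access(url):
    """Return (host, access method) for a network repository URL."""
    parts = urlsplit(url)
    method = ACCESS_METHODS.get(parts.scheme)
    if method is None:
        raise ValueError("unrecognized repository URL scheme: %r" % parts.scheme)
    return parts.hostname, method
```

For example, describe_access("svn://svn.example.com/repos/project") reports that the repository is served by svnserve, while the same path under http:// would go through Apache and need no special port.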
Data Transfer

Subversion reduces much of the overhead that is associated with communications between the client and server, through a couple of methods not used by many older VCSs, such as CVS. For starters, whenever possible it transfers only file differences, both from client to server and from server to client, unlike CVS, which only transfers differences when going from server to client. Subversion also caches a lot more information locally, which allows it to avoid network communications altogether in many instances. It even stores a full copy of the working directory as of the last update, to allow the user to make comparisons with local changes without contacting the server.

Properties

One of Subversion's powerful, unique features is its support for file and directory metadata in the form of properties that allow the user to store arbitrary keyword:value data pairs that can be associated with a particular file or directory. This makes it easy to store whatever file metadata makes sense in your development process. Additionally, Subversion defines several special properties that it can use internally to provide some extra functionality, like keyword expansion or end-of-line interpretation.

Hook Scripts

Subversion supports a broad array of hook scripts that are run in response to a variety of SVN actions, such as before or after a commit or property change. These scripts are given access to relevant information about the action that is taking place, as well as the capability to examine the repository.

Hook scripts can be a powerful tool for automating tasks or enforcing policies. Although they are supported in one form or another by most version control systems, the Subversion support for hook scripts is much more flexible than that found in many others. CVS, for instance, provides commit scripts with little information about the commit being made, omitting details such as the target branch for the commit.
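As an illustration of policy enforcement, here is a minimal sketch of a pre-commit hook, written in Python, that keeps existing tags from being modified. Several things in it are assumptions rather than facts from this section: the repository is assumed to keep its tags in a top-level tags directory, the function name tag_violations is hypothetical, and the parsing assumes that svnlook changed prints a two-character status, two more columns, and then the path on each line. The hook relies on Subversion passing the repository path and transaction name as arguments, and on a non-zero exit status rejecting the commit.

```python
import subprocess
import sys


def tag_violations(changed_output, tags_root="tags/"):
    """Return paths under tags_root that do more than create a new tag.

    `changed_output` is text in the (assumed) `svnlook changed` layout:
    status in columns 0-1, the path starting at column 4.
    """
    violations = []
    for line in changed_output.splitlines():
        if len(line) < 5:
            continue
        status, path = line[:2], line[4:]
        if not path.startswith(tags_root) or path == tags_root:
            continue
        rest = path[len(tags_root):].rstrip("/")
        # Adding tags/NAME/ itself is how a tag is created, so allow it.
        creating_new_tag = status[0] == "A" and "/" not in rest
        if not creating_new_tag:
            violations.append(path)
    return violations


def main(argv):
    repos, txn = argv[1], argv[2]
    # Ask Subversion which paths the pending transaction touches.
    output = subprocess.run(
        ["svnlook", "changed", "-t", txn, repos],
        check=True, capture_output=True, text=True,
    ).stdout
    bad = tag_violations(output)
    if bad:
        sys.stderr.write("Existing tags are read-only:\n")
        for path in bad:
            sys.stderr.write("  %s\n" % path)
        return 1  # non-zero exit status rejects the commit
    return 0


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Keeping the policy check in a pure function separates it from the call to svnlook, so the rule can be tested against sample output without a live repository.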
Full-featured API

Subversion features a very complete API, which developers can use to easily and elegantly create new client interfaces, to create new Subversion servers, or to integrate Subversion into other development tools. In fact, the standard Subversion client tools, as well as the SVN servers, use these same APIs to communicate with each other, the Subversion repository, and a local working copy. Additionally, the interfaces are available with language bindings for a number of different programming languages (such as C, C++, Java, Perl, and Python), which allows interfacing programs to be written in whatever language best suits the problem at hand (and the developer's expertise).