Section 14.2. Real-world Studies | Subversion Version Control. Using The Subversion Version Control System in Development Projects

14.2. Real-world Studies

Hypothetical case studies based around common project archetypes have their uses, but it can also be helpful to take a look at some real-world projects, and how they manage many of the issues associated with using Subversion. As with the archetypal studies, in this section I will examine the different choices made by these projects, and how those choices fit in with the topics discussed in previous chapters.

14.2.1. KeyGhost Ltd

KeyGhost, Ltd. is a developer of embedded and PC software and hardware that uses Subversion for storing not only software source code, but also documentation in the form of Open Office files, and hardware designs. It chose to use Subversion based on indications that it is poised to become the next open source version control standard.

Repository

KeyGhost arranges its projects into 29 separate repositories, one for each project, most of which are legacy projects that see little activity. Its more active projects have around 500 revisions, with a total repository size of 1GB. In total, the repositories are used by fewer than 10 developers, all of whom have rights to commit changes.

Each repository is organized into a top-level directory, named for the project, with trunk, branches, and tags directories in each project. Under the trunk directory, developers categorize project files into source code, documentation, and hardware designs (using source, docs, and pcb, respectively). Figure 14.5 shows an example of the standard layout for a KeyGhost repository. KeyGhost makes use of tags for storing project releases, and uses branches whenever it has an appropriate need for branching a project's development.

Figure 14.5. The standard KeyGhost repository layout.
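The layout described above can also be sketched as a list of repository paths. This is only an illustration: the function name and the example project name are hypothetical, and only the directory names (trunk, branches, tags, source, docs, and pcb) come from the description of KeyGhost's repositories.

```python
def keyghost_layout(project):
    """Return the repository paths for one project, following the layout
    described in the text (the project name is whatever the caller supplies)."""
    top_level = [project + "/" + name for name in ("trunk", "branches", "tags")]
    # Under trunk: source code, documentation, and hardware designs.
    trunk_children = [project + "/trunk/" + name for name in ("source", "docs", "pcb")]
    return top_level + trunk_children


if __name__ == "__main__":
    # "example-project" is a placeholder, not a real KeyGhost project name.
    for path in keyghost_layout("example-project"):
        print(path)
```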

The repositories themselves are hosted on a Microsoft Windows 2000 server, using the Berkeley DB database backend. To share the repository, KeyGhost uses the Apache server, largely due to its ease of setup and administration. It also uses secure HTTP over SSL to secure the repository for remote access. The KeyGhost developers access Subversion from Windows 2000 client machines, and use TortoiseSVN as a GUI client.

Migrating to Subversion

The KeyGhost migration to Subversion involved a conceptual change, from a locking paradigm, in which only one developer at a time could modify a particular file, to the Subversion merge paradigm without locking. Additionally, KeyGhost had to overcome the hurdle of a user base without previous experience in version control. To overcome this issue, developers were given training in version control, and provided with the TortoiseSVN GUI client to make the learning curve significantly less steep.

Storing Binary Files

In addition to storing text-based source code files, KeyGhost also uses its Subversion repository for storing binary files from Open Office and its circuit board design package. Despite Subversion's use of a binary difference algorithm to store only changes to a binary file, developers found that each new version of a binary file still added heftily to the repository's storage requirements. To keep repository growth in check, KeyGhost has made a policy of limiting commits to its binary files.

Repository Migration

Subversion makes a valiant attempt to make restructuring of a repository a simple process. However, KeyGhost discovered that simple does not mean trouble free, and it can be prudent to put some long-term thought into structure. KeyGhost began with a single Subversion repository, using a single top-level /trunk directory with individual projects in subdirectories under that. After using Subversion for a while, however, KeyGhost decided to migrate to its current structure of multiple repositories, with one project per repository. Because a number of files had been moved or deleted, KeyGhost found that svnadmin dump and svndumpfilter were unable to properly migrate all of the projects with their full histories. In the end, KeyGhost was forced to resort to checking out working copies and reimporting those into a new repository (which still caused a loss of history).
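For readers who have not seen the dump-and-filter approach, here is a minimal sketch of the kind of pipeline KeyGhost would have attempted first: dump the old repository, keep only one project's paths, and load the result into a new repository. The repository locations and project path are placeholders supplied by the caller, the function names are my own, and the new repository is assumed to have been created beforehand with svnadmin create. As the KeyGhost experience shows, this approach can fail when files have been moved around in the history.

```python
import subprocess


def migration_commands(old_repo, project_path, new_repo):
    """Return argv lists for: svnadmin dump | svndumpfilter include | svnadmin load."""
    return [
        ["svnadmin", "dump", old_repo],
        ["svndumpfilter", "include", project_path],
        ["svnadmin", "load", new_repo],
    ]


def run_pipeline(commands):
    """Run the commands connected by pipes and return their exit codes."""
    processes = []
    upstream = None
    for index, argv in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            argv,
            stdin=upstream.stdout if upstream else None,
            stdout=None if is_last else subprocess.PIPE,
        )
        if upstream:
            # Close our copy so the upstream process sees a broken pipe
            # if the downstream process exits early.
            upstream.stdout.close()
        processes.append(proc)
        upstream = proc
    return [proc.wait() for proc in processes]
```

Calling run_pipeline(migration_commands(old, path, new)) once per project would, in the trouble-free case, yield one repository per project; any nonzero exit code in the returned list means that project needs another approach.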

14.2.2. Error Free Software

Error Free Software (EFS) develops a proprietary trading system, which it stores in a Subversion repository. EFS chose Subversion after examining a number of different version control systems, and settled on Subversion due to its snug fit with the EFS environment. The developers found it to have a full feature set, without any undue complexity.

Repository

The EFS repository is arranged with a number of different top-level directories with a variety of purposes, as you can see in Figure 14.6.

Figure 14.6. The Error Free Software repository layout.

/branches This directory stores project branches. EFS doesn't make very much use of this directory, and as of this writing only had a total of six branches.

/dailyLibraryBuild Daily builds of the repository are stored here. Each daily build is placed in its own directory. The directories are named for the date of the build, using two-digit year, month, and day numbers (YYMMDD).

/releases Project releases are stored here.

/doc This directory is used to store documentation for individual projects.

/projects The EFS trading system consists of a large number of application suites and libraries. This directory is used to store individual application suites, which are linked to the various libraries using svn:externals properties.

/src The actual source code for the repository (which is linked via svn:externals in /projects) is stored in this directory. This acts as EFS's /trunk directory.

/spd EFS stores its design documents for its software in this directory.
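Two of the conventions above lend themselves to a short sketch: the YYMMDD naming of daily build directories, and the svn:externals definitions that link application suites in /projects to libraries in /src. The function names, library names, and URL below are hypothetical. The property value uses the simple "local-directory URL" form, one definition per line, which is an assumption about how EFS writes its externals.

```python
from datetime import date


def daily_build_dir(build_date):
    """Name a daily build directory under /dailyLibraryBuild as YYMMDD."""
    return "/dailyLibraryBuild/" + build_date.strftime("%y%m%d")


def externals_value(links):
    """Render an svn:externals property value from a dict that maps a
    local directory name to the repository URL it should pull in."""
    lines = [local_dir + " " + url for local_dir, url in sorted(links.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    print(daily_build_dir(date(2005, 3, 9)))
    # Hypothetical library name and server URL, for illustration only.
    print(externals_value({"libpricing": "http://svn.example.com/repo/src/libpricing"}))
```

The rendered value could be written to a file and applied to a suite's directory with svn propset svn:externals -F, after which a checkout of that suite also fetches the linked libraries.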

The repository itself is very large, totaling more than 35,000 revisions in a 2GB database. Much of the repository, however, was preexisting when EFS migrated to Subversion, and was carried over from SCCS. The repository is hosted on a machine running RedHat Linux, and uses Berkeley DB as its repository database backend.

The repository is also accessed by about 30 people every day, most of whom perform regular commits. The developers access the repository from a mix of machines running Sun's Solaris and machines running Microsoft's Windows XP. Remote access to the repository is done through Apache, which was chosen due to its ease of integration into the existing authentication infrastructure, previous familiarity with Apache, and general all-round good looks. It also made it easy to make the repository accessible from a Web browser.

14.2.3. Teledata Communications

Teledata Communications, Inc. (TCI) uses Subversion to store all of its source code, documentation, and build projects, as well as information from data providers, and development documentation. TCI began testing Subversion fairly early on in its development, at around version 0.24, and has been using it in a production setting since July of 2003, after giving it a thorough run through all of its paces.

One of the major reasons for switching to Subversion from TCI's previous (commercial) version control system was to save costs on per-seat developer licensing. As the company was growing in size, it came to the conclusion that its previous VCS solution wasn't worth the cost. So instead of shelling out more money to license new developers, TCI decided to make the jump to Subversion instead. Even though their developers had experience with the previous system, as did most of their new hires, the benefits of moving to Subversion outweighed the costs of training.

The other reason for TCI's switch to Subversion is best illustrated by Mark Bohlman, the Software Development Manager at TCI.

As we were utilizing a commercial version of RCS, all the developers had developed habits of locking/unlocking and utilizing e-mail or instant messaging to indicate that they needed a particular code file in order to make changes. Within two weeks of my arrival at TCI, it became apparent that a fairly large stumbling block existed within the team when changes were needed. We have development staff broken up into three distinct groups, backend Java coders, mid-level JSP coders, and frontend HTML developers. As the boundaries between these groups are fairly loose at any given time, one developer would need to access code from another level. At the time, we also were dispersed in a location where the physical separation between the teams made it difficult to communicate without the use of e-mail and/or IM. A number of times in the first weeks of my employ here, it became clear that deadlines would be missed solely because of the inefficient usage of the tools, and the way that locking was implemented.

The straw that broke the camel's back for me was trying to get a build release for a client only to find that a developer in the Java group had checked code back in to allow a frontend developer to make a change to a logo, and failed to complete needed changes. This ended up costing us with the customer, which we managed to hold onto, but at the expense of nearly a week's worth of client testing, QA testing, and development time (saying nothing of the ill-will it generated).

Repository

Teledata Communications' data is split into three separate repositories, each of which holds a different type of data.

The developer repository holds all of TCI's source code, as well as its third-party vendor tools and all associated documentation. It is used by TCI's developers, and is laid out according to the standard /trunk, /branches, and /tags scheme. Clocking in at a little over 2GB in size, this repository has over 11,000 revisions and uses Berkeley DB as its storage backend. All of TCI's developers, QA testers, and systems engineers access this repository daily, and have both read and commit access.
The next repository holds all of the information supplied by TCI's data providers, such as credit bureaus, valuation providers, and criminal background data sources. The vendors modify this data continually, and the number of vendors is also increasing regularly. It is important for TCI to keep track of these changes in order to maintain all of the versions of its software that are being used by clients. This repository is laid out with top-level directories for each data provider, and no branching or tagging is used. Read access to this repository is provided to the entire company, but commit access is limited to just a few developers and a maintainer.
The final repository is a client requirements repository that stores customer requirements, use cases, scenarios, and design thoughts for each project and client. Documents are arranged in the repository by application, with each application in a top-level directory. Below that, documents are arranged by client, with a directory for each of an application's clients under the directory for that application. Access to this repository is also provided to all in the company for checkout purposes, but commits are limited to systems engineers and managers.

Branches and Tags

The TCI developer repository uses tags in its automated build process. The Java applications that developers build run under WebLogic and have an Ant-based build process that involves creating a tag for each build provided to a development test, QA test, or production environment. To ensure consistency between the three builds, they are all done at the same time. Custom properties are used to indicate the configuration files that should be used for determining build environments.
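A tag-per-build step like the one described above might be sketched as follows. This is not TCI's actual Ant build: the tag naming scheme, the environment names, and the function name are hypothetical. Copying all three tags from the same trunk revision is my own assumption about one way to keep the three builds consistent.

```python
def build_tag_commands(repo_url, build_id, revision, environments=("dev", "qa", "prod")):
    """Return one `svn copy` command per environment, all copying the
    same trunk revision so that the three tags hold identical source."""
    commands = []
    for env in environments:
        tag_url = repo_url + "/tags/" + build_id + "-" + env
        commands.append([
            "svn", "copy",
            "-r", str(revision),
            repo_url + "/trunk", tag_url,
            "-m", "Tag build " + build_id + " for " + env,
        ])
    return commands


if __name__ == "__main__":
    # Placeholder URL, build id, and revision number.
    for argv in build_tag_commands("http://svn.example.com/dev", "build-17", 11000):
        print(" ".join(argv))
```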

TCI also uses branches for the purposes listed below. Changes in branches are periodically merged back into the trunk, as appropriate.

New development lines for a code base.
Custom changes for an individual client.
Experimental development.

Branches are also sometimes used for bug fixes. Whether to do bug fixes in a branch or in the trunk is dependent on the development state of an application (i.e., QA, beta, or production).

Hook Scripts

TCI uses hook scripts for

Repository access control. This allows them to prevent commit access for unauthorized persons.
Sending commit notification e-mails for the client documentation repository. This ensures that all involved parties (sales, sales engineering, development, QA, MIS, and support) are informed of any changes to client requirements.
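The access control hook mentioned above could take roughly the following shape as a pre-commit script. The list of committers and the function names are hypothetical, and the text does not say how the real hook is written. The sketch relies on two standard behaviors: Subversion passes the repository path and transaction name to pre-commit, and svnlook author reports who is committing.

```python
import subprocess
import sys

# Hypothetical usernames; a real hook would load these from a policy file.
ALLOWED_COMMITTERS = {"maintainer", "dev1", "dev2"}


def may_commit(author, allowed):
    """Return True if the committing user is on the allowed list."""
    return author.strip() in allowed


def main(argv):
    repos, txn = argv[1], argv[2]
    result = subprocess.run(
        ["svnlook", "author", "-t", txn, repos],
        capture_output=True, text=True, check=True,
    )
    if may_commit(result.stdout, ALLOWED_COMMITTERS):
        return 0
    # Anything written to stderr is shown to the user whose commit fails.
    sys.stderr.write("You do not have commit access to this repository.\n")
    return 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

A nonzero exit status from pre-commit causes Subversion to reject the commit, which is what makes the hook usable for access control.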

14.2.4. GladeSoft

This case study looks at GladeSoft, Inc. GladeSoft is the smallest company among the various case studies (it has three developers accessing the repository). GladeSoft migrated to Subversion under familiar circumstances, after finding CVS too painful to continue using. Within the company, Subversion repositories are used to store source code, corporate data, and the GladeSoft Web site.

GladeSoft's choice to use Subversion came down to a variety of different requirements it had for a version control system.

The ease of migrating from an existing CVS repository, while preserving the repository history.
Subversion's clean and usable design.
The ability to use HTTP and SSL for authenticated communication with the repository, without needing shell accounts.
The transactional atomic commits, which GladeSoft found to be especially important in the storm-prone area of Florida where the company is located (frequent power loss).

Repository

GladeSoft uses three separate repositories for storing information.

A primary repository holds the source code for the product.
The second holds GladeSoft's corporate data.
The final repository holds the GladeSoft Web site.

The source code repository is arranged with a standard /trunk, /branches, and /tags. Inside /branches are several subdirectories that allow them to categorize their branches. Tags are created to mark feature freezes and release points in the source code, and branches are used mainly for making customer-specific changes. The repository is small, holding less than 20MB of data, in over 1500 revisions, and is accessed by three developers who commit changes.

The other two repositories don't make use of branching or tagging, and simply have their main file tree at the top level. These repositories are even smaller than the source code repository, clocking in at around 5MB, with even fewer revisions (GladeSoft doesn't like doing paperwork).

All three repositories are served via Apache, from an old 200MHz PowerPC running Gentoo Linux. HTTP was chosen for its capability to work without local shell accounts (for individual users), which provides extra security. It is also used for its source browsing capability, which makes it easy for GladeSoft to quickly check a source file or do an informal code review. Client connections are made from a menagerie of operating systems, including Windows, Linux, various BSDs, and OS X.

Hook Scripts

GladeSoft also uses two hook scripts for its source repository.

The first sends out commit notification e-mails.
The second automatically runs the build server against the latest source using a variety of compilers and targets.
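As a rough idea of what the first of these hooks might do, the sketch below assembles a commit notification message from pieces that a post-commit script could collect with svnlook author, svnlook log, and svnlook changed. The function name, addresses, and message format are hypothetical. Delivery (for example, through smtplib) is left out.

```python
from email.message import EmailMessage


def commit_notification(author, revision, log, changed, sender, recipient):
    """Build a notification e-mail describing one committed revision."""
    message = EmailMessage()
    message["Subject"] = "r" + str(revision) + " committed by " + author.strip()
    message["From"] = sender
    message["To"] = recipient
    message.set_content(log.strip() + "\n\nChanged paths:\n" + changed)
    return message


if __name__ == "__main__":
    # Placeholder values standing in for svnlook output.
    print(commit_notification(
        "alice\n", 1500, "Fix build on BSD.", "U   trunk/Makefile\n",
        "svn@example.com", "dev@example.com",
    ))
```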

14.2.5. ExCo

In this case study, we will look at a company that uses Subversion to store its complete source code base, as well as its build tools. The company declined to have its real name mentioned, so to protect the innocent I will call it "ExCo" instead.

ExCo began using Subversion after migrating from its CVS repository in 2003, which it had previously migrated to from Microsoft Visual SourceSafe in 2000. The migration occurred because CVS was not meeting ExCo's needs (although it was still better than VSS). Subversion allowed ExCo to maintain a similar development paradigm (thus less training) while making almost everything easier to perform.

Some of the other reasons for the migration include

The branching and tagging paradigms, which were easier to learn, especially for developers who didn't want to invest much time in training
The atomic commits, which eliminated the need for complicated commit processes
The possibility of allowing individual developers to perform branching and merging, which allowed for a "task branch" model of development
Easy migration of existing tools to work with Subversion instead of CVS
The well-known names such as Karl Fogel and Collab.net that were associated with Subversion

Repository

ExCo's repositories are set up with a fairly standard arrangement. The top-level directories are made up of /trunk, /branches, and /tags, as well as a directory named /devbranches, where individual developers can create their own task branches (see Figure 14.7).

Figure 14.7. The ExCo repository layout.

ExCo has three repositories, which hold 8,000, 9,600, and 2,400 files, respectively, and are used to store different sets of projects. Access to the repositories is through an Apache server, due to its stability and security (via SSL). Because ExCo has developers overseas, the security was an important feature. Commits to the repository are allowed for all developers who have access, which comprises approximately 20 to 28 developers. The server is hosted on a Solaris machine, with users connecting from Windows 2000 clients.

Branches and Tags

Branches and tags are heavily used inside the ExCo Subversion repositories.

Tags are created for each successful build performed by their automated build system.
Branches are created for each individual task, which is then merged into its final destination when the task is complete.
Production ready builds are branched in order to stabilize them before they are deployed.

Because of ExCo's heavy use of branches, it has found it necessary to deal with merge tracking (which Subversion lacks in any real form). Instead of having an overall merge tracking plan, however, ExCo relies on individual developers to track their own changes and merges. To date, this has worked well and not caused any real problems.

The People Problem

One of the issues noted by ExCo as a problem to be dealt with was not technical at all. It is the problem of getting developers to integrate their work process with a version control system. Ron Bieber (of "ExCo") explained this issue.

The biggest problem we've had in whole [has] been people problems. Your average developer doesn't care about source control and doesn't see why they have to learn it. If you look at most companies (at least for people I've either talked to or interviewed for positions), they use very primitive methods to keep track of source control (including just storing things on a network drive), or they know just enough to check out and do not know anything about branching. You usually have to have a completely full-time employee just to manage concurrent development. It was a challenge to get people to learn [Subversion], but once they did, the process became self-sufficient without the extra head count.

14.2.6. Wye Corp

In this case study, we'll look at a company that does embedded development, and uses Subversion to store its firmware and hardware specifications, as well as source for device drivers and testing applications. The company in question declined to be identified by name for this book, so I'll refer to it as "Wye Corp."

Wye Corp switched to Subversion after hitting one too many walls while dealing with CVS's limitations. Many of its projects had started out for internal use only, but as time passed, its customers started using its tools, which inevitably led to requests from the customers to add features and expand the projects. Attempts to expand the projects, however, quickly hit a wall with CVS, as developers attempted to restructure file and directory layouts and found it impossible without breaking CVS's file history.

Repository

Instead of setting up a single repository, Wye Corp uses a different repository for each project, 16 in all. The repositories range in size up to 400MB, but have relatively low revision counts (under 500). Some of the repositories are as old as two years. Overall, the company has 14 developers accessing its various repositories, and limits access to only those people who have a need.

Repository layout for Wye Corp is a fairly standard /trunk, /branches, and /tags setup, with the addition of a dedicated /releases directory. The releases directory allows Wye Corp to separate internal tags used for marking milestones (such as points where support for a special feature was incorporated) from releases that were delivered to a customer, which is something Wye Corp needs to keep careful track of. Inside /releases, there is also a directory called info, which contains a file releases.txt. Wye Corp uses the releases.txt file to keep track of which customer received each released version of its software. Figure 14.8 shows how this layout is arranged.

Figure 14.8. The Wye Corp repository layout.

The repository itself is hosted on a Dell server with 400GB of RAID5 storage space and 1GB of memory, running RedHat Enterprise Linux. Remote access to the repository is served via Apache. Originally, HTTP was chosen because Apache was the only network-capable server available (Wye Corp was a very early adopter of Subversion, starting at version 0.17). Later on, however, Wye Corp began to rely on the convenience of browsing the repository over the Web, as well as the ability to authenticate against its LDAP servers.

Branches and Tags

Wye Corp has a few different uses for branches and tags.

Tags are used for the occasional internal milestone that bears remembering.
Releases to customers are also copied and placed in the /releases directory.
Branches are used for significant changes to the code base.
Branches are also used for trying out a potential solution to a problem, without breaking the main trunk for everyone else.

For the most part, Wye Corp has found little reason to do many merges from one branch to another, because most of their development occurs on the main trunk, which it uses to directly create release tags. Occasionally, however, circumstances have required merges to a release that used only certain revisions. In those cases, Wye Corp has used a separate file in the repository to keep track of all merges.

Hook Scripts

Wye Corp has several hook scripts in place to enforce policy and provide automatic notifications.

A script is used that updates the internal developer Web site with information when commits are made.
Another notification script is used for sending e-mails whenever a commit occurs.
Wye Corp prevents any changes to /tags or /releases except to create a new tag or release.
A script checks to make sure that no developer makes a commit with an empty log message.
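The last two policies in this list can be sketched as checks that a pre-commit hook might run on the output of svnlook log and svnlook changed. Everything here is illustrative rather than Wye Corp's actual script: the function names are my own, and the exemption for /releases/info is an assumption made so that releases.txt can still be updated.

```python
PROTECTED_ROOTS = ("tags/", "releases/")
# Assumption: the bookkeeping directory must stay editable.
EXEMPT_PREFIXES = ("releases/info/",)


def log_message_ok(log):
    """Reject empty or whitespace-only log messages."""
    return bool(log.strip())


def protected_violations(changed_output):
    """Return paths that alter an existing tag or release.

    Expects `svnlook changed` output, in which each line holds status
    flags in the first columns and the path from the fifth column on.
    """
    violations = []
    for line in changed_output.splitlines():
        if not line.strip():
            continue
        status, path = line[:4].strip(), line[4:]
        if path.startswith(EXEMPT_PREFIXES):
            continue
        for root in PROTECTED_ROOTS:
            if path.startswith(root):
                rest = path[len(root):]
                # Only adding a new directory directly under the root is allowed.
                is_new_entry = status == "A" and rest.endswith("/") and rest.count("/") == 1
                if not is_new_entry:
                    violations.append(path)
    return violations
```

A hook built on these checks would exit with a nonzero status, and an explanatory message on stderr, whenever log_message_ok returns False or protected_violations returns a nonempty list.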

14.2.7. ZedCom

For our final case study, let's look at one more company that decided not to have its name used. We'll call it "ZedCom." ZedCom uses its Subversion repository to store firmware for multimedia embedded systems. Like many of the previous case studies, ZedCom migrated to Subversion from CVS, due to the limitations that I've already exhaustively discussed.

Repository

The ZedCom repository is arranged with four top-level directories: /trunk, /users, /branches, and /tags. The familiar directories have the functions you would normally expect, although I should note that the /trunk directory contains subdirectories for individual projects. Additionally, the /users directory contains a subdirectory for each Subversion user (named after that user's username), where individual users can create their own private branches.

The repository has relatively few users, with 10 who can access the repository and six who perform commits. It contains about 600MB of data in 1,800 revisions. Access is performed via svnserve, due to its ease of setup (ZedCom has no need for secure authentication).

Branching is limited to user branching, and although ZedCom has a /branches directory, it has never actually used it. Merge tracking of the branches is done manually via commit logs, which works, but serves as a source of irritation for the developers.
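Manual merge tracking through commit logs becomes less irritating if the log messages follow a fixed convention that a script can read back. The sketch below assumes a convention of the form "Merged rSTART:END from PATH". That convention, the function name, and the example paths are all hypothetical, because the text does not describe how ZedCom words its log messages.

```python
import re

# Hypothetical convention: "Merged r120:135 from /users/alice/codec"
MERGE_NOTE = re.compile(r"Merged r(\d+):r?(\d+) from (\S+)")


def merges_recorded(log_messages):
    """Extract (start_revision, end_revision, source_path) tuples from
    log messages that follow the merge-note convention above."""
    found = []
    for message in log_messages:
        for start, end, source in MERGE_NOTE.findall(message):
            found.append((int(start), int(end), source))
    return found


if __name__ == "__main__":
    logs = ["Merged r120:135 from /users/alice/codec", "Fix typo in README"]
    print(merges_recorded(logs))
```

Feeding this function the messages from svn log on the trunk would list which revision ranges of each private branch have already been merged.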