Section 14.1. Archetypal Studies | Subversion Version Control. Using The Subversion Version Control System in Development Projects

14.1. Archetypal Studies

In this section, we'll look at several development project archetypes. For each one, I will describe the sort of development philosophy that such a project is likely to have, as well as key points of process and policy that define the archetype. Then, I'll examine the choices that such a project will have to make when integrating with Subversion, such as laying out the repository and using properties, hook scripts, and other Subversion features. I'll also look at the limitations of Subversion that you may run into with each archetype, and how you can work around them.

14.1.1. Managed Chaos

The managed chaos project is one with little managerial oversight over the day-to-day details of development. Instead, managerial duties are limited to integration and high-level design. In fact, in many such projects, the project manager is really more of a project maintainer, and may even be officially titled that way. You are most likely to find this sort of a project in an academic or open source project setting.

In this case study, we will look at a hypothetical managed chaos project, an open source program called BogeyTalk, which reads the text on your computer aloud in the style of Humphrey Bogart. BogeyTalk is a mature project with dozens of contributors and a small team of five maintainers, handpicked from the project's major developers by the creator of the project, whom we'll call Bob. Significant project releases are frequent (a couple of times a year), and the project follows the Linux kernel version numbering scheme (even minor numbers for stable releases, and odd minor numbers for development versions).

Repository Layout

Because BogeyTalk is the only project hosted in its Subversion repository, the entire project resides at the top level of the repository. The main development branch of the project resides in the /trunk directory. Additionally, there are /stable, /development, and /maint_branches directories, as you can see in Figure 14.1. The /maint_branches directory is used to store maintenance branches (as I will explain shortly), whereas the /stable and /development directories are tag directories that store snapshots of various releases of the project.

Figure 14.1. The BogeyTalk repository layout.

Branches and Tags

Because most BogeyTalk developers do not have access to make direct commits to the repository, task branches make little sense. Therefore, BogeyTalk uses branches and tags primarily to signify different versions and releases. For each release of the project, a tag of the project is created in either /stable or /development, depending on whether the release was a stable release or a development release (the different directories help ensure a logical separation between the two release categories). Both of these directories are used as immutable tag directories (i.e., copies are created here from elsewhere and nothing is changed once created), but the repository is not set up to enforce that policy. Instead, Bob made the decision to trust the maintainers and make it easy for them to make a correction if there is an error when creating a release tag. Because there are only a few maintainers, this works out well.

Most actual development on BogeyTalk occurs in the /trunk directory, but occasionally there needs to be further development on an older version of the program in order to patch security holes or maintain compatibility with other projects. To support these security patches, the BogeyTalk project maintains maintenance branches for each stable version of the project (1.0, 1.2, 1.4, and so on) in the /maint_branches directory. These branches are created at the time of the stable version's release, and are identical to that version's tag in the /stable directory at the time of the release. When security or compatibility patches need to be made, they are committed to these branches, which are then tagged to create subminor version releases (1.0.1, 1.0.2, and so on).
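As a rough illustration, the mapping from a version number to its place in the repository could be sketched in Python as follows. This is only a sketch under stated assumptions: the function name release_locations is hypothetical, and naming each tag or branch directory after its version number is my assumption, not something the BogeyTalk layout prescribes.

```python
def release_locations(version: str) -> dict:
    """Map a release number to its tag path, tag source, and maintenance branch.

    Assumption: tag and branch directories are named after the version number.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected MAJOR.MINOR or MAJOR.MINOR.PATCH, got {version!r}")
    major, minor = parts[0], parts[1]
    stable = minor % 2 == 0  # even minor number: stable series
    tag_dir = "stable" if stable else "development"
    locations = {"tag": f"/{tag_dir}/{version}", "tag_source": "/trunk"}
    if stable:
        # Every stable series has a maintenance branch.
        locations["maint_branch"] = f"/maint_branches/{major}.{minor}"
        if len(parts) == 3:
            # Subminor releases (1.0.1, 1.0.2, ...) are tagged from that branch.
            locations["tag_source"] = locations["maint_branch"]
    return locations
```

For example, release_locations("1.0.2") reports that the tag /stable/1.0.2 would be copied from /maint_branches/1.0, whereas a development release such as 1.3 has no maintenance branch at all.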

Properties

The BogeyTalk project makes use of the svn:keywords property to embed repository information in the comment header of each file, using the $Id$ keyword. Because most BogeyTalk users get their copy of the project's source code from a release package, and not directly from the repository, this makes sure that the information necessary for finding a particular file in the repository will always be available.

BogeyTalk also uses several custom properties for storing additional meta-information relevant to the project.

Each release tag stores packaged tarballs and RPM files in pkg:tgz and pkg:rpm properties, respectively. This allows a script to automatically maintain the downloads directories on the project's Web site, by scanning the releases directories for new packages. If it discovers a release that does not have associated packages (or has out-of-date packages), it sends an e-mail warning to Bob to make sure the problem is corrected. Bob could have opted to use a directory in the repository to store package files, but he decided to use the properties to maintain a logical connection between packages and the source they're associated with. Because the package files are created from the tag itself, and then attached to the tag after it is created, the possibility that a tag will have an out-of-date package associated with it is minimal.
Most of the BogeyTalk project is licensed under the GNU General Public License (GPL), but a few key sections are licensed under the GNU Lesser General Public License (LGPL) to allow external programs to link to BogeyTalk without requiring them to follow the GPL's restrictions. To keep clear which files fall under which license, each file maintains a license property that states the license for that file. For GPL licensed files, it has a value of GPL, and for LGPL licensed files it has a value of LGPL. The project's documentation is also licensed under the GNU Free Documentation License (FDL), and those files have a license property of FDL.
Because most of the developers who contribute to the BogeyTalk project do not have direct write access to the repository, the repository commit logs do not reflect the actual author of a change. One way around this is to always note the author's name in the log message, but for this project Bob wanted to go one step further. He wanted his Web-based Subversion blame tool to show the actual author of specific bits of code, rather than just the developer responsible for the commit. So, in addition to noting who the author of a committed revision is, the BogeyTalk project also makes use of a developer revision property, where the real name of the contributing developer is stored.
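To make these properties a little more concrete, here is a minimal Python sketch that builds the svn propset command lines a maintainer might run. The helper names are hypothetical, and the sketch only constructs argument lists (it does not run them). Note that Subversion refuses revision property changes unless the repository's pre-revprop-change hook allows them.

```python
from typing import List

LICENSES = {"GPL", "LGPL", "FDL"}  # the three values used by the project


def license_propset(path: str, license_name: str) -> List[str]:
    """Command that records a file's license in its 'license' property."""
    if license_name not in LICENSES:
        raise ValueError(f"unknown license {license_name!r}")
    return ["svn", "propset", "license", license_name, path]


def package_propset(kind: str, package_file: str, tag_path: str) -> List[str]:
    """Command that attaches a package to a release tag (pkg:tgz or pkg:rpm).

    -F reads the property value from a file, which suits binary packages.
    """
    if kind not in ("tgz", "rpm"):
        raise ValueError(f"unknown package kind {kind!r}")
    return ["svn", "propset", f"pkg:{kind}", "-F", package_file, tag_path]


def developer_revprop(rev: int, developer: str, repo_url: str) -> List[str]:
    """Command that records the real author of a revision as a revision property."""
    return ["svn", "propset", "developer", "--revprop", "-r", str(rev),
            developer, repo_url]
```

Any of these lists could be passed to subprocess.run from a working copy. The file and tag paths are whatever the maintainer is working on at the time.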

Scripts

BogeyTalk uses a custom script that automatically maintains the project's download directories, which contain packaged versions of the project in source and binary form. Each night, the script is run as a cron job on the project server.

1. The script first iterates through each of the immediate subdirectories in /stable, of which there is one for each release.
2. For each stable release, the script checks the pkg:tgz and pkg:rpm properties to see if they contain an up-to-date package.
   a. If either property is empty, or contains a package that is older than the latest revision of that release, the script sends an e-mail to the project maintainer.
   b. Otherwise, the script checks to see if the package already exists on the project's Web server. If it does not, the package is copied to the server.
3. The script then repeats steps 1 and 2 for all of the development releases in the directory /development.
4. Finally, the script creates a tarball (.tgz) of the /trunk directory, names it with the current date, and places it on the Web server as a nightly snapshot of the development.
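The decision the script makes for each release can be sketched as a small Python function. This is an illustration under assumptions of my own. A package's age is represented here by the revision in which its property was last set, and the names release_actions and snapshot_name, along with the snapshot file name format, are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

PACKAGE_KINDS = ("tgz", "rpm")


def release_actions(release_rev: int,
                    packages: Dict[str, Tuple[Optional[int], bool]]) -> List[str]:
    """Decide what to do for one release directory.

    packages maps a kind ("tgz" or "rpm") to (package_rev, on_server), where
    package_rev is None when the property is empty.
    """
    # A missing or stale package means the maintainer is warned and nothing
    # is published for this release.
    for kind in PACKAGE_KINDS:
        package_rev, _ = packages.get(kind, (None, False))
        if package_rev is None or package_rev < release_rev:
            return ["notify-maintainer"]
    # Otherwise, copy over only the packages the Web server does not have yet.
    return [f"upload:{kind}" for kind in PACKAGE_KINDS if not packages[kind][1]]


def snapshot_name(day: date) -> str:
    """File name for the nightly trunk tarball (the format is an assumption)."""
    return f"bogeytalk-trunk-{day:%Y%m%d}.tgz"
```

A driver loop would call release_actions once per subdirectory of /stable and /development, and then build the nightly tarball under the name returned by snapshot_name.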

BogeyTalk also has a custom script that creates contributor-annotated blame output for the BogeyTalk developers' Web site. The blame script takes the raw output from svn blame and looks up the developer revision property for the revision of each entry (to improve performance, it caches values it has already discovered). It then replaces the author label in the blame output with the value of developer. Because the script must call svn propget a large number of times, it is run on the same machine as the repository itself so that it can finish in a reasonable amount of time.
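A minimal Python sketch of such a blame filter follows. It is not Bob's actual script. The function names are hypothetical, the regular expression assumes the usual "revision author text" layout of svn blame output, and revisions without a developer property simply keep the committer's name.

```python
import re
import subprocess
from typing import Callable, Dict, Iterable, Iterator, Optional

# Assumed layout of a blame line: revision, author, then the source text.
BLAME_LINE = re.compile(r"^\s*(\d+)\s+(\S+)(?: (.*))?$")


def annotate(blame_lines: Iterable[str],
             lookup_developer: Callable[[int], Optional[str]]) -> Iterator[str]:
    """Replace the committer in each blame line with the real developer."""
    cache: Dict[int, str] = {}  # one property lookup per revision
    for line in blame_lines:
        line = line.rstrip("\n")
        match = BLAME_LINE.match(line)
        if match is None:
            yield line  # e.g. uncommitted lines; pass through untouched
            continue
        rev, committer, text = int(match.group(1)), match.group(2), match.group(3) or ""
        if rev not in cache:
            cache[rev] = lookup_developer(rev) or committer
        yield f"{rev:>6} {cache[rev]:>10} {text}"


def svn_developer_lookup(repo_url: str) -> Callable[[int], Optional[str]]:
    """Build a lookup that reads the 'developer' revision property with svn."""
    def lookup(rev: int) -> Optional[str]:
        result = subprocess.run(
            ["svn", "propget", "developer", "--revprop", "-r", str(rev), repo_url],
            capture_output=True, text=True)
        return result.stdout.strip() or None
    return lookup
```

Passing the lookup in as a function keeps the rewriting logic separate from the svn calls, so the cache can be exercised without a repository.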

14.1.2. Rapid Development

The rapid development project is aimed at producing functional output quickly, without a long up-front development cycle. The project is often subject to frequent requirements changes, and developers need to be able to react quickly to shifts. Because development cycles are short, developers need to integrate their work frequently, and Subversion plays a big role as the framework that supports those changes.

For this case study, we'll look at a hypothetical Web database application, being developed by the software development consulting firm, Programmers, LLC. The client on the project, Internet Sales, Inc., wants to put the application into use on its internationally known online sales Web site, but the exact requirements are fluctuating rapidly due to changing market needs. Because the application is a custom development job, there is no intention to market it as a prepackaged product, but Programmers, LLC will likely be contracted in the future to support the software for Internet Sales, Inc.

Repository Layout

The repository for Programmers, LLC holds all of its ongoing projects, not just the database application for Internet Sales, Inc. Therefore, the top level of the repository consists of one subdirectory for each project; the one for this project is named /isdb, for ISDB (Internet Sales DataBase). The developers at Programmers, LLC are no fans of extra work, though, and many of their projects tend to have overlapping functionality. Therefore, they have also developed a number of in-house projects that contain libraries used by their contract projects, which they also store in top-level directories. ISDB uses one of these, a Web database library stored in /webdb.

Inside the /isdb project directory, the project is split into a main /isdb/trunk directory, an /isdb/dailies directory, and an /isdb/releases directory (see Figure 14.2). The trunk directory is where the main project development occurs. The developers store daily project build tags in the dailies directory, and versions of the project released to the client in the releases directory.

Figure 14.2. The Programmers, LLC repository layout.

Branches and Tags

Because the ISDB project is on a rapid development schedule, the project's developers are using continuous integration of their work. That puts all of their development work on the main trunk, and alleviates the need to use branches for separating work. There is also no need to use branches for supporting multiple versions of the software, because there will only be a single client that they need to support. If the project were to be "branched" for development for a different client, the developers would instead make a copy of the /isdb top-level project to create a new project for their new client.

The Programmers, LLC developers do make frequent tags of the ISDB project trunk, though. Each day, they make a snapshot of the trunk directory in the dailies directory to store the state of the project at the end of that day's development. Additionally, they make tags of the trunk directory whenever they release a version of the software to the client (either for testing purposes, or as a version to be used in production), and place them in the releases directory.
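Because a tag in Subversion is just a cheap copy, the daily snapshot amounts to a single svn copy between two URLs. The following Python sketch builds that command. The function name, the date-based tag name, and the log message wording are my assumptions, and the repository root is passed in rather than assumed.

```python
from datetime import date
from typing import List


def daily_tag_command(repo_root: str, day: date) -> List[str]:
    """Build the 'svn copy' that tags /isdb/trunk as that day's snapshot."""
    stamp = day.isoformat()  # e.g. 2005-03-01 (naming scheme is an assumption)
    source = f"{repo_root}/isdb/trunk"
    target = f"{repo_root}/isdb/dailies/{stamp}"
    message = f"Daily snapshot of ISDB trunk for {stamp}"
    return ["svn", "copy", source, target, "-m", message]
```

A release tag would be built the same way, with /isdb/releases as the target directory and a release name in place of the date.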

Properties

Programmers, LLC makes use of the svn:externals property to link its in-house libraries to the projects that use them. ISDB, for example, uses the company's custom Web database library, located in /webdb. To link that library to the ISDB project, the /isdb/trunk directory has the svn:externals property set to

 libs/webdb        https://svn.programmers.com/repos/webdb/trunk

Programmers, LLC also makes heavy use of properties to store project information at the top level of each project. For each project, the top-level directory for that project stores

The name of the client for the project, in proj:client
Client contact information, in proj:contact
Scheduling information for the project, in proj:schedule
Project budget information, in proj:budget

As a part of the ISDB project's rapid development cycle, the project uses automated regression tests to ensure that no developer's contribution breaks other parts of the project. To facilitate these tests being run automatically, the top-level project directory stores two properties, tests:daily and tests:hourly. These properties contain a list of the tests that should be run on a daily basis and the tests that should be run on an hourly basis (respectively). Additionally, each source file has a property named tests:commit, which lists tests that should be run whenever changes to that file are committed to the repository.

Scripts

The ISDB project uses a number of scripts to perform automatic testing on the project, in order to help facilitate Programmers, LLC's rapid development and integration schedule.

A script runs each day to generate a snapshot of the daily build in the directory, /isdb/dailies. This script also builds the daily snapshot and runs the tests specified in the project's tests:daily property. Any errors are reported to the project Web server (which shows build statistics) and e-mailed to the appropriate developers.
Another script performs hourly builds and runs the tests contained in the project's tests:hourly property.
Finally, a post-commit script runs the appropriate tests:commit tests whenever a commit is made.

14.1.3. Central Planning

A centrally planned project has a high degree of rigidity in design and process. Project design is likely done up-front (or at least in large iterative cycles), and although individual developers may not be micromanaged by the project manager, they generally are required to follow rigid policies. This sort of development project is often necessary for managing very large projects, or complex projects that require a large amount of high- and low-level project design.

To examine this project archetype, we'll look at the hypothetical government contractor, GovCon. GovCon develops a wide variety of different projects for various government agencies. Many of the projects are quite large, and all must conform to detailed and exacting government specifications. In order to maintain the level of control necessary to meet these specs, GovCon maintains a strict project hierarchy of project management, with clearly defined task descriptions for each member of a project team. Each project team (there may be more than one per project) is further split into two sides: developers and quality assurance testers. Communication between the two sides of the team is important, and Subversion is used as a key tool in facilitating that communication.

Repository Layout

GovCon uses a separate Subversion repository for each project, so each project exists at the top level of its own repository. The project then consists of four subdirectories: /qa_builds, /dev_builds, /tasks, and /releases (as shown in Figure 14.3). The /qa_builds and /dev_builds directories are used for compiling and storing integrated project builds, whereas the /tasks directory is used for individual developer work on specific tasks. Released versions of the project are stored in the /releases directory.

Figure 14.3. The GovCon repository layout.

Branches and Tags

GovCon makes heavy use of branches to separate development tasks, so that each individual developer's work can be thoroughly tested by a QA tester before it is integrated into the rest of the project. For each development task that needs to be completed, a project manager creates a branch of the project in the /tasks directory, from that day's development build (located in the /dev_builds directory) or from the daily QA build (in /qa_builds). Then, after the task has been completed, it is marked for QA testing. Completed and tested tasks are then integrated into a QA build (in the /qa_builds directory), where the integrated build is tested before it is used to create a new development build.

Properties

The GovCon projects use properties as a tool for facilitating communication between QA testers and developers. The top-level directory for each project branch contains a status property (qa:status), which indicates whether a branch has been tested yet. When a task is created, its status is set to untested/inprogress, and the qa:tester property is unset. As soon as a developer feels that the current task is ready for testing, that developer changes the value of qa:status to untested/ready. QA testers can then go through and test all of the tasks with a status that is marked as ready. If the tester is happy with the results, the qa:status property is set to tested/passed. If the task is not satisfactory, qa:status is set to tested/failed. When a task is tested, the QA tester's username is also entered into the qa:tester property.
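The rules for these two properties are simple enough to check mechanically. Here is a minimal Python sketch of such a check. The function name is hypothetical, and requiring qa:tester only once a task is marked tested is my reading of the workflow described above rather than a documented GovCon rule.

```python
from typing import List, Optional

VALID_STATUSES = {
    "untested/inprogress",
    "untested/ready",
    "tested/passed",
    "tested/failed",
}


def qa_property_errors(status: Optional[str], tester: Optional[str]) -> List[str]:
    """Return a list of problems with a task branch's QA properties."""
    errors = []
    if status not in VALID_STATUSES:
        errors.append(f"qa:status has invalid value {status!r}")
    elif status.startswith("tested/") and not tester:
        # A tested task must record who tested it.
        errors.append("qa:tester must be set once a task is marked tested")
    return errors
```

An empty list means the branch's QA properties are consistent; anything else is a message that could be reported back to whoever made the change.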

Scripts

All of the GovCon projects use pre-commit hook scripts to ensure that a variety of project policies are being followed correctly.

Log messages are checked to ensure that they match the proper log format.
Committed source code is run through a style checker, to ensure that it matches the prescribed coding styles for the project.
QA properties are checked to make sure they have valid values.

Additionally, a post-commit hook script checks the QA property values for changes, and sends e-mails to the appropriate people to inform them of the current status for tasks.
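To give a feel for what the log message check might look like, here is a hedged Python sketch of a pre-commit hook. The source does not specify GovCon's log format, so the TASK-123: summary pattern is purely hypothetical, as are the function names. What is standard is the hook interface: Subversion passes the repository path and transaction name as arguments, svnlook log -t reads the pending log message, and a nonzero exit status rejects the commit, with anything written to standard error sent back to the client.

```python
import re
import subprocess
import sys
from typing import List

# Hypothetical log format: a task identifier, a colon, and a summary.
LOG_FORMAT = re.compile(r"^TASK-\d+: \S.*")


def log_message_ok(message: str) -> bool:
    """Check a log message against the (assumed) project format."""
    return bool(LOG_FORMAT.match(message.strip()))


def main(argv: List[str]) -> int:
    """Entry point for the hook: argv[1] is the repository, argv[2] the transaction."""
    repos, txn = argv[1], argv[2]
    message = subprocess.run(
        ["svnlook", "log", "-t", txn, repos],
        capture_output=True, text=True, check=True).stdout
    if not log_message_ok(message):
        sys.stderr.write("Log message must look like 'TASK-123: summary'\n")
        return 1  # nonzero exit status rejects the commit
    return 0
```

The hook file itself would finish by calling sys.exit(main(sys.argv)). The style and QA property checks could be added to main in the same way, each returning 1 on failure.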

14.1.4. Small Teams

In small-team projects, there are very few developers working on a project, generally 10 or fewer. The development process tends to be relaxed, because there are few enough people to still keep it manageable, and individual development style has a much greater influence on project policy.

For this case study, we'll look at an imaginary startup company named SmallCo. SmallCo was started by five friends who graduated from college together, and has since added three new developers. Of the five original founders, though, only three are developers themselves (of the other two, one handles the business side and the other handles marketing). That leaves a current full development team of six people for SmallCo, all of whom are working on developing the company's ground-breaking new Internet product.

Repository Layout

SmallCo currently has only a single product, which is stored in its own repository. With so few developers, SmallCo hasn't seen much of a need to be particularly creative with its repository layout, either. Following standard convention, the top level of the SmallCo repository is laid out with /branches, /tags, and /trunk directories, as you can see in Figure 14.4.

Figure 14.4. The SmallCo repository layout.

Branches and Tags

SmallCo does all of its project development in the main /trunk directory, with each developer committing changes as they are made and tested. Tags of the project are made at both development milestones (i.e., beta or alpha releases) and at releases. Branches are only used when problems need to be fixed with a previously released version, in which case the tag for that release is copied into the branches directory, where it can be modified as necessary. When the fix is finished, the branch is moved back to the tags directory as the new release. For instance, if a bug is found in version 1.1 of the software, the developers would copy /tags/version_1.1 to /branches/version_1.1.1 and make the necessary changes. When the new bug-fix release is ready, it is moved back to /tags using svn mv.
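The two repository operations in that bug-fix workflow can be sketched in Python as a pair of command lines. The function name and log messages are hypothetical, and the repository root is a parameter because SmallCo's URL is not given.

```python
from typing import List, Tuple


def bugfix_commands(repo_root: str, released: str,
                    fixed: str) -> Tuple[List[str], List[str]]:
    """Build the commands that open and close a bug-fix branch.

    released is the version being fixed (e.g. "1.1") and fixed is the
    version of the resulting bug-fix release (e.g. "1.1.1").
    """
    tag = f"{repo_root}/tags/version_{released}"
    branch = f"{repo_root}/branches/version_{fixed}"
    new_tag = f"{repo_root}/tags/version_{fixed}"
    # Step 1: copy the release tag into /branches, where it can be modified.
    start = ["svn", "copy", tag, branch, "-m",
             f"Branch version {released} to prepare {fixed}"]
    # Step 2: once the fix is committed, move the branch into /tags.
    finish = ["svn", "move", branch, new_tag, "-m",
              f"Release version {fixed}"]
    return start, finish
```

Here svn move is the full name of the svn mv subcommand mentioned above. Between the two commands, the developers would check out the branch and commit their fix as usual.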

Properties

Because of SmallCo's size, its developers tend to handle project management and communication more informally than a larger operation might. As a result, it hasn't yet found a large need for custom properties to store or communicate project metadata, and to date has not instituted any such properties. The only properties that are used in the repository are the predefined Subversion properties, which the developers use as the situation warrants.

Scripts

As with properties, SmallCo has not found a pressing need to automate any of its practices or policy enforcement with custom scripts. It is small enough that policy enforcement is more easily done offline (if someone messes up, the developers fix it), and the development process is handled too informally to benefit from automation.