Subversion Version Control: Using The Subversion Version Control System in Development Projects

12.1. Effective Branching and Tagging

Branching and tagging are among Subversion's more flexible features, because both are implemented as simple, cheap copies. Within an organized software development process, though, flexibility is only good up to a point; beyond that, it becomes a hindrance to people trying to work together. To avoid chaos, you need to develop guidelines for branching and tagging. If you have simple rules for which branches and tags should be created, when they should be created, and how they should be named, your developers will find it much easier to use the project's branches and tags to collaborate.

12.1.1. Branch and Tag Creation and Organization

Before you can effectively make use of branches and tags, you need to decide what circumstances warrant the creation of branches and tags, and how those branches and tags will be organized in the repository. Because creating branches and tags is fast, and essentially uses no space, there is little reason to be stingy with their use. On the other hand, you don't want to waste time creating branches and tags that hold no value for anyone. If the copies just sit around collecting dust and littering your repository hierarchies, the useful branches may become harder to find and use.

If you're coming from a CVS background (or from another similar VCS), you will find that tags are much less necessary in Subversion than they were in CVS. For example, CVS users often create tags to preserve the state of their work before a large commit, in case the commit is interrupted (which would leave an incomplete commit that could be hard to recover from). With Subversion, though, all commits are atomic, so this is no longer a concern. Similarly, it is common practice with CVS to create tags before and after feature commits, so that the differences can be easily examined later. This is also no longer necessary, because Subversion's global revision numbers make it easy to compare the state of the entire repository before and after any commit. Subversion also makes the timing of tag creation less important, because you can pass the --revision option to svn copy to create a tag from a revision other than the current HEAD revision.
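A minimal sketch of tagging an earlier revision, written in Python. To keep it safe to run anywhere, the helper only builds the argument list for `svn copy` and does not execute it. The repository URL, the `tags/` location, and the tag name are hypothetical.

```python
from __future__ import annotations


def tag_from_revision(repo_url: str, source: str, tag: str,
                      revision: int, message: str) -> list[str]:
    """Build (but do not run) an `svn copy` that tags `source` as it was at `revision`."""
    base = repo_url.rstrip("/")
    return [
        "svn", "copy",
        "--revision", str(revision),  # copy from this revision instead of HEAD
        f"{base}/{source}",
        f"{base}/tags/{tag}",          # hypothetical tag location
        "-m", message,
    ]


if __name__ == "__main__":
    # Hypothetical repository and tag name, for illustration only.
    cmd = tag_from_revision("http://svn.example.com/repos/foomatic", "trunk",
                            "before-parser-rewrite", 1234,
                            "Tag trunk as it was at r1234")
    print(" ".join(cmd))
```

On a machine with the svn client installed, the list could be passed to `subprocess.run(cmd, check=True)`. Because a copy from one URL to another is committed immediately, no working copy is needed.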

There are a number of different things that you might use branches and tags for, and for each there is a different set of issues to consider when deciding your policies for creating and organizing them. Let's take a look at a few of the different branch and tag categories that you might have, and the policies you might use.

Software Version Branches

Often, you will have multiple versions of a project being developed simultaneously. For example, say you have a project that is creating an application called FooMatic. When version 1.0 of FooMatic is released, you want to mark the point in development where that occurred, and then continue developing the main trunk in preparation for FooMatic 2.0. Now, say it's six months after the release of FooMatic 1.0, and you're well on the way to FooMatic 2.0. Despite all of your careful beta testing, though, version 1.0 wasn't perfect, and someone finds a bug. The version 2.0 development group, however, has completely changed that section of the code, and the bug doesn't even apply anymore. You don't want to tell all of your customers that they'll have to wait for the next version to get the bug fixed though. They need it fixed now. Fortunately, when you released 1.0, you created a tag for that version. Now, when bugs are found, you can create a new branch from the 1.0 tag, fix the bug, and release a new (tagged) bug-fix version without interrupting development on version 2.0. When a fix applies to both versions, you can use a merge to copy the changes from one branch to another.
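The bug-fix workflow can be sketched the same way. This hypothetical Python helper, again only building command lines, branches from a release tag and later tags the fixed branch as a new release. The `branches/<tag>-fixes` naming and the repository URL are assumptions for illustration, not a Subversion convention.

```python
from __future__ import annotations


def bugfix_release_commands(repo_url: str, release_tag: str,
                            fix_tag: str) -> list[list[str]]:
    """Build (but do not run) the svn commands for a bug-fix release cut from a release tag."""
    base = repo_url.rstrip("/")
    branch_url = f"{base}/branches/{release_tag}-fixes"  # hypothetical branch naming
    return [
        # 1. Branch from the release tag; development on the trunk is untouched.
        ["svn", "copy", f"{base}/tags/{release_tag}", branch_url,
         "-m", f"Create bug-fix branch from tag {release_tag}"],
        # 2. Check out branch_url, fix the bug, and commit (not shown).
        # 3. Tag the fixed branch as the new bug-fix release.
        ["svn", "copy", branch_url, f"{base}/tags/{fix_tag}",
         "-m", f"Tag bug-fix release {fix_tag}"],
    ]


if __name__ == "__main__":
    for cmd in bugfix_release_commands("http://svn.example.com/repos/foomatic", "1.0", "1.0.1"):
        print(" ".join(cmd))
```

If the same fix also applies to the trunk, it can then be carried over with `svn merge`.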

Alternatively, you can create a branch every time you release a version of the software that you intend to support with future fixes, and maintain that branch as a separate line of development. If you release version 1.0, it is likely that you'll want to release minor versions that fix bugs and security issues before version 2.0 is released. So, when you make the 1.0 release, split version 1.0 off onto its own branch, separate from the development of 2.0. Then, tag each minor release from that branch as it is completed. If you make minor feature releases (1.0, 1.1, 1.2, and so on), you'll probably want each of those to be a separate branch, too, so that you can release 1.0.1 while you're developing 1.1.0.

Your best choice is to pick a version number level and create a separate branch for each release at that granularity. If you pick major version numbers, you'd have branches for 1.0, 2.0, 3.0, and so on. If you pick the first level of minor version numbers, you'd have branches named 1.0, 1.1, 1.2, 2.0, 2.1, and so on. The exact level to branch on depends on the way a given project releases software, but consistency is much more important than any particular choice of branch point. In general, I suggest standardizing on one or two levels of minor version numbers.
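A consistent granularity is easy to enforce mechanically. The following sketch is a hypothetical helper that assumes purely numeric dotted version strings; it maps a release version to the branch it belongs on, with `levels=1` meaning major-version branches and `levels=2` meaning first-level minor branches.

```python
from __future__ import annotations


def release_branch_name(version: str, levels: int = 2) -> str:
    """Return the release branch a version belongs on, at the chosen granularity.

    levels=1 gives major-version branches (1.0, 2.0, ...);
    levels=2 gives first-level minor branches (1.0, 1.1, 1.2, ...).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a dotted numeric version: {version!r}")
    kept = parts[:levels]
    kept += ["0"] * (levels - len(kept))  # "2" at levels=2 becomes "2.0"
    while len(kept) < 2:                  # major branches still read "1.0", not "1"
        kept.append("0")
    return ".".join(kept)


if __name__ == "__main__":
    for v in ["1.0.1", "1.2.3", "2.1"]:
        print(v, "->", release_branch_name(v, levels=1), release_branch_name(v, levels=2))
```

Using one function like this in every release script is one way to keep the branch naming from drifting between releases.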

When it comes to organizing these release branches, there are a couple of ways to approach the process. The first is to treat release branches the same as any other branch: your current main line of development occurs in /trunk, and whenever a release is made, you create a new branch in /branches/releases/. If you think about it, though, all of your development really occurs on a release branch, whether or not you call the trunk one. So, instead of having a /trunk, you might want to consider having two top-level directories named /releasedev and /releases, as shown in Figure 12.1. In /releasedev, put a branch for each release of the project that is currently being developed. Then, when a version is actually released, move it into the /releases directory (where the release will be treated as an immutable tag) and create a copy of the new release in /releasedev that will become the next development version.
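Under the /releasedev and /releases layout, cutting a release is a move followed by a copy. A sketch that builds both commands without running them; the repository URL and version names are hypothetical.

```python
from __future__ import annotations


def cut_release(repo_url: str, released: str, next_version: str) -> list[list[str]]:
    """Build (but do not run) the svn commands that turn a development branch into a release."""
    base = repo_url.rstrip("/")
    release_url = f"{base}/releases/{released}"
    return [
        # The development branch becomes the release, treated from now on as an immutable tag.
        ["svn", "move", f"{base}/releasedev/{released}", release_url,
         "-m", f"Release {released}"],
        # Development of the next version starts from a copy of the new release.
        ["svn", "copy", release_url, f"{base}/releasedev/{next_version}",
         "-m", f"Start development of {next_version}"],
    ]


if __name__ == "__main__":
    for cmd in cut_release("http://svn.example.com/repos/foomatic", "1.0", "1.1"):
        print(" ".join(cmd))
```

Each command is its own commit. Because `svn log` follows copy history by default, the log of the new /releasedev branch should trace back through the copy and the move to the earlier development history.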

Figure 12.1. A repository with release branches instead of a trunk.

Quality Assurance Branches and Betas

Many projects have a development team and a quality assurance team. In such a case, it can be helpful to have two branches of development: one that the developers use for their day-to-day work and another that the quality assurance (QA) team uses for its testing. When the development team finishes a feature or fix (or on a fixed schedule), the changes on the development branch are merged to the QA branch for the testers to inspect.

One approach to this setup is to have two fixed branches. All development occurs on one branch, and is then merged over to the QA branch for testing. Another approach is to have a single QA branch, with multiple development branches. For instance, each developer could be working on her own branch, which would be periodically merged into the QA branch. Then, when a QA tester finds an issue, she can create a new branch with the state of the project where the issue occurred. A developer would then be able to fix the issue on that branch and merge the fix back into the QA branch when it's finished.

To organize QA branches, we can build on the release branch structure suggested in the previous section. Instead of having a single directory for each project release, create two branches, giving you a structure such as /releasedev/version_1_0/dev/ and /releasedev/version_1_0/qa/. This gives you a development branch and a quality assurance branch for each version of the software that is being actively developed.
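The naming scheme is mechanical enough to script. This sketch derives both branch paths from a version string and builds the merge command someone might run to move finished work from the dev branch into a QA working copy. The repository URL and working copy path are hypothetical, and the command is returned rather than executed.

```python
from __future__ import annotations


def qa_branches(version: str) -> dict[str, str]:
    """Map a version such as "1.0" to its dev and QA branch paths under /releasedev."""
    slug = "version_" + version.replace(".", "_")
    return {
        "dev": f"/releasedev/{slug}/dev",
        "qa": f"/releasedev/{slug}/qa",
    }


def merge_dev_into_qa(repo_url: str, version: str, start: int, end: int,
                      qa_wc: str) -> list[str]:
    """Build (but do not run) a merge of dev-branch changes into a QA working copy.

    `-r start:end` applies the changes committed after `start` up to and including `end`.
    """
    dev_url = repo_url.rstrip("/") + qa_branches(version)["dev"]
    return ["svn", "merge", "-r", f"{start}:{end}", dev_url, qa_wc]


if __name__ == "__main__":
    print(qa_branches("1.0"))
    print(" ".join(merge_dev_into_qa("http://svn.example.com/repos/foomatic",
                                     "1.0", 120, 135, "qa-wc")))
```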

As development on a project advances, you will invariably release beta versions of the software to testers outside of your quality assurance team. These versions are generally created from your quality assurance branches, and will be immutable releases, just like a final version release. One option for organizing beta releases is to store them in /releases/, just as you would a final version release. Or, you can make a distinction by creating another top-level directory named /betas/ where beta releases can be tagged.

Task Branches

Task branches are another area where branching can be useful. For each individual feature or issue that a developer is going to work on, she creates a task branch. On the task branch, the developer can make small, incremental changes and commits until that particular task is finished. When the task is complete, the branch can be merged back into the main trunk or into a QA branch. For example, in Figure 12.2, you can see how task branches are created from the last release of the project and then merged into a QA branch, which is then moved to create the next bug-fix release of the project.
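A sketch of one task branch's life cycle, under stated assumptions: branches live in a hypothetical common /branches/tasks/ directory and are named after an issue ID, and the merge back targets a QA working copy. As with the other sketches, the commands are built, not run.

```python
from __future__ import annotations


def task_branch_commands(repo_url: str, base_path: str, task_id: str,
                         created_rev: int, finished_rev: int,
                         qa_wc: str) -> list[list[str]]:
    """Build (but do not run) the create and merge-back commands for one task branch."""
    base = repo_url.rstrip("/")
    task_url = f"{base}/branches/tasks/{task_id}"  # hypothetical common task directory
    return [
        # Start the task from the last release (base_path, e.g. "releases/1.0").
        ["svn", "copy", f"{base}/{base_path}", task_url,
         "-m", f"Create task branch {task_id}"],
        # created_rev is the revision that made the copy, so this merges every
        # change committed on the task branch after it was created.
        ["svn", "merge", "-r", f"{created_rev}:{finished_rev}", task_url, qa_wc],
    ]


if __name__ == "__main__":
    for cmd in task_branch_commands("http://svn.example.com/repos/foomatic",
                                    "releases/1.0", "issue-42", 200, 215, "qa-wc"):
        print(" ".join(cmd))
```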

Figure 12.2. Task branches used for fixing issues in a release.

There are a few policies that you need to decide on when using task branches. Who will create the branches? When will task branches be created? What granularity of task requires a task branch? How will the branches be organized? There is, of course, no universal "right answer" to these questions. Instead, the answers are based on the myriad of intricacies that define your project.

Who will create the branch? The obvious answer is to have the developer who will be working on that particular task create the branch. Whenever a developer starts a new task, he creates a new branch. It's simple, and it works well for projects with a lot of developer freedom. If you have a large project with lots of managerial oversight, though, it might be easier to keep track of the tasks currently in progress if task branches are created by a project manager responsible for assigning tasks. When a task is assigned, the branch is created and handed to a developer to implement. This might also be a good choice for sensitive projects where the main trunk would not be available to every developer for security reasons. When a developer is assigned a task, the project manager would only have to create a branch of the small subsection of the project necessary for implementing the task and give permission for that branch to the appropriate developer(s). On a mature project, you may even have most (or all) of the tasks generated by QA testers who are handling bug reports from users.

When will task branches be created? There are really only two major options here (with small variations): task branches can be created either when the task is scheduled or when work on it begins. For tasks that respond to a particular base state of the project (such as bug fixes), it is usually a good idea to create the branch when the task is scheduled, because there is usually a well-known baseline to work from at that point. By the time work on the task actually begins, changes made to the project in the interim may have modified it so much that there is no longer a good baseline from which to start implementing the task. On the other hand, for tasks that add new features to an evolving project, it may be better to create the branch when the developer starts working on the task. That way, the branch is up-to-date when work begins, which will make merging it back into the main trunk a little easier.

What granularity of task requires a branch? This is largely a matter of taste. If you want very fine-grained project organization, you may want individual task branches for each atomic feature enhancement or bug fix. Or, if you prefer a more coarsely managed project, it might make sense to only create branches for major tasks that will require large code changes or additions.

How will the branches be organized? The way you organize task branches has a lot to do with who is creating and using them. If, for instance, each developer maintains her own task branches while working on the project, you might want to give each developer her own directory for storing them. However, if the task branches are generated by a project manager or QA tester, it might be better to have a common directory for task branches. Then, if you want to keep the branches that are being worked on separate, each developer could move the branch from the common directory into her own task branch directory when she begins work on the task.

Sliding Tags

Sometimes, you have a tag that you want to point to a changing target while retaining the same name. As an example, say you create a daily build of your project. In addition to a tag that will always point to that particular build, it might be useful to have a tag called daily_build that always points to today's build. That way, anyone who needs access to the most up-to-date daily build can check out the daily_build tag and can just update to get the latest release.

There are a couple of ways you can approach creating these sorts of tags. One way is to delete the old tag and recreate a new one by the same name whenever you want to move the tag. This has the advantage of being fairly easy to execute, but it requires two steps to perform the change (if anyone updates between the two steps, his directory will be deleted on disk and he'll have to redownload the whole thing). Using delete/re-add also has the disadvantage that it makes an svn log on that directory useless, because it won't show the history of the tag.
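The delete-and-recreate approach amounts to two separate commits. A sketch that builds them (the `tags/` location, tag name, and target path are hypothetical) makes the window between the two steps explicit:

```python
from __future__ import annotations


def move_sliding_tag(repo_url: str, tag: str, target: str) -> list[list[str]]:
    """Build (but do not run) the delete-then-copy pair that re-points a sliding tag."""
    base = repo_url.rstrip("/")
    tag_url = f"{base}/tags/{tag}"
    return [
        # Commit 1: the tag disappears.
        ["svn", "delete", tag_url, "-m", f"Remove {tag} before moving it"],
        # Commit 2: the tag reappears with new contents. Anyone who updates
        # between these two commits sees the directory deleted.
        ["svn", "copy", f"{base}/{target}", tag_url,
         "-m", f"Point {tag} at {target}"],
    ]


if __name__ == "__main__":
    for cmd in move_sliding_tag("http://svn.example.com/repos/foomatic",
                                "daily_build", "builds/build-060405-0300"):
        print(" ".join(cmd))
```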

An alternative to using copy and delete is to make use of the svn:externals property to create your sliding tag. With the externals property, you can create a directory that holds the tag, and then set svn:externals to point to the correct directory and revision number. Then, when the path or revision number is changed to move the tag, you will have a log record of where the tag has pointed to. The downside to using svn:externals is that the syntax for creating and moving the tag is a little more complex than using copy and delete; but in most cases, I would suggest it as the better alternative.
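With svn:externals, the sliding tag is a property value rather than a copy. A minimal sketch of building that value; the directory name and URL are hypothetical. It uses the older `DIR -rREV URL` line layout, which Subversion clients from 1.5 on still accept alongside the newer `-r REV URL DIR` form.

```python
from __future__ import annotations


def sliding_tag_externals(name: str, target_url: str, revision: int) -> str:
    """Return one svn:externals line pinning directory `name` to `target_url` at `revision`."""
    if not name or any(ch.isspace() for ch in name):
        raise ValueError("external directory name must be non-empty and contain no whitespace")
    return f"{name} -r{revision} {target_url}"


if __name__ == "__main__":
    print(sliding_tag_externals(
        "daily_build", "http://svn.example.com/repos/foomatic/builds/build-060405-0300", 512))
```

Moving the tag then means setting the new value on the holder directory in a working copy (for example with `svn propset svn:externals`) and committing it, so each move is an ordinary commit that shows up in `svn log` for that directory.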

Merge Tracking with Tags

Subversion doesn't do a particularly good job of tracking merges, at least not yet. In fact, Subversion's poor merge tracking is arguably its weakest point as a version control system. The commonly suggested practice is to record, in the log message of each committed merge, which range of revisions was merged and where it was merged from.

Keeping track of merges is important, because each subsequent merge needs to account for past merges in order to avoid applying changes incorrectly. If you create a branch of /trunk at revision 50 and then merge the changes made on the trunk between revisions 50 and 100, it's important to make sure the next merge applies the changes from 100 to 150, not 50 to 150. Using log messages to note that you have already merged 50 to 100 is a serviceable solution, but it's not the only one. A Subversion merge applies the difference between two arbitrary sections of the repository. Using two different revisions of a single path is only one way to specify those two sections; another is to give two entirely different repository paths. So, if you made a tag of /trunk at the point where you last performed the merge, you could use that tag in the next merge instead of needing to know the revision number.

Merge tags can either be stored in a common location, such as /tags/merges, or they can be stored alongside the specific branch where the merge occurred. You could, for example, create a directory named /branches/proj_branch_1_merges/ where all of the versions involved in a merge into /branches/proj_branch_1/ would be tagged.
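The tag-based alternative can be sketched as well. Here the last merge point is a tag stored alongside the branch, in the /branches/proj_branch_1_merges/ style just described; the merge is a two-URL `svn merge` from that tag to the trunk pinned at a revision, and a new tag then records the new merge point. The `trunk-rNNN` tag names are hypothetical, and the commands are only built, not run.

```python
from __future__ import annotations


def merge_since_tag(repo_url: str, last_tag: str, source: str, rev: int,
                    branch_wc: str, new_tag: str) -> list[list[str]]:
    """Build (but do not run) a two-URL merge from the last merge tag, plus the new merge tag."""
    base = repo_url.rstrip("/")
    source_url = f"{base}/{source}"
    return [
        # Apply the difference between the tagged merge point and source@rev to the branch.
        ["svn", "merge", f"{base}/{last_tag}", f"{source_url}@{rev}", branch_wc],
        # (Resolve conflicts, test, and commit the branch working copy here.)
        # Tag source as of the same revision, so the next merge starts from here.
        ["svn", "copy", "--revision", str(rev), source_url, f"{base}/{new_tag}",
         "-m", f"Merge point: {source}@{rev}"],
    ]
```

Pinning both the merge and the new tag to the same revision avoids a race in which someone commits to the trunk after the merge but before the tag is made, which would otherwise leave a change that the next merge silently skips.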

Tagging Project Builds

If you have an automated build system that performs nightly (or hourly, or even more frequent) builds, it may be useful to have it automatically generate tags that reference those builds. So, for instance, if it creates a build on June 4th at 3:00 in the morning, it could create a tag named something like /builds/build-060405-0300. Then, when the build system runs your test harness to do regression testing, it can include a reference to the specific build in the Subversion repository as part of its results output. That way, you have a durable reference to the exact state of the repository at the time of the test run, and can recreate it later if you need to in order to fix any issues that arose during the tests.
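A build system could generate such names from the build timestamp. This sketch assumes the example name encodes month, day, two-digit year, hour, and minute (so build-060405-0300 reads as June 4, 2005, 3:00 a.m.); the `/builds` prefix is taken from the example, and the helper itself is hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def build_tag_path(when: datetime, prefix: str = "/builds") -> str:
    """Return a build tag path in the MMDDYY-HHMM style of /builds/build-060405-0300."""
    return f"{prefix}/build-{when.strftime('%m%d%y-%H%M')}"


if __name__ == "__main__":
    print(build_tag_path(datetime(2005, 6, 4, 3, 0)))  # /builds/build-060405-0300
```

MMDDYY names do not sort chronologically across months and years; if you want tag listings in build order, a year-first pattern such as `%Y%m%d-%H%M` is worth considering.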

Milestone and Release Point Tags

One of the most common uses for tags is to mark important milestones in the code. One example would be tags that mark releases of the project. By creating a tag at every such milestone, you can create an easily accessible record of your project's history that is significantly more useful than simply knowing what the project looked like on a certain date, or at a particular revision number.

The first policy to adopt when deciding on milestone tags is to determine which milestones you will tag. Some people may spend a lot of time thinking about this in order to decide on a detailed policy. Don't. Tags are cheap. They take up almost no space in the repository. You could sit and make tags from full copies of your repository all day long and not make a significant dent in the size of your repository. Therefore, there is no reason to be stingy with them. Release a beta? Create a tag. If you release a daily build, create a tag for each release. Even if you create hourly project builds for in-house development, there is little reason not to keep track of those builds by creating tags.

Milestone tags are best organized by collecting different types of milestones into their own directories. For example, you might have directories named /tags/releases/, /tags/betas/, and /tags/builds/. Or, if you don't want to hide them away in the /tags directory, you can move those directories up to the top level. If you have multiple projects, you might want to put all of those directories in a project directory, or you might want to put just some of them in a project directory. For instance, you could have a top-level /releases/ directory that stores the releases for every project, and project-specific directories for holding builds and beta releases.

Saved Working Copy Snapshots

At times, it can be very useful to save the current state of a working copy, without committing all of the changes to the current trunk or branch. This can be especially useful if you have a working copy that is made up of several switched directories and you want to save a snapshot of that layout. Because Subversion allows you to copy from a working copy, this is easy to do. All you need to do is take a directory in your current working copy and copy it to a repository URL.
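A sketch of the snapshot copy, building the command rather than running it. The working copy path, repository URL, and the /tags/snapshots/ directory are hypothetical.

```python
from __future__ import annotations


def snapshot_command(wc_dir: str, repo_url: str, name: str, message: str) -> list[str]:
    """Build (but do not run) an `svn copy` from a working copy directory to a repository URL."""
    return ["svn", "copy", wc_dir,
            f"{repo_url.rstrip('/')}/tags/snapshots/{name}",  # hypothetical snapshot directory
            "-m", message]


if __name__ == "__main__":
    print(" ".join(snapshot_command(".", "http://svn.example.com/repos/foomatic",
                                    "switched-layout-test", "Snapshot of my working copy")))
```

A copy from a working copy to a URL is committed immediately. The new directory captures the working copy as it stands, including mixed revisions, switched subdirectories, and local modifications, without committing those modifications to the trunk or branch the working copy came from.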

The best place to store a working copy snapshot depends on the purpose of the snapshot. If you make a snapshot for purposes of releasing a project beta, obviously, you would want to store your snapshot along with other tags of beta releases, and likewise for full releases, builds, or any other sort of tag. Rarely will you find the need to make a mixed revision snapshot that doesn't have some sort of other purpose; but if you do, it may be useful to have a special tags directory (such as /tags/snapshots/) for storing them.

12.1.2. Merging Policies

Merging is currently Subversion's biggest weak point. It can be difficult to perform merges correctly, and it is fairly easy to perform an incorrect merge that causes unintended consequences. The best way to avoid problems is to set out clear policies for when to perform merges, who should perform the merges, and how they should be documented.

When to Merge

The best time to merge depends a lot on what is being merged. Merges can be done on a timetable, or they can be done whenever changes occur (or whenever relevant changes occur). Generally, you'll find that you want to use a mix of the two. If, for example, you have a build engineer who manages a daily build of your project, she might want to standardize on a daily routine of merging from the available development branches in order to create the day's build, especially if you have multiple independent development branches for different developers that need to be merged and tested every day.

Conversely, if your developers use a more rapid XP-style test-edit-build-test cycle, developers may need to do frequent merges to and from the project's trunk in order to continuously make sure that the rest of the project continues to work with the changes they are making on their branch.

Who Should Merge

In general, it's good policy to let the same people who are allowed to modify a trunk or branch directory also perform merges into it from other branches. Merges should be performed by people who are familiar with the target branch as well as with the source that is being merged in. Because Subversion has no understanding of the context of the merges it performs (they're just dumb textual merges), it's important that any merge be thoroughly tested before it is committed. Even if there are no conflicts, a merge can easily break working code, and it's important for the person performing the merge to be able to detect and fix any errors that it introduces.
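One way to build the "test before commit" rule into routine is a fixed sequence: preview the merge, apply it, build and test, and only then commit. A sketch that builds the svn steps (the URL, revisions, working copy path, and message are hypothetical; the build-and-test step is project-specific and only marked by a comment):

```python
from __future__ import annotations


def careful_merge_steps(source_url: str, start: int, end: int,
                        wc: str, message: str) -> list[list[str]]:
    """Build (but do not run) a preview, merge, and commit sequence for one merge."""
    merge = ["svn", "merge", "-r", f"{start}:{end}", source_url, wc]
    return [
        # 1. Show what the merge would change, without touching the working copy.
        merge[:2] + ["--dry-run"] + merge[2:],
        # 2. Apply the merge to the working copy.
        merge,
        # (Build and run the project's tests here; commit only if they pass.)
        # 3. Commit the tested result.
        ["svn", "commit", wc, "-m", message],
    ]
```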

If you have a build engineer or quality assurance tester, it can be useful for him to maintain a QA or build branch, and may be prudent for him to perform merges from other branches into the branch that he maintains. That way, he maintains complete control over what goes into the branch. That also means, though, that he will almost certainly be merging in source code that he didn't write himself. Therefore, if conflicts occur, he may not have the proper background necessary to make a decision on how the conflicted sections should be merged. One solution would be to have the person performing the merge make an educated guess as to the proper resolution, and then test it. This is probably the fastest way to resolve the conflict, but it's also the least reliable (and most likely to introduce subtle errors that don't get caught). Another, possibly better, solution is to have the merger resolve the conflict, but then send a detailed description of what was done to any developers who might have more information about the conflict. Then, they can examine what was done and hopefully catch potential issues. Or, as a third potential solution, the merge could be blocked when a conflict occurs, until the developers who wrote the merged code in the first place can resolve it. This is probably the safest solution, but it is also the most time-consuming. In the end, there really isn't a universal best solution here. Ideally, you should try to avoid merge conflicts as much as possible by maintaining an organized development process. If you never have two people working on the same section of code, you never have to worry about merge conflicts. If you do have two people working on the same section of code, they should be talking to each other.

Documenting Merges

Someday, Subversion will have built-in merge tracking that allows you to easily sync two directories without worrying about which revisions have already been merged, or which direction the merges have happened in. In the meantime, the best way to maintain good clean merges between branches and the trunk is to keep detailed documentation about exactly what you've done. Whenever a developer performs a merge, she should record in the log message what repository paths were involved in the merge, as well as the revisions of each path that were involved. Additionally, the log messages that record a merge should always follow an agreed-upon standard format that includes a keyword, such as Merged, to allow developers to easily use grep to filter the output of svn log for merges that have previously occurred. If you are using tags to mark the merge history of a trunk or branch directory, developers should also make those tags whenever they perform a merge (you might want to use a script to automate this process and ensure that the tags always get created).
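As a sketch of such a convention, assuming a hypothetical one-line format `Merged SOURCE rSTART:END into TARGET`, the following Python formats the message and scans `svn log` text for earlier merges to find where the next one should start.

```python
from __future__ import annotations

import re

# Hypothetical agreed-upon format; the "Merged" keyword comes first so grep can find it.
MERGE_LINE = re.compile(
    r"^Merged (?P<source>\S+) r(?P<start>\d+):(?P<end>\d+) into (?P<target>\S+)$",
    re.MULTILINE,
)


def merge_log_message(source: str, start: int, end: int, target: str) -> str:
    """Format the log message line for a merge of source revisions start:end into target."""
    return f"Merged {source} r{start}:{end} into {target}"


def last_merged_revision(log_text: str, source: str, target: str) -> int | None:
    """Scan `svn log` output for recorded merges and return the highest end revision."""
    last = None
    for m in MERGE_LINE.finditer(log_text):
        if m["source"] == source and m["target"] == target:
            end = int(m["end"])
            last = end if last is None else max(last, end)
    return last
```

With a format like this, piping `svn log` for the branch through `grep '^Merged'` lists the earlier merges, and `last_merged_revision` returns the revision where the next merge should begin (150 after merges of r50:100 and r100:150 from /trunk).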