Hatteras: A Look Into the Future of SCC at Microsoft | The Build Master: Microsoft's Software Configuration Management Best Practices

Hatteras is an enterprise-class Software Configuration Management (SCM) product. The codename Hatteras comes from a lighthouse on the shores of North Carolina, the state where the product is being developed. The final name of the product is Team Foundation, and it includes more than just the source control functionality. The Hatteras piece is referred to as Team Foundation Source Control (TFSC). The other pieces of the Team Foundation product are touched on in Chapter 18. Because Chapter 18 covers the upcoming VSTS tools only briefly, I wanted to go into more detail on TFSC here. This section also introduces some important definitions that need to be added to our build dialect, such as all of the branching definitions.

This tool has been designed and developed from scratch; in other words, it is not a new generation of Microsoft's infamous Visual SourceSafe (VSS). It provides standard source code version control functionality that scales to thousands of developers, as on Microsoft's own development teams. As part of the Visual Studio (VS) 2005 release, Hatteras provides integration with the Visual Studio IDE and with other enterprise tools such as the Visual Studio work item (bug) tracking tool. Hatteras also provides a standalone GUI, a command-line interface, and a Web-based interface.

Let's define some new terms as they relate to TFSC:

Repository: The data store containing all files and folders in the TFSC database.
Mapping: An association of a repository path with a local working folder on the client computer.
Working folder: A directory on the client computer containing a local copy of some subset of the files and folders in a repository.
Workspace: A definition of an individual user's copy of the files from the repository. The workspace contains a reference to the repository and a series of mappings that associate a repository path with a working folder on the user's computer.
Change set: A set of modifications to one or more files/folders that is atomically applied to the repository at check-in.
Shelve: The operation of archiving all modifications in the current change set and replacing those files with original copies. The shelved files can be retrieved at a later time for development to be continued. This is my favorite feature.
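
To make the relationship between a workspace and its mappings concrete, here is a minimal sketch in Python. It is not TFSC's API; the class names, the path syntax, and the "most specific mapping wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass(frozen=True)
class Mapping:
    """Associates a repository path with a local working folder (hypothetical model)."""
    repository_path: str
    working_folder: str


@dataclass
class Workspace:
    """A user's view of the repository: a name plus a series of mappings."""
    name: str
    mappings: List[Mapping] = field(default_factory=list)

    def local_path(self, repository_item: str) -> Optional[str]:
        """Translate a repository item into a local path, or None if it is unmapped.

        Assumption: when several mappings cover the item, the deepest
        (most specific) repository path wins.
        """
        item = PurePosixPath(repository_item)
        best: Optional[Mapping] = None
        for mapping in self.mappings:
            root = PurePosixPath(mapping.repository_path)
            if item == root or root in item.parents:
                if best is None or len(root.parts) > len(PurePosixPath(best.repository_path).parts):
                    best = mapping
        if best is None:
            return None
        relative = item.relative_to(PurePosixPath(best.repository_path))
        return str(PurePosixPath(best.working_folder) / relative)


workspace = Workspace(
    name="dev-laptop",
    mappings=[
        Mapping("/acme", "/home/dev/acme"),
        Mapping("/acme/tools", "/home/dev/tools"),
    ],
)
print(workspace.local_path("/acme/hammer/head.c"))
print(workspace.local_path("/acme/tools/build.py"))
```

The working folder is then simply whatever directory a mapping points at, and anything in the repository that no mapping covers never reaches the client.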

Some of the features in TFSC are fairly standard among SCC tools:

Workspace creation
Workspace synchronization
File checkout
Overlapping checkout by multiple users of the same file
Atomic change-set check-in
File diffs
Automated merge
Code-line branching
File-set labeling
User management and security

What really sets TFSC apart from the competition is its powerful merging and branching features. I don't try to explain the entire product here, but just touch on why I think these two features are so cool.

Merging Functionality in TFSC

The merging functionality in TFSC is centered on the following typical development scenarios:

Scenario 1: The catch-up merge. The user wants to merge all changes from a source branch that have not yet been migrated to the target branch. The source and target can be a subtree or an individual file/folder.
Scenario 2: The catch-up no-merge. The user wants to discard nonmerged changes in the source branch from the set of candidate changes for future merges between the specified source and target.
Scenario 3: The cherry-pick merge. The user wants to merge individual change sets from the source branch to the target branch. Changes introduced to those files prior to the specified change set should not be migrated.
- The user can specify the change sets to merge with a change set number.
- The user can specify individual file revisions to merge between the source and target.
Scenario 4: The cherry-pick no-merge. The user wants to discard a single change set from the list of all possible changes to merge between the source and target so that this change set never appears in the list of candidates for a cherry-pick merge.
Scenario 5: Merge history query. The user wants to know whether the specified change set has been merged into the target branch. If it has, the user wants to know what change set the merge was committed in. The user also wants to know if part of the change set has been merged, but not all.
Scenario 6: Merge candidate query. The user wants to obtain a list of change sets that have been committed to a source branch but have not yet been migrated to the target branch. From this list, the user selects change sets to migrate with a cherry-pick merge.
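
The scenarios above fit together more easily when you see them as operations on one list of candidate change sets. The following Python sketch is a toy bookkeeping model, not TFSC's implementation: the class, its method names, and the change set numbers are hypothetical, and it tracks whole change sets only (partial merges are left out).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class MergeTracker:
    """Bookkeeping for merges from one source branch into one target branch."""
    source_changesets: List[int]                          # source change sets, in commit order
    merged: Dict[int, int] = field(default_factory=dict)  # source id -> target change set of the merge
    discarded: Set[int] = field(default_factory=set)      # change sets removed from candidacy
    next_target_id: int = 1000                            # arbitrary starting number for the toy

    def candidates(self) -> List[int]:
        """Scenario 6: change sets not yet merged and not discarded."""
        return [c for c in self.source_changesets
                if c not in self.merged and c not in self.discarded]

    def _commit(self, changeset_ids: List[int]) -> int:
        """Record all given change sets as merged in one new target change set."""
        target_id = self.next_target_id
        self.next_target_id += 1
        for c in changeset_ids:
            self.merged[c] = target_id
        return target_id

    def catch_up_merge(self) -> int:
        """Scenario 1: merge every remaining candidate."""
        pending = self.candidates()
        if not pending:
            raise ValueError("nothing to merge")
        return self._commit(pending)

    def catch_up_no_merge(self) -> None:
        """Scenario 2: discard every remaining candidate without merging."""
        self.discarded.update(self.candidates())

    def cherry_pick_merge(self, changeset_ids: List[int]) -> int:
        """Scenario 3: merge only the named change sets."""
        current = self.candidates()
        unknown = [c for c in changeset_ids if c not in current]
        if unknown:
            raise ValueError(f"not merge candidates: {unknown}")
        return self._commit(changeset_ids)

    def cherry_pick_no_merge(self, changeset_id: int) -> None:
        """Scenario 4: drop one change set from the candidate list for good."""
        if changeset_id not in self.candidates():
            raise ValueError(f"not a merge candidate: {changeset_id}")
        self.discarded.add(changeset_id)

    def merged_in(self, changeset_id: int) -> Optional[int]:
        """Scenario 5: the target change set that carried the merge, if any."""
        return self.merged.get(changeset_id)


tracker = MergeTracker(source_changesets=[10, 11, 12, 13])
tracker.cherry_pick_no_merge(11)          # never offer 11 again
fix = tracker.cherry_pick_merge([12])     # migrate one specific fix
print(tracker.candidates())               # 10 and 13 are still waiting
rest = tracker.catch_up_merge()           # bring over everything else
```

Note that a discarded change set and a merged change set both disappear from the candidate list, but only the merged one answers the history query.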

How TFSC Addresses the Scenarios

TFSC merging is designed to provide users with an extremely powerful and flexible tool for managing the contents of branches. Merges can be made into a single file or into a tree of related files. Merges can also migrate the entire change history of the specified source files or an individual change set or revision that might contain a specific fix or feature that should be migrated without moving other changes from the source in the process. Merging the entire change history prior to a given point in time is known as a catch-up merge (Scenarios 1 and 2), whereas selecting individual change sets or revisions to merge is known as a cherry-pick merge (Scenarios 3 and 4). The merge command also allows users to query for merge history and merge candidates and perform the actual merge operation.

TFSC presents merge history and candidate merges as a list of change sets that have been or can be migrated between a source and a target branch. Merges can be made to a subset of files in a change set, creating a situation in which a partial change set has been merged. In this case, TFSC represents the partial state of the merge and allows the user to finish merging the change set later.
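
One simple way to think about the partial state is to compare the files in a change set with the files that have already been merged. This Python sketch is only an illustration of that idea; the enum, the function, and the file names are hypothetical and say nothing about how TFSC stores merge state.

```python
from enum import Enum
from typing import Set


class MergeStatus(Enum):
    NOT_MERGED = "not merged"
    PARTIAL = "partially merged"
    MERGED = "merged"


def merge_status(changeset_files: Set[str], merged_files: Set[str]) -> MergeStatus:
    """Classify a change set by how many of its files have reached the target."""
    already_merged = changeset_files & merged_files
    remaining = changeset_files - merged_files
    if not already_merged:
        return MergeStatus.NOT_MERGED
    if remaining:
        return MergeStatus.PARTIAL   # the user can finish the merge later
    return MergeStatus.MERGED


changeset = {"hammer/head.c", "hammer/handle.c"}
print(merge_status(changeset, {"hammer/head.c"}))
```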

Merges are pending changes in TFSC. The user can choose to perform several merge operations within a workspace without committing changes following each merge. All these merges can be staged in the user's workspace and committed with a single check-in as a single change set. In addition, the pending merge operation can be combined with the checkout and rename commands to interject additional changes to the files that will be committed with the merge.

Hopefully you followed this summary and are still with me. Now let's go into how branching works in TFSC.

Branching in TFSC

Branching is the SCM operation of creating an independent line of development for one or more files. In a sense, branching a file results in two identical copies of the original file that can be modified as desired. Changes in the old line are not, by default, reflected in the new line and vice versa. Explicit operations can be performed to merge changes from one branch into another.

There are many different reasons for branching and many different techniques to accomplish it. In the most common scenarios, branching is reasonably simple, but branching can become complicated. A complex system with lots of branched files can be hard to visualize. I recommend mapping this with a visual product (such as Visio) so that the picture is clear.

Following are a handful of scenarios in which branching is interesting. Any SCM team should adopt these definitions.

Release Branching

We've been working on a Version 1 release for a year now, and it is time to begin work on Version 2. We need to finish coding Version 1 (fixing bugs, running tests, and so on), but many of the developers are finished with their Version 1 work (other than occasional interruptions for bug fixes) and want to start designing and implementing features for Version 2. To enable this, we want to create a branch off the Version 1 tree for the Version 2 work. Over time, we want to migrate all the bug fixes we make in the process of releasing Version 1 into the Version 2 code base. Furthermore, we occasionally find a Version 1 bug that happens to be fixed already in Version 2. We want to migrate the fix from the Version 2 tree into the Version 1 tree.

Promotion Modeling

Promotion modeling is equivalent to release branching, where each phase is a release. It is a development methodology in which source files go through stages. Source files might start in the development phase, be promoted to the test phase, and then go through integration testing, release candidate, and release. This phasing serves a couple of purposes. It allows parallel work in different phases, and it clearly identifies the status of all the sources. Separate branches are sometimes used for each phase of the development process.

Developer Isolation

A developer (or a group) needs to work on a new feature that will be destabilizing and take a long time to implement. In the meantime, the developer needs to be able to version his changes (check in intermediate progress, and so on). To accomplish this, he branches the code that he intends to work on and does all his work independently. Periodically, he can merge changes from the main branch to make sure that his changes don't get too far out of sync with the work of other developers. When he is done, he can merge his changes back into the main branch.

Developer isolation also applies when semi-independent teams collaborate on a product. Each team wants to work with the latest version of its own source but wants to use an approved version of source from other teams. The teams can accomplish this in two ways. In the first way, the subscribing team "pulls" the snapshot that it wants into its configuration, and in the second way, the publishing team publishes the "approved" version for all the client teams to pick up automatically.

Label Branching

We label important points in time, such as every build that we produce. A partner team picks up and uses our published builds on a periodic basis, perhaps monthly. A couple of weeks after picking up a build, the team discovers a blocking bug. It needs a fix quickly but can't afford the time to go through the approval process of picking up an entirely new build. The team needs the build it picked up before plus one fix. To do this, we create a branch of the source tree that contains all the appropriate file versions that are labeled with the selected build number. We can fix the bug in that branch directly and migrate the changes into the "main" branch, or we can migrate the existing fix (if it has already been made) from the "main" branch into the new partner build branch.

Component Branching

We have a component that performs a function (for simplicity, let's imagine it is a single file component). We discover that we need another component that does nearly the same thing but with some level of change. We don't want to modify the code to perform both functions; rather, we want to use the code for the old component as the basis for creating the new component. We could just copy the code into another file and check it in, but among other things, the new copy loses all the history of what brought it to this point. The solution is to branch the file. That way, both files can be modified independently, both can preserve their history, and bug fixes can be migrated between them if necessary.

Partial Branching

Partial branching is equivalent to component branching, where the "component" is the versioned product. In this case, we work on a product that has a series of releases. We shipped the Everett release and are working on the Whidbey release. As a general rule, all artifacts that make up each version should be branched for the release (source, tools, specs, and so on). However, some versioned files aren't release specific. For example, we have an emergency contact list that has the home phone numbers for team members. When we update the list, we don't want to be bothered with having to merge the changes into each of the product version branches, yet the developers who are enlisted in each version branch want to be able to sync the file to their enlistment.

Identifying Branches (Configurations)

When a file is branched, it is as if a new file is created. We need a way to identify that new file. Historically, this has been done by including the version number of the file as part of the name of the file. In such a mechanism, the version number consists of a branch number and a revision number. A branch number is formed by taking the version number of the file to be branched and appending an integer; a second integer is then appended as the revision number on that branch. For example, branching version 1.2 produces 1.2.1.1 (where 1.2.1 is the branch number and 1 is the revision number). See Chapter 16 for more details on branch labeling.
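
The numbering rule is easier to follow as arithmetic on strings. The Python helpers below are a sketch of the traditional scheme described above, with hypothetical function names; they are not part of any particular tool.

```python
from typing import Tuple


def branch_version(version: str, branch_index: int = 1) -> str:
    """Return the first version number on a new branch made from `version`.

    The branch number is the source version plus a branch index; the
    trailing 1 is the first revision on that branch.
    """
    branch_number = f"{version}.{branch_index}"
    return f"{branch_number}.1"


def split_version(version: str) -> Tuple[str, int]:
    """Split a version number into (branch number, revision number)."""
    branch_number, _, revision = version.rpartition(".")
    return branch_number, int(revision)


def next_revision(version: str) -> str:
    """Return the version number of the next revision on the same branch."""
    branch_number, revision = split_version(version)
    return f"{branch_number}.{revision + 1}"


print(branch_version("1.2"))        # 1.2.1.1
print(split_version("1.2.1.1"))     # ('1.2.1', 1)
print(next_revision("1.2.1.1"))     # 1.2.1.2
```

Branching again from 1.2.1.2 would give 1.2.1.2.1.1, which shows how quickly these numbers grow.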

This is all well and good, but it quickly becomes unwieldy not only from the standpoint of dealing with individual files, but also from the standpoint of trying to pick version numbers apart to understand what they mean.

To address these issues, the notion of "configurations" was developed. A configuration is a collection of files and their version numbers. Configurations generally have a human-readable name, such as Acme 1.0. Having named configurations is great, but before long, even that will get to be a problem. You will need a way to organize them.

An interesting way to address this organization problem is to make configurations part of the actual source code hierarchy. This method of organization is natural because it is how people do it without version control. It avoids the problem of having to teach most people the concept of configuration, and it provides a great deal of flexibility in how you combine configurations. For example, two versions of an Acme product (where Version 2.0 is branched from Version 1.0) might look something like this:

Acme 1.0
    Anvil
    Hammer
        Head
        Handle
Acme 2.0
    Anvil
    Forge
    Hammer
        Head
        Handle

There are different approaches to branching granularity. In the traditional approach, branching is done on a file-by-file basis. Each file can be branched independently at different times, from different versions, and so on. Configurations help prevent this from becoming chaotic. They provide an umbrella to help people understand the purpose of the various branches. File-by-file branching is flexible, but you must take care to ensure that it doesn't get out of hand. In addition, file-by-file branching can be hard to visualize.

Another technique is always to do branching globally. Whenever a branch is created, all files in the system are branched. (There are ways to do this efficiently, so it's not as bad as it sounds.) The upside of this global branching is that it is easy to understand and visualize. The downsides include the fact that it forces a new namespace (the branches namespace) and is less flexible. For example, I can't have a single configuration that includes two copies of the same file from different configurations, as in the previous component branching scenario.
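
When configurations live in the source hierarchy, a branch is just a new set of paths that remember where they came from. The Python sketch below models that idea with a dictionary; the `Item` class, the `branch` function, the content identifiers, and the "Anvil Lite" component are hypothetical, and nothing here reflects how TFSC actually stores branches. New items share the source's content rather than copying it, which hints at why branching many files need not be expensive.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Item:
    """A versioned file: a pointer to stored content plus its branch origin."""
    content_id: str
    branched_from: Optional[str] = None


def branch(repo: Dict[str, Item], source: str, target: str) -> None:
    """Branch a file or a whole folder from `source` to `target` within `repo`."""
    source = source.rstrip("/")
    target = target.rstrip("/")
    new_items: Dict[str, Item] = {}
    for path, item in repo.items():
        if path == source or path.startswith(source + "/"):
            new_path = target + path[len(source):]
            if new_path in repo:
                raise ValueError(f"target already exists: {new_path}")
            # Share the content; record the origin so history can be followed.
            new_items[new_path] = Item(item.content_id, branched_from=path)
    if not new_items:
        raise ValueError(f"nothing to branch at: {source}")
    repo.update(new_items)


repo = {
    "Acme 1.0/Anvil": Item("c1"),
    "Acme 1.0/Hammer/Head": Item("c2"),
    "Acme 1.0/Hammer/Handle": Item("c3"),
}
branch(repo, "Acme 1.0", "Acme 2.0")                    # branch a whole configuration
branch(repo, "Acme 2.0/Anvil", "Acme 2.0/Anvil Lite")   # branch a single component
print(repo["Acme 2.0/Hammer/Head"].branched_from)
```

Because the branch target is an ordinary path, the same operation covers both a whole configuration and a single component, including two descendants of one file side by side in the same configuration.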

More Scenarios

Shelving and offline work are such excellent features that they alone justify moving from whatever SCC tool you currently use to TFSC.

Shelving Current Changes

A contributor, working on a new feature, checks out a series of files from the repository.
A critical bug is found that needs immediate attention by this contributor.
The contributor chooses to shelve his current change set for the feature he was working on. All of his currently checked-out files are archived on the server, where they can be retrieved later. The files are replaced by the unmodified copies of the same version he originally synced from the server. The files do not appear to be checked out in the contributor's workspace.
The contributor makes changes to address the bug as needed. The modified files are checked in as a new change set.
The contributor now unshelves his previous change set from the server. The modified files that he previously archived to the server are placed in his workspace. The files once again appear to be checked out in his workspace.
The contributor, wanting to merge any modifications to these files that were made during the bug fix, syncs his workspace with the server. The updates are automatically merged into the checked-out files in the local workspace.
The contributor continues work on the new feature and checks in all modifications as a single change set when the feature is complete.
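
The shelving steps above can be sketched as a small state machine. This Python model is a toy, not TFSC's behavior in detail: the class and method names are hypothetical, the shelf storage stands in for the server, and the automatic merge on sync is left out.

```python
from typing import Dict, Set


class ToyWorkspace:
    """A toy workspace that can shelve and unshelve pending changes."""

    def __init__(self, server_files: Dict[str, str]) -> None:
        self.base = dict(server_files)            # versions last synced from the server
        self.files = dict(server_files)           # local working copies
        self.pending: Set[str] = set()            # paths that appear checked out
        self.shelves: Dict[str, Dict[str, str]] = {}  # stands in for server-side shelf storage

    def edit(self, path: str, content: str) -> None:
        """Check out a file and modify it."""
        self.pending.add(path)
        self.files[path] = content

    def shelve(self, name: str) -> None:
        """Archive pending changes and restore the unmodified copies."""
        if not self.pending:
            raise ValueError("nothing to shelve")
        self.shelves[name] = {p: self.files[p] for p in self.pending}
        for p in self.pending:
            self.files[p] = self.base[p]
        self.pending.clear()                      # files no longer appear checked out

    def unshelve(self, name: str) -> None:
        """Bring archived changes back; the files appear checked out again."""
        for path, content in self.shelves[name].items():
            self.files[path] = content
            self.pending.add(path)

    def check_in(self) -> Dict[str, str]:
        """Commit all pending changes as a single change set."""
        changeset = {p: self.files[p] for p in self.pending}
        self.base.update(changeset)
        self.pending.clear()
        return changeset


ws = ToyWorkspace({"feature.c": "v1", "bug.c": "v1"})
ws.edit("feature.c", "work in progress")   # feature work begins
ws.shelve("new-feature")                   # critical bug arrives: set the work aside
ws.edit("bug.c", "fixed")
bug_fix = ws.check_in()                    # the fix goes in as its own change set
ws.unshelve("new-feature")                 # pick the feature back up
print(ws.files["feature.c"], ws.pending)
```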

Offline Checkout/Check-In

A contributor syncs his workspace and takes his laptop home for the evening.
At home, he continues working and chooses to check out a file.
An unmodified copy of the checked-out file is placed in the contributor's cache on his local computer.
The contributor continues to work and check out additional files. Unmodified copies of all these files are placed in the cache.
When the feature is complete, the contributor attempts to check in the change set. Because he is offline, the check-in option is not available.
Wanting to begin work on the next feature, the contributor shelves his modifications for retrieval and check-in when he is able to go back online.

I have designed VBLs with customers using several different SCC tools. Some worked better than others, but what I really like about TFSC is that it is designed from the ground up to work most efficiently with the way that developers and projects interact. It's not necessary to customize the tool with hacks or tricks to get it to do what you want. All the features are there.