Managing Throughput


The primary measure of success of the SNAP system is throughput: the number of check-ins processed over a given period. The demand on a check-in system fluctuates over the course of the development cycle. For example, just after you finish a major milestone, the demand on the system drops significantly. Conversely, in the closing weeks of the next milestone, demand on the system rises, peaking around the milestone date.

Microsoft Sidenote: Lessons Learned Part 2 (from the NetDocs Team)

Predicting demand is not easy. We found that developers' behavior adapted to the conditions of our queue. Also, some developers like to check in once a week, whereas others want to check in as frequently as they can, even several times per day if the system allows them. In one product team, it became well known that getting a check-in through successfully was hard, so developers tended to save up changes and check in less frequently than they might have wanted to. During the evolution of the product, the number of developers went up significantly, yet the number of check-ins processed per day did not keep pace.

There clearly were periods when we maxed out the system. We found that at peak loads, the queue backed up. Also, the number of merge conflicts in check-ins started rising dramatically after a check-in sat in the queue for more than a day before being processed.

We discovered that an effective way to manage peak loads and extend the effective throughput of the lab was to merge unrelated check-ins. By carefully picking and choosing between check-ins that affected different parts of the product, we were able to assemble a merged check-in that processed several sets of changes at once. As a result, we saw significant gains in effective throughput.
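The batching strategy described above can be sketched as a greedy algorithm. This is a minimal illustration, not the team's actual tooling: check-ins are modeled simply as sets of touched file paths (real conflict detection would also have to consider build dependencies), and the function name is hypothetical.

```python
# Greedily batch queued check-ins whose touched-file sets are disjoint,
# so that one merged check-in can process several changes at once.

def batch_disjoint(checkins):
    """checkins: list of (id, set_of_files).
    Returns a list of batches; each batch is a list of check-in ids
    that touch mutually disjoint parts of the product."""
    batches = []  # each entry: (list_of_ids, union_of_files)
    for cid, files in checkins:
        placed = False
        for batch_ids, batch_files in batches:
            if batch_files.isdisjoint(files):
                batch_ids.append(cid)
                batch_files |= files
                placed = True
                break
        if not placed:
            batches.append(([cid], set(files)))
    return [ids for ids, _ in batches]

queue = [
    ("ci-1", {"ui/menu.cpp"}),
    ("ci-2", {"net/socket.cpp"}),
    ("ci-3", {"ui/menu.cpp", "ui/dialog.cpp"}),
]
print(batch_disjoint(queue))  # [['ci-1', 'ci-2'], ['ci-3']]
```

Here ci-1 and ci-2 affect different parts of the product, so they merge into one run; ci-3 overlaps ci-1's files and must wait for its own run.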

We generally experienced few complaints from the developers and did not see many merge conflicts in our check-ins if we could keep our queue wait time within about a day and a half. For us, that represented a queue length from 20 to 25 check-ins deep. When the backup went over two days (for us, 30 or more), we saw significant complaints.
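The thresholds above lend themselves to a simple queue-health check. The following is a hypothetical sketch using the NetDocs team's own numbers (20-25 check-ins at about a day and a half of wait, 30 or more at two days); your thresholds would come from your own queue data.

```python
# Classify SNAP queue depth against observed backlog thresholds.
# warn/critical defaults reflect the NetDocs team's experience:
# ~20 deep was about 1.5 days of wait; 30+ drew significant complaints.

def queue_status(depth, warn=20, critical=30):
    if depth >= critical:
        return "critical"   # ~2+ days of backlog
    if depth >= warn:
        return "warning"    # watch the merge-conflict rate
    return "ok"

print(queue_status(18), queue_status(24), queue_status(31))
```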

When one group tried flushing out stuck or failed check-ins after a certain timeout, they found that the problem was often caused by something not related to the check-in. When this happened, the SNAP queue would empty as every check-in timed out and then would have to be rerun the next day anyway. Nothing was saved.

By far, the largest problems came when significant new snippets of code were checked in without quickly getting good tests in place for them. Until the new code and its tests stabilized, lab throughput suffered.


To figure out an average check-in time with a SNAP system, take the length of your longest build and add to it the length of the longest test. Add the time it takes to set up the test machines. Then add a few minutes here and there to account for the time needed to perform bookkeeping and process the actual SNAP submit statement. You now have the expected length of a check-in, assuming that you buy enough hardware to run as much as you can in parallel and you have decided to integrate testing into your SNAP process.
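The estimate above can be written as a small calculation. This is an illustrative sketch with made-up durations; the function name and numbers are assumptions, and the formula holds only under the stated assumption that builds and tests run with maximal parallelism.

```python
# Estimate the expected length of one SNAP check-in: with full
# parallelism, only the longest build and the longest test matter,
# plus additive machine setup and bookkeeping/submit overhead.

def expected_checkin_minutes(build_times, test_times,
                             machine_setup=15, overhead=10):
    """build_times / test_times: per-job durations in minutes."""
    return max(build_times) + max(test_times) + machine_setup + overhead

length = expected_checkin_minutes(
    build_times=[45, 60, 90],   # longest build: 90 min
    test_times=[30, 120],       # longest test suite: 120 min
)
print(length)  # 235 minutes per check-in
```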

From the length of a check-in, you know the number of check-ins per day that you can process. At peak demand periods, it seemed like our developers needed to be able to process a check-in every other day or so. During off-peak intervals, we were seeing a rate closer to once a week.
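Converting check-in length into daily capacity is then straightforward. A minimal sketch, continuing the hypothetical 235-minute check-in from the estimate above and assuming the lab runs around the clock:

```python
# Daily throughput: hours the lab runs, divided by per-check-in length.

def checkins_per_day(checkin_minutes, lab_hours=24):
    return (lab_hours * 60) // checkin_minutes

print(checkins_per_day(235))  # 6
```

At a peak rate of one check-in per developer every other day, a capacity of 6 check-ins per day would support roughly a dozen developers; this is the kind of back-of-the-envelope sizing the guideline enables.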

Again, use these guidelines in conjunction with any data that you have on your current check-in process. In most source code control systems, you can easily discover the number of change lists that have been submitted to any given branch.

This data will help you determine the schedule of your project and how you can improve your build process by either adding more hardware or rearchitecting some processes.

SNAP works best when you can divide processes into small, independent pieces and run them in parallel. Because a product build must often be runnable on a single developer's machine, it is difficult to parallelize the build. But testing can generally be made to distribute reasonably well.
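One common way to distribute independent test suites across lab machines is longest-processing-time-first (LPT) scheduling. The sketch below is an illustration of that general heuristic, not the SNAP implementation; the function name and durations are assumptions.

```python
# Distribute independent test suites across lab machines: assign each
# suite (longest first) to the currently least-loaded machine, and
# report the makespan (wall-clock minutes until all machines finish).

import heapq

def assign_tests(test_durations, machines):
    loads = [(0, m) for m in range(machines)]  # (minutes of work, machine id)
    heapq.heapify(loads)
    for t in sorted(test_durations, reverse=True):
        load, m = heapq.heappop(loads)      # least-loaded machine
        heapq.heappush(loads, (load + t, m))
    return max(load for load, _ in loads)

# Six suites spread over three machines finish in 135 minutes of
# wall-clock time instead of 375 minutes run serially.
print(assign_tests([120, 90, 60, 45, 30, 30], 3))  # 135
```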



The Build Master: Microsoft's Software Configuration Management Best Practices
ISBN: 0321332059
Year: 2006
Pages: 186
