Queuing Theory


Queuing theory is the study of waiting lines, or queues. We certainly have queues in software development: we have lists of requests from customers and lists of defects we intend to fix. Queuing theory has a lot to offer in helping manage those lists.

Little's Law

Little's Law states that in a stable system, the average amount of time it takes something to get through a process is equal to the number of things in the process divided by their average completion rate (see Figure 5.2).
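Expressed as a formula (a standard rendering of the law using the terms above; the symbols are ours, not the book's):

```latex
\text{average cycle time} \;=\; \frac{\text{average number of things in process}}{\text{average completion rate}}
\qquad\text{equivalently}\qquad W = \frac{L}{\lambda}
```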

Figure 5.2. Little's Law


In the last section we said that the objective of a lean development organization is to reduce cycle time. This equation gives us a clear idea of how to do that. One way to decrease cycle time is to get things done faster, that is, to increase the average completion rate. This usually means spending more money. If we don't have extra money to spend, the other way to reduce cycle time is to reduce the number of things in process. This takes a lot of intellectual fortitude, but it usually doesn't require much money.[5]

[5] See Michael George and Stephen Wilson, Conquering Complexity in Your Business: How Wal-Mart, Toyota, and Other Top Companies Are Breaking Through the Ceiling on Profits and Growth, McGraw-Hill, 2004, p. 37.
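To make the two levers just described concrete, here is a small illustrative calculation; the numbers are invented, not taken from the book:

```python
def average_cycle_time(things_in_process, completion_rate_per_week):
    """Little's Law: cycle time = things in process / average completion rate."""
    return things_in_process / completion_rate_per_week

# Baseline: 30 items in process, 3 finished per week -> 10 weeks average cycle time.
print(average_cycle_time(30, 3))
# Lever 1: finish things faster (usually costs money) -> 5 weeks.
print(average_cycle_time(30, 6))
# Lever 2: halve the number of things in process      -> 5 weeks, little money needed.
print(average_cycle_time(15, 3))
```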

Variation and Utilization

Little's Law applies to stable systems, but there are a couple of things that make systems unstable. First there is variation: stuff happens. Variation is often dealt with by reducing the size of batches moving through the system. For example, many stores have check-out lanes for "10 items or less" to reduce the variation in checkout time for that line. Let's say you have some code to integrate into a system. If it's six weeks' worth of work, you can be sure there will be a lot of problems. But if it's only 60 minutes of work, the amount of stuff that can go wrong is limited. If you have large projects, schedule variation will be enormous. Small projects will exhibit considerably less schedule variation.
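One simple way to see why large batches carry large schedule variation (an illustrative model, not one the book states): if a batch bundles n roughly independent tasks, each with schedule standard deviation sigma, then the batch's schedule spread grows as

```latex
\sigma_{\text{batch}} = \sigma\sqrt{n}
```

so a six-week integration carries far more absolute schedule uncertainty than a 60-minute one, and in practice dependencies between the pieces make it worse still.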

High utilization is another thing that makes systems unstable. This is obvious to anyone who has ever been caught in a traffic jam. Once the utilization of the road goes above about 80 percent, the speed of the traffic starts to slow down. Add a few more cars and pretty soon you are moving at a crawl. When operations managers see their servers running at 80 percent capacity at peak times, they know that response time is beginning to suffer, and they quickly get more servers.
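Standard queuing theory makes this concrete. For the simplest single-server model (M/M/1, a textbook result the book alludes to rather than states), the average wait in the queue, measured in multiples of the average service time, is

```latex
\frac{W_q}{\text{service time}} \;=\; \frac{\rho}{1-\rho}, \qquad \rho = \text{utilization}
```

At 50 percent utilization the wait equals one service time; at 80 percent it is four; at 90 percent, nine; at 95 percent, nineteen. That nonlinearity is why both highway engineers and operations managers get nervous above roughly 80 percent.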

Since Google was organized by a bunch of scientists studying data mining, it's not surprising that their server structure reflects a keen understanding of queuing theory. First of all, they store data in small batches. Instead of big servers with massive amounts of data on each one, Google has thousands upon thousands of small, inexpensive servers scattered around the world, connected through a very sophisticated network. The servers aren't expected to be 100 percent reliable; instead, failures are expected and detected immediately. It's not a big deal when servers fail, because data has been split into tiny pieces and stored in lots of places. So when servers fail and are automatically removed from the network, the data they held is found somewhere else and replicated once again on a working server. Users never know anything happened; they still get almost instantaneous responses.

If you have ever wondered why Google chose to dedicate 20 percent of its scientists' and engineers' time for work on their own projects, take a look at the graph in Figure 5.3. This figure shows that cycle time starts to increase at just above 80 percent utilization, and this effect is amplified by large batches (high variation). Imagine a group of scientists who study queuing theory for a living. Suppose they find themselves running a company that must place the highest priority on bringing new products to market. For them, creating 20 percent slack in the development organization would be the most logical decision in the world. It's curious that observers applaud Google for redundant servers but do not understand the concept of slack in a development organization.

Figure 5.3. Queuing theory applies to development as well as traffic


Most operations managers would get fired for trying to get maximum utilization out of each server, because it's common knowledge that high utilization slows servers to a crawl. Why is it that when development managers see a report saying that 90 percent of their available hours were used last month, their reaction is, "Oh look! We have time for another project!" Clearly these managers are not applying queuing theory to the looming traffic jam in their department.

You can't escape the laws of mathematics, not even in a development organization. If you focus on driving utilization up, things will slow down. If you think that large batches of work are the path to high utilization, you will slow things down even further, and reduce utilization in the process. If, however, you assign work in small batches and concentrate on flow, you can actually achieve very good utilization, but utilization should never be your primary objective.

Reducing Cycle Time

Let's agree, at least for the moment, that our objective is to reduce the average cycle time from concept to cash or from customer need to deployed software. How do we go about accomplishing this goal? Queuing theory gives us several textbook ways to reduce cycle time:

  1. Even out the arrival of work

  2. Minimize the number of things in process

  3. Minimize the size of things in process

  4. Establish a regular cadence

  5. Limit work to capacity

  6. Use pull scheduling

Even Out the Arrival of Work

At the heart of every lean process is an even level of work. In a factory, for example, a monthly plan to build 10,000 widgets translates into building one widget every minute. The factory work is then paced to produce at a steady rate of one widget per minute.
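The arithmetic behind "one widget every minute" (the working-time assumption is ours; the book gives only the result):

```latex
\frac{10{,}000 \text{ widgets/month}}{21 \text{ working days} \times 8 \text{ h/day} \times 60 \text{ min/h}}
\;=\; \frac{10{,}000}{10{,}080 \text{ min}} \;\approx\; 1 \text{ widget per minute}
```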

The budgeting and approval processes are probably the worst offenders when it comes to disrupting a steady flow of development work. Requests are queued for months at a time, and large projects may wait for the annual budgeting cycle for approval. Some think that by considering all proposals at the same time, an organization can make a better choice about how to spend its budget. However, this practice creates long queues of work to be done, and if all of the work is released at the same time, it wreaks havoc on the development organization. Moreover, it means that decisions are made well out of sync with need, and by the time the projects are started, the real business need will probably have changed considerably. Tying project approvals to the budgeting cycle is generally unnecessary and usually unrealistic in all but the slowest-moving businesses.

All of our work is in the first half of the year.

I was having dinner with a fairly high-level manager of a very large IT organization. They had a practice of moving developers around to different projects, and I suggested that they might want to consider assigning teams to specific business areas so they could become familiar with their customers.

"We couldn't do that," he said. "The business leaders all want their work to be done in the first half of the year, from January to June, so we try to do the background stuff they aren't interested in the second half of the year."

"Why do they want all of the work done in the first half of the year?" I asked. "Is your business seasonal?"

"No," he replied. And then after thinking for a moment, he added, "They get their budgets at the end of December, and so I suppose they want the work done as soon as possible after that."

"I know you want to be responsive to the business leaders," I said, "But you can't let them get away with that. You have to make it clear what that kind of schedule does to your ability to support them. What if you took every business leader's budget and allocated a team of people to their business based on their annual budget, then let the business leader decide how the team would spend their time throughout the year?"

"Well, that would make too much sense," he grinned. "It would mean a lot of change…." And he promised to consider it further.

Mary Poppendieck


Queues at the beginning of the development process may seem like a good place to hold work so that it can be released to the development organization at an even pace. But those queues should be no bigger than necessary to even out the arrival of work. Often we find that work arrives at a steady pace, and if that is the case, then long queues are really unnecessary.

Doctor's Appointments

Here in the United States, doctors stop accepting new patients when their backlog of appointments starts to grow too long. In many clinics, the waiting list for an appointment is about two months; it is not allowed to get longer, but then again, it never seems to get any shorter either. This would be regarded as a stable system in queuing theory.

One clinic in Minnesota studied lean ideas and decided to see what would happen if they shortened the waiting time. For a while, most doctors worked an extra half day every week while the schedulers made sure that the arrival of work remained stable. Over a period of half a year, the clinic reduced waiting times for an appointment to about two days. The clinic found that doctors still saw the same number of patients with the same mix of problems. Some doctors were surprised to find that they did not need a cushion of sixty days of appointments to keep them busy; in fact, they saw very little difference in their workload.

From a patient's point of view, there was a dramatic difference: suddenly they could call up and get an appointment within a day or two. This was truly a "lean solution" for patients.

Mary Poppendieck


Minimize the Number of Things in Process

In manufacturing, people have learned that a lot of in-process inventory just gums up the works and slows things down. Somehow we don't seem to have learned this same lesson in development. We have long release cycles and let stuff accumulate before releasing it to production. We have approval processes that dump work into an organization far beyond its capacity to respond. We have sequential processes that build up an amazing amount of unsynchronized work. We have long defect lists. Sometimes we are even proud of how many defects we've found. This partially done work is just like inventory in manufacturing: it slows down the flow, it hides quality problems, and it grows obsolete, usually pretty rapidly.

One of the less obvious offenders is the long list of customer requests that we don't have time for. Every software development organization we know of has more work to do than it can possibly accommodate, but the wise ones do not accept requests for features that they cannot hope to deliver. Why should we keep a request list short? From a customer's perspective, once something has been submitted for action, the order has been placed and our response time is being measured. Queues of work waiting for approval absorb energy every time they are estimated, reprioritized, and discussed at meetings. To-do queues often serve as buffers that insulate developers from customers; they can be used to obscure reality, and they often generate unrealistic expectations.

But we have such a long list of things to do, how can we pare it down?

We generally find that long queues of work to do are unrealistic and unnecessary. Here are some ideas for how to deal with these queues (a small scripted sketch of the first three steps follows the list).

  1. Start by asking, "How many things in this queue are we realistically never going to get around to?" Cut all of the things you'll never get to out of the queue immediately. Be honest. Just hit the delete key.

  2. So, how many items did that exercise get rid of? Half? Now take the remaining items and do a Pareto analysis on them. Rate each one on a scale of 1 to 5. The critical items will rate a 5. The unimportant items will rate a 1. Now get rid of all except those that got 4s and 5s. Just hit delete. Don't worry, if they turn out to be important, they'll come back at you.

  3. Now take the items that are left and calculate how many days, months, or years of work they represent. Will you have other things added to the list that will be more important? With that in mind, do you have the capacity to do the remaining items on the list in the near future? If not, should you add additional capacity?

  4. If your list is still unrealistically long, there is probably some purpose that it is serving beyond making effective decisions on what to do and what not to do. For example, a long list might deflect undue attention or absorb frivolous requests. Break the list into two lists, one which will serve the exterior purpose, and the other which you will keep short and work off of.
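Here is the scripted sketch promised above. It walks steps 1 through 3 over an invented backlog; every request, rating, and capacity figure is hypothetical and exists only to illustrate the mechanics:

```python
from dataclasses import dataclass

@dataclass
class Request:
    name: str
    importance: int      # 1 (unimportant) .. 5 (critical), assigned in step 2
    effort_days: float   # rough estimate used in step 3
    will_ever_do: bool   # the honest answer from step 1

backlog = [
    Request("export to PDF", importance=5, effort_days=8, will_ever_do=True),
    Request("legacy skin", importance=1, effort_days=20, will_ever_do=False),
    Request("audit trail", importance=4, effort_days=12, will_ever_do=True),
    Request("animated logo", importance=2, effort_days=3, will_ever_do=True),
]

# Step 1: be honest and delete what you will realistically never get to.
backlog = [r for r in backlog if r.will_ever_do]

# Step 2: after rating each item from 1 to 5, keep only the 4s and 5s.
backlog = [r for r in backlog if r.importance >= 4]

# Step 3: how much work does what is left represent, given your capacity?
total_days = sum(r.effort_days for r in backlog)
capacity_days_per_month = 40   # assumed capacity figure; substitute your own
print(f"{total_days} days of work, about "
      f"{total_days / capacity_days_per_month:.1f} months at current capacity")
```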


Seven Years?

"We prioritize this list every week," the manager said.

"Do you know about how many requests there are in the list?" I asked.

"Yes, there are about 750," he said.

"Do you know how many of those you can do in, say, a month, on the average?"

"Yes, we average about nine every month," he answered. The company kept good statistics.

"Wow!" someone else said. "That's seven years of work!"

"Seven years!" the manager was astonished. "I had never looked at it that way."

"And why do you keep so many on the list if you know you will never get to them?" I asked.

"Well, our process expert said we don't want to lose track of anything. And we've gotten so we don't spend much time on the list every week."

"So you never get back to the customers and say, 'No, sorry.' And they probably keep expecting you to develop their features," I said. "Don't you want to be a bit more honest with your customers?"

"Yes, we used to tell them 'No' all the timeI was very aggressive about it when we were small and I was more involved. Customers seemed to appreciate the honesty. Maybe we should start doing that again…."

Mary Poppendieck


Minimize the Size of Things in Process

The amount of unfinished work in an organization is a function of either the length of its release cycle or the size of its work packages. Keeping the release cycle short and the maximum work package size small is a difficult discipline. The natural tendency is to stretch out product releases or project durations, because releasing work to production seems to involve so much effort. However, stretching out the time between releases is moving in exactly the wrong direction from a lean perspective. If a release seems to take a long time, don't stretch out releases. Find out what is consuming all the time and address it. If something is difficult, do it more often, and you'll get a lot better at it.

Releases Take Too Long

After a talk at a company, the QA manager came to me and said, "I don't see how we can follow your advice. We have a lot of pressure to put as many features as we can in each release, because they are so far apart."

"Why not release more often?" I asked.

"We can't possibly release more often because verification takes so long," came the immediate reply.

"Why does verification take so long?" I wondered.

"We find lots of problems in verification that have to be fixed," he said. I began to see a vicious circle.

"Can't you find most of the problems before verification?" I asked. After all, he was the QA manager.

"Verification is supposed to be independent," the QA manager replied. "If we verify the code while the developers are writing it, that would destroy our independence."

I was really surprised. There are a lot of good reasons to delay final verification. In embedded systems the hardware usually isn't ready until the end. When deploying to a customer site, you don't have access to their environment until the very last moment. But this was a new one.

"I can understand independent verification," I said, "but how is that related to long release cycles? Can't verification be independent but test a smaller amount of code?"

"Well, I'll have to think about it, but we might get too close to the developers that way." He sounded reluctant. I guessed that their long release cycles were not going to get shorter any time soon.

Mary Poppendieck


Oh, NOW I Get It!

We were teaching a class that had done current value stream maps and then future value stream maps. The last of the groups was presenting their future map. Suddenly someone from a different group exclaimed, "Oh, NOW I get it!" The presenter paused to see what this was all about.

"When I developed that future value stream map that I just finished presenting," he said, "I cut the release cycle to a third of its former length, and I was really frustrated that I still had a rather low process cycle efficiency. What I just realized is that I've been trying to optimize utilization with releases. The whole concept of a release is what's driving down my efficiency. If I could release as soon as a patch is ready instead of waiting for a release, the process cycle efficiency would be much better!"

The speaker was obviously proud of this blinding insight, but as he looked around he noticed that most people weren't all that impressed. Then he said kind of sheepishly, "I guess this is what you've been trying to say all morning. It just took the idea this long to sink into my head."

Tom & Mary Poppendieck


Establish a Regular Cadence

Iterations are the cadence of a development organization. Every couple of weeks something gets done. After a short time people begin to count on it. They can make plans based on a track record of delivery. The amount of work that can be accomplished in an iteration quickly becomes apparent; after a short time people stop arguing about it. They can commit to customers with confidence. There is a steady heartbeat that moves everything through the system at a regular pace. A regular cadence produces the same effect as line leveling in manufacturing.

What should the cadence be? One friend favors one-week iterations. He finds a week is just long enough for customers with emergencies to be sure the problem is real before his team dives in, and just short enough to deliver very timely work. Another friend swears by 30 days, because it gives the team time to think things through before they start coding, yet is short enough that managers can wait until the next iteration to ask for changes.

The cadence is right when work flows evenly. If there is a big flurry of activity at the end of an iteration then the iteration length is probably too long; shorter iterations will help to even out the workload. Cadence should be short enough that customers can wait until the end of an iteration to ask for changes, yet long enough to allow the system to stabilize. This is best understood by considering a household thermostat. If the thermostat turns on the furnace the instant the temperature falls below the temperature setting, and turns it off the instant the temperature rises above the setting, the furnace will cycle on and off too frequently for its own good. So thermostats have a lag built into them. They wait for the temperature to drop a degree or two below the setting before turning on the furnace, and they wait until the temperature goes a degree or two above the setting before turning the furnace off. This lag in response is small enough so you don't feel much difference, and big enough to keep the furnace from oscillating. Use the same concept when finding the right cadence for your situation.
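The thermostat's lag is a simple hysteresis rule. A minimal sketch, with an assumed two-degree band:

```python
def furnace_should_run(temp, setpoint, currently_running, band=2.0):
    """Hysteresis: switch on only well below the setpoint, off only well above it."""
    if temp < setpoint - band:
        return True                 # cold enough: turn the furnace on
    if temp > setpoint + band:
        return False                # warm enough: turn it off
    return currently_running        # inside the band: keep doing what we were doing
```

Iteration length plays the same role as the band: long enough that the system does not oscillate with every new request, short enough that the people waiting barely notice the lag.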

Asynchronous Cadence

One embedded software department we know of found their release schedule was getting too complicated as hardware models proliferated. So they decided to create a single version of the software that would run on all hardware models. Once they established the platform, they added technology capabilities to the software at three-week intervals. As new hardware models were developed, their engineers could look at the plan for software "technology drops" and decide which drop to pick up. A new model might wait a couple weeks for a technology drop with a feature it really needed, or the software department might be convinced to change its technology drop schedule, but only if other hardware models agreed.

By uncoupling the software from the hardware, the software department was able to establish its own cadence, and it didn't take long to discover that the new system was much more productive.

Mary Poppendieck


Limit Work to Capacity

Far too often we hear that the marketing department or the business unit "has to have it all by such-and-such a date," without regard for the development organization's capacity to deliver. Not only does this show a lack of respect for the people developing the product, it also slows down development considerably. We know what happens to computer systems when we exceed their capacity: it's called thrashing.

A Long Saturday in the Airport

We got to the Melbourne airport at 7:30 on a fine Saturday morning, with plenty of time to catch our 10:00 a.m. flight to Auckland. A check-in desk had just opened up, so we put our luggage on the scale and tried to hand our tickets to the check-in agent. "Not so fast," she said. "The computers are down." As we looked around, we finally noticed the long lines everywhere.

"How long have they been down?" we asked.

"Oh, about an hour," she said.

In most US airports, each airline desk is probably accessing a different computer system, but in airports like Melbourne, there is one computer system that everyone shares. So that meant the whole airport, both domestic and international terminals, had been down for an hour.

Shortly after that the phone rang. Someone was calling to spread the word that the computers were coming back up. Throughout the airport we could see dozens of people poised at their terminals. And then, everyone began typing furiously, all at the same time. It took about 15 seconds for the system to crash again. "It's been like that for the last half hour," the agent told us. "Every ten minutes they say the system is coming up, and then it crashes again."

We could guess why: The system probably was not designed for hundreds of people to type at exactly the same time.

The computer system finally came up three hours later. We were rather late getting into Auckland.

Mary & Tom Poppendieck


Time sometimes seems to be elastic in a development organization. People can and do work overtime, and when this happens in short bursts they can even accomplish more this way. However, sustained overtime is not sustainable. People get tired and careless at the end of a long day, and more often than not, working long hours will slow things down rather than speed things up. Sometimes an organization tries to work so far beyond its capacity that it begins to thrash. This can happen even if there appear to be enough people, if key roles are not filled and a critical area of development is stretched beyond its capacity to respond.

A Customer Service Problem

I was visiting a company that asked me to look at its customer service process. As we drew a value stream map on the white board, we got to the point where a customer service team was on site installing software. At that point, I was told, they had a problem. "How often does this happen?" I asked.

"Every single time," they all agreed.

"Okay, so how does the problem get fixed?" I asked.

"It doesn't." General agreement again.

"It doesn't?" I had to be sure I heard that right.

"Yes, a request is made to development, but it goes into a queue. It never comes out unless it's one of our Top 3 priority customers. They're just too busy."

With what? I wondered, but instead I asked, "So what happens to the customer?"

"Well, the customer service people stay on site. They do other things. It can be weeks before they get any help."

"But you said they only get help if they are one of the Top 3," I said.

"Well, the customer eventually complains enough that they get moved up into the Top 3, and another one gets bumped out."

"It would appear to me," I said, "that you don't have a customer service problem, you have a problem delivering code that can be counted on to work at a customer site." In the ensuing discussion I could tell that this was a sore point.

"Why can't the development department focus on figuring out what is causing this and fix it?" I asked.

"Our investors are very demanding," they said. "We have a product map. We have to be developing new systems for new customers. We have to keep on adding more customers."

"But if you can't bring new customers up, why do you want them?" I asked. "It seems to me that you're thrashing."

"Yes, that's a good word," some agreed. But others didn't seem to think that the problem was such a big deal. Which, in the end, was probably the source of the problem in the first place.

Mary Poppendieck


Use Pull Scheduling

When a development team selects the work it will commit to for an iteration, the rule is that team members select only those (fine-grained) items they are confident they can complete. During the first couple of iterations, they might guess wrong and select too much work. But soon they establish a team velocity, giving them the information they need to select only what is reasonable. In effect, the development team is "pulling" work from a queue. This pull mechanism limits the work expected of the team to its capacity. In the unlikely event that the team finishes ahead of time, more work can always be pulled out of the queue. Despite the fact that everyone always has work, the pull system has slack, because if emergencies arise or things go wrong, the team can adapt either by terminating the current iteration or by officially moving some items to the next iteration. Finally, since the team is working on the most important features from the customers' perspective, they are working on the right things.
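A hypothetical sketch of that pull mechanism; the story-point sizes and the velocity figure are invented for illustration and are not prescribed by the book:

```python
from collections import deque

def plan_iteration(backlog, velocity_points):
    """Pull the highest-priority items whose total size fits within the team's velocity."""
    committed, remaining = [], velocity_points
    while backlog and backlog[0]["points"] <= remaining:
        item = backlog.popleft()        # the backlog is kept in priority order by the customer
        committed.append(item)
        remaining -= item["points"]
    return committed

backlog = deque([
    {"name": "login fix", "points": 3},
    {"name": "report export", "points": 5},
    {"name": "search filter", "points": 8},
])
print(plan_iteration(backlog, velocity_points=10))  # the team pulls the first two items
```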

An Example of Pull Scheduling

I was visiting a medium-sized department in a large financial institution. Swen (not his real name) was responsible for business results and needed software changes to deliver them. He was very frustrated because all of his requests seemed to take forever. His frustration was matched by Karl (also not his real name), the senior manager in the IT department, who felt that Swen didn't understand how difficult his requests were or the problems he created by constantly changing his mind.

Karl insisted that each request needed to have a rough cost/benefit analysis and, if it passed, a more detailed architectural review before being scheduled for implementation. This sounded fine to Swen. All he wanted was to have some control over what was done and an understanding of when things were going to be ready to use.

I sketched the queuing idea in Figure 5.4 as a management approach:

Figure 5.4. Pull system for managing a workflow


With this system, Swen agreed to be limited to a maximum of six requests at a time. Karl committed to having a rough cost/benefit analysis (that would take about four hours) done within a week of each request. At that point Swen could either put it into the architectural review queue or reject it, but he agreed to have no more than three architectural requests in the queue at a time. If the queue was already full, a new request would either replace one of the existing requests in the queue or be rejected. Swen agreed that when the architectural queue was full, he would not submit any more requests that were less important than the three in the queue.

Karl agreed to complete an architectural review and more detailed cost estimate within two weeks. Then Swen could either accept or reject the result. Accepted requests went into the backlog, and every two weeks the team would pull an iteration's worth of work from the backlog. Swen agreed to limit the number of items in the backlog to no more than two iterations worth of work.

The important point is that Swen would "own" the queues. By keeping them short, he could tell at a glance approximately when any feature would be complete. Swen could reorganize the queues, add, or take away items at any time until the items were pulled by Karl's teams. Karl's organization would always be busy but never swamped, and they would always be working on exactly what Swen wanted.

Karl had a few other customers, but Swen accounted for 65 percent of his workload. Karl felt that he could integrate his other customers into this queuing system or else have separate teams work on their requests.

Mary Poppendieck


Cascading queues (as shown in Figure 5.4) are possible and are often used at organizational boundaries. Queues are a useful management tool, because they allow managers to change priorities and manage cycle time while letting the development teams manage their own work. But queues are not an ideal solution. When they are used, here are some general rules to follow (a small sketch of a capacity-limited queue follows the list):

  1. Queues must be kept short, perhaps two cycles of work. It is the length of the queues that governs the average cycle time of a request through the development process.

  2. Managers can reorganize or change items at any time while they are in a queue. But once teams start to work on an item, managers should not interfere with day-to-day development.

  3. Teams pull work from a queue and work at a regular cadence until that work is done. It is this pull system that keeps teams busy at all times while limiting work to capacity.

  4. Queues should not be used to mislead people into thinking that their requests are going to be dealt with if the team does not have the capacity to respond.
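As promised above, here is a small sketch of a capacity-limited queue in the spirit of Figure 5.4. The limits of six requests and three architectural reviews come from the story; everything else is an assumption:

```python
class BoundedQueue:
    """A queue its owner may reorder at will, but may never grow past its limit."""

    def __init__(self, name, limit):
        self.name, self.limit, self.items = name, limit, []

    def submit(self, item):
        # Rules 1 and 4: when the queue is full, the owner must drop or replace
        # something rather than quietly let the list grow.
        if len(self.items) >= self.limit:
            raise ValueError(f"{self.name} is full; remove or replace an item first")
        self.items.append(item)

    def pull(self):
        # Rule 3: the team pulls the top item when it has capacity; nothing is pushed.
        return self.items.pop(0) if self.items else None

requests = BoundedQueue("request queue", limit=6)
reviews = BoundedQueue("architectural review queue", limit=3)
```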

Summary

The measure of a mature organization is the speed at which it can reliably and repeatedly execute its core processes. The core process in software development is the end-to-end process of translating a customer need into deployed product. Thus, we measure our maturity by the speed with which we can reliably and repeatedly translate customers' needs into high quality, working software that is embedded in a product which solves the customers' whole problem.



