Painting Over the Cracks: XP on a 50-Person Project | Extreme Programming Refactored: The Case Against XP

What really happens when XP scales up to large projects? In this section we analyze a case study of a 50-person XP project (with the team referred to as ATLAS). The case study can be found online here: http://www.xpuniverse.com/2001/pdfs/EP202.pdf.

The ATLAS project was conducted by ThoughtWorks, Inc., an XP shop and home of Extremo author Martin Fowler. In this project, we see the circle of snakes really come to life.

See Chapter 3 for our circle of snakes metaphor.

Note that when Matt wrote the original "The Case Against Extreme Programming" article (http://www.softwarereality.com/ExtremeProgramming.jsp) and introduced the circle of snakes a few years ago, he had not seen this research. But, as you'll see, XP breaks down in just about all the ways you would expect it to as projects grow in size. In fact, some of the breakdowns would occur even on smaller projects.

The first thing to note about this particular study is that, after a year and a half of XP, the "code is definitely worse than [when they] started." Yet this is one of XP's biggest selling points: that repeated refactoring supposedly results in greater code quality. So, what went wrong?

Looking at the case study, it's quite obvious to us that the circle of snakes broke loose. One practice slipped, meaning that the next in line stopped working, and so on. The team, meanwhile, saw this as a sign that it needed to work harder at applying the XP practices. (In the words of Boxer the cart horse from George Orwell's Animal Farm, "The only possible answer is that I was not working hard enough. I will work harder!" Or, to put it another way, "The only possible answer is that I was not refactoring hard enough. I will refactor harder!")

XP is simply an impractical approach to follow on medium- to large-scale projects. Even a medium-sized, 50-person project like the one described in the case study quickly shows XP's shortcomings. Applying XP's practices effectively involves tailoring them, resulting in something that the team might prefer to call XP, but really isn't.

Dodging the Practices

Table 14-1 summarizes some of the ways in which we would expect the XP practices to fail as the project grows (either in scope or in team size). Just for fun, we've also summarized the corresponding problems that were reported on the ATLAS project (full descriptions can be found in the referenced case study).

Table 14-1: Problems Reported on the ATLAS Project
XP Practice, Tenet, or Maxim	How We Expect XP to Fail as the Project Grows	What Actually Happened on ATLAS
On-site customer	Difficult for the customer to speak with a single, unambiguous voice. A detailed requirements document would help.	One customer wasn't sufficient; instead, a team of analysts was needed. ^[4]
Iteration planning meeting	Communication less effective; risk of some key decisions being missed.	The meetings had to be split into smaller, more manageable meetings attended by different people (with summary meetings held later).
Small releases	Larger pieces of work could be difficult to fit into a single iteration. This makes project velocity more difficult to measure (possibly less accurate over time).	The team worked hard to adhere to a fixed 2-week iteration, although larger pieces of work needed to span iterations.
Programmers test their own code	XP suffers from not stipulating a separate software tester. Having a separate QA team that is proactively involved in all areas of development helps to reduce the bug count and improve (and enforce) the development process. QA is especially important on large projects; leaving programmers to test their own code isn't sufficient.	Although the team members kept to this tenet, they also discovered that a separate QA team was essential.
Pair programming	Lower levels of communication, because there are more people to pair-rotate with (see the next entry on pair rotation).	The team pair-programmed religiously when new functionality was being added, but abandoned the practice when bug-fixing or writing repetitive code.
Pair rotation ^[5]	Lower levels of communication; programmers end up specializing in particular niches of the project, which in turn increases the risk of insufficient communal knowledge of a particular area if a programmer leaves the project. A documented design can significantly reduce this risk.	The team started out rotating pairs but stopped, citing the following reason: "When you have deadlines - we find ourselves signing up for things we already know." They added, "Signing up for cards in several parts of the system in one iteration is definitely out of fashion these days."
Refactoring	The team may find itself relying more and more on refactoring to keep the code in shape (in other words, emergent design becomes harder and harder work).	A year into the project, "Refactoring [was] being done much more often as code starts to spaghetti in some parts of the app."
Sustainable pace (40-hour week)	As refactoring becomes more difficult (and hence more time-consuming), sustainable pace may become more difficult to adhere to, and overtime becomes the norm. The programmers become more tired. Tired programmers mean more bugs and less effective unit tests, which in turn means refactoring with a safety net full of gaping holes, which in turn means more bugs ...	The team interpreted this practice as "minimum 40 hours." They claimed that working overtime did not adversely affect them (and yet, again, they later concluded that the code degenerated during the project).
Coding standards	Collective ownership becomes more problematic, because code written by different people would be inconsistent and hence more difficult to decipher.	Coding standards were very informal (i.e., not strictly adhered to). The team reports that this was not detrimental to its progress; yet the team later concluded that the code was in a much worse state than when the team started.
Collective ownership	Communication of the overall design decreases; knowledge of individual areas becomes highly specialized. Refactoring also becomes significantly more problematic, because it involves changing other parts of the system, which are owned by other teams but are affected by the code being refactored.	From the case study: "Code owner-ship [sic] remains diluted. Developers start specializing in parts of the system again."
The code is the design	As the project grows, code becomes less and less effective as a method of communicating the design.	The team discovered that the code is not sufficient design documentation, and the team members needed to regularly communicate the design through presentations. ^[6]
Stand-up meetings	Lower levels of communication, because the increased team size would make stand-up meetings for everyone involved less practical.	The team abandoned these in favor of informal communication and monthly team meetings. "Informal communication" is just another way of saying "We couldn't be bothered to document our design." On a large project, this is lunacy.
Metaphor	More misunderstandings (some of them probably quite insidious) could spring up.	The team found that a single unifying metaphor to describe the architecture just wasn't suitable for such a large project.

^[4] As we explored in Chapter 5, this has proved to be the case generally in XP.

^[5] See the section "Extreme Programming in Theory" in Chapter 1 for a brief description of pair rotation. Also see Chapter 6.

^[6] Presentations are useful to bring the design to life and communicate it effectively to developers. Writing the design down (and keeping it up-to-date) can also save a lot of wasted work and misunderstanding.

As you can see, most of the ways in which the team tailored XP resulted in lower levels of communication, exactly the opposite of what any team would want to happen in a large-scale project. Unfortunately, the way in which the team's practices slipped one by one is also exactly what you would expect to see as an XP project scales up (and hence its practices become steadily more difficult to adhere to).

For example (as we describe in Table 14-1), the XP metaphor practice (which many XPers see as a vital ingredient in XP's supposed ability to cut down on design documentation and up-front design) just doesn't work well on large projects: it's too simplistic. From the case study:

Metaphors are unrealistic with large projects. They are just too complex. Period. ^[7]

A documented (and maintained) domain model would have helped a lot. A domain model is a much more scalable and robust replacement for XP's metaphor.

Constant pair programming also turned out to be difficult to adhere to:

Some are really just more talented than others and are slowed down by it - and it becomes obvious that it becomes a burden for these people. ^[8]

As mentioned in Table 14-1, the team abandoned pair programming when bug fixing or writing repetitive code. In some ways, bug fixing can be thought of as a surgical form of refactoring (albeit with a different goal: fixing bugs instead of tidying up code). So, to prevent the other high-discipline XP practices from slipping, the team should have adhered to pair programming even more religiously, not less.

Iteration planning meetings became more of a burden with more people involved. Meetings involving 50 people were found to be overwhelming (and were later abandoned altogether), but in the meantime, smaller groups of developers prepped each meeting in advance (resulting in more work, when the developers could have better spent their time designing and programming).

Collective ownership also showed signs of taking a back seat to the natural urge to simply make some progress:

With specialization, there is a trend with a small set of developers knowing more about different parts of the code - so we tend to have them be more active in the continuing designs, but still communal ownership of the code. ^[9]

So there was an effort to retain communal ownership. Unfortunately, having the specialized programmers be more active in the design work is missing the point of collective ownership: that in XP-land, every time somebody refactors a piece of code somewhere in the system, they're doing design work. If programmers are specializing in certain areas, blindly saying "we still have collective ownership" could lead to refactoring without a vital safety net. ^[10]

It gets worse. As we saw earlier, refactoring on ATLAS wasn't backed up by rotation of programmers to different tasks. The danger (particularly on large projects) is, again, that of decreased communication. Refactoring is hampered because it's more difficult to change other people's code, and diving in to change unfamiliar code will almost certainly result in a higher defect rate. Small wonder, then, that the team was relying so heavily on unit tests to save the day (as we explore in the next section). Nevertheless, the code quality still suffered.

If refactoring is hampered by other problems, then of course emergent design, one of the cornerstones of XP, also becomes problematic.

Emergent Design on ATLAS-Sized Projects

The ATLAS team correctly identified that simple design is essential for a large-scale project, and that refactoring is an important practice to keep the design in trim. This is true of any project to an extent, but much more so in XP, which places more of an emphasis on emergent design (as opposed to up-front design). Unfortunately, the article also highlights the inadequacy of emergent design when it comes to large projects:

Code is definitely worse than we started. But is this because the project is larger? Or is it because many people [that] touch the code are first timers? A little of both - but at the same time we don't get islands of code that do not have anything to do with the rest of the application. There needs to be a constant cleaning up of code. ^[11]

This suggests that Constant Refactoring After Programming just isn't sufficient to make up for the paucity of up-front design.

Let's revisit that preceding quote for just a second, especially the first sentence of it: "Code is definitely worse than we started." If XP has any purpose, isn't it to improve code quality? Isn't that exactly what XP is supposed to be all about? So, doesn't the sentence "Code is definitely worse than we started" pretty much tell the whole story?

Ironically, the team also showed signs of perhaps becoming too dependent on certain high-discipline XP practices. In particular, the article reveals the team's overreliance on unit tests and continuous integration when up-front design is skipped:

Unit tests and integrated builds - are ABSOLUTLY MANDITORY [sic] - we would be stopped in our tracks and not able to deliver one piece of code if we could not rely on tests. As the application gets larger and larger it becomes almost impossible to add new code or refactor existing code without going through tests. ^[12]

This highlights the true cost of the emergent design approach, particularly on large projects. The team also appears to have forgotten the benefits that a documented design would have provided when adding new code.

^[7] Amr Elssamadisy, op. cit., p. 5.

^[8] Ibid., p. 4.

^[9] Ibid., p. 3.

^[10] At least, it's vital in XP because the practices are taken to extremes.

^[11] Amr Elssamadisy, op. cit., p. 4.

^[12] Ibid.