Practice 7. Commitment to Rearchitecture | Sustainable Software Development: An Agile Perspective

Practice 7. Commitment to Rearchitecture

Refactoring is a powerful discipline, but sometimes it is necessary to completely rearchitect and replace some portion of a product. In many cases, this is the preferred approach, since the cost of the replacement will over time be less than the cost of trying to bend the current architecture in the required direction. Therefore, teams (and their management) need to be committed to understanding when rearchitecture is required and to making it happen. Using the chemical plant analogy from Chapter 1, this work should be looked upon as preventive maintenance (e.g., taking your pumps offline for work that helps them last longer).

Some of the common reasons for considering rearchitecture work are:

A team has just released the first version of a product. Shipping a version one of a product is the most difficult task any team can undertake, because team members are most often learning about the product's ecosystem as they are developing. They are bound to make at least one, and most likely quite a few, design decisions that will ultimately limit long-term sustainability.
A product was written for a single platform (operating system, database, networking protocol, etc.) and it needs to be ported to a new platform. These ports are an excellent opportunity to introduce greater abstraction to hide the underlying platform-dependent implementations. When done well, this type of abstraction should simplify the application and make it easier to modify in the future.
There have been changes to the underlying technology that the product depends on, such as system APIs or third-party libraries. Examples might be new input or display devices, or perhaps new hardware or programming models.
A single-threaded application needs to be multi-threaded. If the goal is to maximize the usage of multiple processors, single-threaded software most often needs to be rewritten to break up its tasks into separable units of computation. Without this rework, usually the only work that can be done is to optimize loops or small sections of code. This type of optimization will not maximize use of additional processors.
It is natural that the team will understand the problem domain and ecosystem better over time. Often, the current knowledge can be used to further simplify the product.
The roadmap for the product has changed and in upcoming releases the architecture is going to have to support originally unanticipated workflows. This case is tricky, since you need to practice simple design and not build unnecessary architecture, but sometimes there are some simple changes that will make these future changes much easier.
A section of the code is particularly defect prone, is difficult to modify, and/or excessively coupled to other sections of the architecture.
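To make the platform-port case above concrete, here is a minimal Python sketch of the kind of abstraction it describes. The names Storage, InMemoryStorage, and copy_document are hypothetical and chosen only for illustration; a real product would define its interface around whatever platform services it actually uses.

```python
from abc import ABC, abstractmethod


class Storage(ABC):
    """Hypothetical platform-neutral interface the application codes against."""

    @abstractmethod
    def save(self, key, data):
        """Store data under key."""

    @abstractmethod
    def load(self, key):
        """Return the data stored under key."""


class InMemoryStorage(Storage):
    """One concrete implementation; a file- or database-backed one would sit beside it."""

    def __init__(self):
        self._items = {}

    def save(self, key, data):
        self._items[key] = data

    def load(self, key):
        return self._items[key]


def copy_document(storage, source, destination):
    """Application logic sees only the Storage interface, never the platform."""
    storage.save(destination, storage.load(source))
```

Porting to a new platform then means writing one more Storage subclass rather than touching every caller, which is where the simplification is expected to come from.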

When considering whether rearchitecture is required, it is vital to remember the underlying tenets of sustainable development. Your product is going to last for a long time; hopefully it will last far longer than you can imagine. Consider the consequences of not making the necessary decision when the cost is low (i.e., early) versus being forced to do even more work in the future. It is vital that the decision to rearchitect or not be made consciously by the team. The most dangerous scenario is when the need to rearchitect is ignored by the team or not discussed due to time pressures, external factors such as overwhelming customer requests or unrealistic expectations, or a laissez-faire attitude. Sustainability is often at stake, and the team must have the discipline, and guts, to confront these issues head-on.

Rearchitecture: The Good, the Bad, and the Ugly

In my experience, the most glaring need for rearchitecture work is just after the completion of the first version of a product. There are many reasons for this, but I think the most common is that development teams, no matter how experienced, are bound to make some tradeoffs and mistakes that will cost them over the long term. I have strong memories of two projects in particular, one good, and the other bad (and ugly).

The Bad and Ugly

This was a product that was built in every team's worst-case scenario: unrealistic deadlines, milestones that were repeatedly missed, and attempts to bring the project back into schedule by adding more people, getting people to work overtime, etc.

The product had an excellent early design vision that was well modularized. To enforce the modularization of the architecture, the build system had been modified to ensure that dependencies were not introduced between modules that should have none. Unfortunately, the build system was complex enough that a decision was made to have a different build system for nightly builds than the one used by developers. The developer build system did no dependency checking, so developers didn't find out about dependency problems until the nightly build. The result was predictable: The nightly builds were always broken, and because of the number of developers, it was time-consuming to figure out which of the many changes broke the build.

Because this was impacting the development team (they couldn't get a good build without a lot of work), the decision was made to turn off the dependency checking. The system then built correctly every time, and the developers were more productive because they no longer had to worry about dependencies. The product did eventually ship.

Unfortunately, the result was akin to a big bowl of sticky spaghetti: there were dependencies everywhere. Heroic efforts could not undo the damage done through that early decision. Business concerns (i.e., profitability) dictated that the team must decrease in size, which largely limited the cleanup attempts to one or two people. The product continued to grow in complexity, and because of the demands for new features, there wasn't time to fix the architecture. The result was an increasingly brittle product, where a change made in one section of the code would cause a problem in one or more unpredictable places. And because the software had lost its modularity, it was virtually impossible to use any kind of automated tests, because in order to run even a simple test you had to load the entire program. Hence, there was heavy reliance on manual testing.

Given the decision to disable dependency checking, which I definitely disagree with, this is a perfect example of a project where an early decision to rearchitect would have had a huge impact on long-term productivity and sustainability. On any project like this, the more time that passes, the harder the decision becomes to rearchitect, no matter how necessary it is.
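A dependency check of the kind described in this story does not have to be heavyweight, which is what would let it run in every developer build rather than only at night. The following is a minimal sketch in Python, not the build system from the project: the ALLOWED table, the module names, and find_violations are all hypothetical.

```python
# Hypothetical layering rules: each module may depend only on the modules listed.
ALLOWED = {
    "ui": {"core", "util"},
    "core": {"util"},
    "util": set(),
}


def find_violations(actual, allowed=ALLOWED):
    """Return sorted (module, dependency) pairs that the rules do not permit."""
    violations = []
    for module, deps in actual.items():
        permitted = allowed.get(module, set())
        for dep in deps:
            if dep not in permitted:
                violations.append((module, dep))
    return sorted(violations)
```

Given a map of the dependencies actually found in the code (how that map is extracted depends on the language and build tool), a developer build could fail as soon as find_violations returns a non-empty list, so the person who introduced the dependency learns about it immediately.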

The Good

The second example is of a project that started off with an aggressive deadline. The team shipped the product on time, but after shipping, team members realized that their architecture had a number of problems. The largest was that the software was not easily testable, and as a result, there was too much reliance on manual testing. There was also a looming requirement for an Asian version of the product, and the version one software had not considered internationalization.

Luckily, the team did the right thing and made the necessary but difficult decision to rearchitect. The team first developed a design vision and a revised set of guiding principles for the design. They then spent their next two two-week iterations creating the new architecture and reworking their existing code into it. At the end of the second iteration, there were no new features, but two iterations after that, a record/playback feature was exposed that made a huge impact on testability. From that point, users were able to press record, do some work, and then press stop. Their work was saved in a file and could be read back in at any time for testing. This was a huge milestone for the project because it lessened the burden on people for testing and also made the developers more productive because now when they needed to recreate a problem they could simply load the user's file and start the playback.
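The record/playback idea can be sketched in a few lines, assuming the application routes user actions through named commands. This is an illustration only, not the team's implementation: Counter stands in for the application's model, and Recorder and play_back are hypothetical names. The recording is kept as a JSON string here; writing it to a file is a small extra step.

```python
import json


class Counter:
    """Stand-in for the application's model; purely illustrative."""

    def __init__(self):
        self.value = 0

    def add(self, amount):
        self.value += amount

    def reset(self):
        self.value = 0


class Recorder:
    """Applies commands to a target and logs them so the session can be replayed."""

    def __init__(self, target):
        self.target = target
        self.log = []

    def do(self, name, *args):
        # Run the command first, then remember it in order.
        getattr(self.target, name)(*args)
        self.log.append({"command": name, "args": list(args)})

    def dumps(self):
        """Serialize the recorded session."""
        return json.dumps(self.log)


def play_back(recording, target):
    """Re-run a recorded session against a fresh target and return it."""
    for entry in json.loads(recording):
        getattr(target, entry["command"])(*entry["args"])
    return target
```

A session recorded with Recorder(Counter()) and replayed with play_back(saved, Counter()) should leave the fresh target in the same state as the original, which is what makes a recording usable both as a regression test and as a way to reproduce a user's problem.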

Happily for this project, this team continues today to be highly productive and has managed to avoid any kind of defect backlog. This is despite the fact that their code has continued to grow in size and the development team is less than half the size it was for the first version.

Rearchitecture and Testability

Before you do the rearchitecture work, be sure you have added as many tests as possible as safeguards. You need to ensure that the new code behaves as expected. Unit tests should be a given; if at all possible, also add some integration tests to ensure that the new code provides the desired interfaces.

Without tests, the rearchitecture exercise could introduce so many problems that the benefits will be lost. Therefore, if you're going to do it, do it right and don't cut corners. If the code you are replacing has no tests, at least put a set of good enough tests in place so that your confidence is as high as possible that the new code behaves as expected.
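One way to put "good enough" tests in place around untested code is to pin down what the old code does today and require the replacement to match it. Below is a minimal sketch using Python's unittest module; legacy_total and new_total are hypothetical stand-ins for the code being replaced and its replacement, and the cases are illustrative amounts in cents.

```python
import unittest


def legacy_total(prices_in_cents):
    """Stand-in for the code being replaced (hypothetical)."""
    total = 0
    for price in prices_in_cents:
        total = total + price
    return total


def new_total(prices_in_cents):
    """Stand-in for the rearchitected replacement (hypothetical)."""
    return sum(prices_in_cents)


# Inputs chosen to pin down current behavior, including the empty case.
CASES = [[], [100], [1999, 501, 50]]


class TotalBehaviorTest(unittest.TestCase):
    def test_new_code_matches_legacy(self):
        for prices in CASES:
            with self.subTest(prices=prices):
                self.assertEqual(new_total(prices), legacy_total(prices))
```

Tests like this only show that behavior is preserved, not that it was right to begin with, so they complement rather than replace unit tests written against the intended behavior of the new code.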

There is no point in doing rearchitecture work if automated tests are not in place for all the new code. Think about ruthless testing and design testability into the new code!