Section 7.3. Refactoring | Applied Software Project Management

7.3. Refactoring

To refactor a program is to improve the design of that program without altering its behavior.^[*] There are many different kinds of improvements, called refactorings, that can be performed.

^[*] Some people have a much narrower definition of the term refactoring than we use in this chapter. They use it to refer only to the specific activity of making source code smaller by taking specific code paths and turning them into runtime data or structures.

Every programmer knows that there are many ways to write code to implement one specific behavior. Many of these choices do not affect the behavior of the software, but they can have an enormous impact on how easy the code is to read and understand. The programmers choose variable names, decide whether certain blocks of code should be pulled out into separate functions, choose among various different but functionally equivalent statements, and make many other choices that can have a significant impact on how easy the software is to maintain.
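As a small illustration of such choices, the sketch below shows two methods that compute the same result. The pricing rules and every name in it (OrderPricing, p, orderTotalCents, and the constants) are invented for this illustration and do not come from the example later in this section. The first method uses terse names and bare literals; the second makes the same calculation with descriptive names, named constants, and a small helper function.

```java
// Hypothetical example: the pricing rules and all names are invented for illustration.
class OrderPricing {
    // Terse version: cryptic names and unexplained literals.
    static int p(int q, boolean m) {
        int t = q * 500;
        if (q >= 10) t = t - t / 10;
        if (m) t = t - 200;
        return t;
    }

    static final int UNIT_PRICE_CENTS = 500;
    static final int BULK_THRESHOLD = 10;
    static final int BULK_DISCOUNT_DIVISOR = 10;  // dividing by 10 yields the 10% discount
    static final int MEMBER_DISCOUNT_CENTS = 200;

    // Same behavior, with each choice made in favor of the reader.
    static int orderTotalCents(int quantity, boolean isMember) {
        int total = quantity * UNIT_PRICE_CENTS;
        if (qualifiesForBulkDiscount(quantity)) {
            total -= total / BULK_DISCOUNT_DIVISOR;
        }
        if (isMember) {
            total -= MEMBER_DISCOUNT_CENTS;
        }
        return total;
    }

    static boolean qualifiesForBulkDiscount(int quantity) {
        return quantity >= BULK_THRESHOLD;
    }

    public static void main(String[] args) {
        // Both versions should agree for any input.
        System.out.println(p(12, true) == orderTotalCents(12, true));
    }
}
```

A compiler treats the two methods as interchangeable; only a human reader benefits from the second one.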

Many programmers think of coding as a purely constructive task, for which the only reason to add, remove, or change the source code is to alter the behavior of the software. Refactoring introduces a new concept: adding, removing, or changing the source code for the sole purpose of making it easier to maintain. There are many different refactorings, or techniques, through which programmers can alter their code to make it easier to understand.

Refactoring is a new way of thinking about software design. Traditionally, software is designed first and then built. This is especially true of object-oriented programming, where the programmers might be handed a complex object model to implement. But most programmers who have worked on a reasonably complex project have run across instances when they discover ways that an object could have been designed better. They could not have predicted most of these improvements because they only became apparent during the construction of the code. Refactoring provides them with a way to incorporate those improvements in a structured, repeatable manner.

Because each refactoring is a change to the design, it may impact the design review process. If the software design has already been reviewed by project team members, then any changes that arise from refactoring activities should be communicated to the people who reviewed it. This does not necessarily mean that the design specification must be reinspected after each refactoring; since refactoring changes the design without altering the functionality, it is usually sufficient to distribute just the changes to the design and have the team members approve those changes. In general, people do not object very often to refactoring, but they appreciate being given the opportunity to discuss it and suggest alternatives.

Each refactoring has a set of steps (similar to the scripts we use to describe the tools in this book), which makes it much less likely that the programmer will introduce defects. It also has two design patterns that show what the code looks like before and after the refactoring. There are dozens of refactorings, each with its own particular pattern and steps. The example below demonstrates four of these refactorings: Extract Method, Replace Magic Number with Symbolic Constant, Decompose Conditional, and Introduce Explaining Variable. A comprehensive catalog of refactorings can be found at http://www.refactoring.com/catalog.

7.3.1. Refactoring Example

In this example, a programming team in an investment bank is reviewing a block of code for a feature that calculates fees and bonuses for brokers who sell a certain kind of investment account to their corporate clients. The programmer who wrote the code uses refactoring to clarify problems that were identified during the code review.

The inspection team performed a code review on this block of Java code, which included the class Account and a function calculateFee from a different class:

 1    class Account {
 2        float principal;
 3        float rate;
 4        int daysActive;
 5        int accountType;
 6
 7        public static final int STANDARD = 0;
 8        public static final int BUDGET = 1;
 9        public static final int PREMIUM = 2;
10        public static final int PREMIUM_PLUS = 3;
11    }
12
13    float calculateFee(Account accounts[]) {
14        float totalFee = 0;
15        Account account;
16        for (int i = 0; i < accounts.length; i++) {
17            account = accounts[i];
18            if ( account.accountType == Account.PREMIUM ||
19                 account.accountType == Account.PREMIUM_PLUS ) {
20                totalFee += .0125 * ( account.principal
21                            * Math.exp( account.rate * (account.daysActive/365.25) )
22                            - account.principal );
23            }
24        }
25        return totalFee;
26    }

At first, the code seemed reasonably well designed. But as the inspection team discussed it, a few problems emerged. One of the inspectors was not clear about the purpose of the calculation that was being performed on lines 20 to 22. The programmer explained that this was a compound interest calculation to figure out how much interest was earned on the account, and suggested that they use the Extract Method refactoring to clarify it. They performed the refactoring right there during the code review. Since this calculation only used data that was available in the Account class, they moved it into that class, adding a new method called interestEarned (in lines 12 to 15 below):

 1    class Account {
 2        float principal;
 3        float rate;
 4        int daysActive;
 5        int accountType;
 6
 7        public static final int STANDARD = 0;
 8        public static final int BUDGET = 1;
 9        public static final int PREMIUM = 2;
10        public static final int PREMIUM_PLUS = 3;
11
12        float interestEarned() {
13            return ( principal * (float) Math.exp( rate * (daysActive / 365.25) ) )
14                    - principal;
15        }
16    }
17
18    float calculateFee(Account accounts[]) {
19        float totalFee = 0;
20        Account account;
21        for (int i = 0; i < accounts.length; i++) {
22            account = accounts[i];
23            if ( account.accountType == Account.PREMIUM ||
24                 account.accountType == Account.PREMIUM_PLUS )
25                totalFee += .0125 * account.interestEarned();
26        }
27        return totalFee;
28    }

An inspector then asked what the number .0125 in line 25 was, and whether it could ever change in the future. It turned out that each broker earned a commission fee that was equal to 1.25% of the interest earned on the account. They used the Replace Magic Number with Symbolic Constant refactoring, replacing the number with the constant BROKER_FEE_PERCENT and defining that constant later in line 31 (and adding a leading zero to help people read the code quickly):

 1    class Account {
 2        float principal;
 3        float rate;
 4        int daysActive;
 5        int accountType;
 6
 7        public static final int STANDARD = 0;
 8        public static final int BUDGET = 1;
 9        public static final int PREMIUM = 2;
10        public static final int PREMIUM_PLUS = 3;
11
12        float interestEarned() {
13            return ( principal * (float) Math.exp( rate * (daysActive / 365.25) ) )
14                    - principal;
15        }
16    }
17
18    float calculateFee(Account accounts[]) {
19        float totalFee = 0;
20        Account account;
21        for (int i = 0; i < accounts.length; i++) {
22            account = accounts[i];
23            if ( account.accountType == Account.PREMIUM ||
24                 account.accountType == Account.PREMIUM_PLUS ) {
25                totalFee += BROKER_FEE_PERCENT * account.interestEarned();
26            }
27        }
28        return totalFee;
29    }
30
31    static final double BROKER_FEE_PERCENT = 0.0125;

The next issue that was raised in the code review was confusion about why the accountType variable was being checked in lines 23 and 24. There were several account types, and it wasn't clear why the account was being checked for just these two types. The programmer explained that the brokers only earn a fee for premium accounts, which could either be of the type PREMIUM or PREMIUM_PLUS.

By using the Decompose Conditional refactoring, they were able to clarify the purpose of this code. Adding the isPremium function to the Account class (lines 17 to 22) made it more obvious that this was a check to verify whether the account was a premium account:

 1    class Account {
 2        float principal;
 3        float rate;
 4        int daysActive;
 5        int accountType;
 6
 7        public static final int STANDARD = 0;
 8        public static final int BUDGET = 1;
 9        public static final int PREMIUM = 2;
10        public static final int PREMIUM_PLUS = 3;
11
12        float interestEarned() {
13            return ( principal * (float) Math.exp( rate * (daysActive / 365.25) ) )
14                    - principal;
15        }
16
17        public boolean isPremium() {
18            if (accountType == Account.PREMIUM || accountType == Account.PREMIUM_PLUS)
19                return true;
20            else
21                return false;
22        }
23    }
24
25    float calculateFee(Account accounts[]) {
26        float totalFee = 0;
27        Account account;
28        for (int i = 0; i < accounts.length; i++) {
29            account = accounts[i];
30            if ( account.isPremium() )
31                totalFee += BROKER_FEE_PERCENT * account.interestEarned();
32        }
33        return totalFee;
34    }
35
36    static final double BROKER_FEE_PERCENT = 0.0125;

The last problem found during the inspection involved the interestEarned( ) method that they had extracted. It was a confusing calculation, with several intermediate steps crammed into a single line. When that behavior was buried inside the larger function, the problem wasn't as glaring, but now that it had its own discrete function, they could get a clearer look at it.

The first problem was that it wasn't exactly clear why there was a division by 365.25 in line 13. The programmer explained that in the Account class, daysActive represented the number of days that the account was active, but the rate was an annual interest rate, so they had to divide daysActive by 365.25 to convert it to years. Another programmer asked why principal was being subtracted at the end of the interest calculation. The explanation was that this was done because the fee calculation was based only on the interest earned, regardless of the principal that initially was put into the account.

The Introduce Explaining Variable refactoring was used to introduce two intermediate variables, years on line 13 and compoundInterest on line 14, to clarify the code:

 1    class Account {
 2        float principal;
 3        float rate;
 4        int daysActive;
 5        int accountType;
 6
 7        public static final int STANDARD = 0;
 8        public static final int BUDGET = 1;
 9        public static final int PREMIUM = 2;
10        public static final int PREMIUM_PLUS = 3;
11
12        float interestEarned() {
13            float years = daysActive / (float) 365.25;
14            float compoundInterest = principal * (float) Math.exp( rate * years );
15            return ( compoundInterest - principal );
16        }
17
18        public boolean isPremium() {
19            if (accountType == Account.PREMIUM || accountType == Account.PREMIUM_PLUS)
20                return true;
21            else
22                return false;
23        }
24    }
25
26    float calculateFee(Account accounts[]) {
27        float totalFee = 0;
28        Account account;
29        for (int i = 0; i < accounts.length; i++) {
30            account = accounts[i];
31            if ( account.isPremium() ) {
32                totalFee += BROKER_FEE_PERCENT * account.interestEarned();
33            }
34        }
35        return totalFee;
36    }
37
38    static final double BROKER_FEE_PERCENT = 0.0125;

After these four refactorings, the inspection team agreed that the new version of this code was much easier to understand, even though it was almost 50% longer.

The code after refactoring must behave in exactly the same way it did beforehand. In general, every refactoring should be combined with an automated test to verify that the behavior of the software has not changed, because it is very easy to inject defects during refactoring. A framework of automated unit tests can ensure that the code behavior remains intact. Luckily, the team already had a set of unit tests. They had to add tests to verify the new Account.isPremium method, but the new code passed all of the other unit tests and the new version of the code was checked in (along with the new tests).
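One way such a check could look is sketched below. It is not the team's actual test suite, which the text does not show; the class FeeBehaviorCheck, the Account constructor, the helper closeEnough, and the sample account values are all assumptions made for this illustration. The sketch runs the calculation as first reviewed and the calculation after the four refactorings on the same accounts and compares the results. The comparison uses a small tolerance rather than exact equality, because the refactored interestEarned method rounds its intermediate values to float, while the original expression was evaluated in double precision.

```java
// Sketch only: FeeBehaviorCheck, the Account constructor, and closeEnough are
// hypothetical names added for this illustration.
class Account {
    float principal;
    float rate;
    int daysActive;
    int accountType;

    public static final int STANDARD = 0;
    public static final int BUDGET = 1;
    public static final int PREMIUM = 2;
    public static final int PREMIUM_PLUS = 3;

    // Assumed convenience constructor; the listings in the text do not show one.
    Account(float principal, float rate, int daysActive, int accountType) {
        this.principal = principal;
        this.rate = rate;
        this.daysActive = daysActive;
        this.accountType = accountType;
    }

    float interestEarned() {
        float years = daysActive / (float) 365.25;
        float compoundInterest = principal * (float) Math.exp(rate * years);
        return compoundInterest - principal;
    }

    public boolean isPremium() {
        if (accountType == Account.PREMIUM || accountType == Account.PREMIUM_PLUS)
            return true;
        else
            return false;
    }
}

class FeeBehaviorCheck {
    static final double BROKER_FEE_PERCENT = 0.0125;

    // Same arithmetic as the listing reviewed at the start of the example.
    static float calculateFeeOriginal(Account[] accounts) {
        float totalFee = 0;
        for (Account account : accounts) {
            if (account.accountType == Account.PREMIUM
                    || account.accountType == Account.PREMIUM_PLUS) {
                totalFee += .0125 * (account.principal
                        * Math.exp(account.rate * (account.daysActive / 365.25))
                        - account.principal);
            }
        }
        return totalFee;
    }

    // Same arithmetic as the listing after the four refactorings.
    static float calculateFeeRefactored(Account[] accounts) {
        float totalFee = 0;
        for (Account account : accounts) {
            if (account.isPremium()) {
                totalFee += BROKER_FEE_PERCENT * account.interestEarned();
            }
        }
        return totalFee;
    }

    // Relative comparison; the 1e-4 tolerance is an assumption, not a figure from the text.
    static boolean closeEnough(float expected, float actual) {
        return Math.abs(expected - actual) <= 1e-4f * Math.max(1f, Math.abs(expected));
    }

    public static void main(String[] args) {
        Account[] accounts = {
            new Account(10000f, 0.05f, 365, Account.PREMIUM),
            new Account(5000f, 0.03f, 200, Account.STANDARD),
            new Account(250000f, 0.04f, 730, Account.PREMIUM_PLUS),
            new Account(800f, 0.02f, 90, Account.BUDGET)
        };
        float before = calculateFeeOriginal(accounts);
        float after = calculateFeeRefactored(accounts);
        if (!closeEnough(before, after)) {
            throw new AssertionError("fee changed: " + before + " vs " + after);
        }
        System.out.println("before=" + before + " after=" + after);
    }
}
```

In a real project, a check like this would normally live in the team's unit test framework rather than in a main method, and the acceptable tolerance would be a business decision.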

7.3.2. Refactoring Pays for Itself

Many people are initially uncomfortable with the idea of having programmers do tasks that don't change the behavior of the code. But, like time spent on project planning and software requirements engineering, the time spent refactoring is more than recouped over the course of the project. In fact, refactoring can help a team recover code that was previously written off as an unmaintainable mess, and can also help to keep new code from ever getting to that state.

Refactoring makes intuitive sense when one considers the main reasons that code becomes difficult to maintain. As a project moves forward and changes, code that was written for one purpose is often extended and altered. A block of code may look pristine when it's first built, but it can degrade over time. New functionality or bug fixes can turn clear, sensible code into a mess of enormously long and complex loops, blocks, cases, and patches. Some people call this spaghetti code (a name that should make intuitive sense to anyone who has had to maintain a mess like that), but it is really just code whose design turned out not to be all that well suited to its purpose.

The goal of refactoring is to make the software easier for a human to understand, without changing what it does. Most modern programming languages are very expressive, meaning that any one behavior can be coded in many different ways. When a programmer builds the code, he makes many choices, some of which make the code much easier or harder to understand. Each refactoring is aimed at correcting a common pattern that makes the code harder to read. Code that is easier to understand is easier to maintain, and code that is harder to understand is harder to maintain.

In practice, maintenance tasks on spaghetti code are extraordinarily difficult and time-consuming. Refactoring that code can make each of these maintenance tasks much easier. Well-designed code is much easier to work on than poorly designed code, and refactoring improves the design of the software while it is being built. Any programmer who has to maintain spaghetti code should make extensive use of refactoring. It usually takes about as much time to refactor a block of spaghetti code as it does to simply try to trace through it. In fact, many programmers have found that it's much faster to use refactoring to permanently detangle messy code than it is to try to just fix the one problem that popped up at the time.

In addition to saving time on programming, refactoring can also help a programmer find bugs more quickly. Poorly designed code tends to have more defects, and tracking these defects down is an unpleasant task. If the code is easier to read and follow, it is easier to find those bugs. And since much of the duplicated code has been eliminated, most bugs only have to be fixed once. Of course, the clearer the code is, the less likely it is that defects get injected in the first place.
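The point about duplicated code can be sketched as follows. The class SavingsAccount, the constant DAYS_PER_YEAR, and the methods yearsActive and balance are hypothetical names that do not appear in the example above. The sketch assumes that the days-to-years conversion had been repeated inside several calculations; once it is pulled into a single method, a defect in the conversion only needs to be corrected in one place.

```java
// Hypothetical sketch: SavingsAccount, DAYS_PER_YEAR, yearsActive, and balance
// are assumed names used only for this illustration.
class SavingsAccount {
    static final float DAYS_PER_YEAR = 365.25f;

    float principal;
    float rate;       // annual interest rate
    int daysActive;

    SavingsAccount(float principal, float rate, int daysActive) {
        this.principal = principal;
        this.rate = rate;
        this.daysActive = daysActive;
    }

    // The only place that converts days to years; a correction made here
    // reaches every calculation that depends on it.
    float yearsActive() {
        return daysActive / DAYS_PER_YEAR;
    }

    // Balance using the same exponential formula as the example above.
    float balance() {
        return principal * (float) Math.exp(rate * yearsActive());
    }

    // Interest is the balance minus what was originally deposited.
    float interestEarned() {
        return balance() - principal;
    }

    public static void main(String[] args) {
        SavingsAccount account = new SavingsAccount(10000f, 0.05f, 1461);
        System.out.println("years=" + account.yearsActive());
        System.out.println("interest=" + account.interestEarned());
    }
}
```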

There is no hard-and-fast rule about when to refactor. Many programmers find that it's effective to alternate between adding new behavior and refactoring the new code that was just added. Any time a reasonably large chunk of new code has been added, the programmer should take the time to go through it and find any possible refactorings. The same goes for bug fixes: often, a bug is easier to fix after the code that it's in has been refactored.

Note: Additional information on refactoring can be found in Refactoring: Improving the Design of Existing Code by Martin Fowler (Addison-Wesley, 1999) and on his web site at http://www.refactoring.com.