24.2. Introduction to Refactoring

 < Free Open Study > 

The key strategy in achieving The Cardinal Rule of Software Evolution is refactoring, which Martin Fowler defines as "a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior" (Fowler 1999). The word "refactoring" in modern programming grew out of Larry Constantine's original use of the word "factoring" in structured programming, which referred to decomposing a program into its constituent parts as much as possible (Yourdon and Constantine 1979).

Reasons to Refactor

Sometimes code degenerates under maintenance, and sometimes the code just wasn't very good in the first place. In either case, here are some warning signs sometimes called "smells" (Fowler 1999) that indicate where refactorings are needed:

Code is duplicated Duplicated code almost always represents a failure to fully factor the design in the first place. Duplicate code sets you up to make parallel modifications whenever you make changes in one place, you have to make parallel changes in another place. It also violates what Andrew Hunt and Dave Thomas refer to as the "DRY principle": Don't Repeat Yourself (2000). I think David Parnas says it best: "Copy and paste is a design error" (McConnell 1998b).

A routine is too long In object-oriented programming, routines longer than a screen are rarely needed and usually represent the attempt to force-fit a structured programming foot into an object-oriented shoe.

One of my clients was assigned the task of breaking up a legacy system's longest routine, which was more than 12,000 lines long. With effort, he was able to reduce the size of the largest routine to only about 4,000 lines.

One way to improve a system is to increase its modularity increase the number of well-defined, well-named routines that do one thing and do it well. When changes lead you to revisit a section of code, take the opportunity to check the modularity of the routines in that section. If a routine would be cleaner if part of it were made into a separate routine, create a separate routine.

A loop is too long or too deeply nested Loop innards tend to be good candidates for being converted into routines, which helps to better factor the code and to reduce the loop's complexity.

A class has poor cohesion If you find a class that takes ownership for a hodgepodge of unrelated responsibilities, that class should be broken up into multiple classes, each of which has responsibility for a cohesive set of responsibilities.

A class interface does not provide a consistent level of abstraction Even classes that begin life with a cohesive interface can lose their original consistency. Class interfaces tend to morph over time as a result of modifications that are made in the heat of the moment and that favor expediency to interface integrity. Eventually the class interface becomes a Frankensteinian maintenance monster that does little to improve the intellectual manageability of the program.

A parameter list has too many parameters Well-factored programs tend to have many small, well-defined routines that don't need large parameter lists. A long parameter list is a warning that the abstraction of the routine interface has not been well thought out.

Changes within a class tend to be compartmentalized Sometimes a class has two or more distinct responsibilities. When that happens you find yourself changing either one part of the class or another part of the class but few changes affect both parts of the class. That's a sign that the class should be cleaved into multiple classes along the lines of the separate responsibilities.

Changes require parallel modifications to multiple classes I saw one project that had a checklist of about 15 classes that had to be modified whenever a new kind of output was added. When you find yourself routinely making changes to the same set of classes, that suggests the code in those classes could be rearranged so that changes affect only one class. In my experience, this is a hard ideal to accomplish, but it's nonetheless a good goal.

Inheritance hierarchies have to be modified in parallel Finding yourself making a subclass of one class every time you make a subclass of another class is a special kind of parallel modification and should be addressed.

case statements have to be modified in parallel Although case statements are not inherently bad, if you find yourself making parallel modifications to similar case statements in multiple parts of the program, you should ask whether inheritance might be a better approach.

Related data items that are used together are not organized into classes If you find yourself repeatedly manipulating the same set of data items, you should ask whether those manipulations should be combined into a class of their own.

A routine uses more features of another class than of its own class This suggests that the routine should be moved into the other class and then invoked by its old class.

A primitive data type is overloaded Primitive data types can be used to represent an infinite number of real-world entities. If your program uses a primitive data type like an integer to represent a common entity such as money, consider creating a simple Money class so that the compiler can perform type checking on Money variables, so that you can add safety checks on the values assigned to money, and so on. If both Money and Temperature are integers, the compiler won't warn you about erroneous assignments like bankBalance = recordLowTemperature.

A class doesn't do very much Sometimes the result of refactoring code is that an old class doesn't have much to do. If a class doesn't seem to be carrying its weight, ask if you should assign all of that class's responsibilities to other classes and eliminate the class altogether.

A chain of routines passes tramp data Finding yourself passing data to one routine just so that routine can pass it to another routine is called "tramp data" (Page-Jones 1988). This might be OK, but ask yourself whether passing the specific data in question is consistent with the abstraction presented by each of the routine interfaces. If the abstraction for each routine is OK, passing the data is OK. If not, find some way to make each routine's interface more consistent.

A middleman object isn't doing anything If you find that most of the code in a class is just passing off calls to routines in other classes, consider whether you should eliminate the middleman and call those other classes directly.

One class is overly intimate with another Encapsulation (information hiding) is probably the strongest tool you have to make your program intellectually manageable and to minimize ripple effects of code changes. Anytime you see one class that knows more about another class than it should including derived classes knowing too much about their parents err on the side of stronger encapsulation rather than weaker.

A routine has a poor name If a routine has a poor name, change the name of the routine where it's defined, change the name in all places it's called, and then recompile. As hard as it might be to do this now, it will be even harder later, so do it as soon as you notice it's a problem.

Data members are public Public data members are, in my view, always a bad idea. They blur the line between interface and implementation, and they inherently violate encapsulation and limit future flexibility. Strongly consider hiding public data members behind access routines.

A subclass uses only a small percentage of its parents' routines Typically this indicates that that subclass has been created because a parent class happened to contain the routines it needed, not because the subclass is logically a descendent of the superclass. Consider achieving better encapsulation by switching the subclass's relationship to its superclass from an is-a relationship to a has-a relationship; convert the superclass to member data of the former subclass, and expose only the routines in the former subclass that are really needed.

Comments are used to explain difficult code Comments have an important role to play, but they should not be used as a crutch to explain bad code. The age-old wisdom is dead-on: "Don't document bad code rewrite it" (Kernighan and Plauger 1978).

Global variables are used When you revisit a section of code that uses global variables, take time to reexamine them. You might have thought of a way to avoid using global variables since the last time you visited that part of the code. Because you're less familiar with the code than when you first wrote it, you might now find the global variables sufficiently confusing that you're willing to develop a cleaner approach. You might also have a better sense of how to isolate global variables in access routines and a keener sense of the pain caused by not doing so. Bite the bullet and make the beneficial modifications. The initial coding will be far enough in the past that you can be objective about your work yet close enough that you will remember most of what you need to make the revisions correctly. The time during early revisions is the perfect time to improve the code.

Cross-Reference

For guidelines on the use of global variables, see Section 13.3, "Global Data." For an explanation of the differences between global data and class data, see "Class data mistaken for global data" in Section 5.3.


A routine uses setup code before a routine call or takedown code after a routine call Code like this should be a warning to you:

Bad C++ Example of Setup and Takedown Code for a Routine Call
 WithdrawalTransaction withdrawal;       <-- 1 withdrawal.SetCustomerId( customerId );   | withdrawal.SetBalance( balance );         | withdrawal.SetWithdrawalAmount( withdrawalAmount ); withdrawal.SetWithdrawalDate( withdrawalDate );       <-- 1 ProcessWithdrawal( withdrawal ); customerId = withdrawal.GetCustomerId();       <-- 2 balance = withdrawal.GetBalance();               | withdrawalAmount = withdrawal.GetWithdrawalAmount(); withdrawalDate = withdrawal.GetWithdrawalDate();       <-- 2 

(1)This setup code is a warning.

(2)This takedown code is another warning.

A similar warning sign is when you find yourself creating a special constructor for the WithdrawalTransaction class that takes a subset of its normal initialization data so that you can write code like this:

Bad C++ Example of Setup and Takedown Code for a Method Call
withdrawal = new WithdrawalTransaction( customerId, balance,    withdrawalAmount, withdrawalDate ); withdrawal.ProcessWithdrawal(); delete withdrawal;

Anytime you see code that sets up for a call to a routine or takes down after a call to a routine, ask whether the routine interface is presenting the right abstraction. In this case, perhaps the parameter list of ProcessWithdrawal should be modified to support code like this:

Good C++ Example of a Routine That Doesn't Require Setup or Takedown Code
ProcessWithdrawal ( customerId, balance, withdrawalAmount, withdrawalDate );

Note that the converse of this example presents a similar problem. If you find yourself usually having a WithdrawalTransaction object in hand but needing to pass several of its values to a routine like the one shown here, you should also consider refactoring the ProcessWithdrawal interface so that it requires the WithdrawalTransaction object rather than its individual fields:

C++ Example of Code That Requires Several Method Calls
ProcessWithdrawal( withdrawal.GetCustomerId(), withdrawal.GetBalance(),    withdrawal.GetWithdrawalAmount(), withdrawal.GetWithdrawalDate() );

Any of these approaches can be right, and any can be wrong it depends on whether the ProcessWithdrawal() interface's abstraction is that it expects to have four distinct pieces of data or expects to have a WithdrawalTransaction object.

A program contains code that seems like it might be needed someday Programmers are notoriously bad at guessing what functionality might be needed someday. "Designing ahead" is subject to numerous predictable problems:

  • Requirements for the "design ahead" code haven't been fully developed, which means the programmer will likely guess wrong about those future requirements. The "code ahead" work will ultimately be thrown away.

  • If the programmer's guess about the future requirement is pretty close, the programmer still will not generally anticipate all the intricacies of the future requirement. These intricacies undermine the programmer's basic design assumptions, which means the "design ahead" work will have to be thrown away.

  • Future programmers who use the "design ahead" code don't know that it was "design ahead" code, or they assume the code works better than it does. They assume that the code has been coded, tested, and reviewed to the same level as the other code. They waste a lot of time building code that uses the "design ahead" code, only to discover ultimately that the "design ahead" code doesn't actually work.

  • The additional "design ahead" code creates additional complexity, which calls for additional testing, additional defect correction, and so on. The overall effect is to slow down the project.

Experts agree that the best way to prepare for future requirements is not to write speculative code; it's to make the currently required code as clear and straightforward as possible so that future programmers will know what it does and does not do and will make their changes accordingly (Fowler 1999, Beck 2000).

cc2e.com/2443

Checklist: Reasons to Refactor

  • Code is duplicated.

  • A routine is too long.

  • A loop is too long or too deeply nested.

  • A class has poor cohesion.

  • A class interface does not provide a consistent level of abstraction.

  • A parameter list has too many parameters.

  • Changes within a class tend to be compartmentalized.

  • Changes require parallel modifications to multiple classes.

  • Inheritance hierarchies have to be modified in parallel.

  • case statements have to be modified in parallel.

  • Related data items that are used together are not organized into classes.

  • A routine uses more features of another class than of its own class.

  • A primitive data type is overloaded.

  • A class doesn't do very much.

  • A chain of routines passes tramp data.

  • A middleman object isn't doing anything.

  • One class is overly intimate with another.

  • A routine has a poor name.

  • Data members are public.

  • A subclass uses only a small percentage of its parents' routines.

  • Comments are used to explain difficult code.

  • Global variables are used.

  • A routine uses setup code before a routine call or takedown code after a routine call.

  • A program contains code that seems like it might be needed someday.


Reasons Not to Refactor

In common parlance, "refactoring" is used loosely to refer to fixing defects, adding functionality, modifying the design essentially as a synonym for making any change to the code whatsoever. This common dilution of the term's meaning is unfortunate. Change in itself is not a virtue, but purposeful change, applied with a teaspoonful of discipline, can be the key strategy that supports steady improvement in a program's quality under maintenance and prevents the all-too-familiar software-entropy death spiral.

 < Free Open Study > 


Code Complete
Code Complete: A Practical Handbook of Software Construction, Second Edition
ISBN: 0735619670
EAN: 2147483647
Year: 2003
Pages: 334

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net