Even if the preventative measures are followed, you will encounter instances of cut-and-paste that require reworking. Mistakes can be made, and sometimes it is not worth the initial effort to rework the application to avoid the cut-and-paste. Of course, you will also work with code from other programmers who did not follow the proper guidelines. Before we look at how to apply a cure to existing code, we must decide when it is appropriate to do so.
Refactoring is a balancing act between the work expended refactoring and the work saved by the refactored code. There can be no hard-and-fast rules that always apply, but you can follow guidelines as you learn for yourself when refactoring actually improves development time. To take a better look at when to refactor in the context of cut-and-paste and code duplication, we will start with the most straightforward case and work our way up from there.
The simplest case occurs when you create the duplicated code yourself. For instance, you are adding functionality to one module that is already available in another module. Sharing the code would require a reasonably large amount of reworking. Instead, you cut and paste the code and modify it to fit the new situation. This is an acceptable solution when avoiding the code duplication would require considerably more effort than you expect to save from cleaner and more robust code. Be sure that the difference between the work required and the work saved is large, and do not rely solely on the accuracy of your estimates.
What happens when you suddenly realize that there is another instance where you require similar code? Refactoring should definitely occur here; otherwise, you will find yourself forgetting about one of the cases. Generally, two copies are acceptable but three should not occur, and four is almost guaranteed to cause problems. In addition, if you do have two copies of similar code, make sure the comments indicate the location of the other code and provide a warning about the effects of modifying or copying the code without taking into account the other copy. It is essential that both copies be commented in this manner, as it is not possible to predict which copy might be modified by you or another programmer.
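A minimal sketch of what such cross-referencing comments might look like when two copies are kept deliberately. The class names `InvoiceReport` and `OrderReport`, the method `formatAmount`, and the `DUPLICATED CODE` marker are hypothetical, chosen only for illustration.

```java
// Hypothetical example: two deliberately duplicated helpers. Each copy names
// its twin so that neither one is modified or copied in isolation.

class InvoiceReport {
    // DUPLICATED CODE: a near-identical copy lives in OrderReport.formatAmount.
    // Any fix made here must be mirrored there. Do not paste a third copy;
    // extract a shared helper instead.
    // Assumes cents is non-negative.
    static String formatAmount(long cents) {
        long whole = cents / 100;
        long frac = cents % 100;
        return whole + "." + (frac < 10 ? "0" : "") + frac;
    }
}

class OrderReport {
    // DUPLICATED CODE: a near-identical copy lives in InvoiceReport.formatAmount.
    // Any fix made here must be mirrored there. Do not paste a third copy;
    // extract a shared helper instead.
    // Assumes cents is non-negative. Differs from the twin only in the prefix.
    static String formatAmount(long cents) {
        long whole = cents / 100;
        long frac = cents % 100;
        return "$" + whole + "." + (frac < 10 ? "0" : "") + frac;
    }
}
```

Using the same distinctive marker in both comments means a multiple-file search for that marker should turn up every known copy at once.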
Now we move on to the more difficult topic of refactoring legacy code, which generally comes from a past project. We will assume that the code is currently working; otherwise, rework would be required regardless, and you should consider scrapping the old code entirely and writing your own. At this point, you might be thinking, if it's not broke, why fix it? There are several reasons to consider refactoring legacy code. Start by considering what to do when you discover that code you modified also exists in another location. Chances are you want to make the changes in both locations. This is a good time to go looking for more copies, particularly if the copies you found carry no comments about the duplication. Missing comments of this kind indicate poor coding practices, and it would not be surprising to find several more cut-and-paste copies.
Because duplicate code can cause considerable problems and be difficult to track down, it is a good idea to consider refactoring an entire module if you expect to make major changes to that module. This also can lead to a better understanding of what the module does and how.
One of the reasons why many programmers avoid refactoring is the tedium of making the necessary modifications, which can sometimes involve sweeping changes that cover large sections of the code base. Without the necessary refactoring work, cut-and-paste code is easy to create and can have long-lasting effects during development. The obvious choice is to make refactoring easier. One question you might ask yourself is, “what tools can help reduce the time it takes to refactor the code?”
The optimal tool for refactoring must understand the language that is being refactored and provide support for common refactoring operations. An example of this, and one of the first refactoring tools available, is the refactoring browser for the Smalltalk language. The refactoring browser allows refactoring to be achieved with minimal work from the programmer by automatically taking care of aspects such as renaming and temporary variables.
Other languages generally do not offer such strong support for refactoring tools, but this will change over time. Java is perhaps the only other language to have robust refactoring tools available, among them IntelliJ IDEA, the integrated development environment from JetBrains. IDEA supports many standard refactoring operations in the Java language, removing the need for tedious and error-prone manual refactoring.
A couple of examples should provide a better understanding of the true advantages provided by refactoring tools in IDEA. We will start with a simple renaming example for the following class:
    public class RenameField {
        boolean fieldToRename;
        // ...
        RenameField(boolean renamedField)
        {
            fieldToRename = renamedField;
        }
        // ...
    }

Now we want to change the name of fieldToRename to renamedField. If we do a straightforward search and replace, we obtain:
    public class RenameField {
        boolean renamedField;
        // ...
        RenameField(boolean renamedField)
        {
            renamedField = renamedField;
        }
        // ...
    }

Notice the assignment renamedField = renamedField, which should be this.renamedField = renamedField. However, by invoking the Rename refactoring in the IDEA interface, it correctly handles the change, resulting in:
    public class RenameField {
        boolean renamedField;
        // ...
        RenameField(boolean renamedField)
        {
            this.renamedField = renamedField;
        }
        // ...
    }

Now let us look at a slightly more complex example. We start with the following class:
    class ExtractMethod {
        int length;
        int width;
        // ...

        boolean isLargerThan(int height, int volume)
        {
            return(volume < length * width * height);
        }
        // ...
    }

Now imagine we want to use the volume calculation for another function. Since we do not want to duplicate the code involved, we select length * width * height and apply the Extract Method refactoring option to that. This results in the following functions with the only user input required being the name of the new function:
        boolean isLargerThan(int height, int volume)
        {
            return(volume < volume(height));
        }
        private int volume(int height) {
            return length * width * height;
        }

Notice that it correctly creates the necessary parameter and replaces the original code with the correct function call. Although not shown in this example, it can also handle temporary variables correctly. Expect other editors to follow suit as the advantages of this become clear. Contact your favorite editor developer and encourage them to include this support as soon as possible. In the meantime, it is still better to do refactoring even if it must be done by hand. Let us look at some tools that can help with manual refactoring.
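To see why the search-and-replace result in the rename example above is a real bug and not merely a style problem, the following sketch runs both versions. The class names `BrokenRename`, `FixedRename`, and `RenameDemo` are hypothetical stand-ins for the two versions of `RenameField`; they are not part of the original listings.

```java
// Hypothetical stand-ins for the two RenameField variants shown earlier.

class BrokenRename {
    boolean renamedField;

    BrokenRename(boolean renamedField) {
        // The parameter shadows the field, so this assigns the parameter to
        // itself and the field keeps its default value (false).
        renamedField = renamedField;
    }
}

class FixedRename {
    boolean renamedField;

    FixedRename(boolean renamedField) {
        // Qualifying with "this" selects the field on the left-hand side.
        this.renamedField = renamedField;
    }
}

class RenameDemo {
    public static void main(String[] args) {
        System.out.println(new BrokenRename(true).renamedField); // prints false
        System.out.println(new FixedRename(true).renamedField);  // prints true
    }
}
```

The broken version compiles without error, which is exactly what makes this kind of mistake easy to miss when renaming by hand.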
Other languages do not have such complete support for refactoring, but there are still tools available that can help. The first necessity is a good multiple-file search utility, which is included in most modern integrated development environments (IDEs). While you might be tempted to look for a multiple-file search-and-replace utility, this is usually not a good idea, as it encourages blind changes to the code that are prone to error. More advanced IDEs offer even better search utilities for refactoring. These perform context-sensitive searches with knowledge of language constructs, which makes it possible to tell similar names apart based on concepts such as type and instance. Such search utilities are of prime importance to refactoring because changes often cut across the boundaries of language constructs, and therefore across files.
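The risk of blind replacement can be illustrated with a small sketch that operates on source text held in a string. The class `RenameSearch` and its two methods are hypothetical helpers written for this illustration, not part of any IDE. The sketch uses only the standard `java.util.regex` classes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers contrasting two purely textual renaming strategies.
class RenameSearch {

    // Blind textual replacement: also rewrites longer identifiers that
    // merely contain the old name as a substring.
    static String blindReplace(String source, String oldName, String newName) {
        return source.replace(oldName, newName);
    }

    // Whole-word replacement: only matches the old name when it is delimited
    // by regex word boundaries. This is still textual, so it cannot tell a
    // field from a parameter or local variable with the same name.
    static String wholeWordReplace(String source, String oldName, String newName) {
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(oldName) + "\\b");
        return pattern.matcher(source).replaceAll(Matcher.quoteReplacement(newName));
    }

    public static void main(String[] args) {
        String source = "int size; int resize(int size) { return size * 2; }";
        // Corrupts the method name resize:
        System.out.println(blindReplace(source, "size", "length"));
        // Leaves resize alone, but renames the field and the parameter alike:
        System.out.println(wholeWordReplace(source, "size", "length"));
    }
}
```

Even the whole-word version renames the parameter together with the field, because plain text carries no information about scope. Telling those two apart is the job of the context-sensitive searches described above, and it is why each textual match should be reviewed by hand.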
While human involvement will still be necessary for a long time to come, it is possible to perform some refactoring to remove code duplication using an automated process. This automatic restructuring of code would provide considerable benefits to the development process by removing the time-consuming task of cut-and-paste refactoring. Keep an eye on research in this area and encourage your favorite editor company to pursue more work on refactoring tools and automated refactoring.
A taste of what is to come can be found by looking at Guru, a hierarchy-restructuring tool for the Self programming language. This tool was developed at the University of Manchester as part of the PhD research of Ivan Moore. This is an initial example of automatic refactoring of an object-oriented language. Expect automated refactoring to eventually become a standard part of future IDEs.
When we talked about premature optimization, the importance of testing was stressed to achieve maximum optimization with minimal risk. The same treatment of testing applies when dealing with cut-and-paste. Fixing cut-and-paste code often involves major changes to the application code, but these changes should not alter application functionality. A proper set of application functionality tests can minimize the risk of an unintended change in behavior. These tests should be automated and run after each individual change, as with any other form of refactoring. Throughout this book, the importance of testing will continue to be stressed. Although this might seem repetitive, testing is an extremely important and often overlooked aspect of minimizing development risk. If you are not already using at least a minimal set of tests, you would be well advised to consider how testing can be integrated into your development process.
Just as with the automation of cut-and-paste operations, the majority of tests should be automated as well. In this case, the point of minimizing human involvement is not so much to prevent error, although that is a concern, as to make the tests less bothersome to perform. Without this step, many programmers will skip the testing phase of coding and just pray that the code works. This can be disastrous, especially when refactoring several similar sections of code into one parameterized section of code. This type of refactoring is common when removing code duplication. Because of small differences between the copies of the code, the parameterization process can easily introduce a small error in one or more of the new parameterized calls, and without proper testing that error could go unnoticed until much later.
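A minimal sketch of this kind of check, assuming two hypothetical near-duplicate methods that differ only in a limit and in whether the comparison is inclusive. The names `ShippingRules`, `ShippingRulesCheck`, and all of the methods and values are invented for illustration. The check keeps the original copies around long enough to compare them against the parameterized replacement at the boundary values, where a slip between `>` and `>=` would show up.

```java
// Hypothetical example: two near-duplicate rules and their parameterized replacement.
class ShippingRules {

    // Original copy 1: totals strictly above the limit qualify.
    static boolean freeShippingDomestic(int totalCents) {
        return totalCents > 5000;
    }

    // Original copy 2: note the subtle difference (>= rather than >).
    static boolean freeShippingInternational(int totalCents) {
        return totalCents >= 15000;
    }

    // Parameterized replacement; the inclusive flag preserves the difference
    // between the two original comparisons.
    static boolean freeShipping(int totalCents, int limitCents, boolean inclusive) {
        return inclusive ? totalCents >= limitCents : totalCents > limitCents;
    }
}

class ShippingRulesCheck {

    // Compares the parameterized calls against the original copies at and
    // around each limit, and throws on the first mismatch.
    static void check() {
        int[] samples = {0, 4999, 5000, 5001, 14999, 15000, 15001};
        for (int total : samples) {
            if (ShippingRules.freeShipping(total, 5000, false)
                    != ShippingRules.freeShippingDomestic(total)) {
                throw new AssertionError("domestic mismatch at " + total);
            }
            if (ShippingRules.freeShipping(total, 15000, true)
                    != ShippingRules.freeShippingInternational(total)) {
                throw new AssertionError("international mismatch at " + total);
            }
        }
    }

    public static void main(String[] args) {
        check();
        System.out.println("all checks passed");
    }
}
```

Once the comparison passes, the original copies can be deleted and the check rewritten against fixed expected values, so that it keeps guarding the parameterized version in later changes.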
