17. | Bug Patterns In Java

About This Bug Pattern

Perhaps the most common bug pattern, seen by all and often caused by beginning programmers, results from copying and pasting a block of code from one section of a program to another. Sometimes, small parts of the copy are changed because of slightly different functional requirements.

For example, you may have written a tokenizer for the configuration file of an application. Later on, you may find that it is almost exactly the tokenizer you want when interactively reading user input for an unrelated application. Ideally, you would create a new class hierarchy of tokenizers, factoring out the common code into a new base class and creating a new subclass for each use of the tokenizer. But that would require refactoring the old application; at the very least, you would have to change all references to the original tokenizer into references to the new subclass. Perhaps you don't have the time to do all of that refactoring, or perhaps you no longer have authority to modify the original code base. Or perhaps you're just being lazy that day. Whatever the reason, rather than doing all of this refactoring, you decide to just copy the tokenizer into a new package and tweak the parts that need tweaking.

Inevitably, bugs in the shared code will be found and fixed in one copy but not the other, leaving you scratching your head when the symptoms of errors that you have eliminated recur.
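As a rough illustration of how the two copies drift apart, consider the sketch below. The class names ConfigTokenizer and InputTokenizer, and the empty-token bug itself, are hypothetical; they stand in for the tokenizer scenario described above and are not taken from any real code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical original tokenizer, used for configuration files.
class ConfigTokenizer {
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        // The fix was applied here: empty tokens produced by runs of
        // spaces are skipped.
        for (String t : line.trim().split(" ")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}

// Hypothetical copy made for interactive input. It was pasted before the
// fix and never updated, so it has become a rogue tile.
class InputTokenizer {
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        for (String t : line.split(" ")) {
            tokens.add(t); // still adds "" for consecutive spaces
        }
        return tokens;
    }
}
```

For an input with two consecutive spaces, such as "a  b", the fixed copy yields two tokens while the stale copy still yields three, one of them empty. The symptom you thought you had eliminated recurs in the second application.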

Although programmers quickly become familiar with this pattern of bug, few take appropriate measures to minimize its occurrence. It's always tempting to take a break from thinking and simply copy code that you believe to be working, but the productivity lost from fixing bugs due to indiscriminate copy-and-paste actions quickly dwarfs the productivity gained from copying the code.

I call this the Rogue Tile pattern because the various copies of a code block can be thought of as "tiles" covering the program. As the code in the various copies diverges, the copies become "rogue tiles."

The Symptoms

The most common symptom of this pattern of bug is a program that continues to exhibit erroneous behavior after you believe you've fixed the code causing that behavior.

Cause, Cures, and Prevention

To understand how this can happen, let's consider the following class hierarchy for binary trees:

Listing 7-1: A Common Binary Tree Class Hierarchy

 abstract class Tree {}

 class Leaf extends Tree {
   private Object _value;

   public Leaf(Object value) {
     _value = value;
   }

   public Object getValue() {
     return _value;
   }
 }

 class Branch extends Tree {
   private Object _value;
   private Tree _left;
   private Tree _right;

   public Branch(Object value, Tree left, Tree right) {
     _value = value;
     _left = left;
     _right = right;
   }

   public Object getValue() {
     return _value;
   }

   public Tree getLeft() {
     return _left;
   }

   public Tree getRight() {
     return _right;
   }
 }

The first thing to notice about these classes is that both concrete classes contain a value field of type Object. If you decide later to make trees containing, say, Integers, you might forget to update one of these field declarations.

If some other part of the program were to expect these fields to be Integers, the program likely would not compile. You'll probably remember that you changed the type of the value field in one of the classes, but you might overlook the fact that you did not make the change in the other.

Of course, this simple example is one that a beginning programmer would quickly learn to avoid by factoring out the common code. In this case, the field declaration should be moved to class Tree. Both subclasses will then inherit this field, and any changes to the field declaration need only occur in one place.

Once the field is declared private in Tree, the subclasses can no longer refer to it directly. To make the refactoring work, we add an accessor method to class Tree and change every reference to _value in Leaf and Branch into an invocation of that accessor.
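A minimal sketch of that refactoring, keeping the Object type from Listing 7-1, might look as follows. The constructor on Tree is an assumption made so that the subclasses can still initialize the value.

```java
abstract class Tree {
    // The only declaration of the field: a type change happens here
    // and nowhere else.
    private Object _value;

    public Tree(Object value) {
        _value = value;
    }

    // Subclasses cannot see the private field, so they use this accessor.
    public Object getValue() {
        return _value;
    }
}

class Leaf extends Tree {
    public Leaf(Object value) {
        super(value);
    }
}

class Branch extends Tree {
    private Tree _left;
    private Tree _right;

    public Branch(Object value, Tree left, Tree right) {
        super(value);
        _left = left;
        _right = right;
    }

    public Tree getLeft() {
        return _left;
    }

    public Tree getRight() {
        return _right;
    }
}
```

Neither subclass mentions the type of the value any more, except in its constructor signature, so there is no second field declaration left to forget.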

This is a typical example of the tension between encapsulating code and keeping a single point of control for each functional aspect of a program.

Note

There is a tension between encapsulating code and keeping a single point of control for each functional aspect of a program.

Sometimes factoring out code is not so simple a task, especially when it involves factoring out not just common data, but common functionality as well. For example, suppose we factor out the _value field in our example above and change its static type to Integer. We could then write methods for adding and multiplying all the nodes in a Tree as follows:

Listing 7-2: Factoring Out Common Code from Listing 7-1.

 abstract class Tree {
   private Integer _value;

   public Tree(Integer value) {
     _value = value;
   }

   public Integer getValue() {
     return _value;
   }

   public abstract int add();
   public abstract int multiply();
 }

 class Leaf extends Tree {
   public Leaf(Integer value) {
     super(value);
   }

   public int add() {
     return getValue().intValue();
   }

   public int multiply() {
     return getValue().intValue();
   }
 }

 class Branch extends Tree {
   private Tree _left;
   private Tree _right;

   public Branch(Integer value, Tree left, Tree right) {
     super(value);
     _left = left;
     _right = right;
   }

   public Tree getLeft() {
     return _left;
   }

   public Tree getRight() {
     return _right;
   }

   public int add() {
     return getValue().intValue() + _left.add() + _right.add();
   }

   public int multiply() {
     return getValue().intValue() + _left.add() * _right.add();
   }
 }

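You may have noticed that there's a bug in this code. It's inside the multiply() method for class Branch, which adds the first term instead of multiplying by it. As listed, the method also recurses through _left.add() and _right.add() rather than multiply(), which looks like another leftover from the copied add() body.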

The error occurred because I created the multiply() method by copying the code from the add() method and making slight (but incomplete) alterations.

This bug is particularly insidious because the code compiles just fine and a call to multiply() never signals an error at run time. In fact, in many cases it will return what appears to be a perfectly reasonable result.
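To see how plausible the wrong answers can look, the sketch below isolates the arithmetic of Branch.multiply() from Listing 7-2 for a branch whose two children are leaves (for a Leaf, add() simply returns the stored value). The class MultiplyBugDemo and its method names are hypothetical helpers introduced only for this illustration.

```java
class MultiplyBugDemo {
    // The expression as written in Listing 7-2:
    // value + left * right
    static int buggyMultiply(int value, int left, int right) {
        return value + left * right;
    }

    // What the method was meant to compute:
    // value * left * right
    static int intendedMultiply(int value, int left, int right) {
        return value * left * right;
    }
}
```

For the values 2, 1, and 2, both methods return 4, so a quick manual check on such a tree would pass. For 2, 3, and 4, the buggy version returns 14 where 24 was intended, and nothing about 14 looks obviously wrong. Only a test that compares against an independently computed expected value exposes the difference.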

Just as before, we can minimize bugs of this sort by factoring out the common code. In this case, we could write a single method that accumulates an operator (passed as an argument) over a Tree. We can use a design pattern (not a bug pattern!) known as the Command pattern to encapsulate this operator in an object:

Listing 7-3: Using a Design Pattern to Factor Out Common Code (Correctly)

 abstract class Operator {
   public abstract int apply(int l, int r);
 }

 class Adder extends Operator {
   public int apply(int l, int r) {
     return l + r;
   }
 }

 class Multiplier extends Operator {
   public int apply(int l, int r) {
     return l * r;
   }
 }

Then we can modify the code in our Tree classes as follows:

Listing 7-4: Modifying Class Code After Common Code Is Factored Out

 abstract class Tree {
   private Integer _value;

   public Tree(Integer value) {
     _value = value;
   }

   public Integer getValue() {
     return _value;
   }

   public abstract int accumulate(Operator o);

   public int add() {
     return this.accumulate(new Adder());
   }

   public int multiply() {
     return this.accumulate(new Multiplier());
   }
 }

 class Leaf extends Tree {
   public Leaf(Integer value) {
     super(value);
   }

   public int accumulate(Operator o) {
     return getValue().intValue();
   }
 }

 class Branch extends Tree {
   private Tree _left;
   private Tree _right;

   public Branch(Integer value, Tree left, Tree right) {
     super(value);
     _left = left;
     _right = right;
   }

   public Tree getLeft() {
     return _left;
   }

   public Tree getRight() {
     return _right;
   }

   public int accumulate(Operator o) {
     return o.apply(getValue().intValue(),
                    o.apply(_left.accumulate(o),
                            _right.accumulate(o)));
   }
 }

By factoring out the common code, we eliminated the possibility of a copy-and-paste error occurring in the method bodies of add() and multiply(). Also, notice that we no longer need separate add() and multiply() methods for each subclass of Tree.
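One way to see the benefit is to add a new operation. The sketch below repeats Listings 7-3 and 7-4 in condensed form, only so that it compiles on its own, and then adds a hypothetical Maximizer operator. Maximizer and the AccumulateDemo helper are assumptions made for illustration and do not appear in the original listings.

```java
// Condensed from Listings 7-3 and 7-4 so that this block stands alone.
abstract class Operator {
    public abstract int apply(int l, int r);
}

class Adder extends Operator {
    public int apply(int l, int r) { return l + r; }
}

class Multiplier extends Operator {
    public int apply(int l, int r) { return l * r; }
}

// Hypothetical new operation: no change to Tree, Leaf, or Branch is needed.
class Maximizer extends Operator {
    public int apply(int l, int r) { return Math.max(l, r); }
}

abstract class Tree {
    private Integer _value;
    public Tree(Integer value) { _value = value; }
    public Integer getValue() { return _value; }
    public abstract int accumulate(Operator o);
    public int add() { return accumulate(new Adder()); }
    public int multiply() { return accumulate(new Multiplier()); }
}

class Leaf extends Tree {
    public Leaf(Integer value) { super(value); }
    public int accumulate(Operator o) { return getValue().intValue(); }
}

class Branch extends Tree {
    private Tree _left;
    private Tree _right;

    public Branch(Integer value, Tree left, Tree right) {
        super(value);
        _left = left;
        _right = right;
    }

    // The single traversal shared by every operation.
    public int accumulate(Operator o) {
        return o.apply(getValue().intValue(),
                       o.apply(_left.accumulate(o), _right.accumulate(o)));
    }
}

class AccumulateDemo {
    // Builds a branch holding 2 with leaf children holding 3 and 4.
    static Tree sample() {
        return new Branch(2, new Leaf(3), new Leaf(4));
    }
}
```

On the sample tree, add() yields 9, multiply() yields 24, and accumulate(new Maximizer()) yields 4. The traversal logic exists in exactly one place, so there is no second copy in which a stale "+" could survive.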

The Command Pattern Design

As mentioned in the paragraph introducing Listing 7-3, we used a design technique known as the Command pattern to encapsulate an operator (multiply, add, etc.) in an object.

The Command pattern allows us to encapsulate an operation on data as data itself. That way, other objects can send and receive it, and apply it as needed.

The key to this pattern is to define a special interface, Command, with a single method declaration that I will call apply():

    public interface Command {
      public Object apply(Object o1, Object o2);
    }

An implementation of this interface defines apply() differently depending on the particular operation it represents. For example, here is an implementation that concatenates the String representations of its arguments:

    class Concatenator implements Command {
      public Object apply(Object o1, Object o2) {
        return o1.toString() + o2.toString();
      }
    }

Of course, we would need a separate Command interface for each distinct operation signature.
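The following sketch shows both points: a command being passed around and applied as data, and a second interface for a different signature. Command and Concatenator are repeated from above (with the public modifier dropped from the interface so the block fits in one file); UnaryCommand, Measurer, and CommandDemo.fold() are hypothetical names introduced only for this example.

```java
interface Command {
    Object apply(Object o1, Object o2);
}

class Concatenator implements Command {
    public Object apply(Object o1, Object o2) {
        return o1.toString() + o2.toString();
    }
}

// Hypothetical second interface: one-argument operations have a
// different signature, so they need their own command type.
interface UnaryCommand {
    Object apply(Object o);
}

// Hypothetical unary command: the length of the String representation.
class Measurer implements UnaryCommand {
    public Object apply(Object o) {
        return Integer.valueOf(o.toString().length());
    }
}

class CommandDemo {
    // Receives an operation as an ordinary argument and applies it
    // from left to right over the given values.
    static Object fold(Command c, Object first, Object[] rest) {
        Object result = first;
        for (Object o : rest) {
            result = c.apply(result, o);
        }
        return result;
    }
}
```

Calling CommandDemo.fold(new Concatenator(), "a", new Object[] {1, "b"}) produces the String "a1b". The fold() method knows nothing about concatenation; swapping in another Command changes the behavior without touching the loop.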

For more information on design patterns, see the Resources chapter.