Move Accumulation to Visitor


A method accumulates information from heterogeneous classes.



Move the accumulation task to a Visitor that can visit each class to accumulate the information.





Motivation

Ralph Johnson, one of the four authors of Design Patterns [DP], once observed, "Most of the time you don't need Visitor, but when you do need Visitor, you really need Visitor!" So when do you really need Visitor? Let's review what Visitors are before answering that question.

A Visitor is a class that performs an operation on an object structure. The classes that a Visitor visits are heterogeneous, which means they hold unique information and provide a specific interface to that information. Visitors can easily interact with heterogeneous classes by means of double-dispatch. This means that each of a set of classes accepts a Visitor instance as a parameter (via an "accept" method: accept(Visitor visitor)) and then calls back on the Visitor, passing itself to its corresponding visit method, as shown in the following diagram.

Because the first argument passed to a Visitor's visit(…) method is an instance of a specific type, the Visitor can call type-specific methods on the instance without performing type-casting. This makes it possible for Visitors to visit classes in the same hierarchy or different hierarchies.
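Double-dispatch can be sketched in a few lines of Java. The following is a minimal illustration, not code from the HTML Parser; the Invoice and Shipment classes and the SummaryVisitor are hypothetical names chosen only to show two unrelated classes being visited without type-casting.

```java
// Hypothetical visitor interface: one visit method per visitable class.
interface Visitor {
    void visitInvoice(Invoice invoice);
    void visitShipment(Shipment shipment);
}

// Two unrelated classes (no common superclass), each with its own interface.
class Invoice {
    private final double total;
    Invoice(double total) { this.total = total; }
    double getTotal() { return total; }
    // First dispatch: the caller invokes accept() on this object.
    // Second dispatch: accept() calls back the visit method for this type.
    void accept(Visitor visitor) { visitor.visitInvoice(this); }
}

class Shipment {
    private final int parcelCount;
    Shipment(int parcelCount) { this.parcelCount = parcelCount; }
    int getParcelCount() { return parcelCount; }
    void accept(Visitor visitor) { visitor.visitShipment(this); }
}

// A concrete visitor that accumulates a summary string.
class SummaryVisitor implements Visitor {
    private final StringBuilder summary = new StringBuilder();

    public void visitInvoice(Invoice invoice) {
        // The argument is already an Invoice, so no cast is needed.
        summary.append("invoice total=").append(invoice.getTotal()).append(';');
    }

    public void visitShipment(Shipment shipment) {
        summary.append("parcels=").append(shipment.getParcelCount()).append(';');
    }

    String getSummary() { return summary.toString(); }
}
```

Each visit method receives an argument of a concrete type, which is why the visitor can call getTotal() or getParcelCount() directly.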

The job of many real-world Visitors is to accumulate information. The Collecting Parameter pattern is also useful in this role (see Move Accumulation to Collecting Parameter, 313). Like a Visitor, a Collecting Parameter may be passed to multiple objects to accumulate information from them. The key difference lies in the ability to easily accumulate information from heterogeneous classes. While Visitors have no trouble with this task due to double-dispatch, Collecting Parameters don't rely on double-dispatch, which limits their ability to gather diverse information from classes with diverse interfaces.

Now let's get back to the question: When do you really need a Visitor? In general, you need a Visitor when you have numerous algorithms to run on the same heterogeneous object structure and no other solution is as simple or succinct as a Visitor. For example, say you have three domain classes, none of which share a common superclass and all of which feature code for producing different XML representations.

What's wrong with this design? The main problem is that you have to add a new toXml method to each of these domain classes every time you have a new XML representation. In addition, the toXml methods bloat the domain classes with representation code, which is better kept separate from the domain logic, particularly when you have a lot of it. In the Mechanics section, I refer to the toXml methods as internal accumulation methods because they are internal to the classes used in the accumulation. Refactoring to a Visitor changes the design as shown in the following diagram.

With this new design, the domain classes may be represented using whatever Visitor is appropriate. Furthermore, the copious representation logic that once crowded the domain classes is now encapsulated in the appropriate Visitor.
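As a rough sketch of that design, assume two hypothetical domain classes, Customer and Order, standing in for the three described above. The class names, the DomainVisitor interface, and both XML formats are illustrative assumptions, and the sketch skips XML escaping for brevity.

```java
// Hypothetical visitor interface for the domain classes.
interface DomainVisitor {
    void visitCustomer(Customer customer);
    void visitOrder(Order order);
}

// Domain classes keep only domain data plus a one-line accept method.
class Customer {
    private final String name;
    Customer(String name) { this.name = name; }
    String getName() { return name; }
    void accept(DomainVisitor visitor) { visitor.visitCustomer(this); }
}

class Order {
    private final int number;
    Order(int number) { this.number = number; }
    int getNumber() { return number; }
    void accept(DomainVisitor visitor) { visitor.visitOrder(this); }
}

// One XML representation: values rendered as attributes.
class AttributeXmlVisitor implements DomainVisitor {
    private final StringBuilder xml = new StringBuilder();
    public void visitCustomer(Customer customer) {
        xml.append("<customer name=\"").append(customer.getName()).append("\"/>");
    }
    public void visitOrder(Order order) {
        xml.append("<order number=\"").append(order.getNumber()).append("\"/>");
    }
    String getXml() { return xml.toString(); }
}

// A second representation: values rendered as element text.
// Adding it requires no change to Customer or Order.
class ElementXmlVisitor implements DomainVisitor {
    private final StringBuilder xml = new StringBuilder();
    public void visitCustomer(Customer customer) {
        xml.append("<customer>").append(customer.getName()).append("</customer>");
    }
    public void visitOrder(Order order) {
        xml.append("<order>").append(order.getNumber()).append("</order>");
    }
    String getXml() { return xml.toString(); }
}
```

A new XML representation now means one new visitor class rather than one new toXml method on every domain class.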

Another case when a Visitor is needed is when you have numerous external accumulation methods. Such methods typically use an Iterator [DP] and resort to type-casting heterogeneous objects to access specific information:

 public String extractText()...
    while (nodes.hasMoreNodes()) {
       Node node = nodes.nextNode();
       if (node instanceof StringNode) {
          StringNode stringNode = (StringNode) node;
          results.append(stringNode.getText());
       } else if (node instanceof LinkTag) {
          LinkTag linkTag = (LinkTag) node;
          if (isPreTag)
             results.append(linkTag.getLinkText());
          else
             results.append(linkTag.getLink());
       } else if ...
    }

Type-casting objects to access their specific interfaces is acceptable if it's not done frequently. However, if this activity becomes frequent, it's worth considering a better design. Would a Visitor provide a better solution? Perhaps—unless your heterogeneous classes suffer from the smell Alternative Classes with Different Interfaces [F]. In that case, you could likely refactor the classes to have a common interface, thereby making it possible to accumulate information without type-casting or implementing a Visitor. On the other hand, if you can't make heterogeneous classes look homogeneous by means of a common interface and you have numerous external accumulation methods, you can likely arrive at a better solution by refactoring to a Visitor. The opening code sketch and the Example section show such a case.
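The simpler alternative, a common interface, might look like the sketch below. It assumes two hypothetical classes that expose the same kind of information and can therefore share a hypothetical TextSource interface; none of these names come from the HTML Parser.

```java
import java.util.List;

// Hypothetical common interface extracted from classes that offered
// the same information under different method names.
interface TextSource {
    String getText();
}

class StringNode implements TextSource {
    private final String text;
    StringNode(String text) { this.text = text; }
    public String getText() { return text; }
}

class RemarkNode implements TextSource {
    private final String remark;
    RemarkNode(String remark) { this.remark = remark; }
    // Previously this class might have used a differently named accessor.
    public String getText() { return remark; }
}

class TextAccumulator {
    // Accumulates text polymorphically: no instanceof, no cast, no Visitor.
    static String extractText(List<TextSource> sources) {
        StringBuilder results = new StringBuilder();
        for (TextSource source : sources) {
            results.append(source.getText());
        }
        return results.toString();
    }
}
```

This only works when every class can honestly answer the same question; once the accumulation needs type-specific calls, the common interface stops being enough.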

Finally, there are times when you have neither an external nor an internal accumulation method, yet your design could be improved by replacing your existing code with a Visitor. On the HTML Parser project, we once accomplished an information accumulation step by writing two new subclasses as shown in the figure on the following page.



After we studied the new subclasses we'd written, we realized that one Visitor could take the place of the subclasses and the code would be simpler and more succinct. Yet we didn't jump to implementing a Visitor at that point; we felt that we needed further justification before taking on the nontrivial task of a Visitor refactoring. We found that justification when we discovered several external accumulation methods in client code to the HTML Parser. This illustrates the kind of thinking that ought to go into a decision to refactor to Visitor because such a refactoring is by no means a simple transformation.

If the set of classes your would-be Visitor must visit is growing frequently, it's generally advisable to avoid a Visitor solution because it involves writing an accept method on each new visitable class along with a corresponding visit method on the Visitor. On the other hand, it's best to not follow this rule religiously. When I considered refactoring to Visitor on the HTML Parser project, I found that the initial set of classes the Visitor would need to visit was too large and changed too frequently. After further inquiry, I determined that only a subset of the classes actually needed to be visited; the rest of the classes could be visited by using the visit method for their superclass.
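The idea of visiting only a subset of classes can be sketched as follows. This is a simplified illustration with stand-in classes; TableTag is a hypothetical subclass, and the real parser's class hierarchy may differ.

```java
// Simplified visitor interface: only the classes that need
// special treatment get their own visit method.
interface NodeVisitor {
    void visitTag(Tag tag);
    void visitLinkTag(LinkTag linkTag);
}

class Tag {
    private final String tagName;
    Tag(String tagName) { this.tagName = tagName; }
    String getTagName() { return tagName; }
    // Inherited by every subclass that does not override it.
    void accept(NodeVisitor visitor) { visitor.visitTag(this); }
}

// Hypothetical subclass that needs no visit method of its own:
// it is visited through the visit method for its superclass.
class TableTag extends Tag {
    TableTag() { super("TABLE"); }
}

// A subclass with type-specific data overrides accept.
class LinkTag extends Tag {
    private final String link;
    LinkTag(String link) { super("A"); this.link = link; }
    String getLink() { return link; }
    @Override
    void accept(NodeVisitor visitor) { visitor.visitLinkTag(this); }
}

class NameAndLinkVisitor implements NodeVisitor {
    private final StringBuilder seen = new StringBuilder();
    public void visitTag(Tag tag) {
        seen.append("tag:").append(tag.getTagName()).append(';');
    }
    public void visitLinkTag(LinkTag linkTag) {
        seen.append("link:").append(linkTag.getLink()).append(';');
    }
    String getSeen() { return seen.toString(); }
}
```

New Tag subclasses like TableTag can be added without touching the visitor interface, which softens the cost of a growing hierarchy.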

Some programmers object to the Visitor pattern for one reason or another before they get to know it. For example, one programmer told me that he didn't like Visitor because it "breaks encapsulation." In other words, if a Visitor can't perform its work on a visitee because one or more of the visitee methods aren't public, the method(s) must be made public (thereby breaking encapsulation) to let the Visitor do its work. True. Yet many Visitor implementations require no visibility changes on visitees (see the upcoming Example section) and, even if a few visibility changes are required, the price you pay for compromising a visitee's encapsulation may be far lower than the price you pay to live with a non-Visitor solution.

Another objection raised against the Visitor pattern is that it adds too much complexity or obscurity to code. One programmer said, "Looking at the visit loop tells you nothing about what is being performed." The "visit loop" is code that iterates over visitees in an object structure and passes the Visitor to each one of them. While it's true that a visit loop reveals little about what concrete Visitors actually do, it's clear what the visit loop does if you understand the Visitor pattern. So the complexity or obscurity of a Visitor implementation depends a lot on an individual's or team's comfort level with the pattern. In addition, if a Visitor is really needed in a system, it will make overly complex or obscure code simpler.

The double-edged sword of the Visitor pattern is its power and sophistication. When you need a Visitor, you really need one, as Ralph says. Unfortunately, too many programmers feel the need to use Visitor for the wrong reasons, like showing off or because they're still "patterns happy." Always consider simpler solutions before refactoring to Visitor, and use this pattern most judiciously.

Benefits and Liabilities

+ Accommodates numerous algorithms for the same heterogeneous object structure.

+ Visits classes in the same or different hierarchies.

+ Calls type-specific methods on heterogeneous classes without type-casting.

- Complicates a design when a common interface can make heterogeneous classes homogeneous.

- A new visitable class requires a new accept method along with a new visit method on each Visitor.

- May break encapsulation of visited classes.







Mechanics

An accumulation method gathers information from heterogeneous classes. An external accumulation method exists on a class that isn't one of the heterogeneous classes, while an internal accumulation method exists on the heterogeneous classes themselves. In this section you will find mechanics for both internal and external accumulation methods. In addition, I've provided a third set of mechanics for Visitor replacement, which you can use if you have neither an internal nor an external accumulation method yet can achieve a better design by rewriting your accumulation code as a Visitor.

External Accumulation Method

The class that contains your accumulation method is known in this refactoring as the host. Does it make sense for your host to play the role of Visitor? If your host is already playing too many roles, extract the accumulation method into a new host by performing Replace Method with Method Object [F] prior to this refactoring.

1. In the accumulation method, find any local variables that are referenced in multiple places by the accumulation logic. Convert these local variables to fields of the host class.

  • Compile and test.

2. Apply Extract Method [F] on the accumulation logic for a given accumulation source, a class from which information is accumulated. Adjust the extracted method so it accepts an argument of the accumulation source's type. Name the extracted method accept(…).

Repeat this step on accumulation logic for the remaining accumulation sources.

  • Compile and test.

3. Apply Extract Method [F] on the body of an accept(…) method to produce a method called visitClassName(), where ClassName is the name of the accumulation source associated with the accept(…) method. The new method will accept one argument of the accumulation source's type (e.g., visitEndTag(EndTag endTag)).

Repeat this step for every accept(…) method.

4. Apply Move Method [F] to move every accept(…) method to its corresponding accumulation source. Each accept(…) method will now accept an argument of the host's type.

  • Compile and test.

5. In the accumulation method, apply Inline Method [F] on every call to an accept(…) method.

  • Compile and test.

6. Apply Unify Interfaces (343) on the superclasses and/or interfaces of the accumulation sources so the accept(…) method may be called polymorphically.

7. Generalize the accumulation method to call the accept(…) method polymorphically for every accumulation source.

  • Compile and test.

8. Apply Extract Interface [F] on the host to produce a visitor interface, an interface that declares the visit methods implemented by the host.

9. Change the signature on every occurrence of the accept(…) method so it uses the visitor interface.

  • Compile and test.

The host is now a Visitor.

Internal Accumulation Method

Use these mechanics when your accumulation method is implemented by the heterogeneous classes from which information is gathered. These mechanics assume that the heterogeneous classes are part of a hierarchy because that is a common case. The steps for these mechanics are largely based on mechanics defined in the paper "A Refactoring Tool for Smalltalk" [Roberts, Brant, and Johnson].

1. Create a visitor by creating a new class. Consider using visitor in the class name.

  • Compile.

2. Identify a visitee, a class from which the visitor will accumulate data. Add a method to the visitor called visitClassName(…), where ClassName is the name of the visitee (e.g., visitor.visitEndTag(…)). Make the visit method's return type void, and make it take a visitee argument (e.g., public void visitStringNode(StringNode stringNode)).

Repeat this step for every class in the hierarchy from which the visitor must accumulate information.

  • Compile.

3. On every visitee, apply Extract Method [F] on the body of the accumulation method so it calls a new method, which will be called the accept method. Make the signature of the accept method identical in all classes, so every accumulation method contains the same code for calling its accept method.

  • Compile and test.

4. The accumulation method is now identical in every class. Apply Pull Up Method [F] to move it to the hierarchy's superclass.

  • Compile and test.

5. Apply Add Parameter [F] to add an argument of type visitor to every implementation of the accept method. Make the accumulation method pass in a new instance of the visitor when it calls the accept method.

  • Compile.

6. Produce a visit method on the visitor by applying Move Method [F] on a visitee's accept method. The accept method now calls a visit method that accepts an argument of type visitee.

For example, given a visitee called StringNode and a visitor called Visitor, we'd have the following code:

 class StringNode...
    public void accept(Visitor visitor) {
       visitor.visitStringNode(this);
    }

 class Visitor {
    public void visitStringNode(StringNode stringNode)...
 }

Repeat this step for every visitee.

  • Compile and test.
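The end state of these mechanics might look like the sketch below. It assumes a hypothetical accumulation method named toPlainText() and a hypothetical TextVisitor with a getText() accessor for its result; the steps above do not prescribe those names.

```java
// Superclass of the hierarchy. After step 4 the accumulation method
// lives here, identical for every subclass.
abstract class Node {
    String toPlainText() {
        TextVisitor visitor = new TextVisitor();  // step 5: pass in a new visitor
        accept(visitor);
        return visitor.getText();
    }

    // Same signature in every class (steps 3 and 5).
    abstract void accept(TextVisitor visitor);
}

class StringNode extends Node {
    private final String text;
    StringNode(String text) { this.text = text; }
    String getText() { return text; }
    // After step 6 the accept method only calls back on the visitor.
    void accept(TextVisitor visitor) { visitor.visitStringNode(this); }
}

class EndTag extends Node {
    private final String tagName;
    EndTag(String tagName) { this.tagName = tagName; }
    String getTagName() { return tagName; }
    void accept(TextVisitor visitor) { visitor.visitEndTag(this); }
}

// The visitor now holds the logic that used to live in each
// class's own accumulation method.
class TextVisitor {
    private final StringBuilder text = new StringBuilder();

    void visitStringNode(StringNode stringNode) {
        text.append(stringNode.getText());
    }

    void visitEndTag(EndTag endTag) {
        text.append("</").append(endTag.getTagName()).append('>');
    }

    String getText() { return text.toString(); }
}
```

Callers keep using the accumulation method on the superclass, so the move to a visitor stays invisible to them.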

Visitor Replacement

This refactoring assumes you have neither an internal nor an external accumulation method, yet your code would be better if it were replaced with a Visitor.

1. Create a concrete visitor by creating a new class. Consider using visitor in the class name.

If you're creating your second concrete visitor, apply Extract Superclass [F] on your first concrete visitor to create your abstract visitor, and change message signatures on all visitees (defined in step 2) so they accept an abstract visitor instead of your first concrete visitor. When applying Extract Superclass [F], don't pull up any data or methods that are specific to a concrete visitor and not generic to all concrete visitors.

2. Identify a visitee, a class from which the concrete visitor must accumulate data. Add a method to the concrete visitor called visitClassName, where ClassName is the name of the visitee (e.g., concreteVisitor.visitEndTag(…)). Make the visit method's return type void and make it take a visitee argument (e.g., public void visitStringNode(StringNode stringNode)).

3. Add to the same visitee (from step 2) a public accept method that takes as a parameter the concrete visitor or, if you have one, the abstract visitor. Make the body of this method call back on the concrete visitor's visit method, passing a reference to the visitee.

For example:

 class Tag...
    public void accept(NodeVisitor nodeVisitor) {
       nodeVisitor.visitTag(this);
    }

4. Repeat steps 2 and 3 for every visitee. You now have the skeleton of your concrete visitor.

5. Implement a public method on your concrete visitor to obtain its accumulated result. Make the accumulated result be empty or null.

  • Compile.

6. In the accumulation method, declare a local variable for the concrete visitor and instantiate it. Next, find accumulation method code where information is accumulated from each visitee, and add code to call each visitee's accept method, passing in the concrete visitor instance. When you're done, update the accumulation method so it uses the concrete visitor's accumulated result instead of its normal result. This last part will cause your tests to break.

7. Implement the method bodies for each visit method on the concrete visitor. This step is big, and there's no single set of mechanics that will work for it because all cases vary. As you copy code from the accumulation method into each visit method, make it fit into its new home by

  • Ensuring each visit method can access essential data/logic from its visitee

  • Declaring and initializing concrete visitor fields that are accessed by two or more of the visit methods

  • Passing essential data (used in accumulation) from the accumulation method to the concrete visitor's constructor (e.g., a TagAccumulatingVisitor accumulates all Tag instances that match the string, tagNameToFind, which is a value supplied via a constructor argument)

  • Compile and test that the accumulated results returned by the accumulation method are all correct.

8. Remove as much old code from the accumulation method as possible.

  • Compile and test.

9. You should now be left with code that iterates over a collection of objects, passing the concrete visitor to the accept method for each visitee. If some of the objects being iterated over don't have an accept method (i.e., aren't visitees), define a do-nothing accept method on those classes (or on their base class), so your iteration code doesn't have to distinguish between objects when it calls the accept method.

  • Compile and test.

10. Create a local accept method by applying Extract Method [F] on the accumulation method's iteration code. This new method should take the concrete visitor as its sole argument and should iterate over a collection of objects, passing the concrete visitor to each object's accept method.

11. Move the local accept method to a place where it will more naturally fit, such as a class that other clients can easily access.

  • Compile and test.
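A sketch of where these steps can lead, using the TagAccumulatingVisitor mentioned in step 7. Everything else here is an assumption made for illustration: the simplified Node, StringNode, and Tag stand-ins, the single-method NodeVisitor, and the ParsedDocument class that hosts the extracted accept method.

```java
import java.util.ArrayList;
import java.util.List;

interface NodeVisitor {
    void visitTag(Tag tag);
}

class Node {
    // Step 9: a do-nothing accept on the base class, so the iteration
    // code need not distinguish visitees from other nodes.
    void accept(NodeVisitor visitor) { }
}

class StringNode extends Node {
    private final String text;
    StringNode(String text) { this.text = text; }
    String getText() { return text; }
}

class Tag extends Node {
    private final String tagName;
    Tag(String tagName) { this.tagName = tagName; }
    String getTagName() { return tagName; }
    // Step 3: call back on the visitor, passing this visitee.
    @Override
    void accept(NodeVisitor visitor) { visitor.visitTag(this); }
}

class TagAccumulatingVisitor implements NodeVisitor {
    private final String tagNameToFind;               // step 7: supplied via constructor
    private final List<Tag> tags = new ArrayList<>(); // step 5: starts out empty

    TagAccumulatingVisitor(String tagNameToFind) {
        this.tagNameToFind = tagNameToFind;
    }

    public void visitTag(Tag tag) {
        if (tag.getTagName().equalsIgnoreCase(tagNameToFind)) {
            tags.add(tag);
        }
    }

    // Step 5: public access to the accumulated result.
    List<Tag> getTags() { return tags; }
}

// Hypothetical home for the accept method extracted in steps 10 and 11.
class ParsedDocument {
    private final List<Node> nodes;
    ParsedDocument(List<Node> nodes) { this.nodes = nodes; }

    void accept(NodeVisitor visitor) {
        for (Node node : nodes) {
            node.accept(visitor);
        }
    }
}
```

A client would create a TagAccumulatingVisitor for a tag name, pass it to the document's accept method, and then read the matches from getTags().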

Example

It takes a good deal of patience to find a real-world case in which refactoring to a Visitor actually makes sense. I found numerous such cases while refactoring code in an open source, streaming HTML parser (see http://sourceforge.net/projects/htmlparser). The refactoring I'll discuss here occurred on an external accumulation method. To help you understand this refactoring, I need to give a brief overview of how the parser works.

As the parser parses HTML or XML, it recognizes tags and strings. For example, consider this HTML:

 <HTML>
    <BODY>
       Hello, and welcome to my Web page! I work for
       <A HREF="http://industriallogic.com">
          <IMG SRC="http://industriallogic.com/images/logo141x145.gif">
       </A>
    </BODY>
 </HTML>

The parser recognizes the following objects when parsing this HTML:

  • Tag (for the <BODY> tag)

  • StringNode (for the String, "Hello, and welcome . . .")

  • LinkTag (for the <A HREF="">…</A> tags)

  • ImageTag (for the <IMG SRC=""> tag)

  • EndTag (for the </BODY> tag)

Users of the parser commonly accumulate information from HTML or XML documents. The TextExtractor class provides an easy way to accumulate textual data from documents. The heart of this class is a method called extractText():

 public class TextExtractor...
    public String extractText() throws ParserException {
       Node node;
       boolean isPreTag = false;
       boolean isScriptTag = false;
       StringBuffer results = new StringBuffer();
       parser.flushScanners();
       parser.registerScanners();
       for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          node = e.nextNode();
          if (node instanceof StringNode) {
             if (!isScriptTag) {
                StringNode stringNode = (StringNode) node;
                if (isPreTag)
                   results.append(stringNode.getText());
                else {
                   String text = Translate.decode(stringNode.getText());
                   if (getReplaceNonBreakingSpace())
                      text = text.replace('\u00a0', ' ');
                   if (getCollapse())
                      collapse(results, text);
                   else
                      results.append(text);
                }
             }
          } else if (node instanceof LinkTag) {
             LinkTag link = (LinkTag) node;
             if (isPreTag)
                results.append(link.getLinkText());
             else
                collapse(results, Translate.decode(link.getLinkText()));
             if (getLinks()) {
                results.append("<");
                results.append(link.getLink());
                results.append(">");
             }
          } else if (node instanceof EndTag) {
             EndTag endTag = (EndTag) node;
             String tagName = endTag.getTagName();
             if (tagName.equalsIgnoreCase("PRE"))
                isPreTag = false;
             else if (tagName.equalsIgnoreCase("SCRIPT"))
                isScriptTag = false;
          } else if (node instanceof Tag) {
             Tag tag = (Tag) node;
             String tagName = tag.getTagName();
             if (tagName.equalsIgnoreCase("PRE"))
                isPreTag = true;
             else if (tagName.equalsIgnoreCase("SCRIPT"))
                isScriptTag = true;
          }
       }
       return (results.toString());
    }

This code iterates over all nodes returned by the parser, figures out each node's type (using Java's instanceof operator), and then type-casts and accumulates data from each node with some help from local variables and user-configurable Boolean flags.

In deciding whether or how to refactor this code, I consider the following questions:

  • Would a Visitor implementation provide a simpler, more succinct solution?

  • Would a Visitor implementation enable similar refactorings in other areas of the parser or in client code to the parser?

  • Is there a simpler solution than a Visitor? For example, can I accumulate data from each node by using one common method?

  • Is the existing code sufficient?

I quickly determine that I cannot accumulate data from the nodes by using one common accumulation method. For instance, the code gathers either all of a LinkTag's text or just its link (i.e., URL) by calling two different methods. I also determine that there is no easy way to avoid all of the instanceof calls and type-casts without moving to a Visitor implementation. Is it worth it? I determine that it is because other areas in the parser and client code could also be improved by using a Visitor.

Before beginning the refactoring, I must decide whether it makes sense for the TextExtractor class to play the role of Visitor or whether to extract a class from it that will play the Visitor role. In this case, because TextExtractor performs only the single responsibility of text extraction, I decide that it will make a perfectly good Visitor. Having made my choice, I proceed with the refactoring.

1. The accumulation method, extractText(), contains three local variables referenced across multiple legs of a conditional statement. I convert these local variables into TextExtractor fields:

 public class TextExtractor...
    private boolean isPreTag;
    private boolean isScriptTag;
    private StringBuffer results;

    public String extractText()...
       // The former local declarations become assignments to the new
       // fields, so every call still starts from a clean state.
       isPreTag = false;
       isScriptTag = false;
       results = new StringBuffer();
       ...

I compile and test to confirm that the changes work.

2. Now I apply Extract Method [F] on the first chunk of accumulation code for the StringNode type:

 public class TextExtractor...
    public String extractText()...
       ...
       for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          node = e.nextNode();
          if (node instanceof StringNode) {
             accept(node);
          } else if (...

    private void accept(Node node) {
       if (!isScriptTag) {
          StringNode stringNode = (StringNode) node;
          if (isPreTag)
             results.append(stringNode.getText());
          else {
             String text = Translate.decode(stringNode.getText());
             if (getReplaceNonBreakingSpace())
                text = text.replace('\u00a0', ' ');
             if (getCollapse())
                collapse(results, text);
             else
                results.append(text);
          }
       }
    }

The accept() method currently type-casts its node argument to a StringNode. I will be creating accept() methods for each of the accumulation sources, so I must customize this one to accept an argument of type StringNode:

 public class TextExtractor...
    public String extractText()...
       ...
       for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          node = e.nextNode();
          if (node instanceof StringNode) {
             accept((StringNode) node);
          } else if (...

    private void accept(StringNode stringNode)...
       if (!isScriptTag) {
          // removed: StringNode stringNode = (StringNode) node;
          ...

After compiling and testing, I repeat this step for all other accumulation sources. This yields the following code:

 public class TextExtractor...
    public String extractText()...
       for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          node = e.nextNode();
          if (node instanceof StringNode) {
             accept((StringNode) node);
          } else if (node instanceof LinkTag) {
             accept((LinkTag) node);
          } else if (node instanceof EndTag) {
             accept((EndTag) node);
          } else if (node instanceof Tag) {
             accept((Tag) node);
          }
       }
       return (results.toString());
    }

3. Now I apply Extract Method [F] on the body of the accept(StringNode stringNode) method to produce a visitStringNode() method:

 public class TextExtractor...
    private void accept(StringNode stringNode) {
       visitStringNode(stringNode);
    }

    private void visitStringNode(StringNode stringNode) {
       if (!isScriptTag) {
          if (isPreTag)
             results.append(stringNode.getText());
          else {
             String text = Translate.decode(stringNode.getText());
             if (getReplaceNonBreakingSpace())
                text = text.replace('\u00a0', ' ');
             if (getCollapse())
                collapse(results, text);
             else
                results.append(text);
          }
       }
    }

After compiling and testing, I repeat this step for all of the accept() methods, yielding the following:

 public class TextExtractor...
    private void accept(Tag tag) {
       visitTag(tag);
    }

    private void visitTag(Tag tag)...

    private void accept(EndTag endTag) {
       visitEndTag(endTag);
    }

    private void visitEndTag(EndTag endTag)...

    private void accept(LinkTag link) {
       visitLinkTag(link);
    }

    private void visitLinkTag(LinkTag link)...

    private void accept(StringNode stringNode) {
       visitStringNode(stringNode);
    }

    private void visitStringNode(StringNode stringNode)...

4. Next, I apply Move Method [F] to move every accept() method to the accumulation source with which it is associated. For example, the following method:

 public class TextExtractor...
    private void accept(StringNode stringNode) {
       visitStringNode(stringNode);
    }

is moved to StringNode:

 public class StringNode...
    public void accept(TextExtractor textExtractor) {
       textExtractor.visitStringNode(this);
    }

and the original accept() method in TextExtractor is adjusted to delegate to StringNode like so:

 public class TextExtractor...
    private void accept(StringNode stringNode) {
       stringNode.accept(this);
    }

This transformation requires modifying TextExtractor so its visitStringNode(…) method is public. Once I compile and test that the new code works, I repeat this step to move the accept() methods for Tag, EndTag, and LinkTag to those classes.

5. Now I can apply Inline Method [F] on every call to accept() within extractText():

 public class TextExtractor...
    public String extractText()...
       for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          node = e.nextNode();
          if (node instanceof StringNode) {
             ((StringNode) node).accept(this);
          } else if (node instanceof LinkTag) {
             ((LinkTag) node).accept(this);
          } else if (node instanceof EndTag) {
             ((EndTag) node).accept(this);
          } else if (node instanceof Tag) {
             ((Tag) node).accept(this);
          }
       }
       return (results.toString());
    }

    // The private accept(Tag), accept(EndTag), accept(LinkTag), and
    // accept(StringNode) wrappers are no longer called and are deleted.

I compile and test to confirm that all is well.

6. At this point, I want extractText() to call accept() polymorphically, rather than having to type-cast node to call the appropriate accept() method for each accumulation source. To make that possible, I apply Unify Interfaces (343) on the superclass and associated interface for StringNode, LinkTag, Tag, and EndTag:

 public interface Node...
    public void accept(TextExtractor textExtractor);

 public abstract class AbstractNode implements Node...
    public void accept(TextExtractor textExtractor) {
    }

7. Now I can change extractText() to call the accept() method polymorphically:

 public class TextExtractor...
    public String extractText()
       ...
       for (NodeIterator e = parser.elements(); e.hasMoreNodes();) {
          node = e.nextNode();
          node.accept(this);
       }

Compiling and testing confirms that everything is working.

8. At this point, I extract a visitor interface from TextExtractor like so:

 public interface NodeVisitor {
    public abstract void visitTag(Tag tag);
    public abstract void visitEndTag(EndTag endTag);
    public abstract void visitLinkTag(LinkTag link);
    public abstract void visitStringNode(StringNode stringNode);
 }

 public class TextExtractor implements NodeVisitor...

9. The final step is to change every accept() method so it takes a NodeVisitor argument rather than a TextExtractor:

 public interface Node...
    public void accept(NodeVisitor nodeVisitor);

 public abstract class AbstractNode implements Node...
    public void accept(NodeVisitor nodeVisitor) {
    }

 public class StringNode extends AbstractNode...
    public void accept(NodeVisitor nodeVisitor) {
       nodeVisitor.visitStringNode(this);
    }

 // etc.

I compile and test to confirm that TextExtractor now works beautifully as a Visitor. This refactoring has paved the way for additional refactorings to Visitor in the parser, none of which I will do without first taking a nice, long break.
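To illustrate how the NodeVisitor interface opens the door to further visitors, here is a self-contained sketch with simplified stand-ins for the parser classes. The LinkCollector visitor is hypothetical, and the stand-in hierarchy (EndTag and LinkTag extending Tag) is an assumption made for this sketch rather than a statement about the real parser.

```java
import java.util.ArrayList;
import java.util.List;

interface NodeVisitor {
    void visitTag(Tag tag);
    void visitEndTag(EndTag endTag);
    void visitLinkTag(LinkTag link);
    void visitStringNode(StringNode stringNode);
}

interface Node {
    void accept(NodeVisitor nodeVisitor);
}

abstract class AbstractNode implements Node {
    // Do-nothing default, as produced by Unify Interfaces in step 6.
    public void accept(NodeVisitor nodeVisitor) { }
}

class StringNode extends AbstractNode {
    private final String text;
    StringNode(String text) { this.text = text; }
    String getText() { return text; }
    @Override
    public void accept(NodeVisitor nodeVisitor) { nodeVisitor.visitStringNode(this); }
}

class Tag extends AbstractNode {
    private final String tagName;
    Tag(String tagName) { this.tagName = tagName; }
    String getTagName() { return tagName; }
    @Override
    public void accept(NodeVisitor nodeVisitor) { nodeVisitor.visitTag(this); }
}

class EndTag extends Tag {
    EndTag(String tagName) { super(tagName); }
    @Override
    public void accept(NodeVisitor nodeVisitor) { nodeVisitor.visitEndTag(this); }
}

class LinkTag extends Tag {
    private final String link;
    LinkTag(String link) { super("A"); this.link = link; }
    String getLink() { return link; }
    @Override
    public void accept(NodeVisitor nodeVisitor) { nodeVisitor.visitLinkTag(this); }
}

// Hypothetical second visitor: collects URLs and ignores all other nodes.
// No node class has to change to support it.
class LinkCollector implements NodeVisitor {
    private final List<String> links = new ArrayList<>();
    public void visitTag(Tag tag) { }
    public void visitEndTag(EndTag endTag) { }
    public void visitLinkTag(LinkTag link) { links.add(link.getLink()); }
    public void visitStringNode(StringNode stringNode) { }
    List<String> getLinks() { return links; }
}
```

The visit loop is the same one extractText() ends up with: iterate over the nodes and call node.accept(visitor) on each.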



Refactoring to Patterns (The Addison-Wesley Signature Series)
Refactoring to Patterns
ISBN: 0321213351
EAN: 2147483647
Year: 2003
Pages: 103
