Refactoring


I mentioned refactoring previously as the third step in the general process of TDD, and I think I should briefly explain the term a bit more. Refactoring is about making small, well-known changes step-by-step in order to improve the design of existing code. That is, you make changes to improve the maintainability of the code without changing its observed behavior. Another way to say it is to change how, not what.

The term "refactoring" has become something of a buzzword; it's much overused and also misused. For instance, instead of saying that we are going to make loads of significant and ground-breaking changes to our code base, we say that we will refactor it. That's not, however, the central meaning of the term.

We talked quite a lot about what refactoring is and why you should use it in Chapter 1. In this section about refactoring, we'll take a look at how it's used. We will start with some basic refactorings for routine cleaning. Then we'll talk briefly about how productivity could gain from new tooling support. Finally, we'll refactor to, toward, or from patterns for preventing the occurrence of maintainability issues.

Let's Clean Some Smelly Code

We did discuss some refactorings in the previous TDD example, but I think it's time to have a look at another refactoring example. Let's start with the following piece of code, a Person class:

public class Person {       public string Name = string.Empty;       public DateTime BirthDate = DateTime.MinValue;       public IList Children = new ArrayList();       public int HowOld()       {             if (BirthDate != DateTime.MinValue &&                   BirthDate < DateTime.Now)             {                   int years = DateTime.Now.Year -                         BirthDate.Year;                   if (DateTime.Now.DayOfYear <                         BirthDate.DayOfYear)                         years--;                   return years;             }             else                   return 0;        } }


Smelly Code

I have talked about smelly code many times already by now, and code smells will be the focus of our refactoring efforts for the coming pages. Therefore, I'd like to point out that Martin Fowler and Kent Beck presents a list of code smells in the Refactoring book [Fowler R].

A typical example, and also the first one, is the code smell called Duplicated Code.


Routine Cleanings

The class we just saw, Person, is a nice little class that probably solves the problem at hand. Deducing from the nature of the class, the requirement was to track names and birthdates of certain persons, who their children are and how old these people are. Again, let's assume that the requirements are fulfilled. That's a good thing, but we have started to add to our technical debt because the code is a bit messy, and that's where refactoring comes in. We'll clean up the code with a couple of examples which I have called "Routine Cleanings," each of which is based on Fowler's work [Fowler R].

Routine Cleaning Example One: Encapsulate Field

The target of the first example is the Name-field. It's currently a public string and that might be too much for some to take. Let's transform it from this:

public string Name = string.Empty;


into something like this:

//Person private string _name = string.Empty; public string Name {       get {return _name;} } public Person(string name) {       _name = name; } 


I said that the requirements were fulfilled before, didn't I? Well, they were, kind of; it was possible to set the Name and read it. However, the requirements could be fulfilled even more successfully. The read-only aspect of the Name field wasn't addressed before, but it is now.

Are we done, or could we do better? Well, I think it could be even better to use a public readonly field instead, like this:

//Person public readonly string Name; public Person(string name) {         Name = name; }


I think that was even clearer, but unfortunately, there are drawbacks as well. The one I'm thinking about is that it's not possible to set the Name via reflection, which is important for many tools, such as many Object Relational (O/R) Mappers. In this case, I need the option of setting the Name via reflection, so I decide to go for the get-based version instead.

That reminds me...does all my consumer code still run? After all, the observable behavior of the code has actually changed slightly because I require the Name to get its value via the constructor. I get compiler errors, so that's simple to sort out; but are we done?

Nope, we aren't done. Assume the following snippet:

//Some consumer public void SetNameByReflection() {       Person p = new Person();       FieldInfo f = typeof(Person).GetField("Name",             BindingFlags.Instance | BindingFlags.Public);       f.SetValue(p, "Jimmy");       ...


That snippet sets the Name field via reflection (you might wonder why, but that's not important here). This won't work any longer, but the compiler won't detect the problem. Hopefully we have written automatic tests that will detect the problem.

So the lesson learned is that refactoring requires automatic tests! As a matter of fact, refactoring and TDD are symbiotic. As I said at the start of this chapter, the TDD mantra is Red-Green-Refactor, Red-Green-Refactor....

(Well, refactoring doesn't require TDD; it requires automatic tests. On the other hand, using TDD is a great way of getting those automatic tests to be written at all.)

Routine Cleaning Example Two: Encapsulate Collection

The second problem I'd like to address is the public list of children, which looked like this:

public IList Children = new ArrayList();


This solution means that the consumer can add to and delete from the list without the parent knowing about it. (To be fair, that's not a violation to any requirements here, but I still dislike it.) It's also possible to add strings and ints instead of Person-instances. It's even the case that the consumer can swap the whole list.

A typical solution is to create a type safe Children list class, with an Add() method that only accepts Person-instances. That addresses the problem of lack of type safety. It could also address the problem of letting the parent know about the adds and deletes, but it won't address the problem of swapping the whole list. That can be solved with Encapsulate Field refactoring for the list itself.

The problem with this solution is that you have to write some dummy code that you don't really want to have for getting that type safe list. It's certainly doable, but it's not the style I currently prefer. If we target a platform with generics (such as .NET 2.0), we can avoid the lack of type safety in a nicer way.

That said, with or without generics, I prefer the style of Encapsulate Collection refactoring. First, you use Encapsulate Field refactoring so that you get the following code:

//Person private IList _children = new ArrayList(); public IList Children {       get {return _children;} }


At least the consumer can't swap the whole list any longer, but the consumer can still add whatever he wants to the list and delete items as well. Therefore, the next step is to hand out a wrapped list to the consumer like this:

get {return ArrayList.ReadOnly(_children);}


Finally, we need to give the consumer a way of adding elements, so I add an AddChild() method to the Person class like this:

//Person public void AddChild(Person child) {       _children.Add(child); }


Then we have the type safety as well, and the parent knows about when new instances are added so it can intercept that action (if it's needed, whichto be fairit actually isn't in this particular example).

Unfortunately, this has changed the visible behavior of the Person class, and that has to be weighed in as a factor as to whether you can go ahead or not. In this case, most of the needed consumer changes aren't found by the compiler. Luckily, we have some automatic tests in place, standing in sort of as a second compiler layer, detecting the problem. The test that looked like this helped out:

[Test] public void CanCalculateNumberOfKids() {       Person p = new Person("Stig");       p.Children.Add(new Person("Ulla"));       p.Children.Add(new Person("Inga"));       Assert.AreEqual(2, p.Children.Count); }


Note

To be very clear, as you saw here, the test didn't point out the real consumer code that needed to be changed. It was just some test code of the Person-class. Tests of the consumer are needed to point out necessary changes of the consumer code itself.


The p.Children.Add() lines are easily changed so that I get this instead:

[Test] public void CanCalculateNumberOfKids() {       Person p = new Person("Stig");       p.AddChild(new Person("Ulla"));       p.AddChild(new Person("Inga"));       Assert.AreEqual(2, p.Children.Count); }


It's kind of unfortunate that the IList has an Add() method even though it has been wrapped as a read-only list, but tests easily catch the problem. To solve that, I can change Children to be an ICollection instead.

Is there any more to do? We need to do something about the HowOld() method. It's not overly clear, is it?

Routine Cleaning Example Three: Extract Method

Perhaps you have another, much better solution to the problem at hand? Before thinking about a better solution, I'd like to simplify the code, making it easier to read and understand. Refactoring is in itself actually a way of trying to understand the code. That's a nice side effect, and the better we understand it, the easier we can find simplifications and better solutions.

This is what we have:

//Person public int HowOld() {       if (BirthDate != DateTime.MinValue &&                     BirthDate < DateTime.Now);       {                int years = DateTime.Now.Year - BirthDate.Year;                if (DateTime.Now.DayOfYear < BirthDate.DayOfYear)                      years--;                return years;       }       else                return 0 }


First, there are two different situations that will return 0. That's not good. I prefer to throw an exception in the incorrect case.

Next, I'd like to take out that non-intuitive logical expression by using the Extract Method refactoring to get an explaining method name instead. So we go from

   if (BirthDate != DateTime.MinValue &&          BirthDate < DateTime.Now) to    if (_CorrectBirthDate())


Almost no matter what the logical expression was, its intention in the context of the HowOld() method is now much clearer.

Another style now comes to mind, that of Guard clauses, which means that I should take away the unusual, almost never happens situation first. In this way, I am letting go of the else clause, and the code now reads like this:

if (!_CorrectBirthDate())      throw new ArgumentException("Incorrect birthdate!");


Hmmm... Perhaps it's even better to change it to

if (_IncorrectBirthDate())       throw new ArgumentException("Incorrect birthdate!");


I think this is slightly clearer and more fluent.

What's left is the calculation itself when we have the unusual situation taken care of. It looks like this:

int years = DateTime.Now.Year - BirthDate.Year; if (DateTime.Now.DayOfYear < BirthDate.DayOfYear)       years--; return years;


Yet again I use the Extract Method refactoring, moving the calculation to another private method called _NumberOfYears(). The complete HowOld() method then looks like this:

//Person public int HowOld() {       if (_IncorrectBirthDate())             throw new ArgumentException("Incorrect birthdate!");       return _NumberOfYears(); }


Sure, we could find more to do (for example in the newly created private method _NumberOfYears() or factoring out some logic from the class itself), but you get the message by now.

It's important to stress that we should re-execute the tests after each step to confirm that we didn't introduce any errors, according to the tests at least.

What we did, though, was time-consuming and error prone. If you use lots of discipline, it is possible to deal with the error proneness, but it's still time-consuming. Having the concepts in place, but finding out that they are time-consuming to apply is a perfect situation for using the support of a tool, if you ask me.

Cleaning Tool

There are several different refactoring tools around. If we focus on .NET, we have, for example, ReSharper [ReSharper] and Refactor! [Refactor!], which are add-ins to Visual Studio .NET. In the 2005 edition of Visual Studio .NET, there is also built-in refactoring support in the IDE.

Those refactoring tools make the changes similar to what we just made in order to help us become more productive. The value of this should certainly not be underestimated. As a matter of fact, when you get used to a refactoring tool, there's no way you will want to give it up!

The tools not only help you make the changes you want, they can also tell you about when and what changes need to be made. In this way, the tools free you to some small degree from some of the discipline of having to think about refactoring by providing a visual reminder.

Note

Do you remember that I said that a failing test is red and a test that executes successfully is green, but that a need for refactoring doesn't have a color? As a matter of fact, the refactoring tool ReSharper uses orange to tell you that refactoring is needed (according to ReSharper, that is). So before starting a new iteration, you should also see to it that you change the orange into green by cleaning up the code before moving on. Of course, this won't make it possible for you to stop thinking about what code is smelly yourself; it's just a help.


When you see a developer in the flow with a good refactoring tool in his hands, the speed at which he moves is just amazing, and without creating bad code. It's rather the opposite regarding the code quality!

So far we have mostly dealt with simple stuff. There's nothing wrong with that, of course, but from time to time you just aren't happy with the code even after cleaning up the small things. It might also be that you have a situation where there is a steady stream of new problems that needs to be cleaned up.

Prevent Growing Problems

Perhaps you think that using patterns is the natural solution to that problem. The problem with patterns is that they are traditionally used for up-front design, which often means too much guesswork (whereas refactoring, as we have discussed, is used later on during development). Should you choose patterns or refactoring?

Before discussing that any further, I think a short example of another pattern is in order. I'd like to discuss the Template Method pattern [GoF Design Patterns]. The idea is that you define the overall algorithm, or template method, in a base class like this:

public abstract class TotalBase {       public void TotalAlgorithm()       {             _DoSomeStuff();             VariationPoint();             _DoSomeMoreStuff();       }       protected abstract void VariationPoint(); }


Now the subclasses can inherit from TotalBase and provide a custom implementation of the VariationPoint() method. It could look like this:

public class TotalSub : TotalBase {       protected override void VariationPoint()       {             Console.WriteLine("Hello world!");       } }


The consumer code isn't aware of how the algorithm is created. It doesn't even know about the base class, only the subclass. The consumer code could look like this:

//Some consumer TotalSub t = new TotalSub(); t.TotalAlgorithm();


Note that you decide in the base class if the hook should be optional or mandatory. In the previous example, I used a mandatory one because I used abstract for the VariationPoint() method. With virtual, it would have been optional instead, such as when you want to make it optional for subclasses to define a piece of the total template method.

Let's say we decide up front to use Template Method, which then gives us the problem of deciding where we should make variations possible or forced. That's a very tough decision.

Nevertheless, patterns are extremely useful, and we don't have to choose; we can use both patterns and refactoring. As a matter of fact, I think that's the way to go. Refactoring toward, to, or from patterns! This is what Kerievsky talks about in his book Refactoring to Patterns [Kerievsky R2P].

Let's see this in action with a fourth example, which will connect to the pattern we just talked about.

Prevent Growing Problems Example One: Form Template Method

Let's assume that we have written a batch monitor in which we can hook in different programs. For a program to be a batch program, it has to adjust to some rules, and those are basically to tell the batch monitor (by some custom logging routines) when the program starts and finishes.

After a while, we will probably have written a couple of batch programs, and we will also have decided to reuse a base class for each batch program so that we don't have to write the log code over and over again. An example of the Execute() method in a subclass might look like this:

//A subclass to the batchprogram baseclass public void Execute() {       base.LogStart();       _DoThis();       _DoThat();       base.LogEnd(); }


What is smelly here? As you see, the subclass is responsible for calling up to the base class at certain points in the execution of the Execute() method. I think that's too loose a contract for the subclass; the developer of the subclass has to remember to add those calls and to do it at the right places. Another thing that smells here (which is more obvious in a more realistic example where there are many log spots, such as after introduction, after first phase, after second phase and so on) is that within the core method, there's a lot of infrastructure code, that is, the log calls. Because of that, the calls to the interesting methods, such as _DoThis() and _DoThat(), aren't as obvious as they should be.

As I see it, this code snippet is crying out for Template Method. So let's use the refactoring called Form Template Method [Kerievsky R2P].

The basic idea is to not have a public Execute() in the subclasses any longer, but to move that responsibility to the superclass. First let's move that Execute() method to the superclass. It could then look like this:

//The batchprogram baseclass public void Execute() {        _LogStart();        DoTheWork();        _LogEnd(); }


The _LogStart() and _LogEnd() methods are now private instead of protected. The DoTheWork() method is defined in the following way in order to force the subclasses to implement it:

protected abstract void DoTheWork();


Finally, the DoTheWork() implementation in the subclass could look like this:

//A subclass to a batchprogram baseclass protected override void DoTheWork() {          _DoThis();          _DoThat(); }


This is pretty much the other way around. Instead of the subclass calling up to the superclass, now the superclass calls down to the subclass. If the variation points are good, then this is a very powerful solution, yet clean and simple! The subclass can totally forget about the infrastructure (such as logging) and focus on the interesting and specific stuff, which is to implement DoTheWork(). (Again, remember that this was a scaled-down version. In the real-world case that this was based on, there were several methods that the subclasses could implement, not just one.)

Another problem where it is hard to know whether to use the solution up front or not is the State pattern [GoF Design Patterns]. It's often a matter of over-design if you apply it from the beginning. Instead, it's often better to wait until you know that it's a good idea because you have the problem. But I talked quite a lot about the State pattern in Chapter 2, "A Head Start on Patterns," so I'm sure you have a good feeling for how to approach it.

So we have discussed refactoring to patterns a bit now. I mentioned refactoring from patterns as well, so let's take one such example.

Prevent Growing Problems Example Two: Inline Singleton

More and more often the Singleton pattern [GoF Design Patterns] is considered a worst practice. The reason for that is you create global instances, which are often a problem in themselves. Another problem is that singletons often make the tests harder to write (remember, testability is crucial), and the interaction with the singletons isn't very clear in the code.

Let's assume you have a code snippet that goes like this:

public void DoSomething() {     DoThis();     DoThat();     MySingleton.Instance().Execute(); }


The refactoring called Inline Singleton [Kerievsky R2P] doesn't go "exactly" like this, but this is a variant.

Instead of letting the DoSomething() method call the global variable (the singleton itself), the DoSomething() method could ask for the instance as a parameter.

Another typical solution is for DoSomething() to expect to find the instance as a private member that has been injected at construction time or via a setter before the call to DoSomething() is made. Then DoSomething() goes like this:

public void DoSomething() {     DoThis();     DoThat();     _nowMyPrivateMember.Execute(); }


So the singleton is gone, and testability is much better!

Note

To be fair, singletons can be used without affecting testability in the previous code if the singleton is injected by the consumer. Then you can use a test object during testing, and the real singleton during runtime.


Problems? Well, the most apparent problem is that if you need the value way down in the call stack, the parameter is sent a long way before it's used. On the other hand, it's very clear where the value is used, and this might be a sign of smelly code in itself that might need some refactoring. So singletons might actually hide smelly code.

It's also the case that Dependency Injection (which will be discussed in Chapter 10) will gracefully reduce the problem of too many singletons.




Applying Domain-Driven Design and Patterns(c) With Examples in C# and  .NET
Applying Domain-Driven Design and Patterns: With Examples in C# and .NET
ISBN: 0321268202
EAN: 2147483647
Year: 2006
Pages: 179
Authors: Jimmy Nilsson

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net