The Sieve | Test-Driven Development in Microsoft .NET (Microsoft Professional)

The code we will refactor implements an algorithm to generate small prime numbers (say up to 10,000,000). The algorithm is called the Sieve of Eratosthenes . Make a list of all the integers less than or equal to n (and greater than one). Strike out the multiples of all primes less than or equal to the square root of n; the numbers that are left are the primes ( http://primes.utm.edu/glossary/page.php? sort =SieveOfEratosthenes ).

The existing implementation is shown here:

 using System; using System.Collections; public class Primes {     public static ArrayList Generate(int maxValue)     {         ArrayList result = new ArrayList();         int[] primes = GenerateArray(maxValue);         for(int i = 0; i < primes.Length; ++i)             result.Add(primes[i]);         return result;     }     [Obsolete("This method is obsolete, use Generate instead")]      public static int[] GenerateArray(int maxValue)     {         if(maxValue >= 2)         {             // declarations             int s = maxValue + 1; // size of array             bool[] f = new bool[s];             int i;             // initialize the array to true             for(i=0; i<s; i++)                 f[i] = true;             // get rid of known nonprimes             f[0] = f[1] = false;             // sieve             int j;             for(i=2; i<Math.Sqrt(s)+1; i++)             {                 for(j=2*i; j<s; j+=i)                     f[j] = false; // multiple is not prime             }             // how many primes are there?             int count = 0;             for(i=0; i<s; i++)                 if(f[i]) // if prime                     count++; // bump count             int[] primes = new int[count];             // move the primes into the result             for(i=0, j=0; i<s; i++)             {                 if(f[i]) // if prime                     primes[j++] = i;             }             return primes;         } // maxValue >= 2         else             return new int[0]; // return null array     } }

As you can see from the code, there are two methods defined to generate prime numbers. The first method, Generate , returns the prime numbers in an ArrayList . The second method, GenerateArray , was written to return an array of integers. The GenerateArray method is also marked with the Obsolete attribute, which is usually an indicator that the code will be removed when possible. It turns out that today is the day we will remove this function because the GenerateArray method is no longer called by the application code but it is still called by the Generate method. It looks like we won t be able to just delete it. Luckily, the code has a set of tests written using NUnit for it:

 using System; using System.Collections; using NUnit.Framework; [TestFixture] public class PrimesFixture {     private int[] knownPrimes = new int[]     { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };     [Test]     public void Zero()     {         int[] primes = Primes.GenerateArray(0);         Assert.AreEqual(0, primes.Length);     }     [Test]     public void ListZero()     {         ArrayList primes = Primes.Generate(0);         Assert.AreEqual(0, primes.Count);     }     [Test]     public void Single()     {         int[] primes = Primes.GenerateArray(2);         Assert.AreEqual(1, primes.Length);         Assert.AreEqual(2, primes[0]);     }     [Test]     public void ListSingle()     {         ArrayList primes = Primes.Generate(2);         Assert.AreEqual(1, primes.Count);         Assert.IsTrue(primes.Contains(2));     }     [Test]     public void Prime()     {         int[] centArray = Primes.GenerateArray(100);         Assert.AreEqual(25, centArray.Length);         Assert.AreEqual(97, centArray[24]);     }     [Test]     public void ListPrime()     {         ArrayList centList = Primes.Generate(100);         Assert.AreEqual(25, centList.Count);         Assert.AreEqual(97, centList[24]);     }     [Test]     public void Basic()     {         int[] primes =              Primes.GenerateArray(knownPrimes[knownPrimes.Length-1]);         Assert.AreEqual(knownPrimes.Length, primes.Length);                  int i = 0;         foreach(int prime in primes)             Assert.AreEqual(knownPrimes[i++], prime);     }     [Test]     public void ListBasic()     {         ArrayList primes =              Primes.Generate(knownPrimes[knownPrimes.Length-1]);         Assert.AreEqual(knownPrimes.Length, primes.Count);         int i = 0;         foreach(int prime in primes)             Assert.AreEqual(knownPrimes[i++], prime);     }     [Test]     public void Lots()     {         int bound = 10101;         int[] primes = Primes.GenerateArray(bound);         foreach(int prime in primes)             Assert.IsTrue(IsPrime(prime), "is prime");         foreach(int prime in primes)         {             if(IsPrime(prime))                 Assert.IsTrue(Contains(prime, primes),                      "contains primes");             else                 Assert.IsFalse(Contains(prime, primes),                      "doesnt contain composites");         }     }     [Test]     public void ListLots()     {         int bound = 10101;         ArrayList primes = Primes.Generate(bound);         foreach(int prime in primes)             Assert.IsTrue(IsPrime(prime), "is prime");         foreach(int prime in primes)         {             if(IsPrime(prime))                 Assert.IsTrue(primes.Contains(prime),                      "contains primes");             else                 Assert.IsFalse(primes.Contains(prime),                      "doesnt contain composites");         }     }     private static bool IsPrime(int n)     {         if(n < 2) return false;          bool result = true;         double x = Math.Sqrt(n);         int i = 2;         while(result && i <= x)         {             result = (0 != n % i);             i += 1;         }         return result;     }     private static bool Contains(int value, int[] primes)     {              return (Array.IndexOf(primes, value) != -1);     } }

Before Refactoring the Code: Make Sure It All Works

It is important to remember that refactoring has to be done in conjunction with running tests for the code being refactored. After all, refactoring is not supposed to change the externally observable functionality of the code being refactored. The tests are the tools needed to verify such functionality. So the first step of the refactoring process is to run the tests before you make any code changes.

Let s run the tests. All of them pass, so we can begin from a known good state.

Refactoring Cycle

The cycle we will follow is straightforward: Identify a problem, select a refactoring to address the problem, apply the refactoring by making the appropriate code change; compile and run the tests; repeat. The emphasis is on the code changes being very small ”and running the tests. Why small changes? We transition the system from a known good state to the next desirable state.

Think of it as climbing a wall. If the wall is high, you might break your neck attempting to climb it, but you could use a ladder to assist you. With a ladder in place if you feel tired , you can just stop and rest. The tests are your ladder ”they are both your safety net and a climbing tool. So, before you start climbing, what should you do? Do yourself a favor: Make sure that your ladder is not broken. This brings us to the following rule for refactoring:

Important

As you refactor your code, make sure that the tests are up-to-date. If you need to change them to reflect the changing requirements, do it first.

In short, maintain your ladder. Let s take a look at the tests.

Refactoring 0: Remove Unneeded Code

There are five test methods for the array -based version and five test methods for the ArrayList version. Because the GenerateArray method is being removed, it appears that we can remove the tests for that method. We can do this safely because we are not losing any test coverage by removing the array -based tests. The ArrayList -based tests are exact duplicates in terms of what is being tested .

After the array -based tests are removed, the following tests remain :

ListZero
ListSingle
ListPrime
ListBasic
ListLots

We can also get rid of the utility method Contains because it was used only by the array -based tests. After we finish removing the code, we compile and run the tests. The test method count drops to five and we have a green bar, so it is time to move on.

Refactoring 1: Rename Method

The next refactoring is still in the test code. After we remove the array -based tests, there is no need to preface each method with the word List . We need to implement the Rename method of refactoring. The reasoning is that you should call an apple an apple; no need to call it a green apple unless the greenness of the apple is of the essence. Meaningful method names are important for code readability and in turn its overall maintainability. In short, method names should convey their intentions.

Here is the test code after each method has been renamed ; the contents of the methods have not changed, so they are not shown here:

 using System; using System.Collections; using NUnit.Framework; [TestFixture] public class PrimesFixture {     private int[] knownPrimes = new int[]     { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29 };     [Test]     public void Zero()     {       //      }     [Test]     public void Single()     {       //      }     [Test]     public void Prime()     {       //      }     [Test]     public void Basic()     {       //      }     [Test]     public void Lots()     {       //      }     private static bool IsPrime(int n)     {       //      } }

In this case, the renaming of the methods is straightforward. There should be no code calling the test methods. The more general case is a bit more complicated because you might have to change the callers of the method being renamed. Modern development environments make it easier to accomplish this task and pretty much take care of the process of finding and replacing the method names. If you are using a simple text editor, you might let your compiler tell you which classes you need to fix (which is crude, but it works). As always, after we make the changes, we compile and run the tests. The tests passed, so it s time to continue.

Refactoring 2: Add a Test

The Single test method verifies that 2 is a prime number; the Zero test method verifies that 0 is not a prime number. What about the number 1? We should also have a test that ensures that 1 is not a prime number.

We add a new test named ZeroOne and rename the Single method to be ZeroTwo to reflect the range of values being tested:

 [Test]     public void ZeroOne()     {         ArrayList primes = Primes.Generate(1);         Assert.AreEqual(0, primes.Count);     }     [Test]     public void ZeroTwo()     {         ArrayList primes = Primes.Generate(2);         Assert.AreEqual(1, primes.Count);         Assert.IsTrue(primes.Contains(2));     }

Now we have three methods that test the special cases of 0, 1, and 2. They look very similar, but it is not apparent how to factor out any commonality . When we compile and run all the tests, they succeed. We now have six tests.

When Are We Finished?

At every step of the way, an important question to ask is Am I finished? By the very nature of moving from a known good state to the next state, it is possible to stop at any time. What is there left to do? Why didn t we stop after removing the array -based tests? Because we immediately saw what we could not possibly see before: the method names could be improved. Each simple refactoring we implement opens up opportunities for further refactorings to make the code communicate its intentions more clearly.

What makes the process interesting is that it is a process of discovery. We probably don t know what refactoring we ll implement next. We also don t create a grand plan of 1001 refactorings that are needed to make this code better. We let the code itself drive the process. The code tells us which refactoring is needed at the appropriate time, and it evolves gradually into the shape it wants to take over time. The answer for now is that we are not done. We have not removed the array -based implementation. However, we are done with refactoring the test code.

Refactoring 3: Hide Method

Let s look at the code that generates the prime numbers. The GenerateArray method is used internally by the Generate method. There is no need to keep it public any more. We ll implement the Hide method refactoring, which is quite simple. In C#, it is accomplished by changing the visibility of the method from public to private . The following code

 public static int[] GenerateArray(int maxValue)

now becomes

 private static int[] GenerateArray(int maxValue)

Why did we do this refactoring? The less the code promises, the easier it is to deliver. The GenerateArray method now becomes an implementation detail. The code compiles and the tests pass, so let s move on to the next step.

Refactoring 4: Replace Nested Conditional with Guard Clauses

Large monitors with high resolutions allow you to see many more lines of code onscreen than you could a few years ago. But you won t make many friends if you continue to write (or tolerate ) code like the GenerateArray method.

What is the biggest problem with the GenerateArray method? Let s distill it down to the essence:

 if(maxValue >= 2)       {             pages and pages of code that wont fit on your screen             return primes;       } // maxValue >= 2       else             return new int[0]; // return null array

The problem is that when you finally get to the else statement, the if statement has probably scrolled off the screen, so you do not have the context in which the statement is being executed. One way to correct this problem is to use the Replace nested conditional with a guard clause refactoring. Employing a guard clause at the beginning of the method dispenses with the bad input and focuses the method on processing the good input. Changing the method to use a guard clause looks like this:

 if(maxValue < 2) return new int[0];     the rest of the code here.

Those of you who subscribe to one of the major tenets of structured programming (single entry point/single exit point) are probably jumping out of your chair . The reason this other approach is all right in this situation is because the guard clause identifies a rare situation that can be handled immediately. This frees up the rest of the code to handle the typical calling scenario without having to worry about the rare or invalid situations. In short, with the guard clause in place, the code is easier to read. After we insert the guard clause, the code compiles and the tests pass.

Refactoring 5: Inline Method

Now that the only code that calls GenerateArray is the Generate method, we can use the Inline method refactoring to put the method s body into the body of its caller and completely remove the method. This is not a license to create huge methods. If we intended to stop refactoring after inlining this method, we would argue to not inline the method.

The point that needs to be stressed is communication. If it makes sense to inline a method because it communicates the intent better than it did previously, you should do it. It also decreases the surface area of the code, which should improve its testability if you don t have huge methods. Because the Generate method returns an ArrayList and the GenerateArray method returns an array , we will need to slightly alter the guard clause introduced in the previous step to return an empty ArrayList instead of an empty array . Here is the Generate method after inlining the GenerateArray method (the modified guard clause is in boldface):

 public static ArrayList Generate(int maxValue)     {         ArrayList result = new ArrayList();  if(maxValue  <  2) return result;  // declarations         int s = maxValue + 1; // size of array         bool[] f = new bool[s];         int i;                  // initialize the array to true         for(i=0; i<s; i++)             f[i] = true;                  // get rid of known nonprimes         f[0] = f[1] = false;                  // sieve         int j;         for(i=2; i<Math.Sqrt(s)+1; i++)         {             for(j=2*i; j<s; j+=i)                 f[j] = false; // multiple is not prime         }                  // how many primes are there?         int count = 0;         for(i=0; i<s; i++)             if(f[i]) // if prime                 count++; // bump count                  int[] primes = new int[count];                  // move the primes into the result         for(i=0, j=0; i<s; i++)         {             if(f[i]) // if prime                 primes[j++] = i;         }                  for(i = 0; i < primes.Length; ++i)             result.Add(primes[i]);         return result;     }

This refactoring often requires more effort due to local variable name clashes . When performing this refactoring, you will find it useful to comment out the method that is being inlined instead of deleting it. After the code compiles and the tests pass, you can safely delete the commented-out code, which is useful to go back to in case your tests do not pass. You could also use your source-code control system to achieve the same benefit.

The code compiles, and the tests pass. The GenerateArray function has now been removed (or to be more exact, consumed, by the Generate method). Remember, this was the objective of the task. We could stop right now and be finished. However, we are still left with the legacy of the array -based implementation, which is filled with bad variable names and loops that iterate over the list of numbers many times. We need to do some more work to get this code in better shape.

Refactoring 6: Rename Variable

Looking at the code in the Generate method, we see several variables whose names do not communicate much about their intended uses, so we should give them more descriptive names. For example, what does the variable f mean? Does f indicate that the number is prime or not prime? Let s take a look at the following code snippet to demonstrate the point:

 if(f[i]) // if prime

Instead of having comments in the code describing what the variable f means, it is better to give the variable a more descriptive name. In almost all cases in the existing program, every time the variable f is used there is an associated comment. Let s remove the need for the comment by providing a more descriptive variable name. The name isPrime describes what the variable means in the code more clearly. After the name is changed, we can remove the comment because the variable name is descriptive enough:

 public static ArrayList Generate(int maxValue)     {         ArrayList result = new ArrayList();         if(maxValue < 2) return result;                  // declarations         int s = maxValue + 1; // size of array  bool[] isPrime = new bool[s];  int i;                  for(i=0; i<s; i++)  isPrime[i] = true;   isPrime[0] = isPrime[1] = false;  // sieve         int j;         for(i=2; i<Math.Sqrt(s)+1; i++)         {             for(j=2*i; j<s; j+=i)  isPrime[j] = false;  // multiple is not prime         }                  // how many primes are there?         int count = 0;         for(i=0; i<s; i++)  if(isPrime[i])  count++; // bump count                  int[] primes = new int[count];                  // move the primes into the result         for(i=0, j=0; i<s; i++)         {  if(isPrime[i])  primes[j++] = i;         }                  for(i = 0; i < primes.Length; ++i)             result.Add(primes[i]);         return result;     }

The changes are made, the code compiles, and the tests pass. It does not look as if we are finished, however. The code still has a lot of the remnants of the array -based implementation and it still has many loops that seem as if they all iterate over the same elements.

Refactoring 7: Collapse Loops

Looking at the last few lines of the Generate method, you can see two loops doing almost entirely the same thing. Here is the existing code:

 int[] primes = new int[count];                  // move the primes into the result         for(i=0, j=0; i<s; i++)         {             if(isPrime[i])                 primes[j++] = i;         }                  for(i = 0; i < primes.Length; ++i)             result.Add(primes[i]);

The first loop cycles through the isPrime array to create a new array named primes . The second loop cycles through the primes array to build the list. This is a remnant of the array -based implementation returning an array and the ArrayList function converting it into an ArrayList . Because we no longer return an array, we can do this without creating the primes array, as follows :

 for(i = 0; i < s; ++i)         {             if(isPrime[i])                 result.Add(i);         }

After this change is made, the code compiles and the test passes .

Refactoring 8: Remove Dead Code

The array -based legacy is almost gone. Because we no longer create the primes array, we no longer need the count variable because it was just used to size the primes array . Therefore, we can get rid of the count variable and the loop that calculates it. Let s move on.

Refactoring 9: Collapse Loops (Again)

Are we done? We could be, but it appears as if a few more changes could make the code a lot clearer, so let s continue for awhile longer.

Look at this loop:

 for(i=2; i<Math.Sqrt(s)+1; i++) {    for(j=2*i; j<s; j+=i)       isPrime[j] = false; // multiple is not prime }

Can we make it better? The algorithm states that you have to remove multiples only if the number is a prime number, so the code is not as efficient as it could be. Try this:

 for(i=2; i<Math.Sqrt(s)+1; i++) {    if(isPrime[i])    {       for(j=2*i; j<s; j+=i)          isPrime[j] = false; // multiple is not prime    } }

We make the change, compile, and run the tests. They pass, so adding this did not have an impact on the functionality, and the code is closer to the intent of the algorithm.

We Can Do Some More

Is the code faster? Probably, but because we do not have a performance test, we do not know the answer to that. However, after we make this change, the two loops at the bottom of the program look very similar; they have the same if statement in them. Perhaps we can collapse the two loops together.

Here s the existing code:

 int j;         for(i = 2; i < Math.Sqrt(s)+1; i++)         {             if(isPrime[i])             {                 for(j=2*i; j<s; j+=i)                     isPrime[j] = false; // multiple is not prime             }         }                                  for(i = 0; i < s; ++i)         {             if(isPrime[i])                 result.Add(i);         }

The boundaries of the loops are different. The first loop iterates over the isPrime array, beginning at 2 and continuing to Math.Sqrt(s) + 1. The second loop iterates over the isPrime array, starting at 0 and continuing all the way to s .

Enough about symbols. Let s look at real numbers. If s were equal to 100, the first loop would execute 10 times, and the second loop would execute 100 times. It looks as if it would be simple to have the second loop start at 2 instead of 0. Let s make that change. All the tests pass, so it works and the lower boundary conditions are now the same.

Now what about the upper boundary? It looks as if we could change the first loop to continue all the way to s . This is clearly less efficient, but (as stated previously) it is hard to say whether that is a problem because the code does not have a performance test. Let s change the code to the following and see whether it works:

 int j;         for(i = 2; i < s; i++)         {             if(isPrime[i])             {                 for(j=2*i; j<s; j+=i)                     isPrime[j] = false; // multiple is not prime             }         }                                  for(i = 2; i < s; i++)         {             if(isPrime[i])                 result.Add(i);         }

All the tests pass, and the loops have identical boundary conditions. It is clearer now, after looking at the code and knowing that the tests run successfully, that we can safely collapse the loops into a single loop.

 int j;         for(i = 2; i < s; i++)         {             if(isPrime[i])             {                 result.Add(i);                 for(j=2*i; j<s; j+=i)                     isPrime[j] = false; // multiple is not prime             }         }

That works ”the tests passed. It is difficult to say that the code is less efficient because we did get rid of the second loop. And we removed a couple of other loops that were used in the array -based implementation, so it is possible that what we have now is more efficient than it used to be. We leave it up to you to verify whether the code performs worse now than it did before we started.

Refactoring 10: Reduce Local Variable Scope

Because of all the previous refactorings, the variable j is now used in only one loop. We can now change its scope by moving its declaration into the loop where it is used:

 for(i = 2; i < s; i++)         {             if(isPrime[i])             {                 result.Add(i);                 for(int j = 2 * i; j < s; j += i)                     isPrime[j] = false; // multiple is not prime             }         }

That works just fine, and the local variable j s scope is diminished.

Refactoring 11: Replace Temp with Query

The next step is to replace the temporary variable s because it does not communicate what it actually means:

 int s = maxValue + 1; // size of array

Instead of a temporary variable, we can replace the variable entirely by using the expression isPrime.Length, which communicates what we really mean and is already provided by the array implementation. The changes are in boldface as follows:

 public static ArrayList Generate(int maxValue)     {         ArrayList result = new ArrayList();         if(maxValue < 2) return result;                  bool[] isPrime = new bool[  maxValue+1  ];         int i;                  for(i = 0; i <  isPrime.Length;  i++)             isPrime[i] = true;                  isPrime[0] = isPrime[1] = false;                  // sieve         for(i = 2; i <  isPrime.Length;  i++)         {             if(isPrime[i])             {                 result.Add(i);                 for(int j = 2 * i; j <  isPrime.Length;  j += i)                     isPrime[j] = false; // multiple is not prime             }         }         return result;     }

Refactoring 12: Remove Dead Code

There still is some code that is not used any more due to the collapse of loops done a few refactorings ago. Because the loop that does the sieve process starts at 2 and we load the list from within that loop, we no longer need to initialize 0 and 1 to false because they are never accessed. We can safely remove the following line:

 isPrime[0] = isPrime[1] = false;

The tests pass when we compile and run them, so it was probably safe to assume that we could remove the line.

Refactoring 13: Extract Method

Even though the code has come a long way, there is still room for improvement, especially for making the code much more explicit about what it is doing. For example, look at the boldface code in the following snippet:

 for(i = 2; i < isPrime.Length; i++) {    if(isPrime[i])    {       result.Add(i);  for(int j = 2 * i; j  <  isPrime.Length; j += i)   isPrime[j] = false; // multiple is not prime  } }

What does the highlighted loop do? It is clear what the loop does; there is a code comment explaining what it does. The comment is a good indicator that the code does not communicate its intent directly. It needs the comment to say what it does.

Note	When you see a block of code with a comment attached to it, it is often a good idea to extract that code into a method and make sure that the method s name conveys the meaning specified by the comment.

Let s extract the boldface code into its own method named RemoveMultiples :

 private static void RemoveMultiples(int prime, bool[] isPrime)     {         for(int j = 2 * prime; j < isPrime.Length; j += prime)             isPrime[j] = false;     }

After the method is extracted, we need to modify the code to use it. Here is the modified code:

 for(i = 2; i < isPrime.Length; i++)         {             if(isPrime[i])             {                 result.Add(i);                 RemoveMultiples(i, isPrime);             }           }

Instead of needing the comment, the method name communicates exactly what it is doing.

Refactoring 14: Extract Method (Again)

The code is getting smaller and smaller with more explicitly named methods and variables; in fact, we can now see that there are two stages in the algorithm: initialization and elimination. Let s extract the elimination portion into a method called Sieve using the Extract method refactoring (the changes are boldface):

 public static ArrayList Generate(int maxValue)     {         ArrayList result = new ArrayList();         if(maxValue < 2) return result;                  bool[] isPrime = new bool[maxValue+1];         int i;                  for(i = 0; i < isPrime.Length; i++)             isPrime[i] = true;  Sieve(isPrime, result);  return result;     }  private static void Sieve(bool[] isPrime, ArrayList result)   {   for(int i = 2; i  <  isPrime.Length; i++)   {   if(isPrime[i])   {   result.Add(i);   RemoveMultiples(i, isPrime);   }   }   }  private static void RemoveMultiples(int prime, bool[] isPrime)     {         for(int j = 2 * prime; j < isPrime.Length; j += prime)             isPrime[j] = false;     }

The code is much more explicit. Before we go on, however, let s make one more change. The Sieve function can return the ArrayList instead of getting it passed to it; as you see here:

 public static ArrayList Generate(int maxValue)     {         if(maxValue < 2) return new ArrayList();                  bool[] isPrime = new bool[maxValue+1];         int i;                  for(i = 0; i < isPrime.Length; i++)             isPrime[i] = true;  return Sieve(isPrime);  }  private static ArrayList Sieve(bool[] isPrime)   {   ArrayList result = new ArrayList();   for(int i = 2; i  <  isPrime.Length; i++)   {   if(isPrime[i])   {   result.Add(i);   RemoveMultiples(i, isPrime);   }   }   return result;  }     private static void RemoveMultiples(int prime, bool[] isPrime)     {         for(int j = 2 * prime; j < isPrime.Length; j += prime)             isPrime[j] = false;     }

Refactoring 15: Reduce Local Variable Scope

Because we extracted a method that used the variable i , we can reduce the scope of the variable in the Generate method. The following code

 int i; for(i=0; i < isPrime.Length; i++)    isPrime[i] = true;

now becomes

 for(  int  i=0; i < isPrime.Length; i++)       isPrime[i] = true;

Even though the step is small, it is still important to compile the code and run the tests. If you don t, you could have a failure a couple of steps ahead and not know exactly what was changed.

Refactoring 16: Convert Procedural Design to Objects

We previously discussed the two steps in the algorithm: initialization and elimination. There is also a variable, isPrime , that is shared between the two stages. So we have the following:

State ( isPrime )
Logic to initialize the state
Logic to operate on the state

This set of conditions sounds as if we need an object to hold this state, a constructor to initialize the state, and a method to manipulate this state. Meet the next refactoring: Convert procedural design to objects. This step is a little bit larger, so it probably makes sense to comment out the existing code first so that we have something to fall back on if we fail. Another alternative is to check the file into your source code control system and then make the change. If you fail, you can easily roll back to the previous version. The code after the refactoring looks like this:

 public static ArrayList Generate(int maxValue)     {         if(maxValue < 2) return new ArrayList();         Primes primes = new Primes(maxValue);         return primes.Sieve();     }     private bool[] isPrime;      private Primes(int maxValue)     {            isPrime = new bool[maxValue+1];         for(int i = 0; i < isPrime.Length; i++)             isPrime[i] = true;     }     private ArrayList Sieve()     {         ArrayList result = new ArrayList();         for(int i = 2; i < isPrime.Length; i++)         {             if(isPrime[i])             {                 result.Add(i);                 RemoveMultiples(i, isPrime);             }         }         return result;     }     private void RemoveMultiples(int prime, bool[] isPrime)     {         for(int j = 2 * prime; j < isPrime.Length; j += prime)             isPrime[j] = false;     }

We really did not write a lot of new code; we just moved what we had around a bit. After we compiled and ran the tests, they did pass the first time. We then went back and removed the commented-out code. We are definitely getting close to a point of diminishing returns, but let s move on.

Refactoring 17: Keep the Data Close to Where It Is Used

For the first time, the code actually looks like object-oriented code. What a departure from what we had! Now that we have an object, we can see that the Sieve method could do a bit more, and the Generate method might do a bit less. The guard clause from the Generate method can be tucked away into the Sieve method to fully encapsulate the algorithm. Here is the code after applying this refactoring:

 public static ArrayList Generate(int maxValue)     {         Primes primes = new Primes(maxValue);         return primes.Sieve();     }     private bool[] isPrime;      private Primes(int maxValue)     {            isPrime = new bool[maxValue+1];         for(int i = 0; i < isPrime.Length; i++)             isPrime[i] = true;     }     private ArrayList Sieve()     {         if(isPrime.Length < 2) return new ArrayList();         ArrayList result = new ArrayList();         for(int i = 2; i < isPrime.Length; i++)         {             if(isPrime[i])             {                 result.Add(i);                 RemoveMultiples(i, isPrime);             }         }         return result;     }     private void RemoveMultiples(int prime, bool[] isPrime)     {         for(int j = 2 * prime; j < isPrime.Length; j += prime)             isPrime[j] = false;     }

After scanning the code, there really isn t much left to do, so we are finished.