A Simple Example of Refactoring: Generating Primes

Consider the code in Listing 5-1. This program generates prime numbers. It is one big function with many single-letter variables and comments to help us read it.

Listing 5-1. `GeneratePrimes.cs, version 1`

/// <remark> /// This class Generates prime numbers up to a user specified /// maximum. The algorithm used is the Sieve of Eratosthenes. /// /// Eratosthenes of Cyrene, b. c. 276 BC, Cyrene, Libya -- /// d. c. 194, Alexandria. The first man to calculate the /// circumference of the Earth. Also known for working on /// calendars with leap years and ran the library at /// Alexandria. /// /// The algorithm is quite simple. Given an array of integers /// starting at 2. Cross out all multiples of 2. Find the /// next uncrossed integer, and cross out all of its multiples. /// Repeat until you have passed the square root of the /// maximum value. /// /// Written by Robert C. Martin on 9 Dec 1999 in Java /// Translated to C# by Micah Martin on 12 Jan 2005. ///</remark> using System; /// <summary> /// author: Robert C. Martin /// </summary> public class GeneratePrimes {   ///<summary>   /// Generates an array of prime numbers.   ///</summary>   ///   /// <param name="maxValue">The generation limit.</param>   public static int[] GeneratePrimeNumbers(int maxValue)   {     if (maxValue >= 2) // the only valid case     {       // declarations       int s = maxValue + 1; // size of array       bool[] f = new bool[s];       int i;       // initialize array to true.       for (i = 0; i < s; i++)         f[i] = true;       // get rid of known non-primes       f[0] = f[1] = false;       // sieve       int j;       for (i = 2; i < Math.Sqrt(s) + 1; i++)       {         if(f[i]) // if i is uncrossed, cross its multiples.         {           for (j = 2 * i; j < s; j += i)             f[j] = false; // multiple is not prime         }       }       // how many primes are there?       int count = 0;       for (i = 0; i < s; i++)       {         if (f[i])           count++; // bump count.       }       int[] primes = new int[count];       // move the primes into the result       for (i = 0, j = 0; i < s; i++)       {         if (f[i])             // if prime           primes[j++] = i;       }       return primes;  // return the primes     }     else // maxValue < 2       return new int[0]; // return null array if bad input.   } }

Unit Testing

The unit test for GeneratePrimes is shown in Listing 5-2. It takes a statistical approach, checking whether the generator can generate primes up to 0, 2, 3, and 100. In the first case, there should be no primes. In the second, there should be one prime, and it should be 2. In the third, there should be two primes, and they should be 2 and 3. In the last case, there should be 25 primes, the last of which is 97. If all these tests pass, I make the assumption that the generator is working. I doubt that this is foolproof, but I can't think of a reasonable scenario in which these tests would pass but the function fail.

Listing 5-2. `GeneratePrimesTest.cs`

using NUnit.Framework; [TestFixture] public class GeneratePrimesTest {   [Test]   public void TestPrimes()   {     int[] nullArray = GeneratePrimes.GeneratePrimeNumbers(0);     Assert.AreEqual(nullArray.Length, 0);     int[] minArray = GeneratePrimes.GeneratePrimeNumbers(2);     Assert.AreEqual(minArray.Length, 1);     Assert.AreEqual(minArray[0], 2);     int[] threeArray = GeneratePrimes.GeneratePrimeNumbers(3);     Assert.AreEqual(threeArray.Length, 2);     Assert.AreEqual(threeArray[0], 2);     Assert.AreEqual(threeArray[1], 3);     int[] centArray = GeneratePrimes.GeneratePrimeNumbers(100);     Assert.AreEqual(centArray.Length, 25);     Assert.AreEqual(centArray[24], 97);   } }

Refactoring

To help me refactor this program, I am using Visual Studio with the ReSharper refactoring add-in from JetBrains. This tool makes it trivial to extract methods and rename variables and classes.

It seems pretty clear that the main function wants to be three separate functions. The first initializes all the variables and sets up the sieve. The second executes the sieve, and the third loads the sieved results into an integer array. To expose this structure more clearly, I extracted those functions into three separate methods (Listing 5-3). I also removed a few unnecessary comments and changed the name of the class to PrimeGenerator. The tests all still ran.

Extracting the three functions forced me to promote some of the variables of the function to static fields of the class. This makes it much clearer which variables are local and which have wider influence.

Listing 5-3. `PrimeGenerator.cs, version 2`

///<remark> /// This class Generates prime numbers up to a user specified /// maximum. The algorithm used is the Sieve of Eratosthenes. /// Given an array of integers starting at 2: /// Find the first uncrossed integer, and cross out all its /// multiples. Repeat until there are no more multiples /// in the array. ///</remark> using System; public class PrimeGenerator {   private static int s;   private static bool[] f;   private static int[] primes;   public static int[] GeneratePrimeNumbers(int maxValue)   {     if (maxValue < 2)       return new int[0];     else     {       InitializeSieve(maxValue);       Sieve();       LoadPrimes();       return primes;  // return the primes     }   }   private static void LoadPrimes()   {     int i;     int j;     // how many primes are there?     int count = 0;     for (i = 0; i < s; i++)     {       if (f[i])         count++; // bump count.     }     primes = new int[count];     // move the primes into the result     for (i = 0, j = 0; i < s; i++)     {       if (f[i])             // if prime         primes[j++] = i;     }   }   private static void Sieve()   {     int i;     int j;     for (i = 2; i < Math.Sqrt(s) + 1; i++)     {       if(f[i]) // if i is uncrossed, cross its multiples.       {         for (j = 2 * i; j < s; j += i)           f[j] = false; // multiple is not prime       }     }   }   private static void InitializeSieve(int maxValue)   {     // declarations     s = maxValue + 1; // size of array     f = new bool[s];     int i;     // initialize array to true.     for (i = 0; i < s; i++)       f[i] = true;     // get rid of known non-primes     f[0] = f[1] = false;   } }

The InitializeSieve function is a little messy, so I cleaned it up considerably (Listing 5-4). First, I replaced all usages of the s variable with f.Length. Then I changed the names of the three functions to something a bit more expressive. Finally, I rearranged the innards of InitializeArrayOfIntegers (née InitializeSieve) to be a little nicer to read. The tests all still ran.

Listing 5-4. `PrimeGenerator.cs, version 3 (partial)`

public class PrimeGenerator {   private static bool[] f;   private static int[] result;   public static int[] GeneratePrimeNumbers(int maxValue)   {     if (maxValue < 2)       return new int[0];     else     {       InitializeArrayOfIntegers(maxValue);       CrossOutMultiples();       PutUncrossedIntegersIntoResult();       return result;     }   }   private static void InitializeArrayOfIntegers(int maxValue)   {      // declarations      f = new bool[maxValue + 1];      f[0] = f[1] = false; //neither primes nor multiples.      for (int i = 2; i < f.Length; i++)        f[i] = true;   } }

Next, I looked at CrossOutMultiples. There were a number of statements in this function, and in others, of the form if(f[i] == true). The intent was to check whether i was uncrossed, so I changed the name of f to unCrossed. But this led to ugly statements, such as unCrossed[i] = false. I found the double negative confusing. So I changed the name of the array to isCrossed and changed the sense of all the Booleans. The tests all still ran.

I got rid of the initialization that set isCrossed[0] and isCrossed[1] to true and simply made sure that no part of the function used the isCrossed array for indexes less than 2. I extracted the inner loop of the CrossOutMultiples function and called it CrossOutMultiplesOf. I also thought that if (isCrossed[i] == false) was confusing, so I created a function called NotCrossed and changed the if statement to if (NotCrossed(i)). The tests all still ran.

I spent a bit of time writing a comment that tried to explain why you have to iterate only up to the square root of the array size. This led me to extract the calculation into a function where I could put the explanatory comment. In writing the comment, I realized that the square root is the maximum prime factor of any of the integers in the array. So I chose that name for the variables and functions that dealt with it. The result of all these refactorings are in Listing 5-5. The tests all still ran.

Listing 5-5. `PrimeGenerator.cs, version 4 (partial)`

public class PrimeGenerator {   private static bool[] isCrossed;   private static int[] result;   public static int[] GeneratePrimeNumbers(int maxValue)   {     if (maxValue < 2)       return new int[0];     else     {       InitializeArrayOfIntegers(maxValue);       CrossOutMultiples();       PutUncrossedIntegersIntoResult();       return result;     }   }   private static void InitializeArrayOfIntegers(int maxValue)   {     isCrossed = new bool[maxValue + 1];     for (int i = 2; i < isCrossed.Length; i++)       isCrossed[i] = false;   }   private static void CrossOutMultiples()   {     int maxPrimeFactor = CalcMaxPrimeFactor();     for (int i = 2; i < maxPrimeFactor + 1; i++)     {       if(NotCrossed(i))         CrossOutputMultiplesOf(i);     }   }   private static int CalcMaxPrimeFactor()   {     // We cross out all multiples of p, where p is prime.     // Thus, all crossed out multiples have p and q for     // factors.  If p > sqrt of the size of the array, then     // q will never be greater than 1. Thus p is the     // largest prime factor in the array and is also     // the iteration limit.     double maxPrimeFactor = Math.Sqrt(isCrossed.Length) + 1;     return (int) maxPrimeFactor;   }   private static void CrossOutputMultiplesOf(int i)   {     for (int multiple = 2*i;        multiple < isCrossed.Length;        multiple += i)       isCrossed[multiple] = true;   }   private static bool NotCrossed(int i)   {     return isCrossed[i] == false;   } }

The last function to refactor is PutUncrossedIntegersIntoResult. This method has two parts. The first counts the number of uncrossed integers in the array and creates the result array of that size. The second moves the uncrossed integers into the result array. I extracted the first part into its own function and did some miscellaneous cleanup (Listing 5-6). The tests all still ran.

Listing 5-6. `PrimerGenerator.cs, version 5 (partial)`

private static void PutUncrossedIntegersIntoResult() {   result = new int[NumberOfUncrossedIntegers()];   for (int j = 0, i = 2; i < isCrossed.Length; i++)   {     if (NotCrossed(i))       result[j++] = i;   } } private static int NumberOfUncrossedIntegers() {   int count = 0;   for (int i = 2; i < isCrossed.Length; i++)   {     if (NotCrossed(i))       count++; // bump count.   }   return count; }

The Final Reread

Next, I made one final pass over the whole program, reading it from beginning to end, rather like one would read a geometric proof. This is an important step. So far, I've been refactoring fragments. Now I want to see whether the whole program hangs together as a readable whole.

First, I realize that I don't like the name InitializeArrayOfIntegers. What's being initialized is not, in fact, an array of integers but an array of Booleans. But InitializeArrayOfBooleans is not an improvement. What we are really doing in this method is uncrossing all the relevant integers so that we can then cross out the multiples. So I change the name to UncrossIntegersUpTo. I also realize that I don't like the name isCrossed for the array of Booleans. So I change it to crossedOut. The tests all still run.

One might think that I'm being frivolous with these name changes, but with a refactoring browser, you can afford to do these kinds of tweaks; they cost virtually nothing. Even without a refactoring browser, a simple search and replace is pretty cheap. And the tests strongly mitigate any chance that we might unknowingly break something.

I don't know what I was smoking when I wrote all that maxPrimeFactor stuff. Yikes! The square root of the size of the array is not necessarily prime. That method did not calculate the maximum prime factor. The explanatory comment was simply wrong. So I rewrote the comment to better explain the rationale behind the square root and rename all the variables appropriately.^[2] The tests all still run.

^[2] I once watched Kent Beck refactor this very same program. He did away with the square root altogether. His rationale was that the square root was difficult to understand and that no test that failed if you iterated right up to the size of the array. I can't bring myself to give up the efficiency. I guess that shows my assembly language roots.

What the devil is that +1 doing in there? It must have been paranoia. I was afraid that a fractional square root would convert to an integer that was too small to serve as the iteration limit. But that's silly. The true iteration limit is the largest prime less than or equal to the square root of the size of the array. I'll get rid of the +1.

The tests all run, but that last change makes me pretty nervous. I understand the rationale behind the square root, but I've got a nagging feeling that there may be some corner cases that aren't being covered. So I'll write another test that checks that there are no multiples in any of the prime lists between 2 and 500. (See the TestExhaustive function in Listing 5-8.) The new test passes, and my fears are allayed.

The rest of the code reads pretty nicely. So I think we're done. The final version is shown in Listings 5-7 and 5-8.

Listing 5-7. `PrimeGenerator.cs (final)`

///<remark> /// This class Generates prime numbers up to a user specified /// maximum. The algorithm used is the Sieve of Eratosthenes. /// Given an array of integers starting at 2: /// Find the first uncrossed integer, and cross out all its /// multiples. Repeat until there are no more multiples /// in the array. ///</remark> using System; public class PrimeGenerator {   private static bool[] crossedOut;   private static int[] result;   public static int[] GeneratePrimeNumbers(int maxValue)   {     if (maxValue < 2)       return new int[0];     else     {       UncrossIntegersUpTo(maxValue);       CrossOutMultiples();       PutUncrossedIntegersIntoResult();       return result;     }   }   private static void UncrossIntegersUpTo(int maxValue)   {     crossedOut = new bool[maxValue + 1];     for (int i = 2; i < crossedOut.Length; i++)       crossedOut[i] = false;   }   private static void PutUncrossedIntegersIntoResult()   {     result = new int[NumberOfUncrossedIntegers()];     for (int j = 0, i = 2; i < crossedOut.Length; i++)     {       if (NotCrossed(i))         result[j++] = i;     }   }   private static int NumberOfUncrossedIntegers()   {     int count = 0;     for (int i = 2; i < crossedOut.Length; i++)     {       if (NotCrossed(i))         count++; // bump count.     }     return count;   }   private static void CrossOutMultiples()   {     int limit = DetermineIterationLimit();     for (int i = 2; i <= limit; i++)     {       if(NotCrossed(i))         CrossOutputMultiplesOf(i);     }   }   private static int DetermineIterationLimit()   {     // Every multiple in the array has a prime factor that     // is less than or equal to the root of the array size,     // so we don't have to cross off multiples of numbers     // larger than that root.     double iterationLimit = Math.Sqrt(crossedOut.Length);     return (int) iterationLimit;   }   private static void CrossOutputMultiplesOf(int i)   {     for (int multiple = 2*i;        multiple < crossedOut.Length;        multiple += i)       crossedOut[multiple] = true;   }   private static bool NotCrossed(int i)   {     return crossedOut[i] == false;   } }

Listing 5-8. `GeneratePrimesTest.cs (final)`

using NUnit.Framework; [TestFixture] public class GeneratePrimesTest {   [Test]   public void TestPrimes()   {     int[] nullArray = PrimeGenerator.GeneratePrimeNumbers(0);     Assert.AreEqual(nullArray.Length, 0);     int[] minArray = PrimeGenerator.GeneratePrimeNumbers(2);     Assert.AreEqual(minArray.Length, 1);     Assert.AreEqual(minArray[0], 2);     int[] threeArray = PrimeGenerator.GeneratePrimeNumbers(3);     Assert.AreEqual(threeArray.Length, 2);     Assert.AreEqual(threeArray[0], 2);     Assert.AreEqual(threeArray[1], 3);     int[] centArray = PrimeGenerator.GeneratePrimeNumbers(100);     Assert.AreEqual(centArray.Length, 25);     Assert.AreEqual(centArray[24], 97);   }   [Test]   public void TestExhaustive()   {     for (int i = 2; i<500; i++)       VerifyPrimeList(PrimeGenerator.GeneratePrimeNumbers(i));   }   private void VerifyPrimeList(int[] list)   {     for (int i=0; i<list.Length; i++)       VerifyPrime(list[i]);   }   private void VerifyPrime(int n)   {     for (int factor=2; factor<n; factor++)       Assert.IsTrue(n%factor != 0);   } }

Listing 5-1. GeneratePrimes.cs, version 1

Unit Testing

Listing 5-2. GeneratePrimesTest.cs

Refactoring

Listing 5-3. PrimeGenerator.cs, version 2

Listing 5-4. PrimeGenerator.cs, version 3 (partial)

Listing 5-5. PrimeGenerator.cs, version 4 (partial)

Listing 5-6. PrimerGenerator.cs, version 5 (partial)