Iterators

Earlier, this chapter went into detail on the internals of the foreach loop. This section discusses how to use iterators to create your own implementation of the IEnumerator<T> and nongeneric IEnumerator interfaces for custom collections. Iterators provide clean syntax for specifying how to iterate on data in collection classes, especially using the foreach loop. The iterator allows end users of a collection to navigate its internal structure without knowledge of that structure.

Advanced Topic: Origin of Iterators

In 1972, Barbara Liskov and a team of scientists at MIT began researching programming methodologies, focusing on user-defined data abstractions. To prove much of their work, they created a language called CLU that had a concept called "clusters" (CLU being the first three letters), a predecessor to the primary data abstraction programmers use today, objects. As part of their research, the team realized that although they were able to use the CLU language to abstract some data representation away from end users of their types, they consistently found themselves having to reveal the inner structure of their data in order to allow others to intelligently consume it. Through their consternation came the creation of a language construct called an iterator. (The CLU language offered many insights into what would eventually be popularized as object-oriented programming.)

If classes want to support iteration using the foreach loop construct, they must implement the enumerator pattern. As you saw in the earlier section, in C# the foreach loop construct is expanded by the compiler into the while loop construct based on the IEnumerator<T> interface that is retrieved from the IEnumerable<T> interface.

The problem with the enumeration pattern is that it can be cumbersome to implement manually because it maintains an internal state machine. This internal state machine may be simple for a list collection type class, but for data structures that require recursive traversal, such as binary trees, the state machine can be quite complicated. To overcome the challenges and effort associated with implementing this pattern, C# 2.0 includes a construct that makes it easier for a class to dictate how the foreach loop iterates over its contents.

Defining an Iterator

Iterators are a means to implement methods of a class, and they are syntactic shortcuts for the more complex enumerator pattern. When the C# compiler encounters an iterator, it expands its contents into CIL code that implements the enumerator pattern. As such, there are no runtime dependencies for implementing iterators. Because the C# compiler handles implementation through CIL code generation, there is no real runtime performance benefit to using iterators. However, there is a substantial programmer productivity gain in choosing iterators over manual implementation of the enumerator pattern. To begin, the next section examines how an iterator is defined in code.

Iterator Syntax

An iterator provides shorthand implementation of iterator interfaces, the combination of the IEnumerable<T> and IEnumerator<T> interfaces. Listing 12.18 declares an iterator for the generic BinaryTree<T> type by creating a GetEnumerator() method. In the previous chapter, you coded parts of the BinaryTree<T> class. In this chapter, you will add support for the iterator interfaces.

Listing 12.18. Iterator Interfaces Pattern

    using System;     using System.Collections.Generic;     public class BinaryTree<T>:     IEnumerable<T>                                                            {  public BinaryTree ( T value)  {  Value = value;  }   #region IEnumerable<T>   public IEnumerator<T> GetEnumerator()                              {  ...  }   #endregion IEnumerable<T>  public T Value  {  get{ return _value; }  set{ _value = value; }  }  private T _value;  public Pair<BinaryTree<T>> SubItems  {  get{ return _subItems; }  set{ _subItems = value; }  }  private Pair<BinaryTree<T>> _subItems;    }    public struct Pair<T>    {  public Pair(T first, T second)  { _first = first;  _second = second;  }  public T First  {  get{ return _first; }  private set{ _first = value; }     }     private T _first;     public T Second     {  get{ return _second; }    private set{ _second = value;       }     }     private T _second;  }

To begin, add the declaration for the IEnumerator<T> IEnumerable<T>.GetEnumerator() method.

Yielding Values from an Iterator

Iterators are like functions, but instead of returning values, they yield them. In the case of BinaryTree<T>, the yield type of the iterator corresponds to the type parameter, T. If the nongeneric version of IEnumerator is used, then the return type will instead be object. To correctly implement the iterator pattern, you need to maintain an internal state machine in order to keep track of where you are while enumerating the collection. In the BinaryTree<T> case, you track which elements within the tree have already been enumerated and which are still to come.

Iterators have built-in state machines to keep track of the current and next elements. The new yield return statement returns values each time an iterator encounters it. Then, when the next iteration starts, the code begins executing immediately following the last yield return statement. In Listing 12.19, you return the C# primitive data type keywords sequentially.

Listing 12.19. Yielding the C# Keywords Sequentially

    using System;     using System.Collections.Generic;          public class CSharpPrimitiveTypes: IEnumerable<string>           {             public IEnumerator<string> GetEnumerator()             {                 yield return "object";                 yield return "byte";                 yield return "uint";                 yield return "ulong";                 yield return "float";                 yield return "char";                 yield return "bool";                 yield return "ushort";                 yield return "decimal";                 yield return "int";                 yield return "sbyte";                 yield return "short";                 yield return "long";                 yield return "void";                 yield return "double";                 yield return "string";             }                 // IEnumerator also required because IEnumerator<T>                 // derives from it.             System.Collections.IEnumerator                 System.Collections.IEnumerable.GetEnumerator()             {                 // Delegate to IEnumerator<string> GetEnumerator() above                 return GetEnumerator();             }           }           public class Program           {              static void Main()              {                 CSharpPrimitiveTypes primitives =                     new CSharpPrimitiveTypes();                 foreach (string primitive in primitives)                 {                     Console.WriteLine(primitive);                 }              }           }

The results of Listing 12.19 appear in Output 12.5.

Output 12.5.

 object  byte  uint  ulong  float  char  bool  ushort  decimal  int  sbyte  short  long  void  double  string

The output from this listing is a listing of the C# primitive types.^[1]

^[1] In alpha versions of the C# 2.0 compiler, yield was a keyword rather than a contextual keyword. However, such a change could result in an incompatibility between C# 1.0 and C# 2.0. Instead, yield became a contextual keyword that must appear before return. As a result, no code-breaking change occurred because C# 1.0 did not allow any text (besides comments) prior to the return keyword.

Iterators and State

When an iterator is first called in a foreach statement (such as foreach primitive in primitives in Listing 12.19), its state is initialized within the enumerator. The iterator maintains its state as long as the foreach statement at the call site continues to execute. When you yield a value, process it, and resume the foreach statement at the call site, the iterator continues where it left off the previous time around the loop and continues processing. When the foreach statement at the call site terminates, the iterator's state is no longer saved and it is safe to call the iterator again, knowing that it will reset to the beginning.

Figure 12.9 shows a high-level sequence diagram of what takes place. Remember that the MoveNext() method appears on the IEnumerator<T> interface.

Figure 12.9. Sequence Diagram with `yield return`

In Listing 12.19, the foreach statement at the call site initiates a call to GetEnumerator() on the CSharpPrimitiveTypes instance called primitives. Given the iterator instance (referenced by iterator), foreach begins each iteration with a call to MoveNext(). Within the iterator, you yield a value back to the foreach statement at the call site. After the yield return statement, the GetEnumerator() method seemingly pauses until the next MoveNext() request. Back at the call site, the foreach statement displays the yielded value on the screen. It then loops back around and calls MoveNext() on the iterator again. Notice that the second time, processing picks up at the second yield return statement. Once again, the foreach displays on the screen what CSharpPrimitiveTypes yielded and starts the loop again. This process continues until there are no more yield return statements within the iterator. At that point, the foreach loop at the call site terminates.

More Iterator Examples

Before you modify BinaryTree<T>, you must modify Pair<T> to support the IEnumerable<T> interface using an iterator. Listing 12.20 is an example that yields each element in Pair<T>.

Listing 12.20. Using `yield` to Implement `BinaryTree<T>`

    public struct Pair<T>: IPair<T>,     IEnumerable<T>                                                              {  public Pair(T first, T second)  {  _first = first;  _second = second;  }  public T First  {  get{ return _first; }  private set{ _first = value; }     }     private T _first;     public T Second     {  get{ return _second; }  private set{ _second = value; }     }  private T _second;   #region IEnumerable<T>                                              public IEnumerator<T> GetEnumerator()                               {                                                                     yield return First;                                                 yield return Second;                                              }                                                                   #endregion IEnumerable<T>                                           #region IEnumerable Members                                         System.Collections.IEnumerator                                      System.Collections.IEnumerable.GetEnumerator()                      {                                                                   return GetEnumerator();                                             }                                                                   #endregion                                                             }

In Listing 12.20, iterating over a Pair<T> data type loops twice: first through yield return First, and then through yield return Second. Each time the yield return statement within GetEnumerator() is encountered, the state is saved and execution appears to "jump" out of the GetEnumerator() method context and into the context of the call site. When the second iteration starts, GetEnumerator() begins executing again with the yield return Second statement.

System.Collections.Generic.IEnumerable<T> inherits from System.Collections.IEnumerable. Therefore, when implementing IEnumerable<T>, it is also necessary to implement IEnumerable. In Listing 12.20 you do so explicitly, and the implementation simply involves a call to IEnumerable<T>'s GetEnumerator() implementation. This call from IEnumerable.GetEnumerator() to IEnumerable<T>.GetEnumerator() will always work because of the type compatibility (via inheritance) between IEnumerable<T> and IEnumerable. This call from x to y will work in all cases because of type compatibility between IEnumerable and IEnumerator<T>. Since the signatures for both GetEnumerator()s are identical (the return type does not distinguish a signature), one or both implementations must be explicit. Given the additional type safety offered by IEnumerable<T>'s version, you implement IEnumerable's implementation explicitly.

Listing 12.21 uses the Pair<T>.GetEnumerator() method and displays "Inigo" and "Montoya" on two consecutive lines.

Listing 12.21. Using `Pair<T>.GetEnumerator()` via `foreach`

 Pair<string> fullname = new Pair<string>("Inigo", "Montoya");  foreach (string name in fullname)  {  Console.WriteLine(name);  }

Notice that the call to GetEnumerator() is implicit within the foreach loop.

Placing a `yield return` within a Loop

It is not necessary to hardcode each yield return statement, as you did in both CSharpPrimitiveTypes and Pair<T>. Using the yield return statement, you can return values from inside a loop construct. Listing 12.22 uses a foreach loop. Each time the foreach within GetEnumerator() executes, it returns the next value.

Listing 12.22. Placing `yield return`Statements within a Loop

 public class BinaryTree<T>: IEnumerable<T>  {  // ...  #region IEnumerable<T>  public IEnumerator<T> GetEnumerator()  {  // Return the item at this node.  yield return Value;  // Iterate through each of the elements in the pair.   foreach (BinaryTree<T> tree in SubItems)                {                                                         if (tree != null)                                       {                                                         // Since each element in the pair is a tree,            // traverse the tree and yield each                     // element.                                             foreach (T item in tree)                               {                                                             yield return item;                                 }                                                            }                                                                    }                                                                        }  #endregion IEnumerable<T>  #region IEnumerable Members  System.Collections.IEnumerator  System.Collections.IEnumerable.GetEnumerator()  {  return GetEnumerator();  }  #endregion  }

In Listing 12.22, the first iteration returns the root element within the binary tree. During the second iteration you traverse the pair of subelements. If the subelement pair contains a non-null value, then you traverse into that child node and yield its elements. Note that foreach(T item in tree) is a recursive call to a child node.

As observed with CSharpPrimitiveTypes and Pair<T>, you can now iterate over BinaryTree<T> using a foreach loop. Listing 12.23 demonstrates this, and Output 12.6 shows the results.

Listing 12.23. Using `foreach` with `BinaryTree<string>`

 // JFK  jfkFamilyTree = new BinaryTree<string>(  "John Fitzgerald Kennedy");  jfkFamilyTree.SubItems = new Pair<BinaryTree<string>>(  new BinaryTree<string>("Joseph Patrick Kennedy"),  new BinaryTree<string>("Rose Elizabeth Fitzgerald"));  // Grandparents (Father's side)  jfkFamilyTree.SubItems.First.SubItems =  new Pair<BinaryTree<string>>(  new BinaryTree<string>("Patrick Joseph Kennedy"),  new BinaryTree<string>("Mary Augusta Hickey"));  // Grandparents (Mother's side)  jfkFamilyTree.SubItems.Second.SubItems =  new Pair<BinaryTree<string>>(  new BinaryTree<string>("John Francis Fitzgerald"),  new BinaryTree<string>("Mary Josephine Hannon"));   foreach (string name in jfkFamilyTree)                            {                                                                 Console.WriteLine(name);                                  }

Output 12.6.

 John Fitzgerald Kennedy  Joseph Patrick Kennedy  Patrick Joseph Kennedy  Mary Augusta Hickey  Rose Elizabeth Fitzgerald  John Francis Fitzgerald  Mary Josephine Hannon

Beginner Topic

struct versus class

An interesting side effect of defining Pair<T> as a struct rather than a class is that SubItems.First and SubItems.Second cannot be assigned directly. The following will produce a compile error indicating that SubItems cannot be modified, "because it is not a variable."

jfkFamilyTree.SubItems.First =       new BinaryTree<string>("Joseph Patrick Kennedy");

The issue is that SubItems is a property of type Pair<T>, a struct. Therefore, when the property returns the value, a copy of _SubItems is made, and assigning First on a copy that is promptly lost at the end of the statement would be misleading. Fortunately, the C# compiler prevents this.

To overcome the issue don't assign it (see the approach in Listing 12.23), use class rather than struct for Pair<T>, don't create a SubItems property and instead use a field, or provide properties in BinaryTree<T> that give direct access to _SubItems members.

Canceling Further Iteration: `yield break`

Sometimes you might want to cancel further iteration. You can do this by including an if statement so that no further statements within the code are executed. However, you can also jump back to the call site, causing MoveNext() to return false. Listing 12.24 shows an example of such a method.

Listing 12.24. Escaping Iteration via `yield break`

 public System.Collections.Generic.IEnumerable<T>  GetNotNullEnumerator()  {   if((First == null) || (Second == null))               {                                                     yield break;                                          }                                                     yield return Second;  yield return First;  }

This method cancels the iteration if either of the elements in the Pair<T> class is null.

A yield break statement is similar to placing a return statement at the top of a function when it is determined there is no work to do. It is a way to exit from further iterations without surrounding all remaining code with an if block. As such, it allows multiple exits, and therefore, you should use it with caution because casual reading of the code may miss the early exit.

Advanced Topic

How Iterators Work

When the C# compiler encounters an iterator, it expands the code into the appropriate CIL for the corresponding enumerator design pattern. In the generated code, the C# compiler first creates a nested private class to implement the IEnumerator<T> interface, along with its Current property and a MoveNext() method. The Current property returns a type corresponding to the return type of the iterator. Listing 12.20 of Pair<T> contains an iterator that returns a T type. The C# compiler examines the code contained within the iterator and creates the necessary code within the MoveNext method and the Current property to mimic its behavior. For the Pair<T> iterator, the C# compiler generates roughly equivalent code (see Listing 12.25).

Listing 12.25. C# Equivalent of Compiler-Generated C# Code for Iterators

 using System;  using System.Collections.Generic;  public class Pair<T> : IPair<T>, IEnumerable<T>  {  // ...  // The iterator is expanded into the following   // code by the compiler  public virtual IEnumerator<T> GetEnumerator()  {  __ListEnumerator result = new __ListEnumerator(0);  result._Pair = this;  return result;  }  public virtual System.Collections.IEnumerator  System.Collections.IEnumerable.GetEnumerator()  {  return new GetEnumerator();  }  private sealed class __ListEnumerator<T> : IEnumerator<T>  {  public __ListEnumerator(int itemCount)  {  _ItemCount = itemCount;  }  Pair<T> _Pair;   T _Current;   int _ItemCount;   public object Current   {  get  {  return _Current;  }  }  public bool MoveNext()  {  switch (_ItemCount)  {  case 0:  _Current = _Pair.First;  _ItemCount++;  return true;  case 1:  _Current = _Pair.Second;  _ItemCount++;  return true;  default:  return false;  }  }  }  }

Because the compiler takes the yield return statement and generates classes that correspond to what you probably would have written manually, iterators in C# exhibit the same performance characteristics as classes that implement the enumerator design pattern manually. While there is no performance improvement, the programmer productivity gained is significant.

Creating Multiple Iterators in a Single Class

Previous iterator examples implemented IEnumerable<T>.GetEnumerator(). This is the method that foreach seeks implicitly. Sometimes you might want different iteration sequences, such as iterating in reverse, filtering the results, or iterating over an object projection other than the default. You can declare additional iterators in the class by encapsulating them within properties or methods that return IEnumerable<T> or IEnumerable. If you want to iterate over the elements of Pair<T> in reverse, for example, you provide a GetreverseEnumerator() method, as shown in Listing 12.26.

Listing 12.26. Using `yield return`in a Method That Returns `IEnumerable<T>`

 public struct Pair<T>: IEnumerable<T>  {  ...   public IEnumerable<T> GetReverseEnumerator()                {  yield return Second;  yield return First;  }  ...  }  public void Main()  {  Pair<string> game = new Pair<string>("Redskins", "Eagles");   foreach (string name in game.GetReverseEnumerator())          {  Console.WriteLine(name);  }  }

Note that you return IEnumerable<T>, not IEnumerator<T>. This is different from IEnumerable<T>.GetEnumerator(), which returns IEnumerator<T>. The code in Main() demonstrates how to call GeTReverseEnumerator() using a foreach loop.

`yield` Statement Characteristics

You can declare the yield return statement only in members that return an IEnumerator<T> or IEnumerable<T> type, or their nongeneric equivalents. More specifically, you can use yield only in GetEnumerator() methods that return IEnumerator<T>, or in methods that return IEnumerable<T> but are not called GetEnumerator().

Methods that include a yield return statement may not have a simple return. If the method uses the yield return statement, then the C# compiler generates the necessary code to maintain the state machine for the iterator. In contrast, if the method uses the return statement instead of yield return, the programmer is responsible for maintaining his own state machine and returning an instance of one of the iterator interfaces. Further, just as all code paths in a method with a return type must contain a return statement accompanied by a value, all code paths in an iterator must contain a yield return statement.

Additional restrictions on the yield statement that result in compiler errors are as follows.

The yield statement may not appear outside a method, operator, or property accessor.
The yield statement may not appear in an anonymous method (see Chapter 13).
The yield statement may not appear inside the try,catch, and finally clauses.