Earlier, this chapter went into detail on the internals of the foreach loop. This section discusses how to use iterators to create your own implementation of the IEnumerator<T> and nongeneric IEnumerator interfaces for custom collections. Iterators provide clean syntax for specifying how to iterate on data in collection classes, especially using the foreach loop. The iterator allows end users of a collection to navigate its internal structure without knowledge of that structure.
If classes want to support iteration using the foreach loop construct, they must implement the enumerator pattern. As you saw in the earlier section, in C# the foreach loop construct is expanded by the compiler into the while loop construct based on the IEnumerator<T> interface that is retrieved from the IEnumerable<T> interface.
The problem with the enumeration pattern is that it can be cumbersome to implement manually because it maintains an internal state machine. This internal state machine may be simple for a list collection type class, but for data structures that require recursive traversal, such as binary trees, the state machine can be quite complicated. To overcome the challenges and effort associated with implementing this pattern, C# 2.0 includes a construct that makes it easier for a class to dictate how the foreach loop iterates over its contents.
Defining an Iterator
Iterators are a means to implement methods of a class, and they are syntactic shortcuts for the more complex enumerator pattern. When the C# compiler encounters an iterator, it expands its contents into CIL code that implements the enumerator pattern. As such, there are no runtime dependencies for implementing iterators. Because the C# compiler handles implementation through CIL code generation, there is no real runtime performance benefit to using iterators. However, there is a substantial programmer productivity gain in choosing iterators over manual implementation of the enumerator pattern. To begin, the next section examines how an iterator is defined in code.
An iterator provides shorthand implementation of iterator interfaces, the combination of the IEnumerable<T> and IEnumerator<T> interfaces. Listing 12.18 declares an iterator for the generic BinaryTree<T> type by creating a GetEnumerator() method. In the previous chapter, you coded parts of the BinaryTree<T> class. In this chapter, you will add support for the iterator interfaces.
Listing 12.18. Iterator Interfaces Pattern
To begin, add the declaration for the IEnumerator<T> IEnumerable<T>.GetEnumerator() method.
Yielding Values from an Iterator
Iterators are like functions, but instead of returning values, they yield them. In the case of BinaryTree<T>, the yield type of the iterator corresponds to the type parameter, T. If the nongeneric version of IEnumerator is used, then the return type will instead be object. To correctly implement the iterator pattern, you need to maintain an internal state machine in order to keep track of where you are while enumerating the collection. In the BinaryTree<T> case, you track which elements within the tree have already been enumerated and which are still to come.
Iterators have built-in state machines to keep track of the current and next elements. The new yield return statement returns values each time an iterator encounters it. Then, when the next iteration starts, the code begins executing immediately following the last yield return statement. In Listing 12.19, you return the C# primitive data type keywords sequentially.
Listing 12.19. Yielding the C# Keywords Sequentially
The results of Listing 12.19 appear in Output 12.5.
The output from this listing is a listing of the C# primitive types.
Iterators and State
When an iterator is first called in a foreach statement (such as foreach primitive in primitives in Listing 12.19), its state is initialized within the enumerator. The iterator maintains its state as long as the foreach statement at the call site continues to execute. When you yield a value, process it, and resume the foreach statement at the call site, the iterator continues where it left off the previous time around the loop and continues processing. When the foreach statement at the call site terminates, the iterator's state is no longer saved and it is safe to call the iterator again, knowing that it will reset to the beginning.
Figure 12.9 shows a high-level sequence diagram of what takes place. Remember that the MoveNext() method appears on the IEnumerator<T> interface.
Figure 12.9. Sequence Diagram with yield return
In Listing 12.19, the foreach statement at the call site initiates a call to GetEnumerator() on the CSharpPrimitiveTypes instance called primitives. Given the iterator instance (referenced by iterator), foreach begins each iteration with a call to MoveNext(). Within the iterator, you yield a value back to the foreach statement at the call site. After the yield return statement, the GetEnumerator() method seemingly pauses until the next MoveNext() request. Back at the call site, the foreach statement displays the yielded value on the screen. It then loops back around and calls MoveNext() on the iterator again. Notice that the second time, processing picks up at the second yield return statement. Once again, the foreach displays on the screen what CSharpPrimitiveTypes yielded and starts the loop again. This process continues until there are no more yield return statements within the iterator. At that point, the foreach loop at the call site terminates.
More Iterator Examples
Before you modify BinaryTree<T>, you must modify Pair<T> to support the IEnumerable<T> interface using an iterator. Listing 12.20 is an example that yields each element in Pair<T>.
Listing 12.20. Using yield to Implement BinaryTree<T>
In Listing 12.20, iterating over a Pair<T> data type loops twice: first through yield return First, and then through yield return Second. Each time the yield return statement within GetEnumerator() is encountered, the state is saved and execution appears to "jump" out of the GetEnumerator() method context and into the context of the call site. When the second iteration starts, GetEnumerator() begins executing again with the yield return Second statement.
System.Collections.Generic.IEnumerable<T> inherits from System.Collections.IEnumerable. Therefore, when implementing IEnumerable<T>, it is also necessary to implement IEnumerable. In Listing 12.20 you do so explicitly, and the implementation simply involves a call to IEnumerable<T>'s GetEnumerator() implementation. This call from IEnumerable.GetEnumerator() to IEnumerable<T>.GetEnumerator() will always work because of the type compatibility (via inheritance) between IEnumerable<T> and IEnumerable. This call from x to y will work in all cases because of type compatibility between IEnumerable and IEnumerator<T>. Since the signatures for both GetEnumerator()s are identical (the return type does not distinguish a signature), one or both implementations must be explicit. Given the additional type safety offered by IEnumerable<T>'s version, you implement IEnumerable's implementation explicitly.
Listing 12.21 uses the Pair<T>.GetEnumerator() method and displays "Inigo" and "Montoya" on two consecutive lines.
Listing 12.21. Using Pair<T>.GetEnumerator() via foreach
Notice that the call to GetEnumerator() is implicit within the foreach loop.
Placing a yield return within a Loop
It is not necessary to hardcode each yield return statement, as you did in both CSharpPrimitiveTypes and Pair<T>. Using the yield return statement, you can return values from inside a loop construct. Listing 12.22 uses a foreach loop. Each time the foreach within GetEnumerator() executes, it returns the next value.
Listing 12.22. Placing yield returnStatements within a Loop
In Listing 12.22, the first iteration returns the root element within the binary tree. During the second iteration you traverse the pair of subelements. If the subelement pair contains a non-null value, then you traverse into that child node and yield its elements. Note that foreach(T item in tree) is a recursive call to a child node.
As observed with CSharpPrimitiveTypes and Pair<T>, you can now iterate over BinaryTree<T> using a foreach loop. Listing 12.23 demonstrates this, and Output 12.6 shows the results.
Listing 12.23. Using foreach with BinaryTree<string>
Canceling Further Iteration: yield break
Sometimes you might want to cancel further iteration. You can do this by including an if statement so that no further statements within the code are executed. However, you can also jump back to the call site, causing MoveNext() to return false. Listing 12.24 shows an example of such a method.
Listing 12.24. Escaping Iteration via yield break
This method cancels the iteration if either of the elements in the Pair<T> class is null.
A yield break statement is similar to placing a return statement at the top of a function when it is determined there is no work to do. It is a way to exit from further iterations without surrounding all remaining code with an if block. As such, it allows multiple exits, and therefore, you should use it with caution because casual reading of the code may miss the early exit.
Creating Multiple Iterators in a Single Class
Previous iterator examples implemented IEnumerable<T>.GetEnumerator(). This is the method that foreach seeks implicitly. Sometimes you might want different iteration sequences, such as iterating in reverse, filtering the results, or iterating over an object projection other than the default. You can declare additional iterators in the class by encapsulating them within properties or methods that return IEnumerable<T> or IEnumerable. If you want to iterate over the elements of Pair<T> in reverse, for example, you provide a GetreverseEnumerator() method, as shown in Listing 12.26.
Listing 12.26. Using yield returnin a Method That Returns IEnumerable<T>
Note that you return IEnumerable<T>, not IEnumerator<T>. This is different from IEnumerable<T>.GetEnumerator(), which returns IEnumerator<T>. The code in Main() demonstrates how to call GeTReverseEnumerator() using a foreach loop.
yield Statement Characteristics
You can declare the yield return statement only in members that return an IEnumerator<T> or IEnumerable<T> type, or their nongeneric equivalents. More specifically, you can use yield only in GetEnumerator() methods that return IEnumerator<T>, or in methods that return IEnumerable<T> but are not called GetEnumerator().
Methods that include a yield return statement may not have a simple return. If the method uses the yield return statement, then the C# compiler generates the necessary code to maintain the state machine for the iterator. In contrast, if the method uses the return statement instead of yield return, the programmer is responsible for maintaining his own state machine and returning an instance of one of the iterator interfaces. Further, just as all code paths in a method with a return type must contain a return statement accompanied by a value, all code paths in an iterator must contain a yield return statement.
Additional restrictions on the yield statement that result in compiler errors are as follows.