Recipe19.18.Looking Ahead into an Iterator

Recipe 19.18. Looking Ahead into an Iterator

Credit: Steven Bethard, Peter Otten

Problem

You are using an iterator for some task such as parsing, which requires you to be able to "look ahead" at the next item the iterator is going to yield, without disturbing the iterator state.

Solution

The best solution is to wrap your original iterator into a suitable class, such as the following one (Python 2.4-only):

import collections class peekable(object):     """ An iterator that supports a peek operation.  Example usage:     >>> p = peekable(range(4))     >>> p.peek( )     0     >>> p.next(1)     [0]     >>> p.peek(3)     [1, 2, 3]     >>> p.next(2)     [1, 2]     >>> p.peek(2)     Traceback (most recent call last):       ...     StopIteration     >>> p.peek(1)     [3]     >>> p.next(2)     Traceback (most recent call last):       ...     StopIteration     >>> p.next( )     3     """     def _ _init_ _(self, iterable):         self._iterable = iter(iterable)         self._cache = collections.deque( )     def _ _iter_ _(self):         return self     def _fillcache(self, n):         if n is None:             n = 1         while len(self._cache) < n:             self._cache.append(self._iterable.next( ))     def next(self, n=None):         self._fillcache(n)         if n is None:             result = self._cache.popleft( )         else:             result = [self._cache.popleft( ) for i in range(n)]         return result     def peek(self, n=None):         self._fillcache(n)         if n is None:             result = self._cache[0]         else:             result = [self._cache[i] for i in range(n)]         return result

Discussion

Many iterator-related tasks, such as parsing, require the ability to "peek ahead" (once or a few times) into the sequence of items that an iterator is yielding, in a way that does not alter the iterator's observable state. One approach is to use the new Python 2.4 function iterator.tee to get two independent copies of the iterator, one to be advanced for peeking purposes and the "other" one to be used as the "main" iterator. It's actually handier to wrap the incoming iterator once for all, at the start, with the class peekable presented in this recipe; afterwards, a peek method, which is safe and effective, can be counted on. A little added sweetener is the ability to call peek (and, as long as we're at it, the standard next method too) with a specific number argument n, to request a list of the next n items of the iterator (without disturbing the iterator's state when you call peek(n), with iterator state advancement when you call next(n)just like for normal calls without arguments to the same methods).

The obvious idea used in this recipe for implementing peekable is to have it keep a cache of peeked-ahead arguments. Since the cache must grow at the tail and get consumed from the end, a natural choice is to make the cache a collections.deque, a new type introduced in Python 2.4. However, if you need this code to run under version 2.3 as well, make self._cache a list insteadyou only need to change method next's body a little bit, making it:

        if n is None:             result = self._cache.pop(0)         else:             result, self_cache = self._cache[:n], self._cache[n:]

As long as you're caching only one or just a few items of lookahead at a time, performance won't suffer much by making self._cache a list rather than a deque.

An interesting characteristic of the peekable class presented in this recipe is that, if you request too many items from the iterator, you get a StopIteration exception but that does not throw away the last few values of the iterator. For example, if p is an instance of peekable with just three items left, when you call p.next(5), you get a StopIteration exception. You can later call p.next(3) and get the list of the last three items.

A subtle point is that the n argument to methods peek and next defaults to None, not to 1. This gives you two distinct ways to peek at a single item: the default way, calling p.peek( ), just gives you that item, while calling p.peek(1) gives you a list with that single item in it. This behavior is quite consistent with the way p.peek behaves when called with different arguments: any call p.peek(n) with any non-negative integer n returns a list with n items (or raises StopIteration if p has fewer than n items left). This approach even supports calls such as p.next(0), which in practice always returns an empty list [ ] without advancing the iterator's state. Typically, you just call p.peek( ), without arguments, and get one look-ahead item without problems.

As an implementation detail, note that the docstring of the class peekable presented in this recipe is essentially made up of examples of use with expected results. Besides being faster to write, and arguably to read for an experienced Pythonista, this style of docstring is perfect for use with the Python Standard Library module doctest.

Recipe19.18.Looking Ahead into an Iterator