21.4 Operator Overloading

We introduced operator overloading in the prior chapter; let's fill in more details here and look at a few commonly used overloading methods. Here's a review of the key ideas behind overloading:

  • Operator overloading lets classes intercept normal Python operations.

  • Classes can overload all Python expression operators.

  • Classes can also overload built-in operations such as printing, function calls, and attribute qualification.

  • Overloading makes class instances act more like built-in types.

  • Overloading is implemented by providing specially named class methods.

Here's a simple example of overloading at work. When we provide specially named methods in a class, Python automatically calls them when instances of the class appear in the associated operation. For instance, the Number class in file number.py below provides a method to intercept instance construction (__init__), as well as one for catching subtraction expressions (__sub__). Special methods are the hook that lets you tie into built-in operations:

class Number:
    def __init__(self, start):              # On Number(start)
        self.data = start
    def __sub__(self, other):               # On instance - other
        return Number(self.data - other)    # Result is a new instance.

>>> from number import Number               # Fetch class from module.
>>> X = Number(5)                           # Number.__init__(X, 5)
>>> Y = X - 2                               # Number.__sub__(X, 2)
>>> Y.data                                  # Y is new Number instance.
3

21.4.1 Common Operator Overloading Methods

Just about everything you can do to built-in objects such as integers and lists has a corresponding specially named method for overloading in classes. Table 21-1 lists a few of the most common; there are many more. In fact, many overload methods come in multiple versions (e.g., __add__, __radd__, and __iadd__ for addition). See other Python books or the Python Language Reference Manual for an exhaustive list of special method names available.

Table 21-1. Common operator overloading methods

Method              Overloads                        Called for
------              ---------                        ----------
__init__            Constructor                      Object creation: Class()
__del__             Destructor                       Object reclamation
__add__             Operator '+'                     X + Y, X += Y
__or__              Operator '|' (bitwise or)        X | Y, X |= Y
__repr__, __str__   Printing, conversions            print X, `X`, str(X)
__call__            Function calls                   X()
__getattr__         Qualification                    X.undefined
__setattr__         Attribute assignment             X.any = value
__getitem__         Indexing                         X[key], for loops, in tests
__setitem__         Index assignment                 X[key] = value
__len__             Length                           len(X), truth tests
__cmp__             Comparison                       X == Y, X < Y
__lt__              Specific comparison              X < Y (or else __cmp__)
__eq__              Specific comparison              X == Y (or else __cmp__)
__radd__            Right-side operator '+'          Noninstance + X
__iadd__            In-place (augmented) addition    X += Y (or else __add__)
__iter__            Iteration contexts               for loops, in tests, others

All overload methods have names that start and end with two underscores, to keep them distinct from other names you define in your classes. The mapping from special method names to expressions or operations is predefined by the Python language (and documented in the standard language manual). For example, the name __add__ always maps to + expressions by Python language definition, regardless of what an __add__ method's code actually does.
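To see that in action, here's a minimal sketch (the Tracer class is hypothetical, used only for illustration): its __add__ is run for every + applied to its instances, even though it doesn't add anything at all:

>>> class Tracer:
...     def __add__(self, other):          # Tied to + by definition,
...         print 'caught +', other        # but free to do anything at all
...
>>> t = Tracer()
>>> t + 1
caught + 1
>>> t + 'spam'
caught + spam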

All operator overloading methods are optional; if you don't code one, that operation is simply unsupported by your class (and may raise an exception if attempted). Most overloading methods are used only in advanced programs that require objects to behave like built-ins; the __init__ constructor, however, tends to appear in most classes. We've already met the __init__ initialization-time constructor method and a few others in Table 21-1. Let's explore some of the additional methods in the table by example.

21.4.2 __getitem__ Intercepts Index References

The __getitem__ method intercepts instance indexing operations. When an instance X appears in an indexing expression like X[i], Python calls a __getitem__ method inherited by the instance (if any), passing X in the first argument and the index in brackets in the second. For instance, the following class returns the square of an index value:

>>> class indexer:
...     def __getitem__(self, index):
...         return index ** 2
...
>>> X = indexer()
>>> X[2]                        # X[i] calls __getitem__(X, i).
4
>>> for i in range(5):
...     print X[i],
...
0 1 4 9 16

21.4.3 __getitem__ and __iter__ Implement Iteration

Here's a trick that isn't always obvious to beginners, but turns out to be incredibly useful: the for statement works by repeatedly indexing a sequence from zero to higher indexes, until an out-of-bounds exception is detected. Because of that, __getitem__ also turns out to be one way to overload iteration in Python: if defined, for loops call the class's __getitem__ each time through, with successively higher offsets. It's a case of "buy one, get one free": any built-in or user-defined object that responds to indexing also responds to iteration:

>>> class stepper:
...     def __getitem__(self, i):
...         return self.data[i]
...
>>> X = stepper()              # X is a stepper object.
>>> X.data = "Spam"
>>>
>>> X[1]                       # Indexing calls __getitem__.
'p'
>>> for item in X:             # for loops call __getitem__.
...     print item,            # for indexes items 0..N.
...
S p a m

In fact, it's really a case of "buy one, get a bunch for free": any class that supports for loops automatically supports all iteration contexts in Python, many of which we've seen in earlier chapters. For example, the in membership test, list comprehensions, the map built-in, list and tuple assignments, and type constructors will also call __getitem__ automatically if defined:

>>> 'p' in X                   # All call __getitem__ too.
1
>>> [c for c in X]             # List comprehension
['S', 'p', 'a', 'm']
>>> map(None, X)               # map calls
['S', 'p', 'a', 'm']
>>> (a, b, c, d) = X           # Sequence assignments
>>> a, c, d
('S', 'a', 'm')
>>> list(X), tuple(X), ''.join(X)
(['S', 'p', 'a', 'm'], ('S', 'p', 'a', 'm'), 'Spam')
>>> X
<__main__.stepper instance at 0x00A8D5D0>

In practice, this technique can be used to create objects that provide a sequence interface, and add logic to built-in sequence type operations; we'll revisit this idea when extending built-in types in Chapter 23.
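For example, here's a minimal sketch of that idea (the tracedlist wrapper class is hypothetical, not the Chapter 23 code itself): it wraps a real list, adds a trace message to every index fetch in __getitem__, and in return gets indexing and all the iteration contexts routed through that extra logic:

>>> class tracedlist:
...     def __init__(self, seq):
...         self.seq = seq                    # Wrap a real sequence.
...     def __getitem__(self, i):
...         print 'fetching index', i         # Extra logic on every fetch
...         return self.seq[i]                # Delegate to the wrapped list.
...
>>> L = tracedlist(['a', 'b', 'c'])
>>> L[1]                                      # Indexing runs the extra logic,
fetching index 1
'b'
>>> [c for c in L]                            # and so do iteration contexts.
fetching index 0
fetching index 1
fetching index 2
fetching index 3
['a', 'b', 'c']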

21.4.3.1 User-defined iterators

Today, all iteration contexts in Python will first try to find an __iter__ method, which is expected to return an object that supports the new iteration protocol. If provided, Python repeatedly calls this object's next method to produce items, until the StopIteration exception is raised. If no such method is found, Python falls back on the __getitem__ scheme and repeatedly indexes by offsets as before, until an IndexError exception is raised.

In the new scheme, classes implement user-defined iterators by simply implementing the iterator protocol introduced in Chapter 14 for functions. For example, the following file, iters.py, defines a user-defined iterator class that generates squares:

class Squares:
    def __init__(self, start, stop):
        self.value = start - 1
        self.stop  = stop
    def __iter__(self):                    # Get iterator object
        return self
    def next(self):                        # on each for iteration.
        if self.value == self.stop:
            raise StopIteration
        self.value += 1
        return self.value ** 2

% python
>>> from iters import Squares
>>> for i in Squares(1, 5):
...     print i,
...
1 4 9 16 25

Here, the iterator object is simply the instance, self, because the next method is part of this class. The end of the iteration is signaled with a Python raise statement (more on raising exceptions in the next part of this book).

An equivalent coding with __getitem__ might be less natural, because the for would then iterate through offsets zero and higher; offsets passed in would be only indirectly related to the range of values produced (0..N would need to map to start..stop). Because __iter__ objects retain explicitly-managed state between next calls, they can be more general than __getitem__.
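For comparison, here's a rough sketch of what such a __getitem__-based version might look like (the class name SquaresByIndex is made up for this illustration): the offset passed in by the for loop has to be translated into the start..stop value range, and an IndexError ends the loop:

class SquaresByIndex:
    def __init__(self, start, stop):
        self.start = start
        self.stop  = stop
    def __getitem__(self, i):          # i is an offset 0..N, not a value.
        value = self.start + i         # Map the offset to the value range.
        if value > self.stop:
            raise IndexError           # Out of bounds ends the for loop.
        return value ** 2

>>> for i in SquaresByIndex(1, 5):
...     print i,
...
1 4 9 16 25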

On the other hand, __iter__-based iterators can sometimes be more complex and less convenient than __getitem__. They are really designed for iteration, not random indexing. In fact, they don't overload the indexing expression at all:

>>> X = Squares(1, 5)
>>> X[1]
AttributeError: Squares instance has no attribute '__getitem__'

The __iter__ scheme implements the other iteration contexts we saw in action for __getitem__ (membership tests, type constructors, sequence assignment, and so on). However, unlike __getitem__, __iter__ is designed for a single traversal, not many. For example, the Squares class is a one-shot iteration; once iterated, it's empty. You need to make a new iterator object for each new iteration:

>>> X = Squares(1, 5)
>>> [n for n in X]                     # Exhausts items
[1, 4, 9, 16, 25]
>>> [n for n in X]                     # Now it's empty.
[]
>>> [n for n in Squares(1, 5)]
[1, 4, 9, 16, 25]
>>> list(Squares(1, 3))
[1, 4, 9]
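If you do need to support multiple scans of the same object, one common approach (sketched here with a hypothetical ReSquares class) is to have __iter__ return a brand-new iterator object on each call, instead of self; every iteration context then gets its own independent state:

from iters import Squares                  # The Squares class shown earlier

class ReSquares:
    def __init__(self, start, stop):
        self.start = start
        self.stop  = stop
    def __iter__(self):                    # New iterator object per traversal
        return Squares(self.start, self.stop)

>>> X = ReSquares(1, 5)
>>> [n for n in X]
[1, 4, 9, 16, 25]
>>> [n for n in X]                         # Not exhausted: fresh state each time
[1, 4, 9, 16, 25]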

For more details on iterators, see Chapter 14. Notice that this example would probably be simpler if coded with generator functions, a topic introduced in Chapter 14 and related to iterators:

>>> from __future__ import generators    # Need in 2.2
>>>
>>> def gsquares(start, stop):
...     for i in range(start, stop + 1):
...         yield i ** 2
...
>>> for i in gsquares(1, 5):
...     print i,
...
1 4 9 16 25

Unlike the class, the function automatically saves its state between iterations. Classes may be better at modeling more complex iterations, though, especially when they can benefit from inheritance hierarchies. Of course, for this artificial example, you might as well skip both techniques and simply use a for loop, map, or list comprehension to build the list all at once; the best and fastest way to accomplish a task in Python is often also the simplest:

>>> [x ** 2 for x in range(1, 6)]
[1, 4, 9, 16, 25]

21.4.4 __getattr__ and __setattr__ Catch Attribute References

The __getattr__ method intercepts attribute qualifications. More specifically, it's called with the attribute name as a string, whenever you try to qualify an instance on an undefined (nonexistent) attribute name. It is not called if Python can find the attribute using its inheritance tree-search procedure. Because of its behavior, __getattr__ is useful as a hook for responding to attribute requests in a generic fashion. For example:

>>> class empty:
...     def __getattr__(self, attrname):
...         if attrname == "age":
...             return 40
...         else:
...             raise AttributeError, attrname
...
>>> X = empty()
>>> X.age
40
>>> X.name
...error text omitted...
AttributeError: name

Here, the empty class and its instance X have no real attributes of their own, so the access to X.age gets routed to the __getattr__ method; self is assigned the instance (X), and attrname is assigned the undefined attribute name string ("age"). The class makes age look like a real attribute by returning a real value as the result of the X.age qualification expression (40). In effect, age becomes a dynamically computed attribute.

For other attributes the class doesn't know how to handle, it raises the built-in AttributeError exception, to tell Python that this is a bona fide undefined name; asking for X.name triggers the error. You'll see __getattr__ again when we show delegation and properties at work in the next two chapters, and we will say more about exceptions in Part VII.
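As a quick preview of that delegation role, here's a minimal sketch (a hypothetical wrapper class): because __getattr__ receives every undefined attribute fetch, it can pass the request along to an embedded object with the getattr built-in, adding any extra logic you like along the way:

>>> class wrapper:
...     def __init__(self, obj):
...         self.wrapped = obj                        # Save the embedded object.
...     def __getattr__(self, attrname):
...         print 'trace:', attrname                  # Extra logic here
...         return getattr(self.wrapped, attrname)    # Delegate the fetch.
...
>>> x = wrapper([1, 2, 3])                            # Wrap a real list.
>>> x.append(4)                                       # Routed to the list's method
trace: append
>>> x.wrapped                                         # Real attribute: no __getattr__
[1, 2, 3, 4]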

A related overloading method, __setattr__, intercepts all attribute assignments. If this method is defined, self.attr=value becomes self.__setattr__('attr',value). This is a bit more tricky to use, because assigning to any self attributes within __setattr__ calls __setattr__ again, causing an infinite recursion loop (and eventually, a stack overflow exception!). If you want to use this method, be sure that it assigns any instance attributes by indexing the attribute dictionary, discussed in the next section. Use self.__dict__['name']=x, not self.name=x:

>>> class accesscontrol:
...     def __setattr__(self, attr, value):
...         if attr == 'age':
...             self.__dict__[attr] = value
...         else:
...             raise AttributeError, attr + ' not allowed'
...
>>> X = accesscontrol()
>>> X.age = 40                     # Calls __setattr__
>>> X.age
40
>>> X.name = 'mel'
...text omitted...
AttributeError: name not allowed

These two attribute access overloading methods tend to play highly specialized roles, some of which we'll meet later in this book; in general, they allow you to control or specialize access to attributes in your objects.

21.4.5 __repr__ and __str__ Return String Representations

The next example exercises the __init__ constructor and the __add__ overload methods we've already seen, but also defines a __repr__ that returns a string representation for instances. String formatting is used to convert the managed self.data object to a string. If defined, __repr__, or its sibling __str__, is called automatically when class instances are printed or converted to strings; they allow you to define a better print string for your objects than the default instance display.

>>> class adder:
...     def __init__(self, value=0):
...         self.data = value                  # Initialize data.
...     def __add__(self, other):
...         self.data += other                 # Add other in-place.

>>> class addrepr(adder):                      # Inherit __init__, __add__.
...     def __repr__(self):                    # Add string representation.
...         return 'addrepr(%s)' % self.data   # Convert to string as code.

>>> x = addrepr(2)              # Runs __init__
>>> x + 1                       # Runs __add__
>>> x                           # Runs __repr__
addrepr(3)
>>> print x                     # Runs __repr__
addrepr(3)
>>> str(x), repr(x)             # Runs __repr__
('addrepr(3)', 'addrepr(3)')

So why two display methods? Roughly, __str__ is tried first for user-friendly displays, such as the print statement and the str built-in function. The __repr__ method should in principle return a string that could be used as executable code to recreate the object, and is used for interactive prompt echoes and the repr function. Python falls back on __repr__ if no __str__ is present, but not vice-versa:

>>> class addstr(adder):
...     def __str__(self):                     # __str__ but no __repr__
...         return '[Value: %s]' % self.data   # Convert to nice string.

>>> x = addstr(3)
>>> x + 1
>>> x                                          # Default repr
<__main__.addstr instance at 0x00B35EF0>
>>> print x                                    # Runs __str__
[Value: 4]
>>> str(x), repr(x)
('[Value: 4]', '<__main__.addstr instance at 0x00B35EF0>')

Because of this, __repr__ may be best if you want a single display for all contexts. By defining both methods, though, you can support different displays in different contexts:

>>> class addboth(adder):
...     def __str__(self):
...         return '[Value: %s]' % self.data   # User-friendly string
...     def __repr__(self):
...         return 'addboth(%s)' % self.data   # As-code string

>>> x = addboth(4)
>>> x + 1
>>> x                                  # Runs __repr__
addboth(5)
>>> print x                            # Runs __str__
[Value: 5]
>>> str(x), repr(x)
('[Value: 5]', 'addboth(5)')

21.4.6 __radd__ Handles Right-Side Addition

Technically, the __add__ method in the prior example does not support the use of instance objects on the right side of the + operator. To implement such expressions, and hence support commutative-style operators, code the __radd__ method as well. Python calls __radd__ only when the object on the right side of the + is your class instance, but the object on the left is not; in all other cases, the __add__ method of the object on the left is called instead:

>>> class Commuter:
...     def __init__(self, val):
...         self.val = val
...     def __add__(self, other):
...         print 'add', self.val, other
...     def __radd__(self, other):
...         print 'radd', self.val, other
...
>>> x = Commuter(88)
>>> y = Commuter(99)
>>> x + 1                      # __add__:  instance + noninstance
add 88 1
>>> 1 + y                      # __radd__: noninstance + instance
radd 99 1
>>> x + y                      # __add__:  instance + instance
add 88 <__main__.Commuter instance at 0x0086C3D8>

Notice how the order is reversed in __radd__: self is really on the right of the +, and other is on the left. Every binary operator has a similar right-side overloading method (e.g., __mul__ and __rmul__). Typically, a right-side method like __radd__ just converts if needed and reruns a + to trigger __add__, where the main logic is coded. Also note that x and y are instances of the same class here; when instances of different classes appear mixed in an expression, Python prefers the class of the one on the left.
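Here's a sketch of that conversion pattern (the Adder class is made up just to show the shape of the code): __radd__ simply swaps the operands back into a + expression, so that __add__ handles both orderings and the real work lives in one place:

>>> class Adder:
...     def __init__(self, val):
...         self.val = val
...     def __add__(self, other):              # Main logic coded once here
...         if isinstance(other, Adder):
...             other = other.val              # Convert if needed.
...         return Adder(self.val + other)
...     def __radd__(self, other):
...         return self + other                # Rerun + to trigger __add__.
...
>>> (Adder(88) + 1).val                        # __add__
89
>>> (1 + Adder(88)).val                        # __radd__, then __add__
89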

Right-side methods are an advanced topic, and tend to be fairly rarely used; you only code them when you need operators to be commutative, and then only if you need to support operators at all. For instance, a Vector class may use these tools, but an Employee or Button class probably would not.
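A related variation listed in Table 21-1 is __iadd__, for in-place (augmented) addition: when present, X += Y calls it; otherwise, Python falls back on __add__. A brief sketch of the in-place form (the Accum class here is hypothetical):

>>> class Accum:
...     def __init__(self, val):
...         self.val = val
...     def __iadd__(self, other):             # Called for X += Y if defined
...         self.val += other                  # Update the object in place,
...         return self                        # and return it as the result.
...
>>> x = Accum(5)
>>> x += 1                                     # Runs __iadd__.
>>> x += 1
>>> x.val
7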

21.4.7 __call__ Intercepts Calls

The __call__ method is called when your instance is called. No, this isn't a circular definition: if defined, Python runs a __call__ method for function call expressions applied to your instances. This allows class instances to emulate the look and feel of things like functions:

>>> class Prod:
...     def __init__(self, value):
...         self.value = value
...     def __call__(self, other):
...         return self.value * other
...
>>> x = Prod(2)
>>> x(3)
6
>>> x(4)
8

In this example, __call__ may seem a bit gratuitous; a simple method provides similar utility:

>>> class Prod:
...     def __init__(self, value):
...         self.value = value
...     def comp(self, other):
...         return self.value * other
...
>>> x = Prod(3)
>>> x.comp(3)
9
>>> x.comp(4)
12

However, __call__ can become more useful when interfacing with APIs that expect functions. For example, the Tkinter GUI toolkit we'll meet later in this book allows you to register functions as event handlers (a.k.a., callbacks); when events occur, Tkinter calls the registered object. If you want an event handler to retain state between events, you can either register a class's bound method, or an instance that conforms to the expected interface with __call__. In our code, both x.comp from the second example and x from the first can pass as function-like objects this way. More on bound methods in the next chapter.
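To illustrate the state-retention point without the GUI machinery, here is a minimal sketch (the Counter class and its use are hypothetical stand-ins for a registered event handler): because the instance carries its own data, each call can remember what came before:

>>> class Counter:
...     def __init__(self, label):
...         self.label = label
...         self.count = 0                     # Per-handler state
...     def __call__(self):                    # Called like a function
...         self.count += 1
...         print self.label, 'pressed', self.count, 'times'
...
>>> onPress = Counter('spam')                  # What a toolkit might call on events
>>> onPress()
spam pressed 1 times
>>> onPress()
spam pressed 2 times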

21.4.8 __del__ Is a Destructor

The __init__ constructor is called whenever an instance is generated. Its counterpart, destructor method __del__, is run automatically when an instance's space is being reclaimed (i.e., at "garbage collection" time):

>>> class Life:
...     def __init__(self, name='unknown'):
...         print 'Hello', name
...         self.name = name
...     def __del__(self):
...         print 'Goodbye', self.name
...
>>> brian = Life('Brian')
Hello Brian
>>> brian = 'loretta'
Goodbye Brian

Here, when brian is assigned a string, we lose the last reference to the Life instance, and so, trigger its destructor method. This works, and may be useful to implement some cleanup activities such as terminating server connections. However, destructors are not as commonly used in Python as in some OOP languages, for a number of reasons.

For one thing, because Python automatically reclaims all space held by an instance when the instance is reclaimed, destructors are not necessary for space management.[5] For another, because you cannot always easily predict when an instance will be reclaimed, it's often better to code termination activities in an explicitly-called method (or try/finally statement, described in the next part of the book); in some cases, there may be lingering references to your objects in system tables, which prevent destructors from running.

[5] In the current C implementation of Python, you also don't need to close file objects held by the instance in destructors, because they are automatically closed when reclaimed. However, as mentioned in Chapter 7, it's better to explicitly call file close methods, because auto-close-on-reclaim is a feature of the implementation, not the language itself (and can vary under Jython).
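Here's a rough sketch of the explicit-method alternative just described (the Connection class and its shutdown method are made-up names for illustration): the caller runs the cleanup step on purpose, typically inside a try/finally, rather than waiting for __del__ to fire:

>>> class Connection:
...     def __init__(self, name):
...         print 'open', name
...         self.name = name
...     def shutdown(self):                    # Explicit, predictable cleanup
...         print 'close', self.name
...
>>> c = Connection('server1')
open server1
>>> try:
...     pass                                   # Use the connection here.
... finally:
...     c.shutdown()                           # Runs even if an exception occurs.
...
close server1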

That's as many overloading examples as we have space for here. Most work similarly to ones we've already seen, and all are just hooks for intercepting built-in type operations; some overload methods have unique argument lists or return values. You'll see a few others in action later in the book, but for a complete coverage, we'll defer to other documentation sources.


