Recipe4.1.Copying an Object


Recipe 4.1. Copying an Object

Credit: Anna Martelli Ravenscroft, Peter Cogolo

Problem

You want to copy an object. However, when you assign an object, pass it as an argument, or return it as a result, Python uses a reference to the original object, without making a copy.

Solution

Module copy in the standard Python library offers two functions to create copies. The one you should generally use is the function named copy, which returns a new object containing exactly the same items and attributes as the object you're copying:

import copy new_list = copy.copy(existing_list)

On the rare occasions when you also want every item and attribute in the object to be separately copied, recursively, use deepcopy:

import copy new_list_of_dicts = copy.deepcopy(existing_list_of_dicts)

Discussion

When you assign an object (or pass it as an argument, or return it as a result), Python (like Java) uses a reference to the original object, not a copy. Some other programming languages make copies every time you assign something. Python never makes copies "implicitly" just because you're assigning: to get a copy, you must specifically request a copy.

Python's behavior is simple, fast, and uniform. However, if you do need a copy and do not ask for one, you may have problems. For example:

>>> a = [1, 2, 3] >>> b = a >>> b.append(5) >>> print a, b [1, 2, 3, 5] [1, 2, 3, 5]

Here, the names a and b both refer to the same object (a list), so once we alter the object through one of these names, we later see the altered object no matter which name we use for it. No original, unaltered copy is left lying about anywhere.

To become an effective Python programmer, it is crucial that you learn to draw the distinction between altering an object and assigning to a name, which previously happened to refer to the object. These two kinds of operations have nothing to do with each other. A statement such as a=[ ] rebinds name a but performs no alteration at all on the object that was previously bound to name a. Therefore, the issue of references versus copies just doesn't arise in this case: the issue is meaningful only when you alter some object.


If you are about to alter an object, but you want to keep the original object unaltered, you must make a copy. As this recipe's solution explains, the module copy from the Python Standard Library offers two functions to make copies. Normally, you use copy.copy, which makes a shallow copyit copies an object, but for each attribute or item of the object, it continues to share references, which is faster and saves memory.

Shallow copying, alas, isn't sufficient to entirely "decouple" a copied object from the original one, if you propose to alter the items or attributes of either object, not just the object itself:

>>> list_of_lists = [ ['a'], [1, 2], ['z', 23] ] >>> copy_lol = copy.copy(lists_of_lists) >>> copy_lol[1].append('boo') >>> print list_of_lists, copy_lol [['a'], [1, 2, 'boo'], ['z', 23]] [['a'], [1, 2, 'boo'], ['z', 23]]

Here, the names list_of_lists and copy_lol refer to distinct objects (two lists), so we could alter either of them without affecting the other. However, each item of list_of_lists is the same object as the corresponding item of copy_lol, so once we alter an item reached by indexing either of these names, we later see the altered item no matter which object we're indexing to reach it.

If you do need to copy some container object and also recursively copy all objects it refers to (meaning all items, all attributes, and also items of items, items of attributes, etc.), use copy.deepcopysuch deep copying may cost you substantial amounts of time and memory, but if you gotta, you gotta. For deep copies, copy.deepcopy is the only way to go.

For normal shallow copies, you may have good alternatives to copy.copy, if you know the type of the object you want to copy. To copy a list L, call list(L); to copy a dict d, call dict(d); to copy a set s (in Python 2.4, which introduces the built-in type set), call set(s). (Since list, dict, and, in 2.4, set, are built-in names, you do not need to perform any "preparation" before you use any of them.) You get the general pattern: to copy a copyable object o, which belongs to some built-in Python type t, you may generally just call t(o). dicts also offer a dedicated method to perform a shallow copy: d.copy( ) and dict(d) do the same thing. Of the two, I suggest you use dict(d): it's more uniform with respect to other types, and it's even shorter by one character!

To copy instances of arbitrary types or classes, whether you coded them or got them from a library, just use copy.copy. If you code your own classes, it's generally not worth the bother to define your own copy or clone method. If you want to customize the way instances of your class get (shallowly) copied, your class can supply a special method _ _copy_ _ (see Recipe 6.9 for a special technique relating to the implementation of such a method), or special methods _ _getstate_ _ and _ _setstate_ _. (See Recipe 7.4 for notes on these special methods, which also help with deep copying and serializationi.e., picklingof instances of your class.) If you want to customize the way instances of your class get deeply copied, your class can supply a special method _ _deepcopy_ _ (see Recipe 6.9.)

Note that you do not need to copy immutable objects (strings, numbers, tuples, etc.) because you don't have to worry about altering them. If you do try to perform such a copy, you'll just get the original right back; no harm done, but it's a waste of time and code. For example:

>>> s = 'cat' >>> t = copy.copy(s) >>> s is t True

The is operator checks whether two objects are not merely equal, but in fact the same object (is checks for identity; for checking mere equality, you use the == operator). Checking object identity is not particularly useful for immutable objects (we're using it here just to show that the call to copy.copy was useless, although innocuous). However, checking object identity can sometimes be quite important for mutable objects. For example, if you're not sure whether two names a and b refer to separate objects, or whether both refer to the same object, a simple and very fast check a is b lets you know how things stand. That way you know whether you need to copy the object before altering it, in case you want to keep the original object unaltered.

You can use other, inferior ways exist to create copies, namely building your own. Given a list L, both a "whole-object slice" L[:] and a list comprehension [x for x in L] do happen to make a (shallow) copy of L, as do adding an empty list, L+[ ], and multiplying the list by 1, L*1 . . . but each of these constructs is just wasted effort and obfuscationcalling list(L) is clearer and faster. You should, however, be familiar with the L[:] construct because for historical reasons it's widely used. So, even though you're best advised not to use it yourself, you'll see it in Python code written by others.


Similarly, given a dictionary d, you could create a shallow copy named d1 by coding out a loop:

>>> d1 = {  } >>> for somekey in d: ...    d1[somekey] = d[somekey]

or more concisely by d1 = { }; d1.update(d). However, again, such coding is a waste of time and effort and produces nothing but obfuscated, fatter, and slower code. Use d1=dict(d), be happy!

See Also

Module copy in the Library Reference and Python in a Nutshell.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net