13.3. Garbage Collection

Python's garbage collection normally proceeds transparently and automatically, but you can choose to exert some direct control. The general principle is that Python collects each object x at some time after x becomes unreachable, that is, when no chain of references can reach x by starting from a local variable of a function instance that is executing, nor from a global variable of a loaded module. Normally, an object x becomes unreachable when there are no references at all to x. In addition, a group of objects can be unreachable when they reference each other but no global or local variable references any of them, even indirectly (such a situation is known as a mutual reference loop).

Classic Python (CPython) keeps with each object x a count, known as a reference count, of how many references to x are outstanding. When x's reference count drops to 0, CPython immediately collects x. Function getrefcount of module sys accepts any object and returns its reference count (at least 1, since getrefcount itself has a reference to the object it's examining). Other versions of Python, such as Jython or IronPython, rely on other garbage-collection mechanisms supplied by the platform they run on (e.g., the JVM or the MSCLR). Modules gc and weakref therefore apply only to CPython.

When Python garbage-collects x and there are no references at all to x, Python then finalizes x (i.e., calls x.__del__()) and makes the memory that x occupied available for other uses. If x held any references to other objects, Python removes the references, which in turn may make other objects collectable by leaving them unreachable.

13.3.1. The gc Module

The gc module exposes the functionality of Python's garbage collector. gc deals only with objects that are unreachable in a subtle way, being part of mutual reference loops. In such a loop, each object in the loop refers to others, keeping the reference counts of all objects positive.
However, no outside references to any one of the set of mutually referencing objects exist any longer. Therefore, the whole group, also known as cyclic garbage, is unreachable, and therefore garbage-collectable. Looking for such cyclic garbage loops takes some time, which is why module gc exists. Cyclic garbage collection is enabled by default, with reasonable default parameters. However, by importing the gc module and calling its functions, you may choose to disable the functionality, change its parameters, or find out exactly what's going on in this respect.

gc exposes functions you can use to help you keep cyclic garbage-collection times under control. These functions can sometimes help you track down a memory leak (objects that are not getting collected even though there should be no more references to them) by letting you discover what other objects are in fact holding on to references to them.
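The difference between plain reference counting and cyclic collection can be seen in a minimal sketch. The class Node and the function make_cycle are hypothetical names introduced only for this illustration; the sketch assumes CPython, where gc.collect() returns the number of unreachable objects it found.

```python
import gc
import sys


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


def make_cycle():
    # Two objects that refer to each other form a mutual reference loop.
    a = Node("a")
    b = Node("b")
    a.partner = b
    b.partner = a
    # When this function returns, the local names a and b disappear,
    # but each object still holds a reference to the other, so neither
    # reference count drops to 0.


gc_was_enabled = gc.isenabled()
gc.collect()             # start from a clean state
gc.disable()             # keep automatic cyclic collection out of the way
make_cycle()
found = gc.collect()     # number of unreachable objects found in this run
if gc_was_enabled:
    gc.enable()

x = object()
count = sys.getrefcount(x)   # at least 1; the exact value depends on the CPython version
```

The exact value of found varies between CPython versions (instance dictionaries may or may not be counted separately), but it covers at least the two Node instances, which reference counting alone would never have reclaimed.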
When you know you have no cyclic garbage loops in your program, or when you can't afford the delay of cyclic garbage collection at some crucial time, suspend automatic garbage collection by calling gc.disable(). You can enable collection again later by calling gc.enable(). You can test if automatic collection is currently enabled by calling gc.isenabled(), which returns True or False. To control when the time needed for collection is spent, you can call gc.collect() to force a full cyclic collection run to happen immediately. An idiom for wrapping time-critical code is:

    import gc

    gc_was_enabled = gc.isenabled()
    if gc_was_enabled:
        gc.collect()
        gc.disable()
    # insert some time-critical code here
    if gc_was_enabled:
        gc.enable()

Other functionality in module gc is more advanced and rarely used, and can be grouped into two areas. Functions get_threshold and set_threshold and debug flag DEBUG_STATS help you fine-tune garbage collection to optimize your program's performance. The rest of gc's functionality can help you diagnose memory leaks in your program. While gc itself can automatically fix many leaks (as long as you avoid defining __del__ in your classes, since, in Python versions before 3.4, the existence of __del__ can block cyclic garbage collection), your program runs faster if it avoids creating cyclic garbage in the first place.

13.3.2. The weakref Module

Careful design can often avoid reference loops. However, at times you need two objects to know about each other, and avoiding mutual references would distort and complicate your design. For example, a container has references to its items, yet it can often be useful for an object to know about a container holding it. The result is a reference loop: due to the mutual references, the container and items keep each other alive, even when all other objects forget about them. Weak references solve this problem by letting you have objects that mutually reference each other but do not keep each other alive.
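The container example can be sketched as follows. Container, Item, and the attribute _container_ref are hypothetical names chosen for illustration; the only library call used is weakref.ref, which creates a weak reference that returns its target when called, or None once the target is gone. The sketch assumes CPython, where an object is collected as soon as its last strong reference disappears.

```python
import weakref


class Container:
    """Hypothetical container that holds strong references to its items."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)
        # The item gets only a weak back-reference to its container,
        # so no reference loop is created.
        item._container_ref = weakref.ref(self)


class Item:
    """Hypothetical item that can find its container while the container is alive."""

    _container_ref = None

    @property
    def container(self):
        ref = self._container_ref
        return ref() if ref is not None else None


box = Container()
thing = Item()
box.add(thing)
found_while_alive = thing.container is box   # True: the container is still alive

del box                          # no strong references to the container remain
found_after = thing.container    # None in CPython: the container was collected
```

The design choice here is that the strong references run in one direction only (container to items), while the reverse direction is weak, so the container's lifetime never depends on its items.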
A weak reference is a special object w that refers to some other object x without incrementing x's reference count. When x's reference count goes down to 0, Python finalizes and collects x, then informs w of x's demise. The weak reference w can now either disappear or get marked as invalid in a controlled way. At any time, a given weak reference w refers either to the same target object x as when w was created or to nothing at all; a weak reference is never retargeted. Not all types of objects support being the target x of a weak reference w, but class instances and functions do. Module weakref exposes functions and types to create and manage weak references.
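A minimal sketch of this life cycle, assuming CPython's immediate collection on a zero reference count: Target, on_finalize, and notifications are hypothetical names for illustration, while weakref.ref(object, callback) is the module's standard way to create a weak reference with an optional callback.

```python
import weakref


class Target:
    """Hypothetical class; instances of user-defined classes can be weakly referenced."""


notifications = []


def on_finalize(ref):
    # Called with the weak reference object itself once the target is gone.
    notifications.append(ref)


x = Target()
w = weakref.ref(x, on_finalize)

alive_before = w() is x   # calling w returns the target while it is alive
del x                     # drop the only strong reference
alive_after = w()         # None: w now refers to nothing at all
```

After del x, the callback has run exactly once and w() returns None; w is never retargeted to another object.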
WeakKeyDictionary lets you noninvasively associate additional data with some hashable objects without changing the objects. WeakValueDictionary lets you noninvasively record transient associations between objects and build caches. In each case, it's better to use a weak mapping than a normal dict to ensure that an object that is otherwise garbage-collectable is not kept alive just by being used in a mapping.

A typical example of use could be a class that keeps track of its instances, but does not keep them alive just in order to keep track of them:

    import weakref

    class Tracking(object):
        # Maps id(instance) to instance without keeping the instance alive.
        _instances_dict = weakref.WeakValueDictionary()

        def __init__(self):
            Tracking._instances_dict[id(self)] = self

        def instances():
            # The class attribute must be qualified with the class name:
            # a bare _instances_dict is not in scope inside this function.
            return list(Tracking._instances_dict.values())
        instances = staticmethod(instances)
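The WeakKeyDictionary case can be sketched in the same spirit. Document and notes are hypothetical names for illustration; the sketch assumes CPython, where the entry disappears as soon as the last strong reference to its key is dropped.

```python
import weakref


class Document:
    """Hypothetical class; its instances are hashable and weakly referenceable."""


# Extra data is kept outside the objects themselves, keyed weakly by object.
notes = weakref.WeakKeyDictionary()

doc = Document()
notes[doc] = "needs review"
size_while_alive = len(notes)   # 1: the key is alive, so the entry exists

del doc                         # drop the only strong reference to the key
size_after = len(notes)         # 0: the entry vanished together with its key
```

With a normal dict in place of the weak mapping, the Document instance would stay alive for as long as the dict held it as a key.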