Item 34: Be careful with circular data structures. | Effective Perl Programming: Writing Better Programs with Perl

Perl uses a reference counting approach ^[4] to memory management. Each time an object (scalar, array, hash, etc.) acquires a name or a new reference, Perl increments that object's reference count. Whenever an object loses a name or a reference, Perl decrements its reference count. Once an object's reference count reaches zero, Perl deletes the object and reclaims the storage used by it.

^[4] It is possible that in the future Perl will employ some other sort of memory management strategy, in which case this Item may no longer be relevant. Reference counting is fast and efficient, however, and it may be a while before a compelling replacement comes along.

Reference counting fails when objects point to one another in a circular or self-referential fashion. A simple $a = \$a will create a cycle, but let's consider the following more interesting example:

package circular; sub New { shift; bless { name => shift }; }	Ignore package name.
sub DESTROY { my $self = shift; print "$self->{name}: nuked\n"; }
package main; { my $a = New circular 'a'; my $b = New circular 'b';
$a->{next} = $b; $b->{next} = $a; } print "the end\n";

The block inside the main package creates two objects belonging to the class circular , each one containing a reference to the other. The situation looks like this just before the end of the block:

graphics/06fig09.gif

Each object has a reference count of two: one due to the reference from $a / $b and the other due to the reference from the other object. The lexical variables $a and $b go out of scope once the block exits, and then we have the following situation:

graphics/06fig10.gif

Hmm. We can no longer get at these objects, because neither has a name and we have no external references to either of them. They are just taking up space. Unfortunately, there's nothing Perl can do to help us, because both objects still have a reference count of one. These objects will continue to hang around until the entire program exits. Eventually, these objects will be formally destroyed . At the very end of a thread of execution, Perl makes a pass with a " mark-sweep" garbage collector. This final pass will destroy all of the objects created by the interpreter, accessible or otherwise . If you run the example above, you will see the final pass in action:

 %  tryme  the end  b: nuked  a: nuked

As you might expect, the objects are destroyed after the interpreter executes the last statement in the normal flow of the program.

This final pass is important. Perl can be used as an embedded language. If the interpreter were used repeatedly within the same process to execute code like the above, it would leak memory if there were not a sure-fire means of destroying all the objects created during that thread. There is no way to clean up this mess once you get into it, but you can prevent it by the careful application of brute force. You have to implement a technique for explicitly breaking the circular references. One solution that would work in the case above is the following:

 package main;  {    my $a = New circular 'a';    my $b = New circular 'b';    $a->{next} = $b;    $b->{next} = $a;    $head = $a;  }  undef $head->{next};  undef $head;

A link into the data.

Break the cycle.

Have to get this one too.

Here we save a link into the circular data structure into the variable $head . Because there is only a single cycle in the structure, breaking a single link is enough to allow Perl to reclaim all the objects in it. If this doesn't seem thorough enough, you can handle them all yourself:

while ($head) { $next = $head->{next}; undef $head->{next}; $head = $next; } undef $head;	Explicitly traverse and clean up.
	Every last one.

Here we traverse the structure and explicitly destroy every one of the troublesome references. Note that what we are doing is destroying references to the objects we want to delete so that their reference count goes to zero. There is no way to explicitly destroy an object in Perl regardless of its reference countif there were, it could be a horrendous source of bugs and crashes.

Another approach is to do the work in two passes , in a fashion somewhat like a mark-sweep collector. First, we acquire a list or "catalog" of the references that need to be destroyed:

$ptr = $head; do { push @refs, $head->{next}; $head = $head->{next}; } while ($ptr != $head); $ptr = $head = undef;	Construct a list of references (to references) in `@refs` .
	Don't leave any lying around.

This loop traverses the self-referential structure and collects a list of references to all the references that need to be destroyed. The next pass just traverses the list and destroys them:

 foreach (@refs) {    print "preemptive strike on $$_\n";    undef $$_;  }

A two-pass approach is extravagant in the case of a simple circular list such as this one, but in the case of a graphlike structure containing many cycles, it may be the only alternative.