Section 5.4. When Reference Counting Goes Bad

5.4. When Reference Counting Goes Bad

Reference counting as a way to manage memory has been around for a long time. A really long time. The downside of reference counting is that it breaks when the data structure is not a directed graph that is, when some parts of the structure point back in to other parts in a looping way. For example, suppose each of two data structures contains a reference to the other (see Figure 5-1):

 my @data1 = qw(one won); my @data2 = qw(two too to); push @data2, \@data1; push @data1, \@data2;

Figure 5-1. When the references in a data structure form a loop, Perl's reference-counting system may not be able to recognize and recycle the no-longer-needed memory space

At this point, we have two names for the data in @data1: @data1 itself and @{$data2[3]}, and two names for the data in @data2: @data2 itself and @{$data1[2]}. We've created a loop. In fact, we can access won with an infinite number of names, such as $data1[2][3][2][3][2][3][1].

What happens when these two array names go out of scope? Well, the reference count for the two arrays goes down from two to one, but not zero. And because it's not zero, Perl thinks there might still be a way to get to the data, even though there isn't! Thus, we've created a memory leak. A memory leak in a program causes the program to consume more and more memory over time. Ugh.

At this point, you're right to think that example is contrived. Of course we would never make a looped data structure in a real program! Actually, programmers often make these loops as part of doubly linked lists, linked rings, or a number of other data structures. The key is that Perl programmers rarely do so because the most important reasons to use those data structures don't apply in Perl. Most of that deals with managing memory and connecting discontiguous memory blocks, which Perl does for us. If you've used other languages, you may have noticed programming tasks that are comparatively easy in Perl. For example, it's easy to sort a list of items or to add or remove items, even in the middle of the list. Those tasks are difficult in some other languages, and using a looped data structure is a common way to get around the language's limitations.

Why mention it here? Well, even Perl programmers sometimes copy an algorithm from another programming language. There's nothing inherently wrong with doing this, although it would be better to decide why the original author used a "loopy" data structure and then recode the algorithm to use Perl's strengths. Perhaps you should use a hash instead, or perhaps the data should go into an array that will be sorted later.

Some upcoming version of Perl is likely to use garbage collection in addition to, or instead of, referencing counting.^[*] Until then, we must be careful not to create circular references or, if we do, break the circle before the variables go out of scope. For example, the following code doesn't leak:

^[*] Just don't ask us about it. We wrote this book a long time before you got a chance to read it, so we didn't exactly know the details back then.

 {   my @data1 = qw(one won);   my @data2 = qw(two too to);   push @data2, \@data1;   push @data1, \@data2;   ... use @data1, @data2 ...   # at the end:   @data1 = (  );   @data2 = (  ); }

We eliminated the reference to @data2 from within @data1, and vice versa. Now the data have only one reference each, which all go to zero references at the end of the block. In fact, we can clear out either one and not the other, and it still works nicely. Chapter 13 shows how to create weak references, which can help with many of these problems.