8.6 Advanced Features


Since the languages Parrot targets (like Perl and Ruby) have sophisticated concepts as core features, it's in Parrot's best interest to have core support for them. This section covers some (but not all) of these features.

8.6.1 Garbage Collection

It's expected that modern languages have garbage collection built in. The programmer shouldn't have to worry about explicitly cleaning up after dead variables, or even identifying them. For interpreted languages, this requires support from the interpreter engine, so Parrot provides that support.

Parrot has two separate allocation systems built in, and each has its own garbage collection scheme. Parrot also enforces strict rules about what can be referenced and from where, which allows it to have a more efficient garbage collection system.

The first allocation system is responsible for PMC and string structures. These are fixed-sized objects that Parrot allocates out of arenas, which are pools of identically sized things. Using arenas makes it easy for Parrot to find and track them, and speeds up the detection of dead objects.

Parrot's dead object detection system works by first running through all the arenas and marking all strings and PMCs as dead. It then runs through the stacks and registers, marking all strings and PMCs they reference as alive. Next, it iteratively runs through all the live PMCs and strings and marks everything they reference as alive. Finally, it sweeps through all the arenas looking for newly dead PMCs and strings, which it puts on the free list. At this point, any PMC that has a custom destruction routine, such as an object with a DESTROY method, has its destruction routine called. The dead object detector is triggered whenever Parrot runs out of free objects, and can be explicitly triggered by running code. Often a language compiler will force a dead object sweep when leaving a block or subroutine.
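The mark phase and sweep phase described above can be sketched in a few lines. This is a toy model, not Parrot's implementation: the arena, object, and root-set names here are hypothetical, and the real collector works on C structures rather than high-level objects.

```python
# A minimal sketch of arena-based dead object detection (mark and sweep),
# following the steps in the text: mark everything dead, mark everything
# reachable from the roots (stacks and registers) alive, then sweep the
# arena and collect what's still dead onto the free list.

class Obj:
    def __init__(self, name, refs=None):
        self.name = name
        self.refs = refs or []   # other objects this one references
        self.alive = False

def dead_object_sweep(arena, roots):
    """Return the free list: objects in the arena that are unreachable."""
    # 1. Mark every object in the arena as dead.
    for obj in arena:
        obj.alive = False
    # 2. Iteratively mark everything reachable from the roots as alive.
    work = list(roots)
    while work:
        obj = work.pop()
        if not obj.alive:
            obj.alive = True
            work.extend(obj.refs)
    # 3. Sweep: anything still marked dead goes on the free list.
    return [obj for obj in arena if not obj.alive]

# Example: c is unreachable from the root set, so it ends up free.
c = Obj("c")
b = Obj("b")
a = Obj("a", refs=[b])
free_list = dead_object_sweep([a, b, c], roots=[a])
print([o.name for o in free_list])   # ['c']
```

A real collector would also run custom destruction routines (like DESTROY) for the objects on the free list at step 3.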

Parrot's memory allocation system is used to allocate space for the contents of strings and PMCs. Allocations don't have a fixed size; they come from pools of memory that Parrot maintains. Whenever Parrot runs out of memory in its memory pools, it makes a compacting run, squeezing out unused sections from the pools. When it's done, one end of each pool is entirely actively used memory, and the other end is one single chunk of free memory. This makes allocating memory from the pools faster, as there's no need to walk a free list looking for a segment of memory large enough to satisfy the request. It also makes more efficient use of memory, as there's less overhead than in a traditional memory allocation system.
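A toy compacting pool makes the scheme concrete: allocation is a pointer bump, and a compaction run slides live contents down so one end of the pool is live data and the other is a single free chunk. The class and handle representation here are illustrative assumptions; Parrot's pools manage raw bytes and update the owning string or PMC headers when contents move.

```python
# A toy compacting memory pool: alloc() bumps a pointer; when the pool is
# full, compact() squeezes out dead space left behind by free().

class Pool:
    def __init__(self, size):
        self.buf = bytearray(size)
        self.top = 0         # everything below top is allocated (live or dead)
        self.live = []       # [offset, length] handles for live contents

    def alloc(self, data):
        if self.top + len(data) > len(self.buf):
            self.compact()   # out of room: squeeze out the dead sections
        off = self.top
        self.buf[off:off + len(data)] = data
        self.top += len(data)
        handle = [off, len(data)]
        self.live.append(handle)
        return handle        # mutable, so compaction can update the offset

    def free(self, handle):
        self.live.remove(handle)   # contents become dead space until compaction

    def compact(self):
        new_top = 0
        for handle in self.live:   # slide each live buffer down, in order
            off, length = handle
            self.buf[new_top:new_top + length] = self.buf[off:off + length]
            handle[0] = new_top    # point the handle at the moved contents
            new_top += length
        self.top = new_top         # everything above top is now one free chunk

pool = Pool(8)
s1 = pool.alloc(b"abc")
s2 = pool.alloc(b"de")
pool.free(s1)              # "abc" is now dead space in the pool
s3 = pool.alloc(b"wxyz")   # doesn't fit, so a compacting run happens first
```

Note that the contents move but the handles stay valid, which mirrors the point made below: structures stay put, only the memory they point to is relocated.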

Splitting memory pool compaction from dead object detection has a nice performance benefit for Perl and languages like it. For most Perl programs, the interpreter allocates and reallocates far more memory for string and variable contents than it does actual string and variable structures. The structures are reused over and over as their contents change. With a traditional single-collector system, each time the interpreter runs out of memory it has to do a full scan for dead objects and compact the pools after. With a split system, Parrot can just sweep through the variables it thinks are live and compact their contents. This does mean that Parrot will sometimes move data for variables and strings that are really dead because it hasn't found that out yet. That expense is normally much less than the expense of doing a full tracing run to find out which variables are actually dead.

Parrot's allocation and collection systems have some compromises that make interfacing with low-level code easier. The structure that describes a PMC or string is guaranteed not to move over the lifetime of the string or variable. This allows C code to store pointers to variables in internal structures without worrying that what they're referencing may move. It also means that the garbage collection system doesn't have to worry about updating pointers that C code might hold, which it would have to do if PMC or string structures could move.

8.6.2 Multimethod Dispatching

Multimethod dispatching (also known as signature-based dispatching) is a powerful technique that uses the parameters of a function or method call to help decide at runtime which function or method Parrot should call. This is one of the new features being built into Perl 6. It allows you to have two or more subroutines or methods with the same name that differ only in the types of their arguments.

In a standard dispatch system, each subroutine or method name must be unique within a namespace. Attempting to create a second routine with the same name either throws an error or overlays the original one. This is certainly straightforward, but in some circumstances it leads to code that looks like:

    sub foo {
        my ($self, $arg) = @_;
        if ($arg->isa("Foo")) {
            # Do something with a Foo arg
        } elsif ($arg->isa("Bar")) {
            # Do something with a Bar arg
        } elsif ($arg->isa("Baz")) {
            # Do something with a Baz arg
        } else {
            # . . .
        }
    }

This method effectively dispatches both on the type of the object and on the type of the argument to the method. This sort of thing is common, especially in operator overloading functions. Manually checking the types of the arguments to select an action is both error-prone and difficult to extend. Multimethod dispatch solves this problem.

With multimethod dispatch, there can be more than one method or subroutine with the same name as long as each variant has different parameters in its declaration. When code calls a method or subroutine that participates in multiple dispatch, the system chooses the variant that most closely matches the types of the parameters in the call.
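A small sketch shows the idea: variants of one name are registered keyed on their parameter types, and the call site picks the variant matching the runtime arguments. The decorator and registry here are invented for illustration; Parrot's dispatcher also scores the "closest" match through the inheritance hierarchy rather than requiring an exact type match.

```python
# Signature-based (multimethod) dispatch: several functions share one name,
# and the dispatcher selects a variant by the types of the actual arguments.

_registry = {}

def multi(*types):
    """Register the decorated function as a variant for these argument types."""
    def register(fn):
        _registry.setdefault(fn.__name__, {})[types] = fn
        def dispatch(*args):
            for sig, candidate in _registry[fn.__name__].items():
                if len(sig) == len(args) and all(
                        isinstance(a, t) for a, t in zip(args, sig)):
                    return candidate(*args)
            raise TypeError("no matching variant for " + fn.__name__)
        return dispatch
    return register

@multi(int, int)
def add(x, y):
    return x + y

@multi(str, str)
def add(x, y):          # same name, different parameter types: no overlaying
    return x + " " + y

print(add(1, 2))          # 3
print(add("foo", "bar"))  # foo bar
```

Both variants coexist under the one name `add`; which body runs is decided per call, from the argument types.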

One very notable thing about subs and methods that do multimethod dispatch is that the named subroutines and methods live outside of any namespace. By default, when searching for a method or subroutine, Parrot first looks for an explicit sub or method of that name in the current namespace (or the inheritance hierarchy of an object), then for the default subroutine or method (AUTOLOAD or its equivalent) in the inheritance hierarchy, and only when those fail will it look for a multimethod dispatch version of the subroutine or method. Since Parrot allows individual PMC classes to control how their dispatching is done, this sequence may be changed on a per-class basis if need be.

Parrot itself makes heavy use of multimethod dispatch, with most of the core PMC classes using it to provide operator overloading. The only reason we don't use it for all our operator dispatching is that some of the languages we're interested in require a left-side wins scheme. It's so heavily used for operator overloading, in fact, that we actually have two separate versions of multiple dispatch built into Parrot, one specially tailored to operator overloading and a more general version for normal subroutine and method dispatch.

8.6.3 Continuations

Continuations are possibly the most powerful high-level flow control construct. Originating with lambda calculus, and built into Lisp over thirty years ago, continuations can be thought of as a closure for control flow. They not only capture their lexical scope, which gets restored when they're invoked, but also capture their call stack, so when they're invoked it's as if you never left the spot where they were created. Like closures, though, while they capture the variables in scope when the continuation is taken, they don't capture the values of the variables. When you invoke a continuation it's not like rolling back a transaction.

Continuations are phenomenally powerful, and have the undeserved reputation of being bizarre and mind-warping things. This turns out not to be the case. Originally we put continuations into Parrot to support Ruby, which has them. This decision turned out to be fortuitous.

In a simple call/return system, which many languages use, making a subroutine call pushes the return address onto a stack somewhere. When the subroutine is done, it pops the address off the stack and returns there. This is a simple and straightforward operation, and quite fast. The one disadvantage is that, to be safe, the calling routine needs to preserve any important state before making the call and restore it on return.

An alternative calling scheme is called Continuation Passing Style (CPS). With CPS, rather than pushing a return address onto the stack you create a return continuation and pass that into the subroutine as a parameter. When the subroutine is done it invokes the return continuation, effectively returning to the caller with the caller's environment automatically restored. This includes not only things like the call stack and lexical variables, but also meta-information like security credentials.
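The shape of CPS can be shown with ordinary callables standing in for continuations. This is only a sketch of the calling convention: a real Parrot return continuation also restores registers, lexical pads, and the rest of the caller's environment, where here it is just a Python function.

```python
# Continuation-passing style: instead of returning, each routine receives
# a "return continuation" k and invokes it with the result when done.

def square_cps(x, k):
    k(x * x)                # "return" by invoking the continuation

def add_cps(x, y, k):
    k(x + y)

def sum_of_squares_cps(a, b, k):
    # Each step passes "the rest of the computation" along as the
    # continuation of the current call, so control never pops a stack:
    # it always flows forward into the next continuation.
    square_cps(a, lambda a2:
        square_cps(b, lambda b2:
            add_cps(a2, b2, k)))

results = []
sum_of_squares_cps(3, 4, results.append)
print(results[0])   # 25
```

Notice that the caller's state lives in the closures, so invoking the continuation "returns" with that environment intact, which is exactly the property the text describes.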

When we were originally designing Parrot we'd planned on the simpler call/return style, with the caller preserving everything important before the call, and restoring it afterwards. Three things soon became clear: we were saving and restoring a lot of individual pieces; we were going to have to add new pieces in the future; and there wasn't any difference between what we were doing for a call and what we were doing for a continuation, except that the call was a lot more manual.

The future-proofing was what finally made the decision. Parrot is making a strong guarantee of backward compatibility, which means that code compiled to Parrot bytecode once we've released will run safely and unchanged on all future versions of Parrot. If we require all the individual pieces of the environment (registers, lexical pads, nested namespaces, opcode libraries, stack pointers, exception handlers, and assorted things) to be saved manually for a subroutine call, it means that we can't add any new pieces in the future, as then old code would no longer work properly. We briefly toyed with the idea of an opcode to package up the entire environment in one go. Then we realized that package was a continuation, and as such we might as well just go all the way and use them.

As a result, Parrot implements a full CPS system internally, and uses it for all subroutine and method calls. We also have the simpler call/return style of flow control available for languages that don't need the heavier-weight call system, as well as for compilers to use for internal processing and optimization. We do go to some lengths to hide the continuations. PIR code, for example, allows compiler writers to create subroutines and methods (and calls to them) that conform to Parrot's CPS mechanism without ever touching continuations directly. We then have the benefits of what appears to be a simple calling scheme, secure future-proofing, and the full power of continuations for languages that want them.

8.6.4 Coroutines

A coroutine is a subroutine or method that can suspend itself partway through, then later pick up where it left off. This isn't quite the same thing as a continuation, though it may seem so at first. Coroutines are often used to implement iterators and generators, as well as threads on systems that don't have native threading support. Since they are so useful, and since Perl 6 and Python provide them either directly or as generators, Parrot has support for them built in.
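Python's generators give exactly this suspend-and-resume behavior, so they make a convenient sketch of what a coroutine does, keeping in mind that Parrot's coroutines are full subroutines, not a language-level generator feature.

```python
# A coroutine suspends itself partway through (at yield) and later picks up
# exactly where it left off, with its local state intact.

def counter(start):
    n = start
    while True:
        n = n + (yield n)   # suspend here; resume with the value sent in

it = counter(10)
print(next(it))      # 10  (runs up to the first suspension point)
print(it.send(5))    # 15  (resumes right where it left off)
print(it.send(2))    # 17
```

The local variable `n` survives across suspensions, which is what distinguishes a coroutine from a plain subroutine that starts fresh on every call.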

Coroutines present some interesting technical challenges. Calling into an existing coroutine requires reestablishing not only the lexical state and potentially the hypothetical state of variables, but also the control state for just the routine. In the presence of exceptions they're a bit more complex than plain subroutines and continuations, but they're still very useful things, and as such we've given them our full support.



Perl 6 and Parrot Essentials, Second Edition
ISBN: 059600737X
Year: 2003
Pages: 116
