6.7 Subroutines | Perl 6 Essentials

Subroutines and methods are the basic building blocks of larger programs. At the heart of every subroutine call are two fundamental actions: it has to store the current location so it can come back to it, and it has to transfer control to the subroutine. The bsr opcode does both. It pushes the address of the next instruction onto the control stack, and then branches to a label that marks the subroutine:

    print "in main\n"
    bsr _sub
    print "and back\n"
    end
  _sub:
    print "in sub\n"
    ret

At the end of the subroutine, the ret instruction pops a location back off the control stack and goes there, returning control to the caller. The jsr opcode also pushes the current location onto the control stack and jumps to a subroutine. Just like the jump opcode, it takes an absolute address in an integer register, so the address has to be calculated first with the set_addr opcode:

    print "in main\n"
    set_addr I0, _sub
    jsr I0
    print "and back\n"
    end
  _sub:
    print "in sub\n"
    ret
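The push-and-branch behavior of bsr and the pop-and-return behavior of ret can be sketched outside Parrot. The following is a minimal Python model, not Parrot's implementation: the run function, the tuple encoding of instructions, and the "label" pseudo-op are all assumptions made for illustration.

```python
def run(program):
    """Run a list of (op, arg) tuples and return everything printed."""
    # Map each label name to its position in the program.
    labels = {arg: pc for pc, (op, arg) in enumerate(program) if op == "label"}
    control_stack = []   # return addresses pushed by bsr
    output = []
    pc = 0
    while True:
        op, arg = program[pc]
        if op == "print":
            output.append(arg)
        elif op == "bsr":
            control_stack.append(pc + 1)   # address of the next instruction
            pc = labels[arg]               # branch to the subroutine's label
            continue
        elif op == "ret":
            pc = control_stack.pop()       # go back to the saved address
            continue
        elif op == "end":
            return "".join(output)
        pc += 1   # print and label fall through to the next instruction


program = [
    ("print", "in main\n"),
    ("bsr", "_sub"),
    ("print", "and back\n"),
    ("end", None),
    ("label", "_sub"),
    ("print", "in sub\n"),
    ("ret", None),
]
result = run(program)
print(result, end="")
```

In this model a jsr would differ from bsr only in taking its target from a register holding an absolute address instead of from a label.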

6.7.1 Calling Conventions

A bsr or jsr is fine for a simple subroutine call, but few subroutines are quite that simple. The biggest issues revolve around register usage. Parrot has 32 registers of each type, and the caller and the subroutine share the same set of registers. How does the subroutine keep from destroying the caller's values? More importantly, who is responsible for saving and restoring registers? Where are arguments for the subroutine stored? Where are the subroutine's return values stored? A number of different answers are possible. You've seen how many ways Parrot has of storing values. The critical point is that the caller and the called subroutine have to agree on all the answers.

6.7.1.1 Reserved registers

A very simple system would be to declare that the caller uses registers 0 through 15, and the subroutine uses 16 through 31. This works in a small program with light register usage. But what about a subroutine call from within another subroutine or a recursive call? The solution doesn't extend to a large scale.

6.7.1.2 Callee saves

Another possibility is to make the subroutine responsible for saving the caller's registers:

    set I0, 42
    save I0              # pass args on stack
    bsr _inc             # j = inc(i)
    restore I1           # get return value from stack
    print I1
    print "\n"
    end
  _inc:
    saveall              # preserve all registers
    restore I0           # get argument
    inc I0               # do all the work
    save I0              # push return value
    restoreall           # restore caller's registers
    ret

This example stores arguments to the subroutine and return values from the subroutine on the user stack. The first statement in the _inc subroutine is a saveall to save all the caller's registers onto the backing stacks, and the last statement before the return restores them.

One advantage of this approach is that the subroutine can choose to save and restore only the register frames it actually uses, for a small speed gain. The example above could use pushi and popi instead of saveall and restoreall because it only uses integer registers. One disadvantage is that it doesn't allow optimization of tail calls, where the last statement of a recursive subroutine is the call to itself.
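To see why the caller's I0 survives the call, it can help to trace the two stacks separately. The sketch below is a simplified Python model of the callee-saves example, with only integer registers; the Machine class and its method names are hypothetical and merely mirror the opcode names.

```python
class Machine:
    """Toy model: 32 integer registers, a user stack, and a frame stack."""

    def __init__(self):
        self.I = [0] * 32        # integer registers I0..I31
        self.user_stack = []     # used by save / restore
        self.frame_stack = []    # used by saveall / restoreall

    def save(self, n):
        self.user_stack.append(self.I[n])

    def restore(self, n):
        self.I[n] = self.user_stack.pop()

    def saveall(self):
        self.frame_stack.append(list(self.I))   # copy the whole register frame

    def restoreall(self):
        self.I = self.frame_stack.pop()


def inc(m):
    m.saveall()       # preserve the caller's registers
    m.restore(0)      # get argument from the user stack
    m.I[0] += 1       # do all the work
    m.save(0)         # push return value
    m.restoreall()    # caller's registers are back, return value is on the stack


m = Machine()
m.I[0] = 42
m.save(0)             # pass argument on the user stack
inc(m)
m.restore(1)          # fetch return value into I1
print(m.I[0], m.I[1])
```

Because the return value travels on the user stack, it is untouched when the register frame is restored.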

6.7.1.3 Parrot calling conventions

Internal subroutines can use whatever calling convention serves them best. Externally visible subroutines and methods need stricter rules, since they might be called from a variety of contexts, even from multiple different high-level languages.

Under the Parrot calling conventions, ^[10] the caller is responsible for preserving its own registers. The first 11 arguments of each register type are passed in Parrot registers, as are several other pieces of information. Register usage for subroutine calls is listed in Table 6-4.

^[10] These conventions are still open to changes, so you'll want to check for the latest details in Parrot Design Document 3 (pdd03), available at http://dev.perl.org/perl6/pdd/ and in docs/pdds/pdd03_calling_conventions.pod.

Table 6-4. Calling conventions

Register	Usage
`P0`	Subroutine object.
`P1`	Continuation if applicable.
`P2`	Object for a method call.
`P3`	Array with overflow parameters.
`S0`	Fully qualified subroutine name.
`I0`	True for prototyped parameters.
`I1`	Number of overflow arguments.
`I3`	Expected return type.
`I5` ... `I15`	First 11 integer arguments.
`N5` ... `N15`	First 11 float arguments.
`S5` ... `S15`	First 11 string arguments.
`P5` ... `P15`	First 11 PMC arguments.

If there are more than 11 arguments of one type for the subroutine, overflow parameters are passed in an array in P3. Subroutines without a prototype pass all their arguments on the user stack or in the overflow array. ^[11]

^[11] Prototyped subroutines have a defined signature.
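The split between register arguments and overflow arguments can be sketched as follows. This is an illustrative Python model for integer arguments only, based on Table 6-4; the function name place_int_args and the dictionary representation of registers are assumptions, and the real conventions in pdd03 may differ in detail.

```python
FIRST_ARG_REG = 5    # I5 holds the first integer argument
LAST_ARG_REG = 15    # I15 holds the eleventh


def place_int_args(args):
    """Return a register assignment for a prototyped call with integer args."""
    capacity = LAST_ARG_REG - FIRST_ARG_REG + 1   # 11 registers
    in_registers = args[:capacity]
    overflow = args[capacity:]
    regs = {f"I{FIRST_ARG_REG + i}": v for i, v in enumerate(in_registers)}
    regs["I0"] = 1               # true: prototyped parameters
    regs["I1"] = len(overflow)   # number of overflow arguments
    regs["P3"] = overflow        # array with overflow parameters
    return regs


regs = place_int_args(list(range(100, 113)))   # 13 arguments: 100..112
print(regs["I5"], regs["I15"], regs["I1"], regs["P3"])
```

With 13 arguments, the first 11 land in I5 through I15 and the remaining two end up in the overflow array.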

Return values and additional information about them are also passed in registers. The individual registers used on return are listed in Table 6-5.

Table 6-5. Return conventions

Register	Usage
`I0`	Registers on the stack.
`I1`	Number of integer return results.
`I2`	Number of string return results.
`I3`	Number of PMC return results.
`I4`	Number of float return results.
`P3`	Array with overflow return values.
`I5` ... `I15`	First 11 integer return values.
`N5` ... `N15`	First 11 float return values.
`S5` ... `S15`	First 11 string return values.
`P5` ... `P15`	First 11 PMC return values.

Overflow return values and return values from a subroutine without a prototype are passed in the overflow array, just like subroutine arguments.

The _inc subroutine from above can be rewritten as a prototyped subroutine:

    set I0, 42
    new P0, .Sub       # create a new Sub object
    set_addr I1, _inc  # get address of function
    set P0, I1         # and set it on the Sub object
    set I5, I0         # first integer argument
    set I0, 1          # prototype used
    saveall            # preserve environment
    invoke             # call function object in P0
    save I5            # save return value
    restoreall         # restore registers
    restore I1         # restore return value from stack
    print I1
    print "\n"
    end
  _inc:
    inc I5             # do all the work
    ret

Instead of using a simple bsr, this set of conventions uses a subroutine object. There are several kinds of subroutine-like objects, but Sub is a class for PASM subroutines. The location of the subroutine is set in the Sub object by the absolute address of the subroutine's label.

Subroutine objects of all kinds can be called with the invoke opcode. With no arguments, it calls the subroutine in P0, which is the standard for the Parrot calling conventions. There is also an invoke Px instruction for calling objects held in a different register.

6.7.2 Native Call Interface

A special version of the Parrot calling conventions is used by the Native Call Interface (NCI) for calling subroutines with a known prototype in shared libraries. This is not really portable across all libraries, but it's worth a short example. This is the first of some tests in t/pmc/nci.t:

    loadlib P1, "libnci.so"       # get library object for a shared lib
    print "loaded\n"
    dlfunc P0, P1, "nci_dd", "dd" # obtain the function object
    print "dlfunced\n"
    set I0, 1                     # prototype used - unchecked
    set I1, 0                     # items on stack - unchecked
    set N5, 4.0                   # first argument
    saveall                       # preserve regs
    invoke                        # call nci_dd
    save N5                       # save return result
    restoreall                    # restore registers
    restore N5
    ne N5, 8.0, nok_1             # the test function returns 2*arg
    print "ok 1\n"
    end
  nok_1:
    ...

This shows two new instructions: loadlib obtains a handle for a shared library, and dlfunc gets a function object from a loaded library (second argument) of a specified name (third argument) with a known function signature (fourth argument). The function signature is a string where the first character is the return type and the remaining characters are the types of the function parameters. The characters used in NCI function signatures are listed in Table 6-6.

Table 6-6. Function signature letters

Character	Register set	C type
`v`	-	void (no return value)
`c`	`I`	char
`s`	`I`	short
`i`	`I`	int
`l`	`I`	long
`f`	`N`	float
`d`	`N`	double
`t`	`S`	char *
`p`	`P`	void * (or other pointer)
`I`	-	Parrot_Interp *interpreter
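Reading a signature string against Table 6-6 is mechanical, as the following Python sketch shows. The SIGNATURE_LETTERS table is copied from Table 6-6; the describe_signature helper is hypothetical and is not part of Parrot.

```python
# (register set, C type) for each signature letter in Table 6-6.
SIGNATURE_LETTERS = {
    "v": (None, "void"),
    "c": ("I", "char"),
    "s": ("I", "short"),
    "i": ("I", "int"),
    "l": ("I", "long"),
    "f": ("N", "float"),
    "d": ("N", "double"),
    "t": ("S", "char *"),
    "p": ("P", "void *"),
    "I": (None, "Parrot_Interp *"),
}


def describe_signature(signature):
    """Split a signature into its return type and parameter types."""
    return_letter, param_letters = signature[0], signature[1:]
    return {
        "returns": SIGNATURE_LETTERS[return_letter],
        "params": [SIGNATURE_LETTERS[c] for c in param_letters],
    }


info = describe_signature("dd")   # the signature used for nci_dd above
print(info)
```

For "dd" this reports a double return value and a single double parameter, both in the N register set, which matches the use of N5 for the argument and the result in the test.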

6.7.3 Closures

A closure is a subroutine that keeps values from the lexical scope where it was defined, even when it's called from an entirely different scope. The closure shown here is equivalent to this Perl 5 code snippet:

    #   sub foo {
    #       my ($n) = @_;
    #       sub {$n += shift}
    #   }
    #   my $closure = foo(10);
    #   print &$closure(3), "\n";
    #   print &$closure(20), "\n";

    # call _foo
    new P0, .Sub           # new subroutine object
    set_addr I3, _foo      # get address of _foo
    set P0, I3             # attach address
    new P5, .PerlInt       # define $n
    set P5, 10
    saveall                # caller save
    invoke                 # call foo
    save P5                # save return value
    restoreall             # restore registers
    restore P0             # get return value (the closure)

    # call _closure
    new P5, .PerlInt       # argument to closure
    set P5, 3
    saveall
    invoke                 # call closure(3)
    save P5                # return value
    restoreall
    restore P2             # print result
    print P2               # prints 13
    print "\n"

    # call _closure
    set P5, 20             # and again
    saveall
    invoke                 # call closure(20)
    save P5
    restoreall
    restore P2
    print P2               # prints 33
    print "\n"
    end

  _foo:
    new_pad 0              # push a new pad
    store_lex 0, "n", P5   # store $n
    new P5, .Sub           # P5 has the lexical "n" in the pad
    set_addr I3, _closure  # because the Sub inherits the lex pad
    set P5, I3             # set address of function
    pop_pad                # cleanup
    ret                    # the Sub in P5 is the return value

  _closure:
    find_lex P2, "n"       # invoking the Sub pushes the lexical pad
                           # of the closure on the pad stack
    add P2, P5             # n += shift
    set P5, P2             # set return value
    pop_pad                # on each call, the lex pad is there
    ret                    # so pop it at end and return

That's quite a lot of PASM code for such a little bit of Perl 5 code, but anonymous subroutines and closures hide a lot of magic under that simple interface. The core of this example is that when the new subroutine is created in _foo with:

 new P5, .Sub            # P5 has the lexical "n" in the pad

it inherits and stores the current lexical scratchpad, that is, the topmost scratchpad on the pad stack at the time. Later, when _closure is invoked from the main body of code, the stored pad is automatically pushed onto the pad stack. So, all the lexical variables that were available when _closure was defined are available when it's called.
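The capture-then-push behavior can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Parrot's implementation: the pad stack is a plain list of dictionaries, and the Sub class, closure_body, and foo are hypothetical names. Unlike the PASM above, where _closure pops the pad itself, this model pops it inside invoke.

```python
pad_stack = []   # each pad is a dict of lexical variables


class Sub:
    def __init__(self, body):
        self.body = body
        # Capture the topmost pad at creation time, if there is one.
        self.pad = pad_stack[-1] if pad_stack else None

    def invoke(self, arg):
        pad_stack.append(self.pad)   # stored pad is pushed on every call
        try:
            return self.body(arg)
        finally:
            pad_stack.pop()          # pop it again at the end


def closure_body(x):
    pad = pad_stack[-1]              # find_lex looks in the current pad
    pad["n"] += x                    # n += shift
    return pad["n"]


def foo(n):
    pad_stack.append({"n": n})       # new_pad, store_lex
    sub = Sub(closure_body)          # the Sub inherits the current pad
    pad_stack.pop()                  # pop_pad
    return sub


closure = foo(10)
first = closure.invoke(3)
second = closure.invoke(20)
print(first)
print(second)
```

The pad created in foo is no longer on the pad stack when the closure runs from the main code, yet n keeps its value between calls because the Sub holds its own reference to that pad.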

6.7.4 Coroutines

As we mentioned in the previous chapter, coroutines are subroutines that can suspend themselves and return control to the caller, and then pick up where they left off the next time they're called, as if they never left.

In PASM, coroutines are subroutine-like objects:

 new P0, .Coroutine

The Coroutine object has its own user stack, context stack, and pad stack. The pad stack is inherited from the caller. When the coroutine invokes itself, it returns to the caller. The next time it's invoked, it continues to execute where it returned:

    new_pad 0                # push a new lexical pad on stack
    new P0, .PerlInt         # save one variable in it
    set P0, 10
    store_lex -1, "var", P0
    new P0, .Coroutine       # make a new coroutine object
    set_addr I0, _cor
    set P0, I0               # set the address
    saveall                  # preserve environment
    invoke                   # invoke the coroutine
    restoreall
    print "back\n"
    saveall
    invoke                   # invoke coroutine again
    restoreall
    print "done\n"
    pop_pad
    end
  _cor:
    find_lex P1, "var"       # inherited pad from caller
    print "in cor "
    print P1
    print "\n"
    inc P1                   # var++
    invoke                   # yield()
    print "again "
    branch _cor              # next invocation of the coroutine

This prints out the result:

    in cor 10
    back
    again in cor 11
    done

The invoke inside the coroutine is commonly referred to as "yield." The coroutine never ends. When it reaches the bottom, it branches back up to _cor and executes until it hits invoke again.
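Readers who know generators from other languages may find the control flow easier to follow in that form. The sketch below is a loose Python analogue of the coroutine example, not a model of the Coroutine PMC: the pad is a plain dictionary, output is collected in a list named out, and each next call stands in for an invoke from the main code.

```python
def cor(pad, out):
    while True:                               # branch _cor
        out.append(f"in cor {pad['var']}\n")  # find_lex, then print
        pad["var"] += 1                       # inc P1
        yield                                 # the invoke inside the coroutine
        out.append("again ")                  # resumes here on the next call


out = []
pad = {"var": 10}          # the lexical pad shared with the caller
co = cor(pad, out)

next(co)                   # first invoke
out.append("back\n")
next(co)                   # second invoke
out.append("done\n")

text = "".join(out)
print(text, end="")
```

As in the PASM version, the second call resumes just after the yield point, prints "again ", and loops back to the top before yielding once more.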

6.7.5 Continuations

A continuation is a subroutine that gets a complete copy of the caller's context, including its own copy of the call stack. Invoking a continuation starts or restarts it at the entry point:

    new P1, .PerlInt
    set P1, 5
    new P0, .Continuation
    set_addr I0, _con
    set P0, I0
  _con:
    print "in cont "
    print P1
    print "\n"
    dec P1
    unless P1, done
    invoke                        # P0
  done:
    print "done\n"
    end

This prints:

    in cont 5
    in cont 4
    in cont 3
    in cont 2
    in cont 1
    done

6.7.6 Evaluating a Code String

This isn't really a subroutine operation, but it does produce a code object that can be invoked. In this case, it's a bytecode segment object.

The first step is to get an assembler or compiler for the target language:

 compreg P1, "PASM1"

Within the Parrot interpreter the only language available is PASM1, which compiles a single, fully qualified PASM instruction to bytecode: ^[12]

    compile P0, P1, "set_i_ic I0, 10"

^[12] IMCC also accepts PASM for PASM source files, and PIR for PIR source files.

This places a bytecode segment object into the destination register P0, which can then be invoked with invoke:

    compreg P1, "PASM1"                # get compiler
    set S1, "in eval\n"
    compile P0, P1, "print_s S1"
    invoke                             # eval code P0
    print "back again\n"
    end

Fully qualified opcode names include the types of their arguments in the name: i is an integer register, ic is an integer constant, s is a string register, sc is a string constant, n is a float register, nc is a float constant, and p is a PMC register.
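The naming rule can be illustrated by deriving the suffixes from the operands. The following Python sketch is hypothetical: qualify and arg_suffix are not Parrot functions, and they cover only the argument kinds listed above.

```python
import re

REGISTER = re.compile(r"([INSP])\d+")   # I0, N5, S1, P3, ...


def arg_suffix(arg):
    """Return the type suffix for one operand given as source text."""
    match = REGISTER.fullmatch(arg)
    if match:
        return match.group(1).lower()   # i, n, s, or p
    if arg.startswith('"'):
        return "sc"                     # string constant
    try:
        int(arg)
        return "ic"                     # integer constant
    except ValueError:
        float(arg)                      # raises ValueError if unrecognized
        return "nc"                     # float constant


def qualify(op, *args):
    """Build a fully qualified opcode name such as set_i_ic."""
    return "_".join([op] + [arg_suffix(a) for a in args])


names = [
    qualify("set", "I0", "10"),
    qualify("print", "S1"),
    qualify("set", "N5", "4.0"),
]
print(names)
```

The first two results correspond to the set_i_ic and print_s instructions passed to compile in the examples above.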