Item 47: Use XS for low-level interfaces andor speed. | Effective Perl Programming: Writing Better Programs with Perl

' XS for low-level interfaces and/or speed."-->

Item 47: Use XS for low-level interfaces and/or speed.

Recent versions of Perl have a documented interface language, called XS , ^[1] that you can use to write functions in C or C++ that can be called from Perl. Within XSUBs , as they are called, you have full access to Perl's internals. You can create variables , change their values, execute Perl code, or do pretty much anything that suits your fancy.

^[1] Parts of XS are still under development as of this writing. XS shouldn't change radically in the near future, but don't be surprised to find some differences between your version and what's documented here.

An XS module is actually a dynamically loaded shareable library. ^[2] Creating one is a complex process fraught with details. Fortunately, with the XS interface you get a bunch of tools that handle most of those details for you. For reasonably simple situations, you need only run x2hs to get a set of boilerplate files, add some Perl and XS code, and run make . There are make targets for building the shareable library, testing it, building a distribution kit, and installing it.

^[2] If your system does not support shareable libraries, you can still use XSUBs by linking them into the Perl executable itself.

XSUBs are a handy way to add operating-system supported features to Perlthey beat the heck out of syscall . You also can use XSUBs to speed up scripts that are spending a significant proportion of their time in a small number of subroutines. And, of course, you can use XSUBs to add a Perl interface to an existing C or C++ library of your own.

As an example of the sort of things you might write an XSUB for, let's suppose that you would like to write a subroutine that returns a copy of a list, but with all the elements in random order. In Perl, you might write something like:

 sub shuffle1 {    my @orig = @_;    my @result;    push @result, splice @orig, rand @orig, 1 while @orig;    @result;  }

This isn't exactly a model of efficiency, because you have to call splice once for each element in the listnot good if you're planning on shuffling long lists. A more efficient approach might be:

 sub shuffle2 {    my @result = @_;    my $n = @result;    while ($n > 1) {      my $i = rand $n;      $n--;      @result[$i, $n] = @result[$n, $i];    }    @result;  }

This is better, but not a lot better. There is no splicing going on, but on the other hand the swapping of values involves a lot of assignments and subscripting , which takes a surprisingly large amount of time. (See the benchmarks at the end of this Item.) In truth, shuffling is best not written in Perl if efficiency is a prime concern. So let's write shuffle as an XSUB in C.

Generating the boilerplate

When writing XSUBs, you should always start out by using h2xs to generate a set of boilerplate files, much as you would for an ordinary Perl-only module (see Item 45). In this case, let's start by creating boilerplate for a module called List::Shuffle .

 %  h2xs -A -n List::Shuffle  Writing List/Shuffle/Shuffle.pm  Writing List/Shuffle/Shuffle.xs  Writing List/Shuffle/Makefile.PL  Writing List/Shuffle/test.pl  Writing List/Shuffle/Changes  Writing List/Shuffle/MANIFEST

This is similar to the example in Item 45. However, since we have chosen a hierarchical package name , the boilerplate files are created in a directory named Shuffle that is itself nested in a directory named List .

As before, you will also need to generate a makefile:

 %  perl Makefile.PL

You may want to make some changes to the boilerplate code before you begin writing your XSUB. For example, if you want to be able to call the shuffle() subroutine without having to prepend the package name to it, you must export its name in Shuffle.pm :

 @EXPORT = qw(shuffle);

Writing and testing an XSUB

The XS language that XSUBs are written in is really just a kind of C preprocessor designed to make writing XSUBs easier.

XS source code goes into files ending with " .xs ". The XS compiler, xsubpp , compiles XS into C code with the glue needed to interface with Perl. You probably will never have to invoke xsubpp on your own, though, because it is invoked automatically via the generated makefile.

XS source files begin with a prologue of C code that is passed through unaltered by xsubpp . A MODULE directive follows the prologue; for example:

 MODULE = List::Shuffle     PACKAGE = List::Shuffle

This indicates the start of the actual XS source, which is itself a list of XSUBs. ^[3]

^[3] Unfortunately, a thorough description of how to write XSUBs would fill an entire book (and it probably will, one of these days), and there isn't space to explain XSUBs in great detail here. For now, for detailed information about XS, you will have to rely on the documentation shipped with Perl, notably the perlxs , perlxstut , and perlguts man pages.

So, how about some examples of XS?

Here is a very simple XSUB that just calls the C standard library log function and returns the result:

 double  log(x)    double x

The return type appears first, on a line by itself at the beginning of the line. Next is the function name and a list of parameter names . The lines following the return type and function name are ordinarily indented for readability. In this case, there is only a single line in the XSUB body, declaring the type of the parameter x .

In a simple case like this, xsubpp generates code that creates a Perl subroutine named log() that calls the C function of the same name. The generated code also includes the glue necessary to convert the Perl argument to a C double and to convert the result back. (To see how this works, take a look at the C code generated by xsubpp it's actually pretty readable.)

Here is a slightly more complex example that calls the Unix realpath() function (not available on all systems):

 char *  realpath(filename)    char *filename    PREINIT:      char realname[1024]; /* or use MAXPATHLEN */    CODE:      RETVAL = realpath(filename, realname);    OUTPUT:      RETVAL

This creates a Perl function that takes a string argument and has a string return value. The XS glue takes care of converting Perl strings to C's char * type and back:

 $realname = realpath($filename);

The CODE section contains the portion of the code used to compute the result from the subroutine. The PREINIT section contains declarations of variables used in the CODE section; they should go here rather than in the CODE section. RETVAL is a "magic" variable supplied by xsubpp used to hold the return value. Finally, the OUTPUT section lists values that will be returned to the caller. This ordinarily will include RETVAL . It can also include input parameters that are modified and returned as if through call by reference.

These examples all return single scalar values. In order to write shuffle , which returns a list of scalars, we have to use a PPCODE section and do our own popping and pushing of arguments and return values. This is less difficult than it may sound. To get things rolling, insert the following code in Shuffle.xs following the MODULE line:

XS source code for List::Shuffle

PROTOTYPES: DISABLE	Turn off Perl prototype processing for the following XSUBs.
void shuffle(...) PPCODE: { int i, n; SV *array; SV tmp; array = New(0, array, items, SV *);	Declared `void` because we are returning values "manually" with `XPUSHs` .
	`SV` is the "scalar value" type.
	Allocate storage. `New()` requests memory from Perl's memory allocator.
for (i = 0; i < items; i++) { array[i] = sv_mortalcopy(ST(i)); }	Copy input args.
n = items; while (n > 1) { i = rand() % n; tmp = array[i]; array[i] = array[--n]; array[n] = tmp; }	Shuffle off to Buffalo!
for (i = 0; i < items; i++) { XPUSHs(array[i]); } Safefree(array); }	Push result (a list) onto stack.
	Free storage. `Safefree()` returns memory to Perl.

The PROTOTYPES: DISABLE directive turns off Perl prototype processing (see Item 28) for the XSUBs that follow.

The strategy here is to copy the input arguments into a temporary array, shuffle them, and then push the result onto the stack. The arguments will be scalar values, which are represented internally in Perl with the type SV * .

Instead of a CODE block, we use a PPCODE block inside this XSUB . This disables the automatic handling of return values on the stack. The number of arguments passed in is contained in the magic variable items , and the arguments themselves are contained in ST(0) , ST(1) , and so on.

The SV pointers on the stack refer to the actual values supplied to the shuffle() function. We don't want this. We want copies instead, so we use the function sv_mortalcopy() to make a reference counted clone of each incoming scalar.

The scalars go into an array allocated with Perl's internal New() function, and then we shuffle them. After shuffling, we push the return values on the stack one at a time with the XPUSHs() function and free the temporary storage we used to hold the array of pointers. If all this seems sketchy, which it probably does, consult the perlguts and perlxs man pages for more details.

At this point, you can save Shuffle.xs and build it:

 %  make

You will see a few lines of gobbledygook as Shuffle.xs compiles (hopefully) and as some other files are created and copied into their preinstallation locations. If the build was successful, you should create a test script. Open the test script template test.pl and add the following lines to the bottom:

 @shuffle = shuffle 1..10;  print "@shuffle\n";  print "ok 2\n";

Now, type:

 %  make test

You should see something like the following at the bottom of the output:

 ok 1  6 4 1 5 3 7 10 2 8 9  ok 2

Voila! Now you just need to write the documentation POD (see Item 46). This is left as an exercise to the reader.

Let's take this module and run some benchmarks. In the process we will also try out the blib pragma (see Item 43). Create a short benchmark program, called tryme as usual:

 use Benchmark;  use List::Shuffle;  # insert shuffle1() and shuffle2() from above here  timethese(500, {      'shuffle' => 'shuffle 1..1000',      'shuffle1' => 'shuffle1 1..1000',      'shuffle2' => 'shuffle2 1..1000',  });

Even though List::Shuffle isn't installed, we can use it if we tell Perl how to find the copy in the build library. The usual way is to invoke the blib pragma from the command line (this assumes tryme is in the same directory as the blib directory):

 %  perl -Mblib tryme  Using /home/joseph/perl-play/xs/List/Shuffle/blib  Benchmark: timing 500 iterations of shuffle, shuffle1,  shuffle2...     shuffle:  3 secs ( 3.31 usr  0.00 sys =  3.31 cpu)    shuffle1: 28 secs (28.79 usr  0.00 sys = 28.79 cpu)    shuffle2: 31 secs (30.68 usr  0.00 sys = 30.68 cpu)

Well, that's an improvement in speed. As you can see, the XSUB shuffle ran approximately eight times faster (on my machine, anyway) than the two versions written in Perl. With better stack handling and some other optimizations it could be made a little faster still. But we will have to leave that as another exercise for the interested reader.