XS Interface Design and Construction | Writing Perl Modules for CPAN

Chapter 9 - Writing C Modules with XS

by Sam Tregar

Apress © 2002

Being able to easily produce one-to-one mappings between a set of C functions and subroutines in a Perl module is undeniably very useful. XS is designed to allow you to accomplish this task with a minimum amount of coding required; simply describe the C functions, and out pops a new module, fully baked and ready to consume. Unfortunately, much like a microwave dinner, the ease of preparation comes at a price in palatability.

Consider the interface that Gnome::MIME would provide if this recipe were followed for each function in the API. The Gnome MIME type functions follow a common pattern in C APIs—they provide a variety of functions that all perform the same task with different parameters:

    $type = Gnome::MIME::type($filename);
    $type = Gnome::MIME::type_or_default($filename, $default);
    $type = Gnome::MIME::type_of_file($filename);
    $type = Gnome::MIME::type_or_default_of_file($filename, $default);

The or_default versions allow the user to specify a default to be returned if the MIME type cannot be determined. The of_file versions actually read from the file to guess its MIME type rather than relying on the filename alone. However, every one of these functions provides the same core functionality: determining the MIME type of a file. Clearly this would be a difficult module for a Perl programmer to use if he or she had to pick from the preceding function list. To make matters worse, module authors who allow XS to write their interfaces often abdicate responsibility for the documentation as well, saying something along the lines of "I've wrapped every C function in the library, see the docs for the C library for details."

Every module needs to have a sensible interface. Whether it's implemented in Perl or in C shouldn't determine the interface used. This section will give you some ideas about how to design and code better XS interfaces.

Supporting Named Parameters

In Perl, a better interface to the preceding MIME type functions might be this one:

    $type = Gnome::MIME::file_type(filename     => $filename,
                                   default_type => "text/html",
                                   read_file    => 1);

This follows the named-parameter function style introduced in Chapter 2.

Using XS

Unfortunately, XS doesn't have native support for this highly Perl-ish function style, but it's not hard to support it with a little code (see Listing 9-6 for the full XSUB).

Listing 9-6: Named-Parameter XSUB

    const char *
    file_type(...)
    PREINIT:
      char *filename = NULL;    // variables for named params
      char *default_type = NULL;
      IV read_file = 0;
      IV x;                     // loop counter
    CODE:
      // check that there are an even number of args
      if (items % 2) croak("Usage: Gnome::MIME::file_type(k => v, ...)");
      // loop over args by pairs and fill in parameters
      for(x = 0; x < items; x+=2) {
        char *key = SvPV(ST(x), PL_na);
        if (strEQ(key, "filename")) {
          filename = SvPV(ST(x+1), PL_na);
        } else if (strEQ(key, "default_type")) {
          default_type = SvPV(ST(x+1), PL_na);
        } else if (strEQ(key, "read_file")) {
          read_file = SvIV(ST(x+1));
        } else {
          croak("Unknown key found in Gnome::MIME::file_type parameter list: %s",
                SvPV(ST(x), PL_na));
        }
      }
      // make sure we have a filename parameter
      if (filename == NULL) croak("Missing required parameter filename.");
      // call the appropriate function based on arguments
      if (read_file && default_type != NULL) {
        RETVAL = gnome_mime_type_or_default_of_file(filename, default_type);
      } else if (read_file) {
        RETVAL = gnome_mime_type_of_file(filename);
      } else if (default_type != NULL) {
        RETVAL = gnome_mime_type_or_default(filename, default_type);
      } else {
        RETVAL = gnome_mime_type(filename);
      }
    OUTPUT:
      RETVAL

The XSUB starts with syntax for a variable argument function that mimics C's syntax:

    const char *
    file_type(...)

Note that, unlike C's ... syntax, the XS version doesn't require at least one fixed parameter.

Next, I set up a number of local variables in a PREINIT block. The contents of PREINIT are placed first in the output XSUB. In some cases this is essential, but in this case it's merely a convenient place for local declarations:

    PREINIT:
      char *filename     = NULL; // variables for named params
      char *default_type = NULL;
      IV read_file       = 0;
      IV x;                      // loop counter

Next comes the CODE block proper, where the automatic XS variable items is used to check the number of parameters:

      // check that there are an even number of args
      if (items % 2) croak("Usage: Gnome::MIME::file_type(k => v, ...)");

The code then iterates through the key/value pairs:

      // loop over args by pairs and fill in parameters
      for(x = 0; x < items; x+=2) {
        char *key = SvPV(ST(x), PL_na);
        if (strEQ(key, "filename")) {
          filename = SvPV(ST(x+1), PL_na);
        } else if (strEQ(key, "default_type")) {
          // ...

The preceding block uses the ST macro to access the SVs passed in as arguments. The strEQ() macro is used to compare the keys to the available parameter names. When a match is found, the value is assigned to one of the variables declared in the PREINIT section. After that, a series of conditionals determines which gnome_mime_type function to call:

      // call the appropriate function based on arguments
      if (read_file && default_type != NULL) {
        RETVAL = gnome_mime_type_or_default_of_file(filename, default_type);
      } else if (read_file) {
        // ...

With the new named-parameter style, the test cases will need adjusting:

    # test some simple MIME type recognitions
    is(Gnome::MIME::file_type(filename => "foo.gif"),  'image/gif',  "test .gif");
    is(Gnome::MIME::file_type(filename => "foo.jpg"),  'image/jpeg', "test .jpg");
    is(Gnome::MIME::file_type(filename => "foo.html"), 'text/html',  "test .html");

    # test defaulting
    is(Gnome::MIME::file_type(filename => "", default_type => "text/html"),
       "text/html", "test default");
    # ...

Using Perl

The XSUB shown previously gets the job done, but at the cost of some long and relatively complicated C code. An easier way to get the same functionality is to divide the module into an XS back end and a Perl front end. The XS layer will provide a thin wrapper around the existing API, and the Perl front end will add the code necessary to support a friendly interface.

To start with, I'll define the back-end XSUBs in a separate package using the PACKAGE command on the MODULE line:

 MODULE = Gnome::MIME   PACKAGE = Gnome::MIME::Backend   PREFIX = gnome_mime_

After this line every XSUB defined will have its Perl interface defined in the Gnome::MIME::Backend package. An XS file can contain any number of such lines and PACKAGEs, although only one MODULE may be used.

Then each of the functions is wrapped in the plain style shown earlier:

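    const char *
    gnome_mime_type(filename)
      char * filename

    const char *
    gnome_mime_type_or_default(filename, default_type)
      char * filename
      char * default_type

    const char *
    gnome_mime_type_of_file(filename)
      char * filename

    const char *
    gnome_mime_type_or_default_of_file(filename, default_type)
      char * filename
      char * default_type

Note that the second parameter is named default_type rather than default, since default is a reserved word in C and cannot be used as a variable name in the generated code.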

The Perl code to implement the named-parameter interface is then added to MIME.pm:

    use Carp qw(croak);

    sub file_type {
      croak("Usage: Gnome::MIME::file_type(k => v, ...)") if @_ % 2;
      my %args = @_;

      # check for bad parameter names
      my %allowed = map { $_, 1 } qw(filename default_type read_file);
      for (keys %args) {
        croak("Unknown key found in Gnome::MIME::file_type parameter list: $_")
          unless exists $allowed{$_};
      }

      # make sure filename is specified
      croak("Missing required parameter filename.") unless exists $args{filename};

      # call appropriate back-end function
      if ($args{read_file} and $args{default_type}) {
        return Gnome::MIME::Backend::type_or_default_of_file($args{filename},
                                                             $args{default_type});
      } elsif ($args{read_file}) {
        return Gnome::MIME::Backend::type_of_file($args{filename});
      } elsif ($args{default_type}) {
        return Gnome::MIME::Backend::type_or_default($args{filename},
                                                     $args{default_type});
      } else {
        return Gnome::MIME::Backend::type($args{filename});
      }
    }

This code does essentially the same things as the XS code in the previous section, but instead of being in C, it's in good old maintainable, forgiving Perl. As an added bonus, instead of translating the calls to croak() in the XS version into die() calls, I added a use Carp and used Carp's version of croak(). Because Carp reports errors from the caller's perspective, it yields much better error messages than die() or its C equivalent.

This pattern, a thin layer of XS with a Perl front end, is worthy of emulation. It provides a way to write Perl-ish interfaces in Perl and get the most out of XS at the same time.
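One practical benefit of this split is that the front end can be exercised without compiling any XS. The following is a minimal sketch, not part of the module: it swaps in hypothetical pure-Perl stand-ins for the four Gnome::MIME::Backend functions (each simply returns a label naming the variant that was called) and condenses file_type() from the code above, so the dispatch logic and the croak() paths can be checked in isolation.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the XS back end. Each stub returns a label
# naming the variant that was called instead of a real MIME type.
package Gnome::MIME::Backend;
sub type                    { "type($_[0])" }
sub type_or_default         { "type_or_default($_[0], $_[1])" }
sub type_of_file            { "type_of_file($_[0])" }
sub type_or_default_of_file { "type_or_default_of_file($_[0], $_[1])" }

# Condensed copy of the Perl front end.
package Gnome::MIME;
use Carp qw(croak);

sub file_type {
    croak("Usage: Gnome::MIME::file_type(k => v, ...)") if @_ % 2;
    my %args = @_;
    my %allowed = map { $_ => 1 } qw(filename default_type read_file);
    for (keys %args) {
        croak("Unknown key found in Gnome::MIME::file_type parameter list: $_")
            unless exists $allowed{$_};
    }
    croak("Missing required parameter filename.") unless exists $args{filename};

    if ($args{read_file} and $args{default_type}) {
        return Gnome::MIME::Backend::type_or_default_of_file(
            $args{filename}, $args{default_type});
    } elsif ($args{read_file}) {
        return Gnome::MIME::Backend::type_of_file($args{filename});
    } elsif ($args{default_type}) {
        return Gnome::MIME::Backend::type_or_default(
            $args{filename}, $args{default_type});
    }
    return Gnome::MIME::Backend::type($args{filename});
}

package main;

# Each call should reach a different back-end function.
print Gnome::MIME::file_type(filename => "foo.gif"), "\n";
print Gnome::MIME::file_type(filename => "foo.gif", read_file => 1), "\n";
print Gnome::MIME::file_type(filename => "foo", default_type => "text/html"), "\n";

# A misspelled key should croak rather than being silently ignored.
my $ok = eval { Gnome::MIME::file_type(file_name => "foo.gif"); 1 };
print "error: $@" unless $ok;
```

Once the real XS back end is built, the stub package is simply dropped; the front end does not change.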

Providing Access to Complex Data Structures

Many C interfaces are more complicated than the gnome_mime_type functions. They manipulate complex data structures, often using C structs and arrays. Providing a convenient interface to Perl programmers is often a matter of translating data from C structs into Perl arrays and hashes, and back again.

It just so happens that the Gnome::MIME module has need of this functionality. The Gnome MIME API supplies two functions that access a set of key/value pairs associated with each MIME type, gnome_mime_get_keys() and gnome_mime_get_value(). There are keys for the program the user has chosen to open the MIME type (that is, an image viewer for image/gif, a text editor for text/plain, and so on) as well as other types of metadata.

It would, of course, be possible to provide a Perl interface directly to these calls. For example, to print out the available key/value pairs for image/gif, you could do the following:

    @keys = Gnome::MIME::get_keys("image/gif");
    foreach $key (@keys) {
      $value = Gnome::MIME::get_value("image/gif", $key);
      print "$key => $value\n";
    }

If your Perl-sense isn't screaming hash by now, you might want to see a doctor! A much better interface would be this one:

    my $type_data = Gnome::MIME::type_data("image/gif");
    while (($key, $value) = each %$type_data) {
      print "$key => $value\n";
    }

To provide this interface, you need to build a hash dynamically from the results of calling gnome_mime_get_keys() and gnome_mime_get_value(). The full XSUB used to support this interface is shown in Listing 9-7.

Listing 9-7: XSUB Implementing Gnome::MIME::type_data()

    SV *
    type_data(type)
      const gchar * type
    PREINIT:
      GList *keys, *iter;
      HV *hv;
      SV *value;
      char *key;
    CODE:
      // initialize hash
      hv = newHV();
      // get GList of keys for this type
      keys = gnome_mime_get_keys(type);
      // iterate through keys
      for (iter = keys; iter; iter = iter->next) {
        // get the key from the iterator
        key = iter->data;
        // create a new SV and load it with the value for this key
        value = newSVpv(gnome_mime_get_value(type, key), 0);
        // store the key/value pair in the hash
        hv_store(hv, key, strlen(key), value, 0);
      }
      // return a reference to the new hash
      RETVAL = newRV_noinc((SV *)hv);
    OUTPUT:
      RETVAL
    CLEANUP:
      g_list_free(keys);

The XSUB starts with a return type of SV:

    SV *
    type_data(type)
      const gchar * type

Since the subroutine will return a reference to a hash, the return type must be SV*. Next, several local variables are declared in a PREINIT block, including two GList pointers. GList is the linked-list type used by the Gnome API (it comes from GLib). The CODE body starts by initializing the HV that will be returned to the user:

      // initialize hash
      hv = newHV();

Next, I call gnome_mime_get_keys() and begin iterating over the GList:

      // get GList of keys for this type
      keys = gnome_mime_get_keys(type);
      // iterate through keys
      for (iter = keys; iter; iter = iter->next) {
        // get the key from the iterator
        key = iter->data;

If gnome_mime_get_keys() returns a NULL pointer, then this loop won't be executed, and a reference to an empty hash will be returned to the user. Next, the key/value pairs are stored in the HV:

        // create a new SV and load it with the value for this key
        value = newSVpv(gnome_mime_get_value(type, key), 0);
        // store the key/value pair in the hash
        hv_store(hv, key, strlen(key), value, 0);

Finally, RETVAL is assigned a reference to the hash to be returned:

      // return a reference to the new hash
      RETVAL = newRV_noinc((SV *)hv);

You might have expected to see a call to sv_2mortal() at the end, but XSUB return values of SV* are automatically mortalized by xsubpp. As proof, here's the relevant slice of the generated MIME.c:

      // return a reference to the new hash
      RETVAL = newRV_noinc((SV *)hv);
    #line 52 "MIME.c"
      ST(0) = RETVAL;
      sv_2mortal(ST(0));

The generated code places RETVAL on the return stack and then mortalizes it. Since the reference to hv was created with newRV_noinc(), hv has a refcount of 1 and will live as long as RETVAL does. The same is true of the SVs stored in hv: they have a refcount of 1 and will be freed when hv is freed. The end result is a chain of Perl objects with refcounts of 1 anchored by a single mortal reference. When building up complex data structures in XS, this is the end result you should be working toward.
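The same ownership chain can be observed from the Perl side. The sketch below is an illustration only, using a hypothetical Tracker class whose DESTROY method records when an object is freed. The values are owned solely by the hash and the hash is owned solely by one reference, so releasing that reference frees everything, while a value that has picked up an extra reference survives until that reference is released too.

```perl
use strict;
use warnings;

# Hypothetical helper class: records its name when Perl frees it.
package Tracker;
our @freed;
sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
sub DESTROY { push @freed, $_[0]{name} }

package main;

# One reference anchors the hash; the hash alone anchors its values.
my $type_data = {
    open => Tracker->new("open"),
    view => Tracker->new("view"),
};

# Taking an extra reference to one value raises its refcount to 2.
my $kept = $type_data->{view};

# Dropping the anchor frees the hash and every value owned only by it.
undef $type_data;
print "freed after dropping the hash: @Tracker::freed\n";

# The extra reference kept "view" alive; releasing it frees that too.
undef $kept;
print "freed at the end: @Tracker::freed\n";
```

An XSUB that forgets a mortalization, or adds a stray refcount increment, breaks this chain in the same way the extra reference does here, except that nothing ever releases it.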

Returning Multiple Values in List Context

The new interface for Gnome::MIME::type_data() is a big improvement. Now a user who wants to open a file can do something like the following:

    $type_data = Gnome::MIME::type_data("text/html");
    $program = $type_data->{open};

However, inevitably a less-experienced user is going to mix up the hash reference with a hash and try something like this:

    %type_data = Gnome::MIME::type_data("text/html");
    $program = $type_data{open};

This will only work if Gnome::MIME::type_data() is smart enough to return a list of key/value pairs in list context. Supporting this usage provides a chance to demonstrate a new technique for building XSUBs: using PPCODE to access the return stack directly. The complete listing for the new XSUB is in Listing 9-8.

Listing 9-8: XSUB for Gnome::MIME::type_data() with Multiple-Value Return

    void
    type_data(type)
      const gchar * type
    PREINIT:
      GList *keys, *iter;
      HV *hv;
      SV *value;
      char *key;
    PPCODE:
      // initialize hash
      hv = newHV();
      // get GList of keys for this type
      keys = gnome_mime_get_keys(type);
      // iterate through keys
      for (iter = keys; iter; iter = iter->next) {
        // get the key from the iterator
        key = iter->data;
        // create a new SV and load it with the value for this key
        value = newSVpv(gnome_mime_get_value(type, key), 0);
        // store the key/value pair in the hash
        hv_store(hv, key, strlen(key), value, 0);
      }
      // free keys GList
      g_list_free(keys);
      // test context with GIMME_V
      if (GIMME_V == G_ARRAY) {
        // list context - return a list of key/value pairs
        IV count = hv_iterinit(hv);
        IV i;
        I32 len;
        // loop over key/value pairs
        for (i = 1; i <= count; i++) {
          value = hv_iternextsv(hv, &key, &len);
          // push key and value
          XPUSHs(sv_2mortal(newSVpvn(key, len)));
          XPUSHs(sv_mortalcopy(value));
        }
        // free hv explicitly
        SvREFCNT_dec((SV *)hv);
        // return two SVs for each key in the hash
        XSRETURN(count * 2);
      }
      // G_SCALAR or G_VOID context - return a reference to the new hash
      XPUSHs(sv_2mortal(newRV_noinc((SV *)hv)));

The first change in the XSUB is in the return type. Because it uses PPCODE instead of CODE, the XSUB is responsible for managing the return stack itself. As a result, the return type is set to void so that xsubpp knows not to provide RETVAL:

    void
    type_data(type)
      const gchar * type

After that, aside from using PPCODE instead of CODE, the function begins much like the XSUB in Listing 9-7. The first new statement is as follows:

      // free keys GList
      g_list_free(keys);

This line used to live in the CLEANUP block, but PPCODE blocks cannot have CLEANUP blocks, so the call to g_list_free() is moved into the main code body.

The next section uses a new macro, GIMME_V, to examine the context of the current call. GIMME_V is the XS author's version of Perl's wantarray built-in. It can return three possible values: G_ARRAY,^[3] G_SCALAR, and G_VOID. In this case, I test for list context:

      // test context with GIMME_V
      if (GIMME_V == G_ARRAY) {

If you're in list context, the code iterates through the hash and pushes key/value pairs onto the return stack. The hash iteration code should look familiar from the "Hash Values (HV)" section in Chapter 8, but instead of printing out keys and values, I'm pushing them onto the return stack with XPUSHs:

        // loop over key/value pairs
        for (i = 1; i <= count; i++) {
          value = hv_iternextsv(hv, &key, &len);
          // push key and value
          XPUSHs(sv_2mortal(newSVpvn(key, len)));
          XPUSHs(sv_mortalcopy(value));
        }

Notice that each value pushed onto the stack is made mortal. If this weren't done, then this subroutine would leak memory on every call.

After pushing the key/value pairs onto the return stack, the code explicitly frees the HV used by the XSUB:

        // free hv explicitly
        SvREFCNT_dec((SV *)hv);

At this point the HV and all SVs contained in it as values are freed. This is the reason that when pushing the values onto the stack I used sv_mortalcopy(). Actually, a slightly more efficient implementation would be as follows:

          SvREFCNT_inc(value);       // value will now survive the destruction of hv
          XPUSHs(sv_2mortal(value)); // and is mortal on the return stack

This avoids the copy of the value at the expense of some dangerous refcount manipulation. Now when the HV is freed, the value SVs will still be live with a refcount of 1 and mortal. This is dangerous since this kind of manipulation is easy to get wrong, resulting in memory leaks or crashes. Using sv_mortalcopy() offers a simpler solution at the expense of a small amount of time and memory.

Once the list of return values is pushed onto the stack, the code returns early with the XSRETURN macro:

        // return two SVs for each key in the hash
        XSRETURN(count * 2);

This macro takes as an argument the number of items pushed on the stack, in this case twice the number of keys in the hash.

In scalar or void context, the code behaves the same as before, returning a reference to the hash:

      // G_SCALAR or G_VOID context - return a reference to the new hash
      XPUSHs(sv_2mortal(newRV_noinc((SV *)hv)));

The only difference here is that since I'm in a PPCODE block, I have to manually call XPUSHs and mortalize the return value. Note that an XSRETURN(1) isn't required since xsubpp will automatically provide a return at the end of the function.

With this new XSUB, the test cases can now be written to work with both scalar and list context calls:

    # test type_data in scalar context
    my $type_data = Gnome::MIME::type_data("image/gif");
    ok($type_data->{open}, "image/gif has an open key");

    # test type_data in list context
    my %type_data = Gnome::MIME::type_data("image/gif");
    ok($type_data{open}, "image/gif has an open key");

Now, if that's not quality service I don't know what is!
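For comparison, here is the same calling contract written in plain Perl, where wantarray plays the role of GIMME_V. This is only a sketch: the package name My::MIME is hypothetical, and the %fake_db table with its entries is invented sample data standing in for what gnome_mime_get_keys() and gnome_mime_get_value() would supply.

```perl
use strict;
use warnings;

package My::MIME;    # hypothetical pure-Perl stand-in for Gnome::MIME

# Invented sample data; the real values come from the Gnome MIME database.
my %fake_db = (
    "image/gif" => { open => 'viewer %f', description => 'GIF image' },
);

sub type_data {
    my ($type) = @_;

    # Copy the entry so callers can't modify the table. Unknown types
    # produce an empty hash, as the XSUB does for an empty key list.
    my %data = %{ $fake_db{$type} || {} };

    # wantarray is true only in list context (GIMME_V == G_ARRAY in XS).
    return wantarray ? %data : \%data;
}

package main;

my $ref  = My::MIME::type_data("image/gif");    # scalar context: hash ref
my %hash = My::MIME::type_data("image/gif");    # list context: key/value list

print "scalar context: $ref->{open}\n";
print "list context:   $hash{open}\n";
```

The Perl version needs no explicit mortalization or stack handling, which is a fair measure of how much bookkeeping PPCODE hands back to the XS author.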

Writing a Typemap

In terms of Gnome::MIME, wrapping gnome_mime_get_keys() and gnome_mime_get_value() as a single function returning a hash was a good choice. However, exploring an alternate implementation will provide a look at an important aspect of XS development: typemap creation. In particular, imagine that you wanted to provide an interface similar to one passed over earlier:

    $keys = Gnome::MIME::get_keys("image/gif");
    foreach $key (@$keys) {
      $value = Gnome::MIME::get_value("image/gif", $key);
      print "$key => $value\n";
    }

In the preceding code, get_keys() returns a reference to an array of keys that are then used to call get_value().

To get started, I would create two literal XSUBs:

    MODULE = Gnome::MIME   PACKAGE = Gnome::MIME   PREFIX = gnome_mime_

    GList *
    gnome_mime_get_keys(type)
      const char * type

    const char *
    gnome_mime_get_value(type, key)
      const char * type
      const char * key

But this generates a compilation error:

 Error: 'GList *' not in typemap in MIME.xs, line 10

The last time you saw this error, it was for const char * and const gchar *, and the solution was to simply alias those types to the type used for char *. In this case, the solution won't be so easy, because there's no existing typemap with the behavior I'm looking for. In particular, I want a typemap that will take a GList* and return a reference to an AV. You can find the completed typemap file in Listing 9-9.

Listing 9-9: typemap File with GList* Typemap

    const char  *    T_PV
    const gchar *    T_PV
    GList       *    T_GLIST

    INPUT
    T_GLIST
            croak("GList * input typemap unimplemented!");

    OUTPUT
    T_GLIST
            $arg = GList_to_AVref($var);

A typemap file has three sections. The first is a list of C type names and corresponding typemap names. This allows a group of C types to share a single typemap. This section already has two lines in my typemap file:

    const char  *    T_PV
    const gchar *    T_PV

I'll add a line for the GList* type:

 GList *       T_GLIST

The next section in a typemap is the INPUT section. An INPUT typemap describes how to translate from an SV* to the specified type. It is used when an XSUB takes the type as a parameter. Since gnome_mime_get_keys() returns GList* and nothing I'm wrapping uses a GList* as a parameter, I'll leave this one unimplemented:

    INPUT
    T_GLIST
            croak("GList * input typemap unimplemented!");

Next comes the OUTPUT section, which specifies how to turn the type into an SV*:

    OUTPUT
    T_GLIST
            $arg = GList_to_AVref($var);

Typemap code uses two placeholder variables: $arg and $var. In an OUTPUT typemap, $var is a variable of the type being output and $arg is the SV* to be returned. As you can see, my OUTPUT typemap calls the GList_to_AVref() function. This function doesn't exist in either Perl or Gnome, so I'll have to write it!

Where to place external functions like GList_to_AVref() is a matter of preference. Some XS coders prefer to put them in a separate .c file compiled and linked separately. I prefer to place them in the C section above the XSUB declarations, and that's the way I'll do it here (see Listing 9-10).

Listing 9-10: GList_to_AVref() Function Included in MIME.xs

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"
    #include <gnome.h>

    SV *
    GList_to_AVref(GList *list) {
      GList *iter;       // list iterator
      AV *av = newAV();  // initialize new array
      // iterate through GList
      for (iter = list; iter; iter = iter->next) {
        // push data onto array
        av_push(av, newSVpv(iter->data, 0));
      }
      // free glist passed into function
      g_list_free(list);
      // return a reference to the new array
      return sv_2mortal(newRV_noinc((SV *)av));
    }

The code for the function should be easy enough to understand. It simply takes a GList* and adds each data item contained inside to an array, returning a new mortal reference to the array constructed. Notice that the function also frees the GList*:

      // free glist passed into function
      g_list_free(list);

This could just as well have been done in a CLEANUP block, but putting it in the typemap function means every XSUB that returns a GList* gets the cleanup automatically, with no extra code in the XSUB itself.

With the typemap and its helper function in place, the test code is updated to use the new interface:

    # test get_keys and get_value
    $keys = Gnome::MIME::get_keys("image/gif");
    isa_ok($keys, 'ARRAY', "get_keys return");
    foreach $key (@$keys) {
      ok(Gnome::MIME::get_value("image/gif", $key), "got value for \"$key\"");
    }
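With get_keys() returning an array reference, the hash-building interface from earlier in the chapter could also be layered on top in Perl, following the thin-XS, Perl-front-end pattern. The sketch below assumes nothing about the compiled module: get_keys() and get_value() are hypothetical stubs backed by invented data, standing in for the two literal XSUBs, and only type_data() is the point.

```perl
use strict;
use warnings;

package My::MIME;    # hypothetical stand-in; the real XSUBs live in Gnome::MIME

# Invented sample data replacing the Gnome MIME database.
my %fake_db = (
    "image/gif" => { open => 'viewer %f', description => 'GIF image' },
);

# Stubs with the same calling convention as the two literal XSUBs.
sub get_keys  { my ($type) = @_; return [ sort keys %{ $fake_db{$type} || {} } ] }
sub get_value { my ($type, $key) = @_; return $fake_db{$type}{$key} }

# Front end: fold the two calls into one hash, as Listing 9-7 does in C.
sub type_data {
    my ($type) = @_;
    my %data = map { $_ => get_value($type, $_) } @{ get_keys($type) };
    return \%data;
}

package main;

my $type_data = My::MIME::type_data("image/gif");
print "$_ => $type_data->{$_}\n" for sort keys %$type_data;
```

This trades one extra Perl-level call per key for not having to build the hash by hand in C, which is usually a reasonable exchange for small key sets.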

The best way to learn to program typemaps is to examine the typemaps that come with Perl. You'll find them in a file called typemap in the library directory where the ExtUtils modules are installed. On my system, the path to typemap is /usr/lib/perl5/5.6.1/ExtUtils/typemap, but different platforms will place the file in different locations.
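A quick way to find that file on your own system is to look for ExtUtils/typemap in each directory perl searches for modules. The following sketch uses only @INC and the core File::Spec module; it prints nothing if your installation doesn't ship the file.

```perl
use strict;
use warnings;
use File::Spec;

# Build the candidate path ExtUtils/typemap under every directory in @INC
# and keep the ones that exist as plain files. Code references in @INC
# are skipped because they are hooks, not directories.
my @typemaps = grep { -f $_ }
               map  { File::Spec->catfile($_, 'ExtUtils', 'typemap') }
               grep { !ref $_ } @INC;

print "$_\n" for @typemaps;
```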

^[3] Which probably should be G_LIST, since there's no such thing as "array context"!