The Perl C API | Writing Perl Modules for CPAN

Chapter 8 - Programming Perl in C

by Sam Tregar

Apress © 2002

To write Perl modules in C, you need to have a working knowledge of the Perl C API (Perl API for short). This section will give you the basics, but you should know where to go for the details: perlapi. The perlapi documentation that comes with Perl is the Perl API bible. Nearly every C function and macro that Perl supports is listed there along with descriptions generated from comments in the Perl source.

Data Types

For every data type available to Perl programmers, there is a corresponding C type (see Table 8-1). Perl's data types are a lot like objects in that they each have a set of functions that operate on them. Note that these data types are always manipulated through pointers.

Table 8-1: Perl Data Types
Type	Full Name	Perl Example
`SV*`	Scalar value	`$scalar`
`AV*`	Array value	`@array`
`HV*`	Hash value	`%hash`
`CV*`	Code value	`&sub`
`GV*`	Glob value	`*glob`

Perl's data types are like objects in another sense—they support a lightweight form of polymorphism. Polymorphism is the property of an object in a derived class to behave like an object of a parent class. In this case the inheritance tree is simple—SV behaves as the parent class, and all the other types derive from SV. The Perl internals exploit this by using function parameter and return types of SV* to actually contain any of the available data types.
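The idea of one pointer type standing in for several can be sketched in plain C. This is a toy model only: `toy_head`, `toy_sv`, `toy_av`, and `toy_type_of` are hypothetical names invented for illustration, and the layout is an assumption that says nothing about Perl's real structures. In the sketch a pointer to the shared header plays the role that SV* plays in the Perl API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT Perl's real structures. */
typedef enum { TOY_SCALAR, TOY_ARRAY } toy_type;

/* Every toy value begins with the same header fields. */
typedef struct {
    toy_type type;   /* what kind of value this is */
    int      refcnt; /* unused here, shown for flavor */
} toy_head;

/* A toy scalar: header first, then its payload. */
typedef struct {
    toy_head head;
    long     iv;
} toy_sv;

/* A toy array: the same header first, then different payload. */
typedef struct {
    toy_head head;
    toy_sv **items;
    size_t   count;
} toy_av;

/* Works on any toy value, because a pointer to a struct may be
   converted to a pointer to its first member. */
toy_type toy_type_of(const toy_head *any) {
    assert(any != NULL);
    return any->type;
}
```

A caller casts whatever it has to the generic pointer, for example `toy_type_of((toy_head *) &some_array)`, much as the examples in this chapter cast an AV* or HV* to SV*.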

Scalar Values (SV)

The SV type represents a Perl scalar (that is, $scalar). A scalar may contain a signed integer (IV), an unsigned integer (UV), a floating-point number (NV), or a string value (PV). These types are typedef aliases for fundamental C types. For example, IV may be a typedef for int or long. NV is usually a typedef for double. You can find out what the underlying types are on your system using the perl -V switch. For example, here's the output on my system:

 $ perl -V:ivtype -V:uvtype -V:nvtype
 ivtype='long';
 uvtype='unsigned long';
 nvtype='double';

Notice that PV wasn't included here—that's because PV is always char * regardless of the platform.

Perl uses aliases for C's fundamental types to improve portability; C's types can differ wildly between platforms, but Perl's aliases maintain a modicum of consistency. Perl's IV and UV types are guaranteed to be at least 32 bits wide and large enough to safely hold a pointer. When you need a particular bit width, Perl provides typedefs called I8, U8, I16, U16, I32, and U32 that are guaranteed to be at least the size specified and as small as possible.

Like all Perl data types, SV is an opaque type. An opaque type is one that you are not intended to interact with directly. Instead, you call functions that operate on the data type. This means that you should never dereference a pointer to a Perl data type directly—doing so exposes the underlying implementation of the data type, which may change wildly between Perl versions.

Creation

The simplest way to create a new SV is with the NEWSV macro:

 SV *sv = NEWSV(0,0);

The first parameter is an "id" used to detect memory leaks; unless you're doing memory debugging, you can safely use zero. The second parameter can be used to preallocate memory inside the SV for strings. If you know you'll be using the SV to hold a large string, then you can potentially improve performance by preallocating the space.

In practice, NEWSV is rarely used. This is because the Perl API supplies convenience functions to create SVs directly from the available value types:

 SV *sv_iv, *sv_uv, *sv_nv, *sv_pv;
 sv_iv = newSViv(-10);        // sv_iv contains the signed integer value -10
 sv_uv = newSVuv(10);         // sv_uv contains the unsigned integer value 10
 sv_nv = newSVnv(10.5);       // sv_nv contains the floating-point value 10.5
 sv_pv = newSVpv("ten", 0);   // sv_pv contains the string "ten", the second
                              // parameter tells Perl to compute the length with
                              // strlen()

A more efficient version of newSVpv() called newSVpvn() doesn't offer automatic strlen() calling:

 sv_pv = newSVpvn("ten", 3); // second parameter gives the length of "ten"

A version that uses sprintf()-style format strings, newSVpvf(), is also available:

 sv_pv = newSVpvf("%d", 10); // sv_pv contains the string "10"

Note

The comments used in the C examples are actually C++-style comments (// comment). This was done to improve readability and reduce the space required by the comments. Most modern C compilers will accept these comments, but if yours doesn't you'll need to change them to C-style comments (/* comment */) or omit them entirely.

Type Checking

You can test the type of an SV using the Sv*OK macros. Specific versions exist for the specific types:

 if (SvIOK_notUV(sv)) warn("sv contains an IV.");
 if (SvIOK_UV(sv))    warn("sv contains a UV.");
 if (SvNOK(sv))       warn("sv contains an NV.");
 if (SvPOK(sv))       warn("sv contains a PV.");

There are also tests that combine one or more of the preceding tests:

 if (SvNIOK(sv))    warn("sv contains a number of some type (IV, UV or NV)");
 if (SvIOK(sv))     warn("sv contains an integer of some type (IV or UV)");

Getting Values

The following macros return the value stored inside the SV as the requested type. If necessary, they will convert the value to the requested type.

 IV iv = SvIV(sv);         // get an IV from sv
 UV uv = SvUV(sv);         // get a UV from sv
 NV nv = SvNV(sv);         // get an NV from sv
 STRLEN len;
 char *pv = SvPV(sv, len); // get a PV from sv, setting len to the
                           // length of the string

Note

If an SV contains a nonnumeric string, then calling SvIV(), SvUV(), or SvNV() on it will result in the value 0. To find out if an SV contains something that will result in a valid number, use the looks_like_number() function.

These macros can have a side effect: they may change the internal representation of the SV. For example, after a call to SvPV() on an SV that holds an integer, the stringified form of the value will be cached inside the SV, and both SvIOK and SvPOK will return true. As a result, future calls to SvPV on this scalar will use the cached copy instead of doing the conversion again. This has two implications: First, the type of an SV may change even if it isn't written to, and second, the memory usage of an SV may grow even if it isn't written to.
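The caching behavior can be modeled in a few lines of plain C. This is a toy sketch, not Perl's implementation: `toy_scalar`, `TOY_IOK`, `TOY_POK`, `toy_set_iv`, and `toy_get_pv` are hypothetical names, and the fixed-size string buffer is an assumption made to keep the example short. It shows only the general shape of the idea, a read that fills in a second representation and sets a flag for it.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model only: hypothetical flags and fields, not Perl's real layout. */
enum { TOY_IOK = 1, TOY_POK = 2 };

typedef struct {
    int  flags;   /* which representations are currently valid */
    long iv;      /* integer slot */
    char pv[32];  /* string slot, filled in lazily */
} toy_scalar;

/* Store an integer; any cached string form becomes invalid. */
void toy_set_iv(toy_scalar *s, long value) {
    assert(s != NULL);
    s->iv = value;
    s->flags = TOY_IOK;
}

/* Return the string form, converting and caching it on first use. */
const char *toy_get_pv(toy_scalar *s) {
    assert(s != NULL && (s->flags & (TOY_IOK | TOY_POK)));
    if (!(s->flags & TOY_POK)) {
        snprintf(s->pv, sizeof s->pv, "%ld", s->iv);
        s->flags |= TOY_POK; /* a read has changed the scalar's state */
    }
    return s->pv;
}
```

After `toy_set_iv(&s, 42)` only the integer flag is set; one call to `toy_get_pv(&s)` leaves both flags set, which mirrors the way a scalar's reported type can change without being written to.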

There is a version of SvPV that is guaranteed to produce an SV with only a string value, SvPV_force. The behavior is the same as SvPV, but afterward only SvPOK will return true and only the string value will be retained inside the SV. This function is necessary if you're going to be changing the string value directly with the SvPVX macro introduced later.

Setting Values

Given an initialized SV, you can load it with any of the value types using the sv_set* family of functions:

 sv_setiv(sv, -10);    // sv contains the signed integer (IV) -10
 sv_setuv(sv, 10);     // sv contains the unsigned integer (UV) 10
 sv_setnv(sv, 10.5);   // sv contains the floating-point value (NV) 10.5
 sv_setpv(sv, "10");   // sv contains the string value (PV) "10"

The PV forms also come in a few more flavors. There's one that uses an sprintf()-style format string:

 sv_setpvf(sv, "ten: %d", 10); // sv contains the string value (PV) "ten: 10"

and one that takes a length argument to allow for strings that aren't null terminated or that contain null bytes:

 sv_setpvn(sv, "10", 2); // sets sv to the 2-character string "10"

Direct Access

If you know the type of an SV, then you can directly access the underlying value using an Sv*X macro. This is useful for two reasons: it is faster since it avoids testing the type of the data, and it is lvaluable. In C, a macro is said to be lvaluable if it may legally be used as an lvalue. The most common lvalue is the left-hand side of an assignment operator. Sv*X allows you to efficiently set the value of an SV without needing to call a function.

 SvIVX(sv_iv) = -100;            // directly set the IV inside sv_iv to -100
 SvUVX(sv_uv) = 100;             // directly set the UV inside sv_uv to 100
 SvNVX(sv_nv) = 100.10;          // directly set the NV inside sv_nv to 100.10
 warn("PV: %s", SvPVX(sv_pv));   // directly access the string inside sv_pv

Note that you cannot safely use the return value from SvPVX() as an lvalue: doing so would replace the string pointer held by the SV and would cause an instant memory leak. Other bad things would happen too, because the SV structure keeps track of more than just the pointer to the string; it also tracks the length and an offset into the string buffer where the string begins.

Caution

Be careful with Sv*X macros; if you use one without first checking that the SV is of the correct type, you might get a segmentation fault, or worse, silently corrupt nearby data in memory!

After using an Sv*X macro to update the value inside an SV, it's often necessary to update the type information of the SV. This is because SVs will cache conversion values when converting between types. You need to tell the SV to invalidate any other cached representations using a macro of the form Sv*OK_only(). For example:

 SvIVX(sv_iv) = 100;    // directly set the IV inside sv_iv to 100
 SvIOK_only(sv_iv);     // invalidate non-IV representations inside sv_iv

In general it is better to use the sv_set* functions rather than Sv*X macros. However, in some cases the performance improvement can make it worth the risk.

String Functions

Just like Perl, the Perl API contains functionality to make string processing easier. There is a set of functions for string concatenation:

 sv_catpv(sv, "foo");          // adds "foo" to the end of sv
 sv_catpvn(sv, "foo", 3);      // adds "foo" to the end of sv, with a length arg
 sv_catpvf(sv, "ten: %d", 10); // adds "ten: 10" to the end of sv
 sv_catsv(sv_to, sv_from);     // adds the contents of sv_from to the
                               // end of sv_to

Getting the length of an SV is done as follows:

 STRLEN len = sv_len(sv);

If you want to grow the size of the string, do the following:

 char *new_ptr = sv_grow(sv, 1024); // grows sv to 1k and returns a pointer to
                                    // the new character buffer

Truncate the string in this manner:

 SvCUR_set(sv, 10);                 // the SV is now 10 bytes long

Inserting a string into the middle of an SV, similar to the substr built-in in Perl, is done as follows:

 sv_insert(sv, offset, length, "string to insert", strlen("string to insert"));

The next example shows how to remove characters from the start of a string:

 SV *sv = newSVpv("Just another Perl hacker.", 0);
 sv_chop(sv, SvPVX(sv) + 13); // sv contains "Perl hacker." after this

The second parameter to sv_chop is a pointer into the string to the new first character.

If you need to do substring searches over a large string, you can speed up the process using the Boyer-Moore search algorithm.^[3] This is done by first compiling the SV to be searched for with fbm_compile()^[4] and then searching with fbm_instr(). For example, here's a function that takes two SVs and returns the offset of the second inside the first, or -1 on failure. This function uses SvPVX and SvEND,^[5] so it's only safe to call if both SVs are SvPOK(); real code would include the necessary checks and conversions, of course!

 int fast_search (SV *source, SV *search) {
   char *found; // pointer to hold result of search
   // compile search string using Boyer-Moore algorithm
   fbm_compile(search, 0);
   // conduct the search for the search string inside source
   found = fbm_instr(SvPVX(source), SvEND(source), search, 0);
   // if the search failed, return -1
   if (found == Nullch) return -1;
   // return the offset of search within source
   return found - SvPVX(source);
 }

In my tests (looking for a single word in a string containing all of /usr/dict/words), this version was between two and three times faster than a version that used Perl's index() function.
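To give a feel for why a compile step followed by a search step can pay off, here is a standalone sketch in plain C. It implements Horspool's simplification of Boyer-Moore, which is an assumption chosen for brevity and is not the code behind fbm_compile() and fbm_instr(). The function name `horspool_search` and its signature are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sketch only: Horspool's simplification of Boyer-Moore, not Perl's fbm code.
   Returns the offset of pat within text, or -1 if it does not occur. */
long horspool_search(const char *text, size_t tlen,
                     const char *pat, size_t plen) {
    size_t shift[UCHAR_MAX + 1];
    size_t i, pos;

    assert(text != NULL && pat != NULL);
    if (plen == 0) return 0;
    if (plen > tlen) return -1;

    /* "compile" step: for each byte value, how far the pattern may slide
       when that byte is aligned with the pattern's last position */
    for (i = 0; i <= UCHAR_MAX; i++) shift[i] = plen;
    for (i = 0; i + 1 < plen; i++)
        shift[(unsigned char) pat[i]] = plen - 1 - i;

    /* search step: compare right to left, then slide by the table entry */
    pos = 0;
    while (pos + plen <= tlen) {
        size_t j = plen;
        while (j > 0 && text[pos + j - 1] == pat[j - 1]) j--;
        if (j == 0) return (long) pos;
        pos += shift[(unsigned char) text[pos + plen - 1]];
    }
    return -1;
}
```

The table is built once per pattern, and a mismatch often lets the search skip several characters at a time, which is where the speedup over a character-by-character scan comes from.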

Comparison Functions

The Perl API contains a set of calls to make comparing SVs easier. First, there are functions to test whether an SV is true in the Perl sense:

 if (sv_true(sv)) warn("sv is true!");

Tests for equality can be expressed using these two functions:

 if (sv_eq(sv1, sv2))       warn("The SVs are equal");
 if (sv_cmp(sv1, sv2) == 0) warn("The SVs are equal");

The Perl API also comes with a full set of normal string comparison functions. These are useful when you have an SV and a normal C string to compare. You might be tempted to "upgrade" the string to an SV and use sv_eq(), but that's generally not an efficient solution.

 STRLEN len;
 char *string = SvPV(sv, len); // extract string from an SV
 if (strEQ(string, "foo"))     warn("SV contains foo");
 if (strNE(string, "foo"))     warn("SV does not contain foo");
 if (strGT(string, "foo"))     warn("SV is greater than foo");
 if (strGE(string, "foo"))     warn("SV is greater than or equal to foo");
 if (strLT(string, "foo"))     warn("SV is less than foo");
 if (strLE(string, "foo"))     warn("SV is less than or equal to foo");
 if (strnEQ(string, "foo", 3)) warn("SV starts with foo");

You can test for undef by comparing the SV* to the globally defined PL_sv_undef:

 if (sv == &PL_sv_undef) warn("sv is undefined!");

Notice that the preceding test uses the & operator to get the address of PL_sv_undef and compares it to the address of the SV since SVs are always handled using pointers. A common mistake is to leave off the & on PL_sv_undef and end up with confusing compiler errors about type mismatches.

Array Values (AV)

Perl's arrays are represented in C by the AV type. Underneath the covers an AV is simply a normal C array of SVs with some bookkeeping information to make certain operations faster. However, just like SVs, AVs are opaque types, and you must work with them through the supplied set of functions.

Creation

The simplest way to create an AV is to use the newAV() function:

 AV *av = newAV();

If you have an array of SV*s, then you can create an array from them using av_make():

 AV *av;
 SV *sv_array[3];
 sv_array[0] = newSVpv("foo",0);
 sv_array[1] = newSVpv("bar",0);
 sv_array[2] = newSVpv("baz",0);
 av = av_make(3, sv_array);      // create an array from the three SVs

Fetching Values

AVs support access by index as well as the familiar pop and shift operations. You can fetch an SV from an array using the av_fetch() function:

 SV **svp;
 svp = av_fetch(av, 9, 0); // fetch $av[9] (the 0 indicates this isn't an
                           // lvalue)
 if (!svp) croak("fetch failed: av doesn't have a tenth element!");

Notice that the return value from av_fetch() is a pointer to a pointer to SV (that is, an SV**), not a normal pointer to SV (that is, SV*). If you try to fetch a value that doesn't exist, then av_fetch() will return a NULL pointer. Be sure to check the return value before dereferencing or you'll end up with a segmentation fault if the element doesn't exist. The preceding code checks the return value and calls croak(), the Perl API version of die, if av_fetch() returns NULL.

However, you can skip testing the return value from av_fetch() if you know the element exists. You can get this information using av_exists(), which tests whether an index exists in an AV:

 SV *sv;
 if (av_exists(av, 9)) {        // check that the 10th element exists
    sv = *(av_fetch(av, 9, 0)); // safely trust av_fetch to return non-NULL
 } else {
    croak("av doesn't have a tenth element!");
 }

You can get the same effect using av_len() to check the length of the array:

 SV *sv;
 if (av_len(av) >= 9) {         // check that $#av >= 9
    sv = *(av_fetch(av, 9, 0)); // safely trust av_fetch to return non-NULL
 } else {
    croak("av doesn't have a tenth element!");
 }

The av_len() function works the same way as the $#array magic value—it returns the last valid index in an array.

Combining the preceding functions, you can now write a function to iterate through an array and print out each value:

 void print_array (AV *av) {
    SV   *sv;     // SV pointer to hold return from array
    char *string; // string pointer to hold SV string value
    STRLEN len;   // unused length value for SvPV()
    I32 i = 0;    // loop counter
    // loop over all valid indexes
    for (i = 0; i <= av_len(av); i++) {
       sv = *(av_fetch(av, i, 0));   // get the SV for this index
       string = SvPV(sv, len);       // get a stringified form of the SV
       printf("%s\n", string);       // print it out, one value per line
    }
 }

As I mentioned earlier, AVs also support a version of Perl's pop and shift builtins. These functions, av_pop() and av_shift(), return regular SV* pointers rather than the SV** pointers returned by av_fetch(). Using av_shift(), you could write a destructive version of the for loop just shown:

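    // keep shifting until the array is empty; a counter compared against
    // av_len() would stop halfway because the array shrinks on every pass
    while (av_len(av) >= 0) {
       sv = av_shift(av);            // shift off the first remaining SV
       string = SvPV(sv, len);       // get a stringified form of the SV
       printf("%s\n", string);       // print it out, one value per line
    }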

Or, using av_pop(), create a version that prints them out in the reverse order:

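    // keep popping until the array is empty; a counter compared against
    // av_len() would stop halfway because the array shrinks on every pass
    while (av_len(av) >= 0) {
       sv = av_pop(av);              // pop off the last remaining SV
       string = SvPV(sv, len);       // get a stringified form of the SV
       printf("%s\n", string);       // print it out, one value per line
    }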

Storing Values

The Perl API offers two ways to store values in an AV, av_store() and av_push(). For example:

 SV *sv = newSVpv("foo", 0);
 av_store(av, 9, sv);        // $av[9] = "foo"

This will work fine if you know the AV has room for a tenth element. If not, you need to first grow the array with a call to av_fill():

 av_fill(av, 9); // set $#av to 9, giving av ten elements

This works the same as setting the $#array magic value in Perl—it will truncate or grow the length of the array to the supplied value as required.

If you only need to add elements to the end of the array, av_push() offers a simpler solution. av_push() will automatically extend the array as it adds elements, so you don't need to call av_fill():

 SV *sv = newSVpv("foo", 0);
 av_push(av, sv);            // push(@av,"foo");

The Perl API does provide an av_unshift() function, but it doesn't work the same as the Perl unshift built-in. Instead of adding elements to the front of the array, it only adds empty slots. You then need to fill those slots with av_store(). For example, to unshift the string "foo" onto an AV:

 SV *sv = newSVpv("foo", 0);
 av_unshift(av, 1);          // unshift(@av, undef);
 av_store(av, 0, sv);        // $av[0] = "foo";

It's a bit more work, but the result is identical to Perl's unshift built-in.

Deletion

An entire AV can be cleared with the av_clear() function:

 av_clear(av); // @av = ();

or you can delete just a single element with av_delete():

 av_delete(av, 9, 0); // delete the tenth element (the last arg is ignored)

Hash Values (HV)

Perl's hashes are represented in the Perl API as HVs. The HV type is the most complicated of the Perl data types, and it has many more functions and macros associated with it than can be described here. I'll give you a subset of the available functions that will let you do most of what you'll need to do with hashes. In particular, I've avoided discussing the HE type that combines keys and values in one structure. For these functions, see the perlapi documentation.

Creation

HVs have a single constructor, newHV():

 HV *hv = newHV();

Fetching Values

The simplest way to fetch values from a hash is with hv_fetch():

 SV **svp;
 // fetch $hv{foo} (last arg indicates lvalue status)
 svp = hv_fetch(hv, "foo", strlen("foo"), 0);
 if (!svp) croak("fetch failed: hv does not contain value for key foo");

Notice that this call is similar to av_fetch(), and similarly returns an SV** that may be NULL if the requested key does not exist. Just like av_exists(), hv_exists() provides a simple way to avoid dealing with SV**s:

 SV *sv;
 // check that $hv{foo} exists
 if (hv_exists(hv, "foo", strlen("foo"))) {
    // safely trust hv_fetch to return non-NULL
    sv = *(hv_fetch(hv, "foo", strlen("foo"), 0));
 } else {
   croak("fetch failed: hv does not contain value for key foo");
 }

Aside from reading a specific key, the other common way to read from a hash is to iterate through its keys and values. This is done using the hv_iter functions. For example, here's a function that prints out the keys and values in a hash:

 void print_hash (HV *hv) {
    SV *sv;
    I32 i, count;
    char *key_string;
    I32 key_len;   // receives the length of each key
    STRLEN len;    // receives the length of each value string
    // initialize the iteration
    count = hv_iterinit(hv);
    // loop over key/value pairs
    for (i = 1; i <= count; i++) {
       sv = hv_iternextsv(hv, &key_string, &key_len);
       printf("%s : %s\n", key_string, SvPV(sv, len));
    }
 }

The preceding function uses two new Perl API calls, hv_iterinit() and hv_iternextsv(). The first initializes the iteration and returns the number of key-value pairs in the hash:

 count = hv_iterinit(hv);

Then a loop is run for count iterations calling hv_iternextsv(). The call takes three parameters, the HV* for the hash, a pointer to a char* to store the key, and a pointer to an integer to store the length of the key. The function returns an SV* for the value of this key.

Storing Values

Values are stored in a hash using the hv_store() function. For example, the following stores the value 100 under the key fuel_remaining in hv:

 SV *sv_value = newSViv(100);
 hv_store(hv, "fuel_remaining", strlen("fuel_remaining"), sv_value, 0);

The last value allows you to pass in a precomputed hash value; setting it to 0 tells Perl to compute the hash value for you. Notice that this function doesn't have the restrictions that av_store() does—HVs grow automatically, and you don't have to extend them manually to store new values.

Deletion

An entire HV can be cleared with the hv_clear() function:

 hv_clear(hv); // %hv = ();

Or you can delete just a single key with hv_delete():

 hv_delete(hv, "foo", strlen("foo"), 0); // delete $hv{foo}

Reference Values (RV)

In Perl, complex data structures are built using references. For example, if you want to create a hash of arrays, you do it by assigning references to arrays as hash values:

 %hash_of_arrays = (
     foo => [ 1, 2, 3 ],
     bar => [ 4, 5, 6 ],
 );

In the Perl API, references are represented by SVs containing RV values. Much like SVs can contain IV or PV values, SVs can also contain RV values that reference other objects.

Creation

You can create a new RV using the newRV_inc() function:

 SV *sv = newSVpv("foo",0); // $sv = "foo";
 SV *rv = newRV_inc(sv);    // $rv = \$sv;

This function officially takes an SV* as a parameter, but it can actually be used with any Perl type that you can cast to an SV* (such as an AV* or an HV*). This pattern is repeated across the entire RV API—instead of having separate functions for SV, AV, and HV references, there is a single API, and you must cast everything to and from SV*. For example, the following creates the hash of arrays data structure shown earlier:

 HV *hash_of_arrays_hv = newHV();
 AV *foo_av            = newAV();
 AV *bar_av            = newAV();
 // fill in arrays
 av_push(foo_av, newSViv(1));
 av_push(foo_av, newSViv(2));
 av_push(foo_av, newSViv(3));
 av_push(bar_av, newSViv(4));
 av_push(bar_av, newSViv(5));
 av_push(bar_av, newSViv(6));
 // create references and assign to hash
 hv_store(hash_of_arrays_hv, "foo", 3, newRV_inc((SV*)foo_av), 0);
 hv_store(hash_of_arrays_hv, "bar", 3, newRV_inc((SV*)bar_av), 0);

Once created, an RV can be distinguished from a normal SV using the SvROK macro. For example, this code would print "ok" twice after the preceding code:

 if (SvROK(*(hv_fetch(hash_of_arrays_hv, "foo", 3, 0)))) printf("ok\n");
 if (SvROK(*(hv_fetch(hash_of_arrays_hv, "bar", 3, 0)))) printf("ok\n");

Another way to create a reference is to use one of the sv_setref functions. These functions take an initialized SV and one of the value types (IV, UV, and so on) and create a new SV. They then make the SV passed as an argument a reference to the new SV. Here are some examples:

 SV *sv_rv = NEWSV(0,0);
 sv_setref_iv(sv_rv, Nullch, -10);  // sv_rv now a ref to an SV containing -10
 sv_setref_uv(sv_rv, Nullch, 10);   // sv_rv now a ref to an SV containing 10
 sv_setref_nv(sv_rv, Nullch, 10.5); // sv_rv now a ref to an SV containing 10.5
 sv_setref_pvn(sv_rv, Nullch, "foo", 3); // sv_rv now a ref to an SV
                                         // containing "foo"

The Nullch argument indicates that I'm not creating a blessed reference (that is, an object). If you pass a class name here, you'll create a blessed reference:

 sv_setref_iv(sv_rv, "Math::BigInt", -10); // sv_rv is now a reference blessed
                                           // into the Math::BigInt class.

One function in the sv_setref family was left out of the preceding list: sv_setref_pv(). This function is a bit of an oddball—it doesn't copy the string passed to it. Instead it copies the pointer itself into the new SV. It's easy to misuse this function; for example:

 sv_setref_pv(sv_rv, Nullch, "foo"); // ERROR!

This is an error because I just copied a pointer to an immutable string into a new SV. When the SV eventually tries to call free() on the string, it will cause the program to crash or at least misbehave. Instead the pointer passed to sv_setref_pv() must be dynamically allocated. I'll cover Perl's API for dynamically allocating memory later in the "System Wrappers" section. In general, it's best to avoid this function unless you have a good reason to want to copy a pointer into a new SV.

Type Checking

You can find out what type of object an RV points to using the SvTYPE macro on the result of dereferencing the RV with SvRV. For example:

 if (SvTYPE(SvRV(sv_rv)) == SVt_PVAV) printf("sv_rv is a ref to an AV.\n");
 if (SvTYPE(SvRV(sv_rv)) == SVt_PVHV) printf("sv_rv is a ref to an HV.\n");
 if (SvTYPE(SvRV(sv_rv)) == SVt_IV ||
     SvTYPE(SvRV(sv_rv)) == SVt_NV ||
     SvTYPE(SvRV(sv_rv)) == SVt_PV)
   printf("sv_rv is a ref to an SV.\n");

You can find the complete table of possible SvTYPE return values in perlapi.

Dereferencing

Once you know what kind of object an RV points to, you can safely cast the return value to the correct type:

 AV *av;
 // confirm sv_rv is a reference before looking at what it points to
 if (SvROK(sv_rv) && SvTYPE(SvRV(sv_rv)) == SVt_PVAV) {
    av = (AV *) SvRV(sv_rv); // safely cast dereferenced value to an AV
 } else {
   croak("sv_rv isn't a reference to an array!");
 }

Caution

Always check your RVs with SvROK and SvTYPE before casting them. It's all too common for C modules to crash when passed a normal scalar where they were expecting a reference. It's much nicer to print an error message!

Memory Management

So far I've ignored memory management. As a result, most of the preceding examples will leak memory.^[6] This is because, unlike Perl, C expects you to manage both allocation and deallocation. The Perl API offers some help in this area, and learning to use it correctly is the key to creating C modules that don't leak memory.

Reference Counts

Perl uses an explicit reference-counting garbage collector to manage memory. This means that every object (SV, AV, HV, and so on) created by Perl has a number associated with it called a reference count, or refcount for short. A reference count is simply a count of the number of objects that refer to the object. When the reference count of an object reaches 0, the memory used by the object is freed by the garbage collector.

Objects start their lives with a refcount of 1. When a reference to the object is created, its refcount is incremented. When an object goes out of scope, its refcount is decremented. Finally, when a reference is removed, the refcount of the object is decremented. The object is freed when its refcount reaches 0.

Most variables don't get referenced and simply go from having a refcount of 1 to having a refcount of 0 when they go out of scope:

 {
    my $a = "foo";   # $a has a refcount of 1
 }
 # $a goes out of scope and has its refcount decremented. Since its refcount is
 # now 0, the memory used by $a is freed.

Even when references are created they are normally confined to a single scope:

 {
    my $a = "foo";    # $a has a refcount of 1
    my $b = \$a;      # $b has a refcount of 1, $a has a refcount of 2
 }
 # $a and $b go out of scope and have their refcounts decremented. Since $b
 # referenced $a, $a has its refcount decremented again. Now both $a and $b
 # have refcounts of 0 and their memory is freed.

Things start getting complicated when an object is referenced by a variable from another scope. Here's a simple example:

 my $a;            # $a has a refcount of 1
 {
    my $b = "foo"; # $b has a refcount of 1
    $a = \$b;      # $a now references $b. $b has a refcount of 2
 }
 # $b goes out of scope. $b has its refcount decremented and now has a
 # refcount of 1. $b is still live since $a has a reference to it.
 $a = 10; # $a no longer references $b. $b now has a refcount of 0 and is
          # freed by the garbage collector.

Now that you understand reference counting, you can understand why circular references cause memory leaks. Consider this example:

 {
   my $a;    # $a starts with a refcount of 1
   my $b;    # $b starts with a refcount of 1
   $a = \$b; # $b now has a refcount of 2
   $b = \$a; # $a now has a refcount of 2
 }
 # both $a and $b go out of scope and have their reference counts
 # decremented. Now they both have refcounts of 1. Both are still live
 # and the garbage collector cannot free them!
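The bookkeeping in these examples can be replayed with a small model in plain C. It is a sketch under stated assumptions, not Perl's collector: `toy_cell`, `toy_init`, `toy_point_at`, and `toy_dec` are hypothetical names, each cell can reference at most one other cell, and "freeing" is simulated by bumping a counter so the outcome is easy to observe.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a hypothetical refcounted cell, not a real SV. */
typedef struct toy_cell {
    int refcnt;               /* number of owners */
    struct toy_cell *target;  /* the cell this one references, or NULL */
    int *freed_counter;       /* incremented when the cell is "freed" */
} toy_cell;

/* Objects start their lives with a refcount of 1. */
void toy_init(toy_cell *c, int *freed_counter) {
    c->refcnt = 1;
    c->target = NULL;
    c->freed_counter = freed_counter;
}

/* Make from reference to, incrementing the target's refcount. */
void toy_point_at(toy_cell *from, toy_cell *to) {
    to->refcnt++;
    from->target = to;
}

/* Drop one owner. At zero the cell is "freed" and the reference it held
   is removed, which decrements its target in turn. */
void toy_dec(toy_cell *c) {
    assert(c != NULL && c->refcnt > 0);
    if (--c->refcnt == 0) {
        (*c->freed_counter)++;
        if (c->target != NULL) toy_dec(c->target);
    }
}
```

With one cell pointing at another, calling `toy_dec()` on both (the scope exit) frees both. With two cells pointing at each other, the same two calls leave each at a refcount of 1 and nothing is freed, which is the leak described above.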

Inspecting Reference Counts

You can inspect the reference count of an object from C using the SvREFCNT macro:

 SV *sv = NEWSV(0,0);
 SV *rv;
 printf("%d\n", (int) SvREFCNT(sv)); // prints 1
 rv = newRV_inc(sv);                 // create a reference to sv
 printf("%d\n", (int) SvREFCNT(sv)); // prints 2

As you can see in this example, newRV_inc() increments the reference count of the target object as it creates the new RV. If this isn't what you want, then you can use newRV_noinc(), which creates a reference and doesn't increment the reference count. If you create a new SV and then attach a reference to it with newRV_noinc(), when the reference is freed the SV will be too. This is such a cozy setup that the sv_setref functions introduced earlier work this way. The result is that you can forget about the original SV and only worry about freeing the RV.

The same procedure works with AVs and HVs—they can even use the same macros and functions through the magic of polymorphism:

 HV *hv = newHV();
 SV *rv;
 printf("%d\n", (int) SvREFCNT((SV *) hv)); // prints 1
 rv = newRV_inc((SV *) hv);                 // create a reference to hv
 printf("%d\n", (int) SvREFCNT((SV *) hv)); // prints 2

Explicitly Freeing Memory

A simple way to make sure you don't leak memory is to explicitly decrement the reference counts of the variables you create. This is done using the SvREFCNT_dec macro:

 SvREFCNT_dec(sv); // decrement sv's refcount

Perl's garbage collector works by freeing an object the moment its reference count reaches zero. After an SvREFCNT_dec that causes an object's refcount to reach zero, the object is no longer valid, and calls that attempt to operate on it will usually yield crashes or other unpleasant behavior.

Using SvREFCNT and SvREFCNT_dec, you can write a function to unconditionally free any Perl object:

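 void free_it_now (SV *sv) {
    // drop all but the last reference; sv is still valid here
    while (SvREFCNT(sv) > 1) SvREFCNT_dec(sv);
    // drop the final reference, which frees sv; don't touch sv afterward
    SvREFCNT_dec(sv);
 }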

But you shouldn't need to do something like this very often; in fact, if you do you should stop and consider what's wrong with the way you're managing the reference counts on your variables.

Implicitly Freeing Memory

The Perl API provides a way for you to hook into Perl's automatic garbage collection from C. The way this is done is by marking an SV, AV, or HV as mortal. Marking an object mortal is simply a way of deferring an SvREFCNT_dec until the end of the current scope. Here's an example that marks an SV as mortal using sv_2mortal():

 SV *sv = newSVpv("foo",0); // sv contains "foo" and has a refcount of 1
 sv_2mortal(sv);            // sv is now mortal. At the next scope exit
                            // SvREFCNT_dec(sv) will be called and the sv
                            // will be freed.

This can be stated more succinctly, since sv_2mortal() returns the SV* passed to it:

 SV *sv = sv_2mortal(newSVpv("foo",0)); // creates sv and mortalizes it

Of course, just like SvREFCNT, sv_2mortal() isn't just for SVs. You can mortalize anything you can cast to an SV*:

 AV *av = (AV *) sv_2mortal((SV *) newAV()); // create a mortal AV
 HV *hv = (HV *) sv_2mortal((SV *) newHV()); // create a mortal HV

There are also two constructor versions just for SVs that come in handy occasionally:

 SV *mort_sv = sv_newmortal();     // create an empty mortal SV
 SV *mort_sv2 = sv_mortalcopy(sv); // create a mortal clone of sv

Caution

Be careful never to mortalize an object twice accidentally. This will result in SvREFCNT_dec being called twice, with possibly disastrous results.

So, if mortalizing an SV (or AV, HV, and so on) schedules it for a SvREFCNT_dec at the end of the current scope, when does that happen? The answer is not what you might expect. In Perl, a scope ends at the next } in the code:

 {        # scope start
   my $a;
 }        # scope end

In C, things are a little more verbose:

 ENTER; SAVETMPS;                       // start a new scope
 SV *sv = sv_2mortal(newSVpv("foo",0)); // create a mortal variable
 FREETMPS; LEAVE;                       // end a scope, freeing mortal
                                        // variables with SvREFCNT == 1

Using the ENTER and SAVETMPS macros, you can start a new scope roughly the same way the Perl interpreter does. Then you can close the scope with FREETMPS and LEAVE. This triggers a SvREFCNT_dec on all variables mortalized inside the scope. The exact mechanics of Perl scopes are outside the reach of this chapter; for more details, see the perlguts documentation that comes with Perl.

It is rarely necessary to create new scopes in C modules just to manage memory, because the Perl code calling your module almost always runs inside a scope of its own. You can mortalize variables without worrying about exactly when the current scope will end; the answer is usually "soon enough," which is good enough!
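
To make the idea of a deferred SvREFCNT_dec concrete, here is a minimal sketch of the bookkeeping in plain C. It is a toy model and does not use the Perl API: ToySV, toy_new(), toy_dec(), toy_mortal(), toy_scope_start(), and toy_scope_end() are all hypothetical names invented for this illustration, and Perl's real implementation is considerably more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for an SV: nothing but a reference count. */
typedef struct { int refcnt; } ToySV;

static int toy_freed = 0;               /* number of objects freed so far */

/* Deferred decrements waiting for the end of a scope. */
#define TOY_MAX_TMPS 64
static ToySV *toy_tmps[TOY_MAX_TMPS];
static int toy_tmps_top = 0;

/* Plays the role of a constructor: a new object starts with refcount 1. */
static ToySV *toy_new(void) {
    ToySV *sv = malloc(sizeof *sv);
    assert(sv != NULL);
    sv->refcnt = 1;
    return sv;
}

/* Plays the role of SvREFCNT_dec: free the object when the count hits zero. */
static void toy_dec(ToySV *sv) {
    assert(sv->refcnt > 0);
    if (--sv->refcnt == 0) {
        free(sv);
        toy_freed++;
    }
}

/* Plays the role of sv_2mortal: schedule one decrement and return the object. */
static ToySV *toy_mortal(ToySV *sv) {
    assert(toy_tmps_top < TOY_MAX_TMPS);
    toy_tmps[toy_tmps_top++] = sv;
    return sv;
}

/* Plays the role of ENTER; SAVETMPS: remember where this scope's list begins. */
static int toy_scope_start(void) {
    return toy_tmps_top;
}

/* Plays the role of FREETMPS; LEAVE: run every decrement deferred since mark. */
static void toy_scope_end(int mark) {
    while (toy_tmps_top > mark)
        toy_dec(toy_tmps[--toy_tmps_top]);
}
```

In this model, an object created with toy_mortal(toy_new()) is freed by toy_scope_end() unless something else raised its count in the meantime. It also shows why mortalizing twice is dangerous: two entries on the list mean two decrements for a single reference.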

The Perl Environment

Often when you're writing a C module for Perl you'll need to interact with the Perl environment. The most basic means for this interaction, providing functions written in C that can be called from Perl, will be described later in Chapter 10. This section is about going the other way: calling back into Perl from C.

Accessing Variables

The simplest way to get information from Perl is to access global and package variables. The Perl API supports this with the get_ family of calls:

 SV *sv = get_sv("Data::Dumper::Purity", 0); // access $Data::Dumper::Purity
 AV *av = get_av("main::DATA", 1);           // create/access the global @DATA
 HV *hv = get_hv("main::VALUES", 1);         // create/access the global %VALUES

Each of these calls takes two parameters: the fully qualified name of the variable and a Boolean indicating whether to create the variable if it doesn't yet exist. By using a get_ function with the second argument set to true, you can create a new variable in Perl space from C. If you set the second parameter to false, the calls will return NULL if the variable cannot be found.

Calling Perl Subroutines from C

The subroutine calling convention is probably the Perl API's most complicated feature. Fortunately for you, it's also its best documented one. Perl comes with an excellent manual page on the subject—perlcall. I'll demonstrate some simple examples here; you can find all the gritty details in perlcall.

Example 1: No Parameters, No Return Value

The simplest type of call is one that passes no parameters and accepts no return values. Let's say I've defined a subroutine in the package Hello called say_hello:

 sub say_hello {
    print "Hello, world.\n";
 }

To call this subroutine from C, I'd have to do the following:

 dSP;                                             // declare SP
 PUSHMARK(SP);                                    // setup for call
 call_pv("Hello::say_hello", G_NOARGS|G_DISCARD); // call say_hello with no args
                                                  // and no return value

The dSP macro is necessary to declare the SP variable used by the PUSHMARK macro. These two macros set up the correct context for a call to call_pv(). The first argument is a string giving the name of the subroutine. The second specifies options for the call. In this case, combining G_NOARGS and G_DISCARD with | specifies a call with no arguments and no return value.

Example 2: One Parameter, No Return Value

A slightly more complicated call is one that passes a parameter. For example, let's say I modified say_hello to take an argument:

 sub say_hello {
    my $who = shift;
    print "Hello, $who.\n";
 }

This code would be required to call it from C:

 dSP;                                    // declare SP
 ENTER; SAVETMPS;                        // start a new scope
 PUSHMARK(SP);                           // prepare to push args
 XPUSHs(sv_2mortal(newSVpv("Human",0))); // push a single parameter onto the
                                         // argument stack
 PUTBACK;                                // done pushing arguments
 call_pv("Hello::say_hello", G_DISCARD); // make the call
 FREETMPS; LEAVE;                        // end the scope - freeing mortal
                                         // variables

There are a few new things to notice here. First, this code creates a new scope using the ENTER, SAVETMPS, FREETMPS, and LEAVE macros you saw back in the "Memory Management" section. This is done to provide for mortal variables created for use as parameters as well as those created by the Perl code to be called. Second, a single argument is pushed onto the argument stack using the XPUSHs macro. Next, the PUTBACK macro is used to mark that the last of the arguments has been pushed onto the stack. Finally, call_pv() is used, just as in the first example, but this time only G_DISCARD is used as an option.

Example 3: Variable Parameters, One Return Value

For a slightly more realistic example, let's call the Data::Counter::count function defined a couple of chapters back. If you remember, this subroutine takes a variable number of arguments and returns the number of arguments it received:

 sub count { return scalar @_; }

To call this subroutine, I need the following code:

 dSP;                                       // declare SP
 SV *retval;                                // declare an SV for the return value
 ENTER; SAVETMPS;                           // start a new scope
 PUSHMARK(SP);                              // prepare to push args
 XPUSHs(sv_2mortal(newSVpv("one",0)));      // push three parameters onto the stack
 XPUSHs(sv_2mortal(newSVpv("two",0)));
 XPUSHs(sv_2mortal(newSVpv("three",0)));
 PUTBACK;                                   // done pushing arguments
 call_pv("Data::Counter::count", G_SCALAR); // make the call
 SPAGAIN;                                   // refresh SP - it might have changed
 retval = POPs;                             // get return value off the stack
 printf("count: %d\n", (int) SvIV(retval)); // print out the return as an integer
 PUTBACK;                                   // store the updated SP after popping
 FREETMPS; LEAVE;                           // end the scope - freeing mortal
                                            // variables

This code is similar to the last example with a couple of additions. First, I passed G_SCALAR as an option to call_pv(), indicating that I expect to get a single value back. After the call, a new macro is used: SPAGAIN. This macro refreshes SP in case it changed during the call. Finally, the return value from the subroutine is popped off the stack with POPs. The "s" in POPs refers to the fact that it pops a scalar off the stack.

Note

The return value from a Perl subroutine obtained by POPs is a mortal variable. This means you must handle it before you end the scope with FREETMPS and LEAVE. After that point the variable will have been freed by the garbage collector and will no longer be accessible.

Example 4: Variable Parameters, Variable Return Values

In order to demonstrate multiple value returns, let's suppose that count() were modified to take a list of array refs as parameters and return a list of counts, one for each array:

 sub count { map { scalar @$_ } @_ }

Now count() would be called like this from Perl:

 @counts = Data::Counter::count([1,2,3],[4,5,6],[7,8,9]); # returns (3,3,3)

Here's what the equivalent call would look like from C, assuming I already have the preceding three arrays in the variables av1, av2, and av3.

 dSP;                                     // declare SP
 int num;                                 // declare an int for the return count
 int i;                                   // declare a loop counter
 ENTER; SAVETMPS;                         // start a new scope
 PUSHMARK(SP);                            // prepare to push args
 XPUSHs(sv_2mortal(newRV_inc((SV*)av1))); // push three arrays onto the stack
 XPUSHs(sv_2mortal(newRV_inc((SV*)av2))); // by reference
 XPUSHs(sv_2mortal(newRV_inc((SV*)av3)));
 PUTBACK;                                 // done pushing arguments
 num = call_pv("Data::Counter::count", G_ARRAY); // make the call
 SPAGAIN;                                 // refresh SP - it might have changed
 // print out the returned counts, in reverse order
 for (i = num; i > 0; i--) {
   SV *count_sv = POPs;                   // pop exactly once per iteration
   printf("count %d: %d\n", i, (int) SvIV(count_sv));
 }
 PUTBACK;                                 // store the updated SP after popping
 FREETMPS; LEAVE;                         // end the scope - freeing mortal
                                          // variables

There are two new pieces in this example. First, the call is made using the G_ARRAY option indicating that I expect to get a list of return values back. Also, the return value from call_pv(), the number of values returned on the stack, is saved. Then a loop is run that prints out the values in reverse order. A more elaborate loop could be written to process the return values in the "correct" order, but since POPs works in reverse, it's easiest to follow suit.
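
One simple way to recover the "correct" order is to fill an array from the back while popping. The sketch below shows the idea in plain C on a toy stack rather than Perl's real argument stack; ToyStack, toy_pop(), and toy_pop_in_order() are hypothetical names used only for this illustration. With the real API, the pop would be a POPs followed by a conversion such as SvIV().

```c
#include <assert.h>
#include <stddef.h>

/* Toy value stack; toy_pop() stands in for popping Perl's real stack. */
typedef struct {
    const long *base;   /* bottom of the stack */
    size_t      top;    /* number of values currently on the stack */
} ToyStack;

static long toy_pop(ToyStack *st) {
    assert(st->top > 0);
    return st->base[--st->top];
}

/* Pop num values and store them in out in their original order.
   The first pop yields the last value, so out is filled from the back. */
static void toy_pop_in_order(ToyStack *st, long *out, size_t num) {
    size_t i;
    for (i = num; i > 0; i--)
        out[i - 1] = toy_pop(st);
}
```

Once the loop finishes, out[0] holds the first value that was returned, so the results can be processed front to back.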

Signaling Errors

There are two ways to return from a Perl subroutine—by using return() or by generating an exception with die(). The interface to die() from the C API is croak(), which supports a printf()-style interface for generating exception strings:

 croak("Big trouble, Indy, the %s has got the %s!", villain, bauble);

A warning, provided by the warn() function in Perl, can be produced using the warn() API function:

 warn("Ouch, don't touch that, %s!", SvPVX(name_sv));

This is an unconditional warning—it will always generate output no matter what the state of the warnings pragma. If you want to check whether warnings are on or not, you can use the isLEXWARN_on macro:^[7]

 if (isLEXWARN_on) warn("You use warnings; good for you!");

You can also test if a particular category of warnings is on using the ckWARN macro:^[8]

 if (ckWARN(WARN_DEPRECATED)) warn("Use of this function is deprecated.");

The constants used by this macro are listed in the warnings.h file in the Perl source.

System Wrappers

The Perl API provides a number of wrappers around common system functions. These allow you to do things like dynamically allocate memory and perform IO operations without needing to worry about the underlying platform details. Using these wrappers instead of calling the functions directly will improve the portability of your code.

Memory Allocation

Perl provides a wrapper around malloc() called New(). For example, the following allocates a buffer of 1024 bytes:

 char *buffer = NULL;
 New(0, buffer, 1024, char);
 if (buffer == NULL) croak("Memory allocation failed!");

The first argument to all of these functions is an ID used to track memory allocations during debugging; for most purposes it can be left at 0. The second parameter is a pointer variable to receive the newly allocated pointer. The third is the number of items to be allocated. Finally, the last argument is the type of object to be allocated, in this case a char.

New() allocates memory without initializing it. Use Newz() to allocate a zero-filled buffer:

 char *zbuffer = NULL;
 Newz(0, zbuffer, 1024, char);
 if (zbuffer == NULL) croak("Memory allocation failed!");

To access realloc(), use Renew:

 Renew(buffer, 2048, char); // increase buffer to 2048 bytes

The Perl API interface to free() is called Safefree():

 Safefree(buffer); // free buffer

You must call Safefree() on every pointer allocated with New() or Newz() to avoid leaking memory. No garbage collection is performed on buffers allocated with these functions.
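
Because these calls take an item count and a type rather than a byte count, it can help to see how such a macro might compute the size. The following is a rough sketch in plain C. The TOY_ macros are hypothetical look-alikes built on malloc(), calloc(), realloc(), and free(); they are not Perl's definitions, and they omit the debugging ID argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy look-alikes, NOT Perl's definitions: each takes an item count and a
   type, and works out the byte count itself. */
#define TOY_NEW(ptr, n, type)   ((ptr) = (type *) malloc((n) * sizeof(type)))
#define TOY_NEWZ(ptr, n, type)  ((ptr) = (type *) calloc((n), sizeof(type)))
#define TOY_RENEW(ptr, n, type) ((ptr) = (type *) realloc((ptr), (n) * sizeof(type)))
#define TOY_FREE(ptr)           free(ptr)

/* Build an array of n zeros, grow it to 2 * n items, and set the new upper
   half to 1. Returns NULL if an allocation fails. */
static int *toy_grow_demo(size_t n) {
    int *items = NULL;
    int *bigger;
    size_t i;

    assert(n > 0);
    TOY_NEWZ(items, n, int);           /* n ints, that is n * sizeof(int) bytes */
    if (items == NULL) return NULL;

    bigger = items;
    TOY_RENEW(bigger, 2 * n, int);     /* existing contents are preserved */
    if (bigger == NULL) { TOY_FREE(items); return NULL; }

    for (i = n; i < 2 * n; i++) bigger[i] = 1;
    return bigger;                     /* the caller must TOY_FREE() this */
}
```

Asking for four items of type int reserves 4 * sizeof(int) bytes, not 4 bytes. As with the real functions, whoever receives the pointer is responsible for freeing it.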

Also provided are wrappers around memcpy() (Copy()) and memmove() (Move()) that can be used to efficiently copy sections of memory. For example:

 Copy(src, dst, 1024, char);      // copy 1024 bytes from src to dst
 Move(buf, buf + 10, 1024, char); // copy 1024 bytes down 10 bytes in buf
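
The Move() call works within a single buffer, so its source and destination overlap. That is the case memmove() handles and memcpy() does not. Here is a small sketch of the same kind of shift in standard C; shift_up() is a hypothetical helper written for this illustration and is not part of the Perl API.

```c
#include <assert.h>
#include <string.h>

/* Shift the first len bytes of buf up by shift positions, in place.
   Source and destination overlap, so memmove() is required; memcpy() has
   undefined behavior on overlapping regions. buf must have room for at
   least len + shift bytes. */
static void shift_up(char *buf, size_t len, size_t shift) {
    assert(buf != NULL);
    memmove(buf + shift, buf, len);
}
```

Calling shift_up(buf, 6, 2) on a 16-byte buffer that starts with "abcdef" leaves it starting with "ababcdef": the six original bytes now begin at offset 2, and the first two bytes are unchanged.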

IO Operations

The Perl API defines wrappers around most of the stdio functions you know and love. These wrappers all begin with PerlIO_ and are usually followed by the name of the stdio function they wrap. The perlapio documentation included with Perl contains a complete listing of the available functions and their signatures. Be aware that in some cases the Perl developers have "fixed" parameter ordering and function names while wrapping stdio functions.

The principal difference between stdio's functions and the PerlIO set is that stdio uses the FILE* type to represent file handles, whereas the Perl API uses PerlIO*. For example, to open a log file and print a line to it, you might use code like this:

 PerlIO *fh = PerlIO_open("/tmp/my.log", "a"); // open logfile in append mode
 if (!fh) croak("Unable to open /tmp/my.log"); // check that open succeeded
 PerlIO_printf(fh, "Log test...\n");           // print a test line
 PerlIO_close(fh);                             // close the file

Notice that PerlIO_printf() is actually a wrapper around fprintf(). For printf() functionality, use PerlIO_stdoutf():

 PerlIO_stdoutf("Hello world!\n");

The Perl API also provides functions for interfacing the PerlIO model with existing stdio systems. For example, you can translate between PerlIO* and FILE* file handles with the PerlIO_importFILE and PerlIO_exportFILE functions:

 PerlIO *pfh;
 FILE   *fh;
 pfh = PerlIO_open("some.file", "r");  // open a PerlIO handle
 fh  = PerlIO_exportFILE(pfh, 0);      // export to a FILE *
 // ... do some stdio work with fh
 PerlIO_releaseFILE(pfh, fh);          // release fh mapping to pfh
 PerlIO_close(pfh);

The Perl IO API is currently under development, and it is expected that in the near future Perl will cut its ties to stdio entirely. At that point, only C modules that use the PerlIO interface will work correctly. As such, it is important to get used to using PerlIO now.

^[3]Boyer-Moore is a search algorithm that matches without examining every character. It has the unusual feature of actually going faster the longer the match string is.

^[4]Note that fbm_compile() modifies the SV passed to it. As a result, it can't be used on constant SVs like those produced from string constants.

^[5]A macro that returns a pointer to the end of the string inside an SV

^[6]A piece of code is said to leak memory when it fails to deallocate memory that is no longer being used. The classic test for memory leaks is to run a piece of code inside a loop and watch to see if the memory used by the program grows over time.

^[7]Which isn't listed in perlapi and as such may change without notice

^[8]Also not listed in perlapi—beware!