PerlNET Component Builder | Programming in the .NET Environment

We presented the Perl for .NET Research compiler to the public during the Microsoft Professional Developers Conference (PDC) in summer 2000 and a week later at the O'Reilly Open Source conference.

Some of the feedback we got included the following:

What is the point of modifying Perl in such a way that normal Perl code won't run on Perl for .NET and Perl for .NET code (using typed variables) won't run on "normal" Perl? Aren't you just creating a new syntax for C#?
Compiled Perl code should run faster than interpreted code, not slower!
If I can't use my existing Perl code with Perl for .NET, then this may just as well be a different language. Without access to CPAN modules, this technology doesn't work for migrating any of our existing code.

At the O'Reilly conference, it was also announced that work on Perl 6 would start soon. Perl 6 will be a complete redesign from the ground up, making it easier to extend the language and to target different execution environments. Ideally, this redesign should make it easier to produce an efficient compiler for Perl 6 to .NET conversions.

As it will obviously take several years to implement a complete redesign, we decided to try a different approach to bring Perl 5 to the .NET runtime environment. The goal was now full syntax compatibility, full support for CPAN modules, and normal execution speed. To achieve these goals, we had to sacrifice generation of verifiable managed code.

Interface with Standard Perl Interpreter

The new approach runs all Perl code using the normal Perl interpreter in unmanaged code outside .NET. The PerlNET component builder reads the interface specification inside the Perl comments described earlier and generates managed .NET proxy objects for the Perl code. These proxies are bona fide .NET classes, carrying all the .NET metadata about the component.

The class constructor of the proxy is responsible for compiling the corresponding Perl code when the first instance of a PerlNET component is created. Each proxy method transports all parameters over to the unmanaged Perl stack and dispatches the corresponding Perl method. Upon return of the Perl method, the proxy extracts the return value from the Perl stack and converts it into the correct .NET type.
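The compile-once behavior can be sketched in C# with a static constructor. This is only an illustration under stated assumptions: `FakePerlInterpreter` and `CalculatorProxy` are hypothetical names, and the interpreter is a managed stub rather than the real unmanaged Perl runtime.

```csharp
using System;

// Hypothetical stand-in for the unmanaged Perl interpreter; it only counts
// how often source code is compiled so the one-time behavior is visible.
public static class FakePerlInterpreter
{
    public static int CompileCount { get; private set; }

    public static void Compile(string source)
    {
        CompileCount++;
    }

    // Pretends to dispatch a Perl method; here it just adds two integers.
    public static object Call(string method, params object[] args)
    {
        if (method == "add")
            return (int)args[0] + (int)args[1];
        throw new MissingMethodException("Calculator", method);
    }
}

// Hypothetical proxy class of the kind a component builder might generate.
public class CalculatorProxy
{
    // The class (static) constructor runs once, before the first instance
    // is created, so the Perl source is compiled exactly one time.
    static CalculatorProxy()
    {
        FakePerlInterpreter.Compile("sub add { $_[1] + $_[2] }");
    }

    // Proxy method: pass the arguments over, dispatch, convert the result.
    public int Add(int x, int y)
    {
        object result = FakePerlInterpreter.Call("add", x, y);
        return Convert.ToInt32(result);
    }
}
```

Creating several `CalculatorProxy` instances leaves `FakePerlInterpreter.CompileCount` at 1, which mirrors the "compile on first instantiation" rule described above.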

The PerlNET approach avoids the disadvantages of the Research compiler: All Perl code runs at normal speed, the syntax is 100% compatible with "normal" Perl, and it supports all the same extensions.

However, there are some new disadvantages as well. PerlNET assemblies are heavyweight: each assembly creates its own instance of the unmanaged Perl interpreter (although all instances of all classes within a single PerlNET assembly share that interpreter). Moving data between Perl and other .NET languages is also more expensive, because everything has to be marshaled back and forth via PInvoke or COM Interop.

Challenges

The PerlNET approach presents a number of interesting challenges.

Efficient Data Marshaling

How can data be passed back and forth with minimal overhead? How can we make sure that a managed object is not garbage collected when it is being passed to unmanaged code and no managed reference remains?

PerlNET uses PInvoke to transfer control from managed to unmanaged code. Passing value types is efficient: The bit pattern is copied verbatim.

Reference types are marshaled via the System.Runtime.InteropServices.GCHandle type (as System.IntPtr integers). On the Perl side, they are held in a .NET wrapper class. This wrapper manages the lifetime of the GCHandle. Once the reference count of the wrapper goes to zero, it calls back into .NET to release the GCHandle.
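The GCHandle round trip can be sketched as follows. The class and method names are hypothetical, and the sketch uses `GCHandle.ToIntPtr` and `GCHandle.FromIntPtr` from current versions of .NET; it is not the PerlNET source.

```csharp
using System;
using System.Runtime.InteropServices;

public static class HandleMarshaling
{
    // Keep a managed object alive and return an integer-sized token that
    // can be handed to unmanaged code.
    public static IntPtr ToUnmanagedToken(object target)
    {
        // A normal (non-pinned) handle: the object cannot be collected
        // until the handle is freed, even if no managed reference remains.
        GCHandle handle = GCHandle.Alloc(target);
        return GCHandle.ToIntPtr(handle);
    }

    // Recover the managed object from the token.
    public static object FromUnmanagedToken(IntPtr token)
    {
        return GCHandle.FromIntPtr(token).Target;
    }

    // Intended to be called when the unmanaged side drops its last
    // reference; after this the token must not be used again.
    public static void Release(IntPtr token)
    {
        GCHandle.FromIntPtr(token).Free();
    }
}
```

In this sketch, `Release` plays the role of the callback that the Perl-side wrapper makes when its reference count reaches zero.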

In the other direction, PerlNET uses a custom COM interface and COM Interop to call back from Perl into the .NET proxy. Perl variables are always marshaled by address, as integers. A managed wrapper class knows the Perl data structure layout and can extract values if and when needed. If the Perl variable is just passed back to another Perl method, the wrapper only puts the address back on the Perl stack and increments the reference count. Because PerlNET already interfaces with unmanaged code, the generated code is no longer verifiable anyway, so it can manipulate Perl structures directly by using "unsafe" managed code.

This approach also has advantages when the Perl variable is being used on the managed side. For example, to translate a Perl variable containing a string into a System.String object, the proxy extracts the string address out of the Perl variable and then calls the following System.String constructor:

 unsafe public String(sbyte* value, int startIndex, int length, Encoding enc);

Even a Unicode string (stored by Perl in UTF-8) can be directly transformed into a .NET String object without any intermediate copies.
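A minimal sketch of calling that constructor is shown below. The managed `byte[]` only stands in for the string buffer that a Perl variable would own, `Utf8StringDemo` is a hypothetical name, and the code must be compiled with unsafe code enabled.

```csharp
using System;
using System.Text;

public static class Utf8StringDemo
{
    // Build a System.String directly from a UTF-8 buffer. The byte array
    // stands in for the string buffer owned by a Perl variable.
    public static unsafe string FromUtf8Buffer(byte[] buffer)
    {
        // Fixing an empty array yields a null pointer, so handle it here.
        if (buffer.Length == 0)
            return String.Empty;

        fixed (byte* p = buffer)
        {
            // The constructor decodes the bytes straight into the new
            // string; no temporary managed array is created in between.
            return new String((sbyte*)p, 0, buffer.Length, Encoding.UTF8);
        }
    }
}
```

With a real Perl variable, the pointer would come from the unmanaged string buffer itself, so the `fixed` statement would not be needed.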

Method Overloading

As shown earlier, Perl doesn't have the concept of method overloading. For overloaded PerlNET methods, we create the .NET methods with the correct signatures, but all proxies will call the same Perl method. In most cases, the Perl method doesn't even need to differentiate based on argument type, as all variables are typeless. The code for foo(int x) and the code for foo(short x) are normally the same. In case the method does need to discriminate based on parameter type, the System.Reflection mechanism can be used to determine actual types.
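The forwarding pattern might look roughly like this in C#. `FooProxy` and `FooImpl` are hypothetical; `FooImpl` is a managed stand-in for the single Perl method, and it calls `GetType`, the entry point to reflection, only to show how the actual argument type could be discovered.

```csharp
using System;

public class FooProxy
{
    // Both overloads carry distinct .NET signatures but forward to the
    // same untyped implementation, standing in for a single Perl method.
    public string Foo(int x)   { return FooImpl(x); }
    public string Foo(short x) { return FooImpl(x); }

    // Hypothetical stand-in for the Perl method: the argument arrives
    // boxed, and its runtime type is inspected only if it matters.
    private static string FooImpl(object x)
    {
        return x.GetType().FullName + ":" + Convert.ToInt64(x);
    }
}
```

Calling `Foo(7)` reports `System.Int32`, while `Foo((short)7)` reports `System.Int16`, even though both run the same body.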

In the other direction, calling .NET methods from Perl, PerlNET uses reflection to determine the correct method to call. Currently, the argument types must match one of the overloaded signatures; otherwise, a runtime exception is thrown. We have been discussing the implementation of a multimethod dispatch mechanism, but in practice the need for it has not been proved.
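Reflection-based overload selection can be sketched as follows. This is not the PerlNET dispatcher: `ReflectionDispatch` is a hypothetical helper, it assumes that no argument is null, and it relies on the matching rules of the default binder behind `Type.GetMethod`.

```csharp
using System;
using System.Reflection;

public static class ReflectionDispatch
{
    // Pick an overload of methodName on target from the runtime types of
    // the arguments, then invoke it. Arguments are assumed to be non-null.
    public static object Invoke(object target, string methodName, params object[] args)
    {
        Type[] argTypes = new Type[args.Length];
        for (int i = 0; i < args.Length; i++)
            argTypes[i] = args[i].GetType();

        // GetMethod returns null when no public overload fits the types.
        MethodInfo method = target.GetType().GetMethod(methodName, argTypes);
        if (method == null)
            throw new MissingMethodException(target.GetType().FullName, methodName);

        return method.Invoke(target, args);
    }
}
```

For example, `ReflectionDispatch.Invoke("hello", "Substring", 1, 3)` selects `String.Substring(int, int)`, whereas passing a string argument to `Substring` finds no matching overload and throws.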

Exception Handling

Perl not only keeps the execution state on a primary parameter/return stack, but also maintains several additional status/context stacks. One example is the scope stack, which registers cleanup/restore actions to be performed at scope exit. PerlNET needs to make sure that .NET exceptions will not unwind the unmanaged stack without also unwinding the additional "private" stacks of the Perl interpreter.

For this reason, PerlNET will always set up an exception handler before calling back into .NET code. If the .NET code throws an exception, PerlNET catches it and rethrows it as a Perl exception. This approach ensures that all Perl stack maintenance is performed properly.

PerlNET also sets up a Perl exception handler whenever the proxy calls from managed code into Perl. The Perl-level exception is then converted into a .NET exception and rethrown by the proxy code.

Thus, even without user-level exception handlers, an exception is bounced back and forth between managed and unmanaged code, making sure that all cleanup happens in the right sequence. The only downside of this mechanism is that the StackTrace property of the exception doesn't show the full back trace, but only the trace from the last point where the exception was rethrown.
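The catch-and-rethrow step, and its effect on the stack trace, can be illustrated with a managed-only sketch. `PerlException` and `ExceptionBridge` are hypothetical names, and whether PerlNET keeps the original exception as an inner exception is an assumption made here for illustration, not something stated above.

```csharp
using System;

// Hypothetical exception type representing an error raised by Perl code.
public class PerlException : Exception
{
    public PerlException(string message, Exception inner) : base(message, inner) { }
}

public static class ExceptionBridge
{
    // Run a callback and translate any exception it throws. The new
    // exception's StackTrace starts at the throw below; the original
    // trace stays reachable only through InnerException.
    public static void CallAcrossBoundary(Action callback)
    {
        try
        {
            callback();
        }
        catch (Exception ex)
        {
            throw new PerlException("translated: " + ex.Message, ex);
        }
    }
}
```

A caller that catches `PerlException` sees a trace beginning inside `CallAcrossBoundary`, which is the shortened back trace described above.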