Perl for .NET Research Compiler | Programming in the .NET Environment

The first approach was to create a full Perl compiler, generating verifiable .NET Intermediate Language (IL) code and supporting all features of the Common Language Specification (CLS).

The implementation phase of this project started in early 2000. The major parts of the project were the parser, the code generator, and the runtime support library.

The Parser

Perl syntax is especially difficult to parse because it has no context-free grammar. Parts of already-compiled code may be executed at compile time, altering how the remaining code is parsed: both the use statement and BEGIN blocks are executed as soon as they are completely parsed.
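The compile-time execution described above can be observed in a few lines of standard Perl. This is a minimal sketch; the @order array and the strings pushed onto it are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

our @order;                           # declared at compile time, visible to BEGIN below

push @order, "run time";              # executes only after the whole file is compiled

BEGIN { push @order, "BEGIN block" }  # executes as soon as its closing brace is parsed

print join(" -> ", @order), "\n";     # prints "BEGIN block -> run time"
```

Although the BEGIN block appears later in the file, its entry is recorded first, because it ran while the rest of the file was still being parsed.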

As parentheses on function calls are optional in Perl, the function's prototype determines the interpretation of the following tokens. A famous example by Randal Schwartz shows this:

time /3 ;#/; print "hello";
sin /3 ;#/; print "goodbye";

The first line takes the output of the time() function and divides it by 3. The # sign starts a comment and the print statement is ignored. The second line computes the sine of the result of a pattern match: sin(/3 ;#/). The print statement is executed normally. Both time() and sin() are built-in functions, so their prototypes are known at compile time. But for user-defined functions, the prototypes cannot in general be determined without actually executing the code in BEGIN blocks and use statements. We therefore decided to use the existing Perl interpreter itself to parse Perl.
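The same ambiguity can be reproduced with user-defined functions. The following sketch assumes two hypothetical helpers, no_args and any_args, which are not part of the original example: one is declared with an empty prototype and one without any prototype. The token after each call is a slash in both cases, yet it is parsed differently.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; neither appears in the original text.
sub no_args() { return 12 }          # empty prototype: takes no arguments, like time()
sub any_args  { return scalar @_ }   # no prototype: takes a list of arguments

$_ = "abc";                          # default target for the pattern match below

my $quotient = no_args / 3;          # slash parsed as the division operator
my $count    = any_args /b/;         # slash starts a pattern match, passed as an argument

print "quotient=$quotient count=$count\n";   # prints "quotient=4 count=1"
```

If the two sub definitions were produced by a BEGIN block or by a module loaded with use, a parser could not know which interpretation applies without running that code first.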

The Code Generator

The standard Perl distribution includes the B module that provides access to the opcode tree and the B::CC module that turns the opcode tree into C source code. The generated source code still needs to be linked to the Perl interpreter library, which contains the implementation for all the Perl opcodes.

We used the B::CC module as the basis for generating .NET code. The initial plan was to generate IL directly. But in early 2000, the Reflection/Emit API was still under heavy development. After learning that the Python for .NET compiler had to be modified for each biweekly drop of .NET beta code, we decided to generate C# code as our intermediate language. That way, tracking the changes to the Reflection/Emit API was punted to the Microsoft code and we could concentrate on the Perl-specific challenges of the code generator.

Interface Specification

Perl is an untyped language. Methods don't have prototypes, and even the number of arguments is in general not known until runtime: All arguments are passed in via the @_ array. This makes it impossible to generate a strongly typed interface for a Perl class automatically. We introduced a comment convention that allowed users to annotate their Perl source code with a C#-like interface declaration (see Listing D.1).

Listing D.1

=for interface
    int MyMethod(str Arg1);
    int MyMethod(str Arg1, str Arg2);
=cut
sub MyMethod {
    my($arg1,$arg2) = @_;
    $arg2 = "default" unless defined $arg2;
    # ...
}

As the example in Listing D.1 demonstrates, it is possible to create multiple .NET signatures for the same Perl method. We would create both an overloaded "outer" .NET MyMethod() method and an "inner" MyMethod() implementation to which all the outer methods would delegate. Unfortunately, the .NET debugger exposes this implementation detail.
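On the Perl side, both declared signatures arrive at the same sub, and the number of arguments only becomes visible by inspecting @_ at run time. The sketch below reuses the shape of the method in Listing D.1; the $argc variable and the returned string are hypothetical additions that make the effect of each call visible.

```perl
use strict;
use warnings;

# Same shape as the method in Listing D.1; $argc and the return value are
# hypothetical additions for illustration.
sub MyMethod {
    my $argc = scalar @_;                      # argument count is only known at run time
    my ($arg1, $arg2) = @_;
    $arg2 = "default" unless defined $arg2;    # fill in the optional second argument
    return "$argc: $arg1, $arg2";
}

print MyMethod("a"), "\n";        # one-argument signature: prints "1: a, default"
print MyMethod("a", "b"), "\n";   # two-argument signature: prints "2: a, b"
```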

Typed Variables

Perl already has some minimal support for declaration of typed variables. It is used so far only for the experimental "pseudo-hashes." We used this language feature to support native .NET types (see Listing D.2).

Listing D.2

use namespace 'System.Text';
my StringBuilder $str = StringBuilder("Hello");
my Int32 $len = $str->{Length};

In the example in Listing D.2, the code generator would create variables of the corresponding .NET type and not the normal Perl scalars. Primitive operations on value types would also be generated directly without calling out to the Perl runtime library.

Using typed variables produces much more efficient code, but the code is no longer compatible with "normal" Perl. Assigning an object of a different type to a typed variable throws an exception.
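For comparison, the sketch below shows how the same declaration syntax behaves in the standard Perl interpreter, where the type name is accepted but not enforced. The Counter class is a hypothetical example and has nothing to do with .NET; under Perl for .NET, as described above, the second assignment would be expected to throw an exception instead.

```perl
use strict;
use warnings;

# Hypothetical class, used only to have a type name to declare against.
{
    package Counter;
    sub new { my ($class) = @_; return bless { count => 0 }, $class }
}

my Counter $c = Counter->new;    # standard Perl accepts the type annotation
my $before = ref $c;             # "Counter"

$c = "just a string";            # standard Perl does not enforce the declared type
my $after = ref $c;              # "" (a plain scalar; no exception is raised)

print "before=$before after='$after'\n";   # prints "before=Counter after=''"
```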

The Runtime Library

The runtime library implements the Perl-specific data types (Perl scalars, arrays, hashes, and objects) as well as the internal bookkeeping for the Perl execution environment. This library was implemented in C#.

Status

The Perl for .NET Research compiler is not a full implementation of the Perl programming language. It does support the following features:

Instantiation of .NET objects
Accessing of .NET methods and properties
Implementation of .NET classes
Typed Perl variables
Recursive function calls with local variables
.NET Platform Invoke (PInvoke) support
Cross-language inheritance (Perl can be used for both base classes and derived classes.)
.NET exception handling

Sample code for these features is available on the Web at http://www.activestate.com/Corporate/Initiatives/NET/Samples/.

Problems

The implementation of the Perl for .NET Research compiler exposed a number of problems with this approach. The most significant related to execution speed, full language support, and compatibility with existing extensions.

Execution Speed

The generated code is surprisingly slow (more than 10 times slower than the normal Perl interpreter). Part of the problem is that both the generated code and the runtime library are all verifiable managed code. Perl often already "knows" the actual type of a Perl scalar object and accesses its internals directly. In the .NET environment, we always have to "prove" to the runtime system that the object is, indeed, of the correct type (by a runtime cast operation) before being allowed access to it. The accessor then uses a virtual method call to modify the internal state of the object.

In the traditional Perl interpreter, these operations are just two pointer indirections away. The structures have been laid out carefully, so that field offsets are always the same, independent of the scalar "type." These internal representations have been optimized over many years; the .NET implementation could definitely be improved a lot, too. But even when using typed variables, the code ran more slowly than standard Perl did.

Full Language Support

Another problem area is compatibility between "normal" Perl and Perl for .NET. Some features would just need a lot of additional implementation work, such as regular expressions. But other features are virtually impossible to implement with this approach, for example the string form of the eval statement and the runtime require.

Both features need to reenter the parser and code generator at runtime, which is already difficult, but not impossible with the current design. They would also need to have access to the current opcode tree, because new code would have to be compiled in the correct lexical context, providing access to lexical variables in outer scopes and potentially creating new closures. The opcode tree, however, is no longer available.
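The dependence on the lexical context is easy to see in standard Perl. In this minimal sketch (the names $counter, $source, and $closure are hypothetical), code that exists only as a string at run time is compiled inside the current scope, sees an outer lexical variable, and produces a new closure over it.

```perl
use strict;
use warnings;

my $counter = 10;                             # lexical variable in the enclosing scope

# Source text that, in a real program, might only be known at run time.
my $source  = 'sub { return ++$counter }';

my $closure = eval $source;                   # compiled now, inside the current lexical scope
die "eval failed: $@" unless $closure;

$closure->();                                 # each call increments the outer $counter
$closure->();

print "counter=$counter\n";                   # prints "counter=12"
```

A compiler that has already discarded the opcode tree has no record of where $counter lives, which is why this case is so hard to support with ahead-of-time code generation.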

Compatibility with Existing Extensions

Perl's utility lies not only in the language itself, but also in the vast number of readily available modules from CPAN. As noted earlier, more than 2000 modules are available for all the common programming tasks developers face every day. Many of these modules are not written in plain Perl but also contain some low-level glue code in XS (a C preprocessor-like extension language). This XS code makes assumptions about the internals of the Perl interpreter, such as the layout of the data structures. For this reason, XS extension modules need to be recompiled for different versions of the Perl interpreter.

All of these XS extensions are unavailable to the Perl for .NET Research compiler: XS code is compiled to C, which will not compile to managed .NET code. The internals of the Perl for .NET runtime environment are so significantly different from the "normal" Perl interpreter as to make automatic translation impossible. Furthermore, the XS mechanism exposes all internal Perl interpreter functions, not just the documented XS extension API. A lot of the modules on CPAN do use additional APIs beyond the documented set. Emulating them all in a Perl for .NET API would be impossible.