Chapter 7. The Intermediate Code Compiler
The Intermediate Code Compiler (IMCC) is an alternative tool for creating and running Parrot bytecode. It has several advantages over the method introduced in the previous chapter. It's a Parrot assembler that also embeds the Parrot runtime engine, so it can compile a PASM file to bytecode and immediately run the bytecode with a single command. IMCC can also perform code optimizations, though it doesn't by default. IMCC includes its own language, commonly called Parrot Intermediate Language (PIR). PIR is an overlay on top of Parrot assembly language and has many higher-level features, though it still isn't a high-level language. Assembly files containing PIR code end with an .imc extension.
7.1 Getting Started
The first step to working with IMCC is to compile it. First, build Parrot following the steps in the previous chapter. Then, from within the languages/imcc directory in the parrot repository, type:
$ make
$ make test
It's likely that by the time you read this, you won't have to compile IMCC at all. One of the planned
After compiling IMCC, create a file fjords.pasm in the languages/imcc directory with these two lines (or reuse the file from Chapter 6):
print "He's pining for the fjords.\n"
end
IMCC compiles and runs the code in a single step:
$ ./imcc fjords.pasm
It's a little more
If your system supports soft links, you might find it handy to have a
$ imcc example.imc
or with the -t option to trace the code as it executes:
$ imcc -t example.imc
7.2 Basics
IMCC's main purpose is assembling PASM or PIR source files. It can run them immediately or generate a Parrot bytecode file for running later. Internally, IMCC works a little differently with PASM and PIR source code, so each has different restrictions. The default is to run in a "mixed" mode that allows PASM code to mix with the higher-level syntax unique to PIR.
A file with a .pasm extension is treated as pure PASM code, as is any file run with the -a command-line option. These files can use macros,[1] but none of PIR's syntax. This mode is
The documentation that comes with IMCC in languages/imcc/docs/ and the test suite in languages/imcc/t are good starting points for digging deeper into its syntax and functionality.
7.2.1 Statements
The syntax of statements in PIR is much more flexible than that of PASM. All PASM opcodes are valid PIR code, so the basic syntax is still an opcode followed by its arguments:
print "He's pining for the fjords.\n"
The statement
LABEL: print I1
But unlike PASM, PIR has some higher-level constructs, including symbol operators:
I1 = 5
named
count = 5
and complex statements built from multiple keywords and symbol operators:
if I1 <= 5 goto LABEL
We'll get into these in more detail as we go.
7.2.2 Comments
Comments are
I1 = 5      # assign '5'
7.2.3 Variables and Constants
Constants in PIR are the same as constants in PASM. Integers and floating-point
print 42         # integer constant
print 0x2A       # hexadecimal integer
print 0b1101     # binary integer
print 3.14159    # floating point constant
print 1.e6       # scientific notation
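To double-check which values those spellings denote, here is a quick Python sketch. Python happens to accept the same literal forms, so this only confirms the arithmetic; it says nothing about how Parrot formats the numbers when it prints them.

```python
# Python spells these literals the same way as the PIR examples above,
# so evaluating them shows the value each spelling stands for.
constants = {
    "42": 42,            # decimal integer
    "0x2A": 0x2A,        # hexadecimal integer
    "0b1101": 0b1101,    # binary integer
    "3.14159": 3.14159,  # floating point
    "1.e6": 1.e6,        # scientific notation
}

for spelling, value in constants.items():
    print(f"{spelling:>8} -> {value!r}")
```

Note that 0x2A is simply another way of writing 42, and 0b1101 is 13.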
Strings are
print "fjord"
These can use the standard escape sequences, like \t (tab), \n (newline), \r (return), \f (form feed), \\ (literal backslash), \" (literal double quote), etc. The one difference from PASM strings is that in PIR strings the null character must be escaped as \x00:
print "Binary\x00nul embedded"
7.2.3.1 PASM registers
PIR code has a variety of ways to store values while you work with them. The most basic way is to use Parrot registers directly. Parrot register
set S0, "Hello, Polly.\n"
print S0
end
This example is plain PASM syntax, but you can also use PASM registers in PIR code.
When you work directly with PASM registers, you can only have 32 registers of any one type at a time.[2] If you have more than that, you have to start shuffling stored values on and off the
7.2.3.2 Temporary registers
IMCC provides an easier way to work with Parrot registers. The temporary register variables are named like the PASM registers (with a single character for the type of register and a number), but they start with a $ character:
set $S42, "Hello, Polly.\n"
print $S42
end
The most obvious difference between PASM registers and temporary register variables is that you have an unlimited number of temporaries. IMCC handles register allocation for you. It keeps track of how long a value in a Parrot register is needed and when that register can be reused.
The previous example used the $S42 temporary. When the code is compiled, that temporary is allocated to the Parrot register S0. As long as that temporary is needed, it is stored in S0. When it's no longer needed, S0 is reallocated to some other value:
$S42 = "Hello, "
print $S42
$S43 = "Polly.\n"
print $S43
end
This example uses two temporary string registers. Since their lifetimes don't overlap, both are allocated to the S0 register. If you change the order a little so both temporaries are needed at the same time, they're allocated to different registers:
$S42 = "Hello, "     # allocated to S1
$S43 = "Polly.\n"    # allocated to S0
print $S42
print $S43
end
In this case, $S42 is allocated to S1 and $S43 is allocated to S0.
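The idea of reusing a register once a temporary is no longer needed can be sketched as a toy Python model. This is not IMCC's allocator. It assumes a straight-line program written as hypothetical (opcode, operands) tuples, computes each temporary's live range from its first to its last mention, and greedily hands out the lowest free S register, with no spilling. It reproduces the sharing in the first example and the separation in the second, although it numbers the overlapping pair S0 and S1 rather than IMCC's S1 and S0.

```python
def live_ranges(program):
    """Return {temporary: (first_index, last_index)} for every $-named operand."""
    ranges = {}
    for index, (_opcode, operands) in enumerate(program):
        for name in operands:
            if name.startswith("$"):
                first, _ = ranges.get(name, (index, index))
                ranges[name] = (first, index)
    return ranges


def allocate(program, num_registers=32):
    """Greedily map temporaries to S registers, reusing a register once the
    temporary holding it is dead. This sketch does no spilling."""
    ranges = live_ranges(program)
    free = list(range(num_registers))
    active = []          # (last_index, register) for temporaries still live
    assignment = {}
    for name, (first, last) in sorted(ranges.items(), key=lambda item: item[1][0]):
        # Release registers whose temporaries died before this one starts.
        for entry in [e for e in active if e[0] < first]:
            active.remove(entry)
            free.append(entry[1])
        free.sort()
        register = free.pop(0)   # IndexError if every register is busy
        assignment[name] = f"S{register}"
        active.append((last, register))
    return assignment


sequential = [
    ("set", ["$S42", '"Hello, "']), ("print", ["$S42"]),
    ("set", ["$S43", '"Polly.\\n"']), ("print", ["$S43"]),
    ("end", []),
]
overlapping = [
    ("set", ["$S42", '"Hello, "']), ("set", ["$S43", '"Polly.\\n"']),
    ("print", ["$S42"]), ("print", ["$S43"]),
    ("end", []),
]
print(allocate(sequential))    # {'$S42': 'S0', '$S43': 'S0'}
print(allocate(overlapping))   # {'$S42': 'S0', '$S43': 'S1'}
```

Real allocators also have to decide what to do when every register is live at once; the sketch simply fails at that point instead of spilling.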
IMCC
If you want to peek behind the curtain and see how IMCC is allocating registers, you can run it with the -d switch to
$ imcc -d1000 hello.imc
If hello.imc is the first example above, it produces this output:
code_size(ops) 11  oldsize 0
0 set_s_sc 0 1     set S0, "Hello, "
3 print_s 0        print S0
5 set_s_sc 0 0     set S0, "Polly.\n"
8 print_s 0        print S0
10 end             end
Hello, Polly.
That's probably a lot more information than you wanted if you're just starting out. You can also generate a PASM file with the -o switch and have a look at how the PIR code
$ imcc -o hello.pasm hello.imc
You'll find more details on these options and many others in Section 7.5 later in this chapter.
7.2.3.3 Named variables
Named variables can be used
.local string hello
set hello, "Hello, Polly.\n"
print hello
end
This example defines a string variable named hello,
The valid types are string, int, float, and any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. IMCC allocates named variables to Parrot registers the same way it allocates temporary register variables.
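The correspondence between declared types and Parrot's four register sets, I (integer), N (floating-point number), S (string), and P (PMC), can be written down as a small Python lookup. It is only an illustration: the helper name register_kind is hypothetical, and it assumes any type name other than int, float, or string is a Parrot class, without checking that such a class exists.

```python
# Hypothetical helper: which Parrot register set a .local of each type uses.
REGISTER_KIND = {
    "int": "I",      # integer registers I0..I31
    "float": "N",    # floating-point (number) registers N0..N31
    "string": "S",   # string registers S0..S31
}


def register_kind(declared_type):
    """Return the register set letter for a .local declaration's type.

    Anything that is not int, float, or string is assumed to be a Parrot
    class name such as PerlString, whose objects live in PMC (P) registers.
    """
    return REGISTER_KIND.get(declared_type, "P")


for declared in ("int", "float", "string", "PerlString"):
    print(declared, "->", register_kind(declared))
```

Note that float maps to the N registers, not to a register set named F.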
The name of a variable must be a valid PIR identifier. It can contain
7.2.3.4 Parrot classes
Any integer, floating-point number, or string can be
P0 = new PerlString     # same as new P0, .PerlString
P0 = "Hello, Polly.\n"
print P0
end
Here, a PerlString object is created with the new CLASSNAME syntax[4] and stored in the PMC register P0.
It gets assigned the string value "Hello, Polly.\n" and then printed. The syntax is exactly the same with temporary register variables:
$P4711 = new PerlString
$P4711 = "Hello, Polly.\n"
print $P4711
end
With named variables, the Parrot class has to be specified both as the type for the .local statement and as the class name for new:
.local PerlString hello
hello = new PerlString
hello = "Hello, Polly.\n"
print hello
end
Another important instruction for working with Parrot classes is clone. A simple assignment of a Parrot class only creates an alias:
.local PerlString hello
hello = new PerlString
hello = "Hello, "
$P0 = hello              # PASM: set P0, P1
$P0 = "Polly.\n"
hello = hello . $P0
print hello
end
This prints:
Polly.
Polly.
In this example, $P0 and hello are really the same PerlString object. When you assign to one, you've assigned to both. To get a true copy, you have to use $P0 = clone hello instead of $P0 = hello, as in this example:
.local PerlString hello
hello = new PerlString
hello = "Hello, "
$P0 = clone hello        # PASM: clone P0, P1
$P0 = "Polly.\n"
hello = hello . $P0
print hello
end
This prints:
Hello, Polly.
7.2.3.5 Named constants
Named constants are declared with a .const statement. It's very similar to .local, and requires a type and a name. The value must be assigned in the declaration statement:
.const string hello = "Hello, Polly.\n"
print hello
end
This example declares a named string constant hello and prints the value. Named constants can be used in all the same places as literal constants, but have to be declared beforehand:
.const int the_answer = 42        # integer constant
.const string mouse = "Mouse"     # string constant
.const float pi = 3.14159         # floating point constant
7.2.3.6 Register spilling
As we mentioned earlier, IMCC allocates Parrot registers for all temporary register variables and named variables. When IMCC runs out of registers to allocate, some of the variables have to be stored elsewhere. This is known as "spilling." IMCC spills the variables with the
set $I1, 1
set $I2, 2
...
set $I33, 33
...
print $I1
print $I2
...
print $I33
If you create 33 integer variables like this, all containing values that are used later, IMCC allocates the available integer registers to variables with a higher score and spills the variables with a lower score. In this example it picks $I1 and $I2. Behind the scenes, IMCC generates code to store the values:
new P31, .PerlArray
...
set I0, 1           # I0 allocated to $I1
set P31[0], I0      # spill $I1
set I0, 2           # I0 reallocated to $I2
set P31[1], I0      # spill $I2
It creates a PerlArray object and stores it in register P31.[5]
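The store-and-fetch pattern in that listing can be mimicked with a toy Python rewriter. It is a sketch, not IMCC's code generator: it assumes each instruction is a hypothetical (opcode, variable, *operands) tuple whose first operand is the only variable, that the set of spilled variables has already been chosen, and that I0 and P31 serve as the scratch register and spill array, as in the listing. A definition of a spilled variable is computed into the scratch register and stored in the variable's array slot; a use fetches it back into the scratch register first.

```python
def rewrite_with_spills(program, spilled, scratch="I0", spill_array="P31"):
    """Rewrite (opcode, variable, *operands) tuples into PASM-like lines,
    routing spilled variables through a scratch register and a spill array.
    Variables that are not spilled are left for the normal allocator."""
    slot = {name: index for index, name in enumerate(spilled)}
    lines = [f"new {spill_array}, .PerlArray"]
    for opcode, variable, *operands in program:
        if variable not in slot:
            lines.append(f"{opcode} {', '.join([variable, *operands])}")
        elif opcode == "set":
            # Definition: compute into the scratch register, then store it.
            lines.append(f"set {scratch}, {', '.join(operands)}")
            lines.append(f"set {spill_array}[{slot[variable]}], {scratch}")
        else:
            # Use: fetch the value back into the scratch register first.
            lines.append(f"set {scratch}, {spill_array}[{slot[variable]}]")
            lines.append(f"{opcode} {', '.join([scratch, *operands])}")
    return lines


program = [
    ("set", "$I1", "1"),
    ("set", "$I2", "2"),
    ("print", "$I1"),
    ("print", "$I2"),
]
print("\n".join(rewrite_with_spills(program, ["$I1", "$I2"])))
```

Choosing which variables to spill (IMCC's scoring) is deliberately left out; the sketch takes that list as given.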
The set instruction is the last time $I1 is used for a while, so immediately after that, IMCC stores its value in the spill array and
Just before $I1 and $I2 are accessed to be printed, IMCC generates code to fetch the values from the spill array:
...
set I0, P31[0]      # fetch $I1
print I0
7.2.4 Symbol Operators
You probably noticed the = assignment operator in some of the earlier examples:
$S2000 = "Hello, Polly.\n"
print $S2000
end
Standing alone, it's the same as the PASM set opcode. In fact, if you run imcc in bytecode debugging mode (as in Section 7.2.3.2), you'll see it really is just a set opcode underneath.
PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. Many of these combine with assignment to produce the equivalent of a PASM opcode:
.local int sum
sum = $I42 + 5
print sum
print "\n"
end
The statement sum = $I42 + 5 translates to add I0, I1, 5. A complete list of operators is available in Section 7.6. We'll discuss the comparison operators in Section 7.3.
7.2.5 Labels
A label names a line of code so other instructions can refer to it. Label names have to be valid PIR identifiers, just like named variables, so they're made of letters, numbers, and underscores. Simple labels are often all caps to make them stand out more clearly. A label definition is simply the name of the label followed by a colon. It can stand on a line by itself:
LABEL:
print "Norwegian Blue\n"
or before a statement on the same line:
LABEL: print "Norwegian Blue\n"
IMCC has both local and global labels. Global labels start with an underscore. The name of a global label has to be unique, since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the compilation unit where it's defined. [6]
The name has to be unique there, but it can be reused in a different compilation unit.
branch L1    # local label
bsr _L2      # global label
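The uniqueness rules for local and global labels can be expressed as a small Python checker. It is a hypothetical illustration, not part of IMCC. It assumes the labels defined in each compilation unit are already collected into a dict keyed by unit name, and it treats a leading underscore as the mark of a global label. It does not count the compilation unit names themselves, even though those are global labels too.

```python
def check_labels(units):
    """units: dict of compilation-unit name -> list of label names defined in it.

    Global labels (leading underscore) must be unique across the whole file;
    local labels only need to be unique within their own compilation unit.
    Returns a list of error messages (empty if everything is fine)."""
    errors = []
    global_owner = {}                 # global label -> unit that defined it
    for unit, labels in units.items():
        local_seen = set()            # local labels are scoped per unit
        for label in labels:
            if label.startswith("_"):
                if label in global_owner:
                    errors.append(
                        f"global label {label} already defined in {global_owner[label]}")
                else:
                    global_owner[label] = unit
            elif label in local_seen:
                errors.append(f"local label {label} defined twice in {unit}")
            else:
                local_seen.add(label)
    return errors


# L1 may be reused in another unit, but _X may not.
print(check_labels({"_main": ["L1", "_L2"], "_other": ["L1"]}))
print(check_labels({"_a": ["_X"], "_b": ["_X"]}))
```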
Labels are most often used in branching instructions and in calculating addresses for
7.2.6 Compilation Units
Compilation units in PIR are
.sub _main
print "Hello, Polly.\n"
end
.end
This example defines a compilation unit named _main that prints a string. The name is actually a global label for this piece of code. If you generate a PASM file from the PIR code (see Section 7.2.3.2), you'll see that the name translates to an ordinary label:
_main:
print "Hello, Polly.\n"
end
The compilation units in a file and the code outside of compilation units are parsed and
The first compilation unit in a file is special. The convention is to call it _main, but the name isn't critical. Since it's
Any statements outside a compilation unit are emitted after all the compilation units. Generally this means such code is
print "Polly want a cracker?\n"
.sub _main
print "Hello, Polly.\n"
end
.end
This code prints out "Hello, Polly." but not "Polly want a cracker?" because end halts the interpreter, so it never reaches the statement outside the compilation unit. Directives to IMCC (which start with a ".") aren't delayed like other statements. So, if you declare a named variable or named constant outside a compilation unit, it will be available to any statements that follow it:
.local string hello
hello = "Polly want a cracker?\n"
print hello
.sub _main
hello = "Hello, Polly.\n"
print hello
end
.end
In the first line of this example, the .local directive defines a file global variable named hello. The _main routine uses the same variable, and would give you a parse error if it hadn't been defined. The assignment of "Polly want a cracker?" and its print statement, however, are ordinary statements outside a compilation unit, so they are emitted after _main and never run.
Pure PASM compilation units can use the .emit and .eom directives instead of .sub and .end:
.emit
print "Hello, Polly.\n"
end
.eom
The .emit directive doesn't take a name. Section 7.4 goes into much more detail about compilation units and their uses.
7.2.7 Scope and Namespaces
The .namespace directive creates a scoped namespace for variables. Variables from outside the namespace are visible in the inner scope unless that scope has a local variable with the same name:
.sub _scoped_hello
.local PerlString hello
hello = new PerlString
hello = "Welcome, Python!\n"
.namespace inner
.local PerlString hello
hello = new PerlString
hello = "Hello, Perl 6.\n"
print hello
.endnamespace inner
print hello
end
.end
This example prints:
Hello, Perl 6.
Welcome, Python!
The first .local directive defines a named variable hello in the default outer namespace. The second .local defines a named variable in the inner namespace; internally, IMCC mangles its name as inner::hello. The first print is nested in the inner namespace, so it prints inner::hello, "Hello, Perl 6." The second print statement retrieves the hello variable of the outer namespace, so it prints "Welcome, Python!"
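The lookup behavior described above can be modeled in a few lines of Python. This is a toy model, not IMCC's symbol table. The class ScopedNames and its methods are hypothetical stand-ins for .namespace, .endnamespace, and .local, and a declaration is merged with its first assignment for brevity. Names declared inside an open namespace are stored under a mangled key such as inner::hello, and a lookup tries the innermost mangled name first before falling back to enclosing scopes.

```python
class ScopedNames:
    """Toy model of .namespace scoping with IMCC-style mangled names."""

    def __init__(self):
        self.path = []      # open namespaces, outermost first
        self.values = {}    # mangled name such as "inner::hello" -> value

    def _mangle(self, name, depth):
        return "::".join(self.path[:depth] + [name])

    def begin(self, namespace):          # like: .namespace inner
        self.path.append(namespace)

    def end(self, namespace):            # like: .endnamespace inner
        if not self.path or self.path[-1] != namespace:
            raise ValueError(f"mismatched .endnamespace {namespace}")
        self.path.pop()

    def declare(self, name, value):      # like: .local plus an assignment
        self.values[self._mangle(name, len(self.path))] = value

    def lookup(self, name):
        # Try the innermost mangled name first, then each enclosing scope.
        for depth in range(len(self.path), -1, -1):
            key = self._mangle(name, depth)
            if key in self.values:
                return self.values[key]
        raise NameError(name)


scope = ScopedNames()
scope.declare("hello", "Welcome, Python!\n")
scope.begin("inner")
scope.declare("hello", "Hello, Perl 6.\n")
printed = [scope.lookup("hello")]        # inner::hello shadows the outer name
scope.end("inner")
printed.append(scope.lookup("hello"))    # back to the outer hello
print("".join(printed), end="")
```

As in the PIR example, the inner declaration shadows the outer one only while the namespace is open.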
Constants are collected for the whole program so they can be