Fundamentals of Perl Scripting

A Perl script is effectively similar to a shell script, such as those explained in Chapter 10. The first line (known as the interpreter line) tells the shell which interpreter to use to run the rest of the script's contents:

#!/usr/local/bin/perl

Or, to avoid compatibility issues on systems that might have Perl installed in a different location, a more portable interpreter line would be as follows:

#!/usr/bin/env perl

The rest of the script, as in shell programming, is made up of variable assignments, flow-control blocks and loops, system calls, and operations on I/O handles, to name a few fundamental parts of a script's anatomy. Here's a simple example:

#!/usr/bin/env perl
$string = "Hello world!";
chomp($hostname = `hostname`);   # strip the trailing newline before comparing
if ($hostname eq "uncia") {
  print $string."\n";
  print `date`;
}

Notice the use of C-style curly brackets to delimit the if block, rather than the "if/fi" and "case/esac" syntax of shell scripting. Also as in C, each statement is terminated by a semicolon (;), allowing you to use multiline statements in most cases. The whitespace between the statements and operators is also optional: "$a = 1" is just as valid as "$a=1". Perl's syntax much more closely resembles that of a simplified version of C than that of the shell language. Think of Perl as a cross between shell scripting and C, incorporating the best features of both.

A Perl script doesn't need to be compiled. This makes debugging very easy: Just edit the file, make a change, run it again, see what new errors there are, and edit it again. The Perl interpreter does the "compiling" at runtime, and it doesn't write out any compiled bytecode. This makes Perl slower than full-fledged C (which is why almost all critical system functions are written in C), but it's fine for scripts that aren't especially time- or performance-sensitive.

Tip

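While you learn Perl and during development of your programs, you might want to use perl -w in the interpreter line, which enables warnings about questionable code and helps greatly in debugging. To check a script's syntax without running it, use perl -c at the command line instead; placing -c in the interpreter line would prevent the script from ever executing.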

To run a Perl script, you have to set it as executable. This is done with the chmod command:

# chmod +x myscript.pl

Now, to run the script, you would use the ./ prefix to specify the script in the current directory because you most likely won't have "." (the current directory) in your path, especially as root:

# ./myscript.pl
Hello world!
Sat Apr 28 15:29:17 PDT 2001

Note

Because Perl scripts are interpreted, you don't necessarily have to make them executable in order to run them, or even have the interpreter line at the top. If you prefer, although this method is less conveniently "encapsulated" than the previous method, you can run the script as an argument to perl itself. This way you can run the program even if the script is not set executable or does not contain the interpreter line:

# perl myscript.pl

Note

Perl has several options that can be used either in the interpreter line or at the command line to alter its behavior. The -w switch, for example, turns on warnings in which the Perl interpreter informs you of improperly written code that can execute but isn't deterministically correct. This option is so widely relied upon that not using it is seen as poor form. You can use the -w switch in either of the following ways if you want to ensure that the code you write is "proper":

# perl -w myscript.pl

#!/usr/bin/perl -w

Variables and Operators

A variable in Perl is a scalar, an array, or an associative array (or hash), which are three different ways of storing pieces of data. This data can take the form of a number (expressed in any of several different ways), a string of text, or other various types of what are known in Perl as literals: pieces of data, such as 3 or car, whose meanings are exactly as represented. A scalar variable (the most common kind) has a name beginning with the dollar sign ($).

The nice thing about Perl is that you never have to worry about whether a number is an integer, a float, a short, a long, or whatever. You also don't need to treat a string as an array of characters, a pointer to a string in memory, or anything like that. Perl handles all that stuff internally. You don't even have to convert strings to numbers, or vice versa. Perl will recognize if a string has only numbers in it and allow you to multiply it by a number or apply any mathematical operators to it. Everything from the first nonnumerical character in a string onward is dropped, so 123blah would be treated by numerical operators as 123, and blah would be treated as zero.
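The conversion rules just described can be tried out with a short sketch like the following. The variable names are illustrative, and warnings are deliberately left off, because with -w Perl would report each non-numeric string used in arithmetic.

```perl
#!/usr/bin/env perl
use strict;
# Warnings are intentionally not enabled here: -w would flag the
# non-numeric strings below with "isn't numeric" messages.

my $sum     = "123blah" + 1;   # leading digits are used: 123 + 1
my $product = "blah" * 5;      # no leading digits, so "blah" counts as 0
my $number  = 12;
my $joined  = $number . "34";  # a number is converted to a string when concatenated

print "$sum\n";       # prints 124
print "$product\n";   # prints 0
print "$joined\n";    # prints 1234
```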

Operators are available to modify these variables. There are the mathematical operators you'd expect: +, -, *, /, and so on. C-style incrementation operators are available (++ and --) as are space-saving composite mathematical operators (+=, -=, *=, and so on). Extra operators include exponentiation (**), modulus (%), and the comparison operators used in conditional clauses (>, <, >=, <=, ==, and !=). Strings can be concatenated with the dot operator (.) or repeated any number of times using the string repetition operator (x).

Perl makes use of more operators than you are ever likely to need to know about. Full coverage of these operators could fill an entire book, but Table 11.1 lists the purposes of some of the common operators you're likely to use when writing scripts for use in FreeBSD.

Table 11.1. Numeric Operators
Operator	Meaning	Usage	Value of `$c` (with `$a = 4`, `$b = 3`)
`+`	Addition	`$c = $a + $b;`	`7`
`-`	Subtraction	`$c = $a - $b;`	`1`
`*`	Multiplication	`$c = $a * $b;`	`12`
`/`	Division	`$c = $a / $b;`	`1.333...`
`=`	Assignment	`$c = $a;`	`4`
`+=`	Implicit addition	`$c += $a;`	`$c + 4`
`++`	Incrementation	`$c++;`	`$c + 1`
`**`	Exponentiation	`$c = $a ** $b;`	`64`
`%`	Modulus	`$c = $a % $b;`	`1`
`.`	Concatenation	`$c = $a . $b;`	`43`
`x`	Repetition	`$c = $a x $b;`	`444`
`>`	Greater than
`<`	Less than
`>=`	Greater than or equal to
`<=`	Less than or equal to
`==`	Equal to
`!=`	Not equal to
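The values in Table 11.1 are consistent with $a being 4 and $b being 3; that starting point is inferred from the table rather than stated in it. Under that assumption, the rows can be reproduced with a sketch like this one, where $x and $y stand in for $a and $b:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my ($x, $y) = (4, 3);     # stand-ins for $a and $b in Table 11.1

my $sum  = $x + $y;       # 7
my $diff = $x - $y;       # 1
my $prod = $x * $y;       # 12
my $pow  = $x ** $y;      # 64 (4 to the third power)
my $mod  = $x % $y;       # 1  (remainder of 4 / 3)
my $cat  = $x . $y;       # "43"  (concatenation treats both as strings)
my $rep  = $x x $y;       # "444" ("4" repeated three times)

my $c = $x;               # assignment: 4
$c += $x;                 # now 8
$c++;                     # now 9

print "$sum $diff $prod $pow $mod $cat $rep $c\n";   # prints 7 1 12 64 1 43 444 9
printf "%.3f\n", $x / $y;                            # prints 1.333
```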

Here are a few simple lines of Perl that show the use of variables, literals, and operators:

$a = 5;
$a++;
$b = $a ** 2;
$c = "test" . $b;
print "$c";

This block of code would print out the string test36. First $a is assigned to 5; then it is incremented to 6. Next, $b is assigned to $a squared, or 36. Then, $c is assigned to the string test with $b appended. Finally, $c, whose value now is test36, is printed to the screen. If that's clear to you, you understand the building blocks of Perl.

Scalars, Arrays, and Associative Arrays

Variables can be used individually or in arrays of arbitrary numbers of dimensions. You've already seen scalar variables, such as $a, $b, and $c, in the previous example; a scalar variable contains a single number or string. But each of these variables travels separately, and there will be times when you will need to work with groups of associated pieces of data. This is where arrays come in:

@array1 = ("blah",5,12.7,$a);
@array2 = ($a, $b, $c);

An array has the same kind of naming conventions as a scalar variable, except that it begins with an "at" sign (@) instead of a dollar sign. As these examples show, an array does not need to be declared with a certain size or contain only a consistent type of data. Arrays can contain numbers, strings, or any other scalar values you like; note that an array placed inside another list is flattened into that list's elements rather than nested.

You can access an element of an array using square brackets. The third element of the previous @array1 array, a scalar value, would be $array1[2]. Remember that array element numbering begins at zero!

You will also see elements of arrays addressed with the @ prefix instead of $. This prefix indicates that you can address a "slice" of an array by specifying more than one element (for example, @array1[1,2]). This example is really an array in itself with two elements. If you say @array1[2], you're talking about a slice with one element, which is effectively the same thing as a scalar variable, and it works the same way. However, for consistency's sake, you may want to keep in mind that the "preferred" method is $array1[2]: the prefix of the variable determines what kind of variable it is.
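As a brief sketch of the difference between an element and a slice, the following uses the contents of @array1 with 6 in place of $a (the value $a had in the earlier example); the other variable names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @array1 = ("blah", 5, 12.7, 6);

my $third  = $array1[2];       # a single element: scalar prefix, index 2
my @middle = @array1[1, 2];    # a slice: array prefix, two indexes

print "$third\n";      # prints 12.7
print "@middle\n";     # prints 5 12.7
```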

You can use the various array operators, essentially built-in functions, to set up your arrays in any way you like. Arrays are often also called lists. In that context, you can think of an array in "stack" terminology, which gives you the push(), pop(), shift(), and unshift() operators. These operators are listed in Table 11.2, where @array1 undergoes each of them in turn.

Table 11.2. List Operators
Operator	Function	Syntax and Result
`push()`	Adds a value to the end of a list.	`push(@array1,"test"); @array1 = ("blah",5,12.7,6,"test")`
`pop()`	Removes a value from the end of a list and returns it.	`$d = pop(@array1); @array1 = ("blah",5,12.7,6), $d = "test"`
`unshift()`	Adds a value to the beginning of a list.	`unshift(@array1,"test"); @array1 = ("test","blah",5,12.7,6)`
`shift()`	Removes a value from the beginning of a list and returns it.	`$d = shift(@array1); @array1 = ("blah",5,12.7,6), $d = "test"`
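The sequence in Table 11.2 can be traced with a sketch like the one below. It also records what push() returns, which is the new element count rather than a list, and shows one way to build a lengthened copy without touching the original; @array3 and the other names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @array1 = ("blah", 5, 12.7, 6);

my $count = push(@array1, "test");    # array grows in place; returns 5
my $d     = pop(@array1);             # removes and returns "test"
unshift(@array1, "test");             # "test" is now the first element
my $e     = shift(@array1);           # removes and returns "test"

# Build a longer copy; @array1 itself is not modified by this line.
my @array3 = (@array1, "test");

print "$count $d $e\n";    # prints 5 test test
print "@array1\n";         # prints blah 5 12.7 6
print "@array3\n";         # prints blah 5 12.7 6 test
```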

Tip

Each of these operators changes the array in place. The return value is not a new array: push() and unshift() return the new number of elements, and pop() and shift() return the element that was removed. To create a new array with the lengthened contents while leaving the original array (@array1) untouched, build it from a list instead, as in @array3 = (@array1,"test").

A further useful array function is sort(). For instance, sort(@array1) would return all the elements arranged in lexicographical order, treating them as strings; the original array is left unchanged. You can specify an alternate comparison routine of your own construction to extend the functionality of the sort() routine to do whatever you like. For instance, if you create a subroutine called numerically() that compares the two special variables $a and $b in numerical order, you can use the following:

@sorted = sort numerically @array1;

Tip

Custom sort subroutines (you'll learn about creating subroutines later) can be used to sort lists of elements based on any criteria you like, either by comparing members of data structures or by performing complex calculations on the values to be compared before evaluating them. A simple numerical sort routine can be reduced to an inline block built around the single operator <=>, so you can numerically sort @array1 like this:

@sorted = sort { $a <=> $b } @array1;
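To see why a numeric comparison matters, here is a small sketch that sorts the same list three ways. The list contents and variable names are made up for illustration; numerically() is written here as one possible form of the subroutine mentioned above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @numbers = (10, 9, 100, 1);

# sort sets the special variables $a and $b to the two items being compared.
sub numerically { $a <=> $b }

my @as_strings = sort @numbers;                  # default: compared as strings
my @by_value   = sort numerically @numbers;      # named comparison routine
my @descending = sort { $b <=> $a } @numbers;    # inline block, reversed order

print "@as_strings\n";    # prints 1 10 100 9
print "@by_value\n";      # prints 1 9 10 100
print "@descending\n";    # prints 100 10 9 1
print "@numbers\n";       # prints 10 9 100 1 (sort returns a new list)
```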

Arrays are especially useful when you're working with relational data, either through interfaces to real databases or simply delimited text files such as /etc/passwd. Using arrays is how you would access the individual lines in a file that you've read in from standard input. We'll be looking at how that's done a bit later in the chapter.

Tip

You can get the size of an array by accessing it in "scalar context." The easiest way to do this is to assign the list to a scalar variable:

$size = @array1;

Now, $size is equal to 4.

An array can be created from a scalar string using the split() function. This will divide up the string based on whatever delimiter you specify, omitting the delimiter from each of the new array's elements:

$mystring = "Test|my name|Interesting data|123";
@mydata = split(/\|/,$mystring);

In the first line, $mystring is assigned to a string of four different chunks of text, separated by the pipe character (|). In the second, the split() function separates the string into four parts, removing the pipe characters on which the string was split, and assigns the parts to elements of the @mydata array.

Note that slashes are used to delimit the delimiter expression, and you have to escape the pipe character (|) with a backslash to make sure it's evaluated as a delimiter and not the "alternative" operator, which will make sense a little later, in the section on regular expressions. In any case, @mydata now contains the strings Test, my name, Interesting data, and 123 as its elements.
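The same technique applies to colon-delimited records such as those in /etc/passwd. The sketch below splits one made-up passwd-style line (the record is invented for illustration, not read from a real file) and also takes the array's size in scalar context:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A hypothetical record in /etc/passwd format.
my $record = "root:*:0:0:Charlie Root:/root:/bin/csh";

my @fields = split(/:/, $record);     # the colon needs no escaping
my $size   = @fields;                 # scalar context yields the element count
my ($login, $uid) = @fields[0, 2];    # a slice picks out two fields at once

print "$size fields\n";                     # prints 7 fields
print "$login has UID $uid\n";              # prints root has UID 0
print "shell is $fields[6]\n";              # prints shell is /bin/csh
```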

A special kind of array is an associative array; this is equivalent to a hash table in which the different values in the array are stored as key/value pairs. The prefix for an associative array is the percent sign (%), but each value of the array is a scalar, so you use the scalar prefix ($) to refer to the individual elements. Here's how you set up an associative array:

$assoc1{key1} = "value1";
$assoc1{key2} = "value2";

You can then use any of several associative array operators on the array as a whole:

@myvalues = values(%assoc1);
while (($mykey,$myvalue) = each(%assoc1)) {
  print "$mykey -> $myvalue\n";
}

Associative arrays are very useful in applications such as CGI programming, in which all the variables from HTML forms are sent to the server and read into an associative array based on the form field names. You'll see more about CGI programming in Chapter 26, "Configuring a Web Server."
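As a further sketch of working with an associative array, the following stores a few made-up login names and shells and then walks the keys in sorted order. The keys() and exists() functions are standard Perl but are not covered in the text above; the data is purely illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %shell;
$shell{root}     = "/bin/csh";
$shell{operator} = "/sbin/nologin";
$shell{alice}    = "/bin/tcsh";

my $count = keys(%shell);     # in scalar context, keys() gives the number of pairs

# each() and keys() return pairs in no particular order, so sort the keys
# when the output order matters.
my $listing = "";
foreach my $user (sort keys(%shell)) {
    $listing .= "$user -> $shell{$user}\n";
}
print $listing;

# exists() tests for a key without creating it.
my $has_bob = exists($shell{bob}) ? "yes" : "no";
print "$count users, bob present: $has_bob\n";   # prints 3 users, bob present: no
```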

Flow Control

One thing that qualifies Perl as a full-featured programming environment rather than a simple batch scripting language is its complete set of flow-control structures. These are what allow you to create complex data-flow paths and iterations in your programs.

`if/elsif/else`

The most common control structure is the if block:

if ($a == 5) {                    # If $a equals 5...
  print "It's 5\n";
} elsif ($a > 5) {                # Otherwise, if $a is greater than 5...
  print "Greater than 5\n";
} else {                          # In all other cases...
  print "Must be less than 5\n";
}

Note

Note that the conditional clause ($a == 5) must use the equality operator (==) rather than the assignment operator (=). The == operator and other comparison operators (listed in Table 11.1) compare their operands as numbers. If the items you're comparing are strings, you should use the string equivalents instead: eq for ==, lt for <, ne for !=, and others. Comparing two non-numeric strings with == treats both as zero, so the test succeeds even when the strings differ.
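The difference between the numeric and string comparison operators is easy to demonstrate. In this sketch the sample values are arbitrary, and warnings are deliberately left off because -w would complain about the non-numeric strings passed to ==.

```perl
#!/usr/bin/env perl
use strict;
# Warnings are intentionally not enabled: -w would flag the == test below.

my ($first, $second) = ("apple", "banana");

# Numeric comparison converts both strings to 0, so they look "equal".
my $numeric_equal = ($first == $second) ? "yes" : "no";
# String comparison looks at the actual characters.
my $string_equal  = ($first eq $second) ? "yes" : "no";

# Ordering differs too: as numbers 10 is not less than 9,
# but as strings "10" sorts before "9".
my $numeric_less = (10 < 9)      ? "yes" : "no";
my $string_less  = ("10" lt "9") ? "yes" : "no";

print "== $numeric_equal, eq $string_equal\n";   # prints == yes, eq no
print "<  $numeric_less, lt $string_less\n";     # prints <  no, lt yes
```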

`foreach`

Another common flow-control player in Perl is foreach, which allows you to iterate over all the elements in an array. Here, the foreach statement divides @buffer into its component elements, assigns each one to $line for the duration of the loop it controls, and allows you to use it as many times as there are members in the array:

foreach $line (@buffer) {
  print $line;
}

If you omit the optional variable name that refers to the element the loop is looking at ($line in this example), Perl assigns each element to the default $_ variable, and you use that name to refer to the current element. It's a good idea to specify a variable name here in order to prevent confusion when you're using multiple nested foreach loops.
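The following sketch shows both forms: a loop that relies on $_ and a pair of nested loops with named variables. The list contents are made up, and "tigris" is a hypothetical second hostname alongside the "uncia" used earlier.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @buffer = ("first\n", "second\n");

# No loop variable: each element lands in $_, which print uses by default.
foreach (@buffer) {
    print;
}

# Named variables keep nested loops unambiguous.
my $pairs = "";
foreach my $host ("uncia", "tigris") {
    foreach my $port (22, 80) {
        $pairs .= "$host port $port\n";
    }
}
print $pairs;
```

The second loop prints four lines, one for each combination of host and port.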

`for`

Perl also has a standard for loop, which is almost identical to the for loop in C. The purpose of for is simply to iterate a specified number of times rather than over the elements of an array. The for loop is controlled by an iteration variable (generally one that isn't used anywhere else in the script) that is updated automatically by for until it reaches your specified limit. Its arguments, as in C, are separated by semicolons. These are the initialization of the iteration variable, the end condition, and the incrementation operation, in that order. Here's an example:

for ($i = 0; $i < 100; $i++) {
  print "$i\n";
}

This sample for loop will print out 100 lines, numbered from 0 to 99. The first argument sets up $i as the iteration variable with a starting value of 0, the second is the condition that is tested before each pass, and the third says that $i should be incremented by one after each pass. The for loop keeps executing as long as the condition in the second argument is true; in this example it becomes false once $i has reached 100.
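As a sketch of the three-part loop header (initialization, then condition, then step), the following adds up the numbers 0 through 99 instead of printing them; $total is an illustrative name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $total  = 0;
my $passes = 0;

# initialization;  condition (tested before each pass);  step (run after each pass)
for (my $i = 0; $i < 100; $i++) {
    $total += $i;
    $passes++;
}

print "$passes passes, total $total\n";   # prints 100 passes, total 4950
```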

`while/until/do`

Finally, we have the while loop, which acts like a simplified version of for without the iteration variable. It has a conditional statement as its argument, which is evaluated every time the loop begins, and it keeps executing until the condition is false. Here's an example:

while ($i < 100) {
  $i += 5;
  $j++;
}
print "$j\n";

Assuming $i and $j have not been set beforehand (so both start out as zero), this loop will execute 20 times, and the output of the print statement will be 20.

A variant of while is until, which has effectively the opposite meaning: It keeps executing until the conditional is true. The following example has the same effect as the previous while loop:

until ($i == 100) {
  $i += 5;
  $j++;
}
print "$j\n";

This is very seldom used; most Perl programmers instead just use while loops with the exit condition set appropriately to the situation (which is generally easier to understand anyway).

Another way to use while or until is via the do...while or do...until construct. This guarantees that the loop will execute at least once, and the while or until conditional is evaluated at the end, rather than the beginning. Here's an example:

do {
  $i += 5;
  $j++;
} while ($i < 100);
print "$j\n";
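The practical difference shows up when the condition is already false before the loop starts. In this sketch (variable names are illustrative), $i begins at 100, so the plain while loop never runs while the do...while body runs exactly once:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $i = 100;    # the condition $i < 100 is false from the start

my $while_runs = 0;
while ($i < 100) {
    $while_runs++;      # never reached: the test comes first
}

my $do_runs = 0;
do {
    $do_runs++;         # reached once: the test comes after the body
} while ($i < 100);

print "while: $while_runs, do: $do_runs\n";   # prints while: 0, do: 1
```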

Backquotes (`) enable Perl to execute any command as you would at the command line or within a shell script. Simply enclose your command in backquotes, and Perl will execute it using /bin/sh, waiting until the spawned process quits before proceeding. What's more, the output from the backquoted command is available as the return value, so you can put it into a variable for later use. Here's a commonly used example:

$date = `date`;

Note that this returned string generally has \n at the end, so you can use chomp() to snip it off, either on a separate line or by enclosing the original assignment as an expression:

chomp($date = `date`);
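A short sketch of capturing command output follows. It uses echo rather than date so the result is predictable. It also shows that assigning backquotes to an array yields one element per output line, which is standard Perl behavior that the text only hints at in its later @who example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Scalar context: the whole output arrives as one string, newline included.
my $greeting = `echo hello`;
chomp($greeting);

# List context: one element per line of output.
my @lines = `echo one; echo two`;
chomp(@lines);                      # chomp works on every element of an array
my $count = @lines;

print "$greeting, $count lines, last is $lines[1]\n";
# prints hello, 2 lines, last is two
```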

Caution

Be aware that Perl won't necessarily know your command path. Commands called by your script might work for you on your own machine, but if you put the script on another system, it might fail because the commands you're trying to run in backquotes can't be found. The best defense against this is to specify the full path to each invoked command:

@who = `/usr/bin/who`;

One way to keep control over externals that must be called from your scripts is to define their paths in variables at the beginning of a script. Declaring these variables in an easy-to-find section helps other users locate and maintain them:

$who = "/usr/bin/who";

The variable $who can then be used in backticks to invoke the who program. Be very careful! An error that puts the wrong value into $who, or an attack that does so maliciously, means the script will be executing someone's arbitrary commands instead of the harmless who command.

Command-Line Arguments

You can pass practically as many arguments as you want on the command line to a Perl program. These arguments, separated by whitespace (unless enclosed in quotes), are placed at runtime into the @ARGV array and are available for any kind of use:

# ./myscript.pl test "My String" 123

$ARGV[0] is now test, $ARGV[1] is My String, and $ARGV[2] is 123. This also works on CGI programs, as you will see in Chapter 26. If you specify a URL with arguments separated by + characters (the usual way of passing arguments to CGI programs), @ARGV will be populated the same way:

http://www.example.com/myscript.cgi?test+My%20String+123
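A minimal sketch of reading @ARGV is shown below. So that it produces output even when run with no arguments, it falls back to the sample values from the command line above; that fallback is a convenience of this example, not something a real script would normally do.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustration only: substitute the sample arguments when none were given.
@ARGV = ("test", "My String", 123) unless @ARGV;

my $count  = @ARGV;     # scalar context gives the number of arguments
my $report = "";
for (my $i = 0; $i < $count; $i++) {
    $report .= "\$ARGV[$i] is $ARGV[$i]\n";
}

print "$count arguments\n";
print $report;
```

Run as ./myscript.pl test "My String" 123, this reports three arguments, with My String kept together as a single element because of the quotes.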
