4.17. Rolling Your Own Programs: Perl

When your task requires combining several of the utilities we've examined, you might write a shell script, as we will see in the next few chapters. Shell scripts are slower than C programs, since they are interpreted instead of compiled, but they are also much easier to write and debug. C programs allow you to take advantage of many more Linux features, but generally require more time both to write and to modify.

In 1986, Larry Wall found that shell scripts weren't enough and C programs were overkill for many purposes. He set out to write a scripting language that would be the best of both worlds. The result was Perl. The Practical Extraction and Report Language solved many of Larry's problems with generating reports and other text-oriented tasks, and it also provides easy access to many Linux facilities that shell scripts cannot reach.

The Perl language syntax will look familiar to shell and C programmers, since much of it was taken from elements of both. I can only hope to give you a high-level view of Perl here. As with awk/gawk, entire books have been written that describe Perl in detail (e.g., [Medinets, 1996], [Wall, 1996]). That level of detail is beyond the scope of this book. But by whetting your appetite with an introduction, I'm sure you'll want to find out more about Perl.

4.17.1. Getting Perl

Perl comes with all Linux distributions. Perl is also available for all versions of UNIX and even runs on MacOS and Windows. You do have to watch out for inconsistencies in system calls and file system locations of data files, but your code will require very few changes to run properly on different platforms.

The best source for all things Perl is:

http://www.perl.com

This site contains distributions of Perl for various platforms in the "downloads" section as well as documentation and links to many other useful resources.

But the biggest advantage of Perl is that it is free. Perl itself is distributed under your choice of the GNU General Public License or the Artistic License. Neither license affects any code you write in Perl. You are free to use and distribute your own code in any way you see fit. And you generally don't need to worry about redistributing Perl for someone to be able to run your code, since it is so freely available.

4.17.2. Running Perl

Figure 4-33 demonstrates the most commonly used arguments to Perl.

Figure 4-33. Description of the perl command.
Utility: perl [-c] fileName
         perl -v

perl interprets and executes the Perl script code in fileName. If the -c argument is present, the script is checked for syntax but not executed. When the -v argument is used, Perl prints version information about itself.

In most cases, you simply run a Perl script in a file with the following command:

$ perl file.pl

4.17.3. Printing Text

Without the ability to print output, most programs wouldn't accomplish much. So in the UNIX tradition, I'll start our Perl script examples with one that prints a single line:

print "hello world.\n";

Just from this simple example, you can infer that each statement in Perl must end with a semicolon (;). Also note that "\n" is used (as in the C programming language) to print a newline character at the end of the line.

4.17.4. Variables, Strings, and Integers

Writing useful programs, of course, requires the ability to assign and modify values like strings and integers. Perl provides variables much like the shells. These variables can be assigned any type of value; Perl keeps track of the variable type for you. The major difference between Perl variables and shell variables is that the dollar sign is not simply used to expand the value of the variable but is always used to denote the variable. Even when assigning a value to a variable:

$i = 3;

you put the $ on the variable. This is probably the most difficult adjustment for seasoned shell programmers to make.

In addition to all of the "typical" mathematical operators (add, subtract, etc.), integers also support a range operator ".." which is used to specify a range of integers. This is useful when building a loop around a range of values, as we will see later.

Strings, as in most languages, are specified by text in quotation marks. Strings also support a concatenation operator "." which puts strings together.

print 1, 2, 3..15, "\n";    # range operator
print "A", "B", "C", "\n";  # strings
$i = "A" . "B" ;            # concatenation operator
print "$i", "\n" ;

The previous example lines of Perl generate the following output:

123456789101112131415
ABC
AB

You can see that each value, and only each value, is printed, giving you control over all spacing.
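As a small sketch of controlling the spacing yourself, the lines below combine the range and concatenation operators with join(), a Perl built-in that is not covered in this section. The use strict, use warnings, and my declarations are also additions that the examples in this section omit.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# Build the list 1..5 with the range operator, then join it with
# single spaces; join() is a Perl built-in not covered above.
my $spaced = join(" ", 1..5);
print $spaced, "\n";               # prints: 1 2 3 4 5

# The concatenation operator glues strings (and numbers) together.
my $label = "total=" . (2 + 3);
print $label, "\n";                # prints: total=5
```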

4.17.5. Arrays

Most programming languages provide arrays, which are lists of data values. Arrays in Perl are quite simple to use, as they are dynamically allocated (you don't have to define how large they will be, and if you use more than what is currently allocated, Perl will allocate more space and enlarge the array). The syntax is probably new, however. Rather than using a dollar sign, as you do with Perl variables, you denote an array by an at sign (@):

@arr = (1,2,3,4,5);

This line defines the array "arr" and puts 5 values in it. You could also define the same array with the line:

@arr = (1..5);

using the range operator with integers.

You can access a single element with a subscript in brackets. Because a single element is a scalar value, it is written with a dollar sign rather than an at sign:

print $arr[0],"\n";

As with most array implementations, the first element is numbered zero. Using the definition from before, this line would print "1" since it's the first value.

If you print an array without subscripts, all defined values are printed. If you use the array name without a subscript in a place where a scalar value is expected, the number of elements in the array is used.

@a1 = (1);           # array of 1 element
@a2 = (1,2,3,4,5);   # array of 5 elements
@a3 = (1..10);       # array of 10 elements
print @a1, " ", @a2, " ", @a3, "\n";
print $a1[0], " ", $a2[1], " ", $a3[2], "\n";
# using as scalar will yield number of items
print @a2 + @a3, "\n";

will result in the following output:

1 12345 12345678910
1 2 3
15
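The dynamic allocation described earlier can be seen in a short sketch like the following. The $#arr notation (the index of the last element) appears later in this chapter with @ARGV; the use strict, use warnings, and my declarations are additions not used in the book's own examples.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

my @arr = (1..5);

# Assigning past the current end makes Perl enlarge the array.
$arr[7] = 8;                       # elements 5 and 6 are left undefined

# An array used where a scalar is expected yields its element count.
my $count = @arr;
print "count = $count\n";          # prints: count = 8

# $#arr is the index of the last element (count minus one).
print "last index = ", $#arr, "\n";   # prints: last index = 7
print "first = $arr[0]\n";            # prints: first = 1
```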

A special type of array provided in Perl is the associative array. Whereas you specify an index or position of a normal array with an integer between zero and the maximum size of the array, an associative array can have indices in any order and of any value.

Consider, for example, an array of month names. You can define an array called @month with 12 values of "January," "February," and so on (since arrays begin with index 0, you either remember to subtract one from your index or you define an array of 13 values and ignore $month[0], starting with $month[1]="January").

But what if you are reading month names from the input and want to look up the numeric value? You could use a for loop to search through the array until you found the value that matched the name you read, but that requires extra code. Wouldn't it be nice if you could just index into the array with the name? With an associative array you can:

$month{'January'} = 1;
$month{'February'} = 2;
     .
     .
     .

and so on. Then you can read in the month name and access its numeric value this way:

$monthnum = $month{$monthname};

without having to loop through the array and search for the name. Rather than setting up the array one element at a time, as we did above, you can define it at the beginning of your Perl program. An associative array as a whole is denoted by a percent sign (%):

%month = ("January", 1, "February", 2, "March", 3,
          "April", 4, "May", 5, "June", 6,
          "July", 7, "August", 8, "September", 9,
          "October", 10, "November", 11, "December", 12);

The set of values that can be used in an associative array, or the keys to the array, are returned as a regular array by a call to the Perl function keys():

@monthnames = keys(%month);

If you attempt to use a value as a key that is not a valid key, the undefined value is returned, which behaves as a null string or zero depending on how you use it.
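Because a missing key quietly yields an empty value, it can help to test for the key first. The sketch below uses exists() and sort(), two Perl built-ins not covered in this section, on a shortened version of the month table; "Smarch" is a made-up name used only to show the failure case, and the use strict, use warnings, and my declarations are additions to the book's style.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

my %month = ("January", 1, "February", 2, "March", 3);

# exists() reports whether a key is present, without relying on the
# value that a missing key would return.
foreach my $name ("March", "Smarch") {
    if (exists $month{$name}) {
        print "$name is month $month{$name}\n";
    } else {
        print "$name is not a known month\n";
    }
}

# keys() returns the keys in no particular order, so sort them
# (alphabetically here) when a stable order matters.
my @names = sort(keys(%month));
print "@names\n";                  # prints: February January March
```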

4.17.6. Mathematical and Logical Operators

Once you have your variables assigned, the next thing you usually want to do with them is change their values. Most operations on values are familiar from C programming. The typical operators add, subtract, multiply, and divide are +, -, *, and /, respectively, for both integers and real numbers. Integers also support the C constructs to increment and decrement before and after the value is used and logical ANDs and ORs. Notice in this example that I have to backslash the $ used in print statement text, since I don't want the value of the variable in those places, but I actually want the name with the $ prepended to it:

$n = 2;
print ("\$n=", $n, "\n");
$n = 2 ; print ("increment after \$n=", $n++, "\n");
$n = 2 ; print ("increment before \$n=", ++$n, "\n");
$n = 2 ; print ("decrement after \$n=", $n--, "\n");
$n = 2 ; print ("decrement before \$n=", --$n, "\n");
$n = 2;                         # reset
print ("\$n+2=", $n + 2, "\n");
print ("\$n-2=", $n - 2, "\n");
print ("\$n*2=", $n * 2, "\n");
print ("\$n/2=", $n / 2, "\n");
$r = 3.14;                      # real number
print ("\$r=", $r, "\n");
print ("\$r*2=", $r * 2, "\n"); # double
print ("\$r/2=", $r / 2, "\n"); # cut in half
print ("1 && 1 -> ", 1 && 1, "\n");
print ("1 && 0 -> ", 1 && 0, "\n");
print ("1 || 1 -> ", 1 || 1, "\n");
print ("1 || 0 -> ", 1 || 0, "\n");

This script generates the following output:

$n=2
increment after $n=2
increment before $n=3
decrement after $n=2
decrement before $n=1
$n+2=4
$n-2=0
$n*2=4
$n/2=1
$r=3.14
$r*2=6.28
$r/2=1.57
1 && 1 -> 1
1 && 0 -> 0
1 || 1 -> 1
1 || 0 -> 1

4.17.7. String Operators

Operations on string types are more complex and usually require using string functions (discussed later). The only simple operation that makes sense for a string (since you can't add or subtract a string) is concatenation. Strings are concatenated with the "." operator.

$firstname = "Bob";
$lastname = "Smith";
$fullname = $firstname . " " . $lastname;
print "$fullname\n";

results in the output:

Bob Smith

However, several simple matching operations are available:

if ($value =~ /abc/) { print "contains 'abc'\n"};
$value =~ s/abc/def/;    # change 'abc' to 'def'
$value =~ tr/a-z/A-Z/;   # translate to upper case

The experienced Linux or UNIX user will recognize the substitute syntax from vi and sed as well as the translation syntax based on the tr command.
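The following sketch shows what the substitute and translate operators leave behind in a variable. The g ("global") modifier, familiar from sed, is an addition not shown above, as are the use strict, use warnings, and my declarations.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

my $value = "abcabc";

# s/// replaces only the first match unless the g modifier is added.
(my $first = $value) =~ s/abc/def/;     # copy, then change the copy
(my $all   = $value) =~ s/abc/def/g;    # g: change every match

# tr/// maps characters one for one and returns how many it changed.
my $upper   = $value;
my $changed = ($upper =~ tr/a-z/A-Z/);

print "$first $all $upper $changed\n";  # prints: defabc defdef ABCABC 6
```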

4.17.8. Comparison Operators

You'll also want operators to compare values to one another. Comparison operators are the usual suspects (Figure 4-34).

Figure 4-34. Perl comparison operators.
Operation	Numeric values	String values
Equal to	==	eq
Not equal to	!=	ne
Greater than	>	gt
Greater than or equal to	>=	ge
Less than	<	lt
Less than or equal to	<=	le

In the case of greater-than or less-than comparisons with strings, the operators compare the strings' sorting order. In most cases you'll be concerned with comparing strings for equality (or the lack of it).
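Choosing the wrong column of Figure 4-34 can silently change a result. In this sketch, the same two values compare differently as numbers and as strings; the ?: conditional operator is borrowed from C and not otherwise covered here, and the use strict, use warnings, and my declarations are additions to the book's style.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

my ($x, $y) = ("10", "9");

# Numeric comparison: 10 is greater than 9.
my $numeric = ($x > $y) ? "greater" : "not greater";

# String comparison: "10" sorts before "9" because "1" precedes "9".
my $string = ($x gt $y) ? "greater" : "not greater";

print "numeric: $numeric\n";       # prints: numeric: greater
print "string:  $string\n";        # prints: string:  not greater
```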

4.17.9. If, While, for and Foreach Loop Constructs

An essential part of any programming language is the ability to execute different statements depending on the value of a variable and create loops for repetitive tasks or indexing through array values. If statements and while loops in Perl are similar to those in the C language.

In an "if" statement, a comparison operator is used to compare two values, and different sets of statements are executed depending on the result of the comparison (true or false):

$i = 0;
if ( $i == 0 ) {
   print "it's true\n";
} else {
   print "it's false\n";
}

results in "it's true" being printed. As with C, other comparison operators can be != (not equal), < (less than), > (greater than), among others.

You could also loop in a while statement to print the text until the comparison was no longer true:

while ( $i == 0 ) {
   print "it's true\n";
   # ...
   # do some things that may modify the value of $i
   # ...
}

Perl also handles both "for" loops from C and "foreach" loops from the C shell:

for ($i = 0 ; $i < 10 ; $i++ ) {
   print $i, " ";
}
print "\n";

counts from 0 to 9 and prints the value (without a newline until the end) and generates:

0 1 2 3 4 5 6 7 8 9

A foreach loop looks like this:

foreach $n (1..15) {
   print $n, " ";
}
print "\n";

and generates about what you would expect:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
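Both loop styles can also walk through the values of an array. The sketch below is one way to do it; the array contents are made up for illustration, and the use strict, use warnings, and my declarations are additions that the book's examples omit.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

my @days = ("Mon", "Tue", "Wed");  # sample data for illustration

# foreach visits each element in turn; no index variable is needed.
my $seen = 0;
foreach my $day (@days) {
    print "$day\n";
    $seen++;
}

# The C-style for loop does the same with an explicit index;
# $#days is the index of the last element.
for (my $i = 0; $i <= $#days; $i++) {
    print "$i=$days[$i]\n";
}
```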

4.17.10. File I/O

One big improvement in Perl over shell scripts is the ability to do input and output to specific files rather than just the standard input, output, or error channels. You still can access standard input and output:

while (@line=<stdin>) {
   foreach $i (@line) {
      print "->", $i;             # also reads in EOL
   }
}

This script will read each line from the standard input and print it. However, perhaps you have a specific data file you wish to read from:

$FILE="info.dat";
open (FILE);                     # file name is taken from the variable $FILE
@array = <FILE>;
close (FILE);
foreach $line (@array) {
   print "$line";
}

This Perl script opens "info.dat" and reads all its lines into the array called "array" (clever name, wouldn't you say?). It then does the same as the previous script and prints out each line.
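A widely used alternative, sketched here, is the three-argument form of open() with a filehandle stored in an ordinary variable, reading one line at a time rather than loading the whole file into an array. The file name info_example.dat is hypothetical, and the script creates the file itself so that the example is self-contained. The use strict, use warnings, and my declarations are additions to the book's style.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# Hypothetical file name, created here so the example is self-contained.
my $file = "info_example.dat";

# Write two lines; ">" opens for writing, "<" for reading.
open(my $out, ">", $file) or die("Cannot write $file: $!\n");
print $out "first line\n", "second line\n";
close($out);

# Read the file one line at a time instead of all at once.
my $count = 0;
open(my $in, "<", $file) or die("Cannot read $file: $!\n");
while (my $line = <$in>) {
    $count++;
    print "$count: $line";         # $line still ends with its newline
}
close($in);

unlink($file);                     # remove the example file
```

Reading line by line keeps memory use constant for large files, whereas assigning the filehandle to an array holds every line at once.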

4.17.11. Functions

To separate the various tasks a program performs, especially if the same task is needed in several places, a language needs to provide a subroutine or function capability. Shells such as Bash and the Korn shell provide a weak type of function implemented through the command interface. Of course, the C language provides functions too, but script writers had a harder time of it before Perl came along.

Perl functions are simple to use, although the syntax can get complicated. A simple example of a Perl function will give you the idea:

sub pounds2dollars {
   $EXCHANGE_RATE = 1.54;     # modify when necessary
   $pounds = $_[0];
   return ($EXCHANGE_RATE * $pounds);
}

This function changes a value specified in pounds sterling (British money) into US dollars (given an exchange rate of $1.54 to the pound, which can be modified as necessary). The special variable $_[0] references the first argument to the function. To call the function, our Perl script would look like this:

$book = 3.0;                  # price in British pounds
$value = pounds2dollars($book);
print "Value in dollars = $value\n";

When we run this script (which includes the Perl function at the end), we get:

Value in dollars = 4.62
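The function above stores its values in variables that are visible to the whole script. A variation, sketched below, keeps them private to the function with my and reads all of the arguments from the @_ array at once. The optional second argument is an assumption made for this sketch and is not part of the original function; use strict and use warnings are likewise additions.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# Same conversion as above, but with variables private to the function.
# The optional second argument (an assumption of this sketch) overrides
# the default exchange rate.
sub pounds2dollars {
    my ($pounds, $rate) = @_;      # @_ holds all of the arguments
    $rate = 1.54 unless defined $rate;
    return $rate * $pounds;
}

print pounds2dollars(3.0), "\n";        # prints: 4.62
print pounds2dollars(10, 2), "\n";      # prints: 20
```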

In the next section, we'll see an example of a function that returns more than one value.

4.17.12. Library Functions

One capability that is conspicuously absent from shell scripting is that of making Linux system calls. Perl provides an interface to many Linux system calls. The interface is via Perl library functions, not directly through the system call library, so its use depends on the implementation and version of Perl; consult the documentation for your version for specific information. When an interface is available, it is usually very much like its C library counterpart.

Without even realizing it, we looked at a few Perl functions in previous sections when we saw the use of open(), close(), and print(). Another simple example of a useful system-level function is:

exit(1);

to exit a Perl program and pass the specified return code to the shell. Perl also provides a special exit function, die(), which prints a message to the standard error channel (stderr) and exits with a nonzero error code:

open(FILE) or die("Cannot open file.");

Thus if the call to open() fails, the die() function is executed, causing the error message to be written to stderr and the Perl program to exit with the error code set by the failed open().

Some string functions to assist in manipulating string values are length(), index(), and split():

$len = length($fullname);

sets the $len variable to the length of the text stored in the string variable $fullname. To locate one string inside another:

$i = index($fullname, "Smith");

index() returns the zero-based position at which the search string (the second argument) first occurs, so the value of $i will be zero if the string begins with that text and -1 if the text does not occur at all. With $fullname set to "Bob Smith", $i is 4. To divide up a line of text based on a delimiting character (for example, if you want to separate the tokens from the Linux password file into its various parts):

($username, $password, $uid, $gid, $name, $home, $shell)
                                         = split(/:/, $line);

In this case, the split() function returns an array of values found in the string specified by $line and separated by a colon. We have specified separate variables in which to store each item in this array so we can use the values more easily than indexing into an array.
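As a rough illustration, here is split() applied to a password-file entry. The entry itself is made up for this sketch, scalar() is a Perl built-in not covered above, and the use strict, use warnings, and my declarations are additions to the book's style.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# A made-up password-file entry, used only for illustration.
my $line = "jdoe:x:500:500:Jane Doe:/home/jdoe:/bin/bash";

my ($username, $password, $uid, $gid, $name, $home, $shell)
    = split(/:/, $line);

print "$username has UID $uid and runs $shell\n";
# prints: jdoe has UID 500 and runs /bin/bash

# Assigning to an array instead keeps every field; scalar() forces
# the array to be used as a scalar, giving the field count.
my @fields = split(/:/, $line);
print scalar(@fields), " fields\n";     # prints: 7 fields
```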

Another common function provides your Perl program with the time and date:

($s, $m, $h, $dy, $mo, $yr, $wd, $yd, $dst) = gmtime();
$mo++;                     # month begins counting at zero
$yr+=1900;                 # Perl returns years since 1900
print "The date is $mo/$dy/$yr.\n";
print "The time is $h:$m:$s.\n";

The code above produces the following result:

The date is 3/25/2005.
The time is 13:40:27.

Note that gmtime() returns 9 values. The Perl syntax is to specify these values in parentheses (as you would if you were assigning multiple values to an array).
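Single-digit values such as 5 minutes print as "5" rather than "05" in the code above. One possible refinement, sketched here, is to format the fields with sprintf(), which works like its C counterpart and is not covered in this section. The sketch passes an explicit time of 0 to gmtime() only so that its output is repeatable; the use strict, use warnings, and my declarations are additions to the book's style.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# Passing an explicit time (0 = the start of the UNIX epoch) makes the
# result repeatable; gmtime() with no argument uses the current time.
my ($s, $m, $h, $dy, $mo, $yr) = gmtime(0);
$mo++;                             # month begins counting at zero
$yr += 1900;                       # years are counted from 1900

# sprintf() pads each field with leading zeros, as printf does in C.
my $date = sprintf("%02d/%02d/%04d", $mo, $dy, $yr);
my $time = sprintf("%02d:%02d:%02d", $h, $m, $s);
print "The date is $date.\n";      # prints: The date is 01/01/1970.
print "The time is $time.\n";      # prints: The time is 00:00:00.
```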

4.17.13. Command-Line Arguments

Another useful capability is to be able to pass command-line arguments to a Perl script. Shell scripts provide a very simple interface to command-line arguments, while C programs provide a slightly more complex (but more flexible) interface. The Perl interface is somewhere in between:

$n = $#ARGV+1;  # $#ARGV is the index of the last argument, so add 1
print $n, " args: \n";
for ( $i = 0 ; $i < $n ; $i++ ) {
   print "   $ARGV[$i]\n";
}

This Perl script prints the number of arguments that were supplied on the perl command (after the name of the Perl script itself) and then prints out each argument on a separate line.
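The same listing can be written without an index variable. In this sketch, the two sample arguments are supplied only when the script is run with none, so that there is always something to print; that fallback, scalar(), and the use strict, use warnings, and my declarations are additions made for illustration.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# For illustration only: supply sample arguments when none were given
# on the command line, so the loop below always has something to print.
@ARGV = ("-a", "300") unless @ARGV;

my $n = scalar(@ARGV);             # same value as $#ARGV + 1
print "$n args:\n";
foreach my $arg (@ARGV) {
    print "   $arg\n";
}
```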

We can modify our pounds-to-dollars script from before to allow a value in British pounds to be specified on the command line:

if ( $#ARGV < 0 ) {  # if no argument given
   print "Specify value in pounds to convert to dollars\n";
   exit
}
$poundvalue = $ARGV[0];     # get value from command line
$dollarvalue = pounds2dollars($poundvalue);
print "Value in dollars = $dollarvalue\n";

sub pounds2dollars {
   $EXCHANGE_RATE = 1.54;   # modify when necessary
   $pounds = $_[0];
   return ($EXCHANGE_RATE * $pounds);
}

4.17.14. A Real-World Example

All of these short examples should have given you the flavor for how Perl works, but so far we haven't done anything that's really very useful. So let's take what we've seen and write a Perl script to print out a table of information about a loan. We define a command with the syntax shown in Figure 4-35.

Figure 4-35. Description of the loan command written in Perl.
Utility: loan -a amount -p payment -r rate

loan prints a table given a loan amount, interest rate, and payment to be made each month. The table shows how many months will be required to pay off the loan as well as how much interest and principal will be paid each month. All arguments are required.

The Perl script loan.pl is available online (see the Preface for more information) and looks like this:

# show loan interest
$i=0;
while ( $i < $#ARGV) {              # process args
   if ( $ARGV[$i] eq "-r" ) {
      $RATE=$ARGV[++$i];            # interest rate
   } else {
      if ( $ARGV[$i] eq "-a" ) {
         $AMOUNT=$ARGV[++$i];       # loan amount
      } else {
         if ( $ARGV[$i] eq "-p" ) {
            $PAYMENT=$ARGV[++$i];   # payment amount
         } else {
            print "Unknown argument ($ARGV[$i])\n";
            exit
         }
      }
   }
   $i++;
}

if ($AMOUNT == 0 || $RATE == 0 || $PAYMENT == 0) {
   print "Specify -r rate -a amount -p payment\n";
   exit
}

print "Original balance: \$$AMOUNT\n";
print "Interest rate:     ${RATE}%\n";
print "Monthly payment:  \$$PAYMENT\n";
print "\n";
print "Month\tPayment\tInterest\tPrincipal\tBalance\n\n";

$month=1;
$rate=$RATE/12/100;      # get actual monthly percentage rate
$balance=$AMOUNT;
$payment=$PAYMENT;

while ($balance > 0) {
   # round up interest amount
   $interest=roundUpAmount($rate * $balance);
   $principal=roundUpAmount($payment - $interest);
   if ( $balance < $principal ) {    # last payment
      $principal=$balance;           # don't pay too much!
      $payment=$principal + $interest;
   }
   $balance = roundUpAmount($balance - $principal);
   print "$month\t\$$payment\t\$$interest\t\t\$$principal\t\t\$$balance\n";
   $month++;
}

sub roundUpAmount {
#
# in: floating point monetary value
# out: value rounded (and truncated) to the nearest cent
#
   $value=$_[0];
   $newvalue = ( int ( ( $value * 100 ) +.5 ) ) / 100;
   return ($newvalue);
}

If I want to pay $30 a month on my $300 credit card balance and the interest rate is 12.5% APR, my payment schedule looks like this:

$ perl loan.pl -r 12.5 -a 300 -p 30
Original balance: $300
Interest rate:     12.5%
Monthly payment:  $30

Month    Payment Interest      Principal     Balance

1        $30     $3.13         $26.87        $273.13
2        $30     $2.85         $27.15        $245.98
3        $30     $2.56         $27.44        $218.54
4        $30     $2.28         $27.72        $190.82
5        $30     $1.99         $28.01        $162.81
6        $30     $1.7          $28.3         $134.51
7        $30     $1.4          $28.6         $105.91
8        $30     $1.1          $28.9         $77.01
9        $30     $0.8          $29.2         $47.81
10       $30     $0.5          $29.5         $18.31
11       $18.5   $0.19         $18.31        $0
$ _

So I find it will take 11 months to pay off the balance at $30 per month, but the last payment will only be $18.50 (shown as $18.5 in the table, because Perl does not print trailing zeros). If I want to pay it off faster than that, I know I need to raise my monthly payment!
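Amounts such as $1.7 and $18.5 in the table could be printed with two decimal places by formatting them with sprintf(), which is not covered in this section. The helper below is a hypothetical addition, not part of loan.pl, and the use strict, use warnings, and my declarations are additions to the book's style.

```perl
use strict;      # strict, warnings, and "my" are additions to the book's style
use warnings;

# Hypothetical helper (not part of loan.pl): format an amount with a
# dollar sign and exactly two decimal places.
sub formatDollars {
    my ($amount) = @_;
    return sprintf("\$%.2f", $amount);
}

print formatDollars(1.7), "\n";    # prints: $1.70
print formatDollars(18.5), "\n";   # prints: $18.50
print formatDollars(0), "\n";      # prints: $0.00
```

In loan.pl, such a helper would be applied to each amount just before it is printed, leaving the rounded numeric values themselves untouched for the calculations.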