4.3 Perl Syntax Rules

First, the basic Perl syntax rules. Hang in there: although there is a plethora of "hello, world" programs here, the somewhat unnerving friendliness and patience engendered by repeating this mantra will serve you well later, when things get more involved.

4.3.1 A First Perl Program: hello, world

As always, the best way to learn is by doing. Here is the ever-popular "hello, world" program:

    #! /usr/bin/perl -w
    # file: hello.pl
    print "hello, world!\n";

The first line is a shell directive (also known as the hash-bang or shebang line). It tells the system that this script is to be run by /usr/bin/perl. Your distribution might have Perl in a different place, /usr/local/bin/perl, for example. [3] The -w switch causes Perl to print warnings to the standard error (stderr) output. [4] The -w switch is recommended when you are developing Perl scripts, because it is possible to write programs that are syntactically correct but logically incorrect because of typos or just bad programming, and the warnings often point straight at such mistakes.

[3] Remember that which, where, find, and locate are your friends!

[4] The terms stdout, stdin, and stderr refer to the standard Unix file descriptors, which by default are connected to the terminal (tty) you ran the program from but can be redirected to a file or to /dev/null; for example, the shell's " > " redirection sends stdout to a file.

The next line, which begins with #, is a comment. Comments extend from the # to the end of the line. The print statement:

 print "hello, world!\n"; 

executes the built-in print() function, which writes text to standard output (stdout). The "\n" is the newline character, which ends the current line of output.

Put the preceding text in a hello.pl text file and make it executable.

 $  chmod a+x hello.pl  

To execute it, type

 $  ./hello.pl  

The " ./ " tells the shell to execute the file in this directory (" . ") and not look for it in your PATH . If all goes as expected, you'll see this wonderfully reassuring message:

 hello, world! 

You may notice a few other things about the language from this example.

  • Perl is free-format: whitespace can be scattered about to make code more readable, as much or as little as you like.

  • All statements must end in a semicolon (" ; ") except the first line, which is not really a Perl statement but a shell escape to execute Perl.

  • Variables do not have to be declared but can be.

  • Perl scripts are stored as text files. When a script is executed ($ ./hello.pl), its source text is first compiled into byte code, an intermediate form that is neither text nor native machine code. Perl then interprets the byte code, executing it. Therefore, each time a Perl script is executed, it is compiled and then interpreted (this is not the place to go into compiled versus interpreted languages). This leads to the question, Is there a Perl compiler that compiles the Perl script into binary form? The answer is, Yes, but ... The Perl compiler is still in beta test, and it doesn't always work very well. Feel free to try it: perlcc.

  • Built-in functions, such as print(), can be invoked with or without parentheses. In the preceding example, we invoke print without parentheses, but we could use them if we wanted or needed to.

4.3.2 Another Example

This program adds two numbers:

    #! /usr/bin/perl -w
    # file: simpleadd.pl
    $a = 5;
    $b = 6;
    $c = $a + $b;
    print "\$c is now: $c\n";

$a, $b, and $c are examples of scalar variables (more on these later). Note that unlike C and other C-like languages, Perl doesn't require you to declare them as variables explicitly (but you could if you wanted to, and we will in the next section): we simply use them, and voilà! They are created for us.

Put this into the file simpleadd.pl and execute it as earlier:

    $ ./simpleadd.pl
    $c is now: 11

Because in this example we want the dollar sign to be printed for pedantic purposes, the output starts with $c . The dollar sign is escaped in the string (escaped means "preceded by the backslash character so that the print statement knows to print the character literally instead of using it in its regular Perl use"):

    print "\$c is now: $c\n";

We'll talk more about this soon.

4.3.3 Declaring Variables with use strict;

Perl, as a scripting language, has looser rules than other "real" programming languages like C and Java. One such quality is the ability to create variables by simply using them, much as in shell programming. Still, it is considered good style to declare them. Doing so can preclude the silly typo problems that everyone is likely to incur from time to time:

    $food = 'pizza';
    print "I am hungry for $foo!\n";

The variable $food is assigned, but $foo will be printed. Perl allows this without a syntax error because neither variable has to be declared. The output of the program will be "I am hungry for !" because $foo has never been given a value. Perl will warn about this sort of thing if the -w flag is used, as it should be.

Another way to avoid this problem is to force declaration of all variables before use, just as in the more strict, prescriptive compiled languages. For example:

 use strict; 

With this, variables must be declared before use, and this is done with the my() operator as follows:

    my $a;
    my $b;

Variables can be declared and initialized at the same time:

    my $a = 5;
    my $b = 6;

More than one variable can be declared at the same time (the parentheses are required):

 my($a, $b, $c); 

If we do a use strict; and declare the variables, the earlier food example becomes:

    use strict;
    my $food = 'pizza';
    print "I am hungry for $foo!\n";

Since $foo is not declared with my(), using it is a compile-time error, forcing the correction of the typo before the program can execute.

We recommend use strict;. It catches many common errors, requires a more disciplined programming mind-set, and is in general just good practice, so the examples from this point on use it. However, TMTOWTDI (There's More Than One Way To Do It).

The previous simple example becomes:

    #! /usr/bin/perl -w
    # file: simpleadd.pl
    use strict;
    my $a = 5;
    my $b = 6;
    my $c = $a + $b;
    print "\$c is now: $c\n";

4.3.4 Variables

Perl has several types of variables. We discuss the major ones. [5]

[5] Perl has several minor data types, including loop labels, directory handles, and formats; see the Camel Book [Wall+ 00] for details.

Scalar Variables

A scalar variable is a variable that holds a single value.

  • It can hold the value of a number (integer or floating point) or a string (from zero characters to enough characters to consume your virtual memory). [6]

    [6] It can also hold the value of a reference, but that's beyond the scope of this book.

  • The value can change over time, and it can change type (for example, it can change from an integer to a float to a string and back to an integer).

  • Numeric literals are in the following format:

    - Integers:

    • Decimal: 10 -2884 2_248_188 (the underscore can be used to chunk numbers for ease of reading; don't use the usual comma format, as in 2,248,188, because in Perl that means something completely different: a list of three numbers, 2, 248, and 188)

    • Octal (leading 0): 037 04214

    • Hexadecimal (leading 0x): 0x25 0xaB80

    - Floats:

    • Standard: 7.1 .8 9.

    • Scientific: 3.4E-34 -8.023e43

  • String literals can be created with either single quotes or double quotes:

    - Single quotes: 'hello, world'

    - Double quotes (needed to evaluate variables and special characters): "hello, world\n", "\$c is now $c\n"
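To make the literal formats and the type changes concrete, here is a minimal sketch (the file name literals.pl and the variable names are hypothetical). It prints the decimal value of an underscore-chunked, an octal, and a hexadecimal literal, counts what the comma format 2,248,188 really produces, and lets a single scalar go from integer to float to string:

```perl
#!/usr/bin/perl -w
# file: literals.pl (hypothetical name for this sketch)
use strict;

my $big   = 2_248_188;    # underscores are ignored: 2248188
my $octal = 037;          # leading 0 means octal: 3*8 + 7 = 31
my $hex   = 0x25;         # leading 0x means hex: 2*16 + 5 = 37
print "big: $big, octal: $octal, hex: $hex\n";

# The comma format builds a list of three separate numbers.
my @parts = (2,248,188);
print "2,248,188 is a list of ", scalar(@parts), " numbers\n";

# One scalar, three types over its lifetime.
my $x = 42;               # an integer
$x = $x / 8;              # now a float: 5.25
$x = "x is now $x";       # now a string
print "$x\n";             # x is now 5.25
```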

One must take care with quotes and scalar variables in Perl.

First, there is a difference between a tick ('), a backtick (`), and a double quote ("). Mostly, we use ticks and double quotes; the backtick has a special purpose in Perl, which we will talk about later. To show how these affect scalar variables within quotes, assume that $name has a value:

    $name = 'Larry Wall';

Because scalar variables are replaced with their values within double-quoted strings, the string "Hello $name!" has the value Hello Larry Wall!. If the $ is escaped with the backslash, "Hello \$name!", the value is Hello $name!. Single quotes do not replace the variable with its value, so 'Hello $name!' has the value Hello $name!.
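The three cases can be checked side by side with a short sketch (quotes.pl is a hypothetical file name); each string is stored in its own variable so the results are easy to compare:

```perl
#!/usr/bin/perl -w
# file: quotes.pl (hypothetical name for this sketch)
use strict;

my $name = 'Larry Wall';
my $double  = "Hello $name!";     # $name is replaced by its value
my $escaped = "Hello \$name!";    # \$ gives a literal dollar sign
my $single  = 'Hello $name!';     # no replacement inside single quotes

print "$double\n";     # Hello Larry Wall!
print "$escaped\n";    # Hello $name!
print "$single\n";     # Hello $name!
```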

Array Variables

An array is an ordered collection of scalars. Array names begin with the @ character:

    @data = ('Joe', 39, "Test data", 49.3);

Like scalars, they can be declared with my() , as in:

    my @data = ('Joe', 39, "Test data", 49.3);

The data in this array includes strings, an integer, and a floating point number. Unlike many languages, Perl allows the scalars in an array to have different types (they are all really the same data type: scalars). This example shows printing an array variable:

    #! /usr/bin/perl -w
    # file: array1.pl
    use strict;
    my @data = ('Joe', 39, "Test data", 49.3);
    print "Within double quotes: @data\n";
    print "Outside any quotes: ", @data, "\n";
    print 'Within single quotes: @data', "\n";

This is what prints:

    $ ./array1.pl
    Within double quotes: Joe 39 Test data 49.3
    Outside any quotes: Joe39Test data49.3
    Within single quotes: @data

Placing the array within the double quotes inserts spaces between the array elements, while placing it outside does not, and placing it within single quotes prints the array name itself, not the elements thereof.

As in C, Perl arrays start at element 0. For @data, the elements are:

    0: Joe
    1: 39
    2: Test data
    3: 49.3

To access an individual array element, use the $array_name[index] syntax:

    #! /usr/bin/perl -w
    # file: array2.pl
    use strict;
    my @data = ('Joe', 39, 'Test data', 49.3);
    print "element 0: $data[0]\n";
    print "element 1: $data[1]\n";
    print "element 2: $data[2]\n";
    print "element 3: $data[3]\n";

That code produces:

    $ ./array2.pl
    element 0: Joe
    element 1: 39
    element 2: Test data
    element 3: 49.3

A useful array-related variable is $#array_name, the last index of the array. For example:

    #! /usr/bin/perl -w
    # file: array3.pl
    use strict;
    my @data = ('Joe', 39, 'Test data', 49.3);
    print "last index: $#data\n";

That code produces:

    $ ./array3.pl
    last index: 3
Array Functions

Perl has a variety of functions that implement commonly performed array functions, saving you from having to write your own.

push() and pop() The push() and pop() functions modify the right side of an array. The push() function adds elements to the right, and pop() removes the rightmost element:

    #! /usr/bin/perl -w
    # file: pushpop.pl
    use strict;
    my @a = (2, 4, 6, 8);
    print "before: @a\n";
    push(@a, 10, 12);
    print "after push: @a\n";
    my $element = pop(@a);
    print "after pop: @a\n";
    print "element popped: $element\n";

That code changes the array in the following fashion:

    before: 2 4 6 8
    after push: 2 4 6 8 10 12
    after pop: 2 4 6 8 10
    element popped: 12

unshift() and shift() These functions are similar to push() and pop() except that they operate on the left side of the array.

    #! /usr/bin/perl -w
    # file: unshiftshift.pl
    use strict;
    my @a = (2, 4, 6, 8);
    print "before: @a\n";
    unshift(@a, 10, 12);
    print "after unshift: @a\n";
    my $element = shift(@a);
    print "after shift: @a\n";
    print "element shifted: $element\n";

That code alters the array:

    $ ./unshiftshift.pl
    before: 2 4 6 8
    after unshift: 10 12 2 4 6 8
    after shift: 12 2 4 6 8
    element shifted: 10

sort() and reverse() These functions are nondestructive: they return a new list and don't change the arrays or lists passed to them. The sort() function sorts lists (as you might have guessed), comparing the elements as strings by default, and the reverse() function reverses lists (as you might have guessed):

    #! /usr/bin/perl -w
    # file: sortreverse.pl
    use strict;
    my @a = ('hello', 'world', 'good', 'riddance');
    print "before: @a\n";
    my @b = sort(@a);
    print "sorted: @b\n";
    @b = reverse(@a);
    print "reversed: @b\n";

That code produces:

    $ ./sortreverse.pl
    before: hello world good riddance
    sorted: good hello riddance world
    reversed: riddance good world hello
Hash Variables

Hashes (also known as associative arrays) are arrays that are indexed not by a number but by a string. In other words, instead of accessing an array by its position with an index like 0 or 1, we can access a hash with a key like 'name' or 'age'. For instance, data indexed as an array:

    index       value
    -----       -----
    0           Joe
    1           39
    2           555-1212
    3           123 Main St.
    4           Chicago
    5           IL
    6           60601

could instead be indexed as a hash:

    key         value
    -----       -----
    name        Joe
    age         39
    phone       555-1212
    address     123 Main St.
    city        Chicago
    state       IL
    zip         60601

By indexing with a string, instead of a number, we can access the data structure by a more meaningful (at least to us humans) bit of information.

A hash variable is defined to be a collection of zero or more key/value pairs (a hash can be empty, with zero pairs). The keys must be unique strings; the values can be any scalar. A hash variable name begins with the percent sign: %person .

To create a hash, assign to it a list, which is treated as a collection of key/value pairs:

    %person = ('name', 'Joe',
               'age', 39,
               'phone', '555-1212',
               'address', '123 Main St.',
               'city', 'Chicago',
               'state', 'IL',
               'zip', 60601);

Or you can use the => operator to make the code more readable:

    %person = (name    => 'Joe',
               age     => 39,
               phone   => '555-1212',
               address => '123 Main St.',
               city    => 'Chicago',
               state   => 'IL',
               zip     => 60601);

If the key is a simple word made up of letters, digits, and underscores, like the keys in the example above, it does not need to be quoted when the => operator is used.

Accessing a value in a hash is similar to accessing values in an array, except that the hash is indexed with the key string in curly braces instead of a number within square brackets: $person{name}. A simple-word key does not need to be quoted within the curly braces; a key containing whitespace or other special characters does. For example:

    #! /usr/bin/perl -w
    # file: hash1.pl
    use strict;
    my %person = (name    => 'Joe',
                  age     => 39,
                  phone   => '555-1212',
                  address => '123 Main St.',
                  city    => 'Chicago',
                  state   => 'IL',
                  zip     => 60601);
    print "$person{name} lives in $person{state}\n";
    print "$person{name} is $person{age} years old\n";

That code produces:

    $ ./hash1.pl
    Joe lives in IL
    Joe is 39 years old

A hash can be declared with the my() operator just like scalars and arrays, and it must be declared if we use strict;.

Hash Functions

The most common way to process a hash is by looping through its keys, which are obtained by calling the built-in keys() function. For example:

    #! /usr/bin/perl -w
    # file: hash2.pl
    use strict;
    my %person = (name    => 'Joe',
                  age     => 39,
                  phone   => '555-1212',
                  address => '123 Main St.',
                  city    => 'Chicago',
                  state   => 'IL',
                  zip     => 60601);
    my @k = keys(%person);
    print "The keys are: @k\n";

That code produces:

    $ ./hash2.pl
    The keys are: state zip address city phone age name

What's this? The keys() function returns the keys in an apparently random order. They are not in the order created, nor the reverse, nor sorted in any obvious way; the order depends on how Perl hashes and stores the keys internally (and in recent versions of Perl it can even differ from one run of the program to the next). The keys can be sorted by using the (surprise!) sort() function. For example:

    #! /usr/bin/perl -w
    # file: hash3.pl
    use strict;
    my %person = (name    => 'Joe',
                  age     => 39,
                  phone   => '555-1212',
                  address => '123 Main St.',
                  city    => 'Chicago',
                  state   => 'IL',
                  zip     => 60601);
    my @k = keys(%person);
    my @sk = sort(@k);    # or: my @sk = sort(keys(%person));
    print "The sorted keys are: @sk\n";

That code produces:

 $  ./hash3.pl  The sorted keys are: address age city name phone state zip 

4.3.5 Operators

Perl operator precedence, the order in which operations are evaluated, is C-like. It's also junior-high-school-algebra-like: the usual stuff. For exact rules, see one of the recommended books or man perlop. We discuss the operators by category, not in precedence order.


Perl arithmetic is quite C-like, as you might expect:

    +      addition
    -      subtraction
    *      multiplication
    /      division
    %      modulus (remainder)
    **     exponentiation [7]

[7] Strictly speaking, ** is FORTRAN-like, not C-like, though that's C's loss.

For example:

    #! /usr/bin/perl -w
    # file: operators.pl
    use strict;
    my $i = 10;
    my $j = 4;
    print '$i + $j = ', $i + $j, "\n";
    print '$i * $j = ', $i * $j, "\n";
    print '$i / $j = ', $i / $j, "\n";
    print '$i % $j = ', $i % $j, "\n";
    print '$i ** $j = ', $i ** $j, "\n";

gives the results:

    $ ./operators.pl
    $i + $j = 14
    $i * $j = 40
    $i / $j = 2.5
    $i % $j = 2
    $i ** $j = 10000

One difference from C-like arithmetic is that when Perl divides integers, the result is not truncated: 10 / 4 is 2.5, not the 2 that C's integer division would give. That's much more sensible, as is Perl's providing an exponentiation operator. To emulate integer division, use the built-in int() function:

 $integer_value = int($a / $b); 
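Here is a small sketch of the difference (intdiv.pl and the variable names are hypothetical). One detail is easy to miss: int() simply drops the fractional part, so for a negative quotient it rounds toward zero rather than down:

```perl
#!/usr/bin/perl -w
# file: intdiv.pl (hypothetical name for this sketch)
use strict;

my $i = 10;
my $j = 4;
my $quotient      = $i / $j;         # 2.5: Perl does not truncate
my $integer_value = int($i / $j);    # 2: int() drops the fraction
my $negative      = int(-$i / $j);   # -2: int() truncates toward zero
print "$quotient $integer_value $negative\n";   # 2.5 2 -2
```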

As befits its origins, Perl has powerful built-in string processing. Perl has two useful string operators:

    .      string concatenation
    x      string replication

For example:

    #!/usr/bin/perl -w
    # file: string.pl
    use strict;
    my $a = 'hello';
    my $b = 'world';
    my $msg = $a . ' ' . $b;
    print "\$msg : $msg\n";
    $a = 'hi';
    $b = $a x 2;
    my $c = $a x 5;
    print "\$b : $b\n";
    print "\$c : $c\n";

Executing that code gives:

    $ ./string.pl
    $msg : hello world
    $b : hihi
    $c : hihihihihi
True and False

Apart from the undefined value (undef), there are three false values in Perl:

    ''     the empty string
    '0'    the string of the single character 0
    0      the number 0

Every other value is true. So in Perl, the following are all true values:

    1
    10
    'hi'
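A quick way to convince yourself is to test each value in an if statement. In this sketch (truth.pl is a hypothetical file name), note that the two-character string '00' is true: only the exact one-character string '0' is false.

```perl
#!/usr/bin/perl -w
# file: truth.pl (hypothetical name for this sketch)
use strict;

my @values = ('', '0', 0, 1, 10, 'hi', '00');
my $false_count = 0;
foreach my $v (@values) {
    if ($v) {
        print "'$v' is true\n";
    } else {
        print "'$v' is false\n";
        $false_count++;
    }
}
print "$false_count false values\n";   # 3 false values
```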
Numerical Comparison

Perl has the usual numerical comparison operators, with one addition: the compare operator, <=>. All of these, with the exception of the compare operator, evaluate to either true (1) or false (the empty string, not zero!). The following are the operators:

    <      less than
    <=     less than or equal to
    >      greater than
    >=     greater than or equal to
    ==     equal to
    !=     not equal to
    <=>    compare

Here are some examples:

    if ($i < 10) {
        print "$i is less than 10\n";
    }
    if ($j >= 20) {
        print "$j is greater than or equal to 20\n";
    }

A common mistake (not just in Perl) is to use the assignment operator (=) when comparing numbers instead of the equality operator (==). The former actually changes one of the values: the statement $a = $b changes $a to be the same as $b, while $a == $b returns a logical true or false but changes neither $a nor $b. Remember that the two operations are different and have different effects.
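This sketch (assign.pl is a hypothetical file name) shows the mistake in action: the == test leaves both variables alone, while the = in the second condition quietly copies $y into $x, and the condition is true simply because 7 is a true value:

```perl
#!/usr/bin/perl -w
# file: assign.pl (hypothetical name for this sketch)
use strict;

my $x = 5;
my $y = 7;
if ($x == $y) {          # comparison: false, nothing changes
    print "never printed\n";
}
print "after ==: x=$x y=$y\n";   # after ==: x=5 y=7
if ($x = $y) {           # assignment: $x becomes 7, and 7 is true
    print "the condition was true\n";
}
print "after =:  x=$x y=$y\n";   # after =:  x=7 y=7
```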

The compare operator, <=>, evaluates to either 1, 0, or -1. The expression:

 $a <=> $b 

evaluates to:

     1     if $a > $b
     0     if $a == $b
    -1     if $a < $b
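A minimal sketch (spaceship.pl is a hypothetical file name) that prints the three possible results of <=>, along with the values that == produces for true and false:

```perl
#!/usr/bin/perl -w
# file: spaceship.pl (hypothetical name for this sketch)
use strict;

my @results = (5 <=> 3, 3 <=> 3, 2 <=> 3);
print "@results\n";                  # 1 0 -1

my $true  = (3 == 3);                # 1
my $false = (3 == 4);                # the empty string, not 0
print "true is '$true', false is '$false'\n";   # true is '1', false is ''
```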
String Comparison

Since Perl scalars can hold either numbers or strings, and we already know how to compare numbers, we also need a way to compare strings: the string comparison operators. These are similar to the numeric operators in that, with the exception of cmp, they evaluate to either true or false:

    lt     less than
    le     less than or equal to
    gt     greater than
    ge     greater than or equal to
    eq     equal to
    ne     not equal to
    cmp    compare

String comparison is an ASCII-betic (similar to but not the same as alphabetic) comparison based on the ASCII values (man ascii) of the characters.

    if ($name gt 'Joe') {
        print "$name comes after Joe alphabetically\n";
    }
    if ($language ne 'Perl') {
        print "Your solution is a 2 step process. Step 1, install Perl.\n";
    }

The string comparison operator, cmp, evaluates to either 1, 0, or -1. The expression:

 $a cmp $b 

evaluates to:

     1     if $a gt $b
     0     if $a eq $b
    -1     if $a lt $b
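The following sketch (stringcmp.pl is a hypothetical file name) shows the three results of cmp, ASCII-betic ordering in action (uppercase letters sort before all lowercase letters), and how comparing '10' and '9' as strings differs from comparing them as numbers:

```perl
#!/usr/bin/perl -w
# file: stringcmp.pl (hypothetical name for this sketch)
use strict;

my @results = ('b' cmp 'a', 'a' cmp 'a', 'a' cmp 'b');
print "@results\n";                  # 1 0 -1

# ASCII-betic: uppercase letters (65-90) come before lowercase (97-122)
my @words  = ('apple', 'Zebra', 'banana', 'Apple');
my @sorted = sort(@words);
print "@sorted\n";                   # Apple Zebra apple banana

# The same two values compared as strings and as numbers
my $as_strings = ('10' cmp '9');     # -1: '1' comes before '9'
my $as_numbers = (10 <=> 9);         # 1
print "$as_strings $as_numbers\n";   # -1 1
```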
Increment and Decrement

Perl has the following autoincrement and autodecrement operators:

    ++     autoincrement (add 1)
    --     autodecrement (subtract 1)

These operators provide a simple way to add 1 to a variable (with ++ ) or subtract 1 from a variable ( -- ). The location of the operator is important. If placed on the left side of the variable, it increments before the variable is used (preincrement); on the right side, it increments after the variable is used (postincrement). For example:

    #! /usr/bin/perl -w
    # file: increment.pl
    use strict;
    my $i = 10;
    my $j = ++$i;
    print "\$i = $i, \$j = $j\n";
    $i = 10;
    $j = $i++;
    print "\$i = $i, \$j = $j\n";

Executing that code gives:

    $ ./increment.pl
    $i = 11, $j = 11
    $i = 11, $j = 10

To illustrate pre- and postincrements, consider the statement:

 $j = ++$i; 

$j is assigned the result of ++$i . This preincrement (since the ++ is on the left side of $i ) causes the variable $i to be incremented first to 11 ; then the value of $i is assigned to $j . The result is that both $i and $j have the value 11 .

Conversely, consider this statement:

 $j = $i++; 

In the postincrement, the ++ is on the right side of $i . $i is incremented after its value is taken. First, $j is assigned the current value of $i , 10 ; then $i is incremented to 11 .

The decrement operator (--) works in a similar manner: if the -- is on the left side, it is a predecrement, subtracting 1 first; if it is on the right side, it is a postdecrement, subtracting 1 after the value is used.
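Here is the increment example rewritten for the decrement operator, as a sketch (decrement.pl and $pre_j are hypothetical names):

```perl
#!/usr/bin/perl -w
# file: decrement.pl (hypothetical name for this sketch)
use strict;

my $i = 10;
my $j = --$i;       # predecrement: $i drops to 9 first, then $j gets 9
print "\$i = $i, \$j = $j\n";   # $i = 9, $j = 9
my $pre_j = $j;     # remember the first result

$i = 10;
$j = $i--;          # postdecrement: $j gets 10 first, then $i drops to 9
print "\$i = $i, \$j = $j\n";   # $i = 9, $j = 10
```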


Perl's logical operators are similar to those of C and C++, with the addition of three very readable ones:

    &&     logical and
    ||     logical or
    !      logical not
    and    logical and
    or     logical or
    not    logical not

You can use either set, or both, depending on your tastes. There is a difference in precedence: the operators and, or, and not have extremely low precedence (they are at the bottom of the precedence table). See man perlop for all the details.

Here are a couple of examples of their use:

    if ($i >= 0 && $i <= 100) {
        print "$i is between 0 and 100\n";
    }
    if ($answer eq 'y' or $answer eq 'Y') {
        print "\$answer equals y or Y\n";
    }

The low precedence of and , or , and not can work in our favor, allowing us to drop parentheses in some cases (we'll see this later when we discuss short-circuited logic operators), but it can cause us problems as well. Here are some gotchas:

    $a = $b && $c;    # like: $a = ($b && $c);    good!
    $a = $b and $c;   # like: ($a = $b) and $c;   oops!
    $a = $b || $c;    # like: $a = ($b || $c);    good!
    $a = $b or $c;    # like: ($a = $b) or $c;    oops!
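To see the gotchas for yourself, here is a sketch (lowprec.pl is a hypothetical file name) that runs the four lines with one true and one false value. It uses the warnings pragma only to silence the void-context warning that -w would otherwise print for the two oops! lines, whose right-hand operands are evaluated and then thrown away:

```perl
#!/usr/bin/perl -w
# file: lowprec.pl (hypothetical name for this sketch)
use strict;
no warnings 'void';   # the oops! lines discard a value on purpose

my ($t, $f) = (1, 0);     # one true value, one false value
my ($w, $x, $y, $z);

$w = $t && $f;    # $w = (1 && 0)    gives 0
$x = $t and $f;   # ($x = 1) and 0   gives 1   oops!
$y = $f || $t;    # $y = (0 || 1)    gives 1
$z = $f or $t;    # ($z = 0) or 1    gives 0   oops!

print "w=$w x=$x y=$y z=$z\n";   # w=0 x=1 y=1 z=0
```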

4.3.6 Flow-Control Constructs

Flow-control constructs are the familiar if, else, while, etc. Do this if this condition is met, do that if not: flow control. An important thing about these constructs in Perl, unlike C, is that the curly braces, {}, are required, even around a single statement. Omitting them is a syntax error.

Conditional Constructs

The if statement is the most common conditional construct. The syntax for it is:

    if (condition) {
        statements
    } else {
        statements
    }

If the condition in parentheses evaluates as true, the first set of statements is executed; otherwise, the second set of statements is executed. The else part is optional. The condition can be any expression that evaluates to true or false, such as $a < $b, or far more complicated things.

To test several conditions in turn, chain them with elsif (note the spelling):

    if (condition1) {
        statements
    } elsif (condition2) {
        statements
    } else {
        statements
    }

Here is an example of the if statement:

    if ($answer eq 'yes') {
        print "The answer is yes\n";
    } else {
        print "The answer is NOT yes\n";
    }
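An if/elsif/else chain tests its conditions from top to bottom and runs only the first block whose condition is true. A sketch (grade.pl, $score, and the grade cutoffs are all made up for illustration):

```perl
#!/usr/bin/perl -w
# file: grade.pl (hypothetical name for this sketch)
use strict;

my $score = 72;          # a made-up value for illustration
my $grade;
if ($score >= 90) {
    $grade = 'A';
} elsif ($score >= 80) {
    $grade = 'B';
} elsif ($score >= 70) {
    $grade = 'C';        # first true condition, so this block runs
} else {
    $grade = 'F';
}
print "a score of $score earns a $grade\n";   # a score of 72 earns a C
```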
Looping Constructs

The while Loop A common looping construct, it has the syntax:

    while (condition) {
        statements
    }

The condition is tested, and if true, the statements are executed; then the condition is tested, and if true, the statements are executed; then the condition is tested ... You get the picture. The loop terminates when the condition is false. For example:

    my $i = 1;
    while ($i <= 5) {
        print "the value is: $i\n";
        $i++;
    }

Executing that code produces:

    the value is: 1
    the value is: 2
    the value is: 3
    the value is: 4
    the value is: 5

One might process an array variable (keep in mind that $#names is the last index of the array @names ):

    #! /usr/bin/perl -w
    # file: whilearray.pl
    use strict;
    my @names = ('Joe', 'Charlie', 'Sue', 'Mary');
    my $i = 0;
    while ($i <= $#names) {
        print "Element $i is $names[$i]\n";
        $i++;
    }

That code produces:

    $ ./whilearray.pl
    Element 0 is Joe
    Element 1 is Charlie
    Element 2 is Sue
    Element 3 is Mary

Or one might process a hash sorted by its keys:

    #! /usr/bin/perl -w
    # file: whilehash.pl
    use strict;
    my %capitals = (Illinois   => 'Springfield',
                    California => 'Sacramento',
                    Texas      => 'Austin',
                    Wisconsin  => 'Madison',
                    Michigan   => 'Lansing');
    my @sk = sort(keys(%capitals));
    my $i = 0;
    while ($i <= $#sk) {
        print "$sk[$i]:  \t$capitals{$sk[$i]}\n";
        $i++;
    }

The \t is the tab character. That code produces:

    $ ./whilehash.pl
    California:     Sacramento
    Illinois:       Springfield
    Michigan:       Lansing
    Texas:          Austin
    Wisconsin:      Madison

The for Loop This loop has the syntax:

    for (init_expression; condition; step_expression) {
        statements
    }

The init_expression is executed first and only once. Then the condition is tested to be true or false. If true, the statements are executed, then the step_expression is executed, and then the condition is tested. The loop stops when the condition evaluates as false.

Our earlier while loop example could have been written to process @names as:

    for (my $i = 0; $i <= $#names; $i++) {
        print "Element $i is $names[$i]\n";
    }

That gives the same result, and the %capitals example could be written using the for as:

    for (my $i = 0; $i <= $#sk; $i++) {
        print "$sk[$i]:  \t$capitals{$sk[$i]}\n";
    }

There's more than one way to do it.

The foreach Loop Perl has a nifty loop used to process lists and arrays called the foreach loop. It has this syntax:

    foreach scalar_variable (list) {
        statements
    }

The statements are executed for each item in the list (hence the name), meaning we don't have to know how long the list is or check to see whether the end has been reached. For example:

    #! /usr/bin/perl -w
    # file: foreach1.pl
    use strict;
    my @a = (2, 4, 6, 8);
    foreach my $i (@a) {
        print $i, ' ** ', $i, ' = ', $i ** $i, "\n";
    }

That code produces:

    $ ./foreach1.pl
    2 ** 2 = 4
    4 ** 4 = 256
    6 ** 6 = 46656
    8 ** 8 = 16777216

Did you see the way $i was declared with my() within the foreach construct?

This example shows the control variable being modified. Because the control variable is an alias for each array element in turn, modifying it modifies the array element itself:

    #!/usr/bin/perl -w
    # file: foreach2.pl
    use strict;
    my @a = (2, 4, 6, 8);
    foreach my $i (@a) {
        $i = $i * 2;
    }
    print "@a\n";

This code produces:

    $ ./foreach2.pl
    4 8 12 16

As strange as this may sound, the keywords for and foreach are interchangeable. We can use for in place of foreach (which is not uncommon) and foreach in place of for (which is rare). So don't be surprised if you see code that resembles for (@a) { ... }.
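For instance, the sorted %capitals loop from the while example becomes much shorter with for used in place of foreach, since there is no index to manage. A sketch (forhash.pl is a hypothetical file name; the @lines array exists only so the results can be checked):

```perl
#!/usr/bin/perl -w
# file: forhash.pl (hypothetical name for this sketch)
use strict;

my %capitals = (Illinois   => 'Springfield',
                California => 'Sacramento',
                Texas      => 'Austin',
                Wisconsin  => 'Madison',
                Michigan   => 'Lansing');
my @lines;
for my $state (sort(keys(%capitals))) {     # 'for' used as 'foreach'
    push(@lines, "$state: $capitals{$state}");
    print "$state:  \t$capitals{$state}\n";
}
```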

Other Constructs

There are more control flow constructs in Perl, including unless and until , which we will discuss later.

4.3.7 Regular Expressions

Regular expressions (regexes) are a powerful, strange, beautiful, and complex part of Perl. These can match a string against a general pattern, perform string substitution, and extract text from a string. This sounds simple, but within this simple notion, a world of complexity lies. Much of Perl's power and usefulness derives from regular expressions. Far more can be done with them than we can describe here (the Camel Book [Wall+ 00] devotes many pages to them), but we will try to explain the useful basics in just a few.

To match a string, use the following syntax:

 $string_variable =~ /pat/ 

That is =~ (equal-tilde), not = ~ (equal-space-tilde).

This returns true if the variable $string_variable matches the pattern pat , false if it does not. For a string to match a pattern, only a portion of the string needs to match, not the entire string. For example:

    if ($name =~ /John/) {
        print "$name matches 'John'\n";
    }

If the variable $name matches the pattern John, the if condition is true and the message is printed. In order for $name to match the regex /John/, only a portion of the string needs to match. For instance, all the following values for $name would match the regex /John/:

    John
    John Lennon
    Andrew Johnson
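This sketch (partial.pl is a hypothetical file name) runs the pattern against a few names. Note that matching is case-sensitive, so 'john' does not match /John/:

```perl
#!/usr/bin/perl -w
# file: partial.pl (hypothetical name for this sketch)
use strict;

my @names = ('John', 'John Lennon', 'Andrew Johnson', 'Jon Bon Jovi', 'john');
my @matched;
foreach my $name (@names) {
    if ($name =~ /John/) {       # only part of $name has to match
        push(@matched, $name);
    }
}
print "matched: @matched\n";   # matched: John John Lennon Andrew Johnson
```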

When we create regexes, we abstract patterns to be compared with the text. Let's examine the rules of patterns.

Most characters match themselves:

    a        matches a
    b        matches b
    ab       matches a immediately followed by b
    abc      matches a immediately followed by b immediately followed by c

Some characters have a special meaning:

    .        matches any character except newline (\n)
    ^        matches the beginning of the string
    $        matches the end of the string

For example:

    a.b      match a followed by any character but \n followed by b
    ^abc     match a string that begins with abc
    abc$     match a string that ends with abc, such as "I learned my abc"
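A small sketch that tries each special character (anchors.pl is a hypothetical file name). The blocks that would print "never printed" show the cases that fail: . does not match a newline, and /abc$/ does not match when abc is not at the end:

```perl
#!/usr/bin/perl -w
# file: anchors.pl (hypothetical name for this sketch)
use strict;

my $count = 0;
if ('axb' =~ /a.b/) {
    print "a.b matches 'axb'\n";
    $count++;
}
if ("a\nb" =~ /a.b/) {
    print "never printed: . does not match a newline\n";
}
if ('abcdef' =~ /^abc/) {
    print "^abc matches 'abcdef'\n";
    $count++;
}
if ('I learned my abc' =~ /abc$/) {
    print "abc\$ matches 'I learned my abc'\n";
    $count++;
}
if ('abc is easy' =~ /abc$/) {
    print "never printed: abc is not at the end\n";
}
print "$count matches\n";   # 3 matches
```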

Character Classes

Square brackets, [], create a character class, which matches any single character in that class. A dash (-) can be used to specify a range:

    [abcd]        one of either a or b or c or d
    [a-d]         the same
    [0-9]         one digit character
    [a-zA-Z]      one alpha character, upper- or lowercase

If the caret character (^) is the first character in the class, any single character that is not in the class listed will be matched:

    [^abcd]       any character except a or b or c or d
    [^a-d]        the same
    [^0-9]        one nondigit character
    [^a-zA-Z]     one nonalpha character
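Each of these classes matches exactly one character. This sketch (classes.pl is a hypothetical file name) anchors each pattern with ^ and $ so that it is compared against a one-character string as a whole:

```perl
#!/usr/bin/perl -w
# file: classes.pl (hypothetical name for this sketch)
use strict;

my $hits = 0;
if ('b' =~ /^[abcd]$/)   { $hits++ }   # b is in the class
if ('b' =~ /^[a-d]$/)    { $hits++ }   # same class, written as a range
if ('e' =~ /^[^a-d]$/)   { $hits++ }   # e is not in a-d, so [^a-d] matches
if ('Q' =~ /^[a-zA-Z]$/) { $hits++ }   # Q is alphabetic
if ('7' =~ /^[^0-9]$/)   { print "never printed: 7 is a digit\n" }
print "$hits of 4 expected matches succeeded\n";
```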

As an example of the previous concepts, let's say we want to match a phone number in the following format: 800 867 5309 . This is a regex to match any number of this format:

 /[0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]/ 

This is read as one digit followed by one digit, followed by one digit, followed by one space (the spacebar), followed by one digit, followed by ... This is one place where whitespace does count!
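Since only a portion of the string needs to match, this pattern also finds a phone number embedded in other text, but it rejects dashes, because the pattern asks for literal spaces. A sketch with made-up candidate strings (phone1.pl is a hypothetical file name):

```perl
#!/usr/bin/perl -w
# file: phone1.pl (hypothetical name for this sketch)
use strict;

my @candidates = ('800 867 5309', '800-867-5309',
                  'call 312 555 1212 now', '80 867 5309');
my @matched;
foreach my $c (@candidates) {
    if ($c =~ /[0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]/) {
        push(@matched, $c);
    }
}
foreach my $m (@matched) {
    print "matched: $m\n";
}
```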

Some character classes occur so frequently that Perl provides predefined shorthands for them:

    \d     digit character [0-9]
    \D     nondigit character [^0-9]
    \w     word character [a-zA-Z0-9_]
    \W     nonword character [^a-zA-Z0-9_]
    \s     space character [ \r\n\t\f]
    \S     nonspace character [^ \r\n\t\f]

The phone-number-matching regex could be rewritten:

    /\d\d\d\s\d\d\d\s\d\d\d\d/

That is, three digits followed by one space character, followed by three digits, followed by one space character, followed by four digits.
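One practical difference: \s matches any whitespace character, including a tab, while the bracketed version of the pattern required a literal space. A sketch (phone2.pl and the sample number are made up):

```perl
#!/usr/bin/perl -w
# file: phone2.pl (hypothetical name for this sketch)
use strict;

my $number = "800\t867 5309";     # a tab, then a space, between the groups
my ($with_s, $with_space) = (0, 0);
if ($number =~ /\d\d\d\s\d\d\d\s\d\d\d\d/) {
    $with_s = 1;                  # \s accepts the tab
}
if ($number =~ /[0-9][0-9][0-9] [0-9][0-9][0-9] [0-9][0-9][0-9][0-9]/) {
    $with_space = 1;              # a literal space does not
}
print "with \\s: $with_s, with a literal space: $with_space\n";
```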


Quantifiers express quantity: how many of an element are there? How does one match multiple instances? It's awkward to repeat [0-9] or \d several times in a row, as we did earlier. Quantifiers allow us to clean up this sort of thing. These are the Perl quantifiers:

    *        0 or more
    +        1 or more
    ?        0 or 1
    {m}      exactly m
    {m,}     m or more
    {m,n}    m through n, inclusive

Quantifiers operate on the element immediately to the left of the quantifier (parentheses can be used to apply precedence):

    xy*z      match x followed by 0 or more y followed by z
    (xy)*z    match 0 or more xy followed by z

For example:

    xy*z       match x followed by 0 or more y followed by z
    xy+z       match x followed by 1 or more y followed by z
    xy?z       match x followed by 0 or 1 y followed by z
    xy{3}z     match x followed by exactly 3 y followed by z
    xyz{2}y    match x followed by y followed by exactly 2 z followed by y
    xy{3,}z    match x followed by 3 or more y followed by z
    xy{3,5}z   match x followed by 3 to 5 y followed by z

So the phone-number-matching regex can be rewritten yet again:

    /\d{3}\s\d{3}\s\d{4}/

And even as this:

    /(\d{3}\s){2}\d{4}/

One could argue that the previous code is approaching being unreadable. [8] Ain't Perl fun?

[8] On the old-school theory of "It was hard to write, it should be hard to read."
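To check the quantifier rules, the sketch below (quantifiers.pl and %summary are hypothetical names) tests a few strings against xy*z, xy+z, xy?z, xy{3}z, and xy{3,5}z. Each pattern is anchored with ^ and $ so the whole string must match, which makes the results easy to read:

```perl
#!/usr/bin/perl -w
# file: quantifiers.pl (hypothetical name for this sketch)
use strict;

my %summary;
foreach my $s ('xz', 'xyz', 'xyyyz', 'xyyyyyyz') {
    my @hits;
    # ^ and $ force each pattern to match the whole string
    if ($s =~ /^xy*z$/)     { push(@hits, 'xy*z') }
    if ($s =~ /^xy+z$/)     { push(@hits, 'xy+z') }
    if ($s =~ /^xy?z$/)     { push(@hits, 'xy?z') }
    if ($s =~ /^xy{3}z$/)   { push(@hits, 'xy{3}z') }
    if ($s =~ /^xy{3,5}z$/) { push(@hits, 'xy{3,5}z') }
    $summary{$s} = "@hits";
    print "$s: @hits\n";
}
```

The string with six y's fails both {3} and {3,5}, because the anchors leave no room for extra y's.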


Perl has a mechanism that allows it to remember patterns previously matched. Expressions that have been matched by a regex within a set of parentheses are stored in the special variables $1 , $2 , $3 , and so on. [9]

[9] This memory function can also help with matching recurring text by using the related variables \1, \2, and \3; see the Camel Book [Wall+ 00] for details.

When one set of parentheses is used, the text within it is stored in $1 :

    #!/usr/bin/perl -w
    # file: memory1.pl
    use strict;
    my $name = 'John Doe';
    if ($name =~ /^(..)/) {
        print '^(..)  : ', $1, "\n";
    }
    if ($name =~ /^(\w+)/) {
        print '^(\w+) : ', $1, "\n";
    }
    if ($name =~ /(\w+)$/) {
        print '(\w+)$ : ', $1, "\n";
    }

Executing that code produces:

    $ ./memory1.pl
    ^(..) : Jo
    ^(\w+) : John
    (\w+)$ : Doe

If more than one set of parentheses is used, the first set will remember its characters in $1 , the second set in $2 , the third set in $3 , and so on.

For instance, you could have a file full of records with three fields, containing an account number, name, and phone number, and write a regex that would go through the record and print each field separately:

    #!/usr/bin/perl -w
    # file: memory2.pl
    use strict;
    # create a record, 3 fields, colon-separated
    my $record = '32451:John Doe:847 555 1212';
    # see if the record contains:
    #   beginning of the string
    #   followed by 0 or more of any character but \n
    #     (remembered into $1)
    #   followed by a colon (':')
    #   followed by 0 or more of any character but \n
    #     (remembered into $2)
    #   followed by a colon (':')
    #   followed by 0 or more of any character but \n
    #     (remembered into $3)
    #   followed by the end of the string
    if ($record =~ /^(.*):(.*):(.*)$/) {
        print "The record is:\n";
        print "    account number: $1\n";
        print "    name:           $2\n";
        print "    phone number:   $3\n";
    }

Executing that code produces:

    $ ./memory2.pl
    The record is:
        account number: 32451
        name:           John Doe
        phone number:   847 555 1212

This example holds only one record, but the same regex could process a whole file of records, one line at a time. One could also imagine a regex that matched "John Doe", his phone number, or his account number.
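Handling more than one record only takes a loop around the same regex. The following is a sketch that assumes the records are held in an array. A real script would more likely read them from a file line by line. The records and the @names array are hypothetical:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical records in the same account:name:phone format.
my @records = (
    '32451:John Doe:847 555 1212',
    '32452:Jane Roe:847 555 3434',
    'not a valid record',
);

my @names;    # collects the name field of every well-formed record
foreach my $record (@records) {
    if ($record =~ /^(.*):(.*):(.*)$/) {
        print "account $1 belongs to $2 ($3)\n";
        push @names, $2;
    } else {
        print "skipping malformed record: $record\n";
    }
}
```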

Regular expressions are incredibly useful: now you can write a script that goes through your old e-mail files, picking out phone numbers. To make it robust, you'd have to figure out how to disregard parentheses, dashes, and dots, but that isn't hard to do. Or you could write a script to scan your Apache logs for a certain IP address.
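Here is a rough sketch of such a phone-number picker. The optional \(? and \)? absorb parentheses around the area code. The optional [\s.-] absorbs a single space, dot, or dash between the groups. The sample text is invented. The pattern is also deliberately loose: it accepts 8475551212 or an unbalanced parenthesis, for example. Treat it as a starting point rather than a validator.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical text; a real script would read it from a mail file.
my $text = "Call (847) 555-1212 or 847.555.1313; fax 847 555 1414.";

my @phones;
# \(?      optional opening parenthesis
# \)?      optional closing parenthesis
# [\s.-]?  optional single separator: whitespace, dot, or dash
# /g in a while loop finds one match per iteration, left to right
while ($text =~ /\(?(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})/g) {
    push @phones, "($1) $2-$3";   # normalize to one format
}
foreach my $phone (@phones) {
    print "$phone\n";
}
```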

You can also apply these same principles to the Unix commands grep , sed , and awk and their variants, which also use regular expressions with minor differences. This knowledge adds a great deal of utility to system administration and just general user coolness.

4.3.8 Functions

User-defined functions are created as follows:

    sub function_name {
        # body
    }

In Perl, the terms function and subroutine are used interchangeably. Some programmers like to call them functions, others prefer subroutines. [10] We like the term function. TMTOWTDI. To invoke the function, do this:

[10] Some say a function is a subroutine that returns a value, but this is not a firm definition.

    function_name();
For example:

    #! /usr/bin/perl -w
    # file: function1.pl
    sub say_hello {
        print "hello, world!\n";
    }
    say_hello();

Executing that program produces:

    $ ./function1.pl
    hello, world!

By convention, functions are defined above the point where they are first called, but they can also be defined after it:

    #! /usr/bin/perl -w
    # file: function2.pl
    say_hello();
    sub say_hello {
        print "hello, world\n";
    }

We usually prefer to define functions before they are used, partly because that allows the lazy way of calling them, without parentheses: say_hello;. However, most Perl programmers advise against the lazy form, so most of us would suggest always using the parentheses. So, before or after: no difference. Still, we put them before. TMTOWTDI.

Return Values

All Perl functions return the last value evaluated by the function. The following example shows two functions returning values differently. The function test1() returns a scalar value, and the function test2() returns a list:

    #!/usr/bin/perl -w
    # file: return1.pl
    sub test1 {
        $a = 10;
        $b = 11;
        # return the sum
        $a + $b;
    }
    sub test2 {
        @a = ('testing', 'one', 'two', 'three');
        # return the sort of @a
        sort(@a);
    }
    $c = test1();
    print "\$c = $c\n";
    @b = test2();
    print "\@b = @b\n";

Executing that code produces:

    $ ./return1.pl
    $c = 21
    @b = one testing three two

Perl also has the return operator, which leaves the function immediately with the given value. It can be used to exit gracefully from the middle of a block, or simply for readability, which is a Good Thing.

    #!/usr/bin/perl -w
    # file: return2.pl
    sub pick_a_restaurant {
        if ($cash > 150.00) {
            return 'Chez Paul';
        } elsif ($cash > 50.00) {
            return "Pete Miller's Steak House";
        } elsif ($cash > 10.00) {
            return "Pancakes R Us";
        } else {
            return 'Fast Food Delight';
        }
    }
    # we are a little light today...
    $cash = 10.00;
    print 'You should eat at: ', pick_a_restaurant(), "\n";

Executing that code produces:

    $ ./return2.pl
    You should eat at: Fast Food Delight

The variable $cash used within pick_a_restaurant() is global because all variables in a function are, by default, global. Using global variables is something one should not do without intent, so we need to learn how to use local variables.

Local Variables

To create local variables in a function, use my() :

    #! /usr/bin/perl -w
    # file: function3.pl
    use strict;
    sub print_my_i {
        my $i = 10;
        print "in print_my_i(): $i\n";
    }
    my $i = 20;
    print "outside print_my_i(): $i\n";
    print_my_i();
    print "outside print_my_i(): $i\n";

With use strict; in effect, every variable must be declared before it is used, normally with my(), so variables inside a function cannot silently become globals: another good reason to do a use strict;.

Executing that code produces the following result. The $i within print_my_i() is lexically scoped to the function and does not change the value of the $i declared outside it.

    $ ./function3.pl
    outside print_my_i(): 20
    in print_my_i(): 10
    outside print_my_i(): 20
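The difference between a global and a my() variable is easy to demonstrate. In this hypothetical sketch, set_global() assigns to an undeclared $count, which makes it global. set_lexical() creates its own my $count instead. The sketch deliberately leaves out use strict, because strict would reject the undeclared variable:

```perl
#!/usr/bin/perl -w
# no "use strict" here on purpose: it would reject the undeclared $count

sub set_global  { $count = 99; }       # no my(): changes the global $count
sub set_lexical { my $count = 42; }    # my(): a new $count private to the sub

$count = 1;
set_lexical();
print "after set_lexical(): $count\n"; # still 1
set_global();
print "after set_global():  $count\n"; # now 99
```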
Function Arguments

Arguments are passed into functions through the special array @_ :

    #! /usr/bin/perl -w
    # file: printargs1.pl
    use strict;
    sub print_args {
        my $i = 0;
        # loop through @_ and print each element
        # and yes, that is $#_, the last index of @_
        while ($i <= $#_) {
            print "arg $i: $_[$i]\n";
            $i++;
        }
    }
    # some variables to pass in
    my $num = 10;
    my $name = 'Joe';
    print_args($num, $name, 3.14159, 'hello, world!');

Executing that code produces:

    $ ./printargs1.pl
    arg 0: 10
    arg 1: Joe
    arg 2: 3.14159
    arg 3: hello, world!
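Walking @_ by index works. A foreach loop over @_ is a common alternative that needs no $#_ bookkeeping. The sketch below produces the same lines as printargs1.pl. Returning the lines as well as printing them is an addition, made only so that the result can be checked:

```perl
#!/usr/bin/perl -w
use strict;

sub print_args {
    my @lines;
    my $i = 0;                  # kept only to number the lines
    foreach my $arg (@_) {      # $arg takes each argument in turn
        push @lines, "arg $i: $arg\n";
        $i++;
    }
    print @lines;
    return @lines;              # also returned, so callers can inspect it
}

print_args(10, 'Joe', 3.14159, 'hello, world!');
```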

Arguments are often copied into my variables like this:

    #! /usr/bin/perl -w
    # file: printargs2.pl
    use strict;
    sub print_args_2 {
        # copy the arguments into $a, $b and $c
        my($a, $b, $c) = @_;
        # print the arguments
        print "\$a is: $a\n";
        print "\$b is: $b\n";
        print "\$c is: $c\n";
    }
    my $num = 10;
    my $name = 'Joe';
    print_args_2($num, $name, 3.14159);

Executing that code produces:

    $ ./printargs2.pl
    $a is: 10
    $b is: Joe
    $c is: 3.14159

Next, we show an example of a program with two functions. The first, munge_phone(), checks whether a string contains a phone number in this format: 847 555 1212. If it does, the function extracts the three groups of digits and returns the number in the format (847) 555-1212. If the phone number does not match the required format, the function returns a message to that effect instead.

The second function, matches_class(), takes two arguments: the first is the string to be matched, and the second is the character class to match it against. If the string consists entirely of characters from the class, the function prints that it matches; if not, it prints that it does not match.

    #!/usr/bin/perl -w
    # file: function4.pl
    use strict;
    sub munge_phone {
        my($phone) = @_;
        # check to see that the phone number is
        # in the form 847 555 1212
        if ($phone =~ /(\d{3})\s(\d{3})\s(\d{4})/) {
            return "($1) $2-$3";
        } else {
            return 'Phone improperly formed';
        }
    }
    sub matches_class {
        my($str, $char_class) = @_;
        if ($str =~ /^[$char_class]+$/) {
            print "[$str] matches /[$char_class]/\n";
        } else {
            print "[$str] does not match /[$char_class]/\n";
        }
    }
    # let's try a properly formed phone number
    my $p1 = 'phone number: 847 555 1212';
    my $p2 = munge_phone($p1);
    print "before: $p1   after: $p2\n";
    # now let's try an improperly formed phone number
    my $p3 = '847 555-1212';
    my $p4 = munge_phone($p3);
    print "before: $p3   after: $p4\n";
    # let's check some strings against a class
    my $s1 = 'A string of only alphas and space';
    my $s2 = 'A string with 1 digit';
    my $c = 'a-zA-Z ';
    matches_class($s1, $c);
    matches_class($s2, $c);

When that program is executed, this is the result:

    $ ./function4.pl
    before: phone number: 847 555 1212   after: (847) 555-1212
    before: 847 555-1212   after: Phone improperly formed
    [A string of only alphas and space] matches /[a-zA-Z ]/
    [A string with 1 digit] does not match /[a-zA-Z ]/
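Because matches_class() prints its verdict, the caller cannot act on the result. Below is a hypothetical variant, is_all_class(), which is not from the book. It returns 1 or 0 instead and leaves the printing to the caller. Note that $char_class is interpolated into the regex as is, so characters such as ] or \ in it change the meaning of the class:

```perl
#!/usr/bin/perl -w
use strict;

# Return 1 if every character of $str belongs to $char_class, else 0.
# An empty string returns 0, because + requires at least one character.
sub is_all_class {
    my ($str, $char_class) = @_;
    return $str =~ /^[$char_class]+$/ ? 1 : 0;
}

foreach my $s ('A string of only alphas and space', 'A string with 1 digit') {
    if (is_all_class($s, 'a-zA-Z ')) {
        print "[$s] matches\n";
    } else {
        print "[$s] does not match\n";
    }
}
```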

4.3.9 File I/O

Since the Web is interactive, Perl scripts that don't interact with the outside world are of limited use. Scripts have to be able to accept input and write output. This is where file input/output, or file I/O, comes in.

Standard Output

We have already been sending text to standard output (probably right on the terminal on the monitor you're looking at now) using the print() function:

 print "hello, world\n"; 
Standard Input

You can imagine how it might be useful to be able to read data from the outside world. This can be done using standard input, <STDIN> , which is typically the keyboard, though it can be redirected to be a file or something else.

To read one line of input, up to and including the newline character \n, into a scalar variable, all you need is $line = <STDIN>;.

For example, this program reads the next line of input into $name, including the newline character:

    #! /usr/bin/perl -w
    # file: stdin1.pl
    use strict;
    print "Enter your name: ";
    my $name = <STDIN>;
    print "Hello $name!\n";

Executing that program produces:

    $ ./stdin1.pl
    Enter your name: J. Random Luser
    Hello J. Random Luser
    !

The exclamation point (!) appeared on the next line after the name. That is because <STDIN> reads up to and including the newline character, so $name contains the text "J. Random Luser\n". To remove the newline character, chomp() it:

    $name = <STDIN>;
    chomp($name);

or, more concisely:

    chomp($name = <STDIN>);

For example:

    #! /usr/bin/perl -w
    # file: stdin2.pl
    use strict;
    my $name;
    print "Enter your name: ";
    chomp($name = <STDIN>);
    print "Hello $name!\n";

Executing that code produces:

    $ ./stdin2.pl
    Enter your name: J. Random Luser
    Hello J. Random Luser!

You can also read the input into an array variable:

 @all_lines = <STDIN>; 

That reads all remaining lines of standard input into the array @all_lines: because the assignment is to an array, <STDIN> is evaluated in list context and returns every line up to end of file. Each line of input (including the newline character) is now an element of @all_lines. This behavior, reading until the end of file in one statement, is so important that it has a special term: a STDIN slurp.

    #! /usr/bin/perl -w
    # file: stdin3.pl
    use strict;
    print "Enter your text: \n";
    my @all_lines = <STDIN>;
    my $i = 0;
    while ($i <= $#all_lines) {
        print "line $i: $all_lines[$i]";
        $i++;
    }

That code produces:

    $ ./stdin3.pl
    Enter your text:
    hello
    world
    good bye
    ^D
    line 0: hello
    line 1: world
    line 2: good bye

The ^D character (Ctrl+D) is the end-of-file character for standard input. Also, each line of text still contains its newline character, so the print statement inside the while loop does not need to add one. The text can be read, and the newline removed from every line, by passing the array variable as the argument to chomp():

    @all_lines = <STDIN>;
    chomp(@all_lines);

or, more concisely:

    chomp(@all_lines = <STDIN>);

Perl programmers often slurp and chomp() but otherwise have good manners.
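You can try the slurp-and-chomp idiom without typing anything at the keyboard. This sketch assumes Perl 5.8 or later, which can open a string as if it were a file. The in-memory handle $in stands in for STDIN; it is opened with open(), which the next section covers. The input text is made up:

```perl
#!/usr/bin/perl -w
use strict;

# Assumption: an in-memory string stands in for the keyboard so the sketch
# runs unattended; with real input, read <STDIN> instead of <$in>.
my $fake_input = "hello\nworld\ngood bye\n";
open(my $in, '<', \$fake_input) or die "Can't open in-memory input: $!";

chomp(my @all_lines = <$in>);   # slurp every line, then strip each newline
close($in);

my $i = 0;
foreach my $line (@all_lines) {
    print "line $i: $line\n";   # add the newline back, since chomp() removed it
    $i++;
}
```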

Reading from a File

To read from (or write to) a file, you must first open the file. To open a file, use the open() function:

    # this opens test.txt in read mode
    open FH, '<test.txt';

FH is a filehandle. By convention, filehandles are named with uppercase letters. The less-than symbol (<) indicates that the file is to be opened in read mode. Read mode happens to be the default, so if the less-than symbol is omitted, the file will still be opened in read mode. Some consider it good programming practice to open in read mode explicitly (it is more secure; we talk about this in Chapter 7), so we do that in this book, but if you choose to drop the less-than symbol, it will work just the same.

The open() function returns true if successful, false if it fails. If open() returns false, we should handle this as a serious error, and in Perl, when serious errors occur, we die() (serious business, this programming stuff). The following code will die() if the file fails to open: [11]

[11] Death, when programming Perl, is not inherently bad, except when programming CGIs. CGI programs need to die in a more acceptable fashion; we talk about this in Chapter 7.

    if (! open (FH, '<test.txt')) {
        die "Can't open test.txt: $!";
    }

The die() function takes its argument, prints it to standard error output, cleans up, and exits with a nonzero exit status. The funny-looking variable $! contains the system error message that explains why the open failed. Although that code works, it is more Perl-like to write it thus:

    open(FH, '<test.txt') or die "Can't open test.txt: $!";

We will discuss the logic of the or operator in a bit, when we talk about short-circuited evaluation. If you can't wait for that discussion, here is the gist: if the file test.txt can't be opened, the program will die().

    #! /usr/bin/perl -w
    # file: file1.pl
    use strict;
    my $line;
    open (FH, '<test.txt') or die "Can't open test.txt: $!";
    while ($line = <FH>) {
        print "Line is: $line";
    }
    close (FH);

At the end of the program, the file is closed with close(). Boy, this Perl stuff is complicated. It is considered good style to close a file once you are finished with it.

So if test.txt contains:

    goodbye
    cruel
    world
    Perl is cool

and you operate on it with file1.pl , you get:

    $ ./file1.pl
    Line is: goodbye
    Line is: cruel
    Line is: world
    Line is: Perl is cool
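Newer Perl code often uses a three-argument open() with a lexical (my) filehandle instead of a bareword such as FH. Keeping the mode separate from the file name means an odd file name cannot change the mode. The sketch below reads the four lines of test.txt that way. To stay self-contained, it first writes them to a scratch file created with File::Temp, a core module. The scratch file is an assumption of the sketch and not part of file1.pl:

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Scratch file holding the same four lines as test.txt; it is deleted
# automatically when the program exits (UNLINK => 1).
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
print $tmp_fh "goodbye\ncruel\nworld\nPerl is cool\n";
close($tmp_fh) or die "Can't close $filename: $!";

# Three-argument open: mode and file name are separate arguments, and
# $fh is a lexical filehandle rather than a bareword like FH.
open(my $fh, '<', $filename) or die "Can't open $filename: $!";
my @lines;
while (my $line = <$fh>) {
    chomp($line);
    push @lines, $line;
    print "Line is: $line\n";
}
close($fh);
```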

A complete file can also be read into an array variable:

 @all_lines = <FH>; 

The preceding line of code reads the entire contents of the file into the array @all_lines . Each line of the file is stored as an individual array element, newlines included. This is called a file slurp .

    #! /usr/bin/perl -w
    # file: file2.pl
    use strict;
    open (FH, '<test.txt') or die "Can't open test.txt: $!";
    my @all_lines = <FH>;
    my $i = 0;
    while ($i <= $#all_lines) {
        print "Line is: $all_lines[$i]";
        $i++;
    }
    close (FH);

Executing that code produces:

    $ ./file2.pl
    Line is: goodbye
    Line is: cruel
    Line is: world
    Line is: Perl is cool

The newlines can be removed with chomp() :

 chomp(@all_lines = <FH>); 
Writing to a File

To open a file in write mode, use the open() function:

    open FH, '>output.txt' or die "Can't open output.txt: $!";

The > tells Perl that the file is to be opened in write mode. This will overwrite the file if it exists, so be careful! To write to the file, call print() with the filehandle as an additional first argument:

 print FH "hello, world\n"; 

Yes, that's right: there is no comma after FH. Here is an example:

    #! /usr/bin/perl -w
    # file: file3.pl
    use strict;
    my $line;
    open (FH, '>output.txt') or die "Can't open output.txt: $!";
    while ($line = <STDIN>) {
        print FH "You entered: $line";
    }
    close (FH);

This program loops through standard input, writing the line entered to the file output.txt prepended with the text "You entered: " . If the program is executed like this:

    $ ./file3.pl
    hello
    world
    good bye
    ^D

the resulting contents of output.txt would be:

    You entered: hello
    You entered: world
    You entered: good bye
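The same write-mode steps can be written with a lexical filehandle. In this sketch, the "typed" lines are hard-coded stand-ins for standard input. Output goes to a File::Temp scratch file instead of output.txt, and the file is read back afterward to show its contents:

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Hypothetical stand-in for the lines typed at the keyboard in file3.pl.
my @typed = ("hello\n", "world\n", "good bye\n");

# Only the name of the scratch file is needed here.
my (undef, $filename) = tempfile(UNLINK => 1);

# '>' creates or truncates the file; note: no comma after $out in print.
open(my $out, '>', $filename) or die "Can't open $filename: $!";
foreach my $line (@typed) {
    print $out "You entered: $line";
}
close($out) or die "Can't close $filename: $!";

# Read the file back to show what was written.
open(my $in, '<', $filename) or die "Can't open $filename: $!";
my @written = <$in>;
close($in);
print @written;
```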

4.3.10 Additional Perl Constructs

Perl allows you to write code that reads a lot like English. It also allows you to write code that's completely unreadable, but that's another issue. [12]

[12] The Perl Journal (www.tpj.com/) conducts the annual Obfuscated Perl Coding Contest: an enlightening, if somewhat frightening, experience.

The unless Statement

The unless is like the if , except the logic is reversed. The basic syntax is:

    unless (condition) {
        statements
    }

The condition is tested, and if it is false, the statements are executed.

Thus, this if statement:

    if (not $done) {
        print "keep working!\n";
    }

can be written as:

    unless ($done) {
        print "keep working!\n";
    }
The until Loop

The until loop is like the while loop, except the logic is reversed. The basic syntax is:

    until (condition) {
        statements
    }

The condition is tested, and if it is false, the statements are executed. Then the condition is tested, and if it is false ...

Thus, this while loop:

    while (not $done) {
        keep_working();
    }

can be rewritten as:

    until ($done) {
        keep_working();
    }
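Here is a small sketch that uses both constructs together. The until loop keeps running while $done is false, and the unless block sets $done once no tasks remain. The task counter is hypothetical:

```perl
#!/usr/bin/perl -w
use strict;

my $tasks_left = 3;   # hypothetical amount of work
my $done = 0;
my @log;

until ($done) {                   # loop while $done is false
    push @log, "working, $tasks_left left";
    $tasks_left--;
    unless ($tasks_left > 0) {    # runs only when the condition is false
        $done = 1;
    }
}
print join("\n", @log), "\n";
```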
Expression Modifiers

Expression modifiers Perl has, also known as the backward (Yoda) form (TMTOWTDI). For instance, the if statement:

    if (condition) {
        statements
    }

can be written backward as:

    statement if condition;

The following if statement:

    if ($hungry) {
        eat();
    }

can be rewritten as:

 eat() if $hungry; 

And this statement:

    if ($error_condition) {
        die "we have an error...";
    }

can be written as:

 die "we have an error..." if $error_condition; 

The unless can be written in backward form as well:

    unless ($happy) {
        complain();
    }

rewritten can be as:

 complain() unless $happy; 

The while and until loops work this way as well:

    while ($tired) {
        sleep();
    }

can be written as:

 sleep() while $tired; 

This until loop:

    until ($done) {
        cook_the_burgers();
    }

can be written as:

 cook_the_burgers() until $done; 
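You can mix the backward forms freely in ordinary code, as long as each body is a single simple statement. Here is a short sketch with made-up variables:

```perl
#!/usr/bin/perl -w
use strict;

my $n = 0;
$n++ while $n < 5;                       # body is the single statement $n++

my $verdict = 'five';
$verdict = 'not five' unless $n == 5;    # skipped, because $n is 5

my @notes;
push @notes, 'n is large' if $n > 3;     # runs, because 5 > 3

print "n=$n verdict=$verdict notes=@notes\n";
```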

Expression modifiers can be used only when the body of the construct is a simple statement. In other words, the body cannot be an if or a while or any other multistatement body. For instance, this is illegal in Perl (but possible in reality):

    print "rebooting..." if $os ne 'Linux' until $end_of_time;
Short-Circuited Logic Operators

The logic operators &&, and, ||, and or are short-circuited: if the result of the operator can be established at any point, the rest of the expression is not evaluated, because it doesn't need to be. With the logical and operator, if the first operand is false, we know that the result of the operator is false (because "false and false" is false and "false and true" is false). Therefore, in expression1 && expression2, if expression1 is false, expression2 is not evaluated because of this short-circuited nature. If expression1 is true, expression2 must be evaluated to determine the result of the && operator. Thus, the && is logically equivalent to the if statement. This code:

    expression1 && expression2

is equivalent to this:

    if (expression1) {
        expression2
    }

Therefore, this if statement:

    # set this variable to true (1) if we want to print
    # debug output, 0 if not
    $debug = 1;
    ...
    if ($debug) {
        print_debug_output();
    }

can be written as:

    $debug && print_debug_output();

or:

    $debug and print_debug_output();

The short-circuitedness of the logical and can be used as an if , but some might question whether that is a Good Thing.
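You can observe the short-circuiting directly by giving each operand a side effect. In this sketch, the hypothetical helper operand() records its label in @called before returning a value, so @called shows which operands Perl actually evaluated:

```perl
#!/usr/bin/perl -w
use strict;

my @called;   # labels of the operands that were actually evaluated

# Hypothetical helper: note that it was called, then return $value.
sub operand {
    my ($label, $value) = @_;
    push @called, $label;
    return $value;
}

operand('A', 0) && operand('B', 1);    # A is false, so B is never called
operand('C', 1) && operand('D', 1);    # C is true, so D must be evaluated
operand('E', 1) and operand('F', 0);   # "and" short-circuits the same way

print "evaluated: @called\n";
```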

The short-circuited nature of the logical or , on the other hand, is widely thought to be a Good Thing. The logical or is similar to the logical and , but negated. If the first operand is true, the result of the or operator is true (because "true or false" is true and "true or true" is true). So in this case:

    expression1 || expression2

if expression1 is true, expression2 is not evaluated, because expression1 short-circuits it. If expression1 is false, expression2 must be evaluated to determine the result of the || operator. Thus, the || is logically equivalent to the unless statement. This code:

    expression1 || expression2

is equivalent to:

    unless (expression1) {
        expression2
    }

So this unless statement:

    unless (open (FH, '<myfile.txt')) {
        die "can't open myfile.txt: $!";
    }

can be written as:

    open (FH, '<myfile.txt') or die "can't open myfile.txt: $!";

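Be careful about the difference in precedence between && and and, as well as the difference between || and or. The && and || operators bind more tightly than the assignment operator =, while and and or bind more loosely: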

    $c = $a && $b;      # good, like $c = ($a && $b);
    $c = $a and $b;     # not so good, like ($c = $a) and $b;
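The precedence difference often causes trouble when supplying a default value. In this sketch, the empty $input is a made-up example. The || version assigns the default. The or version assigns the empty string and then discards the default. Perl warns about that discarded constant, so the sketch switches off the warning for that one statement:

```perl
#!/usr/bin/perl -w
use strict;

my $input = '';   # hypothetical empty value, e.g. a blank form field

# || binds tighter than =, so this means $with_pipes = ($input || 'anonymous')
my $with_pipes = $input || 'anonymous';

# or binds looser than =, so this means ($with_or = $input) or 'anonymous'
my $with_or;
{
    no warnings 'void';   # silence "Useless use of a constant" for this demo
    $with_or = $input or 'anonymous';
}

print "with ||: '$with_pipes'\n";   # the default was assigned
print "with or: '$with_or'\n";      # the empty string was assigned
```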

4.3.11 Making Operating System Calls

Perl includes some convenient functions and operators to perform system calls. We look at only two of these, so see the Camel Book [Wall+ 00] for more.

The system() Function

The system() function takes its argument and executes it as if it were a command typed into the shell. The output of the command goes to the program's standard output.

This code:

    system '/bin/ls';

executes the /bin/ls system command, listing the files in the current directory in sorted order, much as if you had typed that command at the shell. The command's output, here the directory listing, goes to STDOUT just as it would from the shell; system() itself returns a status value, not the output.

The system() function can also be called with a list of arguments. This form is more secure than a single string argument, because Perl then runs the command directly, without a shell, so shell metacharacters (such as * and ;) are not interpreted as anything special. For instance, let's say we have this code:

 system "/bin/ls $user_supplied_input"; 

Imagine if $user_supplied_input had the value ; rm -rf / . This would be like executing:

 system "/bin/ls ; rm -rf /"; 

This can cause a problem. If we instead use system as a list, like this:

 system "/bin/ls", $user_supplied_input; 

then the metacharacter ; is not treated as anything special. In other words, ls receives the entire string ; rm -rf / as a single argument and tries to list a file with that literal name, which almost certainly does not exist.

Use a list of arguments to system() if appropriate.
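Here is a sketch of the list form with a status check. A harmless '; echo INJECTED' stands in for the destructive input above. Because no shell is involved, /bin/ls simply reports that no file of that name exists. system() returns the wait status, which is also stored in $?. The value is 0 on success, and shifting it right by 8 bits gives the command's exit code:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical hostile input; a harmless echo stands in for "rm -rf /".
my $user_supplied_input = '; echo INJECTED';

# List form: no shell runs, so /bin/ls receives the whole string as ONE
# argument and just reports that no such file exists.
my $status = system('/bin/ls', $user_supplied_input);

# 0 means success; otherwise $? >> 8 is the command's own exit code.
if ($status == 0) {
    print "ls succeeded\n";
} else {
    print "ls failed, exit code ", $? >> 8, "\n";
}
```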


Backquotes (``) are similar to the system() function, except that the standard output of the command is brought into the program. For instance:

 $my_working_directory = `/bin/pwd`; 

Now $my_working_directory contains a value that resembles "/home/jrl\n" .


The output ends in a newline character (just as it does when the command is run from the shell). You can remove it with chomp(): chomp($my_working_directory = `/bin/pwd`);
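Backquotes set $? just as system() does. A script can therefore check whether the command succeeded before trusting its output. A minimal sketch:

```perl
#!/usr/bin/perl -w
use strict;

# Capture the command's standard output; chomp() drops the trailing newline.
chomp(my $dir = `/bin/pwd`);

# $? is the command's wait status, as with system(); nonzero means failure.
die "/bin/pwd failed: status $?\n" if $? != 0;

print "working directory: [$dir]\n";
```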

When using system() and backquotes, be sure to use full pathnames (for example, /bin/ls ) instead of relying on the user's PATH variable (for example, ls ). Who knows what the user's PATH is set to? If you want them to execute the official ls , say it explicitly.

Open Source Development with LAMP: Using Linux, Apache, MySQL, Perl, and PHP
ISBN: 020177061X
Year: 2002