Item 36: Enable static and/or run-time checks.

Perl provides for both static and run-time checks for a variety of things. Perl is normally a wide open Big Sky language, seemingly best suited for cowboys who can find their own way, but when you turn on warnings and strict pragmas it becomes a much more civilized tool. It can be downright dogmatic at times.

Of the checks that appear below, I especially recommend use strict (which includes strict vars, strict subs, and strict refs) for general use. Any program that is going to grow to more than 20 lines will probably benefit from use strict.

-w is also a generally useful tool, but it has some annoying limitations at present. I discuss it in more detail below.

One general principle about using the checks that follow: If you plan to use them at all in a program, start using them from the very beginning. Retrofitting use strict and -w to a program can be very difficult, whereas developing from scratch with them turned on is easy.

Static checks: `strict vars` and `strict subs`

Misspellings are an all-too-common source of errors in Perl programs:

 @temp = <FH>;          # Read lines into @temp.
 # ... some intervening code ...
 while (@tmp) { }       # OOPS: meant to use @temp.

It's easy to misspell a variable name in a language that does not require you to declare variables before they are used. If you are a cut-and-paste programmer (I confess that this is my most common technique of code reuse), such mistakes are inevitable. Fortunately, you can use the strict vars pragma to catch and prevent such errors. A Perl program that uses strict vars must either declare all of its variables via my or use vars, or use an explicit package name with them:

use strict vars to require declarations.

 use strict vars;       # Now have to declare variables.
 $x, $y, $z;            # ERROR at compile time.

 use strict 'vars';     # vars is really a string (see below).
 my $x;                 # Declare $x with my.
 use vars qw($y);       # Declare $y with use vars.
 $x, $y, $::z;          # Use explicit main package for $z.

The identifier vars following use strict is really a string argument to the strict module's import method (see Item 42). In some cases you may need to quote it (for example, use strict 'vars' or use strict qw(vars)) to avoid warnings or errors; or, if you prefer, you can quote it all the time.
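The compile-time nature of this check can be seen in a small sketch. The helper name strict_vars_error is hypothetical, and the snippets are compiled with a string eval only so that the error can be captured and printed instead of stopping the program:

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet with strict vars enabled and return
# the compile-time error message, or an empty string if it compiled cleanly.
sub strict_vars_error {
    my ($code) = @_;
    my $ok = eval "use strict 'vars'; $code; 1";
    return $ok ? '' : $@;
}

# The first snippet misspells @temp as @tmp; the second does not.
my $typo  = strict_vars_error('my @temp = (1, 2, 3); my $n = @tmp;');
my $clean = strict_vars_error('my @temp = (1, 2, 3); my $n = @temp;');

print "with typo:    $typo";
print "without typo: ", ($clean eq '' ? "compiled cleanly" : $clean), "\n";
```

The misspelled snippet never runs at all: Perl rejects it while compiling, which is exactly what you want from a check for typos.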

You can turn off strict vars for a portion of a program with no strict vars:

 use strict vars;

 {
   no strict vars;      # strict vars off for this block.
   $pi = 3.1416;        # $pi OK.
 }

 # But $pi must be declared or have explicit package here.
 print "pi = $::pi\n";

There are some cases in which you must declare variables with use vars (or use subs) rather than my:

 use strict vars;
 use vars qw($global1);

 BEGIN {
   $global1 = 3.1416;
   my $global2 = 2.7183;         # my $global2 exists only within this BEGIN block.
 }

 print "global1 = $global1\n";   # OK: declared.
 print "global2 = $global2\n";   # ERROR: $global2 undeclared.

At present, some variables with special uses, like $a and $b, are ignored by strict vars. This behavior may change in the future, so don't rely on it.

When identifiers have no other interpretation, Perl treats them as strings (this is sometimes called poetry mode ^[4]). Such "barewords" are another potential source of errors:

^[4] Poetry mode? What's that, you say? Look up "poetry" in Programming Perl.

 for ($i = 0; $i < 10; $i++) {
   print $a[i];         # OOPS: meant to say $a[$i].
 }

In this example, the subscript i, which should have been $i, is interpreted as the string "i", which then appears to be the number 0; thus the contents of $a[0] are printed ten times. Using strict subs turns off poetry mode and generates errors for inappropriately used identifiers:

 # Or use strict 'subs' (subs is actually a bareword, which is what
 # we're trying to avoid).
 use strict subs;

 for ($i = 0; $i < 10; $i++) {
   print $a[i];         # ERROR: Bareword "i" not allowed.
 }

The strict subs pragma gets along with the sanctioned forms of bareword quoting: alone inside hash key braces, or to the left of an arrow (=>):

 use strict subs;
 $a{name} = 'ok';           # Bareword as hash key is OK.
 $a{-name} = 'ok';          # Also OK.
 %h = (last => 'Smith',     # Bareword left of => is OK.
       first => 'Jon');

Both strict vars and strict subs are easy to get along with. They rarely, if ever, break idiomatic Perl code. All you have to do is declare your variables.
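A short sketch makes the distinction concrete. The helper name strict_subs_error is hypothetical, and the string eval is used only so that the compile-time error can be captured and shown:

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet with strict subs enabled and return
# the compile-time error message, or an empty string if it compiled cleanly.
sub strict_subs_error {
    my ($code) = @_;
    my $ok = eval "use strict 'subs'; $code; 1";
    return $ok ? '' : $@;
}

# A bareword used as an array subscript is rejected ...
my $bad = strict_subs_error('my @a = (10, 20); my $x = $a[i];');

# ... but barewords as hash keys and to the left of => are accepted.
my $good = strict_subs_error(
    'my %h = (last => "Smith"); $h{first} = "Jon";'
);

print "subscript bareword: $bad";
print "hash key barewords: ", ($good eq '' ? "compiled cleanly" : $good), "\n";
```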

Dynamic checks: `strict refs`

The strict refs pragma disables soft references (see Item 30). Soft references aren't often a source of bugs, but they are a somewhat obscure feature that can be accidentally misused. Problems with soft references are usually due to a lack of understanding of the way that ordinary references work. For example, if you are trying to write a data structure to a file and read it back in again, you might misguidedly write the following:

Avoid unintentionally using soft references.

The root of the problem in this example is that you can't read references from a file, nor convert string or numeric types to references in general (see Item 30). It manifests itself in a strange way.

 $a = { H => 1, He => 2, Li => 3, Be => 4 };    # $a is a hash ref.
 open SAVE, ">save";
 print SAVE $a, "\n";       # This writes something like 'HASH(0x9d450)' to the file.
 close SAVE;

 open SAVE, "save";
 chop($a = <SAVE>);         # Sets $a = 'HASH(0x9d450)'.
 print keys %$a;            # Nothing? Of course: no hash there.

In this example, what gets written to the file save is the single line:

 HASH(0x9d450)

Obviously, the data in the anonymous hash isn't there, so there's no hope of ever getting it back. You will see this quickly once you look at the contents of save. But what you may not understand is why the program "works" without producing an error.

What is happening is that the variable $a is being assigned the string 'HASH(0x9d450)'. When the last line attempts to use $a as a reference to a hash, Perl treats the string as the name of a variable. In other words:

 print keys %{'HASH(0x9d450)'};

This is not what you want at all. If you turn on strict refs, Perl will catch it at run time:

 Can't use string ("HASH(0x9d450)") as a HASH ref while "strict refs" in use at tryme line 12, <SAVE> chunk 1.
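Because this is a run-time check, the error can be trapped like any other exception. The following is a minimal sketch, using a literal string in place of the line read from the file:

```perl
use strict;
use warnings;

# A string that merely looks like a printed hash reference.
my $name = 'HASH(0x9d450)';

# Under strict refs, dereferencing a plain string dies at run time,
# so the failure can be caught with a block eval.
my @keys = eval { keys %$name };
my $err  = $@;

print "caught: $err";
```

Without strict refs, the same dereference would quietly look up a package variable with that odd name and return an empty list.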

By the way, an easy way to handle a "persistent" data structure problem like this is to use the Data::Dumper module (see Item 37):

Use Data::Dumper to save and restore data structures.

 use Data::Dumper;
 $a = { H => 1, He => 2, Li => 3, Be => 4 };    # Some sample data.
 open SAVE, ">save";
 print SAVE Data::Dumper->Dump([$a], ['a']);    # Dump hash ref with name 'a'.
 close SAVE;

 $a = undef;            # Nothing up my sleeve. . . .
 do "save";             # Read and execute file "save".
 print keys %$a;        # "HLiHeBe" or similar.
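The same round trip also works under use strict if the dumped name refers to a declared lexical. The sketch below keeps the dump in a string rather than a file; the variable name restored is an arbitrary choice. Like do, a string eval executes whatever it is given, so this approach is only appropriate for data you wrote yourself:

```perl
use strict;
use warnings;
use Data::Dumper;

my $elements = { H => 1, He => 2, Li => 3, Be => 4 };

# Dump produces Perl source of the form "$restored = { ... };".
my $text = Data::Dumper->Dump([$elements], ['restored']);

# Evaluating that source assigns to the lexical declared here.
my $restored;
eval $text;
die "could not restore data: $@" if $@;

print join(',', sort keys %$restored), "\n";    # prints "Be,H,He,Li"
```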

The combination of strict vars, strict subs, and strict refs is available in one convenient unit as use strict. I recommend use strict for all programs of significant length. Remember that you can temporarily turn off strictness that gets in your way by putting no strict (or no strict vars, no strict subs, etc.) in a block.

Here's a program that prompts for a variable name and dumps its contents to standard output:

 use strict;
 print "variable name: ";       # Prompt for the name.
 chop(my $var = <STDIN>);       # Read it into $var.
 {
   # Because $var is a string, $$var is a symbolic reference,
   # and we have to turn off strict refs.
   no strict 'refs';
   print "$var = $$var\n";
 }

Dynamic checks: warnings with `-w`

Perl has a warnings feature that can be enabled from the command line with the -w flag:

 %  perl -w myscript

or from inside a script by appending the -w flag to the #! line:

 #!/usr/local/bin/perl -w

Turning on warnings enables a large number of run-time checks. These cover a vast spectrum of possibilities, from Possible attempt to put comments in qw() list to umask: argument is missing initial 0 to Misplaced _ in number .

Most often, though, -w will complain about uses of uninitialized values:

 #!/usr/local/bin/perl -w
 print "$a\n";

 % tryme
 Use of uninitialized value at tryme line 2.

Code that you wouldn't necessarily expect to produce warnings sometimes does:

 @a = (1,2);
 print "@a[0..2]\n";    # -w complains here.

In earlier versions of Perl, -w was very (or excessively) aggressive in reporting uses of uninitialized values. The following produce uninitialized value warnings in Perl 5.003:

 $sum += 1;             # Warning if $sum uninitialized.
 for $word (split) {
   $count{$word} += 1;  # More annoyingly, for every new word.
 }

More recent versions of Perl have a kinder, gentler -w that produces many fewer gratuitous warnings. Neither of the above cases produces a warning under Perl 5.004 (assuming that $sum, $word, etc. have been declared).

You can turn off warnings for a section of code by changing the value of the $^W variable. It is a good idea to make this change local:

 {
   local $^W = 0;       # Warnings off till end of block.
   print "a = $a\n";    # No complaints if $a not yet initialized.
 }
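The effect of localizing $^W can be observed by collecting warnings in a handler. This is a minimal sketch: the array name is arbitrary, and $^W is set from inside the program as a stand-in for the -w flag so that the example is self-contained:

```perl
use strict;

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

$^W = 1;        # Turn warnings on, much as -w would.
my $value;      # Deliberately left undefined.

my $loud = "a = $value";            # Warns: uninitialized value.
{
    local $^W = 0;                  # Warnings off till end of block.
    my $quiet = "a = $value";       # No complaint here.
}
my $loud_again = "a = $value";      # Warns again: $^W has been restored.

print scalar(@warnings), " warning(s) captured\n";
```

Because the change is made with local, the previous setting comes back automatically when the block exits, even if the block is left by way of die.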

The biggest drawbacks of the current warnings system in Perl are that (1) individual warnings cannot be turned on and off, and (2) there are no lexically scoped warnings (this is the same scoping issue that distinguishes my and local; see Item 23). These are the "annoying limitations" I referred to above. Both of these issues likely will be addressed in future releases of Perl.
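Releases from Perl 5.6 onward do address both points with the warnings pragma, which is lexically scoped and organized into named categories. The sketch below assumes such a release; the variable names are arbitrary:

```perl
use strict;
use warnings;       # Lexically scoped warnings (Perl 5.6 and later).

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $value;                  # Deliberately left undefined.
my $text = "3 apples";      # Not cleanly numeric.

{
    # Switch off one category, for this block only.
    no warnings 'uninitialized';
    my $quiet = "a = $value";   # Silent: category disabled here.
    my $sum   = $text + 1;      # Still warns: 'numeric' remains enabled.
}
my $loud = "a = $value";        # Warns: outside the block.

print scalar(@warnings), " warning(s) captured\n";
print $_ for @warnings;
```

Unlike $^W, the pragma affects only the code textually inside the enclosing block, not subroutines called from it.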

I recommend that programmers new to the Perl language, regardless of their experience in other languages, use -w at least long enough that the results are no longer surprising. You should also use -w in combination with use strict when developing code for important applications or for public distribution. Of course, if you are an "all warnings, all the time" sort of programmer, just turn it on and keep it on.

Run-time warnings impose a small speed penalty on programs. In addition, it is not a good idea to present unexpected or spurious warning messages to users. Thus, in general, -w warnings should be used only during development. Warnings should be turned off for code that is released to the world, much as assert() tests shouldn't be compiled into final versions of C programs.

The use strict pragma is lightweight and can be left in released code with no ill effects.

Tracking dangerous data: taint checking

Perl programs that are running setuid (that is, with different real and effective user or group ids) are subject to taint checking. You can also enable taint checking explicitly with the -T command line option.

Taint checking is a run-time feature that tracks the flow of data inside a Perl program. Data that is derived from user input or the outside world in general (command line arguments, environment variables, file or stream input) is marked as tainted. Perl will not allow tainted data to be used in ways that are insecure, for example, as input to a shell command line. To give you an idea of how taint checking works, let's consider the following simple program:

 print "enter pattern: ";
 chop($pat = <STDIN>);
 print `grep $pat *`;

If you run this with taint checking enabled, you will see the following message:

 Insecure dependency in `` while running with -T switch

Perl is telling us that the contents of the backticks are insecure. This is because the data in the variable $pat was taken directly from standard input. It is a bad idea to send user input directly to the shell. Suppose the user types in:

 enter pattern:  ; rm *

To untaint the contents of $pat, we must process the input with a regular expression match and assign $pat the value of one of the memory variables ($1, $2, etc.). This is the only way to untaint data in Perl. Here is one possible fix:

 print "enter pattern: ";               # Prompt for pattern.
 chop($pat_in = <STDIN>);               # Read it.
 $pat_in =~ tr/\0-\037\177-\377//d;     # Remove unprintables.
 $pat_in =~ s/(['\\])/\\$1/g;           # Escape quote, backslash.

 $pat_in =~ /(.*)/;                     # Here's where we untaint
 $pat = $1;                             # by using pattern memory.

 # We've made this safer now; note the addition of single quotes
 # to grep's argument.
 print `grep '$pat' *`;

The statements that do the actual untainting are the two lines:

 $pat_in =~ /(.*)/;
 $pat = $1;

Note that we could have skipped the tr/// and s/// steps above it and still untainted the contents of $pat, without making any changes to them. Yes, even taint checking lets you shoot yourself in the foot. Some tainting problems admit prettier solutions than this one, but the real purpose behind the required untainting step is to force you to at least look at each possible source of tainted input and then deal with it in some way. It's up to you to make sure it's the right way.
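One way to make the untainting step do real work is to capture only input that matches a deliberately narrow pattern, and to reject everything else. The following is a minimal sketch: the helper name untaint_pattern, the permitted character set, and the 64-character limit are all assumptions to adapt to your own application:

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $input taken from pattern memory
# (and therefore untainted under -T) if it consists only of characters we
# have decided to allow; return undef otherwise.
sub untaint_pattern {
    my ($input) = @_;
    return undef unless defined $input;
    return $input =~ /\A([A-Za-z0-9_.-]{1,64})\z/ ? $1 : undef;
}

for my $candidate ('Item36', 'strict.pm', '; rm *', "two\nlines") {
    my $clean = untaint_pattern($candidate);
    (my $shown = $candidate) =~ s/\n/\\n/g;     # Keep the report on one line.
    printf "%-12s => %s\n", $shown, defined $clean ? "accepted" : "rejected";
}
```

A whitelist of this kind is much more restrictive than /(.*)/, which is the point: the match should describe what you expect, not merely launder what you received.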

Back to our example: if you run it as it now stands, you get a different error message:

 Insecure $ENV{PATH} while running with -T switch

Perl is telling us that our PATH environment variable is insecure. To make our PATH secure, we must set it to a known quantity. Once we set it, Perl also checks to make sure that each of the directories listed in it is writeable only by its owner and/or group.

To fix the program, add the line

 $ENV{PATH} = "/bin:/usr/bin";

somewhere near the top.
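Separately from untainting and from pinning PATH, you can keep the shell out of the picture altogether by running the command with a list of arguments. The sketch below assumes a Perl recent enough to support the list form of a piped open (5.8 or later); the helper name capture_without_shell is hypothetical, and the perl interpreter itself ($^X) stands in for grep so that the example does not depend on any particular files:

```perl
use strict;
use warnings;

# Hypothetical helper: run a program with its arguments passed as a list,
# so that no shell ever interprets them, and return whatever it prints.
sub capture_without_shell {
    my @command = @_;
    die "need a program and at least one argument\n" if @command < 2;
    open(my $pipe, '-|', @command)
        or die "cannot run $command[0]: $!\n";
    local $/;                       # Read all of the output at once.
    my $output = <$pipe>;
    close $pipe;
    return defined $output ? $output : '';
}

# Input that would be disastrous if a shell ever saw it.
my $hostile = '; rm *';

# The child simply prints its first argument back to us.
my $echoed = capture_without_shell($^X, '-e', 'print $ARGV[0]', $hostile);
print "argument arrived as: [$echoed]\n";
```

For the grep example, the corresponding call would look something like capture_without_shell('grep', $pat, glob('*')). Under -T the arguments still have to be untainted first, but quoting mistakes can no longer turn data into commands.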

Taint checking is valuable for CGI programming applications, especially when CGI scripts must run setuid as a user with more privileges than the user nobody. A conservative way to start off a CGI script is:

Use taint checks and other safety/debugging features in CGI scripts.

 #!/usr/local/bin/perl -Tw
 # The -Tw above turns taint checks and warnings on.
 use strict;                        # Code cleanly.
 $ENV{PATH} = "/bin:/usr/bin";      # Straight and narrow PATH.
 $| = 1;                            # Unbuffer STDOUT.

Because of taint checking, Perl scripts actually can be more secure than programs written in C.