Learning Perl


As previously mentioned, learning an entire language within a span of 10 to 15 pages is rather difficult, but you can certainly learn enough of the language to develop useful applications. You will learn by studying three example applications: one that lists the system users, another to send e-mail from the command line, and the last one to archive system load average data. These examples illustrate how to accomplish the following:

  • Create a Perl program

  • Store and access different types of data

  • Process input and output

  • Implement logic operations

  • Find patterns or strings

  • Interact with external applications

This section finishes by actually designing the disk usage application that we looked at earlier. By implementing this reasonably advanced application, you can see for yourself how well you understand some of the concepts behind Perl development. Let's start!

How to Start

You have seen examples of shell scripts throughout the book, which look something like the following:

    #!/bin/sh
    ...
    exit;

Perl programs have a similar structure:

    #!/usr/bin/perl

    ##++
    ##   hello.pl: This is so cool, my first Perl program, can you believe it?
    ##--

    $city    = 'Manchester';    ## Scalar variable: store Manchester in $city
    $country = 'England';       ## Scalar variable: store England in $country

    print "Hello, welcome to my home in $city, $country!\n";  ## Print message

    exit (0);                   ## Hooray, success!

After you've saved this code in a file (hello.pl, for example), you can execute the program in one of two ways. You can invoke the Perl interpreter manually and pass the filename to it, as follows:

 $ /usr/bin/perl hello.pl 

Or, you can ask the shell to execute it for you:

    $ chmod +x hello.pl
    $ ./hello.pl

In the first case, you manually call upon the interpreter to execute the code. In the second case, however, the shell will execute the interpreter, feeding to it the program. This will work only if the program meets two specific conditions. The initial line of the program must start with #! and specify the path to the interpreter, in this case /usr/bin/perl . And the user executing the program must have the execute permission enabled.

Why don't you try running this program? What do you get as the output? As you look at this rather trivial program, you should be aware of a few details. First, each statement, defined as a single logical command, ends with a semicolon. This tells Perl that the statement is complete. Second, everything starting from the # character to the end of the line represents a comment. You should make it a point to add comments that describe the thoughts and logic behind your code, especially if you are implementing something that might be difficult for other people, or even yourself, to understand at a later time. And, finally, Perl ignores white space and empty lines in and around statements, so you should use a liberal amount of white space to align variable declarations and indent code. This will make it easier for other developers to read and understand your code.

Now, let's look at our first main application. Each application is designed to illustrate a set of Perl's key features. Before tackling the application, these features are explained, including their significance and how you can use them later to build your own programs.

Application 1: Who Can Get In?

The first task is to build a simple application to open the /etc/passwd configuration file and display each user's login name and associated comment. If you don't remember the format of this file (which was discussed in Chapter 7), here is how it looks:

    sravanthi:x:500:500:Sravanthi:/home/sravanthi:/bin/bash
    dcheng:x:501:501:David Cheng:/home/dcheng:/bin/bash
    dzhiwei:x:502:502:David Zhiwei:/home/dzhiwei:/bin/bash
    sliao:x:503:503:Steve Liao:/home/sliao:/bin/bash

Each record in the file consists of seven fields, starting with the login, and followed by the password, user id, group id, comment (typically the full name), home directory, and the default shell. We are interested in extracting the first and fifth fields.

By understanding the code behind this application, you will learn how to open a text file, read its contents, access certain pieces of information, and display the formatted results. These tasks are critical for everyday development, because most administrative applications will rely on opening, reading from, and writing to one file or the other.

    #!/usr/bin/perl

    ##++
    ##   list_users.pl: display list of users and their comments
    ##--

    use strict;                                       ## "strict" mode

    my (@data);                                       ## Pre-declare array

    open (FILE, '/etc/passwd')                        ## Open the file
        || die "Cannot open file: $!\n";

    while (<FILE>) {                                  ## Read each record
        @data = split (/:/, $_, 7);                   ## Separate the elements
        print "$data[0], $data[4]\n";                 ## Print login, comments
    }

    close (FILE);                                     ## Close file
    exit  (0);                                        ## Exit application

Perl provides you with flexibility to develop applications in the manner that you choose; it imposes very few restrictions. Take, for example, the declaration of variables. By default, you don't have to pre-declare variables before using them in the program. Once you refer to a new variable, it is automatically instantiated and has an undefined value until you provide it with a set value. Unfortunately, this is not good programming practice and should be avoided because it makes finding errors in your code very difficult if something goes wrong.

Luckily, Perl comes with the strict pragma, which, when enabled, requires you to pre-declare variables before using them and forces you to avoid symbolic references and bareword identifiers. Later examples look at the latter two requirements, but what exactly is a pragma? A pragma is either an internal module, an outside extension, or a combination of the two that specifies the rules that Perl should follow when processing code.

Using strict

This brings us to the first line of our application. The use function imports functionality from an external module into your current application. You call on this function with the strict argument to enable the strict pragma. That is all that is needed to force you to pre-declare variables from here on. As a side note, if you look in the Perl extensions directory, typically /usr/lib/perl5/5.8.3 , you will see the file strict.pm; this is the file that will be loaded.

Next, use the my function to declare, or localize, the @data array. An array is simply a data type, represented by the leading at-sign character, that you can use to store a set of scalar (single) values in one variable. You will use this array to store all of the individual elements for each record in the file, including login and comment.
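As a minimal sketch of pre-declaring variables under strict, the following declares one scalar and one array and then uses them. The variable names and values are made up for illustration, and the warnings pragma is an extra that the application above does not use.

```perl
#!/usr/bin/perl

use strict;                         ## Enable "strict" pragma
use warnings;                       ## Also report common mistakes

my ($login, @fields);               ## Pre-declare a scalar and an array

$login  = 'dcheng';                 ## Scalar: holds a single value
@fields = ('dcheng', 'x', 501);     ## Array: holds a list of scalars

print "Login: $login\n";
print "Number of fields: ", scalar (@fields), "\n";

## Under strict, assigning to a name that was never declared with my
## stops compilation with a "Global symbol ... requires explicit
## package name" error.
```

Removing the my line from this sketch should be enough to trigger that compile-time error.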

Opening the File

We proceed to open the configuration file, /etc/passwd, using the open function. This function takes two arguments, the first being the file handle that you want to use and the second the path to the file. Think of the handle as a communications channel through which you can read data from the file. If the open function executes successfully, which means that the file exists and you have the necessary permissions, it returns a positive (true) status.

Note

In Perl, a status of true is represented by a defined non-zero value, and a status of false is identified with an undefined or zero value or a null string.
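The rule in the note can be checked with a few sample values. This is only a sketch; truth is a hypothetical helper written for this example, not a Perl built-in. It also shows one case the note does not spell out: the one-character string '0' counts as false, while other non-empty strings such as '0.0' count as true.

```perl
use strict;
use warnings;

## Hypothetical helper: report whether Perl treats a value as true.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

print "1      is ", truth (1),       "\n";   ## Non-zero number
print "hello  is ", truth ('hello'), "\n";   ## Non-empty string
print "0      is ", truth (0),       "\n";   ## Zero
print "''     is ", truth (''),      "\n";   ## Null (empty) string
print "undef  is ", truth (undef),   "\n";   ## Undefined value
print "'0'    is ", truth ('0'),     "\n";   ## The string '0'
print "'0.0'  is ", truth ('0.0'),   "\n";   ## Non-empty string, not '0'
```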

If you cannot open the file, call the die function with a specific error message to exit from the program. Notice the $! expression within the double-quoted message string. Expressions that start with the dollar sign represent scalar variables. You can use a scalar variable to store a single value, whether it is a character, number, a text string, or even a paragraph of content. However, $! is a special Perl scalar variable that holds the error message from the most recent failed system or library call. Typical messages for failure to open a file include Permission denied and No such file or directory.

Look again at the entire line that is trying to open the file, and think of it as two separate statements separated by the logical OR operator, ||. The second statement will be executed only if the first statement is false, that is, if the file cannot be opened. We are using a convenient shortcut, but you can just as easily rewrite that code as follows:

    if (!open (FILE, '/etc/passwd')) {
        die "Cannot open file: $!\n";
    }

The exclamation point in front of the open function call will negate the status returned by the function. In other words, it will convert a true value into a false, and vice versa. So, only if the entire expression within the main parentheses is true will you end up calling the die function to terminate your program. And that will happen if the open function returns a false status.

You can also write the same statement like this:

    if (open (FILE, '/etc/passwd')) {    ## Success
        ## Read file
        ## Parse records
        ## Print output
    } else {                             ## Failure!
        die "Cannot open file: $!\n";
    }

This should be easy enough to understand by now; if the open function returns true, you process the file or else you exit. Now, you are ready to read data from the file. But, how do you know how many records to read? Simple, use a while loop to iterate through the file, reading one record at a time, until there are no more records to be read. And for each record retrieved from the file, Perl executes the block of code inside the loop.
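To see what $! holds when open fails, without ending the program, you can capture the message in a variable instead of calling die. The following is a small sketch; the path is a hypothetical one chosen because it should not exist, and the exact wording of the message depends on your system.

```perl
use strict;
use warnings;

## Hypothetical path, picked so that the open call fails.
my $path  = '/no/such/directory/passwd';
my $error = '';

if (open (FILE, $path)) {            ## Success
    close (FILE);
} else {                             ## Failure: $! holds the reason
    $error = "Cannot open file: $!";
}

print "$error\n";
```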

Reading Records

The expression located inside the parentheses after the while command controls when the loop will terminate; the loop will continue until this expression evaluates to a false value. The strange-looking <FILE> expression is the one responsible for reading a line from the FILE filehandle and storing it in the default Perl variable, $_. When there are no more lines to be read, Perl returns an undefined value, or undef, for short. Once this happens, the expression evaluates to a false value and the while loop stops.

Whenever you see such an expression (an identifier of some sort enclosed between a less-than and a greater-than sign), you should know that Perl is reading data from the filehandle represented by that identifier. You can store each record read from the file in a variable other than $_ by doing the following:

    while ($line = <FILE>) {
        ...
    }
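Here is a self-contained sketch of that loop. So that it runs without touching any real file, it reads from a string through an in-memory filehandle (opening a reference to a scalar), which is an assumption of this example rather than something the application does; the two records are made up.

```perl
use strict;
use warnings;

## Sample records held in a string; opening a reference to a scalar
## gives a filehandle that reads from memory instead of from disk.
my $records = "sravanthi:x:500\ndcheng:x:501\n";
my ($line, $count);

open (FILE, '<', \$records) || die "Cannot open in-memory file: $!\n";

$count = 0;
while ($line = <FILE>) {            ## Read each record into $line
    $count++;
    print "Record $count: $line";   ## $line still ends with a newline
}

close (FILE);
```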

Inside the loop, you see two statements. These are responsible for pulling out the different elements from each record and displaying the login and comment.

Extracting the Elements

The split function separates a piece of text into multiple elements, based on a specified delimiter, and returns an array. It takes three arguments: the delimiter, the input string, and the maximum number of elements. Notice that the delimiter is enclosed within a pair of forward slash characters; this is an example of a regular expression, commonly known as a regex. You can use regular expressions to specify a pattern to match, as opposed to a simple one-character delimiter. Regular expressions are covered in more detail in Application 3: What Is My System Load?

The last two arguments to the split function are optional. If the input string is not specified, Perl uses its default variable, $_. This is something you should be aware of. A large number of Perl functions operate on this default variable, so it is very convenient. You have seen two examples of it already: reading records from a filehandle and now the split function. Last, but not least, if you don't specify a maximum number of elements, you won't see a difference a majority of the time. However, there will be occasions when the last element in the text string is null, in which case split strips and removes this element. To be on the safe side, you should specify the number of elements that you expect to get back, so there are no surprises.

Once the split function returns the elements, store them in the @data array. Each item in an array is associated with a specific position or index, starting at index zero. You can access a specific element in an array by specifying its associated index within brackets. Use the print command to display the first and fifth elements to the terminal, followed by a newline.
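The effect of the third argument is easiest to see with a record whose last field is empty. The record below is made up for this sketch; it has seven fields, but the shell field at the end is blank.

```perl
use strict;
use warnings;

## Made-up record whose last field (the shell) is empty.
my $record = 'guest:x:600:600:Guest User:/home/guest:';

my @without_limit = split (/:/, $record);       ## Trailing empty field dropped
my @with_limit    = split (/:/, $record, 7);    ## Trailing empty field kept

print "Without limit: ", scalar (@without_limit), " fields\n";
print "With limit 7:  ", scalar (@with_limit),    " fields\n";
```

Without the limit you get six elements back; with it you get all seven, the last being an empty string.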

Repeat this process for all the records in the file. Once the while loop terminates, close the file using its filehandle and exit from the application. This completes your first major Perl application.

The Output

Go ahead and run the program. What do you see? You will most likely see output like the following:

    $ perl list_users.pl
    root, root
    bin, bin
    ...
    sravanthi, Sravanthi
    dcheng, David Cheng
    dzhiwei, David Zhiwei
    sliao, Steve Liao

The application itself seems to work properly, but there are a few flaws that should be fixed. First, there is no need to see a list of the default system users, users created by the Linux installation. Second, it would make for a good display if the two pieces of information were neatly aligned into columns.

Who Can Get In? Take 2

We will change the application somewhat to implement the new features discussed in the preceding paragraph, namely ignoring system users and fixing the display format. Leaving system users out of the output is not difficult, but it is not fully precise. How do you tell the difference between a system user and a regular user? There is no one distinct flag or marker that identifies a system user. However, system users are typically, but not always, allocated a user identification number less than 500, so we will use that as the main criterion.

The display format must also be fixed. Fortunately, there is the pack function, which you can use to pad each data element with a certain number of spaces, thereby creating columns that are aligned. Let's look at the new version of the application:

    #!/usr/bin/perl

    ##++
    ##   list_users_plus.pl: display list of users and their comments
    ##   in neat columns
    ##--

    use strict;                                 ## Enable "strict" pragma

    my (@data);                                 ## Pre-declare variables

    open (FILE, '/etc/passwd')                  ## Open the file
        || die "Cannot open file: $!\n";

    while (<FILE>) {                            ## Read each record
        @data = split (/:/, $_, 7);             ## Separate the elements

        if ($data[2] < 500) {                   ## Ignore UID less than 500
            next;
        }

        print pack ("A16A*", $data[0], $data[4]), "\n";  ## Print
                                                         ## login, comments
    }

    close (FILE);                               ## Close file
    exit  (0);                                  ## Exit application

The code has changed very little. If you find a user identification number less than 500, you use the next command to start the next iteration of the while loop. As a result, the print command never gets executed and the record doesn't display.

You can also combine the next command and the conditional check into one statement, like so:

    next if ($data[2] < 500);

You will most likely see this abbreviated syntax in a number of Perl programs. It is convenient and easy to use. However, you cannot use this syntax if you need to execute a block of code based on a specific condition.
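The following sketch shows the one-line form at work. To stay self-contained it loops over a made-up list of records with foreach instead of reading /etc/passwd, and it collects the surviving logins in a hypothetical @logins array.

```perl
use strict;
use warnings;

## Made-up records in /etc/passwd format.
my @records = (
    'root:x:0:0:root:/root:/bin/bash',
    'bin:x:1:1:bin:/bin:/sbin/nologin',
    'sliao:x:503:503:Steve Liao:/home/sliao:/bin/bash',
);
my (@data, @logins);

foreach my $record (@records) {
    @data = split (/:/, $record, 7);    ## Separate the elements
    next if ($data[2] < 500);           ## Skip system users
    push (@logins, $data[0]);           ## Reached only for regular users
}

print "Regular users: @logins\n";
```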

Formatting the Output

Next, use the pack command to create a string based on a specific template configuration and a set of values; these comprise the first and subsequent arguments, respectively. The A16A* template instructs pack to create a 16-character, space-padded ASCII string using the first value from the list, followed by the second value at its full length. When you execute this program, you will see the following much improved output:

    sravanthi       Sravanthi
    dcheng          David Cheng
    dzhiwei         David Zhiwei
    sliao           Steve Liao
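You can try the template on a single pair of values to see the padding. This sketch also packs a login with a deliberately short A4 template (not used in the application) to show that a value longer than the field width is cut off rather than widened.

```perl
use strict;
use warnings;

## Pad the login to 16 characters, then append the comment as is.
my $row = pack ("A16A*", 'dcheng', 'David Cheng');

## A value longer than the field width is truncated to fit.
my $cut = pack ("A4", 'sravanthi');

print "$row\n";
print "Length of row: ", length ($row), "\n";   ## 16 + 11 characters
print "Truncated login: $cut\n";
```

If your logins can be longer than 15 characters, a wider first field keeps the two columns from running together.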

You have already learned much about Perl, including the following:

  • Using its strict pragma

  • Declaring and using variables

  • Implementing while loops

  • Reading data from text files

  • Formatting output for display

The next section looks at more Perl syntax and constructs, including how to use the Mail::Send module to send e-mail.

Application 2: Send a Quick E-Mail

How many times have you wanted to send a quick e-mail, without having to use a full-fledged e-mail client? Linux provides you with a few applications that allow you to read and send e-mail from the command line, such as mail. But we are not interested in those at the moment. Instead, let's develop a useful Perl application that serves much the same purpose, namely the ability to send e-mail. In the process, you will learn how to check code for errors, accept input from the user, implement regular expressions, and use external modules and view their built-in documentation. Here is the application:

    #!/usr/bin/perl
    ##++
    ##   send_email.pl: send e-mail messages from command-line
    ##--
    use Mail::Send;                                 ## Import/use Mail::Send
    use strict;                                     ## Enable "strict" pragma
    ##++
    ##   Let's purposely inject two errors in this program so we can
    ##   understand the debugging process:
    ##     - remove semicolon at end of line
    ##     - spell $email as $emale
    ##--
    my ($emale, $answer, $subject, $message)

    print "*** E-Mail Sender Application ***\n\n";  ## Display header

    ##++
    ##   Prompt for recipient's e-mail address, checking for validity.
    ##--

    while (1) {
        print "Enter recipient's e-mail address: ";
        $email = <STDIN>;                       ## Read from standard input
        chomp $email;                           ## Remove trailing newline

        next if (!$email);                      ## Repeat loop if no $email

        print "You entered $email as the recipient's address, accept [y/n]: ";
        chomp ($answer = <STDIN>);              ## Different style

        last if ($answer eq 'y');               ## Exit loop if $answer=y
    }

    ##++
    ##   Prompt for subject and message body.
    ##--

    print 'Subject: ';
    chomp ($subject = <STDIN>);

    print "Enter message, type Control-D (CTRL-D) to end:\n";
    while (<STDIN>) {                           ## Read from standard input
        $message .= $_;                         ## Concatenate to $message
    }

    ##++
    ##   Set defaults if empty.
    ##--

    $subject ||= 'No Subject';
    $message ||= 'No Body';

    ##++
    ##   Print summary and ask for confirmation.
    ##--

    print "*** E-Mail Summary ***\n\n";
    print "To: $email\n";
    print "Subject: $subject\n";
    print "Message:\n\n$message\n";

    print 'Do you want to send the message [y/n]: ';
    chomp ($answer = <STDIN>);

    if ($answer eq 'n') {
        print "Aborting.\n";
        exit (0);
    }

    ##++
    ##   Send message
    ##--

    my ($mail, $fh);

    $mail = new Mail::Send;                     ## Create new object

    $mail->to ($email);                         ## Call its function/method
    $mail->subject ($subject);

    $fh = $mail->open();                        ## Returns filehandle

    print $fh $message;                         ## Print body to filehandle
    $fh->close();                               ## Close filehandle

    print "Message sent.\n";

    exit (0);

You can save this application in send_email.pl and proceed to run it, like so:

    $ perl send_email.pl
    syntax error at send_email.pl line 15, near ") print"
    Global symbol "$email" requires explicit package name at send_email.pl line 23.
    Global symbol "$email" requires explicit package name at send_email.pl line 24.
    Global symbol "$email" requires explicit package name at send_email.pl line 26.
    Global symbol "$email" requires explicit package name at send_email.pl line 28.
    Global symbol "$email" requires explicit package name at send_email.pl line 58.
    Global symbol "$email" requires explicit package name at send_email.pl line 78.
    Execution of send_email.pl aborted due to compilation errors.

Perl is telling you there are errors in the code. The first error is at or around line 15. The second set of errors claims that we are using the $email variable without pre-declaring it first. Take a look at the code. You should see that we purposely left out the semicolon at the end of the my declaration (the statement just before the print that Perl flags) and declared a variable called $emale instead of $email. The line numbers that Perl reports in the errors may not always be exact, due to the manner in which it parses the code, so you should check the entire area around the reported error for any possible bugs.

Fixing Errors and Checking Syntax

If you are getting other errors, you may have forgotten to install the Mail::Send module properly, as shown in the section Installing Extensions earlier in the chapter. It is also possible that you may have made a mistake in typing the code. Fix the broken code in the application, adding the necessary semicolon and correcting the misspelled variable:

    my ($email, $answer, $subject, $message);

Let's run it again, but before doing so, check its syntax and structure. You can use Perl's -w and -c options, which report warnings and check syntax, respectively, to check the status of your code, like so:

    $ perl -wc send_email.pl
    send_email.pl syntax OK

If there are no problems with the code, Perl returns the output previously shown. You should make it a point to check the syntax of your applications in this manner before running them, so that you can catch possible errors quickly.

Loading External Modules

You saw the use command in the first application, list_users.pl, when we discussed the strict pragma. Here, we import the Mail::Send extension, so we can use its functionality to send e-mail. You can find this module at /usr/lib/perl5/site_perl/5.8.3/Mail/Send.pm; the .pm extension stands for Perl Module. Most CPAN extensions include built-in documentation, which you can view by using the perldoc application.

   $ perldoc Mail::Send   

This documentation provides you more information on what the extension does and how to use it. You can also use perldoc to get information on virtually any Perl command. In this application, the chomp command is used to remove the end-of-line character, also referred to as the record separator. To see more information on chomp, including examples, you can type the following:

   $ perldoc -f chomp   

Let's go back to the code. We pre-declare the variables that we intend to use in the program and print out a header for the user. You don't have to declare all of the variables in one place; you can declare them as you go.

Getting Recipient's E-Mail Address

The real core of the program starts now. We wrap a block of code inside a while loop that will run until we explicitly break out. Remember, the expression that follows the while command determines how many times the loop executes; as long as the expression is true, the loop continues to run. In this case, a static expression with a value of 1 always evaluates to true, so the loop never stops.

Our job inside the loop is to ask the user for the recipient's e-mail address. If the user doesn't enter an address or accidentally makes a mistake in typing, this provides him or her the ability to enter it again. Repeat this process until the user is satisfied and affirmatively confirms the address, at which point you jump out of the loop using the last command.

Notice how input is accepted from the user. Doesn't the syntax look similar to that used in list_users.pl to read data from a file? In fact, you can use the <filehandle> expression to read a line of data from any defined filehandle. The standard input (STDIN) filehandle is established at program startup time and allows you to read data from the user, terminal, or input stream. In addition to the STDIN filehandle, you have access to STDOUT (standard output), STDERR (standard error), and DATA. You can use STDOUT to write output, STDERR to write diagnostic messages, and DATA to read data stored within a Perl application. As a side note, whenever you use the print command to display a message, Perl writes it to the standard output stream.

Once the user enters the recipient's e-mail address and presses the Enter key on his or her keyboard, the trailing newline is actually captured as part of the input. You use the chomp function to cleanly remove this end-of-line character from the input, as stored in the $email variable. If you don't remove it, you will have a difficult time determining if the user entered an address because $email will still evaluate to true, as it would have a length of at least 1 (from the newline character).
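A short sketch makes the point about the newline concrete. Instead of reading from the keyboard, it assigns a lone newline to a hypothetical $input variable, which is what you would get if the user pressed Enter without typing anything.

```perl
use strict;
use warnings;

## Simulate a user who just pressed Enter: the input is a lone newline.
my $input  = "\n";
my $before = $input ? 'true' : 'false';   ## "\n" has length 1, so true

chomp $input;                             ## Remove trailing newline
my $after  = $input ? 'true' : 'false';   ## Empty string is false

print "Before chomp: $before, after chomp: $after\n";
```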

Checking the E-Mail Address

Next, check to see if $email is false, which would occur if the user did not enter an address, and if so, you simply execute the loop again. This provides the user another chance to enter the address. However, if an address was entered, you ask the user for confirmation and exit from the loop. An advanced application might also check the validity of the e-mail address, but we are not doing it here. If you are interested in adding such functionality, you should install the Email::Valid module, as shown in the section Installing Extensions, and then change this application in the following manner:

    #!/usr/bin/perl

    use Mail::Send;                            ## Import/use Mail::Send
    use Email::Valid;                          ## Import/use Email::Valid
    use strict;                                ## Enable "strict" pragma
    ...
    while (1) {
        ...
        next if (!$email);                     ## Repeat loop if no $email

        if (!Email::Valid->address ($email)) { ## E-Mail address is not valid
            print "The e-mail address you entered is not in a valid format.\n";
        }
        ...
    }
    ...
    exit (0);

Use the address function, defined in the Email::Valid module, to check whether the specified e-mail address conforms to the RFC 822 specification, Standard for the Format of ARPA Internet Text Messages. The function simply checks the syntax; there is no way to determine whether a specific address is deliverable without actually attempting delivery.

Getting Subject and Message Body

Continuing on with the program, we ask the user to enter the subject and the body of the message. Use a while loop to allow the user to enter as much content as desired. Each time through the loop, take the content that is entered ($_) and append it to the existing content in $message, using the concatenation operator, .=. If the user presses Ctrl+D at any point, the terminal signals the end of the input stream, at which point the while loop terminates.

Then, we perform a few routine tasks. First, assign default values to the subject and body of the message in case they were not entered. The ||= operator is very useful and is a shortcut for the following:
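The accumulation step can be tried without a keyboard by reading from a string. In this sketch the INPUT filehandle and the $typed text are stand-ins for STDIN and for whatever the user would type; reading from a string through an in-memory filehandle is an assumption of the example.

```perl
use strict;
use warnings;

## Made-up input standing in for what the user would type.
my $typed   = "Hello,\nSee you at the meeting.\n";
my $message = '';

open (INPUT, '<', \$typed) || die "Cannot open in-memory input: $!\n";

while (<INPUT>) {                   ## Read each line into $_
    $message .= $_;                 ## Concatenate to $message
}

close (INPUT);

print $message;
```

The loop ends when the input is exhausted, just as it does when the user signals the end of input at the terminal.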

    if (!$subject) {              ## or:
        $subject = 'No Subject';  ## $subject = 'No Subject' if (!$subject);
    }
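A small sketch of the ||= assignment with one empty value and one that is already set. The values are made up for illustration.

```perl
use strict;
use warnings;

my ($subject, $message);

$subject = '';                      ## User pressed Enter without typing
$message = "See you at noon.\n";    ## User typed a message

$subject ||= 'No Subject';          ## Empty, so the default is assigned
$message ||= 'No Body';             ## Already true, so left unchanged

print "Subject: $subject\n";
print "Message: $message";
```

Keep in mind that the test is for truth, not for emptiness, so a subject consisting of the single character 0 would also be replaced by the default.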

Second, print out all of the information entered by the user and ask whether he or she wants to continue with the message. If not, exit from the application. Otherwise, use the Mail::Send module to send the e-mail.

Sending the Message

If you look at the built-in documentation for the Mail::Send module, you will see several small examples on how to use the module. These examples look very similar to the code that you are about to look at. Mail::Send is a module that provides an object-oriented interface to its functionality. If you have done any C++ or Smalltalk application development, you know exactly what this means. If not, think of object-oriented programming (OOP) as a clean way to implement a data structure (the object) that contains not only data, but also associated functions to manipulate that data.

We call the new function, typically known as the constructor, in Mail::Send to create a new instance of the object. You can use this instance to invoke defined functions (or methods) in the object. First, we call the to and subject methods to specify the recipient and subject, respectively. Then, we invoke the open method to obtain the filehandle associated with the mail transport mechanism. By default, Mail::Send uses the sendmail application to send the message. If you don't have sendmail installed on your Linux system, or need to use an external SMTP server from within a firewall environment, you can specify the server address in the open method:

    $fh = $mail->open ('smtp', Server => 'smtp.someserver.com');

Unfortunately, this is not explained in the built-in documentation. If you look at the module's code, you will find that Mail::Send actually calls the Mail::Mailer module to send the e-mail. And Mail::Mailer can use sendmail, qmail, or an SMTP server as the transport mechanism. Whatever arguments are passed to the open method here are simply passed on to Mail::Mailer.

The last action we have to take is to write the body content to this filehandle using the print command and then close the filehandle. This sends the message, at which point we display a status message and exit.

Sample User Session

Let's run through the application to see how it looks. Figure 13-1 shows the application in use.

Developing this application has shown you how to accept and validate user input and how to load and use external modules. There are literally hundreds of Perl extensions available through CPAN, so learning how to use them properly is extremely valuable. Whenever you need to build a new application, you should make it a habit to browse through CPAN to see if there is a module that might help you. If one exists, all you have to do is look at the built-in documentation, determine how to interface with the module, and you are on your way to building an excellent application in no time.

Until now, you have learned quite a bit about Perl and how to use it. However, you have not looked at the two features of the language that are probably the most powerful, namely regular expressions and the ability to interact with the system environment. Application 3 looks at an interesting application that uses both of these features.

Figure 13-1

Application 3: What Is My System Load?

In a typical single-user environment, where the system administrator is also the only user, you don't have to pay much attention to administering, monitoring, and optimizing the operating system. However, the situation is completely different with a multiuser system; the stakes are higher as more people are dependent on the system running well. You can use a number of diagnostic applications to keep tabs on the system. Take, for example, the uptime program, which returns a number of useful pieces of information, including how long the system has been up and running, the number of users currently logged on, and the load averages for the past 1, 5, and 15 minutes:

 23:15:42 up 54 min,  3 users,  load average: 0.33, 0.29, 0.36 

If you want to optimize the system, you should typically keep track of system activity and metrics over a certain period of time. This activity can include the number of users logged in, what applications they are running, the average system load, and how much memory is being consumed. This allows you to analyze the data and look for specific patterns. You may find, for example, that a large spike occurs in the system load every Monday morning before the weekly engineering meeting. Then, you can determine how to best handle the issue: don't run other jobs on Monday mornings, add more RAM, or upgrade the processor.

In that vein, we will create an application to archive the system load averages and the number of active users. However, for any data analysis to be effective, you need to have enough data that captures a variety of conditions. The best way to do that is to set up a cron job to automatically run this application every hour Monday through Friday, like so:

 00 * * * 1-5  /home/gundavaram/uptime_monitor.pl /home/gundavaram/uptime.log 

Even though the application is only ten lines long, you will still learn enough advanced Perl development techniques to make it worthwhile. More specifically, you will learn how to invoke an external program, retrieve its output, extract certain information from it, and write the formatted output to a log file. Because so many useful utilities and programs exist for Linux, the ability to interact with these tools from within Perl will empower you to develop some very interesting applications.

Here is the code.

    #!/usr/bin/perl
    ##++
    ##   uptime_monitor.pl: archive system load averages to file
    ##--
    use strict;                                    ## Enable "strict" pragma
    my ($file, $uptime, $users, $load1, $load2, $load3);
    $file   = $ARGV[0] || '/var/log/uptime.log';   ## Path to log file
    $uptime = `/usr/bin/uptime`;                   ## Store output from uptime
    ##++
    ##   Parse the output of the uptime command and store the numbers of
    ##   users and the three system load averages into: $users, $load1,
    ##   $load2 and $load3.
    ##--
    $uptime =~ /(\d+) +user.+?(\d.+?), +(.+?), +(.+)/;
    $users  = $1;
    $load1  = $2;
    $load2  = $3;
    $load3  = $4;
    ##++
    ##   We can also write the above like this:
    ##
    ##     $uptime =~ /(\d+) +user.+?(\d.+?), +(.+?), +(.+)/;
    ##     ($users, $load1, $load2, $load3) = ($1, $2, $3, $4);
    ##
    ##   or even:
    ##
    ##     ($users, $load1, $load2, $load3)
    ##         = $uptime =~ /(\d+) +user.+?(\d.+?), +(.+?), +(.+)/;
    ##--
    ##++
    ##   Store the data in a log file; open modes:
    ##
    ##     >> = append, > = write, < (or nothing) = read
    ##--
    open (FILE, ">>$file") || die "Cannot append uptime data to $file: $!\n";
    print FILE join (':', time, $users, $load1, $load2, $load3), "\n";
    close (FILE);
    exit (0);

One note before discussing the code: our application accepts a command line argument and uses its value to determine where to archive the load average data. This provides the flexibility to archive the data to different files without modifying the code at all. If you were curious as to the significance of the /home/gundavaram/uptime.log file in the preceding crontab entry, now you know that it refers to the log file.

Getting the Command Line Argument

A user can specify arguments and information to any Perl application through the command line, and Perl makes them available to you through the special @ARGV array. You don't necessarily have to use these arguments in your application, but they are there in case you need them. Here is a simple program that illustrates how command line arguments are processed:

    #!/usr/bin/perl
    ##++
    ##   print_args.pl: display all command-line arguments
    ##--
    use strict;
    for (my $loop=0; $loop <= $#ARGV; $loop++) {    ## $#ARGV returns the
        print "\$ARGV[$loop] = $ARGV[$loop]\n";     ## last index of the
                                                    ## array
    }
    exit (0);

You are looking at the for loop in action. You should use this construct when you know exactly how many times you want a loop to execute. This is very different from a while loop, where you may not know how many times it should run; all you know is that once the loop meets some specific criteria, it should stop. Here, you iterate through the @ARGV array, one element at a time, displaying its value. Go ahead and run the program, like so:

    $ perl print_args.pl how are you this is a test
    $ARGV[0] = how
    $ARGV[1] = are
    $ARGV[2] = you
    $ARGV[3] = this
    $ARGV[4] = is
    $ARGV[5] = a
    $ARGV[6] = test
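The same iteration can also be written with foreach over a range of indices, which avoids the explicit counter. This is only a minimal sketch; the describe_args helper is a hypothetical name introduced here for illustration and is not part of print_args.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

## Hypothetical helper: build one "$ARGV[i] = value" line per argument.
sub describe_args {
    my @args = @_;
    my @lines;
    foreach my $i (0 .. $#args) {               ## 0 .. last index of @args
        push @lines, "\$ARGV[$i] = $args[$i]";  ## \$ keeps the name literal
    }
    return @lines;
}

## Display the real command line arguments, one per line.
print "$_\n" for describe_args(@ARGV);
```

If there are no arguments, the range 0 .. $#args is empty and the loop body never runs.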

You can treat the @ARGV array as you would any other array, accessing a specific element by specifying its index. Look back at the main application where you get the first command line argument, $ARGV[0] , and use it:

    $file = $ARGV[0] || '/var/log/uptime.log';

What do you think we are trying to do? If you remember, this clever one-line statement is equivalent to the following if-else block:

    if ($ARGV[0]) {
        $file = $ARGV[0];
    } else {
        $file = '/var/log/uptime.log';
    }

We use the value passed to us from the command line as the path to our log file, storing it in $file, but only if it is defined and has a true value. Otherwise, we use /var/log/uptime.log as our default log file. We mentioned Perl's flexibility in the beginning of this chapter. This is just a simple example that illustrates it. If you don't feel comfortable using the one-line technique, you can always use the longer, but clearer, group of code shown previously. If you are more adventurous, you can also use the following:

 $file = ($ARGV[0]) ? $ARGV[0] : '/var/log/uptime.log'; 

You've just seen three different techniques for performing the same task. Perl does not force you to use one approach over another; you are free to use whichever one you feel comfortable with.
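To convince yourself that the three forms behave alike, you can wrap each in a small subroutine and compare the results. The following is a sketch only; the subroutine names are hypothetical. It also adds a fourth variant that is not used in this chapter, the defined-or operator (//, available from Perl 5.10 onward), which differs in one respect: it falls back to the default only when the argument is undefined, not when it is merely false (such as the string 0).

```perl
use strict;
use warnings;

my $DEFAULT = '/var/log/uptime.log';

## The three techniques from the text, one per (hypothetical) subroutine.
sub path_with_or      { my $arg = shift; return $arg || $DEFAULT; }
sub path_with_ternary { my $arg = shift; return $arg ? $arg : $DEFAULT; }
sub path_with_if {
    my $arg = shift;
    if ($arg) { return $arg; } else { return $DEFAULT; }
}

## Not from the text: defined-or tests definedness rather than truth.
sub path_with_defined_or { my $arg = shift; return $arg // $DEFAULT; }

print path_with_or($ARGV[0]), "\n";
```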

Invoking the uptime Command

We can now invoke the uptime system command to get the load average data that we're looking for. You will be quite surprised when you see how easy it is to communicate with external applications from within Perl. We simply need to enclose either the command to execute or the path to an application, along with any necessary command line arguments, within a set of backticks. Perl then spawns a shell to execute the application and returns its output. In this case, we store that output in the $uptime variable.
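The main application does not check whether the command actually succeeded. A minimal sketch of how you might do that follows, assuming a Unix-like system; the capture subroutine is a hypothetical helper, and it relies on the special $? variable, which holds the status of the last external command.

```perl
use strict;
use warnings;

## Hypothetical helper: run a command with backticks and return its
## output, or undef if the command failed or could not be started.
sub capture {
    my $command = shift;
    my $output  = `$command`;
    return if $? != 0;          ## $? is non-zero on any kind of failure
    return $output;
}

my $uptime = capture('/usr/bin/uptime');
print defined $uptime ? $uptime : "Could not run uptime\n";
```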

Note

We are about to venture off on a long detour to better understand the intricate process of communicating with external applications. If you are not interested in learning about these techniques now, you can safely jump to the section Back to Our Program and come back to this material at a later time.

Pipes

The backtick approach is very easy to use and quite convenient. However, it should be used only when you know that the output generated from the invoked application is small. Imagine what would happen if an application generates megabytes and megabytes of output? Perl would then have to store all this information in memory, which could cause quite a problem. So, Perl provides an alternative to communicate with applications, via a pipe. For example, say you wanted to retrieve the list of currently logged-in users; here is how you would do it:

    #!/usr/bin/perl
    ##++
    ##   current_users.pl: display list of current users
    ##--
    use strict;
    open (MYPIPE, '/usr/bin/w |') || die "Cannot create pipe: $!\n";
    while (<MYPIPE>) {       ## Read one line at a time into $_
        print;               ## Equivalent to: print $_;
    }
    close (MYPIPE);
    exit  (0);

If you quickly glance at this program, you may think that we are simply iterating through a file and displaying each record. However, that is not the case. Instead, we are opening a pipe to the /usr/bin/w command, reading it, and displaying its output:

     23:16:35 up 55 min,  3 users,  load average: 0.13, 0.24, 0.34
    USER     TTY      FROM        LOGIN@   IDLE   JCPU   PCPU WHAT
    root     :0       -          22:26   ?xdm?   5:43   2.34s /usr/bin/gnome-session
    root     pts/0    :0.0       22:27    0.00s  1.23s  0.03s w
    root     pts/1    :0.0       23:02    6:42   0.64s  0.23s -bash

You may have seen examples of pipes throughout this book. Here is an example that counts the lines containing the word perl in all of the files located in the /etc directory:

    # grep -d skip perl /etc/* | wc -l
    67

When the shell sees this, it executes the grep command, finding all lines that contain the word perl, and then passes that output to the wc command as input. In other words, the output of the first command gets passed to the second as input. We are doing something very similar in our program as well. However, if you look at the previous code line with the open command, you will see that there is nothing specified after the pipe (the vertical bar) character. Where is the output going? To the MYPIPE filehandle, like this:

    $ /usr/bin/w | MYPIPE

By reading data from the MYPIPE filehandle, you are in effect reading the content produced by the /usr/bin/w program. The main advantage here is that you can read the content a line at a time, which is not only more efficient, but provides you with better control. In a similar fashion, you can also use a pipe to send data to another application as input, as in the following example:
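The examples in this chapter use bareword filehandles and put the pipe character inside the command string. Perl also accepts a lexical filehandle and a separate mode argument, '-|', which means "read from this command". The following is a sketch under the assumption of a Unix-like system, where the list form of a pipe open is supported; count_output_lines is a hypothetical helper name.

```perl
use strict;
use warnings;

## Hypothetical helper: read a command's output one line at a time
## through a pipe and return the number of lines seen.
sub count_output_lines {
    my @command = @_;
    open(my $pipe, '-|', @command)      ## '-|' = read from the command
        or die "Cannot create pipe: $!\n";
    my $count = 0;
    while (my $line = <$pipe>) {        ## Only one line in memory at a time
        $count++;
    }
    close($pipe);
    return $count;
}

## $^X is the path of the running perl; used here as a harmless command.
print count_output_lines($^X, '-e', 'print "line $_\n" for 1 .. 3'), "\n";
```

Because the command and its arguments are passed as a list, no shell is involved, so special characters in the arguments are not interpreted.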

    #!/usr/bin/perl
    ##++
    ##   sort_numbers.pl: send list of numbers to /bin/sort to get sorted list
    ##--
    open (MYPIPE, '| /bin/sort -g') || die "Cannot create pipe: $!\n";
    print MYPIPE "100\n50\n30\n70\n90\n60\n20\n80\n10\n40\n";
    close (MYPIPE);
    exit (0);

Notice the location of the pipe; it's on the left side rather than the right side. This means that we are sending our output through the MYPIPE filehandle to another application as input. More specifically, we are passing a series of unordered numbers to the sort command, which produces the following output:

    $ perl sort_numbers.pl
    10
    20
    30
    40
    50
    60
    70
    80
    90
    100

Unfortunately, once you send information to the external application as input, you have no control over the output that it produces; the output is typically sent to the standard output stream.
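If you do need both directions, one option outside the plain open command is the IPC::Open2 module that ships with Perl. The sketch below is an illustration only, not part of this chapter's applications: sort_numbers is a hypothetical helper, and the child process is simply another copy of the perl interpreter that sorts its input. For large amounts of data, this approach can deadlock if both sides wait on each other, so treat it as suitable for small inputs only.

```perl
use strict;
use warnings;
use IPC::Open2;

## Hypothetical helper: send numbers to a child process and read the
## sorted list back from it.
sub sort_numbers {
    my @numbers = @_;
    ## Child: this same perl interpreter, sorting its STDIN numerically.
    my $pid = open2(my $from_child, my $to_child,
                    $^X, '-e', 'print sort { $a <=> $b } <STDIN>');
    print {$to_child} "$_\n" for @numbers;
    close($to_child);                   ## Tell the child its input is done
    chomp(my @sorted = <$from_child>);  ## Read everything it printed
    close($from_child);
    waitpid($pid, 0);                   ## Reap the child process
    return @sorted;
}

print join(' ', sort_numbers(100, 50, 30, 70)), "\n";
```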

The system Command

What if you want to interact with a program that doesn't care about its input or output? Take, for example, a script that starts a server of some sort, or an editor, which might open an empty window. Perl provides the system command, which you can use to invoke an application.

Let's look back at the send_email.pl program in the previous section for a moment. Imagine how convenient it would be for the user if he or she could enter the body of the message in an editor. You can use the system command to open an editor, saving the contents in a temporary file. Then, you can read the content from the file and pass it to the Mail::Send module, as follows:

    #!/usr/bin/perl
    use Mail::Send;                  ## Import/use Mail::Send
    use POSIX;                       ## Import/use POSIX
    use strict;                      ## Enable "strict" pragma
    ...
    print 'Subject: ';
    chomp ($subject = <STDIN>);
    print "Enter the message in an editor, save and exit when you are done:";
    ##++
    ##   Call POSIX::tmpnam() to determine a name for a temporary file,
    ##   which we'll use to store the content of the message.
    ##--
    my $file = POSIX::tmpnam();      ## For example: /tmp/fileTwFpXe
    system ("/usr/bin/gedit $file"); ## Open the editor; Perl will wait until
                                     ## user finishes, at which point a temp.
                                     ## file is created by the editor.
    {
        local $/ = undef;            ## Undefine record separator
        if (open (FILE, $file)) {
            $message = <FILE>;       ## Reads ENTIRE content from file, as
                                     ## there is no record separator
            close (FILE);
        }
    }
    unlink $file;                    ## Delete temp. file; ignore status
    ...
    exit (0);

We won't discuss this code snippet in detail, but we hope you get the general idea. As you can see from all of these examples, communicating with external applications via backticks, pipes, and the system command is reasonably straightforward, yet extremely powerful. You can interact with any application, whether it is a system program, a script of some sort, or a compiled application.
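The snippet above ignores the value returned by system. If you care whether the program ran successfully, you can inspect that value: it is -1 if the program could not be started, and otherwise the program's exit code is stored in the upper byte. Here is a minimal sketch; the run subroutine is a hypothetical helper, and it does not distinguish a program that was killed by a signal.

```perl
use strict;
use warnings;

## Hypothetical helper: run a command and return its exit code,
## or -1 if it could not be started. When system receives a list
## of more than one element, no shell is involved.
sub run {
    my @command = @_;
    my $status  = system(@command);
    return -1 if $status == -1;     ## Could not start the program
    return $status >> 8;            ## Upper byte holds the exit code
}

## $^X is the path of the running perl; the child just exits with 3.
print "Exit code: ", run($^X, '-e', 'exit 3'), "\n";
```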

Parsing Load Averages

Now comes the most difficult part of the application. We briefly talked about regular expressions earlier in the chapter. However, we have yet to really use them in a program, with the exception of the simple regex in list_users.pl . In this application, we need to extract the number of users and the load averages from the output generated by the uptime command. Although you can use many techniques to accomplish this task, regular expressions are by far the best and easiest way to handle it.

Regular Expressions

What is a regular expression? Simply defined, a regular expression is a set of normal characters and special syntactic elements (metacharacters) used to match patterns in text. You can use regular expressions in all types of text-manipulation tasks, ranging from checking for the existence of a particular pattern to finding a specific string and replacing it.

Substitutions from the Command Line

Suppose you have a large set of HTML files that contain, among other information, your company's physical address. Soon after you create these files, you move to a new location and have to change the address in all of the files. How would you do it? One tedious way would be to manually replace the address in each and every file, but that's not realistic. However, armed with Perl and its regex support, you can get this task done in no time.

Take a look at the following HTML file. Everything is left out except for the company address that we are interested in modifying:

    <HTML>
    ...
    MechanicNet Group, Inc.<br>
    43801 Mission Blvd., Suite 103<br>
    Fremont, CA 94539
    ...
    </HTML>

We want to change this address to read as follows:

    <HTML>
    ...
    MechanicNet Group, Inc.<br>
    7150 Koll Center Parkway, Suite 200<br>
    Pleasanton, CA 94566
    ...
    </HTML>

There are several ways to tackle this job. It would be beyond the scope of this chapter to discuss each approach, so we will concentrate on one technique that is compact, versatile, and easy to use:

    $ perl -0777 -p -i.bak -e \
      's/43801 Mission Blvd., Suite 103<br>\s*Fremont, CA 94539/7150 Koll Center Parkway, Suite 200<br>\nPleasanton, CA 94566/gsi' *.html

That's it, and we didn't even have to write a full-fledged program! We are running the Perl interpreter from the command line, passing to it several arguments, including a piece of code that performs the actual substitution. The -0 switch sets the record separator to the octal number 777. This has the same effect as assigning an undefined value as the separator, because no character has the octal value 777. By doing this, we can match strings that span multiple lines. How, you ask? Typically, the default record separator is the newline character; each time you read from a filehandle, you will get back exactly one line:

 $record = <FILE>;    ## One line 

However, if the record separator is undefined, one read will slurp the entire file into a string:

    local $/ = undef;    ## $/ = record separator; special Perl variable
    $record = <FILE>;    ## Entire file

This is convenient because you can search for strings or patterns without having to worry about line boundaries. However, you should be careful not to use this technique with very large files, as you might exhaust system memory. On the other hand, if you need to match only a single string or pattern, you can safely ignore this switch.
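The difference between the two read modes is easy to see with a small file. The sketch below writes a two-line sample to a temporary file (using the File::Temp module that ships with Perl) and then reads it both ways; all three subroutine names are hypothetical helpers introduced for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

## Read only the first record, using the default separator (newline).
sub read_first_record {
    my $path = shift;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    my $line = <$fh>;
    close($fh);
    return $line;
}

## Read the whole file into one string by undefining $/ locally.
sub slurp {
    my $path = shift;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    local $/ = undef;               ## Restored when the sub returns
    my $content = <$fh>;
    close($fh);
    return $content;
}

## Create a temporary two-line sample file; removed at program exit.
sub make_sample_file {
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "43801 Mission Blvd., Suite 103<br>\nFremont, CA 94539\n";
    close($fh);
    return $name;
}

my $sample = make_sample_file();
print "First record: ", read_first_record($sample);
print "Pattern spans both lines\n"
    if slurp($sample) =~ /103<br>\s*Fremont/;
```

Using local confines the change to $/ to the enclosing block, so other reads in the program still get one line at a time.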

Next, the -p switch creates an internal while loop that iterates over each record from each of the specified files, storing the content in the default Perl variable, $_. If you look at the far right of the one-line statement, you will see the list of files that Perl will process. Remember that because the record separator is undefined, $_ will contain the contents of the entire file, as opposed to just one line.

The -i switch asks Perl to modify each of these files in-place, moving the original to another file with the same name but with the .bak extension. If you don't want to create a backup copy of each file, you can simply remove the .bak after the -i switch. And, finally, the -e switch specifies the Perl code that should be executed for each record read from a file. If you want more information on all of the command line switches accepted by the interpreter, you should look at the perlrun man page.

Here is the code that performs the substitution:

    s/43801 Mission Blvd., Suite 103<br>\s*Fremont, CA 94539/7150 Koll Center Parkway, Suite 200<br>\nPleasanton, CA 94566/gsi;

This is technically equivalent to

    $_ =~ s/43801 Mission Blvd., Suite 103<br>\s*Fremont, CA 94539/7150 Koll Center Parkway, Suite 200<br>\nPleasanton, CA 94566/gsi;

as regular expression operators, including s// , work on the default Perl variable, $_ , if another scalar variable is not specified. Whenever you see the =~ or !~ operators, you should automatically think of regular expressions. You can use these operators to compare a scalar value against a particular regular expression.

The s// operator replaces the text matched by the pattern on the left side of the expression with the value on the right side. For example, if you want to substitute the ZIP Code 94539 with 94566, you would use the following:

 s/94539/94566/; 

We are simply substituting literal values here. But, in our main example, we are using regex metacharacters in the substitution pattern. These metacharacters have a special significance to the regex processing engine. Take a look at the \s token followed by the asterisk. The token matches an occurrence of white space; white space includes a regular space character, tab, newline, or carriage return. We use the asterisk as the token's multiplier, forcing the regex engine to match the token (in this case, the white space) zero or more times. Other than this metacharacter, the rest of the expression is simply a set of literal characters.

And, finally, let's look at the expression modifiers that follow the substitution operator. Each of the three characters, g, s, and i, has a special significance:

  • The g modifier enables global substitution, where all occurrences of the original pattern are replaced by the new string.

  • The s modifier forces the engine to treat the string stored in the $_ variable as a single line, so that the . metacharacter also matches newline characters.

  • The i modifier enables case-insensitive matching.
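The effect of each modifier is easiest to see on small strings. The following is a sketch with hypothetical subroutine names, each of which isolates one modifier.

```perl
use strict;
use warnings;

## Without g only the first match is replaced; with g, all of them.
sub replace_first { my $text = shift; $text =~ s/94539/94566/;  return $text; }
sub replace_every { my $text = shift; $text =~ s/94539/94566/g; return $text; }

## The i modifier ignores case.
sub matches_city  { my $text = shift; return $text =~ /fremont/i ? 1 : 0; }

## The s modifier lets . match a newline; without it, . stops there.
sub spans_with_s    { my $text = shift; return $text =~ /103.Fremont/s ? 1 : 0; }
sub spans_without_s { my $text = shift; return $text =~ /103.Fremont/  ? 1 : 0; }

print replace_first('94539 and 94539'), "\n";   ## Only the first changes
print replace_every('94539 and 94539'), "\n";   ## Both change
```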

We have looked at only a small subset of the metacharacters supported by the Perl regex engine. We won't cover them all here, so take a look at the documentation provided with Perl (the perlre man page) and Beginning Perl, by Simon Cozens with Peter Wainwright, from Wrox Press.

Substitutions in Many Files

What if you have many HTML files spread around in multiple directories? Can you use the one-line Perl statement discussed previously to find and replace the old address with the new one? Yes, you can use the find command in conjunction with the xargs command to process all the files:

    $ find . -name '*.html' -print0 | xargs --verbose --null \
      perl -0777 -p -i.bak -e \
      's/43801 Mission Blvd., Suite 103<br>\s*Fremont, CA 94539/7150 Koll Center Parkway, Suite 200<br>\nPleasanton, CA 94566/gsi'

This example uses the find command to find all the files with an .html extension in the current directory and all underlying subdirectories. These files are sent as input, via the pipe, to the xargs command, which takes them and passes them as arguments to the ensuing perl command.
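If you would rather stay inside Perl, the File::Find module that ships with Perl can walk the directory tree for you. This is only a sketch of the file-gathering half of the job, not a replacement for the command shown above; find_html_files is a hypothetical helper name.

```perl
use strict;
use warnings;
use File::Find;

## Hypothetical helper: return the sorted paths of all regular files
## ending in .html under the directory $top, including subdirectories.
sub find_html_files {
    my $top = shift;
    my @found;
    ## Inside the callback, $_ is the file's name within its directory
    ## and $File::Find::name is its full path.
    find(sub { push @found, $File::Find::name if -f $_ && /\.html$/ }, $top);
    my @sorted = sort @found;
    return @sorted;
}

## List the files under the directory given on the command line, if any.
if (@ARGV) {
    print "$_\n" for find_html_files($ARGV[0]);
}
```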

In summary, we have looked at just one regular expression metacharacter, along with a useful Perl technique for manipulating text easily and quickly from the command line. That alone should convince you of the power of regular expressions. Next, we will continue on with our main application, using the knowledge gained here to parse the relevant information from the output generated by the uptime command.

Back to Our Program . . .

Are you ready to handle a much more advanced regex? Let's dissect the one used in our main application to extract the number of users and the load averages (see Figure 13-2).

You can liken the process of crafting a regex to putting together a jigsaw puzzle piece by piece. To jog your memory, look once again at the uptime output:

 23:15:42 up 54 min,  3 users,  load average: 0.33, 0.29, 0.36 
Figure 13-2

The first piece involves extracting the number of users. The easiest way to do this is to find the number that precedes the literal word user. You don't need to start your search from the beginning of the input string; you can create your expression to match from anywhere in the string. And once you find the number of users, you must store this value somewhere. Do this by enclosing the information you are interested in within a set of parentheses (see the expression in the preceding figure). This forces Perl to save the matched information in a special variable, $1.

Next, we need to get at the three load averages. You can use a variety of different expressions, but the simplest is to proceed through the output, starting from the string user and stopping when you find the next numerical digit. Once you find this number, store all the characters from that point forward until the next comma; this is the first load average, which gets stored in $2. Then, ignore this comma as well as the following space, and obtain the next load average, which gets stored in $3. And, finally, extract the final load average into $4 by matching everything from the current point to the end of the line.
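You can try the expression on the sample line without running uptime at all. The sketch below wraps the match in a hypothetical parse_uptime subroutine and checks that the match succeeded before using the results; this matters because $1 through $4 keep their values from an earlier successful match when a later match fails.

```perl
use strict;
use warnings;

## Hypothetical helper: extract (users, load1, load2, load3) from a
## line of uptime output. In list context, a successful match returns
## the captured groups; a failed match returns an empty list.
sub parse_uptime {
    my $line = shift;
    return $line =~ /(\d+) +user.+?(\d.+?), +(.+?), +(.+)/;
}

my @fields = parse_uptime(
    ' 23:15:42 up 54 min,  3 users,  load average: 0.33, 0.29, 0.36');

if (@fields) {
    print join(':', @fields), "\n";
} else {
    print "Unrecognized uptime output\n";
}
```

For the sample line, this should print 3:0.33:0.29:0.36.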

At first glance, regular expressions are difficult to comprehend. But, as you practice more and more, they will get much easier to handle, and you will be able to implement all types of parsers, from finding errors in log files to extracting content from Web sites.

Archiving to File

Once you have the four pieces of information that you are interested in, you archive this data to a log file. You use the open command to open the file, but with a little twist. Notice the two leading greater-than signs in front of the filename; this tells Perl to open the file in append mode. Then, use the join command to create a string containing the current timestamp and the uptime information, delimited by the colon character.

To accurately analyze this type of information over time, you also need to have a timestamp associated with each data point. For this purpose, use the time command, which simply returns the number of non-leap seconds since the epoch (on UNIX systems, this is 00:00:00 UTC on January 1, 1970). Try the following to get a feel for these types of timestamps:

    $ perl -e 'print time, "\n"'
    1079335147
    $ perl -e 'print scalar localtime 1079335147, "\n"'
    Sun Mar 14 23:19:07 2004

There are a large number of Perl extensions that you can use to manipulate timestamps, in addition to the built-in localtime command.

Last but not least, you write the delimited string to the file using the print command. Then, close the file and exit.
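When the time comes to analyze the log, each record can be taken apart again with split, the counterpart of join. The following is a sketch with hypothetical subroutine names; it assumes the record layout used by uptime_monitor.pl (timestamp, users, and three load averages separated by colons).

```perl
use strict;
use warnings;

## Build one log record: fields joined by colons, newline at the end.
sub format_record {
    my ($stamp, $users, @loads) = @_;
    return join(':', $stamp, $users, @loads) . "\n";
}

## Take a record apart again; returns the list of fields.
sub parse_record {
    my $line = shift;
    chomp $line;                    ## Remove the trailing newline
    return split /:/, $line;
}

my $record = format_record(time, 3, '0.33', '0.29', '0.36');
print $record;

my ($stamp, $users, @loads) = parse_record($record);
print scalar localtime($stamp), ": $users users, loads @loads\n";
```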

You have read about quite a number of new techniques in this section, ranging from accepting command line arguments to interacting with external applications and designing regular expressions. It is not easy to understand all of this material at first glance, so take some time to go through the material a number of times. In addition, you should refer to the built-in documentation as well as to other resources, such as Beginning Perl, by Simon Cozens, and Professional Perl, by Peter Wainwright et al., from Wrox Press, for more detailed coverage.

The next section implements the disk usage application that was covered at the beginning of the chapter. You should use this application as a benchmark to see how well you understand the techniques presented up to this point.

Application 4: Disk Usage Monitor

We spent a considerable amount of time and space discussing and analyzing a hypothetical disk usage monitoring application earlier in the chapter. You have learned quite a bit of Perl since that discussion, so in this section, you actually implement the application. To summarize, the application must be able to perform the following tasks:

  • Monitor specified filesystems for available disk space.

  • Determine user directories that exceed a certain quota limit if available disk space falls below a specified threshold.

  • Find the 15 largest files for each user directory that exceeds the quota limit.

  • Send an e-mail message to the offending user listing the largest files.

In discussing this application, we will not dissect the code in extreme detail as we have done with the three previous programs. Instead, you will see a block of code and a brief description of its functionality, highlighting any new constructs or syntax. But first, Figure 13-3 illustrates an e-mail message that is typically produced by this application if the quota usage thresholds are reached.

Here is the code:

    #!/usr/bin/perl
    ##++
    ##   disk_usage.pl: monitor disk usage and alert users via e-mail
    ##--
    use Mail::Send;                             ## Load Mail::Send
    use strict;                                 ## Use strict pragma;
                                                ## must declare all vars.
    ##++
    ##   Declare constants/global variables
    ##--
    our ($DF, $DU, %FILESYSTEMS, $QUOTA, $NO_FILES); ## Pre-declare global
                                                     ## variables
    $DF          = '/bin/df';                        ## Path to df command
    $DU          = '/usr/bin/du -ab';                ## Path and options to du
    ##++
    ##   The following three constants store (1) the filesystems to check and
    ##   their maximum allowable usage thresholds, (2) max user's disk quota,
    ##   50 MB and (3) the number of files to process, 15. You should change
    ##   these values to suit your requirements and system configuration.
    ##--
    %FILESYSTEMS = ('/home1' => 90, '/home2' => 85);
    $QUOTA       = 1_024_000 * 50;                ## Use _ to make large
                                                  ## numbers more readable
    $NO_FILES    = 15;
    ##++
    ##   Start main program
    ##--
    print STDERR "Disk usage check in progress, please wait ...\n";
    my    ($percent, $filesystem);
    local *PIPE1;                                 ## Pre-declare/localize
                                                  ## filehandle
    open (PIPE1, "$DF |") || die "Cannot create pipe: $!\n";
    while (<PIPE1>) {                             ## Read line into $_
        if (/(\d+)%\s+(.+)\s*$/) {                ## Match against $_ variable
            ($percent, $filesystem) = ($1, $2);   ## Store matches
            if (exists $FILESYSTEMS{$filesystem} &&     ## Does this element
                                                        ## exist in hash?
                $percent >= $FILESYSTEMS{$filesystem}) {
                print STDERR ">> $filesystem has $percent usage <<\n";
                process_filesystem ($filesystem); ## Invoke subroutine
            }                                     ## with $filesystem as arg.
        }
    }
    close (PIPE1);
    exit  (0);
    ##++
    ##   End of main program, subroutines follow ...
    ##--

We start out by defining a set of constants, variables whose values will not change over the life of the program, using the our command. The our command allows you to define variables that have a global scope and can be accessed in subroutines. By using a constant, as opposed to a literal value, you can change its value easily without having to search and replace various instances of the actual value throughout the program.

One of the constants you should take note of is %FILESYSTEMS, a data type that you have not seen so far. It is an associative array, commonly referred to as a hash, and is very different from a regular array. You can think of a hash as an array that is indexed by an alphanumeric key, rather than an ordered number. Each hash element has two scalar components: the key and an associated value. This application uses a hash to associate filesystems with their corresponding disk usage thresholds. For example, you can access the threshold value for the /home1 filesystem by using the following syntax:

 $home = $FILESYSTEMS{'/home1'}; 
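A few more hash operations appear in the main program, namely testing whether a key exists and comparing against its value. The sketch below collects them in one place; the %thresholds hash mirrors %FILESYSTEMS, and over_threshold is a hypothetical helper that performs the same test as the main loop.

```perl
use strict;
use warnings;

my %thresholds = ('/home1' => 90, '/home2' => 85);

## Hypothetical helper: true (1) if $filesystem is one we monitor and
## $percent has reached its threshold, false (0) otherwise.
sub over_threshold {
    my ($filesystem, $percent) = @_;
    return 0 unless exists $thresholds{$filesystem};    ## Unknown key
    return $percent >= $thresholds{$filesystem} ? 1 : 0;
}

## keys returns the hash keys in no particular order, so sort them.
foreach my $fs (sort keys %thresholds) {
    print "$fs => $thresholds{$fs}\n";
}
```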
Figure 13-3

After initialization, we proceed to use a pipe to interact with the df system command, which, in turn, produces and returns a disk usage report. Then, we iterate through each line of the report and extract the usage percentage and filesystem name with a regular expression. Based on the regex dissection performed in the previous section, can you understand this expression? The pattern targets the end of each input line, looking for one or more numerical digits followed by a percent sign. The numerical value will be stored in $1 and represents the usage percentage. Then, we match one or more white space characters and proceed to extract and save the remaining portion of the string in $2; this is the filesystem name. The trailing $ metacharacter allows us to anchor a regex to the end of the string.
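As with the uptime expression, you can experiment with this regex on a sample line. The line below is a made-up example of df output, not taken from a real system, and parse_df_line is a hypothetical helper name.

```perl
use strict;
use warnings;

## Hypothetical helper: return (percent, filesystem) for a line of df
## output, or an empty list if the line does not match (for example,
## the header line, which has no digits before its percent sign).
sub parse_df_line {
    my $line = shift;
    return $line =~ /(\d+)%\s+(.+)\s*$/;
}

## Made-up sample line in the usual df layout.
my $sample = "/dev/sda3   10321208   9392560    404360  91% /home1\n";

my ($percent, $filesystem) = parse_df_line($sample);
print "$filesystem is $percent% full\n" if defined $percent;
```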

Only if the line of input matches the regex do we proceed with the rest of the code. For each filesystem, we check to see if it exists in the hash and if the usage value exceeds the specified threshold. Only in that case do we print a diagnostic message and invoke the process_filesystem subroutine to process each user's individual directory.

You can use subroutines, also known as functions, methods, and procedures, to make your code more modular. By placing a block of code that performs a distinct operation or set of operations in its own unique container, you can keep the program clean, easy to understand, and possibly more efficient. Look at this subroutine:

    ##++
    ##   Subroutines
    ##--
    sub process_filesystem
    {
        my    $filesystem = shift;         ## Argument; read from @_ array
        my    ($dir, $path, $files);       ## Pre-declare other variables
        local *DIR;                        ## Pre-declare/localize filehandle
        opendir (DIR, $filesystem)         ## Open specified filesystem
            || die "Cannot open filesystem $filesystem: $!\n";
        while ($dir = readdir (DIR)) {     ## Get each user's directory
            next if ($dir =~ /^\./);       ## Ignore dirs. that start with .
            print STDERR "- Processing $dir directory ...\n";
            $path  = "$filesystem/$dir";   ## Add full path to user's dir
            $files = process_dir ($path);  ## Invoke subroutine
            ##++
            ##   Send e-mail to user only if he/she has exceeded quota limit.
            ##--
            send_email ($dir, $files) if (ref $files);
        }
        closedir (DIR);                    ## Close directory
    }

The main program passes the name of the filesystem that has exceeded its specified disk usage threshold as an argument to this subroutine. Perl stores subroutine arguments in the special @_ array, which is analogous to the $_ default variable. We use the shift command to remove the first element from this array and store its value in $filesystem. However, if you don't feel comfortable using shift on an unseen array to obtain the argument, you can use either one of the following two statements:

    my $filesystem = shift @_;             ## Remove first element from @_
    my $filesystem = $_[0];                ## Argument; access index 0
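The practical difference between these forms is that shift removes the argument from @_, whereas indexing leaves the array untouched. A sketch with hypothetical subroutine names makes this visible; it also shows a third common form, list assignment from @_, which is not used in this application.

```perl
use strict;
use warnings;

sub first_by_shift  { my $arg   = shift; return $arg; }   ## Implicit @_
sub first_by_index  { my $arg   = $_[0]; return $arg; }   ## Leaves @_ alone
sub first_by_list   { my ($arg) = @_;    return $arg; }   ## List assignment

## Number of arguments left in @_ after one shift.
sub left_after_shift { shift; return scalar @_; }

## Number of arguments still in @_ after reading index 0.
sub left_after_index { my $arg = $_[0]; return scalar @_; }

print first_by_shift('/home1', 'extra'), "\n";
print left_after_shift('/home1', 'extra'), " argument left after shift\n";
```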

Next, we call the opendir function to open the filesystem's root directory, in much the same manner as we would open a text file. We proceed to iterate through the directory, getting a subdirectory name each time we invoke the readdir command. You should be aware that each subdirectory, in turn, represents a user's home directory. Because readdir returns only the subdirectory name, we use the $filesystem value to construct the entire path to the directory and pass it to the process_dir subroutine.

The process_dir subroutine traverses through all of the user's files and returns a hash reference containing the list of the largest files, but only if the user has exceeded the quota value as defined by the $QUOTA constant. If we do end up getting back a list, we call the send_email subroutine to send a warning e-mail message to the user.

We have not talked about references at all in this chapter. You can use references to create complex data structures using the three basic data types that you are familiar with, namely scalars, arrays, and hashes. We will look at references in more detail as we discuss this application:

    sub process_dir
    {
        my    $dir = shift;                     ## Argument; from @_ or $_[0]
        my    ($min, %files, $size, $file, @sorted);
        local *PIPE2;                           ## Localize filehandle
        ##++
        ##   Open a pipe to the /usr/bin/du command, with the -ab arguments;
        ##   this is defined by the $DU constant. The -a switch returns sizes
        ##   for all files, not just directories, and the -b switch returns
        ##   the sizes in bytes -- instead of the default unit blocks.
        ##
        ##   Sample output:
        ##   140     /home/postgres/global
        ##--
        open (PIPE2, "$DU $dir |") || die "Cannot create pipe: $!\n";
        $min   = 0;                             ## Keep track of smallest file
        %files = ();                            ## Initialize hash
        while (<PIPE2>) {                          ## Read record for file
            ($size, $file) = /^(\d+)\s+(.+)\s*$/;  ## Get the size and file
                                                   ## from du
            last if ($file eq $dir);            ## End loop if we are finished
            next if (-d $file);                 ## Ignore if file is directory
            if ($size > $min) {
                $files{$file} = $size;          ## Store each file and size
                ##++
                ##   If we have more than 15 files in the %files hash, then
                ##   we sort the values based on the file size and delete the
                ##   file with the smallest size. At any given time, we will
                ##   have only that number of files in the hash.
                ##--
                if (scalar keys %files > $NO_FILES) {
                    @sorted = sort { $files{$a} <=> $files{$b} } keys %files;
                    $min    = $files{ $sorted[1] };
                    delete $files{ $sorted[0] };   ## Delete smallest file
                }
            }
        }
        close (PIPE2);                             ## Close pipe
        return ($size >= $QUOTA) ? \%files : 0;    ## Return list only if user
                                                   ## exceeded quota
    }

This is the most complicated piece of code to date in this chapter. The task in this subroutine is to locate and return the files from a specific user's home directory that occupy the most disk space. The actual number of files to process is specified by the $NO_FILES constant defined at the beginning of the program: 15, in this case.

First, we create a pipe to the du command to get a list of each and every file and its associated size from the specified directory. We then iterate through this list one file at a time, ignoring all directories with one exception: the entry whose path matches the user's home directory. The du command prints that entry last, along with the total size of the directory, so it signifies the end of the output and we exit from the loop.
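The parsing step can be tried without running du at all. The following is only a minimal sketch: it reads from an in-memory string in place of the pipe, the paths and sizes are made up for illustration, and the -d directory test is left out because the sample files do not exist on disk.

```perl
use strict;
use warnings;

## Made-up text standing in for the output of "du -ab /home/postgres"
my $sample = "140\t/home/postgres/global\n"
           . "4096\t/home/postgres/base.dat\n"
           . "9000\t/home/postgres\n";

my $dir = '/home/postgres';
my (%sizes, $total);

## Read from the string as if it were the pipe (assumption: no real du call)
open (my $fh, '<', \$sample) || die "Cannot open sample: $!\n";

while (<$fh>) {
    my ($size, $file) = /^(\d+)\s+(.+)\s*$/;   ## Same pattern as process_dir
    next unless defined $file;                 ## Skip lines that do not match

    if ($file eq $dir) {                       ## Last record: directory total
        $total = $size;
        last;
    }

    $sizes{$file} = $size;                     ## Remember each file and size
}

close ($fh);

print "Total for $dir: $total bytes\n";
```

When the loop ends, $total holds the size reported for the directory itself, which is the value that process_dir compares against $QUOTA.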

Looking inside the loop, we check to see if the size of the current file is greater than the minimum size, which is initially zero. If so, we store the file and its size in the hash, and then immediately determine the number of files currently stored in the hash. You don't want to store more than 15 files at any given point. If the number exceeds 15, you need to remove the file with the smallest size.

The technique to remove this file from the hash is a simple one. You use the sort command to sort the hash keys (the filenames) by file size and store the sorted filenames in the @sorted array. Next, reset the minimum file size to the second-smallest size, so that from here on, you store only the files that are larger than this value. Finally, remove the smallest file from the hash using the delete command. Repeat this process for all of the files, at which point you will be left with the 15 largest files, unless there are fewer than 15 files in the directory. After the loop terminates, you return the list of these files, but only if the user's total directory size is greater than or equal to the defined quota.
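The same bookkeeping can be seen on a handful of records. This is a sketch only: keep_largest is a hypothetical helper written for this illustration, the file names and sizes are invented, and the limit is 2 rather than 15 to keep the trace short.

```perl
use strict;
use warnings;

## Hypothetical helper: keep only the $limit largest [size, name] records
sub keep_largest
{
    my ($limit, @records) = @_;
    my (%files, $min, @sorted);

    $min = 0;                                  ## Smallest size worth storing

    foreach my $record (@records) {
        my ($size, $file) = @$record;

        next unless ($size > $min);            ## Too small to make the list
        $files{$file} = $size;

        if (scalar (keys %files) > $limit) {
            @sorted = sort { $files{$a} <=> $files{$b} } keys %files;
            $min    = $files{ $sorted[1] };    ## Second smallest is new floor
            delete $files{ $sorted[0] };       ## Drop the smallest file
        }
    }

    return \%files;                            ## Reference to the survivors
}

my $top = keep_largest (2, [100, 'a.log'], [40, 'b.txt'], [250, 'c.iso'],
                           [10,  'd.tmp'], [300, 'e.tar']);

print join (', ', sort keys %$top), "\n";      ## Prints: c.iso, e.tar
```

Note that d.tmp is never stored at all, because by the time it is read the minimum has already risen to 100.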

You should note a few things, however. The first is the sort command and its syntax. The sort command expects to receive a list, such as the contents of an array, as input and returns the sorted list as output:

    @array  = (100, 50, 25, 75);
    @sorted = sort { $a <=> $b } @array;    ## @sorted = (25, 50, 75, 100);

Your objective is to get the list of files in the %files hash ordered by their size, so you can remove the file with the smallest size. Unfortunately, sort cannot directly work with hashes, so you must do things a bit differently. You can use the keys command to return an array of all the hash keys (filenames):

 @files = keys %files;          ## @files = ("/home/postgres/global", ...); 
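Putting the two pieces together, you sort the list of keys while comparing the values that those keys point to. Here is a small sketch of that idea; the paths and sizes are invented for illustration:

```perl
use strict;
use warnings;

## Made-up filenames and sizes
my %files = (
    '/home/postgres/global'      => 140,
    '/home/postgres/base.dat'    => 9000,
    '/home/postgres/pg_hba.conf' => 12,
);

## Sort the keys, but compare the sizes stored under them
my @by_size = sort { $files{$a} <=> $files{$b} } keys %files;

foreach my $file (@by_size) {
    print "$files{$file}\t$file\n";            ## Smallest file comes first
}
```

Inside the block, $a and $b are two filenames at a time, so $files{$a} and $files{$b} are their sizes. The first element of @by_size is therefore the smallest file, which is the one that process_dir deletes.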

Next, you need to familiarize yourself with the return command. You can use the return command to return a value or set of values back to your caller. The return statement shown in the subroutine uses the conditional operator (? :), which is a shortcut for the following:

    if ($size >= $QUOTA) {
        return \%files;            ## Return reference to a hash
    } else {
        return 0;
    }

Remember, an associative array and a regular array allow you to store multiple elements. If you were to return one of these data types from a subroutine, it would not be returned as one entity, but instead as a flat list of scalar values. To avoid this, you take a reference to the hash by using the \ prefix; now it is returned as one single entity. Of course, if you return a reference to a hash in this manner, you must de-reference it in the caller before you can access elements in the hash. We discuss this process after we look at the following send_email subroutine:

    sub send_email
    {
        my ($user, $files) = @_;
        my ($mail, $list, $fh);

        $mail = new Mail::Send;              ## Creates new Mail::Send object
        $list = '';                          ## String to hold list of files

        $mail->to ($user);
        $mail->subject ('Disk Usage Alert');

        $fh = $mail->open();

        map { $list .= pack ("A15A*", $files->{$_}, $_) . "\n" }
            sort { $files->{$b} <=> $files->{$a} } keys %$files;

        ##++
        ##   This is a here document. It allows us to output large blocks
        ##   of text at once; it prints until it finds the
        ##   'Message' delimiter.
        ##--

        print $fh <<Message;
    *************************  DISK USAGE ALERT  *************************

    We are running LOW on disk space, and so we ask you to please clean up
    any unnecessary files.  For your convenience,  we have attached a list
    of your $NO_FILES largest files, which you may want to look at:

    $list
    **********************************************************************

    Sincerely,
    Your Friendly System Administrators
    Message

        $fh->close();
    }

Most of the code in this subroutine should be familiar to you by now; it is nearly identical to the send_email.pl application. The process_filesystem subroutine passes the username and the hash reference to this subroutine to build and send the e-mail warning message. We are making one critical assumption here, namely that the user's e-mail address is the same as the name of his or her home directory. If you are adventurous, you can modify the code a bit from list_users.pl to match the home directory to the username, and thus the e-mail address.
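Because send_email receives a hash reference rather than a hash, it has to de-reference it before use. The sketch below shows the difference between returning a hash and returning a reference to one, and the two de-referencing forms that send_email relies on. The subroutine names and the data are invented for this illustration.

```perl
use strict;
use warnings;

## Returns the hash itself: the caller receives a flat list of scalars
sub files_as_list
{
    my %files = ('big.iso' => 7000, 'notes.txt' => 120);
    return %files;
}

## Returns a reference: the caller receives one scalar that points to the hash
sub files_as_ref
{
    my %files = ('big.iso' => 7000, 'notes.txt' => 120);
    return \%files;
}

my @flat  = files_as_list ();        ## Four scalars; key order not guaranteed
my $files = files_as_ref ();         ## One scalar holding the reference

print scalar (@flat), "\n";          ## Number of scalars returned: 4
print $files->{'big.iso'}, "\n";     ## Arrow syntax reaches one element: 7000
print join (',', sort keys %$files), "\n";   ## %$files is the whole hash
```

The arrow form, $files->{$_}, and the prefix form, %$files, are both used in the send_email subroutine.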

If you look at the code carefully, you will see one very cryptic-looking statement. You should read it from right to left, instead of left to right. First, keys %$files de-references the hash reference and returns the filenames. Next, we use the sort command to sort the files in descending order; compare the position of $a and $b here to the sort expression in the process_dir subroutine. Finally, we pass the resulting list to map to build a string that lists the files and sizes in a tabular format.

To make this statement easier to understand, it is broken up into a simpler syntax here:

    @keys   = keys %$files;
    @sorted = sort { $files->{$b} <=> $files->{$a} } @keys;

    foreach $key (@sorted) {
        $list .= pack ("A15A*", $files->{$key}, $key) . "\n";
    }
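To see what the pack template produces, you can run the statement on a small hash reference. This is a sketch with invented filenames and sizes. It also uses a slightly different arrangement than the subroutine: map returns the formatted lines and join glues them together, instead of appending to $list inside the block.

```perl
use strict;
use warnings;

## Made-up data standing in for the reference returned by process_dir
my $files = { '/home/a/big.iso' => 7000, '/home/a/notes.txt' => 120 };

## A15 pads the size with spaces to 15 characters; A* appends the filename
my $list = join '',
           map  { pack ("A15A*", $files->{$_}, $_) . "\n" }
           sort { $files->{$b} <=> $files->{$a} } keys %$files;

print $list;
```

Each line of $list starts with the size in a 15-character column, followed by the filename, with the largest file first.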

That's it for the application, but we are not done with Perl just yet. The next section implements a Web-based system administration application that you can use to monitor your system. But more significantly, it illustrates how easy it is to design and develop a comprehensive application in Perl.




Beginning Fedora 2
ISBN: 0764569961
Year: 2006
Pages: 170
