

31.1. Objective 1: Automating Tasks Using Scripts

Scripting is one of the oldest and most powerful tools in the Unix environment. Understanding scripts is also one of the most useful administrator skills when it comes to analyzing system problems, because setting down a procedure in a script, and testing it, forces the administrator to decide exactly what should be done.

The most common scripting languages in Linux are Perl and the Unix shells sh and bash. Perl is a complete scripting environment in itself. On most Linux installations, sh and bash are the same interpreter, but this is not the case on Unix in general. sh and bash adhere to the old toolbox philosophy of Unix: lots of little programs that each do one task well. Among the subordinate tools they invoke are sed and awk.

Thus, using sed and awk is considered an integral part of shell scripting. One thing to note about Bash and shell scripting in general is that in an unfriendly environment it is quite easy for an intruder to subvert the scripts and turn them to their own uses.

To take this exam, you should already know scripting on the level described in the Objective so that you understand all the examples shown here. This book will not teach you to program. Scripting is complex enough that there are whole books devoted to the subject. We will only scratch the surface. The scripts shown in this chapter illustrate system administration tasks more than they help you learn programming.

Shell scripting is best suited for checking things and then executing something in response. For text processing and scripting that requires more than 10 to 100 lines of bash, you will most often find Perl to be a better tool.

Learning Shell and Perl Programming

There are a plethora of excellent books on both these subjects. Learning Perl is a good text for a non-programmer to learn Perl from. Programming Perl is a good reference and introduction for programmers. For Bash scripting, take a look at Learning the Bash Shell. (All three books are published by O'Reilly.)

On your system, you will find that Perl has about 30 manpages divided into subject areas, with much basic and tutorial material. Bash has a gigantic manpage and built-in help for all its commands showing all options and syntax.


31.1.1. Scripting with Bash and Friends

Bash is the standard shell on Linux. One reference for it is the manpage; the info page (invoked at the command line through info bash) is probably even better. You'll learn much from reading other people's scripts, because there are quite a few clever tricks you can use.

Unix and Linux scripts start with a magic line such as:

 #!/bin/bash 

When the first two bytes of an executable file are #!, Unix and Linux know that the file is a script and should be interpreted by a program. The name of the program follows; in our case it is /bin/bash, the standard path to Bash on Linux. You may include exactly one word, or cluster, of arguments as well, for example -xv to trace script execution.
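As a quick illustration, a made-up two-line script can switch tracing on directly in the magic line:

#!/bin/bash -xv
# -v echoes each line as it is read; -x echoes each command again,
# with variables expanded, as it is executed.
echo "Running as $USER"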

31.1.1.1. Variables

When a script is invoked with arguments or options, these are stored in numbered variables $1, $2, $3, and so on. In $0 you will find the command name under which the script was started. The complete list of arguments can also be referred to as "$@", with the quotes. The often seen, older form $* should not be used because it does not preserve quoting and word separation. A short example of a script displaying its arguments:

#!/bin/bash
echo $0 $1 $2 $3 $4
echo $@

And the output from one sample invocation:

$ ./arg.sh 1st 2nd 3rd 4th
./arg.sh 1st 2nd 3rd 4th
1st 2nd 3rd 4th

User variables in shell scripts are, by tradition, in all uppercase characters. They are always referenced with a leading dollar sign but always assigned without the dollar sign. Thus, an assignment and a reference look like this:

#!/bin/bash
FOO="hello world"
echo "$FOO"

As shown in this example, it is a good idea to use double quotes (") a lot in shell scripts. In the case of the assignment, double quotes are also required; otherwise, FOO would be only hello, and world would be assumed to be a command to execute.

31.1.1.2. Checking process status and sending alerts

Suppose you want to write and run a script that checks whether your web server is running. In Linux, if the command you want to check is in your PATH, check it with the pidof command:

#!/bin/bash
if pidof httpd >/dev/null; then
   echo We have Apache servers
else
   echo There is no Apache server
fi

Because pidof prints pids, and you don't want to see those in this particular setting, the >/dev/null redirects the output to the special file /dev/null, where it will disappear.

It's probably not very helpful to just print something in a window if you want to learn that an Apache server is not running. An email is far better. On most systems you will find the executable /usr/sbin/sendmail, even if the systems have another MTA such as Exim or Postfix instead of Sendmail, and using this executable directly is the safest way of sending email from scripts:

#!/bin/bash
if pidof apache >/dev/null; then
   :; # That's OK then
else
   /usr/sbin/sendmail -t <<EOM
To: admin@example.com
Subject: Apache is not running

Apache is not running on $(uname -n)

Regards,
  The Apache posse
.
EOM
fi

When using sendmail, you should not start any lines with From because that is the envelope From header used to separate mails in standard mail spool files. Also, avoid lines containing only a single dot (.) because mail programs use such lines to denote the last line in the input.

You will often find scripts using mail (or mailx or Mail) to send email. This is, in general, insecure, because these programs tend to interpret ~ (tilde) as a command character, and if someone tricks your script into outputting a ~ sequence, he can easily compromise your system. Also, the options to these commands vary among Unixes and even among Linuxes.

There are two other things to note in the sample script shown earlier. The <<EOM construct causes the shell to set up sendmail's standard input to read from the next line up to (but not including) the line that consists only of EOM. The $(uname -n) construct executes the parenthesized command and inserts its output instead. The old way of writing this is `uname -n` (i.e., a command quoted in backquotes, or backticks). The newer $( ) notation has more structured logic for quoting, which handles several levels of quoting without causing insanity.
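Nesting, for instance, needs no extra escaping with $( ), whereas backticks would; the log path below is hypothetical:

#!/bin/bash
# Pick the newest Apache log and strip the directory part.
LATEST=$(basename $(ls -t /var/log/apache/*.log | head -1))
echo "Newest log file is $LATEST"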

Sending pager or SMS alerts is not supported by Linux out of the box. You need to use an Internet email-to-pager or email-to-SMS gateway or install some combination of software and hardware locally that is capable of sending out such an alert.


Tip: It is, of course, also perfectly possible to use ps with suitable options in combination with grep or awk to determine whether a specific process is alive, e.g.: ps -ef | grep apache. A thorny problem with that approach is that the grep command tends to find itself, because its own command line obviously contains the string apache. This can be especially bad if the pipe continues and ends in a kill command that then kills the grep command before it has found all the Apache processes. Using some regular expression mangling such as ps -ef | grep ap[a]che avoids showing the unwanted grep command.
31.1.1.3. Monitoring users and using awk

Warning: Monitoring users may be subject to legal restrictions where you live and work.

User logging is done in two system files. These are most easily read in a shell script by the commands w, who, and last. If you want to monitor the use of the root account, look at the output of w:

$ w
 15:44:48  up 9 days,  3:44, 14 users,  load average: 0,00, 0,00, 0,00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
annen    pts/10   10.0.0.2         Sun 5pm  1:43m  16:10   0.01s  -bash
janl     pts/9    selidor.langfeld  3:34pm   0.00s  0.01s   0.01s  w

This is helpful, but contains an oddity that can make parsing with awk difficult. awk tends to work by extracting fields separated by spaces. In the output of w, as is often the case with Unix utilities, the login time is not always a single string of characters. If a login took place long enough ago, the date is printed with a space in the middle. If you count fields in the output, as awk does, the results are corrupted. You could use the cut command, but that is not in this Objective.

If we look at the output of who, on the other hand, we see it has a fixed number and syntax of fields:

annen    pts/10       Jan  4 17:21 (10.0.0.2)
janl     pts/9        Jan  5 15:34 (selidor.langfeldt.net)

User auditing and report generation can be done efficiently with last:

# last root
root     pts/6        pargo.un-bc.petr Tue Jul 19 16:57   still logged in
root     pts/4        nbctr36591.bc.ep Tue Jul 12 11:20 - 11:26  (00:06)
root     vc/1                          Wed Jul  6 10:13 - down   (2+08:46)
root     vc/1                          Wed Jul  6 10:01 - 10:02  (00:00)
root     pts/1        pargo.un-bc.petr Thu Jun 30 15:58 - 16:53  (00:55)
root     pts/1        pargo.un-bc.petr Thu Jun 30 14:59 - 15:51  (00:51)
root     pts/6        cherne.un-bc.pet Tue Jun 21 01:39   still logged in
root     pts/6        cherne.un-bc.pet Tue Jun 21 01:03 - 01:39  (00:35)
root     pts/11       10.185.4.14      Fri Jun 17 09:31 - 10:16  (00:44)
root     pts/11                        Thu Jun 16 19:40 - 19:40  (00:00)
root     pts/6        pargo.un-bc.petr Thu Jun 16 10:28 - 11:05  (00:36)
root     pts/6        pargo.un-bc.petr Thu Jun 16 10:23 - 10:23  (00:00)

awk is a quite complete programming language in itself, and you may well find yourself starting a script with #!/usr/bin/awk -f sometime. It was made to process files and produce some kind of report, much like Perl, but is not as powerful. There are a few things particularly handy to know about awk, and most of them are in the following example:

#!/bin/bash
who | awk '
BEGIN   { root=0; users=0; }
        { FROM="locally"; users++; }
/^root.*\)$/  { FROM="from "$6; }
/^root/ { print "Root is logged in at "$3" "$4" "$5" "FROM; root++; }
END     { print "Root was logged in "root" times.";
          print "A total of "users" users seen."; }
'

This yields:

Root is logged in at Jan 5 16:02 locally
Root was logged in 1 times.
A total of 2 users seen.

The basic form of directives in an awk program is:

 pattern { statements; } 

Usually, pattern begins and ends with slash characters (/) and represents a regular expression that awk is supposed to search for in each line. awk goes line by line through the input in order and checks each line for each pattern, executing the statements each time a pattern matches a line.

If the pattern is BEGIN, the statements are executed at startup. Here the variables root and users are assigned starting values. That is actually superfluous in this script, because awk assumes that new variables start with values of zero or empty strings, but it serves to illustrate the use of BEGIN.

Likewise, if the pattern is END, the statements are executed at the end of the program.

If the pattern is skipped, such as on the second line of the awk script, the statements are executed once for each line. Here the variable FROM is set, and the variable users is incremented. users++ is C-style notation for users = users + 1.


Tip: awk has language elements for conditional execution, loops, and functions for searching, substitution and all sorts of other things. If you want to learn any more about awk for the test, study how to use its if statement and substring matching and substitution.
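As a small sketch of those features, the following classifies who output with if and rewrites matching lines with sub (the labels are made up):

$ who | awk '{
    if (sub(/^root/, "ROOT"))
        print "superuser:", $0
    else
        print "ordinary :", $0
}'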

Advanced Regular Expressions

The regular expressions used in the Level 1 LPI Exam are not the whole story. There are two more levels. awk and sed (as well as egrep) implement a more advanced syntax, while remaining mostly compatible with the simpler ones. Look in the info pages for sed and awk for an explanation of these. The standard for these is set by the POSIX Unix standards.

Perl takes sed and awk regular expressions up several steps. Perl is fairly POSIX-compatible, but extends the syntax significantly. Please refer to the perlre manpage for further material.


The only frequently used option to awk is -F; it is used to specify the field separator. In most cases, the default is good, but sometimes it helps a lot, as when processing a passwd file, where -F: indicates that fields are separated by colons. Another important option is -f file, which causes awk to take the named file as input patterns and statements to execute.
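For example, this one-liner uses -F: to print every account in /etc/passwd with UID 0 (sample output from a typical system shown):

$ awk -F: '$3 == 0 { print $1 " has UID 0" }' /etc/passwd
root has UID 0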

31.1.1.4. Detecting changes

The previously shown scripts detect everything that matches in the input files, whenever they were added to those files. When checking system logs and doing other system monitoring, it's more useful to detect changes and then look at what changed. Two programs help do this: cmp, which just checks whether files are the same, and diff, which compares and reports differences between two text files (or directories of text files). cmp is faster if you simply want to see whether two files are different and is more appropriate for binary files. diff does complex computations to find out what changed and report it in an intelligent manner.

Both cmp and diff need a file showing how things were and another file that shows how things are. If you have a directory hierarchy (for example, an incoming hierarchy on an FTP server) that you want to monitor for changes, you may use something like this shell function:

lslrdir=/var/lslR

ifupdated ( ) {
        dir=$1; shift || exit 1
        cd $dir || exit 1
        # Make a slashless name to obtain a suitable filename
        name=$lslrdir/$(echo $dir | tr '/' '-')-ls-lR
        rm -f $name.new
        # Create a new snapshot
        /bin/ls -lR >$name.new || exit 1
        # Compare with old snapshot
        cmp $name $name.new >/dev/null || {
                rm -f $name || exit 1
                mv $name.new $name
                # Run the handler given as the second argument; after the
                # shift above it is now in $1.
                eval "$1"
        }
}

The script could be invoked in a manner such as:

 ifupdated /var/ftp/incoming /usr/local/bin/processincoming 

With an understanding of shell $ variables and the commands being invoked, you should be able to understand how the script does its job, which is to detect whether any files have been added, deleted, or altered in /var/ftp/incoming and to execute /usr/local/bin/processincoming if any have.

One useful feature shown in the script is how to check for errors: a nonzero status from each command leads to the execution of the exit 1 command after the || double bar. Thus, the script exits in case of error, which is a crude way of handling the error, but better than continuing and potentially making things worse.

In order to monitor users, you might create a snapshot with w, who, or even last. Using diff, you can see which users have appeared since the most recent execution.

$ who > who.txt
...5 minutes pass...
$ who > who.txt.new
$ diff who.txt who.txt.new
0a1
> root     vc/1         Jan  5 16:02

The lines matching the regular expression /^>/ are new. Any lines matching /^</ have disappeared. It's therefore quite easy to look for interesting new users and report on them with something like the awk script shown earlier.
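For instance, a quick, throwaway report of new logins can be produced directly from the diff output:

$ diff who.txt who.txt.new | awk '/^>/ { print "New login:", $2, "on", $3 }'
New login: root on vc/1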

Monitoring files can be a different problem. If a file is huge, it may not be practical to copy it and then cmp the new and old versions. If so, it may be suitable to use a checksumming program. On Linux, md5sum is present everywhere and used by a lot of software to detect changes in files. The downside is that it's computationally heavy. In contrast, two similar commands, cksum and sum, which should be part of all Linux installations, are fast. If you want to monitor system files throughout your machines to see whether your security has been broken, there are better tools. Check, for example, aide, which is very configurable and is used by some as an intrusion detection system (IDS) tool. There is also a module in CPAN (described later) called mon, which underpins many kinds of monitoring.
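A minimal sketch of the checksum-snapshot idea with md5sum follows; the location of the checksum file is made up:

#!/bin/bash
sumfile=/var/lslR/etc-passwd.md5
if [ -f $sumfile ] && md5sum --status -c $sumfile; then
    :   # checksum still matches, nothing to do
else
    echo "/etc/passwd changed on $(uname -n)"
    md5sum /etc/passwd > $sumfile   # record the new checksum
fi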

31.1.1.5. Log munging with sed

Let's say you have an email server and want to see who the top recipients of email are. The log format of the postfix MTA is like this:

pickup[13422]: 7B11119: uid=0 from=<root>
cleanup[13538]: 7B11119: message-id=<20040105113325.7B11119@lorbanery.langfeldt.net>
qmgr[13423]: 7B11119: from=<root@langfeldt.net>, size=753, nrcpt=1 (queue active)
smtp[13540]: 7B11119: to=<janl@linpro.no>, relay=smtp.chello.no[213.46.243.2], delay=27, status=sent (250 Message received: \
  20040105113347.CIQY15111.amsfep12-int.chello.nl@lorbanery.langfeldt.net)
smtpd[13541]: connect from head.linpro.no[80.232.36.1]
smtpd[13541]: TLS connection established from head.linpro.no[80.232.36.1]: TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits)
smtpd[13541]: C4B3D19: client=head.linpro.no[80.232.36.1]
cleanup[13542]: C4B3D19: message-id=<E1AdT1l-0005sS-00@sc8-sf-web1.sourceforge.net>
smtpd[13541]: disconnect from head.linpro.no[80.232.36.1]
qmgr[13423]: C4B3D19: from=<janl@linpro.no>, size=6009, nrcpt=1 (queue active)
smtp[13544]: C4B3D19: to=<janl@langfeldt.net>, relay=10.0.0.4[10.0.0.4], delay=8, status=sent (250 OK id=1AdT3l-0000Bp-00)

The interesting bits here are the lines that start with smtp. No other lines are interesting, including the lines containing smtpd, so they must be avoided. The sed regular expression for removing nonmatching lines is /smtp\[/!d, which means "delete all lines except those containing smtp." This yields:

$ sed -e '/smtp\[/!d' mail.log
smtp[13540]: 7B11119: to=<janl@linpro.no>, relay=smtp.chello.no[213.46.243.2], delay=27, status=sent (250 Message \
  received: 20040105113347.CIQY15111.amsfep12-int.chello.nl@lorbanery.langfeldt.net)
...

Next, you have to pick out the part of the line with the email address. This can be done by removing the other parts. Start by removing the front part with s/^.*to=<//, yielding:

$ sed -e '/smtp\[/!d' -e 's/^.*to=<//' mail.log
janl@linpro.no>, relay=smtp.chello.no[213.46.243.2], delay=27, status=sent (250 Message received: \
  20040105113347.CIQY15111.amsfep12-int.chello.nl@lorbanery.langfeldt.net)
...

Now remove the end of the line with s/>,.*$//. This yields:

$ sed -e '/smtp\[/!d' -e 's/^.*to=<//' -e 's/>,.*$//' mail.log
janl@linpro.no
...

Now you want to make sure all addresses are cased the same, and do some plumbing to get the count of individual different addresses:

$ sed -e '/smtp\[/!d' -e 's/^.*to=<//' -e 's/>,.*$//' mail.log |
>   tr '[A-Z]' '[a-z]' | sort | uniq -c | sort -nr
    330 janl@langfeldt.net
      2 andyo@oreilly.com
      1 yngve@linpro.no
      1 steve@kspei.com
      1 nicolai@langfeldt.net
      1 machine-registration@counter.li.org
      1 janl@linpro.no
      1 ftp-drift@uio.no

The sed expression got a bit ugly. Another property of the email addresses is that they're bracketed in <...>. If you replace the complete line by the bracketed contents, that would have the same effect as the longer command shown before:

$ sed -e '/smtp\[/!d' -e 's/^.*<\([^>]*\)>.*$/\1/' mail.log |
>   tr '[A-Z]' '[a-z]' | sort | uniq -c | sort -nr
    330 janl@langfeldt.net
      2 andyo@oreilly.com
      1 yngve@linpro.no
      1 steve@kspei.com
      1 nicolai@langfeldt.net
      1 machine-registration@counter.li.org
      1 janl@linpro.no
      1 ftp-drift@uio.no

Not that the command is a lot prettier. Here the \(...\) marks a subexpression. The string matching the first subexpression is placed in \1. So the \1 in the replacement text refers to the email address on the line. Here it replaces the whole line. You may refer to up to nine subexpressions with the \n syntax.

31.1.2. Scripting with Perl

It has been said that Perl, rather than being a Swiss Army knife, is a Swiss Army chain saw. Perl is a very powerful file-processing tool and a complete programming language with object-oriented features besides. Perl has always been at the forefront of advanced support for regular expressions, which makes it a tool for easy log processing.

Since Perl Version 5 arrived on the scene, the second big thing about Perl is the Comprehensive Perl Archive Network (CPAN), where you will find a Perl module and related documentation for almost any conceivable use ready to download and install.

31.1.2.1. Using CPAN

Recent Perl versions come with a module called CPAN that provides a complete interface to retrieving and installing more modules from CPAN. The first time the CPAN module is run, it wants all manner of configuration, but answering its questions is far from rocket science. Most questions can usually be answered by accepting the defaults. But if you are behind an authenticated proxy, you should configure your proxy settings before running the CPAN module, because all the -MCPAN requests are done over the Internet. Use the following shell commands to export these two environment variables:

$ export ftp_proxy=http://user:password@proxy_addr:port
$ export http_proxy=http://user:password@proxy_addr:port

Once configured, CPAN is ready to use. If you're looking for a module to support making DNS requests in Perl, for instance, try:

# perl -MCPAN -e shell

cpan shell -- CPAN exploration and modules installation (v1.76)
ReadLine support enabled

cpan> i /DNS/
CPAN: Storable loaded ok
Going to read /root/.cpan/Metadata
  Database was generated on Fri, 26 Sep 2003 22:50:31 GMT
CPAN: LWP::UserAgent loaded ok
Fetching with LWP:
  ftp://ftp.uninett.no/pub/languages/perl/CPAN/authors/01mailrc.txt.gz
Going to read /root/.cpan/sources/authors/01mailrc.txt.gz
CPAN: Compress::Zlib loaded ok
Fetching with LWP:
  ftp://ftp.uninett.no/pub/languages/perl/CPAN/modules/02packages.details.txt.gz
Going to read /root/.cpan/sources/modules/02packages.details.txt.gz
  Database was generated on Mon, 05 Jan 2004 22:51:56 GMT
Fetching with LWP:
  ftp://ftp.uninett.no/pub/languages/perl/CPAN/modules/03modlist.data.gz
Going to read /root/.cpan/sources/modules/03modlist.data.gz
Going to write /root/.cpan/Metadata
Distribution    A/AN/ANARION/DNS-TinyDNS-0.20.1.tar.gz
...
Module          Tie::DNS        (D/DI/DIEDERICH/Tie-DNS-0.41.tar.gz)
131 items found

A bit overwhelming, perhaps. As you may deduce from the use of slashes in the search expression, you have Perl's full regular expression syntax at your service. A better way to search is to use http://search.cpan.org. It lets you search and browse by various parameters (author, category, etc.), select interesting modules, and look at the documentation to see if they meet your needs. Somewhere in there you will find a module called Net::DNS. Now you can install it as follows:

cpan> install Net::DNS
Running install for module Net::DNS
Running make for C/CR/CREIN/Net-DNS-0.44.tar.gz
Fetching with LWP:
  ftp://ftp.uninett.no/pub/languages/perl/CPAN/authors/id/C/CR/CREIN/Net-DNS-0.44.tar.gz
...
Warning: prerequisite Digest::HMAC_MD5 1 not found.
Writing Makefile for Net::DNS
---- Unsatisfied dependencies detected during [C/CR/CREIN/Net-DNS-0.44.tar.gz] -----
    Digest::HMAC_MD5
Shall I follow them and prepend them to the queue
...
Warning: prerequisite Digest::SHA1 1 not found.
Writing Makefile for Digest::HMAC
---- Unsatisfied dependencies detected during [G/GA/GAAS/Digest-HMAC-1.01.tar.gz] -----
    Digest::SHA1
Shall I follow them and prepend them to the queue
...
Writing /usr/lib/perl5/site_perl/5.8.0/powerpc-linux/auto/Net/DNS/.packlist
Appending installation info to /usr/lib/perl5/5.8.0/powerpc-linux/perllocal.pod
  /usr/bin/make install  -- OK

The shell does everything needed. It fetches the package, along with the packages it depends on (recursively), and it unpacks, processes, tests, and installs each package.

A more straightforward and noninteractive way to install modules is to use the Perl -e option:

 # perl -MCPAN -e 'install SSH::Bundle' 

Thus, Perl packages have a way to detect dependencies, just as the Linux package managers do. They are detected during installation, whereupon the CPAN shell stops and asks whether you want to get the package (if so configured). After you say yes (as you generally should), CPAN goes and fetches and installs the package.

Two things are known to go wrong in the CPAN shell. Sometimes it stumbles when following dependencies and fails to install something properly. Then you must break the process and install these packages manually. It may be as simple as just issuing another install command in the CPAN shell. More rarely, CPAN will start fetching and installing a new Perl for you. You almost never want that. Your Perl package is usually under the control of a Linux package system, which should be managing your Perl executable, not the CPAN shell.

A corollary problem arises because all this CPAN downloading takes place outside your distribution's package system. When you upgrade your Linux distribution later, it will know nothing about what CPAN installed and will not reinstall the CPAN modules. You will have to reinstall them yourself, with or without the aid of the CPAN shell.
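If your version of the CPAN shell supports it, the autobundle command eases that chore: it writes a snapshot bundle listing every module currently installed, which you can ask CPAN to install again after the upgrade:

cpan> autobundle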


Tip: For deb and RPM packages, there are ways to download CPAN modules within the package system, but this is probably a win for you only if you manage more than a handful of computers that need the same packages. For RPM, you will find scripts such as cpanflute or cpan2rpm, and deb provides dh-make-perl. But this still does not solve the problem of upgrading Perl. You now have to make the upgraded Perl modules yourself when upgrading a distribution.
31.1.2.2. Log watching with Perl

There are plenty of log-watching scripts out there; just do a web search for them. The purpose of these is to watch system logs for interesting events and extract or even summarize them into a web page or an email. But you can also do simple log extractions easily with awk or Perl. For security purposes, it makes sense to look for things in /var/log/auth.log. This can be done quite simply with a grep command, but that would quickly turn ugly as you add things to watch for. There is a huge number of things that might be interesting to know: all kinds of errors and warnings, of course, but also login failures. In other logs, one should look for symptoms of hardware failure, for example, excessive messages about anything having to do with hda, or sda if you have SCSI. A very simple log watcher would go like this:

#!/usr/bin/perl -w

use strict;

open(LOG,"</var/log/auth.log") or die;

while (<LOG>) {
    /failure/ && do { print; next; };
    /error/   && do { print; next; };
    /warning/ && do { print; next; };
}

If you're more in a tool-maker mind frame, you can find CPAN modules that make it trivial to read configuration files. All you have to do then is write an engine to watch the logs based on an external configuration file.
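Even before writing such an engine, the external-pattern idea can be sketched in one line of shell; the pattern file path here is made up:

$ grep -E -f /usr/local/etc/logwatch.patterns /var/log/auth.log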

31.1.2.3. Fetching and processing web logs

It is fairly common to keep external web servers out in a firewall DMZ, semi-unavailable. They are also usually hardened and have next to no software installed. And they are probably not the place you want to publish access statistics. Therefore, you need to transfer the log files into your internal network and generate the statistics there.

For the sake of speed, web servers do not resolve the client IP address to its DNS name. This makes the statistics a bit less useful. Statistics tools such as webalizer may do DNS resolving, but they are typically as slow as molasses at it because they send one DNS query at a time. Even sending 10 DNS queries at a time does not help much with logs of many tens or hundreds of thousands of lines. With the Net::DNS module in Perl, you can do much better: it supports completely asynchronous name resolution without any forking or other complications.

Getting logs from a secure machine these days mostly requires scp or rsync over ssh. For example, one of these commands copies the file:

$ scp -c blowfish www.linux.no:/var/log/apache/www.linux.no-access_log* .

$ rsync -avPe 'ssh -c blowfish' \
>    www.linux.no:/var/log/apache/www.linux.no-access_log* .

The ssh suite is covered more in Chapter 40. rsync is discussed later in this chapter.

After copying the log file, it's time to resolve it with Perl and Net::DNS. Here is a script to do that:

#!/usr/bin/perl -w
# logdoc - log-doctor
#    modify http logs for log processing/statistical purposes
#
# Written by Nicolai Langfeldt, Linpro AS.  Distribute under same terms as
# Perl itself.  Based on a script in the Net::DNS distribution.

use strict;
use IO::Select;
use Net::DNS;
use Getopt::Std;

my $default_timeout = 15; # Seconds

# Fallback to this if no nameservers are configured.
my $default_nameserver = "127.0.0.1";

#------------------------------------------------------------------------------
# Get the timeout or use a reasonable default.
#------------------------------------------------------------------------------
my $progname = $0;
$progname =~ s!.*/!!;

my %opt;
getopts("t:", \%opt);

# The thing with zcat handles gz log files.
my @files = map { $_ = "zcat $_ |" if /\.gz$/; $_; } @ARGV;

my $timeout = defined($opt{"t"}) ? $opt{"t"} : $default_timeout;
die "$progname: invalid timeout: $timeout\n" if $timeout < 0;

#------------------------------------------------------------------------------
# Create a socket pointing at the default nameserver.
#------------------------------------------------------------------------------
my $res = Net::DNS::Resolver->new;
my $ns = $res->nameservers ? ($res->nameservers)[0] : $default_nameserver;
warn "Using $ns as nameserver\n";

my $sock = IO::Socket::INET->new(PeerAddr => $ns,
                                 PeerPort => "dns(53)",
                                 Proto    => "udp");
die "couldn't create socket: $!" unless $sock;

my $sel = IO::Select->new;
$sel->add($sock);

#------------------------------------------------------------------------------
# Read IP addresses and send queries.
#------------------------------------------------------------------------------
my %cache = ( );
my $pending = 0;
my $max = 0;
my ($fwip, $rest, $ip, @a, @line, $url);

# This loop resolves.
@ARGV = @files;
warn "Resolving names\n";
while (<>) {
  chomp;
  ($fwip, $rest) = split(' ', $_, 2);
  @a = split('\.', $fwip);
  $ip = join(".", reverse(@a), "in-addr.arpa");
  next if exists($cache{$ip});  # Already know it
  $cache{$ip} = ['unresolved']; # Insert place holder
  my $packet = Net::DNS::Packet->new($ip, "PTR");
  $sock->send($packet->data) or die "send: $ip: $!";
  ++$pending;
  $max = $pending if $pending > $max;
  # Collect answers received until now.
  while ($sel->can_read(0)) { accept_answer( ); }
}

warn "Waiting for outstanding answers, Pending: $pending, maxpending: $max\n";
while ($pending > 0 && $sel->can_read($timeout)) {
  accept_answer( );
}

#------------------------------------------------------------------------------
# Output resolved log files
#------------------------------------------------------------------------------
@ARGV = @files;
warn "Rewriting logs\n";
while (<>) {
    chomp;
    @line = split(' ', $_, 11);
    $line[0] = getname($line[0]);
    $url = $line[6];
    # Skip some lines
    next if $#line < 9 or              # Bogus REQUEST prolly
        $line[8] eq '"-"' or
#       $line[8] ne '200' or           # Only 200s are interesting
        $url eq '408' or               # Bogus REQUEST
        $url eq '/' or                 # Results in redirect, not interesting
        # Different uninteresting files and directories
        index($url, '.gif') >= 0 or
        index($url, '.swf') >= 0 or
        index($url, '.css') >= 0 or
        index($url, '.js') >= 0 or
        index($url, '/share/') >= 0 or
        index($url, '/usage/') >= 0 or
        index($url, '/cgi-bin/') >= 0;
    print join(' ', @line), "\n";
}

sub getname {
    # Resolve names from the cache, resolving CNAMEs as we go
    my ($from) = shift;
    my $type;
    my $i = 0;
    # print "Looking for $from\n";
    do {
        return $from
            if (!exists($cache{$from}) or
                $cache{$from}[0] eq 'unresolved' or
                $i++ > 10);
        ($type, $from) = @{$cache{$from}};
    } while $type ne 'PTR';
    return $from;
}

# Accept DNS query answers off the wire.
sub accept_answer {
    --$pending;
    my $buf = "";
    $sock->recv($buf, 512) or die "recv: $!";
    return unless $buf;
    my $ans = Net::DNS::Packet->new(\$buf);
    return unless $ans;
    foreach my $rr ($ans->answer) {
        if ($rr->type eq 'PTR') {
            $cache{$rr->name} = ['PTR', $rr->ptrdname];
        } elsif ($rr->type eq 'CNAME') {
            my $cname = $rr->cname;
            $cache{$rr->name} = ['CNAME', $cname];
            my $packet = Net::DNS::Packet->new($cname, "PTR");
            $sock->send($packet->data) or die "send: $cname: $!";
            ++$pending;
        } else {
            die "What do I do with a ", $rr->type, " record?\n";
        }
    }
}

After this processing, your logs are ready for webalizer or whatever log processor you favor. The whole point here is that there are Perl modules for almost anything you can imagine: from math and string manipulation to generating HTML, XML, and MIME mail with attachments to full-blown web servers and web proxies. You will also find modules not only to interface with almost any database, but also to perform SQL queries on flat text files and to write Excel spreadsheets so anyone in the organization can easily generate usage reports. Or the mon module mentioned earlier in this chapter. You name it, look for it on CPAN first. Perl is an ideal tool to automate and accelerate any of the numerous tasks a sysadmin has that shell scripting can't quite manage.

31.1.2.4. Using Perl to add new disks attached to an HBA controller

This example shows how Perl can provide solutions to many kinds of daily tasks. The example detects new external disks (SAN storage) attached to the system without requiring a reboot. Without automated help like this, you could easily spend a few hours handling the task.

#!/usr/bin/perl -w

use IO::File;

sub catch_zap {
    my $signame = shift;
    our $shucks++;
    die "Somebody sent me a SIG$signame!";
}
$shucks = 0;
$SIG{INT}  = 'catch_zap';
$SIG{INT}  = \&catch_zap;
$SIG{QUIT} = \&catch_zap;

print "Disk Probe 1.0 for Emulex HBA's\n";
print "Probing...\n";

my @lun = 0..255;
my @id  = 0..15;
my %seen;
my $scsi_path = '/proc/scsi/scsi';
my $host = 0;
my $target = 0;
my $part = 0;

system "echo '/usr/sbin/lpfc/dfc << !
set board $host
lip
exit
!' > /dev/null";

$input = IO::File->new("> $scsi_path")
    or die "Couldn't open $scsi_path for writing: $!\n";

for (my $i = 0; $i <= 15; $i++) {
        foreach (@lun) {
                print $input "scsi add-single-device 1 0 $i $_\n";
        }
}
$input->close( );

print "...done!\n";

First, we set the script to work with file handles using the IO::File module previously installed through CPAN:

 use IO::File; 

It's recommended to add error-handling code, just to keep things in order in case something goes wrong:

sub catch_zap {
    my $signame = shift;
    our $shucks++;
    die "Somebody sent me a SIG$signame!";
}
$shucks = 0;
$SIG{INT}  = 'catch_zap';
$SIG{INT}  = \&catch_zap;
$SIG{QUIT} = \&catch_zap;

Next, declare some variables:

my @lun = 0..255;
my @id  = 0..15;
my %seen;
my $scsi_path = '/proc/scsi/scsi';
my $host = 0;
my $target = 0;
my $part = 0;

You will be able to interact with external programs if needed:

 system "echo '/usr/sbin/lpfc/dfc << ! set board $host lip exit !' > /dev/null"; 

The next lines illustrate some file handling: creating a file, appending output to it, and closing it. Looping constructs (for and foreach) save a lot of time.

$input = IO::File->new("> $scsi_path")
    or die "Couldn't open $scsi_path for writing: $!\n";

for (my $i = 0; $i <= 15; $i++) {
        foreach (@lun) {
                print $input "scsi add-single-device 1 0 $i $_\n";
        }
}
$input->close( );

Perl is very flexible and modular, providing easy methods to create automated system procedures.

31.1.2.5. Perl in adverse environments

Perl scripts (or any software at all) that execute within sudo, that are SUID root, or that execute in environments such as web servers should be extra careful with what they do.

A little careless programming and suddenly the script is a cracker's best friend. An adversary can change a script's behavior in two ways: by changing the execution environment and by changing the input. To help avoid careless mistakes, Perl has a mode of operation that helps you be careful about your execution environment and inputs. If a script starts Perl with the -T option, Perl will do extra bookkeeping to track which data is tainted by the user and which data is not. Thus, this silly little script may look innocent:

#!/usr/bin/perl -wT

use strict;

system("date");

but when run it reveals:

$ ./taint.pl
Insecure $ENV{PATH} while running with -T switch at ./taint.pl line 5.

This is because the PATH variable, which is used by the system call, was inherited from the environment of the user. The user may have inserted a malevolent date program earlier in the PATH than the system command date, and suddenly the easy little SUID-root script is used to break root on your machine. The fix for this is to set the PATH yourself: $ENV{PATH}='/usr/bin:/bin';. More generally, you need either to not use tainted data or to untaint it by scrubbing it. The scrubbing procedure should remove any unsafe characters or constructions from the input so that it is in fact safe to pass it on.

For example, if a CGI script takes a date as an argument and then simply passes that to system("cal $date") and if the web user inputs a $date containing 2 2004; rm -rf /, you're out of luck and quite big parts of your web server will be wiped out, if not the whole system. A date for cal should probably be allowed to contain only the characters [0-9] or something like that. If it has other characters, they should be removed at the very least or, perhaps better, be logged as an abuse attempt. Security in Perl scripts is explained extensively in the manpage perlsec.
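A minimal sketch of that scrubbing, simplified to read the date from the command line rather than from a CGI form, might look like this:

#!/usr/bin/perl -wT
use strict;

$ENV{PATH} = '/usr/bin:/bin';         # don't trust the inherited PATH

my $date = shift;                     # tainted: it came from the user
if (defined $date && $date =~ /^(\d{1,2} \d{4})$/) {
    my $safe = $1;                    # text captured by the match is untainted
    system("cal $safe");
} else {
    die "Suspicious or missing date argument\n";
}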

Advancing Your Perl Skills

After you learn Perl from any of the great number of tutorial texts suited for anyone from programming beginners to senior programmers, a good book to have on the shelf is The Perl Cookbook (O'Reilly). It has hundreds of examples and solutions for Perl programmers to use and learn from.


31.1.3. Synchronizing Files Across Machines

There are many tools to copy files among machines. Among the more pedestrian ones are rcp and scp. Toward the high end is rdist, which is very nice for synchronizing client machines from a master machine. But rsync is the steamroller among these.

rsync is, along with the SSH suite, one of the newer utilities that have gained almost universal acceptance in the Unix world. It is used to synchronize files and/or directories over a network connection. It also works locally. The downside of rsync is a quite high startup penalty when you want to synchronize a large number of files or a large amount of data. On the other hand, it has a checksum algorithm that lets it discover which parts of a file have changed and transfer only the differences. Especially over slow lines this is a boon, but whenever you transfer nontrivial amounts of data, rsync should be faster than a complete copy with any of the other candidates.

rsync has its own client/server protocol. To employ that, you need to set up a server, and we're not going to do that in this book. The alternative is to let rsync tunnel over rsh. This is quite efficient, but it's as insecure as the rsh protocol itself. For the more security-conscious, using ssh is a good alternative. But the SSH protocol encrypts all its traffic, and encryption can be quite slow. If you use the blowfish encryption algorithm instead of the default one, things will be a bit faster. The example in the Perl scripting section is a good example of this:

$ rsync -avPe 'ssh -c blowfish' \
> www.linux.no:/var/log/apache/www.linux.no-access_log* .

This form of the command is like rcp except that it supports user@ in front of hostnames as scp does. The most common options to rsync are:


-a, --archive

Equivalent to -rlptgoD. It mimics the -a option of the Linux cp command, attempting to preserve all important attributes of the files and directories copied. The -r means that the command is also recursive, able to copy entire directories with all their subdirectories. The -l copies symlinks as symlinks, -p preserves permissions, -t preserves timestamps, -go preserves file group and owner, and -D copies device files as device files. The -goD options are valid only if you are root on the target system.


-H, --hard-links

If you ever mirror Linux distributions, you probably want this to preserve hard links. There are often quite a lot of them.


-P

Shorthand for --partial and --progress. --partial keeps partially transferred files so the copy can be restarted later without transferring the whole file again; --progress enables progress reporting, which is more meaningful when combined with -v.


-v, --verbose

Increase the amount of information printed while working. Using just one instance of this option is useful to give you a count while rsync prepares its worklist, and then the names of transferred files as it progresses.


-S, --sparse

If you use rsync to make backups of operating system disks, you will find that this option helps your disk usage. Executables and libraries often contain less data than they declare as their size, and this option keeps them that way.


-x, --one-file-system

This is handy for backups, if you want to restrict them to one partition at a time.


-e COMMAND, --rsh=COMMAND

Replace rsh with something else, such as ssh.


--rsync-path=PATH

If the rsync on the opposite side is not in the default PATH of the account you are using on that side, you will need to specify the full path with this option.


--delete

By default, rsync does not delete target-side files that have disappeared from the originating site. --delete makes sure that files that have disappeared from the originating side are also deleted on the target side.


-z, --compress

Compress file data. Good if you have more CPU capacity than bandwidth.
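Putting several of these options together, a backup-style invocation might look like the following; the hostname and paths are hypothetical:

$ rsync -avHSx --delete -e ssh /home/ backuphost:/backup/home/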

31.1.4. Scheduled Execution

Linux offers two systems to schedule program execution: cron and at. Both are present on most systems and considered standard, but they have quite different uses.

31.1.4.1. cron

The way to execute something periodically is through cron. cron is run by a system daemon named crond, and its operation is driven by crontab files. Each user can have a crontab file (unless the system administrator disables user access to cron), and the system provides a number of its own files to do routine, behind-the-scenes maintenance such as cleaning up old logfiles so disks don't become filled.

There are two formats for crontab files in a standard Linux system. One is found in normal users' files, as well as in /etc/crontab and related system crontab files. It goes like this:

 minute hour day-of-month month day-of-week command 

The five first fields are numbers, although the Linux version of cron allows three-letter names in the fourth (month) and fifth (day-of-week) fields. The command field is a shell command.

The other format is found in the files in /etc/cron.d. The format has an additional field before the command field: the username of the user the command should run as. The files in the /etc/cron.d directory can hold any user's cron commands, but only root should be allowed to put files into that directory.

Additionally, you will find on most Linux systems a battery of directories in /etc called cron.hourly, cron.daily, cron.weekly, and cron.monthly. The contents of these are normally run out of /etc/crontab, but the exact details, including when they are run, differ from distribution to distribution and among versions. If you look in these directories, you may also find references to anacron. This is an asynchronous companion to cron and is very useful for laptops and other systems that are not on 24/7. It is also outside our scope.

Normal users have their crontabs stored in /var/spool/cron/crontabs. Users invoke the crontab command to view and edit the files there and to include the commands they want to run regularly. The crontab files should not be manipulated any other way. In this directory, root is a normal user, but on Linux systems root almost never has files in /var/spool/cron/crontabs. There are too many other places that are easier for root to use.

As system administrator, it's most often best to use /etc/crontab and drop files into the different directories under /etc. As a user, you can use the crontab command only to replace, list, remove, and edit your crontab. Be sure to use the -l option to list crontabs; otherwise, you may just replace your crontab with an empty one.
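A quick overview of the crontab command's most common invocations (the filename is made up):

$ crontab -l                # list your current crontab
$ crontab -e                # edit it with your default editor
$ crontab -r                # remove it; there is no undo
$ crontab mycrontab.txt     # replace it with the contents of a file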

Any jobs that monitor Apache or the use of the root account should, of course, be run quite often: if not every minute, then every five minutes. If you put something like the following in /etc/cron.d/watchapache, you'll run the desired script every five minutes:

 */5 * * * *    root    /usr/local/sbin/monitor-apache 

This can quickly turn into a nuisance, of course. Getting a page, SMS, or email every five minutes can be a bit rough, not to mention expensive. The program should perhaps send an alert only every hour after initially reporting a problem. It can also be restricted to working hours, if the hour field (the second field) is changed to 8-17. Then it will run only during those working hours.
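With that change, the entry from the example above would look like this:

*/5 8-17 * * *    root    /usr/local/sbin/monitor-apache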

cron requires hours to be specified in European or "military" time, 0 through 23; AM and PM are not allowed. The n/m syntax gives an increment value of m, not a division by m. Thus, 4/5 means starting at the fourth minute, then every five minutes (or hours or days and so on). */5 means starting at the 0th minute. You can also enumerate like this: 0,5,10,15,20,... but that's both boring and error-prone. It is what is required on many other Unix cron systems, though.

If a cron job produces output, the cron daemon catches this and emails the cron-job owner. This also happens if the job sets its exit status to a nonzero value. This is nice for debugging but not for regular use. The subject line of cron mail is generic, so it is hard to determine what is wrong. If, for example, an Apache checking script just prints something instead of mailing it with a good subject line, it will look like all the other cron mails you receive.

One thing regularly goes wrong with cron jobs. The PATH and other environment variables read by crond may be different from those you use with your interactive shell. So executing a well-tested script in cron may well fail. To fix this, you need to set PATH and any other environment variables explicitly in the script. Running printenv inside the script may help you figure this out. Some people use absolute pathnames for commands in crontab files so that it doesn't matter what their PATH is, but this is cumbersome.
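A hedged sketch of that fix, with an example PATH and a throwaway printenv while debugging:

#!/bin/bash
# cron starts jobs with a minimal environment, so set what we need here.
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH

printenv > /tmp/cron-env.txt        # temporary aid while debugging

pidof httpd >/dev/null || echo "Apache is not running on $(uname -n)"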

One other thing that theoretically goes wrong in a well-tested script but is much rarer and more obscure: signal and session handling in cron are a bit different than in a shell. So if a job uses signals, it may fail in odd ways. There is no easy advice to handle that.


Tip: Linux crontab files can optionally contain variables such as MAILTO. While that is not an objective in this Topic, it's good to know at work.
31.1.4.2. Using at

at is very useful for one-off execution of jobs, mostly the kinds of jobs you want done later but can't be bothered to remember or because you're not going to be there at the time.

To use at, you need first to make sure the atd daemon is running. This is the process that runs all the jobs queued by the at command.

# /etc/init.d/atd start
# pgrep -fl atd
5486 /usr/sbin/atd

If you've gotten a huge 300-page PDF file that you want to print, but not during working hours, so as not to hold up the printer for 20 to 30 minutes when your coworkers want to use it, you can employ at. First, convert the file to PostScript or some other format your print-spooling system likes, then do something like this:

$ at 23:00
warning: commands will be executed using /bin/sh
at> lpr DNS-HOWTO.ps
at> <EOT>
job 2 at 2004-01-05 23:00

Two things of note: as the output says, at will execute the script with /bin/sh. In Linux, that's usually the same as bash. Second, use Ctrl-D (the end-of-file character in your shell) to end the input; this is shown as <EOT> in the shell. A period (.) will not work, as it does for some other utilities. With the returned job number, you can list the job with at -c 2 (be prepared to be amazed at the output) and remove it with atrm 2. To find your job numbers later, use atq. To experiment with at, use at now, which runs your commands at once. at also understands English times with AM and PM, e.g., 11PM.


Syntax

at [-V] [-q queue] [-f file] [-mldbv] TIME
at -c job [job...]


Description

Queue, delete, and list at jobs for later execution.


Options


-m

Send mail to the user when the job has completed.


-c

List detailed information about the queued job.


-l

List jobs.


-d

Delete jobs.


