13.4. Hard Things Done Once

When we find ourselves doing something very difficult, automating the task records what we've done, so the next time it will be easier. This is how we build up our little bag of tricks.

13.4.1. Encapsulating a Difficult Command

Sometimes it takes hours to work out exactly the right command required to do something. For example, there is a program that creates ISO images, the kind you burn onto CD-ROMs. Its manual page describes hundreds of options, but to make an image readable by Windows, Unix, and Mac systems, the command is simply:

     $ mkisofs -D -l -J -r -L -f -P "Author Name" -V "disk label" \
         -copyright copyright.txt -o disk.iso /directory/of/files

Sure, you can do it from a GUI, but where's the fun (or ability to script) in that?

This command also lets you do things not found in most GUIs, such as the ability to specify a copyright note, author name, and so on.

This is a good example of something to work into a .BAT file (DOS) or a Unix/Linux shell script.

Here's a shell script called makeimage1 that uses this:

     #!/bin/bash
     mkisofs -D -l -J -r -L -f -P "Limoncelli" -V `date -u +%m%d` $*

The `date -u +%m%d` sets the volume name to the current date.

One of the things that held me back from writing good scripts was that I didn't know how to process command-line parameters. Here's how to pass everything on a script's command line through to the command inside it.

The $* in the makeimage1 script means "any items on the command line." So, if you typed:

     $ makeimage1 cdrom/

then the $* would be replaced by cdrom/.

Since $* works for multiple arguments, you can also do:

     $ makeimage1 cdrom/ dir1/ dir2/

Then the $* would be replaced by all three components. In the case of mkisofs, this would merge all three directories into the CD-ROM image. You can refer to $1, $2, and so on, if you want to refer to specific items on the command line. In this example, $1 would refer to cdrom/, and $2 would refer to dir1/.
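
If you want to see this concretely, here's a tiny, hypothetical script (call it showargs; it's purely for illustration) that does nothing but echo its arguments:

     #!/bin/bash
     # Run as: showargs cdrom/ dir1/ dir2/
     echo "First argument:  $1"   # prints: cdrom/
     echo "Second argument: $2"   # prints: dir1/
     echo "All arguments:   $*"   # prints: cdrom/ dir1/ dir2/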

Another thing that prevented me from writing good scripts was not knowing how to process command-line flags like scriptname -q file1.txt. Thus, if a script I wanted to write was sophisticated enough to need command-line flags, I would use a different language or not write it at all. It turns out there is a command called getopt that does all the parsing for you, but its manual page isn't much help: it tells you how getopt works, not how to use it. Finally, I found an example of how to use it and have been copying that example time and time again. You don't have to understand how it works or why it works in order to use it. You use it like this:

     args=`getopt ab: $*`
     if [ $? != 0 ]
     then
             echo "Usage: command [-a] [-b file.txt] file1 file2 ..."
             exit -1
     fi
     set -- $args
     for i
     do
             case "$i"
             in
                     -a)
                             FLAGA=1
                             shift
                             ;;
                     -b)
                             ITEMB="$2" ; shift
                             shift
                             ;;
                     --)
                             shift; break
                             ;;
             esac
     done

This would be a command that has flags -a and -b. -b is special because it must be followed by an argument, such as -b file.txt. If you look at the first line, the getopt command is followed by the letters that can be flags, with a colon after any letter that requires an additional argument. Later, we see a case statement for each possible flag, with code that either sets a variable or sets a variable and remembers the argument.
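
You can also run getopt by itself to see exactly what it does to a command line. Here's a sketch; the leading space and exact spacing of the output vary between getopt implementations:

     $ getopt ab: -a -b file.txt file1 file2
      -a -b file.txt -- file1 file2

Notice that getopt normalizes the flags and inserts a -- to mark where the flags end and the regular arguments begin; that's what the --) case in the loop watches for.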

What is this $2 business? What's the deal with the )? What does set -- mean? And what about Naomi? Those are all things you can look up later. Just follow the template and it all works.

(OK, if you really want to learn why all of that works, I highly recommend reading the Advanced Bash-Scripting Guide at http://www.tldp.org/LDP/abs/html.)

Here's a larger example that adds a couple of additional things. First, it uses a function called usage to print the help message. An interesting thing about this function is that the echo spans multiple lines. Neat, eh? Bash doesn't mind. Second, it makes sure that there are at least MINITEMS items on the command line after the options are processed. Finally, it demonstrates how to process flags that override defaults.

Please steal this code whenever you are turning a simple script into one that takes options and parameters:

     #!/bin/bash

     MINITEMS=1

     function usage
     {
         echo "
     Usage: $0 [-d] [-a author] [-c file.txt] [-h] dir1 [dir1 ...]
         -d              debug, don't actually run command
         -a author       name of the author
         -c copyright    override default copyright file
         -h              this help message
     "
         exit 1
     }

     # Set our defaults:
     DEBUG=false
     DEBUGCMD=
     AUTHOR=
     COPYRIGHT=copyright.txt

     # Process command-line arguments, possibly overriding defaults
     args=`getopt da:c:h $*`
     if [ $? != 0 ]
     then
         usage
     fi
     set -- $args
     for i
     do
         case "$i"
         in
             -h)
                 usage
                 shift
                 ;;
             -a)
                 AUTHOR="$2"; shift
                 shift
                 ;;
             -c)
                 COPYRIGHT="$2"; shift
                 shift
                 ;;
             -d)
                 DEBUG=true
                 shift
                 ;;
             --)
                 shift; break
                 ;;
         esac
     done

     if $DEBUG ; then
         echo DEBUG MODE ENABLED.
         DEBUGCMD=echo
     fi

     # Make sure we have the minimum number of items on the command line.
     if $DEBUG ; then echo ITEM COUNT = $# ; fi
     if [ $# -lt "$MINITEMS" ]; then
         usage
     fi

     # If the first option is special, capture it:
     # THEITEM="$1" ; shift
     # Clone that line for each item you want to gather.
     # Make sure that you adjust the MINITEMS variable to suit your needs.

     # If you want to do something with each remaining item, do it here:
     #for i in $* ; do
     #    echo Looky! Looky!  I got $i
     #done

     if [ ! -z "$COPYRIGHT" ];
     then
         if $DEBUG ; then echo Setting copyright to: $COPYRIGHT ; fi
         CRFLAG="-copyright $COPYRIGHT"
     fi

     LABEL=`date -u +%Y%m%d`

     $DEBUGCMD mkisofs -D -l -J -r -L -f -P "$AUTHOR" -V $LABEL $CRFLAG $*
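
For instance, suppose you save this script as makeimage2 (a hypothetical name, used here for illustration). A debug run might look something like this; the output, including the date-based volume label, is illustrative:

     $ makeimage2 -d -a Limoncelli cdrom/
     DEBUG MODE ENABLED.
     ITEM COUNT = 1
     Setting copyright to: copyright.txt
     mkisofs -D -l -J -r -L -f -P Limoncelli -V 20051122 -copyright copyright.txt cdrom/

Because -d sets DEBUGCMD to echo, the script prints the mkisofs command instead of running it, which is a cheap way to check what would have happened.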

13.4.2. Building Up a Long Command Line

The best way to learn the Unix/Linux way of stringing commands together into one big pipe is to look over the shoulder of someone as she does it. I'll try to do that here by walking you through the steps I used to create a small utility.

Think Unix (Que) is an excellent book for learning how to link Unix/Linux tools to make bigger commands.


The single most powerful technology introduced by Unix/Linux is the ability to connect commands together like linking garden hoses. If you have one program that takes input and changes everything to uppercase, and another program that sorts the lines of a file, you can chain them together. The result is a command that converts the lines to uppercase and outputs the lines in sorted order. All you have to do is put a pipe symbol (|) between each command. The output of one command is fed into the next command:

     $ cat file | toupper | sort

For those of you unfamiliar with Unix/Linux, cat is the command that outputs a file. toupper is a program I wrote that changes text to uppercase. sort is the program that sorts lines of text. They all fit together quite nicely.
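
By the way, toupper isn't a standard utility, so if you want to try this pipeline yourself, the standard tr command can stand in for it:

     $ cat file | tr 'a-z' 'A-Z' | sort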

Let's use this to write a more complicated utility. How about a program that will determine which machine on your local network is most likely to be infected with a worm? We'll do it in one very long pipeline.

Sound amazing? Well, what this program will really do is find the hosts most likely to be infected; that is, it generates a list of which hosts require further investigation. However, I assure you that this technique will amaze your coworkers.

It's no replacement for a good malware or virus scanner. However, I picked this example because it is a good demonstration of some rudimentary shell-programming techniques, and you'll learn something about networking, too. When we're done, you'll have a simple tool you can use on your own network to detect this particular problem. I've used this tool to convince management to purchase a real virus scanner.

What's one sign that a machine is infected with some kind of worm? How about a quick test to see which machines are ARPing the most?

Spyware, worms, and viruses often try to connect to randomly selected machines on your network. When a machine tries to talk to a local IP address for the first time, it sends an ARP packet to find out its Ethernet (MAC) address. On the other hand, normal (uninfected) machines generally talk to a few machines only: the servers they use and their local router. Detecting a machine that is sending considerably more ARP packets than other machines on the network is often a sign that the machine is infected.

Let's build a simple shell pipeline to collect the next 100 ARP packets seen on your network and determine which hosts generated more ARP packets than their peers. It's sort of a "most likely to ARP" award. The last time I did this on a 50-host network, I found 2 machines infested with worms.

These commands should work on any Unix/Linux or Unix-like system. You will need the tcpdump command and root access. The command which tcpdump tells you if you have tcpdump installed. Sniffing packets from your network has privacy concerns. Only do this if you have permission.

Here's the final command that I came up with (sorry to spoil the surprise):

     $ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100 | \
         awk '{ print $NF }' | sort | uniq -c | sort -n

The command is too long to fit on one line of this book, so I put a backslash at the end of the first part to continue it across two lines. If you type the command as one long line, don't type the backslash, and don't press Enter in its place.

The output looks like this:

     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on en0, link-type EN10MB (Ethernet), capture size 96 bytes
        1 192.168.1.104
        2 192.168.1.231
        5 192.168.1.251
        7 192.168.1.11
        7 192.168.1.148
        7 192.168.1.230
        8 192.168.1.254
       11 192.168.1.56
       21 192.168.1.91
       30 192.168.1.111
     101 packets captured
     3079 packets received by filter
     0 packets dropped by kernel

Ignore the headers. The middle lines show a count followed by an IP address. During my experiment, host 192.168.1.111 sent 30 ARP packets, while 192.168.1.104 only sent 1. Most machines rarely ARPed in that time period, but two hosts had four to six times as many ARPs as some of the other machines! Those were my two problem children. A quick scan with some anti-virus software and they were as good as new.

Here's how I built this command line. I started with this command:

     $ sudo tcpdump -l -n arp

sudo means to run the next command as root. It will most likely ask for a password. If you don't use sudo in your environment, you might use something like it, or you can run this entire sequence as root. Just be careful. To err is human; to really screw up, be careless with root.

tcpdump listens to the local Ethernet. The -l flag makes the output line-buffered: normally tcpdump buffers its output so that it runs faster, but when we pipe the output to another program, we need each line to appear as soon as it is captured. The -n means don't do DNS lookups for each IP address we see. The arp means that we only want tcpdump to display ARP packets.

(If you are concerned about the privacy of your network, I'd like to point out some good news: there isn't much private data available to your eyes if, at the sniffing end, you filter out everything besides ARP packets.)

Run the command yourself. In fact, you will learn more if you try each command as you read this. Nothing here deletes any data. Of course, it may be illegal to snoop packets on your network, so be warned. Only do this on a network where you have permission to snoop packets.

When I run the command, the output looks like:

     $ sudo tcpdump -n -l arp
     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on en0, link-type EN10MB (Ethernet), capture size 96 bytes
     19:10:48.212755 arp who-has 192.168.1.110 (85:70:48:a0:00:10) tell 192.168.1.10
     19:10:48.743185 arp who-has 192.168.1.96 tell 192.168.1.92
     19:10:48.743189 arp reply 192.168.1.2 is-at 00:0e:e7:7a:b2:24
     19:10:48.743198 arp who-has 192.168.1.96 tell 192.168.1.111
     ^C

To get the output to stop, I press Ctrl-C. Otherwise, it will run forever.

If you get a permission error, you may not be running the command as root. tcpdump has to be run as root. You wouldn't want just anyone listening to your network, right?

After the header, we see these "arp who-has X tell Y" lines. Y is the host that asked the question. The question was, "Will the host at IP address X please respond so that I know your Ethernet (MAC) address?" The question is sent out as a broadcast, so we should see any ARP requests on our local LAN. However, we won't see many of the answers because they are sent as unicast packets, and we are on a switch. In this case, we see one reply because we're on the same hub as that machine (or maybe that is the machine running the command; I won't tell you which it is). That's OK because we only need to see one side of the question.

That's our data source. Now, let's transform the data into something we can use.

First, let's isolate just the lines of output that we want. In our case, we want the "arp who-has" lines:

     $ sudo tcpdump -l -n arp | egrep 'arp who-has'

We can run that and see that it is doing what we expect. The only problem now is that this command runs forever, waiting for us to stop it by pressing Ctrl-C. We want enough lines to do something useful, and then we'll process it all. So, let's take the first 100 lines of data:

     $ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100

Again, we run this and see that it comes out OK. Of course, I'm impatient and changed the 100 down to 10 when I was testing this. However, that gave me the confidence that it worked and that I could use 100 in the final command. You'll notice that there are a bunch of headers that are output, too. Those go to stderr (directly to the screen) and aren't going into the grep command.

So, now we have 100 lines of the kind of data we want. It's time to calculate the statistic we were looking for. That is, which hosts are generating the most ARP packets? Well, we're going to need to extract each host IP that generated an ARP and count it somehow. Let's start by extracting out the host IP address, which is always the sixth field of each line, so we can use this command to extract that field's data:

      awk '{ print $6 }'

That little bit of awk is a great idiom for extracting a particular column of text from each line.
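
To see the idiom in isolation, feed awk a line by hand:

     $ echo "one two three" | awk '{ print $2 }'
     two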

I should point out that I was too lazy to count which field had the data I wanted. It looked like it was about the fifth word, so I first tried it with $5. That didn't work. So I tried $6. Oh yeah, I need to remember that awk counts fields starting at 1, not 0. The benefit of testing the command line as we build it is that we find these mistakes early on. Imagine if I had written the entire command line and then had to track down this bug.

I'm lazy and I'm impatient. I didn't want to wait for all 100 ARPs to be collected. Therefore, I stored them once and kept reusing the results.

I stored them in a temporary file:

     $ sudo tcpdump -l -n arp | grep 'arp who-has' | head -100 >/tmp/x

Then I ran my awk command against the temp file:

     $ cat /tmp/x | awk '{ print $5 }'
     tell
     tell
     tell
     tell
     ...

Dang! It isn't the fifth. I'll try the sixth:

     $ cat /tmp/x | awk '{ print $6 }'
     192.168.1.110
     192.168.1.10
     192.168.1.92
     ...

Ah, that's better.

Anyway, I then realized I could be lazy in a different way. $NF means "the last field" and saves me from needing to count:

     $ cat /tmp/x | awk '{ print $NF }'
     192.168.1.110
     192.168.1.10
     192.168.1.92
     ...

Why isn't it $LF? That would be too easy. No, seriously, NF means "number of fields." Thus, $NF is the NFth field counting from the left, which is the last field. Whatever. Just remember that in awk you can type $NF when you want the last field on a line.
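
A one-line demonstration of both:

     $ echo "one two three" | awk '{ print NF, $NF }'
     3 three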

     $ sudo tcpdump -l -n arp | egrep 'arp who-has' | head -100 | awk '{ print $NF }'

So, now we get output that is a series of IP addresses. Test it and see.

(Really! Test it and see. I'll wait.)

Now, we want to count how many times each IP address appears in our list. There is an idiom that I use all the time for just this purpose:

     sort | uniq -c
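
If you haven't used this pair before, here's what it does to some toy input (the exact indentation of the counts varies a little between systems):

     $ printf 'b\na\nb\nb\na\n' | sort | uniq -c
        2 a
        3 b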

This sorts the data, then runs uniq, which eliminates duplicates from a sorted list (technically, it removes adjacent duplicate lines; sorting the list first ensures that identical lines end up adjacent). The -c flag counts how many repetitions were seen and prepends the number to each line. The output looks like this:

     ...
      11 192.168.1.111
       7 192.168.1.230
      30 192.168.1.254
       8 192.168.1.56
      21 192.168.1.91
     ...

We're almost there! Now we have a count of how many times each host sent an ARP. The last thing we need to do is sort that list so we know who the most talkative hosts were. To do that, we sort the list numerically by adding | sort -n to the end:

     $ sudo tcpdump -l -n arp | egrep 'arp who-has' | head -100 | \
         awk '{ print $NF }' | sort | uniq -c | sort -n

When we run that, we will see the sorted list. It will take a while to run on a network that isn't very busy. On a LAN with 50 computers, this took nearly an hour to run when not a lot of people were around. However, that was after the machine with the spyware was eliminated. Before that, it only took a few minutes to collect 100 ARP packets.

On your home LAN with only one or two machines, this command may take days to run. Hosts cache the ARP information they gather, so after a machine has been running for a while, it should be very rare for it to send an ARP if the only machine it talks to (on the local LAN) is your router.
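
You can inspect a machine's ARP cache yourself. On most systems the command is arp -a; the output format varies by OS, and these entries are made up for illustration:

     $ arp -a
     ? (192.168.1.1) at 00:13:72:9a:bc:de on en0 [ethernet]
     ? (192.168.1.12) at 00:16:cb:01:23:45 on en0 [ethernet]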

However, on a network with 100 or so hosts, this will find suspect machines very quickly.

We now have a very simple tool we can use during a worm attack. This doesn't replace a multi-thousand-dollar Intrusion Detection System or a good antivirus/antispyware/antiworm system, but it sure can help you pinpoint a problem when it is happening. Best of all, it's free, and you learned something about shell programming.

If you'd like to hone your shell programming skills, here are some mini projects you can try:

  • tcpdump outputs some informational messages to stderr. Is there a way to stop it from outputting those messages? If not, how could we get cleaner-looking output? (One approach appears in the sketch after this list.)

  • Turn this one-line command into a shell script. Put this in your bin directory so you can use it in the future.

  • Take the shell script and expand it so that you can specify which NIC to sniff or other options you find useful.

  • tcpdump can be programmed to only gather ARP "who-has" packets, so you can eliminate the grep command. Learn enough about tcpdump to do this.

  • tcpdump has the ability to replace the functionality of head -100. Learn enough about tcpdump to do this. Is it the exact same thing as head -100? Is it better or worse?

  • awk is a complete programming language. Eliminate the "grep" and the "head" commands by doing their work within awk. Why do you think I chose to do it in three processes instead of just letting awk do it all?
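
Here's a sketch to get you started on the first, fourth, and fifth projects. Treat it as one possible answer, not the only one; the filter syntax is standard tcpdump/BPF, but test it on your own system:

     # Discard tcpdump's informational messages by redirecting stderr:
     $ sudo tcpdump -l -n arp 2>/dev/null | grep 'arp who-has' | head -100

     # Let tcpdump do the filtering and the counting itself:
     # 'arp[6:2] == 1' matches only ARP requests (opcode 1, the "who-has"
     # packets), and -c 100 makes tcpdump exit after capturing 100 matching
     # packets, which is close to, but not exactly, what head -100 did.
     $ sudo tcpdump -l -n -c 100 'arp[6:2] == 1' 2>/dev/null | \
         awk '{ print $NF }' | sort | uniq -c | sort -n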

13.4.3. Using Microsoft Excel to Avoid Writing a GUI

Writing the GUI for an application can be 90 percent of the battle. Here's a lazy way to make the user interface: maintain the data in Microsoft Excel, but write a macro that uploads the data to a server for processing.

Once, I created an entire application this way. We had a web site that listed various special events. I was tired of updating the web page manually, but I knew the secretary wasn't technical enough to maintain the web site herself.

I started planning out a user interface that would let anyone do updates. It was grand: a big MySQL database with a PHP frontend that would let people log in, do updates, add new events, and so on. The system would then generate the web pages listing the events automatically. It was wonderful on paper, and I'm sure if I'd had 100 years to write the code, it would have been great.

Instead, I realized that only one person would actually be doing updates. Therefore, I gave her access to a spreadsheet that captured all the information that needed to be collected and to a macro that would save the file twice: once on the server as a tab-separated file and again as an XLS file. A process on the server would parse the tab-separated file and generate the web page automatically.

You can see the spreadsheet in Figure 13-2.

Figure 13-2. Event spreadsheet


Making the button takes a few steps.

First, use the macro recorder to do what you want:

  1. Record the macro: Tools → Macro → Record New Macro.

  2. Perform the actions to save the file as a tab-separated file on the network file server.

  3. Save the file as an MS Excel Workbook (.xls) in your file area.

    It is important that the last place you save the file is the richest format (Workbook) because this choice sets the default save format. If someone saves the file using File → Save, you want it to default to this format.

    Then, make a button that runs the macro:

    1. View the Forms toolbar: View → Toolbars → Forms.

    2. Draw a button where you want it to appear in the spreadsheet.

    3. When asked, select the macro created earlier.

    4. If you need to edit the button later, Ctrl-click it.

    Now, test this by clicking the button. Voilà! It works! Check the dates on the files to make sure that the file really got saved twice. (Yes, it may ask you twice whether it's OK to replace the file. Click Yes.)

    If you want to clean up the macro a bit, that's easy, too. In fact, one of the first things I did was edit exactly where the file gets saved:

    1. Go to the macro editor: Tools → Macro → Macros.

    2. Save and exit when you are done.

    In Microsoft macros, the line-continuation symbol is the underscore (_).


    The final macro looks like this:

     Sub Save( )
     '
     ' Macro recorded 5/22/2005 by Thomas Limoncelli
     '
         ActiveWorkbook.SaveAs Filename:= _
             "Y:\calendar\EventList.txt", FileFormat:= _
             xlText, CreateBackup:=False
         ActiveWorkbook.SaveAs Filename:= _
             "Y:\calendar\EventList.xls", FileFormat:= _
             xlNormal, Password:="", WriteResPassword:="", _
             ReadOnlyRecommended:=False _
             , CreateBackup:=False
     End Sub

    Now that the tab-separated version was being stored on a file server, it was easy to create a script to pick up that file, extract the useful lines, and generate the web page from it.
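
    The server-side script can be almost as simple. Here's a minimal sketch of the idea; the paths, the column layout, and the HTML are assumptions for illustration, not the actual code I used:

         #!/bin/bash
         # Hypothetical locations:
         INFILE=/server/calendar/EventList.txt
         OUTFILE=/var/www/html/events.html

         {
             echo '<html><head><title>Upcoming Events</title></head><body><table>'
             # Skip the spreadsheet's header row; assume three tab-separated
             # columns: date, event, location.
             awk -F'\t' 'NR > 1 { printf "<tr><td>%s</td><td>%s</td><td>%s</td></tr>\n", $1, $2, $3 }' "$INFILE"
             echo '</table></body></html>'
         } > "$OUTFILE"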

    I have since used this technique in many situations in which I didn't want to have to write a user interface and the user already had MS Excel.



