9.1. Basic Debugging Aids


What sort of functionality do you need to debug a program? At the most empirical level, you need a way of determining what is causing your program to behave badly, and where the problem is in the code. You usually start with an obvious what (such as an error message, inappropriate output, infinite loop, etc.), try to work backwards until you find a what that is closer to the actual problem (e.g., a variable with a bad value, a bad option to a command), and eventually arrive at the exact where in your program. Then you can worry about how to fix it.

Notice that these steps represent a process of starting with obvious information and ending up with often obscure facts gleaned through deduction and intuition. Debugging aids make it easier to deduce and intuit by providing relevant information easily or even automatically, preferably without modifying your code.

The simplest debugging aid (for any language) is the output statement, echo, in the shell's case. Indeed, old-time programmers debugged their FORTRAN code by inserting WRITE cards into their decks. You can debug by putting lots of echo statements in your code (and removing them later), but you will have to spend lots of time narrowing down not only what exact information you want but also where you need to see it. You will also probably have to wade through lots and lots of output to find the information you really want.
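One way to reduce the cost of removing echo statements later is to route them through a single helper that can be switched off. The sketch below is only an illustration: the function name dbg and the variable DEBUG_ON are hypothetical and do not appear elsewhere in this chapter.

```bash
# Hypothetical helper: print debugging output only when DEBUG_ON is non-empty.
# Messages go to standard error so they don't mix with the script's real output.
dbg() {
    if [ -n "${DEBUG_ON:-}" ]; then
        echo "debug: $*" >&2
    fi
    return 0
}

DEBUG_ON=1
count=3
dbg "count is $count"       # printed on standard error

DEBUG_ON=
dbg "this is not printed"   # silent, because DEBUG_ON is now empty
```

This does not remove the need to decide what to print and where, but it does let you leave the calls in place while you work.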

9.1.1. Set Options

Luckily, the shell has a few basic features that give you debugging functionality beyond that of echo. The most basic of these are options to the set -o command (as covered in Chapter 3). These options can also be used on the command line when running a script, as Table 9-1 shows.

Table 9-1. Debugging options
set -o option	Command-line option	Action
noexec	-n	Don't run commands; check for syntax errors only
verbose	-v	Echo commands before running them
xtrace	-x	Echo commands after command-line processing

The verbose option simply echoes (to standard error) whatever input the shell gets. It is useful for finding the exact point at which a script is bombing. For example, assume your script looks like this:

alice
hatter
march
teatime
treacle
well

None of these commands is a standard UNIX program, and each does its work silently. Say the script crashes with a cryptic message like "segmentation violation." This tells you nothing about which command caused the error. If you type bash -v scriptname, you might see this:

alice
hatter
march
segmentation violation
teatime
treacle
well

Now you know that march is the probable culprit, though it is also possible that march bombed because of something it expected alice or hatter to do (e.g., create an input file) that they did incorrectly.

The xtrace option is more powerful: it echoes command lines after they have been through parameter substitution, command substitution, and the other steps of command-line processing (as listed in Chapter 7). For example:

$ set -o xtrace
$ alice=girl
+ alice=girl
$ echo "$alice"
+ echo girl
girl
$ ls -l $(type -path vi)
++ type -path vi
+ ls -F -l /usr/bin/vi
lrwxrwxrwx   1 root     root      5 Jul 26 20:59 /usr/bin/vi -> elvis*
$

As you can see, xtrace starts each line it prints with + (each + representing a level of expansion). This is actually customizable: it's the value of the built-in shell variable PS4. So if you set PS4 to "xtrace--> " (e.g., in your .bash_profile or .bashrc), then you'll get xtrace listings that look like this:

$ ls -l $(type -path vi)
xxtrace--> type -path vi
xtrace--> ls -l /usr/bin/vi
lrwxrwxrwx   1 root     root      5 Jul 26 20:59 /usr/bin/vi -> elvis*
$

Notice that for multiple levels of expansion, only the first character of PS4 is repeated, once per extra level, rather than the whole string. This makes the output more readable.

An even better way of customizing PS4 is to use a built-in variable we haven't seen yet: LINENO, which holds the number of the currently running line in a shell script.^[2] Put this line in your .bash_profile or environment file:

PS4='line $LINENO: '

^[2] In versions of bash prior to 2.0, LINENO won't give you the current line in a function. LINENO, instead, gives an approximation of the number of simple commands executed so far in the current function.

We use the same technique as we did with PS1 in Chapter 3: using single quotes to postpone the evaluation of the string until each time the shell prints the prompt. This will print messages of the form line N: in your trace output. You could even include the name of the shell script you're debugging in this prompt by using the positional parameter $0:

PS4='$0 line $LINENO: '
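To see what such a prompt produces, the following sketch writes a four-line throwaway script to a temporary file and captures its trace. The temporary file and the variable names (script, trace, greeting) are assumptions made for the illustration; PS4 is assigned inside the script itself rather than in a startup file so that the example is self-contained.

```bash
# Create a small script whose second line turns tracing on.
script=$(mktemp)
cat > "$script" <<'EOF'
PS4='line $LINENO: '
set -o xtrace
greeting=hello
echo "$greeting"
EOF

# Trace output goes to standard error, so merge it into the captured text.
trace=$(bash "$script" 2>&1)
rm -f "$script"

echo "$trace"
```

The captured text contains the trace lines "line 3: greeting=hello" and "line 4: echo hello" along with the script's own output. The single quotes around the PS4 value matter: with double quotes, $LINENO would be expanded once, at assignment time, and every trace line would show the same number.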

As another example, say you are trying to track down a bug in a script called alice that contains this code:

dbfmq=$1.fmq
...
fndrs=$(cut -f3 -d' ' $dfbmq)

You type alice teatime to run it in the normal way, and it hangs. Then you type bash -x alice teatime, and you see this:

+ dbfmq=teatime.fmq
...
+ + cut -f3 -d

It hangs again at this point. You notice that cut doesn't have a filename argument, which means that there must be something wrong with the variable dbfmq. But it has executed the assignment statement dbfmq=teatime.fmq properly... ah-hah! You made a typo in the variable name inside the command substitution construct.^[3] You fix it, and the script works properly.

^[3] We should admit that if you had turned on the nounset option at the top of this script, the shell would have flagged this error.
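As a rough illustration of the footnote, the sketch below runs the misspelled reference in a child shell with nounset turned on. The variable names come from the example above; the surrounding scaffolding (msg, status, the echo text) is assumed for the demonstration.

```bash
# Run the buggy lines in a child bash with nounset enabled. dfbmq is the
# misspelled name, so expanding it is an error and the child shell stops.
msg=$(bash -c 'set -o nounset
dbfmq=teatime.fmq
echo "file: $dfbmq"' 2>&1)
status=$?

echo "exit status: $status"
echo "$msg"
```

Instead of hanging while cut waits for input, the script stops immediately with a nonzero exit status and an "unbound variable" message that names dfbmq.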

The last option is noexec, which reads in the shell script and checks for syntax errors, but doesn't execute anything. It's worth using if your script is syntactically complex (lots of loops, command blocks, string operators, etc.) and the bug has side effects (like creating a large file or hanging up the system).
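A minimal sketch of a noexec check follows. It assumes a throwaway script in a temporary file with a deliberately missing fi; the names script, output, and status are illustrative only.

```bash
# Write a script that contains a syntax error (the "if" is never closed).
script=$(mktemp)
cat > "$script" <<'EOF'
echo "this line is never run under -n"
if [ -f /etc/passwd ]; then
    echo "found"
# the closing "fi" is missing on purpose
EOF

# -n reads the whole script and reports syntax errors without running anything.
output=$(bash -n "$script" 2>&1)
status=$?
rm -f "$script"

echo "bash -n exit status: $status"
echo "$output"
```

The shell reports a syntax error and returns a nonzero status, and the first echo in the script never runs. Note that noexec catches only syntax errors; a misspelled command name, for instance, passes the check.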

You can turn on these options with set -o option in your shell scripts, and, as explained in Chapter 3, turn them off with set +o option. For example, if you're debugging a chunk of code, you can precede it with set -o xtrace to print out the executed commands, and end the chunk with set +o xtrace.
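The bracketing technique might look like the following sketch, in which only the middle statement is traced. The variables total, n, and average are placeholders invented for the example.

```bash
total=0
for n in 1 2 3; do
    total=$((total + n))        # not traced
done

set -o xtrace                   # start tracing here
average=$((total / 3))
set +o xtrace                   # stop tracing (this command is itself traced)

echo "average is $average"      # not traced
```

Only the assignment to average and the set +o xtrace command appear in the trace on standard error, which keeps the output short when the suspect code is a small part of a long script.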

Note, however, that once you have turned noexec on, you won't be able to turn it off; a set +o noexec will never be executed.

9.1.2. Fake Signals

Fake signals are a more sophisticated set of debugging aids. They can be used in trap statements to get the shell to act under certain conditions. Recall from the previous chapter that trap allows you to install some code that runs when a particular signal is sent to your script.

Fake signals work in the same way, but they are generated by the shell itself, as opposed to the other signals, which are generated externally. They represent runtime events that are likely to be of interest to debuggers (both human ones and software tools) and can be treated just like real signals within shell scripts. Table 9-2 lists the four fake signals available in bash.

Table 9-2. Fake signals
Fake signal	Sent when
EXIT	The shell exits from script
ERR	A command returning a non-zero exit status
DEBUG	The shell is about to execute a statement^[4]
RETURN	A shell function or a script executed with the . or source builtins finishes executing^[5]

^[4] The DEBUG signal is not available in bash versions prior to 2.0.

^[5] The RETURN signal is not available in bash versions prior to 3.0.

9.1.2.1 EXIT

The EXIT trap, when set, will run its code whenever the script within which it was set exits.^[6]

^[6] You can use this signal only for the exiting of a script. Functions don't generate the EXIT signal, as they are part of the current shell invocation.

Here's a simple example:

trap 'echo exiting from the script' EXIT
echo 'start of the script'

If you run this script, you will see this output:

start of the script
exiting from the script

In other words, the script starts by setting the trap for its own exit, then prints a message. The script then exits, which causes the shell to generate the signal EXIT, which in turn runs the code echo exiting from the script.

An EXIT trap occurs no matter how the script exits, whether normally (by finishing the last statement), by an explicit exit or return statement, or by receiving a "real" signal such as INT or TERM. Consider this inane number-guessing program:

trap 'echo Thank you for playing!' EXIT

magicnum=$(($RANDOM%10+1))
echo 'Guess a number between 1 and 10:'
while read -p 'Guess: ' guess ; do
    sleep 4
    if [ "$guess" = $magicnum ]; then
        echo 'Right!'
        exit
    fi
    echo 'Wrong!'
done

This program picks a number between 1 and 10 by getting a random number (the built-in variable RANDOM), extracting the last digit (the remainder when divided by 10), and adding 1. Then it prompts you for a guess, and after 4 seconds, it will tell you if you guessed right.

If you did, the program will exit with the message, "Thank you for playing!", i.e., it will run the EXIT trap code. If you were wrong, it will prompt you again and repeat the process until you get it right. If you get bored with this little game and hit CTRL-C or CTRL-D while waiting for it to tell you whether you were right, you will also see the message.

The EXIT trap is especially useful when you want to print out the values of variables at the point that your script exits. For example, by printing the value of loop counter variables, you can find the most appropriate places in a complicated script, with many nested for loops, to enable xtrace or place debug output.
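As a sketch of this use, the following example reports two loop counters at the moment of exit. A subshell stands in for the script being debugged, and the failure condition (the combination 2b) is made up purely for the illustration.

```bash
report=$(
    # The trap fires when this subshell exits, however that happens.
    trap 'echo "at exit: i=$i j=$j"' EXIT

    for i in 1 2 3; do
        for j in a b c; do
            # Hypothetical failure point chosen for the demonstration.
            if [ "$i$j" = 2b ]; then
                exit 1
            fi
        done
    done
)

echo "$report"
```

The report shows i=2 and j=b, which tells you which iteration to surround with xtrace or further debug output.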

9.1.2.2 ERR

The fake signal ERR enables you to run code whenever a command in the surrounding script or function exits with non-zero status. Trap code for ERR can take advantage of the built-in variable ?, which holds the exit status of the previous command. It survives the trap and is accessible at the beginning of the trap-handling code.

A simple but effective use of this is to put the following code into a script you want to debug:

function errtrap {
    es=$?
    echo "ERROR: Command exited with status $es."
}

trap errtrap ERR

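The first line of the function saves the nonzero exit status in the variable es before any other command can overwrite $?. (Because es is not declared with local, it is actually a global variable.)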

For example, if the shell can't find a command, it returns status 127. If you put the code in a script with a line of gibberish (like "nhbdeuje"), the shell responds with:

scriptname: line N: nhbdeuje: command not found
ERROR: Command exited with status 127.

N is the number of the line in the script that contains the bad command. In this case, the shell prints the line number as part of its own error-reporting mechanism, since the error was a command that the shell could not find. But if the nonzero exit status comes from another program, the shell doesn't report the line number. For example:

function errtrap {
    es=$?
    echo "ERROR: Command exited with status $es."
}

trap errtrap ERR

function bad {
    return 17
}

bad

This only prints ERROR: Command exited with status 17.

It would obviously be an improvement to include the line number in this error message. The built-in variable LINENO exists, but if you use it inside the errtrap function, it reports the position of the echo statement within the handler, not the line of the command that failed. In other words, $LINENO in the echo statement of errtrap would evaluate to the same number every time, no matter where the error occurred.

To get around this problem, we simply pass $LINENO as an argument to the trap handler, surrounding it in single quotes so that it doesn't get evaluated until the fake signal actually comes in:

function errtrap {
    es=$?
    echo "ERROR line $1: Command exited with status $es."
}

trap 'errtrap $LINENO' ERR
...

If you use this with the above example, the result is the message, ERROR line 12: Command exited with status 17. This is much more useful. We'll see a variation on this technique shortly.

This simple code is actually not a bad all-purpose debugging mechanism. It takes into account that a nonzero exit status does not necessarily indicate an undesirable condition or event: remember that every control construct with a conditional (if, while, etc.) uses a nonzero exit status to mean "false." Accordingly, the shell doesn't generate ERR traps when statements or expressions in the "condition" parts of control structures produce nonzero exit statuses. Also, an ERR trap is not inherited by shell functions, command substitutions, or commands executed in a subshell. However, this inheritance behavior can be turned on by using set -o errtrace (or set -E).^[7]

^[7] Inheritance of the ERR trap is not available in versions of bash prior to 3.0.
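The difference errtrace makes can be seen in the following sketch. The function names demo and worker, and the argument value inherit, are hypothetical; each run happens in a subshell so that the trap and the option do not leak into the calling shell.

```bash
demo() {
    (
        # Turn on ERR inheritance only when asked to.
        [ "$1" = inherit ] && set -o errtrace
        trap 'echo "ERR trap fired"' ERR

        worker() {
            if false; then      # a failing condition never triggers ERR
                echo "never printed"
            fi
            false               # a failing command inside a function
            echo "worker done"
        }
        worker
    )
}

without=$(demo plain)
with=$(demo inherit)

echo "without errtrace: $without"
echo "with errtrace:"
echo "$with"
```

Without errtrace, the failing command inside worker goes unnoticed and only "worker done" is printed. With errtrace, the trap fires inside the function before "worker done". In both cases the false in the if condition is ignored.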

One disadvantage is that exit statuses are not as uniform (or even as meaningful) as they should be, as we explained in Chapter 5. A particular exit status need not say anything about the nature of the error or even that there was an error.

9.1.2.3 DEBUG

Another fake signal, DEBUG, causes the trap code to be executed before every statement in a function or script.^[8] This has two main uses. First is the use for humans, as a sort of "brute force" method of tracking a certain element of a program's state that you notice has gone awry.

^[8] Warning: the DEBUG trap was run after statements in versions of bash prior to 2.05b. The debugger in this chapter has been written for the current version of bash where the trap is run before each statement.

For example, you notice the value of a particular variable is running amok. The naive approach is to put in a lot of echo statements to check the variable's value at several points. The DEBUG trap makes this easier by letting you do this:

function dbgtrap {
    echo "badvar is $badvar"
}

trap dbgtrap DEBUG
...section of code in which the problem occurs...
trap - DEBUG    # turn off the DEBUG trap

This code will print the value of the wayward variable before every statement between the two traps.

One important point to remember when using DEBUG is that it is not inherited by functions called from the shell in which it is set. In other words, if your shell sets a DEBUG trap and then calls a function, the statements within the function will not execute the trap. There are three ways around this. First, you can set a trap for DEBUG explicitly within the function. Alternatively, you can declare the function with declare -t, which gives that function the trace attribute and allows it to inherit a DEBUG trap from the caller. Lastly, you can use set -o functrace (or set -T), which does the same thing as declare -t but applies to all functions.^[9]

^[9] Inheritance of the DEBUG trap, declare -t, set -o functrace, and set -T are not available in bash prior to version 3.0.
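The following sketch shows the effect of functrace on a DEBUG trap. The names trace_demo and bump and the argument value inherit are invented for the example, and badvar is borrowed from the listing above. Each run takes place in a subshell so that neither the trap nor the option affects the calling shell.

```bash
trace_demo() {
    (
        # Turn on DEBUG inheritance only when asked to.
        [ "$1" = inherit ] && set -o functrace

        badvar=0
        bump() {
            badvar=$((badvar + 1))
        }

        trap 'echo "badvar is $badvar"' DEBUG
        bump
        bump
        trap - DEBUG    # turn off the DEBUG trap
    )
}

without=$(trace_demo plain)
with=$(trace_demo inherit)

echo "without functrace:"
echo "$without"
echo "with functrace:"
echo "$with"
```

Without functrace, the trap reports only before the top-level statements, so the inside of bump is invisible. With functrace, additional reports appear from within bump, which is what you need when the variable is being corrupted inside a function.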

The second use of the DEBUG signal is as a primitive for implementing a bash debugger. We'll look at doing just that shortly.

9.1.2.4 RETURN

A RETURN trap is executed each time a shell function or a script executed with the . or source commands finishes executing.

As with DEBUG, the RETURN trap is not inherited by functions. You again have the option of setting the trap for RETURN within the function, declaring the function with the -t option so that it inherits the trap, or using set -o functrace to turn on inheritance for all functions.

Here is a simple example of a RETURN trap:

function returntrap {
    echo "A return occurred"
}

trap returntrap RETURN

function hello {
    echo "hello world"
}

hello

When the script is executed it executes the hello function and then runs the trap:

$ ./returndemo
hello world
A return occurred
$

Notice that it didn't trap when the script itself finished. The trap would only have run at the end of the script if we'd sourced the script. Normally, to trap at the exiting of the script we'd also need to define a trap for the EXIT signal that we looked at earlier.

In addition to these fake signals, bash 3.0 added some other features to help with writing a full-scale debugger for bash. The first of these is the extdebug option to the shopt command, which switches on certain things that are useful for a debugger. These include:

The -F option to declare displays the source filename and line number corresponding to each function name supplied as an argument.
If the command that is run by the DEBUG trap returns a non-zero value, the next command is skipped and not executed.
If the command run by the DEBUG trap returns a value of 2, and the shell is executing in a subroutine (a shell function or a shell script executed by the . or source commands), a call to return is simulated.
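The second item, skipping a command when the DEBUG trap returns non-zero, can be sketched as follows. The function name guard and the echoed strings are hypothetical. The sketch also relies on BASH_COMMAND, a bash variable not discussed in this section, which holds the command that is about to be executed.

```bash
out=$(
    shopt -s extdebug

    # Return non-zero (meaning "skip") for any command that starts
    # with "echo skipped"; return zero for everything else.
    guard() {
        [[ $BASH_COMMAND != "echo skipped"* ]]
    }
    trap guard DEBUG

    echo kept
    echo skipped line
    echo also kept
)

echo "$out"
```

Only "kept" and "also kept" are printed, because the middle command is skipped. A debugger can use the same mechanism to let the user step over a statement without running it.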

The shell also has a new invocation option, --debugger, which switches on both the extdebug and functrace functionality.

9.1.3. Debugging Variables

Bash 3.0 added some useful shell variables to aid in writing a debugger. These include BASH_SOURCE, an array of the source filenames that correspond to what is currently executing; BASH_LINENO, an array of the line numbers at which the function calls currently in progress were made; and BASH_ARGC and BASH_ARGV, array variables that hold, respectively, the number of parameters in each frame and the parameters themselves.
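As a rough idea of how these arrays fit together, the sketch below prints a simple call stack. It also uses FUNCNAME, a related bash array of function names that is not covered in this section, and the function names print_stack, inner, and outer are invented for the example.

```bash
# Print one line per caller. Frame 0 is print_stack itself, so start at 1.
# BASH_LINENO[i-1] is the line, within frame i, of the call to frame i-1.
print_stack() {
    local i
    for (( i = 1; i < ${#FUNCNAME[@]}; i++ )); do
        echo "  in ${FUNCNAME[i]} (${BASH_SOURCE[i]:-?}), line ${BASH_LINENO[i-1]}"
    done
}

inner() { print_stack; }
outer() { inner; }

stack=$(outer)
echo "$stack"
```

The output lists inner and then outer, each with the file and line at which the next call was made. The exact filenames and line numbers depend on how the code is run, so none are shown here.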

We'll now look at writing a debugger, although we'll keep things simple and avoid using these variables. This also means the debugger will work with earlier versions of bash.