Section 6.6. Error-Checking Strategies | Python in a Nutshell, Second Edition (In a Nutshell)

6.6. Error-Checking Strategies

Most programming languages that support exceptions are geared to raise exceptions only in rare cases. Python's emphasis is different. In Python, exceptions are considered appropriate whenever they make a program simpler and more robust, even if that means that exceptions are raised rather frequently.

6.6.1. LBYL Versus EAFP

A common idiom in other languages, sometimes known as "look before you leap" (LBYL), is to check in advance, before attempting an operation, for all circumstances that might make the operation invalid. This approach is not ideal for several reasons:

The checks may diminish the readability and clarity of the common, mainstream cases where everything is okay.
The work needed for checking may duplicate a substantial part of the work done in the operation itself.
The programmer might easily err by omitting some needed check.
The situation might change between the moment the checks are performed and the moment the operation is attempted.

The preferred idiom in Python is generally to attempt the operation in a try clause and handle any resulting exceptions in except clauses. This idiom, known as "it's easier to ask forgiveness than permission" (EAFP), a motto widely credited to Admiral Grace Murray Hopper, co-inventor of COBOL, shares none of the defects of LBYL. Here is a function written using the LBYL idiom:

 def safe_divide_1(x, y):
     if y == 0:
         print("Divide-by-0 attempt detected")
         return None
     else:
         return x/y

With LBYL, the checks come first, and the mainstream case is somewhat hidden at the end of the function. Here is the equivalent function written using the EAFP idiom:

 def safe_divide_2(x, y):
     try:
         return x/y
     except ZeroDivisionError:
         print("Divide-by-0 attempt detected")
         return None

With EAFP, the mainstream case is up front in a try clause, and the anomalies are handled in an except clause.
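The last LBYL defect listed above, that the situation might change between the check and the operation, is easiest to see with files. The following is a minimal sketch, not taken from the original text: the function names read_text_lbyl and read_text_eafp are hypothetical, and the code assumes that a missing or unreadable file should simply yield None.

```python
import os

def read_text_lbyl(path):
    # LBYL: check first, then act. The file could be removed between
    # the isfile() check and the open() call, so this can still raise.
    if os.path.isfile(path):
        with open(path) as f:
            return f.read()
    return None

def read_text_eafp(path):
    # EAFP: attempt the operation and handle the one expected failure.
    try:
        f = open(path)
    except IOError:          # an alias of OSError on Python 3
        return None
    else:
        # Reading happens outside the guarded clause, so errors
        # raised while reading are not silently swallowed.
        with f:
            return f.read()
```

Only the open call is guarded in the EAFP version, so there is no window between a check and the operation in which the file can disappear unnoticed.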

6.6.1.1. Proper usage of EAFP

EAFP is most often the preferable error-handling strategy, but it is not a panacea. In particular, you must be careful not to cast too wide a net, catching errors that you did not expect and therefore did not mean to catch. The following is a typical case of such a risk (built-in function getattr is covered in "getattr" on page 162):

 def trycalling(obj, attrib, default, *args, **kwds):
     try: return getattr(obj, attrib)(*args, **kwds)
     except AttributeError: return default

The intention of function trycalling is to try calling a method named attrib on object obj, but to return default if obj has no method thus named. However, the function as coded does not do just that: it also mistakenly hides any error case where AttributeError is raised inside the implementation of the sought-after method, silently returning default in those cases. This may easily hide bugs in other code. To do exactly what is intended, the function must take a little bit more care:

 def trycalling(obj, attrib, default, *args, **kwds):
     try: method = getattr(obj, attrib)
     except AttributeError: return default
     else: return method(*args, **kwds)

This implementation of trycalling separates the getattr call, placed in the try clause and therefore watched over by the handler in the except clause, from the call of the method, placed in the else clause and therefore free to propagate any exceptions it may need to. Using EAFP in the most effective way involves frequent use of the else clause on try/except statements.
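To see the difference in behavior, the two versions can be run side by side against a method that itself contains a bug. This is only an illustrative sketch: the name trycalling_wide (for the first, too-wide version) and the class Buggy are hypothetical and do not appear in the original text.

```python
def trycalling_wide(obj, attrib, default, *args, **kwds):
    # First version: the call itself is inside the try clause.
    try:
        return getattr(obj, attrib)(*args, **kwds)
    except AttributeError:
        return default

def trycalling(obj, attrib, default, *args, **kwds):
    # Careful version: only the attribute lookup is guarded.
    try:
        method = getattr(obj, attrib)
    except AttributeError:
        return default
    else:
        return method(*args, **kwds)

class Buggy(object):
    def greet(self):
        # Bug: 'nmae' is a typo, so calling greet raises AttributeError.
        return "hello " + self.nmae

b = Buggy()

# The wide version hides the bug and reports the method as missing.
wide_result = trycalling_wide(b, "greet", "no such method")

# The careful version lets the AttributeError from the bug propagate.
try:
    trycalling(b, "greet", "no such method")
except AttributeError as e:
    narrow_result = "propagated: %s" % type(e).__name__

# A method that really is missing still yields the default.
missing = trycalling(b, "wave", "no such method")
```

Here wide_result is the default value even though greet exists, which is exactly the kind of silently hidden bug the text warns about.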

6.6.2. Handling Errors in Large Programs

In large programs, it is especially easy to err by making your try/except statements too wide, particularly once you have convinced yourself of the power of EAFP as a general error-checking strategy. A try/except combination is too wide when it catches too many different errors, or an error that can occur in too many different places. The latter is a problem if you need to distinguish exactly what happened and where, and the information in the traceback is not sufficient to pinpoint such details (or you discard some or all of the information in the traceback). For effective error handling, you have to keep a clear distinction between errors and anomalies that you expect (and thus know exactly how to handle) and unexpected errors and anomalies that indicate a bug in your program.

Some errors and anomalies are not really erroneous, and perhaps not even all that anomalous: they are just special cases, perhaps rare but nevertheless quite expected, which you choose to handle via EAFP rather than via LBYL to avoid LBYL's many intrinsic defects. In such cases, you should just handle the anomaly, usually without even logging or reporting it. Be very careful, under these circumstances, to keep the relevant try/except constructs as narrow as feasible. Use a small try clause that contains a small amount of code that doesn't call too many other functions and very specific exception-class tuples in the except clauses.
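A minimal sketch of such a not-really-erroneous special case is counting items in a dictionary, where the first occurrence of each key is rare but fully expected. The function name count_words is hypothetical and chosen only for illustration.

```python
def count_words(words):
    """Return a dict mapping each word to its number of occurrences."""
    counts = {}
    for word in words:
        # The try clause holds a single statement and the except clause
        # names a single exception class, so nothing unexpected is caught.
        try:
            counts[word] += 1
        except KeyError:
            # Expected special case: first time this word is seen.
            # It is handled quietly, without logging or reporting.
            counts[word] = 1
    return counts
```

Because only KeyError is caught, a genuinely unexpected problem (for example, an unhashable item raising TypeError) still propagates to the caller.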

Errors and anomalies that depend on user input or other external conditions not under your control are always expected, to some extent, precisely because you have no control over their underlying causes. In such cases, you should concentrate your effort on handling the anomaly gracefully, normally reporting and logging its exact nature and details, and generally keep your program running with undamaged internal and persistent state. The breadth of try/except clauses under such circumstances should also be reasonably narrow, although this is not quite as crucial as when you use EAFP to structure your handling of not-really-erroneous special cases.
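As a hedged illustration of this second category, the sketch below validates user-supplied text, logs the exact nature of each problem, and keeps going. The function name parse_ports and the choice of port numbers as the input are assumptions made purely for the example.

```python
import logging

def parse_ports(lines):
    """Return the valid TCP port numbers found in user-supplied lines."""
    ports = []
    for lineno, line in enumerate(lines, 1):
        try:
            port = int(line.strip())
        except ValueError:
            # Bad user input is expected: report the details and move on.
            logging.warning("line %d: %r is not an integer", lineno, line)
            continue
        if not 0 < port < 65536:
            logging.warning("line %d: port %d is out of range", lineno, port)
            continue
        ports.append(port)
    return ports
```

Invalid entries are reported with their line number and content, while the list being built is never left in a damaged state.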

Lastly, entirely unexpected errors and anomalies indicate bugs in your program's design or coding. In most cases, the best strategy regarding such errors is to avoid try/except and just let the program terminate with error and traceback messages. (You might want to log such information and/or display it more suitably with an application-specific hook in sys.excepthook, as we'll discuss shortly.) In the unlikely case that your program must keep running at all costs, even under the direst circumstances, try/except statements that are quite wide may be appropriate, with the try clause guarding function calls that exercise vast swaths of program functionality and broad except clauses.

In the case of a long-running program, make sure that all details of the anomaly or error are logged to some persistent place for later study (and that some indication of the problem gets displayed, too, so that you know such later study is necessary). The key is making sure that the program's persistent state can be reverted to some undamaged, internally consistent point. The techniques that enable long-running programs to survive some of their own bugs are known as checkpointing and transactional behavior, but I do not cover them further in this book.
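The deliberately wide handler described above might look like the following sketch. It assumes a hypothetical run_tasks function that receives a list of callables, and it covers only the logging half of the advice: restoring persistent state to a consistent point is outside its scope.

```python
import logging

def run_tasks(tasks):
    """Call each task in turn, surviving and logging any failure.

    Returns the number of tasks that failed.
    """
    failures = 0
    for task in tasks:
        try:
            task()
        except Exception:
            # Wide on purpose: the program must keep running.
            # logging.exception logs at ERROR level and appends
            # the full traceback of the exception being handled.
            logging.exception("task %r failed", task)
            failures += 1
    return failures
```

Catching Exception rather than using a bare except clause leaves KeyboardInterrupt and SystemExit free to terminate the program on current Python versions.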

6.6.3. Logging Errors

When Python propagates an exception all the way to the top of the stack without finding an applicable handler, the interpreter normally prints an error traceback to the standard error stream of the process (sys.stderr) before terminating the program. You can rebind sys.stderr to any file-like object usable for output in order to divert this information to a destination more suitable for your purposes.

When you want to change the amount and kind of information output on such occasions, rebinding sys.stderr is not sufficient. In such cases, you can assign your own function to sys.excepthook, and Python will call it before terminating the program due to an unhandled exception. In your exception-reporting function, you can output whatever information you think will help you diagnose and debug the problem and direct that information to whatever destinations you please. For example, you might use module traceback (covered in "The traceback Module" on page 466) to help you format stack traces. When your exception-reporting function terminates, so does your program.
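A minimal sketch of such a hook follows. The function name report_unhandled and its optional out parameter are assumptions added for illustration (the interpreter itself calls the hook with exactly three arguments: the exception class, the exception instance, and the traceback object).

```python
import sys
import traceback

def report_unhandled(exc_type, exc_value, exc_tb, out=None):
    """Exception hook: write a banner, then the formatted traceback."""
    if out is None:
        out = sys.stderr
    out.write("Unhandled %s; details follow\n" % exc_type.__name__)
    # format_exception returns a list of strings ending in newlines.
    lines = traceback.format_exception(exc_type, exc_value, exc_tb)
    out.write("".join(lines))

# Install the hook; Python calls it for any uncaught exception.
sys.excepthook = report_unhandled
```

A real application would more likely write to a log file or notify an operator here; the banner line is only a placeholder for that extra information.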

6.6.3.1. The logging module

The Python standard library offers the rich and powerful logging package to let you organize the logging of messages from your applications in systematic and flexible ways. You might organize a whole hierarchy of instances of Logger (and subclasses thereof), coupled with instances of Handler (and subclasses thereof), possibly with instances of class Filter inserted to fine-tune the criteria that determine which messages get logged in which ways. The messages that do get emitted are formatted by instances of the Formatter class; indeed, the messages themselves are instances of the LogRecord class. The logging package even includes a dynamic configuration facility, whereby the logging configuration may be set dynamically by reading it from on-disk files or even by receiving it on a dedicated socket in a specialized thread.

While the logging package sports a frighteningly complex and powerful architecture, suitable for implementing highly sophisticated logging strategies and policies that may be needed in vast and complicated programming systems, in many applications you may get away with using a tiny subset of the package through some simple functions supplied by the logging module itself. First of all, import logging. Then, emit your message by passing it as a string to any of the functions debug, info, warning, error, or critical, in increasing order of severity. If the string you pass contains format specifiers such as %s (as covered in "String Formatting" on page 193) then after the string, you must pass as further arguments all the values to be formatted in that string. For example, don't call:

 logging.debug('foo is %r' % foo)

which performs the formatting operation whether it's needed or not; rather, call:

 logging.debug('foo is %r', foo)

which performs formatting if and only if that's necessary (i.e., if and only if calling debug is going to result in logging output, depending on the current threshold level).
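The difference can be made visible with an object that counts how often its repr is computed. This is an illustrative sketch only: the class Expensive and the logger name lazy_demo are hypothetical, and a named logger with an explicit level is used (instead of the module-level functions) so that the result does not depend on how the root logger happens to be configured.

```python
import logging

class Expensive(object):
    """Counts how many times its repr is computed."""
    calls = 0

    def __repr__(self):
        Expensive.calls += 1
        return "<Expensive>"

logger = logging.getLogger("lazy_demo")
logger.setLevel(logging.WARNING)      # debug messages are suppressed

foo = Expensive()

# Eager: the % operator runs before debug is even called.
logger.debug('foo is %r' % foo)
eager_calls = Expensive.calls

# Lazy: debug sees that DEBUG is below the threshold and never formats.
logger.debug('foo is %r', foo)
lazy_calls = Expensive.calls - eager_calls
```

After this runs, eager_calls is 1 and lazy_calls is 0, even though neither call produced any logging output.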

By default, the threshold level is WARNING, meaning that any of the functions warning, error, or critical results in logging output, but the functions debug and info don't. To change the threshold level at any time, call logging.getLogger().setLevel, passing as the only argument one of the corresponding constants supplied by module logging: DEBUG, INFO, WARNING, ERROR, or CRITICAL. For example, once you call:

 logging.getLogger().setLevel(logging.DEBUG)

all of the functions from debug to critical will result in logging output until you change level again; if later you call:

 logging.getLogger().setLevel(logging.ERROR)

then only the functions error and critical will result in logging output (debug, info, and warning will not result in logging output); this condition, too, will persist only until you change level again, and so forth.
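The effect of changing the threshold can be checked by temporarily attaching a handler that writes to an in-memory buffer. This is a sketch under stated assumptions: the buffer and handler are added only so the output can be inspected, and the code assumes Python 3, where io.StringIO accepts ordinary strings.

```python
import io
import logging

stream = io.StringIO()                    # in-memory stand-in for stderr
root = logging.getLogger()
handler = logging.StreamHandler(stream)   # default format: message only
root.addHandler(handler)

root.setLevel(logging.DEBUG)
logging.debug("shown: threshold is DEBUG")

root.setLevel(logging.ERROR)
logging.warning("hidden: WARNING is below ERROR")
logging.error("shown: ERROR meets the threshold")

# Undo the temporary changes to the root logger.
root.removeHandler(handler)
root.setLevel(logging.WARNING)

lines = stream.getvalue().splitlines()
```

Only the first and third messages end up in lines; the warning is discarded because the threshold had been raised to ERROR by the time it was issued.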

By default, logging output goes to your process's standard error stream (sys.stderr, as covered in "stdin, stdout, stderr" on page 171) and uses a rather simplistic format (for example, it does not include a timestamp on each line it outputs). You can control these settings by instantiating an appropriate handler instance, with a suitable formatter instance, and creating and setting a new logger instance to hold it. In the simple, common case in which you just want to set these logging parameters once and for all, after which they persist throughout the run of your program, the simplest approach is to call the logging.basicConfig function, which lets you set up things quite simply via named parameters. Only the very first call to logging.basicConfig has any effect, and only if you call it before any of the functions debug, info, and so on. Therefore, the most reasonable use is to call logging.basicConfig at the very start of your program. For example, a common idiom at the start of a program is something like:

 import logging
 logging.basicConfig(format='%(asctime)s %(levelname)8s %(message)s',
                     filename='/tmp/logfile.txt', filemode='w')

This setting emits all logging messages to a file and formats them nicely with a precise human-readable timestamp, followed by the severity level right-aligned in an eight-character field, followed by the message proper.
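For comparison, the longer route mentioned earlier (a handler instance with its own formatter instance, attached to a separate logger) can be sketched as follows. The logger name myapp is hypothetical, an in-memory buffer stands in for a real log file, and the timestamp field is left out of the format so that the output is predictable; the code assumes Python 3.

```python
import io
import logging

stream = io.StringIO()                    # stands in for a real log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)8s %(message)s'))

log = logging.getLogger("myapp")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False     # do not also pass records to the root logger

log.info("started")
log.debug("not emitted: DEBUG is below the INFO threshold")

output = stream.getvalue()
```

The severity name appears right-aligned in an eight-character field, just as with the basicConfig format string shown above.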

For excruciatingly large amounts of detailed information on the logging package and all the wonders you can perform with it, be sure to consult Python's rich online information about it at http://docs.python.org/lib/module-logging.html.

6.6.4. The assert Statement

The assert statement allows you to introduce debugging code into a program. assert is a simple statement with the following syntax:

 assert condition[,expression]

When you run Python with the optimize flag (-O, as covered in "Command-Line Syntax and Options" on page 23), assert is a null operation: the compiler generates no code for it. Otherwise, assert evaluates condition. If condition is satisfied, assert does nothing. If condition is not satisfied, assert instantiates AssertionError with expression as the argument (or without arguments, if there is no expression) and raises the resulting instance.

assert statements are an effective way to document your program. When you want to state that a significant condition C is known to hold at a certain point in a program's execution, assert C is better than a comment that just states C. The advantage of assert is that, when the condition does not in fact hold, assert immediately alerts you to the problem by raising AssertionError.
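As a small sketch of this documenting role, the hypothetical function average below states its precondition with an assert statement rather than a comment. Note that the check disappears under -O, so assert suits conditions that should never fail in correct code, not validation of external input.

```python
def average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    # States the precondition and checks it whenever -O is not in effect.
    assert len(values) > 0, "average() needs at least one value"
    return sum(values) / float(len(values))
```

Calling average([]) without -O raises AssertionError with the given message, pointing directly at the violated assumption.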

6.6.4.1. The __debug__ built-in variable

When you run Python without option -O, the __debug__ built-in variable is True. When you run Python with option -O, __debug__ is False. Also, with option -O, the compiler generates no code for any if statement whose condition is __debug__.

To exploit this optimization, surround the definitions of functions that you call only in assert statements with if __debug__:. This technique makes compiled code smaller and faster when Python is run with -O, and enhances program clarity by showing that those functions exist only to perform sanity checks.
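A minimal sketch of this technique follows. The names is_sorted and merge_sorted are hypothetical, and the merge itself is a deliberately simple stand-in, since the point is only where the helper is defined and used.

```python
if __debug__:
    def is_sorted(seq):
        """Sanity-check helper, defined only when assertions are active."""
        return all(a <= b for a, b in zip(seq, seq[1:]))

def merge_sorted(a, b):
    """Merge two sorted lists into one sorted list."""
    # is_sorted is referenced only inside assert statements, so with -O
    # neither the helper nor these checks generate any code.
    assert is_sorted(a) and is_sorted(b), "inputs must be sorted"
    result = sorted(a + b)    # simple stand-in for a real merge
    assert is_sorted(result)
    return result
```

Because every use of is_sorted sits inside an assert statement, running with -O never tries to look up the name that was not defined.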