3.2. System Scripting Overview

We will take a quick tour through the standard library sys and os modules in the first few sections of this chapter before moving on to larger system programming concepts. As you can tell from the length of their attribute lists, both of these are large modules (their content may vary slightly per Python version and platform):

    >>> import sys, os
    >>> len(dir(sys))          # 56 attributes
    56
    >>> len(dir(os))           # 118 on Windows, more on Unix
    118
    >>> len(dir(os.path))      # a nested module within os
    43

As I'm not going to demonstrate every item in every built-in module, the first thing I want to do is show you how to get more details on your own. Officially, this task also serves as an excuse for introducing a few core system scripting concepts; along the way, we'll code a first script to format documentation.

3.2.1. Python System Modules

Most system-level interfaces in Python are shipped in just two modules: sys and os. That's somewhat oversimplified; other standard modules belong to this domain too. Among them are the following:

glob: For filename expansion
socket: For network connections and Inter-Process Communication (IPC)
thread and Queue: For concurrent threads
time: For accessing system time details
fcntl: For low-level file control

In addition, some built-in functions are actually system interfaces as well (e.g., open). But sys and os together form the core of Python's system tools arsenal.

In principle at least, sys exports components related to the Python interpreter itself (e.g., the module search path), and os contains variables and functions that map to the operating system on which Python is run. In practice, this distinction may not always seem clear-cut (e.g., the standard input and output streams show up in sys, but they are arguably tied to operating system paradigms). The good news is that you'll soon use the tools in these modules so often that their locations will be permanently stamped on your memory.^[*]

^[*] They may also work their way into your subconscious. Python newcomers sometimes appear on Internet discussion forums to discuss their experiences "dreaming in Python" for the first time.

The os module also attempts to provide a portable programming interface to the underlying operating system; its functions may be implemented differently on different platforms, but to Python scripts, they look the same everywhere. In addition, the os module exports a nested submodule, os.path, which provides a portable interface to file and directory processing tools.
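As a rough illustration of that portability, the following sketch builds a path and takes it apart again with os.path. It is written for Python 3 rather than the 2.x releases shown in this chapter, and the directory names are only placeholders borrowed from the examples tree:

```python
import os

# Build a path with the platform's own separator instead of hard-coding one.
path = os.path.join('PP3E', 'System', 'more.py')

# Take the path apart again; the same calls work on every platform.
directory, filename = os.path.split(path)
stem, extension = os.path.splitext(filename)

print(path)
print(directory, filename, stem, extension)
```

The separator in the printed path differs between Windows and Unix, but the script itself does not change.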

3.2.2. Module Documentation Sources

As you can probably deduce from the preceding paragraphs, learning to write system scripts in Python is mostly a matter of learning about Python's system modules. Luckily, there are a variety of information sources to make this task easier, from module attributes to published references and books.

For instance, if you want to know everything that a built-in module exports, you can read its library manual entry, study its source code (Python is open source software, after all), or fetch its attribute list and documentation string interactively. Let's import sys and see what it has:

    C:\...\PP3E\System> python
    >>> import sys
    >>> dir(sys)
    ['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__',
    '__stdin__', '__stdout__', '_getframe', 'api_version', 'argv',
    'builtin_module_names', 'byteorder', 'call_tracing', 'callstats',
    'copyright', 'displayhook', 'dllhandle', 'exc_clear', 'exc_info',
    'exc_traceback', 'exc_type', 'exc_value', 'excepthook', 'exec_prefix',
    'executable', 'exit', 'exitfunc', 'getcheckinterval', 'getdefaultencoding',
    'getfilesystemencoding', 'getrecursionlimit', 'getrefcount',
    'getwindowsversion', 'hexversion', 'maxint', 'maxunicode', 'meta_path',
    'modules', 'path', 'path_hooks', 'path_importer_cache', 'platform',
    'prefix', 'ps1', 'ps2', 'setcheckinterval', 'setprofile',
    'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout', 'version',
    'version_info', 'warnoptions', 'winver']

The dir function simply returns a list containing the string names of all the attributes in any object with attributes; it's a handy memory jogger for modules at the interactive prompt. For example, we know there is something called sys.version, because the name version came back in the dir result. If that's not enough, we can always consult the __doc__ string of built-in modules:

    >>> sys.__doc__
    "This module provides access to some objects used or maintained by the\ninterpreter and to functions that interact strongly with the interpreter.\n\nDynamic objects:\n\nargv -- command line arguments; argv[0] is the script pathname if known\npath -- module search path; path[0] is the script directory, else ''\nmodules ...
    ...lots of text deleted here...
    ... "
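The dir and __doc__ techniques combine naturally into small inspection helpers. The sketch below is written for Python 3, and the function names public_names and summary are made up for this illustration; they are not part of any library:

```python
import sys

def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]

def summary(obj):
    """Return the first line of obj's docstring, or '' if it has none."""
    doc = getattr(obj, '__doc__', None) or ''
    return doc.strip().split('\n')[0]

names = public_names(sys)
print(len(names), 'public names in sys')
print(summary(sys))
```

The count printed will differ from the one shown in the text, since attribute lists vary by Python version and platform.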

3.2.3. Paging Documentation Strings

The __doc__ built-in attribute usually contains a string of documentation, but it may look a bit weird when displayed this way: it's one long string with embedded end-line characters that print as \n, not as a nice list of lines. To format these strings for a more humane display, you can simply use a print statement:

    >>> print sys.__doc__
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.

    Dynamic objects:

    argv -- command line arguments; argv[0] is the script pathname if known
    ...
    ...lots of lines deleted here...
    ...

The print statement, unlike interactive displays, interprets end-line characters correctly. Unfortunately, print doesn't, by itself, do anything about scrolling or paging and so can still be unwieldy on some platforms. Tools such as the built-in help function can do better:

    >>> help(sys)
    Help on built-in module sys:

    NAME
        sys

    FILE
        (built-in)

    MODULE DOCS
        http://www.python.org/doc/current/lib/module-sys.html

    DESCRIPTION
        This module provides access to some objects used or maintained by the
        interpreter and to functions that interact strongly with the interpreter.

        Dynamic objects:

        argv -- command line arguments; argv[0] is the script pathname if known
    ...
    ...lots of lines deleted here...
    ...

The help function is one interface provided by the PyDoc system: code that ships with Python and renders documentation (documentation strings, as well as structural details) related to an object in a formatted way. The format is either like a Unix manpage, which we get for help, or an HTML page, which is more grandiose. It's a handy way to get basic information when working interactively, and it's a last resort before falling back on manuals and books. It is also fairly fixed in the way it displays information; although it attempts to page the display in some contexts, its page size isn't quite right on some of the machines I use. When I want more control over the way help text is printed, I usually use a utility script of my own, like the one in Example 3-1.
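If you would rather hold PyDoc's text in a string than have help page it for you, the pydoc module can be called directly. This is a small sketch for Python 3 (not the 2.x shown in this chapter), using pydoc's render_doc and plain functions:

```python
import pydoc
import sys

# render_doc builds the text that help() would display; plain() strips the
# backspace-based bold markup so the result is ordinary text.
text = pydoc.plain(pydoc.render_doc(sys))

# Show only the first few lines; the full text is long.
lines = text.split('\n')
print('\n'.join(lines[:10]))
```

Once the text is an ordinary string, it can be sliced, searched, or handed to a pager of your own.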

Example 3-1. PP3E\System\more.py

    #########################################################
    # split and interactively page a string or file of text;
    #########################################################

    def more(text, numlines=15):
        lines = text.split('\n')
        while lines:
            chunk = lines[:numlines]
            lines = lines[numlines:]
            for line in chunk: print line
            if lines and raw_input('More?') not in ['y', 'Y']: break

    if __name__ == '__main__':
        import sys                              # when run, not imported
        more(open(sys.argv[1]).read(), 10)      # page contents of file on cmdline

The meat of this file is its more function, and if you know any Python at all, it should be fairly straightforward. It simply splits up a string around end-line characters, and then slices off and displays a few lines at a time (15 by default) to avoid scrolling off the screen. A slice expression, lines[:15], gets the first 15 items in a list, and lines[15:] gets the rest; to show a different number of lines each time, pass a number to the numlines argument (e.g., the last line in Example 3-1 passes 10 to the numlines argument of the more function).
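For readers on Python 3, where print is a function and raw_input is named input, the same slicing idea might be sketched as follows. This is an adaptation, not the book's code: the paginate helper and the ask parameter are additions made here so the paging logic can be exercised without a keyboard.

```python
def paginate(text, numlines=15):
    """Yield successive lists of at most numlines lines from text."""
    lines = text.split('\n')
    for start in range(0, len(lines), numlines):
        yield lines[start:start + numlines]

def more(text, numlines=15, ask=input):
    """Print text a page at a time; ask() decides whether to continue."""
    pages = list(paginate(text, numlines))
    for number, page in enumerate(pages, start=1):
        print('\n'.join(page))
        # Prompt only when another page remains, as Example 3-1 does.
        if number < len(pages) and ask('More?') not in ('y', 'Y'):
            break

pages = list(paginate('a\nb\nc\nd\ne', numlines=2))
print(pages)
```

Passing a function such as `lambda prompt: 'y'` for ask pages through the whole text without stopping.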

The split string object method call that this script employs returns a list of substrings (e.g., ["line", "line",...]). In recent Python releases, a new splitlines method does similar work:

    >>> line = 'aaa\nbbb\nccc\n'
    >>> line.split('\n')
    ['aaa', 'bbb', 'ccc', '']
    >>> line.splitlines()
    ['aaa', 'bbb', 'ccc']

As we'll see in the next chapter, the end-of-line character is always \n (which stands for a byte with the value 10) within a Python script, no matter what platform it is run upon. (If you don't already know why this matters: on Windows, the DOS \r characters that precede \n at the end of each line are dropped when a file is read in text mode.)
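The difference between the two calls is easiest to see on text that still carries DOS line ends. The sketch below assumes Python 3 and a string that was obtained without end-of-line translation (for example, decoded from a file opened in binary mode):

```python
text = 'aaa\r\nbbb\r\nccc\r\n'

by_split = text.split('\n')          # leaves '\r' on each piece, plus a trailing ''
by_splitlines = text.splitlines()    # treats '\r\n' as a single line ending

print(by_split)
print(by_splitlines)
```

For text that may come from either platform, splitlines is usually the less surprising choice.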

3.2.4. Introducing String Methods

Now, this is a simple Python program, but it already brings up three important topics that merit quick detours here: it uses string methods, reads from a file, and is set up to be run or imported. Python string methods are not a system-related tool per se, but they see action in most Python programs. In fact, they are going to show up throughout this chapter as well as those that follow, so here is a quick review of some of the more useful tools in this set. String methods include calls for searching and replacing:

    >>> str = 'xxxSPAMxxx'
    >>> str.find('SPAM')                        # return first offset
    3
    >>> str = 'xxaaxxaa'
    >>> str.replace('aa', 'SPAM')               # global replacement
    'xxSPAMxxSPAM'
    >>> str = '\t  Ni\n'
    >>> str.strip()                             # remove whitespace
    'Ni'

The find call returns the offset of the first occurrence of a substring, and replace does global search and replacement. Like all string operations, replace returns a new string instead of changing its subject in-place (recall that strings are immutable). With these methods, substrings are just strings; in Chapter 21, we'll also meet a module called re that allows regular expression patterns to show up in searches and replacements.
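Two details of these calls are worth a quick check: find reports a missing substring with -1 rather than an exception, and replace leaves its subject untouched. A minimal sketch, written for Python 3:

```python
text = 'xxxSPAMxxx'

missing = text.find('EGGS')              # -1 signals "not found"
changed = text.replace('SPAM', 'ham')    # builds and returns a new string

print(missing)
print(changed)
print(text)                              # the original is unchanged
```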

String methods also provide functions that are useful for things such as case conversions, and a standard library module named string defines some useful preset variables, among other things:

    >>> str = 'SHRUBBERY'
    >>> str.lower()                             # case converters
    'shrubbery'
    >>> str.isalpha()                           # content tests
    True
    >>> str.isdigit()
    False
    >>> import string                           # case constants
    >>> string.lowercase
    'abcdefghijklmnopqrstuvwxyz'

There are also methods for splitting up strings around a substring delimiter and putting them back together with a substring in between. We'll explore these tools later in this book, but as an introduction, here they are at work:

    >>> str = 'aaa,bbb,ccc'
    >>> str.split(',')                          # split into substrings list
    ['aaa', 'bbb', 'ccc']
    >>> str = 'a  b\nc\nd'
    >>> str.split()                             # default delimiter: whitespace
    ['a', 'b', 'c', 'd']
    >>> delim = 'NI'
    >>> delim.join(['aaa', 'bbb', 'ccc'])       # join substrings list
    'aaaNIbbbNIccc'
    >>> ' '.join(['A', 'dead', 'parrot'])       # add a space between
    'A dead parrot'
    >>> chars = list('Lorreta')                 # convert to characters list
    >>> chars
    ['L', 'o', 'r', 'r', 'e', 't', 'a']
    >>> chars.append('!')
    >>> ''.join(chars)                          # to string: empty delimiter
    'Lorreta!'

These calls turn out to be surprisingly powerful. For example, a line of data columns separated by tabs can be parsed into its columns with a single split call; the more.py script uses it to split a string into a list of line strings. In fact, we can emulate the replace call we saw earlier in this section with a split/join combination:

    >>> str = 'xxaaxxaa'
    >>> 'SPAM'.join(str.split('aa'))            # replace, the hard way
    'xxSPAMxxSPAM'
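The tab-separated case mentioned above might look like the following in practice. This is a Python 3 sketch, and the record contents (a name, an age, and a job) are invented purely for illustration:

```python
line = 'bob\t42\tdeveloper\n'               # hypothetical record: name, age, job

fields = line.rstrip('\n').split('\t')      # drop the line end, then cut at tabs
name, age, job = fields
age = int(age)                              # columns arrive as strings

# join reverses the split, so the record can be written back out unchanged.
rebuilt = '\t'.join([name, str(age), job]) + '\n'

print(fields)
print(rebuilt == line)
```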

For future reference, also keep in mind that Python doesn't automatically convert strings to numbers, or vice versa; if you want to use one as you would use the other, you must say so with manual conversions:

    >>> int("42"), eval("42")                   # string to int conversions
    (42, 42)
    >>> str(42), repr(42), ("%d" % 42)          # int to string conversions
    ('42', '42', '42')
    >>> "42" + str(1), int("42") + 1            # concatenation, addition
    ('421', 43)

In the last command here, the first expression triggers string concatenation (since both sides are strings), and the second invokes integer addition (because both objects are numbers). Python doesn't assume you meant one or the other and convert automatically; as a rule of thumb, Python tries to avoid magic whenever possible. String tools will be covered in more detail later in this book (in fact, they get a full chapter in Part V), but be sure to also see the library manual for additional string method tools.
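To see what "no automatic conversion" means in practice, the sketch below mixes a string and a number and reports what Python does. It is written for Python 3, and add_mixed is a throwaway name used only here:

```python
def add_mixed(left, right):
    """Try left + right and describe the error Python raises for mixed types."""
    try:
        return left + right
    except TypeError as exc:
        return 'TypeError: %s' % exc

result = add_mixed('42', 1)                 # str + int is refused, not guessed at
print(result)
print(int('42') + 1, '42' + str(1))         # explicit conversions work either way
```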

A section on the original string module was removed in this edition. In the past, string method calls were also available by importing the string module and passing the string object as an argument to functions corresponding to the current methods. For instance, given a name str assigned to a string object, the older call form:

    import string
    string.replace(str, old, new)    # requires an import

is the same as the more modern version:

    str.replace(old, new)

But the latter form does not require a module import, and it will run quicker (the older module call form incurs an extra call along the way). You should use string object methods today, not string module functions, but you may still see the older function-based call pattern in some Python code. Although most of its functions are now deprecated, the original string module today still contains predefined constants (such as string.lowercase) and a new template interface in 2.4.
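The constants and the template interface mentioned here can be tried as follows. The sketch targets Python 3, where the constant is spelled string.ascii_lowercase rather than string.lowercase; the placeholder names who and what are arbitrary:

```python
import string

print(string.ascii_lowercase)

# string.Template fills $name placeholders from keyword arguments or a mapping.
template = string.Template('$who likes $what')
message = template.substitute(who='Brian', what='spam')
print(message)
```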

3.2.5. File Operation Basics

The more.py script also opens the external file whose name is listed on the command line using the built-in open function, and reads that file's text into memory all at once with the file object read method. Since file objects returned by open are part of the core Python language itself, I assume that you have at least a passing familiarity with them at this point in the text. But just in case you've flipped to this chapter early on in your Pythonhood, the calls:

    open('file').read()            # read entire file into string
    open('file').read(N)           # read next N bytes into string
    open('file').readlines()       # read entire file into line strings list
    open('file').readline()        # read next line, through '\n'

load a file's contents into a string, load a fixed-size set of bytes into a string, load a file's contents into a list of line strings, and load the next line in the file into a string, respectively. As we'll see in a moment, these calls can also be applied to shell commands in Python to read their output. File objects also have write methods for sending strings to the associated file. File-related topics are covered in depth in the next chapter, but making an output file and reading it back is easy in Python:

    >>> file = open('spam.txt', 'w')        # create file spam.txt
    >>> file.write(('spam' * 5) + '\n')
    >>> file.close()
    >>> file = open('spam.txt')             # or open('spam.txt').read()
    >>> text = file.read()
    >>> text
    'spamspamspamspamspam\n'
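The same write-then-read round trip can be sketched in Python 3 with the with statement, which closes each file automatically. Using a temporary directory is a choice made for this example only, so that it leaves nothing behind:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'spam.txt')

    with open(path, 'w') as file:           # closed automatically on exit
        file.write('spam' * 5 + '\n')
        file.write('eggs\n')

    with open(path) as file:
        first = file.readline()             # next line, through '\n'
        rest = file.readlines()             # remaining lines as a list

print(repr(first), rest)
```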

3.2.6. Using Programs in Two Ways

The last few lines in the more.py file also introduce one of the first big concepts in shell tool programming. They instrument the file to be used in either of two ways: as a script or as a library. Every Python module has a built-in __name__ variable that Python sets to the __main__ string only when the file is run as a program, not when it's imported as a library. Because of that, the more function in this file is executed automatically by the last line in the file when this script is run as a top-level program, not when it is imported elsewhere. This simple trick turns out to be one key to writing reusable script code: by coding program logic as functions rather than as top-level code, you can also import and reuse it in other scripts.
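Stripped to its skeleton, the dual-mode pattern looks something like the sketch below. It is written for Python 3, and the script name countlines.py and the functions count_lines and main are hypothetical names chosen for this illustration:

```python
import sys

def count_lines(text):
    """Reusable logic: return the number of lines in text."""
    return len(text.splitlines())

def main(argv):
    """Script entry point: argv[1] names the file to inspect."""
    if len(argv) < 2:
        print('usage: countlines.py filename')
        return 1
    with open(argv[1]) as file:
        print(count_lines(file.read()))
    return 0

if __name__ == '__main__':
    main(sys.argv)                          # runs only when started as a program
```

An importer gets count_lines without triggering any command-line handling, while running the file directly goes through main.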

The upshot is that we can run more.py by itself or import and call its more function elsewhere. When running the file as a top-level program, we list on the command line the name of a file to be read and paged: as I'll describe in depth later in this chapter, words typed in the command that is used to start a program show up in the built-in sys.argv list in Python. For example, here is the script file in action, paging itself (be sure to type this command line in your PP3E\System directory, or it won't find the input file; more on command lines later):

    C:\...\PP3E\System>python more.py more.py
    #########################################################
    # split and interactively page a string or file of text;
    #########################################################

    def more(text, numlines=15):
        lines = text.split('\n')
        while lines:
            chunk = lines[:numlines]
            lines = lines[numlines:]
            for line in chunk: print line
    More?y
            if lines and raw_input('More?') not in ['y', 'Y']: break

    if __name__ == '__main__':
        import sys                              # when run, not imported
        more(open(sys.argv[1]).read(), 10)      # page contents of file on cmdline

When the more.py file is imported, we pass an explicit string to its more function, and this is exactly the sort of utility we need for documentation text. Running this utility on the sys module's documentation string gives us a bit more information in human-readable form about what's available to scripts:

    C:\...\PP3E\System> python
    >>> from more import more
    >>> import sys
    >>> more(sys.__doc__)
    This module provides access to some objects used or maintained by the
    interpreter and to functions that interact strongly with the interpreter.

    Dynamic objects:

    argv -- command line arguments; argv[0] is the script pathname if known
    path -- module search path; path[0] is the script directory, else ''
    modules -- dictionary of loaded modules

    displayhook -- called to show results in an interactive session
    excepthook -- called to handle any uncaught exception other than SystemExit
      To customize printing in an interactive session or to install a custom
      top-level exception handler, assign other functions to replace these.

    exitfunc -- if sys.exitfunc exists, this routine is called when Python exits
    More?

Pressing "y" or "Y" here makes the function display the next few lines of documentation, and then prompt again, unless you've run past the end of the lines list. Try this on your own machine to see what the rest of the module's documentation string looks like.

3.2.7. Python Library Manuals

If that still isn't enough detail, your next step is to read the Python library manual's entry for sys to get the full story. All of Python's standard manuals ship as HTML pages, so you should be able to read them in any web browser you have on your computer. They are installed with Python on Windows, but here are a few simple pointers:

On Windows, click the Start button, pick Programs, select the Python entry there, and then choose the manuals item. The manuals should magically appear on your display within a browser like Internet Explorer. As of Python 2.4, the manuals are provided as a Windows help file and so support searching and navigation.
On Linux, you may be able to click on the manuals' entries in a file explorer, or start your browser from a shell command line and navigate to the library manual's HTML files on your machine.
If you can't find the manuals on your computer, you can always read them online. Go to Python's web site at http://www.python.org and follow the documentation links.

However you get started, be sure to pick the Library manual for things such as sys; Python's standard manual set also includes a short tutorial, language reference, extending references, and more.

3.2.8. Commercially Published References

At the risk of sounding like a marketing droid, I should mention that you can also purchase the Python manual set, printed and bound; see the book information page at http://www.python.org for details and links. Commercially published Python reference books are also available today, including Python Essential Reference (Sams) and Python Pocket Reference (O'Reilly). The former is more complete and comes with examples, but the latter serves as a convenient memory jogger once you've taken a library tour or two.^[*] Also useful are O'Reilly's Python in a Nutshell and Python Standard Library.

^[*] I also wrote the latter as a replacement for the reference appendix that appeared in the first edition of this book; it's meant to be a supplement to the text you're reading. Insert self-serving plug here.