3.4. Introducing the os Module

As mentioned, os is the larger of the two core system modules. It contains all of the usual operating-system calls you may have used in your C programs and shell scripts. Its calls deal with directories, processes, shell variables, and the like. Technically, this module provides POSIX tools (a portable standard for operating-system calls) along with platform-independent directory processing tools in the nested module os.path. Operationally, os serves as a largely portable interface to your computer's system calls: scripts written with os and os.path can usually be run unchanged on any platform.

In fact, if you read the os module's source code, you'll notice that it really just imports whatever platform-specific system module you have on your computer (e.g., nt, mac, posix). See the os.py file in the Python source library directory; it simply runs a from ... import * statement to copy all names out of a platform-specific module. By always importing os rather than platform-specific modules, though, your scripts are mostly immune to platform implementation differences. On some platforms, os includes extra tools available just for that platform (e.g., low-level process calls on Unix); by and large, though, it is as cross-platform as is technically feasible.
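The relationship between os and its platform-specific module can be checked from code. The following is a minimal sketch, assuming a current Python 3 interpreter in which os.name holds the name of the backing module (typically 'posix' or 'nt'); the variable names are illustrative only.

```python
import importlib
import os

# os.name is the name of the platform-specific module behind os
# (assumed here to be 'posix' on Unix-like systems and 'nt' on Windows).
platform_module_name = os.name
backing = importlib.import_module(platform_module_name)

# Names copied into os are the very same objects as in the backing module.
same_function = os.getcwd is backing.getcwd

print(platform_module_name, same_function)
```

Scripts should still import os rather than the backing module; the check is only meant to show where the names come from.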

3.4.1. The Big os Lists

Let's take a quick look at the basic interfaces in os. As a preview, Table 3-1 summarizes some of the most commonly used tools in the os module organized by functional area.

Table 3-1. Commonly used os module tools
Tasks	Tools
Shell variables	`os.environ`
Running programs	`os.system`, `os.popen`, `os.popen2`/`3`/`4`, `os.startfile`
Spawning processes	`os.fork`, `os.pipe`, `os.exec`, `os.waitpid`, `os.kill`
Descriptor files, locks	`os.open`, `os.read`, `os.write`
File processing	`os.remove`, `os.rename`, `os.mkfifo`, `os.mkdir`, `os.rmdir`
Administrative tools	`os.getcwd`, `os.chdir`, `os.chmod`, `os.getpid`, `os.listdir`
Portability tools	`os.sep`, `os.pathsep`, `os.curdir`, `os.path.split`, `os.path.join`
Pathname tools	`os.path.exists('path')`, `os.path.isdir('path')`, `os.path.getsize('path')`

If you inspect this module's attributes interactively, you get a huge list of names that will vary per Python release, will likely vary per platform, and isn't incredibly useful until you've learned what each name means (I've removed most of this list to save space; run the command on your own):

 >>> import os
 >>> dir(os)
 ['F_OK', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM',
 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT',
 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', ...
 ...10 lines removed here...
 ... 'popen4', 'putenv', 'read', 'remove', 'removedirs', 'rename', 'renames', 'rmdir',
 'sep', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'startfile', 'stat', 'stat_float_times',
 'stat_result', 'statvfs_result', 'strerror', 'sys', 'system', 'tempnam', 'times',
 'tmpfile', 'tmpnam', 'umask', 'unlink', 'unsetenv', 'urandom', 'utime', 'waitpid',
 'walk', 'write']

Besides all of these, the nested os.path module exports even more tools, most of which are related to processing file and directory names portably:

 >>> dir(os.path)
 ['__all__', '__builtins__', '__doc__', '__file__', '__name__', 'abspath', 'altsep',
 'basename', 'commonprefix', 'curdir', 'defpath', 'devnull', 'dirname', 'exists',
 'expanduser', 'expandvars', 'extsep', 'getatime', 'getctime', 'getmtime', 'getsize',
 'isabs', 'isdir', 'isfile', 'islink', 'ismount', 'join', 'lexists', 'normcase',
 'normpath', 'os', 'pardir', 'pathsep', 'realpath', 'sep', 'split', 'splitdrive',
 'splitext', 'splitunc', 'stat', 'supports_unicode_filenames', 'sys', 'walk']
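Listings like these are easier to digest once filtered. The sketch below uses ordinary list tools to trim them down; it assumes a current Python 3, so the exact names it finds will differ from the older listings shown above.

```python
import os

# Keep only the public names (no leading underscore) from each listing.
os_names = [name for name in dir(os) if not name.startswith('_')]
path_names = [name for name in dir(os.path) if not name.startswith('_')]
print(len(os_names), len(path_names))

# Narrow the os.path names down to its "is..." test functions.
is_tests = [name for name in path_names if name.startswith('is')]
print(is_tests)
```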

3.4.2. Administrative Tools

Just in case those massive listings aren't quite enough to go on, let's experiment interactively with some of the simpler os tools. Like sys, the os module comes with a collection of informational and administrative tools:

 >>> os.getpid()
 -510737
 >>> os.getcwd()
 'C:\\PP3rdEd\\Examples\\PP3E\\System'
 >>> os.chdir(r'c:\temp')
 >>> os.getcwd()
 'c:\\temp'

As shown here, the os.getpid function gives the calling process's process ID (a unique system-defined identifier for a running program), and os.getcwd returns the current working directory. The current working directory is where files opened by your script are assumed to live, unless their names include explicit directory paths. That's why earlier I told you to run the following command in the directory where more.py lives:

 C:\...\PP3E\System>python more.py more.py

The input filename argument here is given without an explicit directory path (though you could add one to page files in another directory). If you need to run in a different working directory, call the os.chdir function to change to a new directory; your code will run relative to the new directory for the rest of the program (or until the next os.chdir call). This chapter will have more to say about the notion of a current working directory, and its relation to module imports when it explores script execution context.
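To make the effect of the working directory concrete, here is a small sketch that changes into a scratch directory, creates a file by a bare name, and then returns to where it started. It assumes Python 3 and the standard tempfile module; note.txt is a made-up filename.

```python
import os
import tempfile

start_dir = os.getcwd()                      # remember where we began

with tempfile.TemporaryDirectory() as scratch:
    os.chdir(scratch)                        # relative names now resolve here
    try:
        with open('note.txt', 'w') as f:     # no directory path given
            f.write('hello')
        # The file landed in the new working directory.
        created_here = os.path.exists(os.path.join(scratch, 'note.txt'))
    finally:
        os.chdir(start_dir)                  # go back before cleanup runs

print(created_here, os.getcwd() == start_dir)
```

Restoring the original directory in a finally clause keeps the rest of a program from silently running somewhere unexpected.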

3.4.3. Portability Constants

The os module also exports a set of names designed to make cross-platform programming simpler. The set includes platform-specific settings for path and directory separator characters, parent and current directory indicators, and the characters used to terminate lines on the underlying computer:^[*]

^[*] os.linesep comes back as \r\n here, the symbolic escape code equivalent of \015\012, which reflects the carriage-return + line-feed line terminator convention on Windows. In older versions of Python, you may still see these displayed in their octal or hexadecimal escape forms. See the discussion of end-of-line translations in the next chapter.

 >>> os.pathsep, os.sep, os.pardir, os.curdir, os.linesep
 (';', '\\', '..', '.', '\r\n')

os.sep is whatever character is used to separate directory components on the platform on which Python is running; it is automatically preset to \ on Windows, / on POSIX machines, and : on the classic (pre-OS X) Mac OS. Similarly, os.pathsep provides the character that separates directories in directory lists: a colon (:) for POSIX and a semicolon (;) for DOS and Windows.

By using such attributes when composing and decomposing system-related strings in our scripts, the scripts become far more portable. For instance, a call of the form dirpath.split(os.sep) will correctly split platform-specific directory names into components, even though dirpath may look like dir\dir on Windows, dir/dir on Linux, and dir:dir on older Macintosh systems. As previously mentioned, on Windows you can usually use forward slashes rather than backward slashes when giving filenames to be opened; but these portability constants allow scripts to be platform neutral in directory processing code.
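As a brief illustration of these constants at work, the following sketch builds a path and a directory list from components and then takes them apart again. The component names are invented for the example, and the printed strings depend on the platform running the code.

```python
import os

# Build a path from components, then split it again with os.sep.
parts = ['dir1', 'dir2', 'file.txt']
dirpath = os.sep.join(parts)
print(dirpath)                               # dir1\dir2\file.txt or dir1/dir2/file.txt
split_again = dirpath.split(os.sep)

# os.pathsep separates the entries of directory lists such as search paths.
entries = ['first', 'second', 'third']
search_path = os.pathsep.join(entries)
print(search_path)
entries_again = search_path.split(os.pathsep)
```

Because the separator is always asked of os rather than typed in, the same lines round-trip correctly wherever they run.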

3.4.4. Basic os.path Tools

The nested module os.path provides a large set of directory-related tools of its own. For example, it includes portable functions for tasks such as checking a file's type (isdir, isfile, and others), testing file existence (exists), and fetching the size of a file by name (getsize):

 >>> os.path.isdir(r'C:\temp'),        os.path.isfile(r'C:\temp')
 (True, False)
 >>> os.path.isdir(r'C:\config.sys'),  os.path.isfile(r'C:\config.sys')
 (False, True)
 >>> os.path.isdir('nonesuch'),        os.path.isfile('nonesuch')
 (False, False)
 >>> os.path.exists(r'c:\temp\data.txt')
 False
 >>> os.path.getsize(r'C:\autoexec.bat')
 260
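The session above depends on files that happen to exist on one Windows machine. A self-contained variant, sketched here under the assumption of Python 3 and the standard tempfile module, creates its own directory and file so that the same tests can be run anywhere; data.txt and nonesuch are placeholder names.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, 'data.txt')
    with open(filename, 'w') as f:
        f.write('spam')                      # four characters, no newline

    # (isdir, isfile) for a directory, then for a simple file
    dir_checks = (os.path.isdir(folder), os.path.isfile(folder))
    file_checks = (os.path.isdir(filename), os.path.isfile(filename))
    size = os.path.getsize(filename)

    # A name that does not exist fails every test.
    missing = os.path.join(folder, 'nonesuch')
    missing_checks = (os.path.isdir(missing), os.path.isfile(missing),
                      os.path.exists(missing))

print(dir_checks, file_checks, size, missing_checks)
```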

The os.path.isdir and os.path.isfile calls tell us whether a filename is a directory or a simple file; both return False if the named file does not exist. We also get calls for splitting and joining directory path strings, which automatically use the directory name conventions on the platform on which Python is running:

 >>> os.path.split(r'C:\temp\data.txt')
 ('C:\\temp', 'data.txt')
 >>> os.path.join(r'C:\temp', 'output.txt')
 'C:\\temp\\output.txt'
 >>> name = r'C:\temp\data.txt'                            # Windows paths
 >>> os.path.basename(name), os.path.dirname(name)
 ('data.txt', 'C:\\temp')
 >>> name = '/home/lutz/temp/data.txt'                     # Unix-style paths
 >>> os.path.basename(name), os.path.dirname(name)
 ('data.txt', '/home/lutz/temp')
 >>> os.path.splitext(r'C:\PP3rdEd\Examples\PP3E\PyDemos.pyw')
 ('C:\\PP3rdEd\\Examples\\PP3E\\PyDemos', '.pyw')

os.path.split separates a filename from its directory path, and os.path.join puts them back together, all in entirely portable fashion using the path conventions of the machine on which they are called. The basename and dirname calls here return the second and first items returned by a split simply as a convenience, and splitext strips the file extension (after the last .). The normpath call comes in handy if your paths become a jumble of Unix and Windows separators:

 >>> mixed
 'C:\\temp\\public/files/index.html'
 >>> os.path.normpath(mixed)
 'C:\\temp\\public\\files\\index.html'
 >>> print os.path.normpath(r'C:\temp\\sub\.\file.ext')
 C:\temp\sub\file.ext
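The Windows-flavored results above appear only when os.path follows Windows rules. To try both conventions on any machine, one option is to call the standard ntpath and posixpath modules directly, which implement the Windows and Unix rules respectively. The sketch below does so in Python 3; this is a way to experiment, not how scripts should normally be written (they should use os.path).

```python
import ntpath       # Windows path rules
import posixpath    # Unix path rules

win_name = r'C:\temp\data.txt'
unix_name = '/home/lutz/temp/data.txt'

win_split = ntpath.split(win_name)              # directory path, filename
unix_split = posixpath.split(unix_name)
joined = ntpath.join(r'C:\temp', 'output.txt')
stem_and_ext = posixpath.splitext(unix_name)    # extension after the last '.'

# normpath straightens out a jumble of separators under Windows rules.
cleaned = ntpath.normpath('C:\\temp\\public/files/index.html')

print(win_split, unix_split, joined, stem_and_ext, cleaned)
```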

This module also has an abspath call that portably returns the full directory pathname of a file; it accounts for adding the current directory, .. parents, and more:

 >>> os.getcwd()
 'C:\\PP3rdEd\\cdrom\\WindowsExt'
 >>> os.path.abspath('temp')                    # expand to full pathname
 'C:\\PP3rdEd\\cdrom\\WindowsExt\\temp'
 >>> os.path.abspath(r'..\examples')            # relative paths expanded
 'C:\\PP3rdEd\\examples'
 >>> os.path.abspath(r'C:\PP3rdEd\chapters')    # absolute paths unchanged
 'C:\\PP3rdEd\\chapters'
 >>> os.path.abspath(r'C:\temp\spam.txt')       # ditto for filenames
 'C:\\temp\\spam.txt'
 >>> os.path.abspath('')                        # empty string means the cwd
 'C:\\PP3rdEd\\cdrom\\WindowsExt'

Because filenames are relative to the current working directory when they aren't fully specified paths, the os.path.abspath function helps if you want to show users what directory is truly being used to store a file. On Windows, for example, when GUI-based programs are launched by clicking on file explorer icons and desktop shortcuts, the execution directory of the program is the clicked file's home directory, but that is not always obvious to the person doing the clicking; printing a file's abspath can help.
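A portable version of the abspath experiment can be sketched as follows, assuming Python 3. The results are expressed in terms of the current working directory rather than hardcoded, since that directory differs from machine to machine; temp is just an example name and need not exist.

```python
import os

cwd = os.getcwd()

relative = os.path.abspath('temp')             # the cwd is prepended
empty = os.path.abspath('')                    # empty string means the cwd
unchanged = os.path.abspath(relative)          # absolute paths come back as is

print('files named "temp" would be stored as:', relative)
```

Printing the expanded name this way is a simple means of showing users where a relatively named file really lives.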

3.4.5. Running Shell Commands from Scripts

The os module is also the place where we run shell commands from within Python scripts. This concept is intertwined with others we won't cover until later in this chapter, but since this is a key concept employed throughout this part of the book, let's take a quick first look at the basics here. Two os functions allow scripts to run any command line that you can type in a console window:

os.system: Runs a shell command from a Python script
os.popen: Runs a shell command and connects to its input or output streams

3.4.5.1. What's a shell command?

To understand the scope of these calls, we first need to define a few terms. In this text, the term shell means the system that reads and runs command-line strings on your computer, and shell command means a command-line string that you would normally enter at your computer's shell prompt.

For example, on Windows, you can start an MS-DOS console window and type DOS commands there: commands such as dir to get a directory listing and type to view a file, names of programs you wish to start, and so on. DOS is the system shell, and commands such as dir and type are shell commands. On Linux, you can start a new shell session by opening an xterm window and typing shell commands there too: ls to list directories, cat to view files, and so on. A variety of shells are available on Unix (e.g., csh, ksh), but they all read and run command lines. Here are two shell commands typed and run in an MS-DOS console box on Windows:

 C:\temp>dir /B                    ...type a shell command line
 about-pp.html                     ...its output shows up here
 python1.5.tar.gz                  ...DOS is the shell on Windows
 about-pp2e.html
 about-ppr2e.html
 newdir

 C:\temp>type helloshell.py
 # a Python program
 print 'The Meaning of Life'

3.4.5.2. Running shell commands

None of this is directly related to Python, of course (despite the fact that Python command-line scripts are sometimes confusingly called "shell tools"). But because the os module's system and popen calls let Python scripts run any sort of command that the underlying system shell understands, our scripts can make use of every command-line tool available on the computer, whether it's coded in Python or not. For example, here is some Python code that runs the two DOS shell commands typed at the shell prompt shown previously:

 C:\temp>python
 >>> import os
 >>> os.system('dir /B')
 about-pp.html
 python1.5.tar.gz
 about-pp2e.html
 about-ppr2e.html
 newdir
 0
 >>> os.system('type helloshell.py')
 # a Python program
 print 'The Meaning of Life'
 0

The 0s at the end here are just the return values of the system call itself. The system call can be used to run any command line that we could type at the shell's prompt (here, C:\temp>). The command's output normally shows up in the Python session's or program's standard output stream.
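For a command that is not tied to DOS, the sketch below runs echo, which is assumed here to be available in both the Windows and the Unix shells. It is written for Python 3; the command's text goes straight to standard output, and only the status value comes back to the script.

```python
import os

# Run one shell command line; its output is not captured by the script.
status = os.system('echo The Meaning of Life')

# A status of 0 conventionally means that the command succeeded.
print('exit status:', status)
```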

3.4.5.3. Communicating with shell commands

But what if we want to grab a command's output within a script? The os.system call simply runs a shell command line, but os.popen also connects to the standard input or output streams of the command; we get back a file-like object connected to the command's output by default (if we pass a w mode flag to popen, we connect to the command's input stream instead). By using this object to read the output of a command spawned with popen, we can intercept the text that would normally appear in the console window where a command line is typed:

 >>> open('helloshell.py').read()
 "# a Python program\nprint 'The Meaning of Life'\n"
 >>> text = os.popen('type helloshell.py').read()
 >>> text
 "# a Python program\nprint 'The Meaning of Life'\n"
 >>> listing = os.popen('dir /B').readlines()
 >>> listing
 ['about-pp.html\n', 'python1.5.tar.gz\n', 'helloshell.py\n', 'about-pp2e.html\n',
 'about-ppr2e.html\n', 'newdir\n']

Here, we first fetch a file's content the usual way (using Python files), then as the output of a shell type command. Reading the output of a dir command lets us get a listing of files in a directory that we can then process in a loop (we'll learn other ways to obtain such a list in the next chapter^[*]). So far, we've run basic DOS commands; because these calls can run any command line that we can type at a shell prompt, they can also be used to launch other Python scripts:

^[*] In the next chapter, after we've learned about file iterators, we'll also learn that the popen objects have an iterator that reads one line at a time, often making the readlines method call superfluous.

 >>> os.system('python helloshell.py')       # run a Python program
 The Meaning of Life
 0
 >>> output = os.popen('python helloshell.py').read()
 >>> output
 'The Meaning of Life\n'

In all of these examples, the command-line strings sent to system and popen are hardcoded, but there's no reason Python programs could not construct such strings at runtime using normal string operations (+, %, etc.). Given that commands can be dynamically built and run this way, system and popen turn Python scripts into flexible and portable tools for launching and orchestrating other programs. For example, a Python test "driver" script can be used to run programs coded in any language (e.g., C++, Java, Python) and analyze their output. We'll explore such a script in Chapter 6.
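Building command lines at runtime might look like the following sketch, which formats a series of echo commands and collects what each one prints. It assumes Python 3 and a shell that understands echo; the words are arbitrary sample data.

```python
import os

words = ['spam', 'eggs', 'ham']
outputs = []

for word in words:
    command = 'echo %s' % word               # built with normal string formatting
    with os.popen(command) as pipe:          # connect to the command's output
        outputs.append(pipe.read().strip())  # drop the trailing newline

print(outputs)
```

A real test driver would substitute program names and arguments for the echo commands and then compare each captured output against an expected result.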

3.4.5.4. Shell command limitations

You should keep in mind two limitations of system and popen. First, although these two functions themselves are fairly portable, their use is really only as portable as the commands that they run. The preceding examples that run DOS dir and type shell commands, for instance, work only on Windows, and would have to be changed in order to run ls and cat commands on Unix-like platforms.
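One common way around the first limitation is to pick the command according to the platform, or to avoid the shell altogether when a library call does the same job. The sketch below shows both, assuming Python 3, a Unix shell that provides ls, and a Windows shell that provides dir /B; the file names are invented for the example.

```python
import os
import sys
import tempfile

# Choose the listing command that the local shell is assumed to understand.
if sys.platform.startswith('win'):
    list_command = 'dir /B'
else:
    list_command = 'ls'

with tempfile.TemporaryDirectory() as folder:
    for name in ('a.txt', 'b.txt'):
        open(os.path.join(folder, name), 'w').close()    # make empty files

    # Shell route: only as portable as the command being run.
    with os.popen('%s "%s"' % (list_command, folder)) as pipe:
        shell_listing = sorted(line.strip() for line in pipe)

    # Library route: os.listdir needs no shell and works the same everywhere.
    portable_listing = sorted(os.listdir(folder))

print(shell_listing, portable_listing)
```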

Second, it is important to remember that running Python files as programs this way is very different and generally much slower than importing program files and calling functions they define. When os.system and os.popen are called, they must start a brand-new, independent program running on your operating system (they generally run the command in a newly forked process). When importing a program file as a module, the Python interpreter simply loads and runs the file's code in the same process in order to generate a module object. No other program is spawned along the way.^[†]

^[†] The Python execfile built-in function also runs a program file's code, but within the same process that called it. It's similar to an import in that regard, but it works more as if the file's text had been pasted into the calling program at the place where the execfile call appears (unless explicit global or local namespace dictionaries are passed). Unlike imports, execfile unconditionally reads and executes a file's code (it may be run more than once per process), no module object is generated by the file's execution, and unless optional namespace dictionaries are passed in, assignments in the file's code may overwrite variables in the scope where the execfile appears; see the Python library manual for more details.

There are good reasons to build systems as separate programs too, and we'll later explore things such as command-line arguments and streams that allow programs to pass information back and forth. But for most purposes, imported modules are a faster and more direct way to compose systems.
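The difference between spawning and importing can be observed through process IDs. In the sketch below, a tiny script records its own process ID; running it with os.popen yields a different ID from the caller's, while importing it yields the same one. The example assumes Python 3, that sys.executable names the running interpreter, and a hypothetical module name, showpid.

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    script = os.path.join(folder, 'showpid.py')
    with open(script, 'w') as f:
        f.write('import os\npid = os.getpid()\nprint(pid)\n')

    # Spawn: a separate program runs the file and prints its own process ID.
    with os.popen('"%s" "%s"' % (sys.executable, script)) as pipe:
        spawned_pid = int(pipe.read())

    # Import: the same file's code runs inside this process.
    sys.path.insert(0, folder)
    try:
        import showpid
    finally:
        sys.path.remove(folder)
    imported_pid = showpid.pid

print(os.getpid(), spawned_pid, imported_pid)
```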

If you plan to use these calls in earnest, you should also know that the os.system call normally blocks (that is, pauses) its caller until the spawned command line exits. On Linux and Unix-like platforms, the spawned command can generally be made to run independently and in parallel with the caller by adding an & shell background operator at the end of the command line:

 os.system("python program.py arg arg &")

On Windows, spawning with a DOS start command will usually launch the command in parallel too:

 os.system("start program.py arg arg")

In fact, this is so useful that an os.startfile call was added in recent Python releases. This call opens a file with whatever program is listed in the Windows registry for the file's type, as though its icon had been clicked with the mouse cursor:

 os.startfile("webpage.html")    # open file in your web browser
 os.startfile("document.doc")    # open file in Microsoft Word
 os.startfile("myscript.py")     # run file with Python

The os.popen call does not generally block its caller (by definition, the caller must be able to read or write the file object returned), but callers may still occasionally become blocked under both Windows and Linux if the pipe object is closed (e.g., when garbage is collected) before the spawned program exits or the pipe is read exhaustively (e.g., with its read() method). As we will see in the next chapter, the Unix os.fork/exec and Windows os.spawnv calls can also be used to run parallel programs without blocking.

Because the os module's system and popen calls also fall under the category of program launchers, stream redirectors, and cross-process communication devices, they will show up again in later parts of this chapter and in the following chapters, so we'll defer further details for the time being. If you're looking for more details right away, see the stream redirection section in this chapter and the directory listings section in the next.

3.4.6. Other os Module Exports

Since most other os module tools are even more difficult to appreciate outside the context of larger application topics, we'll postpone a deeper look at them until later sections. But to let you sample the flavor of this module, here is a quick preview for reference. Among the os module's other weapons are these:

os.environ: Fetches and sets shell environment variables
os.fork: Spawns a new child process on Unix
os.pipe: Communicates between programs
os.execlp: Starts new programs
os.spawnv: Starts new programs with lower-level control
os.open: Opens a low-level descriptor-based file
os.mkdir: Creates a new directory
os.mkfifo: Creates a new named pipe
os.stat: Fetches low-level file information
os.remove: Deletes a file by its pathname
os.path.walk, os.walk: Applies a function or loop body to all parts of an entire directory tree

And so on. One caution up front: the os module provides a set of file open, read, and write calls, but all of these deal with low-level file access and are entirely distinct from Python's built-in stdio file objects that we create with the built-in open function. You should normally use the built-in open function (not the os module) for all but very special file-processing needs (e.g., opening with exclusive access file locking).
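To show how the two kinds of file access differ, here is a rough sketch assuming Python 3. The low-level calls work with integer descriptors and byte strings, and the O_EXCL flag asks the system to refuse to create a file that already exists; the built-in open returns an ordinary file object. The filename lowlevel.txt is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'lowlevel.txt')
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL

    # Low-level route: a descriptor number, bytes, and an explicit close.
    fd = os.open(path, flags)
    os.write(fd, b'spam')
    os.close(fd)

    # With O_EXCL, a second attempt to create the same file is rejected.
    try:
        os.close(os.open(path, flags))
        second_create_failed = False
    except FileExistsError:
        second_create_failed = True

    # Usual route: the built-in open function and a file object.
    with open(path) as f:
        text = f.read()

print(text, second_create_failed)
```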

Throughout this chapter, we will apply sys and os tools such as these to implement common system-level tasks, but this book doesn't have space to provide an exhaustive list of the contents of modules we will meet along the way. If you have not already done so, you should become acquainted with the contents of modules such as os and sys by consulting the Python library manual. For now, let's move on to explore additional system tools in the context of broader system programming concepts.