Threads | Parallel System Tools

Table of contents:

The Global Interpreter Lock and Threads

Threads are another way to start activities running at the same time. They sometimes are called "lightweight processes," and they are run in parallel like forked processes, but all run within the same single process. For applications that can benefit from parallel processing, threads offer big advantages for programmers:

Performance

Because all threads run within the same process, they don't generally incur a big startup cost to copy the process itself. The costs of both copying forked processes and running threads can vary per platform, but threads are usually considered less expensive in terms of performance overhead.

Simplicity

Threads can be noticeably simpler to program too, especially when some of the more complex aspects of processes enter the picture (e.g., process exits, communication schemes, and "zombie" processes covered in Chapter 10).

Shared global memory

Also because threads run in a single process, every thread shares the same global memory space of the process. This provides a natural and easy way for threads to communicate -- by fetching and setting data in global vmemory. To the Python programmer, this means that global (module-level) variables and interpreter components are shared among all threads in a program: if one thread assigns a global variable, its new value will be seen by other threads. Some care must be taken to control access to shared global objects, but they are still generally simpler to use than the sorts of process communication tools necessary for forked processes we'll meet later in this chapter (e.g., pipes, streams, signals, etc.).

Portability

Perhaps most importantly, threads are more portable than forked processes. At this writing, the os.fork is not supported on Windows at all, but threads are. If you want to run parallel tasks portably in a Python script today, threads are likely your best bet. Python's thread tools automatically account for any platform-specific thread differences, and provide a consistent interface across all operating systems.

Using threads is surprisingly easy in Python. In fact, when a program is started it is already running a thread -- usually called the "main thread" of the process. To start new, independent threads of execution within a process, we either use the Python thread module to run a function call in a spawned thread, or the Python threading module to manage threads with high-level objects. Both modules also provide tools for synchronizing access to shared objects with locks.

3.3.1 The thread Module

Since the basic thread module is a bit simpler than the more advanced threading module covered later in this section, let's look at some of its interfaces first. This module provides a portable interface to whatever threading system is available in your platform: its interfaces work the same on Windows, Solaris, SGI, and any system with an installed "pthreads" POSIX threads implementation (including Linux). Python scripts that use the Python thread module work on all of these platforms without changing their source code.

Let's start off by experimenting with a script that demonstrates the main thread interfaces. The script in Example 3-5 spawns threads until you reply with a "q" at the console; it's similar in spirit to (and a bit simpler than) the script in Example 3-1, but goes parallel with threads, not forks.

Example 3-5. PP2ESystemThreads hread1.py

# spawn threads until you type 'q'

import thread

def child(tid):
 print 'Hello from thread', tid

def parent( ):
 i = 0
 while 1:
 i = i+1
 thread.start_new(child, (i,))
 if raw_input( ) == 'q': break

parent( )

There are really only two thread-specific lines in this script: the import of the thread module, and the thread creation call. To start a thread, we simply call the thread.start_new function, no matter what platform we're programming on.[3] This call takes a function object and an arguments tuple, and starts a new thread to execute a call to the passed function with the passed arguments. It's almost like the built-in apply function (and like apply, also accepts an optional keyword arguments dictionary), but in this case, the function call begins running in parallelwith the rest of the program.

[3] This call is also available as thread.start_new_thread, for historical reasons. It's possible that one of the two names for the same function may become deprecated in future Python releases, but both appear in this text's examples.

Operationally speaking, the thread.start_new call itself returns immediately with no useful value, and the thread it spawns silently exits when the function being run returns (the return value of the threaded function call is simply ignored). Moreover, if a function run in a thread raises an uncaught exception, a stack trace is printed and the thread exits, but the rest of the program continues.

In practice, though, it's almost trivial to use threads in a Python script. Let's run this program to launch a few threads; it can be run on both Linux and Windows this time, because threads are more portable than process forks:

C:...PP2ESystemThreads>python thread1.py
Hello from thread 1

Hello from thread 2

Hello from thread 3

Hello from thread 4
q

Each message here is printed from a new thread, which exits almost as soon as it is started. To really understand the power of threads running in parallel, we have to do something more long-lived in our threads. The good news is that threads are both easy and fun to play with in Python. Let's mutate the fork-count program of the prior section to use threads. The script in Example 3-6 starts 10 copies of its counter running in parallel threads.

Example 3-6. PP2ESystemThreads hread-count.py

##################################################
# thread basics: start 10 copies of a function
# running in parallel; uses time.sleep so that 
# main thread doesn't die too early--this kills 
# all other threads on both Windows and Linux;
# stdout shared: thread outputs may be intermixed
##################################################

import thread, time

def counter(myId, count): # this function runs in threads
 for i in range(count): 
 #time.sleep(1)
 print '[%s] => %s' % (myId, i)

for i in range(10): # spawn 10 threads
 thread.start_new(counter, (i, 3)) # each thread loops 3 times

time.sleep(4)
print 'Main thread exiting.' # don't exit too early

Each parallel copy of the counter function simply counts from zero up to two here. When run on Windows, all 10 threads run at the same time, so their output is intermixed on the standard output stream:

C:...PP2ESystemThreads>python thread-count.py  
 ...some lines deleted...
[5] => 0
[6] => 0
[7] => 0
[8] => 0
[9] => 0
[3] => 1
[4] => 1
[1] => 0
[5] => 1
[6] => 1
[7] => 1
[8] => 1
[9] => 1
[3] => 2
[4] => 2
[1] => 1
[5] => 2
[6] => 2
[7] => 2
[8] => 2
[9] => 2
[1] => 2
Main thread exiting.

In fact, these threads' output is mixed arbitrarily, at least on Windows -- it may even be in a different order each time you run this script. Because all 10 threads run as independent entities, the exact ordering of their overlap in time depends on nearly random system state at large at the time they are run.

If you care to make this output a bit more coherent, uncomment (that is, remove the # before) the time.sleep(1) call in the counter function and rerun the script. If you do, each of the 10 threads now pauses for one second before printing its current count value. Because of the pause, all threads check in at the same time with the same count; you'll actually have a one-second delay before each batch of 10 output lines appears:

C:...PP2ESystemThreads>python thread-count.py  
 ...some lines deleted...
[7] => 0
[6] => 0  pause...
[0] => 1
[1] => 1
[2] => 1
[3] => 1
[5] => 1
[7] => 1
[8] => 1
[9] => 1
[4] => 1
[6] => 1  pause...
[0] => 2
[1] => 2
[2] => 2
[3] => 2
[5] => 2
[9] => 2
[7] => 2
[6] => 2
[8] => 2
[4] => 2
Main thread exiting.

Even with the sleep synchronization active, though, there's no telling in what order the threads will print their current count. It's random on purpose -- the whole point of starting threads is to get work done independently, in parallel.

Notice that this script sleeps for four seconds at the end. It turns out that, at least on my Windows and Linux installs, the main thread cannot exit while any spawned threads are running; if it does, all spawned threads are immediately terminated. Without the sleep here, the spawned threads would die almost immediately after they are started. This may seem ad hoc, but isn't required on all platforms, and programs are usually structured such that the main thread naturally lives as long as the threads it starts. For instance, a user interface may start an FTP download running in a thread, but the download lives a much shorter life than the user interface itself. Later in this section, we'll see different ways to avoid this sleep with global flags, and will also meet a "join" utility in a different module that lets us wait for spawned threads to finish explicitly.

3.3.1.1 Synchronizing access to global objects

One of the nice things about threads is that they automatically come with a cross-task communications mechanism: shared global memory. For instance, because every thread runs in the same process, if one Python thread changes a global variable, the change can be seen by every other thread in the process, main or child. This serves as a simple way for a program's threads to pass information back and forth to each other -- exit flags, result objects, event indicators, and so on.

The downside to this scheme is that our threads must sometimes be careful to avoid changing global objects at the same time -- if two threads change an object at once, it's not impossible that one of the two changes will be lost (or worse, will corrupt the state of the shared object completely). The extent to which this becomes an issue varies per application, and is sometimes a nonissue altogether.

But even things that aren't obviously at risk may be at risk. Files and streams, for example, are shared by all threads in a program; if multiple threads write to one stream at the same time, the stream might wind up with interleaved, garbled data. Here's an example: if you edit Example 3-6, comment-out the sleep call in counter, and increase the per-thread count parameter from 3 to 100, you might occasionally see the same strange results on Windows that I did:

C:...PP2ESystemThreads>python thread-count.py | more 
 ...more deleted...
[5] => 14
[7] => 14
[9] => 14
[3] => 15
[5] => 15
[7] => 15
[9] => 15
[3] => 16 [5] => 16 [7] => 16 [9] => 16



[3] => 17
[5] => 17
[7] => 17
[9] => 17
 ...more deleted...

Because all 10 threads are trying to write to stdout at the same time, once in a while the output of more than one thread winds up on the same line. Such an oddity in this script isn't exactly going to crash the Mars Lander, but it's indicative of the sorts of clashes in time that can occur when our programs go parallel. To be robust, thread programs need to control access to shared global items like this such that only one thread uses it at once.[4]

[4] For a more detailed explanation of this phenomenon, see The Global Interpreter Lock and Threads.

Luckily, Python's thread module comes with its own easy-to-use tools for synchronizing access to shared objects among threads. These tools are based on the concept of a lock -- to change a shared object, threads acquire a lock, make their changes, and then release the lock for other threads to grab. Lock objects are allocated and processed with simple and portable calls in the thread module, and are automatically mapped to thread locking mechanisms on the underlying platform.

For instance, in Example 3-7, a lock object created by thread.allocate_lock is acquired and released by each thread around the print statement that writes to the shared standard output stream.

Example 3-7. PP2ESystemThreads hread-count-mutex.py

##################################################
# synchronize access to stdout: because it is 
# shared global, thread outputs may be intermixed
##################################################

import thread, time

def counter(myId, count):
 for i in range(count): 
 mutex.acquire( )
 #time.sleep(1)
 print '[%s] => %s' % (myId, i)
 mutex.release( )

mutex = thread.allocate_lock( )
for i in range(10): 
 thread.start_new_thread(counter, (i, 3))

time.sleep(6) 
print 'Main thread exiting.'

Python guarantees that only one thread can acquire a lock at any given time; all other threads that request the lock are blocked until a release call makes it available for acquisition. The net effect of the additional lock calls in this script is that no two threads will ever execute a print statement at the same point in time -- the lock ensures mutually exclusive access to the stdout stream. Hence, the output of this script is the same as the original thread_count.py, except that standard output text is never munged by overlapping prints.

The Global Interpreter Lock and Threads

Strictly speaking, Python currently uses a global interpreter lock mechanism, which guarantees that at most one thread is running code within the Python interpreter at any given point in time. In addition, to make sure that each thread gets a chance to run, the interpreter automatically switches its attention between threads at regular intervals (by releasing and acquiring the lock after a number of bytecode instructions), as well as at the start of long-running operations (e.g., on file input/output requests).

This scheme avoids problems that could arise if multiple threads were to update Python system data at the same time. For instance, if two threads were allowed to simultaneously change an object's reference count, the result may be unpredictable. This scheme can also have subtle consequences. In this chapter's threading examples, for instance, the stdout stream is likely corrupted only because each thread's call to write text is a long-running operation that triggers a thread switch within the interpreter. Other threads are then allowed to run and make write requests while a prior write is in progress.

Moreover, even though the global interpreter lock prevents more than one Python thread from running at the same time, it is not enough to ensure thread safety in general, and does not address higher-level synchronization issues at all. For example, if more than one thread might attempt to update the same variable at the same time, they should generally be given exclusive access to the object with locks. Otherwise, it's not impossible that thread switches will occur in the middle of an update statement's bytecode. Consider this code:

import thread, time
count = 0

def adder():
 global count
 count = count + 1 # concurrently update a shared global
 count = count + 1 # thread swapped out in the middle of this

for i in range(100):
 thread.start_new(adder, ()) # start 100 update threads
time.sleep(5)
print count

As is, this code fails on Windows due to the way its threads are interleaved (you get a different result each time, not 200), but works if lock acquire/release calls are inserted around the addition statements. Locks are not strictly required for all shared object access, especially if a single thread updates an object inspected by other threads. As a rule of thumb, though, you should generally use locks to synchronize threads whenever update rendezvous are possible, rather than relying on the current thread implementation.

Interestingly, the above code also works if the thread-switch check interval is made high enough to allow each thread to finish without being swapped out. The sys.setcheckinterval(N) call sets the frequency with which the interpreter checks for things like thread switches and signal handlers. This interval defaults to 10, the number of bytecode instructions before a switch; it does not need to be reset for most programs, but can be used to tune thread performance. Setting higher values means that switches happen less often: threads incur less overhead, but are less responsive to events.

If you plan on mixing Python with C, also see the thread interfaces described in the Python/C API standard manual. In threaded programs, C extensions must release and reacquire the global interpreter lock around long-running operations, to let other Python threads run.

Incidentally, uncommenting the time.sleep call in this version's counter function makes each output line show up one second apart. Because the sleep occurs while a thread holds the lock, all other threads are blocked while the lock holder sleeps. One thread grabs the lock, sleeps one second and prints; another thread grabs, sleeps, and prints, and so on. Given 10 threads counting up to 3, the program as a whole takes 30 seconds (10 x 3) to finish, with one line appearing per second. Of course, that assumes that the main thread sleeps at least that long too; to see how to remove this assumption, we need to move on to the next section.

3.3.1.2 Waiting for spawned thread exits

Thread module locks are surprisingly useful. They can form the basis of higher-level synchronization paradigms (e.g., semaphores), and can be used as general thread communication devices.[5] For example, Example 3-8 uses a global list of locks to know when all child threads have finished.

[5] They cannot, however, be used to directly synchronize processes. Since processes are more independent, they usually require locking mechanisms that are more long-lived and external to programs. In Chapter 14, we'll meet a fcntl.flock library call that allows scripts to lock and unlock files, and so is ideal as a cross-process locking tool.

Example 3-8. PP2ESystemThreads hread-count-wait1.py

##################################################
# uses mutexes to know when threads are done
# in parent/main thread, instead of time.sleep;
# lock stdout to avoid multiple prints on 1 line;
##################################################

import thread

def counter(myId, count):
 for i in range(count): 
 stdoutmutex.acquire( )
 print '[%s] => %s' % (myId, i)
 stdoutmutex.release( )
 exitmutexes[myId].acquire( ) # signal main thread

stdoutmutex = thread.allocate_lock( )
exitmutexes = []
for i in range(10):
 exitmutexes.append(thread.allocate_lock( ))
 thread.start_new(counter, (i, 100))

for mutex in exitmutexes:
 while not mutex.locked( ): pass
print 'Main thread exiting.'

A lock's locked method can be used to check its state. To make this work, the main thread makes one lock per child, and tacks them onto a global exitmutexes list (remember, the threaded function shares global scope with the main thread). On exit, each thread acquires its lock on the list, and the main thread simply watches for all locks to be acquired. This is much more accurate than naively sleeping while child threads run, in hopes that all will have exited after the sleep.

But wait -- it gets even simpler: since threads share global memory anyhow, we can achieve the same effect with a simple global list of integers, not locks. In Example 3-9, the module's namespace (scope) is shared by top-level code and the threaded function as before -- name exitmutexes refers to the same list object in the main thread and all threads it spawns. Because of that, changes made in a thread are still noticed in the main thread without resorting to extra locks.

Example 3-9. PP2ESystemThreads hread-count-wait2.py

####################################################
# uses simple shared global data (not mutexes) to 
# know when threads are done in parent/main thread;
####################################################

import thread
stdoutmutex = thread.allocate_lock( )
exitmutexes = [0] * 10

def counter(myId, count):
 for i in range(count): 
 stdoutmutex.acquire( )
 print '[%s] => %s' % (myId, i)
 stdoutmutex.release( )
 exitmutexes[myId] = 1 # signal main thread

for i in range(10):
 thread.start_new(counter, (i, 100))

while 0 in exitmutexes: pass
print 'Main thread exiting.'

The main threads of both of the last two scripts fall into busy-wait loops at the end, which might become significant performance drains in tight applications. If so, simply add a time.sleep call in the wait loops to insert a pause between end tests and free up the CPU for other tasks. Even threads must be good citizens.

Both of the last two counting thread scripts produce roughly the same output as the original thread_count.py -- albeit, without stdout corruption, and with different random ordering of output lines. The main difference is that the main thread exits immediately after (and no sooner than!) the spawned child threads:

C:...PP2ESystemThreads>python thread-count-wait2.py  
 ...more deleted...
[2] => 98
[6] => 97
[0] => 99
[7] => 97
[3] => 98
[8] => 97
[9] => 97
[1] => 99
[4] => 98
[5] => 98
[2] => 99
[6] => 98
[7] => 98
[3] => 99
[8] => 98
[9] => 98
[4] => 99
[5] => 99
[6] => 99
[7] => 99
[8] => 99
[9] => 99
Main thread exiting.

Of course, threads are for much more than counting. We'll put shared global data like this to more practical use in a later chapter, to serve as completion signals from child processing threads transferring data over a network, to a main thread controlling a Tkinter GUI user interface display (see Section 11.4 in Chapter 11).

3.3.2 The threading Module

The standard Python library comes with two thread modules -- thread , the basic lower-level interface illustrated thus far, and threading, a higher-level interface based on objects. The threading module internally uses the thread module to implement objects that represent threads and common synchronization tools. It is loosely based on a subset of the Java language's threading model, but differs in ways that only Java programmers would notice.[6] Example 3-10 morphs our counting threads example one last time to demonstrate this new module's interfaces.

[6] But in case this means you: Python's lock and condition variables are distinct objects, not something inherent in all objects, and Python's Thread class doesn't have all the features of Java's. See Python's library manual for further details.

Example 3-10. PP2ESystemThreads hread-classes.py

#######################################################
# uses higher-level java like threading module object
# join method (not mutexes or shared global vars) to 
# know when threads are done in parent/main thread;
# see library manual for more details on threading;
#######################################################

import threading

class mythread(threading.Thread): # subclass Thread object
 def __init__(self, myId, count):
 self.myId = myId
 self.count = count
 threading.Thread.__init__(self) 
 def run(self): # run provides thread logic
 for i in range(self.count): # still synch stdout access
 stdoutmutex.acquire( )
 print '[%s] => %s' % (self.myId, i)
 stdoutmutex.release( )

stdoutmutex = threading.Lock( ) # same as thread.allocate_lock( )
threads = []
for i in range(10):
 thread = mythread(i, 100) # make/start 10 threads
 thread.start( ) # start run method in a thread
 threads.append(thread) 

for thread in threads:
 thread.join( ) # wait for thread exits
print 'Main thread exiting.'

The output of this script is the same as that shown for its ancestors earlier (again, randomly distributed). Using the threading module is largely a matter of specializing classes. Threads in this module are implemented with a Thread object -- a Python class which we customize per application by providing a run method that defines the thread's action. For example, this script subclasses Thread with its own mythread class; mythread's run method is what will be executed by the Thread framework when we make a mythread and call its start method.

In other words, this script simply provides methods expected by the Thread framework. The advantage of going this more coding-intensive route is that we get a set of additional thread-related tools from the framework "for free." The Thread.join method used near the end of this script, for instance, waits until the thread exits (by default); we can use this method to prevent the main thread from exiting too early, rather than the time.sleep calls and global locks and variables we relied on in earlier threading examples.

The example script also uses threading.Lock to synchronize stream access (though this name is just a synonym for thread.allocate_lock in the current implementation). Besides Thread and Lock, the threading module also includes higher-level objects for synchronizing access to shared items (e.g., Semaphore, Condition, Event), and more; see the library manual for details. For more examples of threads and forks in general, see the following section and the examples in Part III.

Introducing Python

Part I: System Interfaces