Section 5.3. Threads | Programming Python

5.3. Threads

Threads are another way to start activities running at the same time. In short, they run a function call in parallel with the rest of the program. Threads are sometimes called "lightweight processes," because they run in parallel like forked processes, but all of them run within the same single process. While processes are commonly used to start independent programs, threads are commonly used for tasks such as nonblocking input calls and long-running tasks in a GUI. They also provide a natural model for algorithms that can be expressed as independently running tasks. For applications that can benefit from parallel processing, some developers consider threads to offer a number of advantages:

Performance: Because all threads run within the same process, they don't generally incur a big startup cost to copy the process itself. The costs of both copying forked processes and running threads can vary per platform, but threads are usually considered less expensive in terms of performance overhead.
Simplicity: Threads can be noticeably simpler to program too, especially when some of the more complex aspects of processes enter the picture (e.g., process exits, communication schemes, and zombie processes, covered in Chapter 13).
Shared global memory: Also because threads run in a single process, every thread shares the same global memory space of the process. This provides a natural and easy way for threads to communicateby fetching and setting data in global memory. To the Python programmer, this means that both global scope (module-level) variables and program-wide interpreter components are shared among all threads in a program; if one thread assigns a global variable, its new value will be seen by other threads. Some care must be taken to control access to shared global objects, but to some they seem generally simpler to use than the process communication tools necessary for forked processes, which we'll meet later in this chapter and book (e.g., pipes, streams, signals, sockets, etc.). Like much in programming, this is not a universally shared view, however, so you'll have to weigh the difference for your programs and platforms yourself.
Portability: Perhaps most important is the fact that threads are more portable than forked processes. At this writing, os.fork is not supported by the standard version of Python on Windows, but threads are. If you want to run parallel tasks portably in a Python script today and you are unwilling or unable to install a Unix-like library such as Cygwin on Windows, threads may be your best bet. Python's thread tools automatically account for any platform-specific thread differences, and they provide a consistent interface across all operating systems.

So what's the catch? There are three potential downsides you should be aware of before you start spinning your threads:

Function calls versus programs

First of all, threads are not a wayat least, not a direct wayto start up another program. Rather, threads are designed to run a call to a function in parallel with the rest of the program. As we saw in the prior section, by contrast, forked processes can either call a function or start a new program. The thread function can run scripts with the execfile function and can start programs with tools such as os.popen; but fundamentally, they are in-program functions.

In practice, this is usually not a limitation. For many applications, parallel functions are sufficiently powerful. For instance, if you want to implement nonblocking input and output and avoid blocking a GUI with long-running tasks, threads do the job; simply spawn a thread to run a function that performs the potentially long-running task. The rest of the program will continue independently.

Thread synchronization and queues

Secondly, the fact that threads share global memory and resources is both good news and bad newsit provides a communication mechanism, but we have to be careful to synchronize a variety of operations. Even operations such as printing are a potential conflict since there is only one sys.stdout per process, which is shared by all threads.

Luckily, the Python Queue module, described in this section, makes this simple: realistic threaded programs are usually structured as one or more producer threads that add data to a queue, along with one or more consumer threads that take the data off the queue and process it. In a typical threaded GUI, for example, producers may download or compute data and place it on the queue; the consumerthe main GUI threadchecks the queue for data periodically with a timer event and displays it in the GUI when it arrives.

The global interpreter lock (GIL)

Finally, as we'll learn in more detail later in this section, Python's implementation of threads means that only one thread is ever running in the Python virtual machine at any point in time. Python threads are true operating system threads, but all threads must acquire a single shared lock when they are ready to run, and each thread may be swapped out after running for a set number of virtual machine instructions.

Because of this structure, Python threads cannot today be distributed across multiple CPUs on a multi-CPU computer. To leverage more than one CPU, you'll simply need to use process forking, not threads (the amount and complexity of code required for both are roughly the same). Moreover, long-running tasks implemented as C extensions can run truly independently if they release the GIL to allow Python threads to run while their task is in progress. Python code, however, cannot truly overlap in time.

The advantage of Python's implementation of threads is performancewhen it was attempted, making the virtual machine truly thread safe reportedly slowed all programs by a factor of two on Windows and by an even larger factor on Linux. Even nonthreaded programs ran at half speed.

Despite what you may think after reading the last few introductory paragraphs, threads are remarkably easy to use in Python. In fact, when a program is started it is already running a thread, usually called the "main thread" of the process. To start new, independent threads of execution within a process, we use either the Python thread module to run a function call in a spawned thread or the Python threading module to manage threads with high-level objects. Both modules also provide tools for synchronizing access to shared objects with locks.

5.3.1. The thread Module

Since the basic tHRead module is a bit simpler than the more advanced threading module covered later in this section, let's look at some of its interfaces first. This module provides a portable interface to whatever threading system is available in your platform: its interfaces work the same on Windows, Solaris, SGI, and any system with an installed pthreads POSIX threads implementation (including Linux). Python scripts that use the Python tHRead module work on all of these platforms without changing their source code.

Let's start off by experimenting with a script that demonstrates the main thread interfaces. The script in Example 5-5 spawns threads until you reply with a "q" at the console; it's similar in spirit to (and a bit simpler than) the script in Example 5-1, but it goes parallel with threads, not with forks.

Example 5-5. PP3E\System\Threads\thread1.py

 # spawn threads until you type 'q' import thread def child(tid):     print 'Hello from thread', tid def parent( ):     i = 0     while 1:         i = i+1         thread.start_new(child, (i,))         if raw_input( ) == 'q': break parent( )

This script really contains only two thread-specific lines: the import of the thread module and the thread creation call. To start a thread, we simply call the thread.start_new function, no matter what platform we're programming on.^[*] This call takes a function object and an arguments tuple and starts a new thread to execute a call to the passed function with the passed arguments. It's almost like the built-in apply function and newer function(*args) call syntax (and, like apply, it also accepts an optional keyword arguments dictionary), but in this case, the function call begins running in parallel with the rest of the program.

^[*] This call is also available as thread.start_new_thread for historical reasons. It's possible that one of the two names for the same function may become deprecated in future Python releases, but both appear in this text's examples. As of Python 2.4, both names are still available.

Operationally speaking, the tHRead.start_new call itself returns immediately with no useful value, and the thread it spawns silently exits when the function being run returns (the return value of the threaded function call is simply ignored). Moreover, if a function run in a thread raises an uncaught exception, a stack trace is printed and the thread exits, but the rest of the program continues.

In practice, though, it's almost trivial to use threads in a Python script. Let's run this program to launch a few threads; we can run it on both Linux and Windows this time, because threads are more portable than process forks:

 C:\...\PP3E\System\Threads>python thread1.py Hello from thread 1 Hello from thread 2 Hello from thread 3 Hello from thread 4 q

Each message here is printed from a new thread, which exits almost as soon as it is started. To really understand the power of threads running in parallel, we have to do something more long-lived in our threads. The good news is that threads are both easy and fun to play with in Python. Let's mutate the fork-count program of the prior section to use threads. The script in Example 5-6 starts 10 copies of its counter running in parallel threads.

Example 5-6. PP3E\System\Threads\thread-count.py

 ########################################################################## # thread basics: start 10 copies of a function running in parallel; # uses time.sleep so that the main thread doesn't die too early--this # kills all other threads on both Windows and Linux; stdout is shared: # thread outputs may be intermixed in this version occasionally. ########################################################################## import thread, time def counter(myId, count):                    # this function runs in threads     for i in range(count):         #time.sleep(1)         print '[%s] => %s' % (myId, i) for i in range(10):                          # spawn 10 threads     thread.start_new(counter, (i, 3))        # each thread loops 3 times time.sleep(4) print 'Main thread exiting.'                 # don't exit too early

Each parallel copy of the counter function simply counts from zero up to two here. When run on Windows, all 10 threads run at the same time, so their output is intermixed on the standard output stream:

 C:\...\PP3E\System\Threads>python thread-count.py   ...some lines deleted... [5] => 0 [6] => 0 [7] => 0 [8] => 0 [9] => 0 [3] => 1 [4] => 1 [1] => 0 [5] => 1 [6] => 1 [7] => 1 [8] => 1 [9] => 1 [3] => 2 [4] => 2 [1] => 1 [5] => 2 [6] => 2 [7] => 2 [8] => 2 [9] => 2 [1] => 2 Main thread exiting.

In fact, the output of these threads is mixed arbitrarily, at least on Windows. It may even be in a different order each time you run this script. Because all 10 threads run as independent entities, the exact ordering of their overlap in time depends on nearly random system state at large at the time they are run.

If you care to make this output a bit more coherent, uncomment the time.sleep(1) call in the counter function (that is, remove the # before it) and rerun the script. If you do, each of the 10 threads now pauses for one second before printing its current count value. Because of the pause, all threads check in at the same time with the same count; you'll actually have a one-second delay before each batch of 10 output lines appears:

 C:\...\PP3E\System\Threads>python thread-count.py   ...some lines deleted... [7] => 0 [6] => 0                                pause... [0] => 1 [1] => 1 [2] => 1 [3] => 1 [5] => 1 [7] => 1 [8] => 1 [9] => 1 [4] => 1 [6] => 1                                pause... [0] => 2 [1] => 2 [2] => 2 [3] => 2 [5] => 2 [9] => 2 [7] => 2 [6] => 2 [8] => 2 [4] => 2 Main thread exiting.

Even with the sleep synchronization active, though, there's no telling in what order the threads will print their current count. It's random on purpose. The whole point of starting threads is to get work done independently, in parallel.

Notice that this script sleeps for four seconds at the end. It turns out that, at least on my Windows and Linux installs, the main thread cannot exit while any spawned threads are running; if it does, all spawned threads are immediately terminated. Without the sleep here, the spawned threads would die almost immediately after they are started. This may seem ad hoc, but it isn't required on all platforms, and programs are usually structured such that the main thread naturally lives as long as the threads it starts. For instance, a user interface may start an FTP download running in a thread, but the download lives a much shorter life than the user interface itself. Later in this section, we'll see different ways to avoid this sleep using global flags, and we will also meet a "join" utility in a different module that lets us wait for spawned threads to finish explicitly.

5.3.1.1. Synchronizing access to global objects

One of the nice things about threads is that they automatically come with a cross-task communications mechanism: shared global memory. For instance, because every thread runs in the same process, if one Python thread changes a global variable, the change can be seen by every other thread in the process, main or child. This serves as a simple way for a program's threads to pass informationexit flags, result objects, event indicators, and so onback and forth to each other.

The downside to this scheme is that our threads must sometimes be careful to avoid changing global objects at the same time. If two threads change an object at once, it's not impossible that one of the two changes will be lost (or worse, will corrupt the state of the shared object completely). The extent to which this becomes an issue varies per application, and sometimes it isn't an issue at all.

But even things that aren't obviously at risk may be at risk. Files and streams, for example, are shared by all threads in a program; if multiple threads write to one stream at the same time, the stream might wind up with interleaved, garbled data. Here's an example: if you edit Example 5-6, comment out the sleep call in counter, and increase the per-thread count parameter from 3 to 100, you might occasionally see the same strange results on Windows that I did:

 C:\...\PP3E\System\Threads\>python thread-count.py | more   ...more deleted... [5] => 14 [7] => 14 [9] => 14 [3] => 15 [5] => 15 [7] => 15 [9] => 15 [3] => 16 [5] => 16 [7] => 16 [9] => 16 [3] => 17 [5] => 17 [7] => 17 [9] => 17  ...more deleted...

Because all 10 threads are trying to write to stdout at the same time, once in a while the output of more than one thread winds up on the same line. Such an oddity in this artificial script isn't exactly going to crash the Mars Lander, but it's indicative of the sorts of clashes in time that can occur when our programs go parallel. To be robust, thread programs need to control access to shared global items like this such that only one thread uses it at once.^[*]

^[*] If it's not clear why this should be so, watch for a more detailed explanation of this phenomenon in the section "The Global Interpreter Lock and Threads" near the end of this discussion of threads.

Luckily, Python's thread module comes with its own easy-to-use tools for synchronizing access to shared objects among threads. These tools are based on the concept of a lockto change a shared object, threads acquire a lock, make their changes, and then release the lock for other threads to grab. Lock objects are allocated and processed with simple and portable calls in the thread module and are automatically mapped to thread locking mechanisms on the underlying platform.

For instance, in Example 5-7, a lock object created by thread.allocate_lock is acquired and released by each thread around the print statement that writes to the shared standard output stream.

Example 5-7. PP3E\System\Threads\thread-count-mutex.py

 ############################################################## # synchronize access to stdout: because it is shared global, # thread outputs may be intermixed if not synchronized ############################################################## import thread, time def counter(myId, count):     for i in range(count):         mutex.acquire( )         #time.sleep(1)         print '[%s] => %s' % (myId, i)         mutex.release( ) mutex = thread.allocate_lock( ) for i in range(10):     thread.start_new_thread(counter, (i, 3)) time.sleep(6) print 'Main thread exiting.'

Python guarantees that only one thread can acquire a lock at any given time; all other threads that request the lock are blocked until a release call makes it available for acquisition. The net effect of the additional lock calls in this script is that no two threads will ever execute a print statement at the same point in time; the lock ensures mutually exclusive access to the stdout stream. Hence, the output of this script is the same as the original thread_count.py except that standard output text is never munged by overlapping prints.

Incidentally, uncommenting the time.sleep call in this version's counter function makes each output line show up one second apart. Because the sleep occurs while a thread holds the mutex lock, all other threads are blocked while the lock holder sleeps, even though time.sleep itself does not block other threads. One thread grabs the mutex lock, sleeps one second, and prints; another thread grabs, sleeps, and prints, and so on. Given 10 threads counting up to three, the program as a whole takes 30 seconds (10 x 3) to finish, with one line appearing per second. Of course, that assumes that the main thread sleeps at least that long too; to see how to remove this assumption, we need to move on to the next section.

5.3.1.2. Waiting for spawned thread exits

Thread module locks are surprisingly useful. They can form the basis of higher-level synchronization paradigms (e.g., semaphores) and can be used as general thread communication devices.^[*] For example, Example 5-8 uses a global list of locks to know when all child threads have finished.

^[*] They cannot, however, be used to directly synchronize processes. Since processes are more independent, they usually require locking mechanisms that are more long-lived and external to programs. Both the os.open call with an open flag of O_EXCL, as well as the less portable fcntl.flock call, allow scripts to lock and unlock files and so are ideal as cross-process locking tools.

Example 5-8. PP3E\System\Threads\thread-count-wait1.py

 ################################################## # uses mutexes to know when threads are done # in parent/main thread, instead of time.sleep; # lock stdout to avoid multiple prints on 1 line; ################################################## import thread def counter(myId, count):     for i in range(count):         stdoutmutex.acquire( )         print '[%s] => %s' % (myId, i)         stdoutmutex.release( )     exitmutexes[myId].acquire( )    # signal main thread stdoutmutex = thread.allocate_lock( ) exitmutexes = [] for i in range(10):     exitmutexes.append(thread.allocate_lock( ))     thread.start_new(counter, (i, 100)) for mutex in exitmutexes:     while not mutex.locked( ): pass print 'Main thread exiting.'

A lock's locked method can be used to check its state. To make this work, the main thread makes one lock per child and tacks them onto a global exitmutexes list (remember, the threaded function shares global scope with the main thread). On exit, each thread acquires its lock on the list, and the main thread simply watches for all locks to be acquired. This is much more accurate than naïvely sleeping while child threads run in hopes that all will have exited after the sleep.

But wait, it gets even simpler: since threads share global memory anyhow, we can achieve the same effect with a simple global list of integers, not locks. In Example 5-9, the module's namespace (scope) is shared by top-level code and the threaded function, as before. exitmutexes refers to the same list object in the main thread and all threads it spawns. Because of that, changes made in a thread are still noticed in the main thread without resorting to extra locks.

Example 5-9. PP3E\System\Threads\thread-count-wait2.py

 #################################################### # uses simple shared global data (not mutexes) to # know when threads are done in parent/main thread; #################################################### import thread stdoutmutex = thread.allocate_lock( ) exitmutexes = [0] * 10 def counter(myId, count):     for i in range(count):         stdoutmutex.acquire( )         print '[%s] => %s' % (myId, i)         stdoutmutex.release( )     exitmutexes[myId] = 1  # signal main thread for i in range(10):     thread.start_new(counter, (i, 100)) while 0 in exitmutexes: pass print 'Main thread exiting.'

The main threads of both of the last two scripts fall into busy-wait loops at the end, which might become significant performance drains in tight applications. If so, simply add a time.sleep call in the wait loops to insert a pause between end tests and to free up the CPU for other tasks. Even threads must be good citizens.

Both of the last two counting thread scripts produce roughly the same output as the original thread_count.py, albeit without stdout corruption and with different random ordering of output lines. The main difference is that the main thread exits immediately after (and no sooner than!) the spawned child threads:

 C:\...\PP3E\System\Threads>python thread-count-wait2.py   ...more deleted... [2] => 98 [6] => 97 [0] => 99 [7] => 97 [3] => 98 [8] => 97 [9] => 97 [1] => 99 [4] => 98 [5] => 98 [2] => 99 [6] => 98 [7] => 98 [3] => 99 [8] => 98 [9] => 98 [4] => 99 [5] => 99 [6] => 99 [7] => 99 [8] => 99 [9] => 99 Main thread exiting.

Of course, threads are for much more than counting. We'll put shared global data to more practical use in a later chapter, where it will serve as completion signals from child processing threads transferring data over a network to a main thread controlling a Tkinter GUI user interface display (see Chapter 14). Shared global data among threads also turns out to be the basis of queues, which are discussed later in this section; each thread gets or puts data using the same queue object.

5.3.2. The threading Module

The Python standard library comes with two thread modulesthread, the basic lower-level interface illustrated thus far, and tHReading, a higher-level interface based on objects. The threading module internally uses the tHRead module to implement objects that represent threads and common synchronization tools. It is loosely based on a subset of the Java language's threading model, but it differs in ways that only Java programmers would notice.^[*] Example 5-10 morphs our counting threads example one last time to demonstrate this new module's interfaces.

^[*] But in case this means you, Python's lock and condition variables are distinct objects, not something inherent in all objects, and Python's Thread class doesn't have all the features of Java's. See Python's library manual for further details.

Example 5-10. PP3E\System\Threads\thread-classes.py

 ########################################################################## # uses higher-level Java-like threading module object join method (not # mutexes or shared global vars) to know when threads are done in main # parent thread; see library manual for more details on threading; ########################################################################## import threading class mythread(threading.Thread):          # subclass Thread object     def _ _init_ _(self, myId, count):         self.myId  = myId         self.count = count         threading.Thread._ _init_ _(self)     def run(self):                         # run provides thread logic         for i in range(self.count):        # still sync stdout access             stdoutmutex.acquire( )             print '[%s] => %s' % (self.myId, i)             stdoutmutex.release( ) stdoutmutex = threading.Lock()              # same as thread.allocate_lock( ) threads = [] for i in range(10):     thread = mythread(i, 100)                # make/start 10 threads     thread.start( )                         # start run method in a thread     threads.append(thread) for thread in threads:     thread.join( )                          # wait for thread exits print 'Main thread exiting.'

The output of this script is the same as that shown for its ancestors earlier (again, randomly distributed). Using the tHReading module is largely a matter of specializing classes. Threads in this module are implemented with a THRead object, a Python class which we customize per application by providing a run method that defines the thread's action. For example, this script subclasses Thread with its own mytHRead class; the run method will be executed by the THRead framework in a new thread when we make a mythread and call its start method.

In other words, this script simply provides methods expected by the THRead framework. The advantage of taking this more coding-intensive route is that we get a set of additional thread-related tools from the framework "for free." The Thread.join method used near the end of this script, for instance, waits until the thread exits (by default); we can use this method to prevent the main thread from exiting too early rather than using the time.sleep calls and global locks and variables we relied on in earlier threading examples.

The example script also uses threading.Lock to synchronize stream access (though this name is just a synonym for thread.allocate_lock in the current implementation). The THRead class can also be used to start a simple function without subclassing, though this call form is not noticeably simpler than the basic thread module. For example, the following four code snippets spawn the same sort of thread:

 # subclass with state class mythread(threading.Thread):     def _ _init_ _(self, myId, count):         self.i = i         threading.Thread._ _init_ _(self)     def run(self):         consumer(self.i) mythread().start( ) # pass action in thread = threading.Thread(target=(lambda: consumer(i))) thread.start( ) # same but no lambda wrapper for state Threading.Thread(target=consumer, args=(i,)).start( ) # basic thread module thread.start_new_thread(consumer, (i,))

Besides THRead and Lock, the threading module also includes higher-level objects for synchronizing access to shared items (e.g., Semaphore, Condition, Event)many more than we have space to cover here; see the library manual for details.^[*]

^[*] Some Python users would probably recommend that you use tHReading rather than tHRead in general. Unless you need the more powerful tools in threading, though, the choice is arbitrary. The basic thread module does not impose OOP, and as you can see from the four coding alternatives in this section, it can be simpler. The most general Python rule of thumb applies here as always: keep it simple, unless it has to be complex.

For more examples of threads and forks in general, see the following sections of this chapter as well as the examples in the GUI and network scripting parts of this book. We will thread GUIs, for instance, to avoid blocking them, and we will thread and fork network servers to avoid denying service to clients.

5.3.3. The Queue Module

You can synchronize your threads' access to shared resources with locks, but you usually don't have to. As mentioned in our introduction to threads, realistically scaled, threaded programs are often structured as a set of producer and consumer threads, which communicate by placing data on, and taking it off of, a shared queue.

The Python Queue module implements this storage device. It provides a standard queue data structure (a fifo, a first-in first-out list, in which items are added on one end and removed from the other), which may contain any type of Python object. However, the queue object is automatically controlled with thread lock acquire and release calls, such that only one thread can modify the queue at any given point in time. Because of this, programs that use a queue for their cross-thread communication will be thread-safe and can usually avoid dealing with locks of their own.

Like the other tools in Python's threading arsenal, queues are surprisingly simple to use. The script in Example 5-11, for instance, spawns two consumer threads that watch for data to appear on the shared queue and four producer threads that place data on the queue periodically after a sleep interval (each of their sleep durations differs to simulate a real, long-running task). Because the queue is assigned to a global variable, it is shared by all of the spawned threads; all of them run in the same process and in the same global scope.

Example 5-11. PP3E\System\Threads\xd5 ueuetest.py

 ################################################################### # producer and consumer threads communicating with a shared queue ################################################################### numconsumers = 2                  # how many consumers to start numproducers = 4                  # how many producers to start nummessages  = 4                  # messages per producer to put import thread, Queue, time safeprint = thread.allocate_lock( )    # else prints may overlap dataQueue = Queue.Queue( )             # shared global. infinite size def producer(idnum):     for msgnum in range(nummessages):         time.sleep(idnum)         dataQueue.put('producer %d:%d' % (idnum, msgnum)) def consumer(idnum):     while 1:         time.sleep(0.1)         try:             data = dataQueue.get(block=False)         except Queue.Empty:             pass         else:             safeprint.acquire( )             print 'consumer', idnum, 'got =>', data             safeprint.release( ) if _ _name_ _ == '_ _main_ _':     for i in range(numconsumers):         thread.start_new_thread(consumer, (i,))     for i in range(numproducers):         thread.start_new_thread(producer, (i,))     time.sleep(((numproducers-1) * nummessages) + 1)

Following is the output of Example 5-11 when run on my Windows XP machine. Notice that even though the queue automatically coordinates the communication of data between the threads, this script still must use a lock to manually synchronize access to the standard output stream. As in prior examples, if the safeprint lock is not used, the printed lines from one consumer may be intermixed with those of another. It is not impossible that a consumer may be paused in the middle of a print operation (in fact, this occurs regularly on my test machine in some test scenarios; try it on yours to see for yourself).

 C:\...\PP3E\System\Threads >queuetest.py consumer 0 got => producer 0:0 consumer 1 got => producer 0:1 consumer 0 got => producer 0:2 consumer 1 got => producer 0:3 consumer 0 got => producer 1:0 consumer 1 got => producer 1:1 consumer 0 got => producer 2:0 consumer 1 got => producer 1:2 consumer 0 got => producer 3:0 consumer 0 got => producer 1:3 consumer 1 got => producer 2:1 consumer 1 got => producer 2:2 consumer 0 got => producer 3:1 consumer 1 got => producer 2:3 consumer 0 got => producer 3:2 consumer 1 got => producer 3:3

Try adjusting the parameters at the top of this script to experiment with different scenarios. A single consumer, for instance, would simulate a GUI's main thread; the output of a single-consumer run is given here. Producers still add to the queue in fairly random fashion, because threads run in parallel with each other and with the consumer.

 C:\...\PP3E\System\Threads >queuetest.py consumer 0 got => producer 0:0 consumer 0 got => producer 0:1 consumer 0 got => producer 0:2 consumer 0 got => producer 0:3 consumer 0 got => producer 1:0 consumer 0 got => producer 2:0 consumer 0 got => producer 1:1 consumer 0 got => producer 1:2 consumer 0 got => producer 3:0 consumer 0 got => producer 2:1 consumer 0 got => producer 1:3 consumer 0 got => producer 2:2 consumer 0 got => producer 3:1 consumer 0 got => producer 2:3 consumer 0 got => producer 3:2 consumer 0 got => producer 3:3

Queues may be fixed or infinite in size, and get and put calls may or may not block; see the Python library manual for more details on queue interface options.

GUIs and Threads

We will return to threads and queues and see additional thread and queue examples when we study GUIs in a later part of the book. The PyMailGUI example in Chapter 15, for instance, will make extensive use of thread tools introduced here and developed further in Chapter 11. Although we can't get into code at this point, threads are usually an integral part of most nontrivial GUIs. In fact, many GUIs are a combination of threads, a queue, and a timer-based loop.

Here's why. In the context of a GUI, any operation that can block or take a long time to complete must be spawned off in a thread so that the GUI (the main thread) remains active. Because only the main thread can generally update the display, GUI programs typically take the form of a main GUI thread and one or more long-running producer threadsone for each long-running task being performed. To synchronize their points of interface, all of the threads share data on a global queue. More specifically:

The main thread handles all GUI updates and runs a timer-based loop that wakes up periodically to check for new data on the queue to be displayed on-screen. The after( ) Tkinter method can be used to schedule queue-check events. All GUI updates occur only in this main thread.
The child threads don't do anything GUI related. They just produce data and put it on the queue to be picked up by the main thread. Alternatively, child threads can place a callback function on the queue, to be picked up and run by the main thread. It's not generally sufficient, however, to simply pass in a GUI update callback function from the main thread to the child thread and run it from there. The function in shared memory will still be executed in the child thread.

Since threads are much more responsive than a timer event loop in the GUI, this scheme both avoids blocking the GUI (producer threads run in parallel with the GUI) and avoids missing incoming events (producer threads run independent of the GUI event loop). The main GUI thread will display the queued results as quickly as it can, in the context of a slower GUI event loop. See also this chapter's discussion of sys.setcheckinterval for tweaking the responsiveness of spawned producer threads.

5.3.4. The Global Interpreter Lock and Threads

Strictly speaking, Python currently uses a global interpreter lock (GIL) mechanism, which guarantees that at most, one thread is running code within the Python interpreter at any given point in time. We introduced the GIL at the start of the "Threads" section. In addition, to make sure that each thread gets a chance to run, the interpreter automatically switches its attention between threads at regular intervals (by releasing and acquiring the lock after a number of bytecode instructions) as well as at the start of long-running operations (e.g., on file input/output requests).

This scheme avoids problems that could arise if multiple threads were to update Python system data at the same time. For instance, if two threads were allowed to simultaneously change an object's reference count, the result may be unpredictable. This scheme can also have subtle consequences. In this chapter's threading examples, for instance, the stdout stream is likely corrupted only because each thread's call to write text is a long-running operation that triggers a thread switch within the interpreter. Other threads are then allowed to run and make write requests while a prior write is in progress.

Moreover, even though the GIL prevents more than one Python thread from running at the same time, it is not enough to ensure thread safety in general, and it does not address higher-level synchronization issues at all. For example, in the case that more than one thread might attempt to update the same variable at the same time, the threads should generally be given exclusive access to the object with locks. Otherwise, it's not impossible that thread switches will occur in the middle of an update statement's bytecode. Consider this code:

 import thread, time count = 0 def adder( ):     global count     count = count + 1     # concurrently update a shared global     count = count + 1     # thread swapped out in the middle of this for i in range(100):     thread.start_new(adder, ( ))   # start 100 update threads time.sleep(5) print count

As is, this code fails on Windows due to the way its threads are interleaved (you may get a different result each time, but you usually won't get 200), but it works if lock acquire/release calls are inserted around the addition statements. The reason for the failure is subtle, but eventually, one thread will fetch the current value of count and be swapped out of memory before incrementing it. When this thread resumes, it will be updating a potentially old value of count, which other threads may have subsequently changed. All the work done since the thread was suspended will be lost.

Locks are not strictly required for all shared object access, especially if a single thread updates an object inspected by other threads. As a rule of thumb, though, you should generally use locks to synchronize threads whenever update rendezvous are possible instead of relying on the current thread implementation. The following version of the prior code works as expected:

 import thread, time count = 0 def adder( ):     global count     lock.acquire( )        # only one thread running this at a time     count = count + 1      # concurrently update a shared global     count = count + 1     lock.release( ) lock = thread.allocate_lock( ) for i in range(100):     thread.start_new(adder, ( ))   # start 100 update threads time.sleep(5) print count                       # prints 200

5.3.4.1. The thread switch interval

Interestingly, the preceding example also works without locks if the thread-switch check interval is made high enough to allow each thread to finish without being swapped out. The sys.setcheckinterval(N) call sets the frequency with which the interpreter checks for things like thread switches and signal handlers.

This interval defaults to 100, the number of bytecode instructions before a switch. It does not need to be reset for most programs, but it can be used to tune thread performance. Setting higher values means switches happen less often: threads incur less overhead but they are less responsive to events. Setting lower values makes threads more responsive to events but increases thread switch overhead.

5.3.4.2. Atomic operations

Note that because of the way Python uses the GIL to synchronize threads' access to the virtual machine, whole statements are not generally thread-safe, but each bytecode instruction is. Aa thread will never be suspended in the middle of a bytecode's operation, and generally won't be during the execution of the C code that the bytecode invokes (though some long-running C code tasks release the GIL and allow the thread to be suspendedin fact, this is likely why print statements' output may be intermixed).

Because of this bytecode indivisibility, some Python language operations are thread-safealso called atomic, because they run without interruptionand do not require the use of locks or queues to avoid concurrent update issues. As of this writing, for instance, the following operations are thread-safe (in this listing L, L1, and L2 are lists; D, D1, and D2 are dictionaries; x and y are objects; and i and j are integers):

 L.append(x) L1.extend(L2) x = L[i] x = L.pop( ) L1[i:j] = L2 L.sort( ) x = y x.field = y D[x] = y D1.update(D2) D.keys( )

The following are not thread-safe. Relying on these rules is a bit of a gamble, though, because they require a deep understanding of Python internals and may vary per release. As a rule of thumb, it may be easier to use locks for all access to global and shared objects than to try to remember which types of access may or may not be safe across multiple threads.

 i = i+1 L.append(L[-1]) L[i] = L[j] D[x] = D[x] + 1

5.3.4.3. C API thread considerations

Finally, if you plan to mix Python with C, also see the thread interfaces described in the Python/C API standard manual. In threaded programs, C extensions must release and reacquire the GIL around long-running operations to let other Python threads run during the wait. Specifically, the long-running C extension function should release the lock on entry and reacquire it on exit when resuming Python code.

Also note that even though Python threads cannot truly overlap in time due to the GIL synchronization, C-coded threads can; any number may be running in parallel, as long as they do work outside the scope of the Python virtual machine. In fact, C threads may overlap both with other C threads and with Python language threads run in the virtual machine. Because of this, splitting code off to C libraries is one way that Python applications can still take advantage of multi-CPU machines.

Still, it will usually be easier to leverage such machines by simply writing Python programs that fork processes instead of starting threads. The complexity of process and thread code is similar. For more on C extensions and their threading requirements, see Chapter 22. There, we'll meet a pair of macros that can be used to wrap long-running operations in C coded extensions and that allow other Python threads to run in parallel.