5.3. Threads
Threads are another way to start activities running at the same time. In short, they run a function call in parallel with the rest of the program. Threads are sometimes called "lightweight processes," because they run in parallel like forked processes, but all of them run within the same single process. While processes are commonly used to start independent programs, threads are commonly used for tasks such as nonblocking input calls and long-running tasks in a GUI. They also provide a natural model for algorithms that can be expressed as independently running tasks. For applications that can benefit from parallel processing, some developers consider threads to offer a number of advantages:
So what's the catch? There are three potential downsides you should be aware of before you start spinning your threads:
Despite what you may think after reading the last few introductory paragraphs, threads are remarkably easy to use in Python. In fact, when a program is started it is already running a thread, usually called the "main thread" of the process. To start new, independent threads of execution within a process, we use either the Python thread module to run a function call in a spawned thread or the Python threading module to manage threads with high-level objects. Both modules also provide tools for synchronizing access to shared objects with locks.
5.3.1. The thread Module
Since the basic thread module is a bit simpler than the more advanced threading module covered later in this section, let's look at some of its interfaces first. This module provides a portable interface to whatever threading system is available in your platform: its interfaces work the same on Windows, Solaris, SGI, and any system with an installed pthreads POSIX threads implementation (including Linux). Python scripts that use the Python thread module work on all of these platforms without changing their source code.
Let's start off by experimenting with a script that demonstrates the main thread interfaces. The script in Example 5-5 spawns threads until you reply with a "q" at the console; it's similar in spirit to (and a bit simpler than) the script in Example 5-1, but it goes parallel with threads, not with forks.
Example 5-5. PP3E\System\Threads\thread1.py
This script really contains only two thread-specific lines: the import of the thread module and the thread creation call. To start a thread, we simply call the thread.start_new function, no matter what platform we're programming on.[*] This call takes a function object and an arguments tuple and starts a new thread to execute a call to the passed function with the passed arguments. It's almost like the built-in apply function and newer function(*args) call syntax (and, like apply, it also accepts an optional keyword arguments dictionary), but in this case, the function call begins running in parallel with the rest of the program.
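The call pattern described above can be sketched as follows. This is only an illustration under stated assumptions: it is written for Python 3, where the thread module is named _thread and the call is spelled start_new_thread, and the names spawn_children, child, and messages are hypothetical. The polling loop at the end is a crude stand-in for the better waiting techniques covered later in this section.

```python
import _thread   # Python 3 name of the thread module
import time

def spawn_children(count):
    """Start count threads and return their messages once all have checked in."""
    messages = []                                # list shared with every child thread

    def child(tid):
        # runs in the spawned thread; its return value would be ignored
        messages.append('Hello from thread %d' % tid)

    for tid in range(1, count + 1):
        # pass a function object plus a tuple of arguments for the call
        _thread.start_new_thread(child, (tid,))

    deadline = time.time() + 5                   # crude wait, capped at five seconds
    while len(messages) < count and time.time() < deadline:
        time.sleep(0.01)
    return sorted(messages)

if __name__ == '__main__':
    for line in spawn_children(4):
        print(line)
```

Note the one-item tuple (tid,): the arguments are always passed as a tuple, even when the threaded function takes a single argument.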
Operationally speaking, the thread.start_new call itself returns immediately with no useful value, and the thread it spawns silently exits when the function being run returns (the return value of the threaded function call is simply ignored). Moreover, if a function run in a thread raises an uncaught exception, a stack trace is printed and the thread exits, but the rest of the program continues.
In practice, though, it's almost trivial to use threads in a Python script. Let's run this program to launch a few threads; we can run it on both Linux and Windows this time, because threads are more portable than process forks:
    C:\...\PP3E\System\Threads>python thread1.py
    Hello from thread 1
    Hello from thread 2
    Hello from thread 3
    Hello from thread 4
    q
Each message here is printed from a new thread, which exits almost as soon as it is started. To really understand the power of threads running in parallel, we have to do something more long-lived in our threads. The good news is that threads are both easy and fun to play with in Python. Let's mutate the fork-count program of the prior section to use threads. The script in Example 5-6 starts 10 copies of its counter running in parallel threads.
Example 5-6. PP3E\System\Threads\thread-count.py
Each parallel copy of the counter function simply counts from zero up to two here. When run on Windows, all 10 threads run at the same time, so their output is intermixed on the standard output stream:
    C:\...\PP3E\System\Threads>python thread-count.py
    ...some lines deleted...
    [5] => 0
    [6] => 0
    [7] => 0
    [8] => 0
    [9] => 0
    [3] => 1
    [4] => 1
    [1] => 0
    [5] => 1
    [6] => 1
    [7] => 1
    [8] => 1
    [9] => 1
    [3] => 2
    [4] => 2
    [1] => 1
    [5] => 2
    [6] => 2
    [7] => 2
    [8] => 2
    [9] => 2
    [1] => 2
    Main thread exiting.
In fact, the output of these threads is mixed arbitrarily, at least on Windows. It may even be in a different order each time you run this script. Because all 10 threads run as independent entities, the exact ordering of their overlap in time depends on nearly random system state at large at the time they are run.
If you care to make this output a bit more coherent, uncomment the time.sleep(1) call in the counter function (that is, remove the # before it) and rerun the script. If you do, each of the 10 threads now pauses for one second before printing its current count value. Because of the pause, all threads check in at the same time with the same count; you'll actually have a one-second delay before each batch of 10 output lines appears:
    C:\...\PP3E\System\Threads>python thread-count.py
    ...some lines deleted...
    [7] => 0
    [6] => 0
    pause...
    [0] => 1
    [1] => 1
    [2] => 1
    [3] => 1
    [5] => 1
    [7] => 1
    [8] => 1
    [9] => 1
    [4] => 1
    [6] => 1
    pause...
    [0] => 2
    [1] => 2
    [2] => 2
    [3] => 2
    [5] => 2
    [9] => 2
    [7] => 2
    [6] => 2
    [8] => 2
    [4] => 2
    Main thread exiting.
Even with the sleep synchronization active, though, there's no telling in what order the threads will print their current count. It's random on purpose. The whole point of starting threads is to get work done independently, in parallel.
Notice that this script sleeps for four seconds at the end.
It turns out that, at least on my Windows and Linux installs, the main thread cannot exit while any spawned threads are running; if it does, all spawned threads are immediately terminated. Without the sleep here, the spawned threads would die almost immediately after they are started. This may seem ad hoc, but it isn't required on all platforms, and programs are usually structured such that the main thread naturally lives as long as the threads it starts. For instance, a user interface may start an FTP download running in a thread, but the download lives a much shorter life than the user interface itself. Later in this section, we'll see different ways to avoid this sleep using global flags, and we will also meet a "join" utility in a different module that lets us wait for spawned threads to finish explicitly.
5.3.1.1. Synchronizing access to global objects
One of the nice things about threads is that they automatically come with a cross-task communications mechanism: shared global memory. For instance, because every thread runs in the same process, if one Python thread changes a global variable, the change can be seen by every other thread in the process, main or child. This serves as a simple way for a program's threads to pass information (exit flags, result objects, event indicators, and so on) back and forth to each other.
The downside to this scheme is that our threads must sometimes be careful to avoid changing global objects at the same time. If two threads change an object at once, it's not impossible that one of the two changes will be lost (or worse, will corrupt the state of the shared object completely). The extent to which this becomes an issue varies per application, and sometimes it isn't an issue at all. But even things that aren't obviously at risk may be at risk. Files and streams, for example, are shared by all threads in a program; if multiple threads write to one stream at the same time, the stream might wind up with interleaved, garbled data.
Here's an example: if you edit Example 5-6, comment out the sleep call in counter, and increase the per-thread count parameter from 3 to 100, you might occasionally see the same strange results on Windows that I did: C:\...\PP3E\System\Threads\>python thread-count.py | more ...more deleted... [5] => 14 [7] => 14 [9] => 14 [3] => 15 [5] => 15 [7] => 15 [9] => 15 [3] => 16 [5] => 16 [7] => 16 [9] => 16 [3] => 17 [5] => 17 [7] => 17 [9] => 17 ...more deleted... Because all 10 threads are trying to write to stdout at the same time, once in a while the output of more than one thread winds up on the same line. Such an oddity in this artificial script isn't exactly going to crash the Mars Lander, but it's indicative of the sorts of clashes in time that can occur when our programs go parallel. To be robust, thread programs need to control access to shared global items like this such that only one thread uses it at once.[*]
Luckily, Python's thread module comes with its own easy-to-use tools for synchronizing access to shared objects among threads. These tools are based on the concept of a lock: to change a shared object, threads acquire a lock, make their changes, and then release the lock for other threads to grab. Lock objects are allocated and processed with simple and portable calls in the thread module and are automatically mapped to thread locking mechanisms on the underlying platform. For instance, in Example 5-7, a lock object created by thread.allocate_lock is acquired and released by each thread around the print statement that writes to the shared standard output stream.
Example 5-7. PP3E\System\Threads\thread-count-mutex.py
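To make the acquire/change/release pattern concrete, here is a minimal sketch in the spirit of this example; it is not the book's listing. It assumes Python 3 (module _thread, print as a function), and the names run_counters, counter, and lines are hypothetical. Each line is also saved in a list so the result can be inspected after the threads finish.

```python
import _thread   # Python 3 name of the thread module
import time

def run_counters(nthreads=5, count=3):
    """Run nthreads counters that serialize their prints with one shared lock."""
    stdout_lock = _thread.allocate_lock()        # one lock shared by all threads
    lines = []                                   # copy of everything printed

    def counter(tid):
        for i in range(count):
            stdout_lock.acquire()                # blocks while another thread holds it
            try:
                line = '[%s] => %s' % (tid, i)
                print(line)                      # only one thread prints at a time
                lines.append(line)
            finally:
                stdout_lock.release()            # always give the lock back

    for tid in range(nthreads):
        _thread.start_new_thread(counter, (tid,))

    deadline = time.time() + 5                   # crude wait for the children
    while len(lines) < nthreads * count and time.time() < deadline:
        time.sleep(0.01)
    return lines

if __name__ == '__main__':
    run_counters()
    print('Main thread exiting.')
```

The try/finally pair is a design choice of this sketch: it releases the lock even if the print fails, so one failing thread cannot block all the others forever. Lock objects also work in a with statement, which does the same thing more compactly.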
Python guarantees that only one thread can acquire a lock at any given time; all other threads that request the lock are blocked until a release call makes it available for acquisition. The net effect of the additional lock calls in this script is that no two threads will ever execute a print statement at the same point in time; the lock ensures mutually exclusive access to the stdout stream. Hence, the output of this script is the same as the original thread-count.py except that standard output text is never munged by overlapping prints.
Incidentally, uncommenting the time.sleep call in this version's counter function makes each output line show up one second apart. Because the sleep occurs while a thread holds the mutex lock, all other threads are blocked while the lock holder sleeps, even though time.sleep itself does not block other threads. One thread grabs the mutex lock, sleeps one second, and prints; another thread grabs, sleeps, and prints, and so on. Given 10 threads counting up to three, the program as a whole takes 30 seconds (10 x 3) to finish, with one line appearing per second. Of course, that assumes that the main thread sleeps at least that long too; to see how to remove this assumption, we need to move on to the next section.
5.3.1.2. Waiting for spawned thread exits
Thread module locks are surprisingly useful. They can form the basis of higher-level synchronization paradigms (e.g., semaphores) and can be used as general thread communication devices.[*] For example, Example 5-8 uses a global list of locks to know when all child threads have finished.
Example 5-8. PP3E\System\Threads\thread-count-wait1.py
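The exit-lock idea can be sketched roughly as follows. This is an illustration rather than the original listing: it assumes Python 3 (module _thread), keeps the locks in a local list instead of a module-level global, and uses hypothetical names (run_and_wait, exit_locks, output).

```python
import _thread   # Python 3 name of the thread module

def run_and_wait(nthreads=5, count=3):
    """Start counter threads and return only after every one has signaled its exit."""
    stdout_lock = _thread.allocate_lock()
    exit_locks = [_thread.allocate_lock() for _ in range(nthreads)]  # one per child
    output = []

    def counter(tid):
        for i in range(count):
            with stdout_lock:                    # acquire on entry, release on exit
                output.append((tid, i))
        exit_locks[tid].acquire()                # signal the main thread: I'm done

    for tid in range(nthreads):
        _thread.start_new_thread(counter, (tid,))

    for lock in exit_locks:
        while not lock.locked():                 # busy-wait until this child checks in
            pass
    return output

if __name__ == '__main__':
    print(len(run_and_wait()), 'lines produced')
    print('Main thread exiting.')
```

Each child acquires its own exit lock only after its last update, so once every lock reports locked, all the work is known to be complete.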
A lock's locked method can be used to check its state. To make this work, the main thread makes one lock per child and tacks them onto a global exitmutexes list (remember, the threaded function shares global scope with the main thread). On exit, each thread acquires its lock on the list, and the main thread simply watches for all locks to be acquired. This is much more accurate than naïvely sleeping while child threads run in hopes that all will have exited after the sleep.
But wait, it gets even simpler: since threads share global memory anyhow, we can achieve the same effect with a simple global list of integers, not locks. In Example 5-9, the module's namespace (scope) is shared by top-level code and the threaded function, as before. exitmutexes refers to the same list object in the main thread and all threads it spawns. Because of that, changes made in a thread are still noticed in the main thread without resorting to extra locks.
Example 5-9. PP3E\System\Threads\thread-count-wait2.py
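A rough sketch of the simpler flag-list variant follows; again, it is an illustration and not the original listing. It assumes Python 3 (module _thread), uses a local list of booleans in place of the global list of integers, and the names run_with_flags, exit_flags, and totals are hypothetical. The short pause in the wait loop is an addition of this sketch.

```python
import _thread   # Python 3 name of the thread module
import time

def run_with_flags(nthreads=5, count=3):
    """Wait for child threads by watching a shared list of plain flags."""
    exit_flags = [False] * nthreads              # one slot per child, shared by all
    totals = [0] * nthreads

    def counter(tid):
        for i in range(count):
            totals[tid] += 1                     # each thread touches only its own slot
        exit_flags[tid] = True                   # signal the main thread: I'm done

    for tid in range(nthreads):
        _thread.start_new_thread(counter, (tid,))

    while False in exit_flags:                   # some child is still running
        time.sleep(0.01)                         # pause between tests to free the CPU
    return totals

if __name__ == '__main__':
    print(run_with_flags())
    print('Main thread exiting.')
```

No lock is needed for the flags here because every slot has exactly one writer (its own thread) and one reader (the main thread).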
The main threads of both of the last two scripts fall into busy-wait loops at the end, which might become significant performance drains in tight applications. If so, simply add a time.sleep call in the wait loops to insert a pause between end tests and to free up the CPU for other tasks. Even threads must be good citizens.
Both of the last two counting thread scripts produce roughly the same output as the original thread-count.py, albeit without stdout corruption and with different random ordering of output lines. The main difference is that the main thread exits immediately after (and no sooner than!) the spawned child threads:
    C:\...\PP3E\System\Threads>python thread-count-wait2.py
    ...more deleted...
    [2] => 98
    [6] => 97
    [0] => 99
    [7] => 97
    [3] => 98
    [8] => 97
    [9] => 97
    [1] => 99
    [4] => 98
    [5] => 98
    [2] => 99
    [6] => 98
    [7] => 98
    [3] => 99
    [8] => 98
    [9] => 98
    [4] => 99
    [5] => 99
    [6] => 99
    [7] => 99
    [8] => 99
    [9] => 99
    Main thread exiting.
Of course, threads are for much more than counting. We'll put shared global data to more practical use in a later chapter, where it will serve as completion signals from child processing threads transferring data over a network to a main thread controlling a Tkinter GUI user interface display (see Chapter 14). Shared global data among threads also turns out to be the basis of queues, which are discussed later in this section; each thread gets or puts data using the same queue object.
5.3.2. The threading Module
The Python standard library comes with two thread modules: thread, the basic lower-level interface illustrated thus far, and threading, a higher-level interface based on objects. The threading module internally uses the thread module to implement objects that represent threads and common synchronization tools.
It is loosely based on a subset of the Java language's threading model, but it differs in ways that only Java programmers would notice.[*] Example 5-10 morphs our counting threads example one last time to demonstrate this new module's interfaces.
Example 5-10. PP3E\System\Threads\thread-classes.py
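For orientation, here is a minimal sketch of the class-based style this example demonstrates; it is not the book's listing. It assumes Python 3, and the names CounterThread and run_threads are hypothetical. The Thread, Lock, start, run, and join names are the threading module's own.

```python
import threading

class CounterThread(threading.Thread):
    """A thread whose action is defined by its run method."""

    def __init__(self, tid, count, lock, output):
        threading.Thread.__init__(self)          # let the framework set itself up
        self.tid, self.count = tid, count        # per-thread state
        self.lock, self.output = lock, output    # objects shared by all threads

    def run(self):                               # called in the new thread by start()
        for i in range(self.count):
            with self.lock:                      # serialize access to the shared list
                self.output.append('[%s] => %s' % (self.tid, i))

def run_threads(nthreads=5, count=3):
    lock = threading.Lock()
    output = []
    threads = [CounterThread(tid, count, lock, output) for tid in range(nthreads)]
    for thread in threads:
        thread.start()                           # returns at once; run() goes parallel
    for thread in threads:
        thread.join()                            # wait for each thread to exit
    return output

if __name__ == '__main__':
    for line in run_threads():
        print(line)
    print('Main thread exiting.')
```

Because join blocks until its thread has finished, the main thread needs neither a sleep nor a polling loop before it exits.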
The output of this script is the same as that shown for its ancestors earlier (again, randomly distributed). Using the threading module is largely a matter of specializing classes. Threads in this module are implemented with a Thread object, a Python class which we customize per application by providing a run method that defines the thread's action. For example, this script subclasses Thread with its own mythread class; the run method will be executed by the Thread framework in a new thread when we make a mythread and call its start method.
In other words, this script simply provides methods expected by the Thread framework. The advantage of taking this more coding-intensive route is that we get a set of additional thread-related tools from the framework "for free." The Thread.join method used near the end of this script, for instance, waits until the thread exits (by default); we can use this method to prevent the main thread from exiting too early rather than using the time.sleep calls and global locks and variables we relied on in earlier threading examples.
The example script also uses threading.Lock to synchronize stream access (though this name is just a synonym for thread.allocate_lock in the current implementation). The Thread class can also be used to start a simple function without subclassing, though this call form is not noticeably simpler than the basic thread module.
For example, the following four code snippets spawn the same sort of thread:
    # subclass with state
    class mythread(threading.Thread):
        def __init__(self, i):
            self.i = i
            threading.Thread.__init__(self)
        def run(self):
            consumer(self.i)
    mythread(i).start()

    # pass action in
    thread = threading.Thread(target=(lambda: consumer(i)))
    thread.start()

    # same but no lambda wrapper for state
    threading.Thread(target=consumer, args=(i,)).start()

    # basic thread module
    thread.start_new_thread(consumer, (i,))
Besides Thread and Lock, the threading module also includes higher-level objects for synchronizing access to shared items (e.g., Semaphore, Condition, Event), many more than we have space to cover here; see the library manual for details.[*]
For more examples of threads and forks in general, see the following sections of this chapter as well as the examples in the GUI and network scripting parts of this book. We will thread GUIs, for instance, to avoid blocking them, and we will thread and fork network servers to avoid denying service to clients.
5.3.3. The Queue Module
You can synchronize your threads' access to shared resources with locks, but you usually don't have to. As mentioned in our introduction to threads, realistically scaled, threaded programs are often structured as a set of producer and consumer threads, which communicate by placing data on, and taking it off of, a shared queue.
The Python Queue module implements this storage device. It provides a standard queue data structure (a FIFO, a first-in first-out list, in which items are added on one end and removed from the other), which may contain any type of Python object. However, the queue object is automatically controlled with thread lock acquire and release calls, such that only one thread can modify the queue at any given point in time. Because of this, programs that use a queue for their cross-thread communication will be thread-safe and can usually avoid dealing with locks of their own.
Like the other tools in Python's threading arsenal, queues are surprisingly simple to use. The script in Example 5-11, for instance, spawns two consumer threads that watch for data to appear on the shared queue and four producer threads that place data on the queue periodically after a sleep interval (each of their sleep durations differs to simulate a real, long-running task). Because the queue is assigned to a global variable, it is shared by all of the spawned threads; all of them run in the same process and in the same global scope.
Example 5-11. PP3E\System\Threads\queuetest.py
Following is the output of Example 5-11 when run on my Windows XP machine. Notice that even though the queue automatically coordinates the communication of data between the threads, this script still must use a lock to manually synchronize access to the standard output stream. As in prior examples, if the safeprint lock is not used, the printed lines from one consumer may be intermixed with those of another. It is not impossible that a consumer may be paused in the middle of a print operation (in fact, this occurs regularly on my test machine in some test scenarios; try it on yours to see for yourself).
    C:\...\PP3E\System\Threads >queuetest.py
    consumer 0 got => producer 0:0
    consumer 1 got => producer 0:1
    consumer 0 got => producer 0:2
    consumer 1 got => producer 0:3
    consumer 0 got => producer 1:0
    consumer 1 got => producer 1:1
    consumer 0 got => producer 2:0
    consumer 1 got => producer 1:2
    consumer 0 got => producer 3:0
    consumer 0 got => producer 1:3
    consumer 1 got => producer 2:1
    consumer 1 got => producer 2:2
    consumer 0 got => producer 3:1
    consumer 1 got => producer 2:3
    consumer 0 got => producer 3:2
    consumer 1 got => producer 3:3
Try adjusting the parameters at the top of this script to experiment with different scenarios. A single consumer, for instance, would simulate a GUI's main thread; the output of a single-consumer run is given here. Producers still add to the queue in fairly random fashion, because threads run in parallel with each other and with the consumer.
    C:\...\PP3E\System\Threads >queuetest.py
    consumer 0 got => producer 0:0
    consumer 0 got => producer 0:1
    consumer 0 got => producer 0:2
    consumer 0 got => producer 0:3
    consumer 0 got => producer 1:0
    consumer 0 got => producer 2:0
    consumer 0 got => producer 1:1
    consumer 0 got => producer 1:2
    consumer 0 got => producer 3:0
    consumer 0 got => producer 2:1
    consumer 0 got => producer 1:3
    consumer 0 got => producer 2:2
    consumer 0 got => producer 3:1
    consumer 0 got => producer 2:3
    consumer 0 got => producer 3:2
    consumer 0 got => producer 3:3
Queues may be fixed or infinite in size, and get and put calls may or may not block; see the Python library manual for more details on queue interface options.
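The producer/consumer structure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the queuetest.py listing: it is written for Python 3, where the Queue module is named queue, it uses the threading module to start and join threads, and the name run_pipeline is hypothetical. The None sentinel used to stop the consumers is a shutdown convention chosen for this sketch.

```python
import queue       # Python 3 name of the Queue module
import threading

def run_pipeline(nproducers=4, nconsumers=2, nmessages=4):
    """Move messages from producer threads to consumer threads over one queue."""
    data_queue = queue.Queue()                   # unbounded FIFO shared by all threads
    received = []
    received_lock = threading.Lock()             # the result list still needs a lock

    def producer(pid):
        for n in range(nmessages):
            data_queue.put('producer %d:%d' % (pid, n))

    def consumer(cid):
        while True:
            item = data_queue.get()              # blocks until an item is available
            if item is None:                     # sentinel: no more work is coming
                break
            with received_lock:
                received.append((cid, item))

    consumers = [threading.Thread(target=consumer, args=(cid,))
                 for cid in range(nconsumers)]
    producers = [threading.Thread(target=producer, args=(pid,))
                 for pid in range(nproducers)]
    for thread in consumers + producers:
        thread.start()
    for thread in producers:
        thread.join()                            # all real items are now on the queue
    for _ in consumers:
        data_queue.put(None)                     # one sentinel per consumer
    for thread in consumers:
        thread.join()
    return received

if __name__ == '__main__':
    for cid, item in run_pipeline():
        print('consumer', cid, 'got =>', item)
```

The queue handles its own locking, so put and get need no extra synchronization; only the shared result list is guarded by a lock of its own, much as the script above guards the output stream.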
5.3.4. The Global Interpreter Lock and Threads
Strictly speaking, Python currently uses a global interpreter lock (GIL) mechanism, which guarantees that at most one thread is running code within the Python interpreter at any given point in time. We introduced the GIL at the start of the "Threads" section. In addition, to make sure that each thread gets a chance to run, the interpreter automatically switches its attention between threads at regular intervals (by releasing and acquiring the lock after a number of bytecode instructions) as well as at the start of long-running operations (e.g., on file input/output requests).
This scheme avoids problems that could arise if multiple threads were to update Python system data at the same time. For instance, if two threads were allowed to simultaneously change an object's reference count, the result may be unpredictable. This scheme can also have subtle consequences. In this chapter's threading examples, for instance, the stdout stream is likely corrupted only because each thread's call to write text is a long-running operation that triggers a thread switch within the interpreter. Other threads are then allowed to run and make write requests while a prior write is in progress.
Moreover, even though the GIL prevents more than one Python thread from running at the same time, it is not enough to ensure thread safety in general, and it does not address higher-level synchronization issues at all. For example, in the case that more than one thread might attempt to update the same variable at the same time, the threads should generally be given exclusive access to the object with locks. Otherwise, it's not impossible that thread switches will occur in the middle of an update statement's bytecode.
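To see why a single update statement can be interrupted partway through, it helps to look at the bytecode it compiles to. The following sketch uses the standard dis module under Python 3; the helper name opnames is hypothetical, and the exact instruction names and counts vary from one Python release to another.

```python
import dis

count = 0

def adder():
    global count
    count = count + 1        # one statement, but several bytecode instructions

def opnames(func):
    """Return the names of the bytecode instructions that func compiles to."""
    return [instruction.opname for instruction in dis.get_instructions(func)]

if __name__ == '__main__':
    dis.dis(adder)           # prints the full disassembly listing
    print(opnames(adder))
```

The listing shows a separate instruction that loads the global count, some time before the one that stores the new value back; a thread switch between the two is what leaves a thread holding a stale value.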
Consider this code:
    import thread, time
    count = 0

    def adder():
        global count
        count = count + 1          # concurrently update a shared global
        count = count + 1          # thread swapped out in the middle of this

    for i in range(100):
        thread.start_new(adder, ())    # start 100 update threads

    time.sleep(5)
    print count
As is, this code fails on Windows due to the way its threads are interleaved (you may get a different result each time, but you usually won't get 200), but it works if lock acquire/release calls are inserted around the addition statements. The reason for the failure is subtle, but eventually, one thread will fetch the current value of count and be swapped out of memory before incrementing it. When this thread resumes, it will be updating a potentially old value of count, which other threads may have subsequently changed. All the work done since the thread was suspended will be lost.
Locks are not strictly required for all shared object access, especially if a single thread updates an object inspected by other threads. As a rule of thumb, though, you should generally use locks to synchronize threads whenever update rendezvous are possible instead of relying on the current thread implementation. The following version of the prior code works as expected:
    import thread, time
    count = 0

    def adder():
        global count
        lock.acquire()             # only one thread running this at a time
        count = count + 1          # concurrently update a shared global
        count = count + 1
        lock.release()

    lock = thread.allocate_lock()
    for i in range(100):
        thread.start_new(adder, ())    # start 100 update threads

    time.sleep(5)
    print count                    # prints 200
5.3.4.1. The thread switch interval
Interestingly, the preceding example also works without locks if the thread-switch check interval is made high enough to allow each thread to finish without being swapped out. The sys.setcheckinterval(N) call sets the frequency with which the interpreter checks for things like thread switches and signal handlers.
This interval defaults to 100, the number of bytecode instructions before a switch. It does not need to be reset for most programs, but it can be used to tune thread performance. Setting higher values means switches happen less often: threads incur less overhead but they are less responsive to events. Setting lower values makes threads more responsive to events but increases thread switch overhead.
5.3.4.2. Atomic operations
Note that because of the way Python uses the GIL to synchronize threads' access to the virtual machine, whole statements are not generally thread-safe, but each bytecode instruction is. A thread will never be suspended in the middle of a bytecode's operation, and generally won't be during the execution of the C code that the bytecode invokes (though some long-running C code tasks release the GIL and allow the thread to be suspended; in fact, this is likely why print statements' output may be intermixed).
Because of this bytecode indivisibility, some Python language operations are thread-safe (also called atomic, because they run without interruption) and do not require the use of locks or queues to avoid concurrent update issues. As of this writing, for instance, the following operations are thread-safe (in this listing L, L1, and L2 are lists; D, D1, and D2 are dictionaries; x and y are objects; and i and j are integers):
    L.append(x)
    L1.extend(L2)
    x = L[i]
    x = L.pop()
    L1[i:j] = L2
    L.sort()
    x = y
    x.field = y
    D[x] = y
    D1.update(D2)
    D.keys()
The following are not thread-safe:
    i = i+1
    L.append(L[-1])
    L[i] = L[j]
    D[x] = D[x] + 1
Relying on these rules is a bit of a gamble, though, because they require a deep understanding of Python internals and may vary per release. As a rule of thumb, it may be easier to use locks for all access to global and shared objects than to try to remember which types of access may or may not be safe across multiple threads.
5.3.4.3. C API thread considerations
Finally, if you plan to mix Python with C, also see the thread interfaces described in the Python/C API standard manual. In threaded programs, C extensions must release and reacquire the GIL around long-running operations to let other Python threads run during the wait. Specifically, the long-running C extension function should release the lock on entry and reacquire it on exit when resuming Python code.
Also note that even though Python threads cannot truly overlap in time due to the GIL synchronization, C-coded threads can; any number may be running in parallel, as long as they do work outside the scope of the Python virtual machine. In fact, C threads may overlap both with other C threads and with Python language threads run in the virtual machine. Because of this, splitting code off to C libraries is one way that Python applications can still take advantage of multi-CPU machines.
Still, it will usually be easier to leverage such machines by simply writing Python programs that fork processes instead of starting threads. The complexity of process and thread code is similar. For more on C extensions and their threading requirements, see Chapter 22. There, we'll meet a pair of macros that can be used to wrap long-running operations in C coded extensions and that allow other Python threads to run in parallel.