18.3. Python, Threads, and the Global Interpreter Lock

18.3.1. Global Interpreter Lock (GIL)

Execution of Python code is controlled by the Python Virtual Machine (aka the interpreter main loop). Python was designed in such a way that only one thread of control may be executing in this main loop, similar to how multiple processes in a system share a single CPU. Many programs may be in memory, but only one is live on the CPU at any given moment. Likewise, although multiple threads may be "running" within the Python interpreter, only one thread is being executed by the interpreter at any given time.

Access to the Python Virtual Machine is controlled by the global interpreter lock (GIL). This lock is what ensures that exactly one thread is running. The Python Virtual Machine executes in the following manner in an MT environment:
When a call is made to external code, i.e., any C/C++ extension built-in function, the GIL will be locked until it has completed (since there are no Python bytecodes to count as the interval). Extension programmers do have the ability to unlock the GIL, however, so you, as the Python developer, shouldn't have to worry about your Python code locking up in those situations.

As an example, for any Python I/O-oriented routines (which invoke built-in operating system C code), the GIL is released before the I/O call is made, allowing other threads to run while the I/O is being performed. Code that doesn't have much I/O will tend to keep the processor (and GIL) for the full interval a thread is allowed before it yields. In other words, I/O-bound Python programs stand a much better chance of being able to take advantage of a multithreaded environment than CPU-bound code.

Those of you interested in the source code, the interpreter main loop, and the GIL can take a look at the Python/ceval.c file.

18.3.2. Exiting Threads

When a thread completes execution of the function it was created for, it exits. Threads may also quit by calling an exit function such as thread.exit(), or any of the standard ways of exiting a Python process, i.e., sys.exit() or raising the SystemExit exception. You cannot, however, go and "kill" a thread.

We will discuss in detail the two Python modules related to threads in the next section, but of the two, the thread module is the one we do not recommend. There are many reasons for this, but an obvious one is that when the main thread exits, all other threads die without cleanup. The other module, threading, ensures that the whole process stays alive until all "important" child threads have exited. (We will clarify what "important" means soon. Look for the daemon threads Core Tip sidebar.)
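The exit behavior described above can be sketched with the threading module. This is a minimal illustration under stated assumptions, not code from this chapter: the names early_exit, worker, and run_demo are hypothetical, and the 0.1-second delay is arbitrary. It shows that sys.exit() called inside a thread ends only that thread, and that the main thread can wait for its children before finishing.

```python
import sys
import threading
import time


def early_exit(log):
    """Hypothetical thread body that quits partway through."""
    log.append("early_exit: started")
    sys.exit()  # raises SystemExit, which ends this thread only
    log.append("early_exit: unreachable")  # never executed


def worker(log, delay):
    """Hypothetical thread body that runs to completion."""
    time.sleep(delay)
    log.append("worker: done")


def run_demo(delay=0.1):
    """Start both threads, wait for them, and return what happened."""
    log = []
    threads = [
        threading.Thread(target=early_exit, args=(log,)),
        threading.Thread(target=worker, args=(log, delay)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main thread waits for each child to exit
    log.append("main: all threads finished")
    return log
```

Calling run_demo() returns a log in which the "unreachable" entry never appears, while the worker's entry and the main thread's final entry do.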
Main threads should always be good managers, though, and perform the task of knowing what needs to be executed by individual threads, what data or arguments each of the spawned threads requires, when they complete execution, and what results they provide. In so doing, those main threads can collate the individual results into a final, meaningful conclusion.

18.3.3. Accessing Threads from Python

Python supports multithreaded programming, depending on the operating system that it is running on. It is supported on most Unix-based platforms, e.g., Linux, Solaris, MacOS X, *BSD, as well as Win32 systems. Python uses POSIX-compliant threads, or "pthreads," as they are commonly known.

By default, threads are enabled when building Python from source (since Python 2.0) or the Win32 installed binary. To tell whether threads are available for your interpreter, simply attempt to import the thread module from the interactive interpreter. No errors occur when threads are available:

    >>> import thread
    >>>

If your Python interpreter was not compiled with threads enabled, the module import fails:

    >>> import thread
    Traceback (innermost last):
      File "<stdin>", line 1, in ?
    ImportError: No module named thread

In such cases, you may have to recompile your Python interpreter to get access to threads. This usually involves invoking the configure script with the "--with-thread" option. Check the README file for your distribution to obtain specific instructions on how to compile Python with threads for your system.

18.3.4. Life Without Threads

For our first set of examples, we are going to use the time.sleep() function to show how threads work. time.sleep() takes a floating point argument and "sleeps" for the given number of seconds, meaning that execution is temporarily halted for the amount of time specified.

Let us create two "time loops," one that sleeps for 4 seconds and one that sleeps for 2 seconds, loop0() and loop1(), respectively.
(We use the names "loop0" and "loop1" as a hint that we will eventually have a sequence of loops.) If we were to execute loop0() and loop1() sequentially in a one-process or single-threaded program, as onethr.py does in Example 18.1, the total execution time would be at least 6 seconds. There may or may not be a 1-second gap between the starting of loop0() and loop1(), and other execution overhead may bump the overall time to 7 seconds.

Example 18.1. Loops Executed by a Single Thread (onethr.py)
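A minimal sketch of a script matching this description is given here, assuming Python 3. It is a reconstruction from the prose and the sample output in this section, not necessarily the original listing. The delay parameters are an addition for illustration so that the loops can be exercised with shorter sleeps.

```python
from time import ctime, sleep


def loop0(delay=4):
    """First time loop: sleeps for `delay` seconds (4 by default)."""
    print('start loop 0 at:', ctime())
    sleep(delay)
    print('loop 0 done at:', ctime())


def loop1(delay=2):
    """Second time loop: sleeps for `delay` seconds (2 by default)."""
    print('start loop 1 at:', ctime())
    sleep(delay)
    print('loop 1 done at:', ctime())


def main(delay0=4, delay1=2):
    """Run both loops one after the other in a single thread."""
    print('starting at:', ctime())
    loop0(delay0)  # loop1 cannot start until loop0 returns
    loop1(delay1)
    print('all DONE at:', ctime())
```

Because loop1() starts only after loop0() returns, calling main() with the default delays takes at least 4 + 2 = 6 seconds.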
We can verify this by executing onethr.py, which gives the following output:

    $ onethr.py
    starting at: Sun Aug 13 05:03:34 2006
    start loop 0 at: Sun Aug 13 05:03:34 2006
    loop 0 done at: Sun Aug 13 05:03:38 2006
    start loop 1 at: Sun Aug 13 05:03:38 2006
    loop 1 done at: Sun Aug 13 05:03:40 2006
    all DONE at: Sun Aug 13 05:03:40 2006

Now, pretend that rather than sleeping, loop0() and loop1() were separate functions that performed individual and independent computations, all working to arrive at a common solution. Wouldn't it be useful to have them run in parallel to cut down on the overall running time? That is the premise behind MT that we now introduce to you.

18.3.5. Python Threading Modules

Python provides several modules to support MT programming, including the thread, threading, and Queue modules. The thread and threading modules allow the programmer to create and manage threads. The thread module provides basic thread and locking support, while threading provides higher-level, fully featured thread management. The Queue module allows the user to create a queue data structure that can be shared across multiple threads. We will take a look at these modules individually and present examples and intermediate-sized applications.

Core Tip: Avoid use of thread module