Threads


Multithreaded programs also suffer from reentrancy problems in much the same way as signal handlers and processes dealing with global resources, but to a larger extent. Code in a multithreaded application can be interrupted at any point, so it needs to be coded carefully to avoid race and deadlock conditions. Bugs in software related to thread races are often subtle and hard to debug because the program seems to work fine most of the time, but one out of every hundred tries or so, it behaves differently. Often these bugs can turn out to be security problems because the race condition might result in memory corruption or other equally undesirable program behavior. In multithreaded environments, you might question how much of a security problem synchronization issues are. After all, with signals, attackers can try to send well-timed signals specifically to trigger a bug, but what about threads? The truth is that attackers may or may not be able to influence the program enough to trigger a threading error; it depends on what the program does. Usually, however, it's safe to assume attackers can trigger it or give the program such a heavy workload that it's likely to be triggered. After the error occurs, they can probably cause enough damage to bring the program down or have it violate security policies in some way.

OS Thread APIs contain functionality for developers to create programs that can safely execute concurrent threads of execution in the same address space. Both Windows and UNIX provide robust threading APIs with similar semantics and potential for multithreaded programming issues. As such, both APIs are covered in examples throughout this section. Before you examine the examples, the following sections introduce you to these APIs.

Note

There are multiple threading interfaces for UNIX environments, the primary one being PThreads (POSIX threads), which is what's used in this section.


PThreads API

The PThreads API enables developers to design thread-safe code that avoids race conditions by defining two data types that can be used as synchronization objects: mutexes and condition variables.

Mutexes in PThreads

A mutex in PThreads is similar in principle to the mutexes in Windows, except it isn't globally visible. It's used to ensure that a shared resource is being operated on by only one thread at a time.

Note

Actually, a PThreads mutex is more like a critical section provided by Windows (covered in "Windows API" later in this chapter).


The PThreads API provides a mutex data type (pthread_mutex_t) for controlling access to code that isn't allowed to be interrupted by other threads, commonly referred to as "critical sections." The pthread_mutex_t type is manipulated with the functions described in the following paragraphs.

The pthread_mutex_init() function initializes a mutex data type:

int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr)


The attr parameter specifies attributes that can modify the mutex's behavior. These attributes aren't covered in this chapter because they aren't relevant to the issues discussed. This function must be called before a mutex is used.

Note

Instead of calling the pthread_mutex_init() function, a developer can just initialize the mutex with default values manually, typically with the constant PTHREAD_MUTEX_INITIALIZER. A variation of PThreads for Linux, called LinuxThreads, has two other initializers: PTHREAD_RECURSIVE_MUTEX_INITIALIZER_NP and PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP, which initialize the mutex with different attributes.


The following function is used to lock the mutex:

int pthread_mutex_lock(pthread_mutex_t *mutex)


If the mutex is already locked, the thread calling this function goes to sleep until the lock is released.

The pthread_mutex_trylock() function is identical to pthread_mutex_lock(), except it returns immediately to the caller with an error if the mutex is already locked:

int pthread_mutex_trylock(pthread_mutex_t *mutex)


The following function unlocks a mutex that was locked with pthread_mutex_lock() or pthread_mutex_trylock():

int pthread_mutex_unlock(pthread_mutex_t *mutex)


The following function destroys a mutex; it's called after the program no longer needs the mutex:

int pthread_mutex_destroy(pthread_mutex_t *mutex)
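
To tie these functions together, here's a minimal sketch of how a mutex typically protects a shared resource; the shared counter and the worker function are illustrative assumptions, not part of the API:

#include <pthread.h>

static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_count;    /* hypothetical shared resource */

void *worker(void *arg)
{
    /* Acquire the lock before touching the shared resource; this
       call blocks if another thread currently holds the lock. */
    if(pthread_mutex_lock(&count_lock) != 0)
        return NULL;

    shared_count++;    /* the critical section */

    /* Release the lock on every exit path so other threads
       aren't blocked indefinitely. */
    pthread_mutex_unlock(&count_lock);

    return NULL;
}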


Condition Variables

PThreads provides another synchronization object, the condition variable (pthread_cond_t), which is used to indicate to waiting threads that a certain condition has been met. In this respect, condition variables are similar to a localized version of Windows event objects (localized because condition variables aren't globally accessible). The functions for manipulating a condition variable are described in the following paragraphs.

The pthread_cond_init() function is used for initializing a condition variable before use:

int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr)


The attr parameter supplies optional parameters that can modify the condition variable's behavior. They aren't relevant to this discussion, so for more information, consult the PThreads documentation.

Note

As with pthread_mutex_init(), a developer can choose to initialize a condition variable with default attributes instead of calling this function, typically by using the PTHREAD_COND_INITIALIZER constant.


The following function is used to wake up a thread waiting on a condition variable:

int pthread_cond_signal(pthread_cond_t *cond)


If multiple threads are waiting on the condition, only one of them is awakened, which is similar to how auto-reset events function in Windows.

The pthread_cond_broadcast() function acts like pthread_cond_signal(), except it wakes up all threads waiting on a condition variable, not just one:

int pthread_cond_broadcast(pthread_cond_t *cond)


This behavior is similar to how manual-reset events function in Windows.

The pthread_cond_wait() function is used to wait on a condition variable:

int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex)


The mutex specified by the second argument is atomically unlocked while the thread blocks waiting on the condition variable. After the condition variable is signaled, this function relocks the mutex before returning.

The following function is basically the same as pthread_cond_wait(), except it gives up waiting when the absolute time indicated by the abstime parameter is reached:


int pthread_cond_timedwait(pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime)


The pthread_cond_destroy() function simply destroys the specified condition variable:

int pthread_cond_destroy(pthread_cond_t *cond)
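
The following minimal sketch shows the typical way these functions are combined; the data_ready flag and the thread functions are illustrative assumptions. Note that the waiting thread retests its condition in a loop, because pthread_cond_wait() can return even when the condition hasn't been met (a spurious wakeup) or when another thread has already consumed the data:

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static int data_ready;    /* hypothetical shared condition */

void *consumer(void *arg)
{
    pthread_mutex_lock(&lock);

    /* Retest the condition in a loop; waking up doesn't
       guarantee the condition still holds. */
    while(!data_ready)
        pthread_cond_wait(&cond, &lock);

    data_ready = 0;    /* consume the data while holding the lock */

    pthread_mutex_unlock(&lock);
    return NULL;
}

void *producer(void *arg)
{
    pthread_mutex_lock(&lock);
    data_ready = 1;
    pthread_cond_signal(&cond);    /* wake one waiting consumer */
    pthread_mutex_unlock(&lock);
    return NULL;
}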


Windows API

The Windows API for thread synchronization is a little more complicated than PThreads. The Windows API provides a broad range of synchronization objects that a multithreaded process can use to ensure that shared resources are accessed safely. You've already seen most of these objects in the "Windows IPC Synchronization Objects" section earlier in this chapter. However, there are a few thread-specific synchronization primitives, the most important of which is the critical section, discussed here.

Note

Even though the IPC objects were introduced as interprocess synchronization objects, they can be used to synchronize threads, so the previous material on using those objects also applies to a single multithreaded process.


Critical Sections

A critical section (declared in code as the CRITICAL_SECTION data type) can be used to provide mutually exclusive access to a shared resource by acting as a locking mechanism in the same way a mutex object does. Like a mutex, a critical section has a binary state, locked or unlocked, and can be locked by only one thread at a time. The key difference between a mutex object and a critical section is that a critical section can be accessed only by threads of a single process; it is never globally visible or accessible. This is because a critical section isn't a true Windows object; it's simply a data structure that creates a Windows synchronization primitive if necessary. Being a local data structure makes it faster than a mutex and explains why it can be used only between threads in the same process. Therefore, critical sections don't use the wait functions discussed earlier. Instead, the functions described in the following paragraphs are used for manipulating a critical section.

The following function populates the CRITICAL_SECTION data structure; it must be called before any use of the CRITICAL_SECTION:

void InitializeCriticalSection(LPCRITICAL_SECTION lpCriticalSection)


The following function initializes a CRITICAL_SECTION as well as setting the spin count:

BOOL InitializeCriticalSectionAndSpinCount(LPCRITICAL_SECTION lpCriticalSection, DWORD dwSpinCount)


The spin count affects performance but not synchronization, so it's irrelevant to this discussion.

The following function acquires the lock for a CRITICAL_SECTION data structure:

void EnterCriticalSection(LPCRITICAL_SECTION lpCriticalSection)


If the lock is owned by another thread, calling this function causes this thread to block until the lock is available. A thread that already owns the critical section doesn't block if it calls this function again, because critical sections are recursive. However, every call to EnterCriticalSection() must be paired with a call to LeaveCriticalSection(); otherwise, the critical section remains locked and deadlock can occur. This function is equivalent to the pthread_mutex_lock() function from the PThreads API.

The following function attempts to obtain the lock for the specified CRITICAL_SECTION data structure:

BOOL TryEnterCriticalSection(LPCRITICAL_SECTION lpCriticalSection)


If it's unlocked, this function locks it and returns TRUE; otherwise, it returns FALSE immediately. Calling this function doesn't cause the calling thread to block, as EnterCriticalSection() does. Like EnterCriticalSection(), every successful acquisition of a critical section must have a corresponding call to LeaveCriticalSection(); otherwise, deadlock can occur. This function is similar to the pthread_mutex_trylock() function in the PThreads API.

The LeaveCriticalSection() function unlocks the given CRITICAL_SECTION data structure:

void LeaveCriticalSection(LPCRITICAL_SECTION lpCriticalSection)


Any other threads waiting on the critical section are awakened so that one of them can take ownership of it.

The following function deletes a critical section and releases any associated memory and kernel objects:

void DeleteCriticalSection(LPCRITICAL_SECTION lpCriticalSection)
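
Here's a minimal sketch of how these functions are typically used together; the queue variable and the setup and worker function names are illustrative assumptions:

#include <windows.h>

static CRITICAL_SECTION queueLock;    /* shared by all threads */
static struct element *queue;         /* hypothetical shared resource */

/* Called once at startup, before any worker thread runs. */
void setup(void)
{
    InitializeCriticalSection(&queueLock);
}

DWORD WINAPI worker(LPVOID arg)
{
    EnterCriticalSection(&queueLock);

    /* ... operate on queue; only one thread executes here at a time ... */

    /* Every EnterCriticalSection() is paired with a
       LeaveCriticalSection() on the same path. */
    LeaveCriticalSection(&queueLock);

    return 0;
}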


Threading Vulnerabilities

Now that you're familiar with the threading models available in UNIX and Windows, you can begin to look at practical examples of the synchronization problems discussed at the beginning of this chapter. Basically, threading issues are caused by incorrect use of synchronization objects. With race conditions, it's usually because some code that operates on a shared resource isn't correctly synchronized. For deadlock and starvation issues, it's usually because locking mechanisms are used improperly.

Note that you can approach auditing threading vulnerabilities in a similar fashion to auditing IPC synchronization objects. That is, you can construct a scoreboard noting the use of the locking mechanisms and keep notes of potentially dangerous situations.

Race Conditions

As stated previously, a race condition occurs when the successful outcome of an operation depends on whether the threads are scheduled for running in a certain order. Neglecting to use mutexes or semaphores in appropriate places causes race conditions because you can't guarantee a thread won't be interrupted in the middle of modifying or accessing a shared resource.

Auditing code to find potential vulnerabilities of this nature is a three-step process:

  1. Identify shared resources that are acted on by multiple threads.

  2. Determine whether the appropriate locking mechanism has been selected.

  3. Examine the code that modifies this resource to see whether appropriate locking mechanisms have been neglected or misused.

Although this process sounds straightforward, it's often trickier than it seems because of the complexity of multithreaded programming. For this reason, the following sections explain in more detail how to perform each step in a systematic fashion.

Identify Shared Resources

This step is probably the easiest. Thread synchronization objects are used for one primary reason: to ensure that threads access shared resources atomically. To identify the shared resources being operated on, you simply need to read the code and note accesses to global variables and any objects that aren't local to the thread or process, such as a HANDLE to a global object. Usually, these accesses stand out because the point of worker threads is to operate on a resource. For example, a multithreaded server process might consist of one thread accepting connections from remote nodes and adding received requests to a queue. Then another set of threads takes objects from that queue and processes them on behalf of the client. In this case, the shared resource is obviously the queue that requests are added to and taken from.

Ensure That Appropriate Locking Mechanisms Are Used

There's no point in using a synchronization object if it's not appropriate for the shared resource that needs to be protected. Therefore, you must evaluate the developers' choice of synchronization primitive so that you can determine whether it meets the intended requirements. Here are some common reasons for providing synchronization for a resource:

  • A resource can be operated on by only one thread at a time, no matter what it's doing. Generally, a mutex or critical section is necessary.

  • A resource can be read from by multiple threads. In this case, a semaphore might be most appropriate.

  • A queue resource has multiple threads adding to it and removing elements from it. In this case, a mutex or critical section seems most appropriate because every thread is actually writing to the queue by unlinking elements from it or linking elements to it.

Obviously, these three reasons are simple guidelines and aren't true for all situations. For instance, this list doesn't consider the need for signaling consumer threads that data is available. Because these requirements can vary so much, you need to be careful to evaluate the locking mechanisms developers select. This evaluation requires understanding the purpose the locking mechanism is supposed to serve and attempting to locate situations in which the mechanism might not behave as intended.
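
As a point of comparison for the multiple-reader case, PThreads also provides read-write locks, which this chapter doesn't otherwise cover. The following sketch is illustrative only; the table variable and function names are assumptions:

#include <pthread.h>

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int config_table[64];    /* hypothetical shared data */

int read_entry(int i)
{
    int val;

    /* Shared lock: any number of readers can hold it at once. */
    pthread_rwlock_rdlock(&table_lock);
    val = config_table[i];
    pthread_rwlock_unlock(&table_lock);

    return val;
}

void write_entry(int i, int val)
{
    /* Exclusive lock: a writer blocks until all readers are done. */
    pthread_rwlock_wrlock(&table_lock);
    config_table[i] = val;
    pthread_rwlock_unlock(&table_lock);
}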

Examine Accesses to the Object

The whole point of locking mechanisms is to allow an object to be modified in an atomic fashion. A race condition can occur when locking mechanisms aren't used correctly when accessing shared resources, or aren't used at all. The most obvious race conditions happen when no locking objects are used, as shown in the following code:

struct element *queue;
int fd;

void *job_task(void *arg)
{
    struct element *elem;
    struct timespec ts;

    ts.tv_sec = 1;
    ts.tv_nsec = 0;

    for(;;)
    {
        if(queue == NULL)
        {
            nanosleep(&ts, NULL);
            continue;
        }

        elem = queue;
        queue = queue->next;

        .. process element ..
    }

    return NULL;
}

void *network_task(void *arg)
{
    struct element *elem, *tmp;
    struct request *req;

    for(;;)
    {
        req = read_request(fd);

        if(req == NULL)  // bad request
            continue;

        elem = request_to_job_element(req);

        free(req);

        if(elem == NULL)
            continue;

        if(queue == NULL)
            queue = elem;
        else
        {
            for(tmp = queue; tmp->next; tmp = tmp->next)
                ;

            tmp->next = elem;
        }
    }

    return NULL;
}


Imagine you have a program containing multiple threads: one thread running the network_task() function and multiple threads running the job_task() function. Because there are no locks around any code that acts on the queue variable, it's possible that a thread can operate on queue when it's in an inconsistent state because the previously running thread was interrupted while operating on queue. Furthermore, when the previous thread commences running again, it might have outdated data in local variables, such as pointers to elements that have been dequeued and processed by another thread already. In reality, this kind of blatant failure to use locking mechanisms is quite rare. You'll probably encounter it only in code that was previously developed for a single-threaded application and migrated to a multithreaded application without careful review of all the components. You might also run into this problem when code is imported from a library that wasn't developed for a multithreaded environment, such as a single-threaded Java library that's later incorporated into a multithreaded Java servlet.

Sometimes locks are instantiated correctly but used incorrectly, which can also result in race conditions. Here's a modified version of the previous example:

struct element *queue;
pthread_mutex_t queue_lock;
pthread_cond_t queue_cond;
int fd;

void *job_task(void *arg)
{
    struct element *elem;

    pthread_mutex_init(&queue_lock, NULL);

    for(;;)
    {
        pthread_mutex_lock(&queue_lock);

        if(queue == NULL)
            pthread_cond_wait(&queue_cond, &queue_lock);

        elem = queue;
        queue = queue->next;

        pthread_mutex_unlock(&queue_lock);

        .. process element ..
    }

    return NULL;
}

void *network_task(void *arg)
{
    struct element *elem, *tmp;
    struct request *req;

    pthread_mutex_init(&queue_lock, NULL);

    for(;;)
    {
        req = read_request(fd);

        if(req == NULL)  // bad request
            continue;

        elem = request_to_job_element(req);

        free(req);

        if(elem == NULL)
            continue;

        pthread_mutex_lock(&queue_lock);

        if(queue == NULL)
        {
            queue = elem;
            pthread_cond_broadcast(&queue_cond);
        }
        else
        {
            for(tmp = queue; tmp->next; tmp = tmp->next)
                ;

            tmp->next = elem;
        }

        pthread_mutex_unlock(&queue_lock);
    }
}


This example uses more locking mechanisms to ensure that the queue is accessed by only one thread, but there's still a problem: Each thread reinitializes queue_lock by calling pthread_mutex_init(). In effect, this allows multiple threads to obtain multiple locks, so it's not guaranteed that each thread can operate on the queue in an atomic fashion.
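One way to repair this example, sketched below, is to initialize the lock exactly once before any threads are created, typically with the static initializers mentioned earlier (pthread_once() is another option). The retest loop around the condition wait is an additional hardening assumption, not part of the original code:

/* Initialize once, at compile time, rather than in each thread. */
pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t queue_cond = PTHREAD_COND_INITIALIZER;

void *job_task(void *arg)
{
    struct element *elem;

    /* No pthread_mutex_init() call here; every thread now
       contends for the same, already-initialized lock. */
    for(;;)
    {
        pthread_mutex_lock(&queue_lock);

        /* Retest in a loop to guard against spurious wakeups. */
        while(queue == NULL)
            pthread_cond_wait(&queue_cond, &queue_lock);

        elem = queue;
        queue = queue->next;

        pthread_mutex_unlock(&queue_lock);

        /* ... process element ... */
    }

    return NULL;
}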

After you've determined that locks are used and the correct synchronization object is in place, you can begin to examine code that accesses a shared resource. This process involves ensuring that a lock is acquired for the synchronization primitive before accessing the resource, and then the primitive is signaled after the operation has been completed. This second point is worth keeping in mind because a code path could exist in which a synchronization primitive is never unlocked. This code path invariably leads to deadlock, discussed in the next section.

Paul Starzetz, a security researcher with iSEC, discovered a major race condition vulnerability in the Linux kernel's sys_uselib() system call. (Remember that kernels are multithreaded, too.) Starzetz pointed out that the sys_brk() function is required to hold a semaphore lock specific to a process memory descriptor list (called mmap_sem) because it adds an element to the structure by using vma_link(). However, in the load_elf_library() function that sys_uselib() uses, this semaphore is released before sys_brk() is called, as shown in Listing 13-3. The down_write() function is used to wait on a lock, and the up_write() function is used to release it.

Listing 13-3. Race Condition in the Linux Kernel's Uselib()

static int load_elf_library(struct file *file)
{
       down_write(&current->mm->mmap_sem);

       error = do_mmap(file,
                      ELF_PAGESTART(elf_phdata->p_vaddr),
                      (elf_phdata->p_filesz +
                       ELF_PAGEOFFSET(elf_phdata->p_vaddr)),
                      PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
                      (elf_phdata->p_offset -
                       ELF_PAGEOFFSET(elf_phdata->p_vaddr)));

       up_write(&current->mm->mmap_sem);

       if (error != ELF_PAGESTART(elf_phdata->p_vaddr))
              goto out_free_ph;

       elf_bss = elf_phdata->p_vaddr + elf_phdata->p_filesz;
       padzero(elf_bss);

       len = ELF_PAGESTART(elf_phdata->p_filesz +
           elf_phdata->p_vaddr + ELF_MIN_ALIGN - 1);
       bss = elf_phdata->p_memsz + elf_phdata->p_vaddr;
       if (bss > len)
              do_brk(len, bss - len);

Using some inventive exploitation techniques, Starzetz demonstrated how to leverage this bug for root access on a vulnerable system. You can find more information on this vulnerability at www.isec.pl/vulnerabilities/isec-0021-uselib.txt.

Return value checking is another important part of ensuring that a program is thread safe. Of course, checking return values is always important in preventing vulnerabilities, multithreaded or not, but this guideline especially applies to multithreaded programming. One interesting variation on thread race conditions is a failure to correctly check return values to make sure the API is functioning as expected. Take a look at the following code:

DWORD processJob(LPVOID arg)
{
    struct element *elem;

    for(;;)
    {
        WaitForSingleObject(hMutex, MAX_TIME);

        if(queue == NULL)
            WaitForSingleObject(queueEvent, MAX_TIME);

        elem = queue;
        queue = queue->next;

        ReleaseMutex(hMutex);

        .. process element ..
    }

    return 0;
}


Assume the processJob() function is run by multiple threads, as in the previous examples. Notice that the WaitForSingleObject() function's return value is ignored in both instances it's called. As you have seen previously, this function can return for a number of reasons, including when the maximum time limit to wait has been exceeded. Therefore, if MAX_TIME elapses before the mutex is released, this function could begin operating on queue when it doesn't actually own the mutex, or it operates on queue when the queueEvent object hasn't been signaled.
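
For contrast, here's a sketch of the same function with its return values checked. It addresses only the unchecked calls; holding the mutex while waiting on the event still raises the deadlock concerns covered in the next section:

DWORD processJob(LPVOID arg)
{
    struct element *elem;
    DWORD status;

    for(;;)
    {
        status = WaitForSingleObject(hMutex, MAX_TIME);

        /* Only WAIT_OBJECT_0 means this thread owns the mutex;
           on timeout, abandonment, or failure, don't touch the queue. */
        if(status != WAIT_OBJECT_0)
            continue;

        if(queue == NULL)
        {
            status = WaitForSingleObject(queueEvent, MAX_TIME);

            if(status != WAIT_OBJECT_0 || queue == NULL)
            {
                ReleaseMutex(hMutex);
                continue;
            }
        }

        elem = queue;
        queue = queue->next;

        ReleaseMutex(hMutex);

        /* ... process element ... */
    }

    return 0;
}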

Deadlocks and Starvation

Starvation and deadlock cause a task to never be completed because a thread can never be scheduled for execution. The "Windows IPC Synchronization Objects" section included an example of a deadlock that resulted from waiting on an event object while maintaining ownership of a mutex object. This prevented another thread from signaling the necessary event. Deadlocks can be addressed in the Win32 API by using the WaitForMultipleObjects() function to wait for an entire set of synchronization objects to become signaled. However, this approach might create its own issues and result in starvation. These situations are hard to evaluate when auditing code; however, you should note whether bWaitAll is set to TRUE and the number of objects is quite large. You also need to consider situations in which it's impossible or nearly impossible for all the objects being waited on to become signaled.
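
To make the idea concrete, the following sketch revisits the earlier processJob() example using WaitForMultipleObjects(); the handle names are carried over from that example and remain assumptions:

HANDLE handles[2];
DWORD status;

handles[0] = hMutex;
handles[1] = queueEvent;

/* With bWaitAll set to TRUE, the call returns only when the thread
   has been granted both objects at once, so it never holds one
   while blocking on the other. */
status = WaitForMultipleObjects(2, handles, TRUE, MAX_TIME);

/* A return value from WAIT_OBJECT_0 to WAIT_OBJECT_0 + 1
   indicates that both objects were acquired. */
if(status <= WAIT_OBJECT_0 + 1)
{
    /* ... operate on the queue ... */

    ReleaseMutex(hMutex);
}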

Deadlocks also happen in UNIX threaded programs. In PThreads, deadlocks are more likely to occur from the use of multiple mutexes, as shown in this simple example:

struct interface *interfaces[MAX_INTERFACES];

int packet_process(int num)
{
    struct interface *in = interfaces[num];
    struct packet *pkt;

    for(;;)
    {
        pthread_mutex_lock(in->lock);

        pthread_cond_wait(in->cond_arrived, in->lock);

        pkt = dequeue_packet(in);

        if(needs_forwarding(pkt))
        {
            int destnum;
            struct interface *dest;

            destnum = find_dest_interface(pkt);
            dest = interfaces[destnum];

            pthread_mutex_lock(dest->lock);

            enqueue_packet(pkt, dest);

            pthread_mutex_unlock(dest->lock);

            in->stats[FORWARDED]++;

            pthread_mutex_unlock(in->lock);

            continue;
        }

        pthread_mutex_unlock(in->lock);

        .. process packet ..
    }
}


This example shows a classic deadlock situation: Two locks can be held by a single thread, and another thread can acquire the same locks in a different order. In this example, there's a thread for each network interface to handle dequeuing and dealing with arriving packets. If the packet needs to be forwarded, it's added to another queue. There's the potential, however, for two competing threads to cause a deadlock in this code. The following sequence of events describes how deadlock might occur:

  1. Thread #1 locks interface[1] and dequeues a packet.

  2. Thread #2 interrupts, locks interface[2], and dequeues a packet.

  3. Thread #2 identifies a packet destined for interface[1], so pthread_mutex_lock(dest->lock) puts thread #2 to sleep because thread #1 holds the lock.

  4. Thread #1 regains the processor. It realizes it needs to forward a packet to interface[2], so pthread_mutex_lock(dest->lock) puts thread #1 to sleep because thread #2 holds the lock.

Now both threads are unable to do anything because they are waiting on each other to release a lock to continue their work.
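
One standard repair, sketched here under the assumption of the interface structures above, is to impose a global order on the locks (by interface index, for example) so that no two threads can acquire the same pair of locks in opposite orders; the helper function names are hypothetical:

/* Always acquire interface locks in ascending index order so
   competing threads can't hold them in opposite orders. */
void lock_pair(int a, int b)
{
    int first = (a < b) ? a : b;
    int second = (a < b) ? b : a;

    pthread_mutex_lock(interfaces[first]->lock);

    if(second != first)
        pthread_mutex_lock(interfaces[second]->lock);
}

void unlock_pair(int a, int b)
{
    int first = (a < b) ? a : b;
    int second = (a < b) ? b : a;

    if(second != first)
        pthread_mutex_unlock(interfaces[second]->lock);

    pthread_mutex_unlock(interfaces[first]->lock);
}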

When auditing code for deadlocks, you need to evaluate whether multiple primitives are locked and held simultaneously by more than one thread. Then you must consider whether those threads can lock primitives in a different order to create a condition like the one in the previous example. Most threading APIs include timed wait functions or functions that return immediately if a lock is unavailable, which might mitigate the threat of deadlocks. However, a timeout that results in terminating the program might be noteworthy as a denial of service in itself, particularly if the service doesn't restart.



