4.3. Chapter Summary

This chapter covered how to track the CPU performance bottlenecks of individual processes. You learned to determine how an application spends its time by attributing that time to the Linux kernel, system libraries, or the application itself. You also learned how to figure out which calls were made to the kernel and system libraries and how long each took to complete. Finally, you learned how to profile an application and identify the particular line of source code where a large amount of time was spent. Once you have mastered these tools, you can start with an application that hogs the CPU and pinpoint the exact functions that are consuming its time.

Subsequent chapters investigate how to find bottlenecks that are not CPU bound. In particular, you learn about the tools used to find I/O bottlenecks, such as a saturated disk or an overloaded network.


Chapter 5. Performance Tools: Process-Specific Memory

This chapter covers tools that enable you to diagnose an application's interaction with the memory subsystem, as managed by the Linux kernel and the CPU. Because the layers of the memory subsystem differ in performance by orders of magnitude, fixing an application so that it uses the memory subsystem efficiently can dramatically improve its performance.

After reading this chapter, you should be able to

  • Determine how much memory an application is using (ps, /proc).

  • Determine which functions of an application are allocating memory (memprof).

  • Profile the memory usage of an application using both software simulation (kcachegrind, cachegrind) and hardware performance counters (oprofile).

  • Determine which processes are creating and using shared memory (ipcs).


5.1. Linux Memory Subsystem

When diagnosing memory performance problems, it may become necessary to observe how an application performs at various levels within the memory subsystem. At the top level, the operating system decides how swap and physical memory are used. It decides which pieces of an application's address space will be in physical memory; these pieces are called the resident set. Memory that the application uses but that is not part of the resident set may be swapped out to disk. The application decides how much memory it will request from the operating system, and this is called the virtual set. The application can allocate this memory explicitly by calling malloc, or implicitly by using a large amount of stack or a large number of libraries. The application can also allocate shared memory that can be used by itself and by other applications. The ps performance tool is useful for tracking the virtual and resident set sizes. The memprof performance tool is useful for tracking which code in an application is allocating memory. The ipcs tool is useful for tracking shared memory usage.
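To see the difference between the virtual set and the resident set in practice, here is a minimal sketch. It assumes a Linux system where /proc/self/status reports the VmSize (virtual) and VmRSS (resident) fields in KiB. Instead of calling malloc directly, it creates an anonymous private mapping with Python's mmap module; glibc's malloc typically satisfies large requests in the same way. The function names vm_sizes_kib and demo and the 64 MiB size are illustrative choices and are not part of any tool. The virtual size should grow as soon as the mapping is created, while the resident size should grow only as each page is first touched.

```python
import mmap

PAGE = mmap.PAGESIZE


def vm_sizes_kib(pid="self"):
    """Return (VmSize, VmRSS) in KiB from /proc/<pid>/status (Linux only)."""
    sizes = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmSize", "VmRSS"):
                sizes[key] = int(value.split()[0])  # value looks like "\t  12345 kB"
    return sizes["VmSize"], sizes["VmRSS"]


def demo(size=64 * 1024 * 1024):
    """Measure sizes before mapping, after mapping, and after touching every page."""
    before = vm_sizes_kib()
    # Anonymous private mapping: reserves address space but no physical pages yet.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    reserved = vm_sizes_kib()
    for off in range(0, size, PAGE):  # write one byte per page to fault it in
        buf[off] = 1
    touched = vm_sizes_kib()
    buf.close()
    return before, reserved, touched


if __name__ == "__main__":
    before, reserved, touched = demo()
    for label, (vsz, rss) in (("before", before),
                              ("after mmap", reserved),
                              ("after touching", touched)):
        print(f"{label:15s} VmSize={vsz:8d} KiB  VmRSS={rss:8d} KiB")
```

Running `ps -o pid,vsz,rss,comm -p <pid>` against a process reports the equivalent VSZ and RSS columns, also in KiB.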

When an application is using physical memory, it begins to interact with the CPU's cache subsystem. Modern CPUs have multiple levels of cache. The cache closest to the CPU, called the L1 (Level 1) cache, is the fastest and the smallest. Suppose, for instance, that the CPU has only two levels of cache: L1 and L2. When the CPU requests a piece of memory, the processor checks whether it is already in the L1 cache. If it is, the CPU uses it. If it is not, an L1 cache miss occurs, and the processor checks the L2 cache; if the data is there, it is used. If the data is not in the L2 cache either, an L2 cache miss occurs, and the processor must go to physical memory to retrieve it. Ideally, the processor would never go to physical memory, because it would always find the data in the L1 or at least the L2 cache. Smart cache use, such as rearranging an application's data structures or reducing code size, may make it possible to reduce the number of cache misses and increase performance. cachegrind and oprofile are great tools for finding out how an application is using the cache and which functions and data structures are causing cache misses.
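The two-level lookup described above can be modeled with a small simulation. The sketch below is a toy model. It assumes fully associative caches with least-recently-used (LRU) replacement, a 64-byte line, and deliberately tiny capacities (64 lines for L1 and 512 for L2). Real CPUs use set-associative caches with different sizes and policies, and this code is not how cachegrind works internally. The class and function names are hypothetical. The simulation walks a 256 × 256 matrix of 8-byte elements stored row by row, once in row order and once in column order. The data is identical in both walks; only the access pattern changes the miss counts.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumption; common on x86)


class LRUCache:
    """Fully associative cache of `capacity` lines with LRU eviction (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def lookup(self, tag):
        """Return True on a hit and mark the line as most recently used."""
        if tag in self.lines:
            self.lines.move_to_end(tag)
            return True
        return False

    def fill(self, tag):
        """Insert a line, evicting the least recently used one if the cache is full."""
        self.lines[tag] = True
        self.lines.move_to_end(tag)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)


class TwoLevelCache:
    def __init__(self, l1_lines, l2_lines):
        self.l1 = LRUCache(l1_lines)
        self.l2 = LRUCache(l2_lines)
        self.l1_misses = 0
        self.l2_misses = 0

    def access(self, addr):
        tag = addr // LINE_SIZE
        if self.l1.lookup(tag):
            return                       # L1 hit: done
        self.l1_misses += 1              # L1 miss: try L2
        if not self.l2.lookup(tag):
            self.l2_misses += 1          # L2 miss: fetch from physical memory
            self.l2.fill(tag)
        self.l1.fill(tag)                # bring the line into L1


def walk_matrix(n, elem_size, row_order, cache):
    """Visit every element of an n x n row-major matrix; return (L1, L2) misses."""
    for outer in range(n):
        for inner in range(n):
            row, col = (outer, inner) if row_order else (inner, outer)
            cache.access((row * n + col) * elem_size)
    return cache.l1_misses, cache.l2_misses


if __name__ == "__main__":
    for row_order in (True, False):
        misses = walk_matrix(256, 8, row_order, TwoLevelCache(64, 512))
        label = "row order" if row_order else "column order"
        print(f"{label:12s} L1 misses={misses[0]:6d}  L2 misses={misses[1]:6d}")
```

With these parameters, the row-order walk incurs 8,192 L1 misses and 8,192 L2 misses, one per cache line. The column-order walk incurs 65,536 L1 misses, one per access, because each column touches 256 distinct lines and overflows the 64-line L1. Its L2 misses stay at 8,192 because the 512-line L2 still holds those lines for the next seven columns, which share them.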