Section 13.4. Tracing and Profiling Tools

13.4. Tracing and Profiling Tools

Many useful tools can provide you with various views of the system. Some tools offer a high-level perspective, such as what processes are running on your system and which processes are consuming the most CPU bandwidth. Other tools can provide detailed analysis, such as where memory is being allocated or, even more useful, where it is being leaked. The next few sections introduce the most important tools and utilities in this category. We have space for only a cursory introduction to these tools; references are provided where appropriate if you want more details.

13.4.1. strace

This useful system trace utility is found in virtually all Linux distributions. strace captures and displays useful information for every kernel system call executed by a Linux application program. strace is especially handy because it can be run on programs for which no source code is available. It is not necessary to compile the program with debug symbols as it is with GDB. Furthermore, strace can be a very insightful educational tool. As the man page states, "Students, hackers and the overly-curious will find that a great deal can be learned about a system and its system calls by tracing even ordinary programs."

While preparing the example software for the GDB section earlier in this chapter, I decided to use a software project unfamiliar to me, an early version of the GoAhead web server. The first attempt at compiling and linking the project led to an interesting example for strace. Starting the application from the command line silently returned control back to the console. No error messages were produced, and a look into the system logs also produced no clues! It simply would not run.

strace quickly identified the problem. The output from invoking strace on this software package is produced in Listing 13-5. Many lines from this output have been deleted due to space considerations. The unedited output is over one hundred lines long.

Listing 13-5. ^[4]`strace` Output: GoAhead Web Demo

[View full width]

01 root@coyote:/home/websdemo$ strace ./websdemo 02 execve("./websdemo", ["./websdemo"], [/* 14 vars */]) = 0 03 uname({sys="Linux", node="coyote", ...}) = 0 04 brk(0)                                   = 0x10031050 05 open("/etc/ld.so.preload", O_RDONLY)     = -1 ENOENT (No such file or directory) 06 open("/etc/ld.so.cache", O_RDONLY)       = -1 ENOENT (No such file or directory) 07 open("/lib/libc.so.6", O_RDONLY)         = 3 08 read(3, "\177ELF\1\2\1\0\0\0\0\0\0\0\0\0\0\3\0\24\0\0\0\1\0\1\322"..., 1024) = 1024 09 fstat64(0x3, 0x7fffefc8)                 = 0 10 mmap(0xfe9f000, 1379388, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xfe9f000 11 mprotect(0xffd8000, 97340, PROT_NONE)    = 0 12 mmap(0xffdf000, 61440, PROT_READ|PROT_WRITE|PROT_EXEC,MAP_PRIVATE|MAP_FIXED, 3,  0x130000) = 0xffdf000 13 mmap(0xffee000, 7228, PROT_READ|PROT_WRITE|PROT_EXEC,  MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xffee000 14 close(3)                                 = 0 15 brk(0)                                   = 0x10031050 16 brk(0x10032050)                          = 0x10032050 17 brk(0x10033000)                          = 0x10033000 18 brk(0x10041000)                          = 0x10041000 19 rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0 20 stat("./umconfig.txt", 0x7ffff9b8)       = -1 ENOENT (No such file or directory) 21 uname({sys="Linux", node="coyote", ...}) = 0 22 gettimeofday({3301, 178955}, NULL)       = 0 23 getpid()                                 = 156 24 open("/etc/resolv.conf", O_RDONLY)       = 3 25 fstat64(0x3, 0x7fffd7f8)                 = 0 26 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x30017000 27 read(3, "#\n# resolv.conf  This file is th"..., 4096) = 83 28 read(3, "", 4096)                        = 0 29 close(3)                                 = 0 ... <<< Lines 30-81 removed for brevity 82 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP)  = 3 83 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("0.0.0.0")}, 28) = 0 84 send(3, "\267s\1\0\0\1\0\0\0\0\0\0\6coyotea\0\0\1\0\1", 24, 0) = 24 85 gettimeofday({3301, 549664}, NULL)       = 0 86 poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 5000) = 1 87 ioctl(3, 0x4004667f, 0x7fffe6a8)         = 0 88 recvfrom(3, 0x7ffff1f0, 1024, 0, 0x7fffe668, 0x7fffe6ac) = -1 ECONNREFUSED (Connection  refused) 89 close(3)                                 = 0 90 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP)  = 3 91 connect(3, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("0.0.0.0")}, 28) = 0 92 send(3, "\267s\1\0\0\1\0\0\0\0\0\0\6coyote\0\0\1\0\1", 24, 0) = 24 93 gettimeofday({3301, 552839}, NULL)       = 0 94 poll([{fd=3, events=POLLIN, revents=POLLERR}], 1, 5000) = 1 95 ioctl(3, 0x4004667f, 0x7fffe6a8)         = 0 96 recvfrom(3, 0x7ffff1f0, 1024, 0, 0x7fffe668, 0x7fffe6ac) = -1 ECONNREFUSED (Connection  refused) 97 close(3)                                 = 0 98 exit(-1)                                 = ? 99 root@coyote:/home/websdemo#

^[4] See man ldconfig for details on creating a linker cache for your target system.

Line numbers have been added to the output produced by strace to make this listing more readable. Invocation of the command is found on line number 01. In its simplest form, simply add the strace command directly in front of the program you want to examine. This is how the output in Listing 13-5 was produced.

Each line of this trace represents the websdemo process making a system call into the kernel. We don't need to analyze and understand each line of the trace, although it is quite instructive to do so. We are looking for any anomalies that might help pinpoint why the program won't run. In the first several lines, Linux is setting up the environment in which the program will execute. We see several open() system calls to /etc/ld.so.*, which are the Linux dynamic linker-loader (ld.so) doing its job. In fact, line 06 was my clue that this example embedded board had not been properly configured. There should be a linker cache produced by running ldconfig. (The linker cache substantially speeds up searching for shared library references.) This was subsequently resolved by running ldconfig on the target.

Down through line 19 is more basic housekeeping, mostly by the loader and libc initializing. Notice in line 20 that the program is looking for a configuration file but did not find one. That could be an important issue when we get the software running. Starting with line 24, the program begins to set up and configure the appropriate networking resources that it needs. Lines 24 through 29 open and read a Linux system file containing instructions for the DNS service to resolve hostnames. Local network configuration activity continues through line 81. Most of this activity consists of network setup and configuration necessary to build the networking infrastructure for the program itself. This portion of the listing has been removed for brevity and clarity.

Notice especially the network activity starting with line 82. Here we have the program trying to establish a TCP/IP connection to an IP address of all zeros. Line 82 is reproduced here for convenience:

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3

A couple points about Listing 13-5 are worth noting. We might not know all the details of every system call, but we can get a general idea of what is happening. The socket() system call is similar to a file system open() call. The return value, indicated by the = sign, in this case, represents a Linux file descriptor. Knowing this, we can associate the activity from line 82 through the close() system call in line 89 with file descriptor 3.

We are interested in this group of related system calls because we see an error message in line 88: "Connection refused." At this point, we still don't know why the program won't run, but this appears abnormal. Let's investigate. Line 82, the system call to socket(), establishes an endpoint for IP communication. Line 83 is quite curious because it tries to establish a connection to a remote endpoint (socket) containing an IP address of all zeros. We don't have to be network experts to suspect that this might be causing trouble.^[5] Line 83 provides another important clue: The port parameter is set to 53. A quick Google search for TCP/IP port numbers reveals that port 53 is the Domain Name Service, or DNS.

^[5] Sometimes an all-zeros address is appropriate in this context. However, we are investigating why the program bailed abnormally, so we should consider this suspect.

Line 84 provides yet another clue. Our board has a hostname of coyote. This can be seen as part of the command prompt in line 01 of Listing 13-5. It appears that this activity is a DNS lookup for our board's hostname, which is failing. As an experiment, we add an entry in our target system's /etc/hosts^[6] file to associate our locally defined hostname with the board's IP locally assigned IP address, as follows:

^[6] See man hosts for details of this system administration file.

Coyote   192.168.1.21          #The IP address we assigned

Voilà: Our program begins to function normally. Although we might not know exactly why this would lead to a program failure (TCP/IP networking experts might), our strace output led us to the fact that a DNS lookup for our board name was failing. When we corrected that, the program started up happily and began serving web pages. To recap, this was a program for which we had no source code to reference, and it had no symbols compiled into its binary image. Using strace, we were able to determine the cause of the program failure, and implement a solution.

13.4.2. strace Variations

The strace utility has many command line options. One of the more useful includes the capability to select a subset of system calls for tracing. For example, if you want to see only the network-related activity of a given process, issue the command as follows:

$ strace -e trace=network process_name

This produces a trace of all the network-related system calls, such as socket(), connect(), recvfrom(), and send(). This is a powerful way to view the network activity of a given program. Several other subsets are available. For example, you can view only the file-related activities of a program, with open(), close(), read(), write(), and so on. Additional subsets include process-related system calls, signal-related system calls, and IPC-related system calls.

It is worth noting that strace is capable of dealing with tracing programs that spawn additional processes. Invoking strace with the -f option instructs strace to follow child processes that are created using the fork() system call. Numerous possibilities exist with the strace command. The best way to become proficient with this powerful utility is to use it. Make it a point with this and all the tools we present to seek out and read the latest open-source documentation. In this case, man strace on most Linux hosts will produce enough material to keep you experimenting for an afternoon!

One very useful way to employ strace is using the -c option. This option produces a high-level profiling of your application. Using the -c option, strace accumulates statistics on each system call, how many times it was encountered, how many times errors were returned, and the time spent in each system call. Listing 13-6 is an example of running strace -c on the webs demo from the previous example.

Listing 13-6. Profiling Using `strace`

root@coyote$ strace -c ./webs % time     seconds  usecs/call     calls    errors syscall ------ ----------- ----------- --------- --------- --------  29.80    0.034262         189       181           send  18.46    0.021226        1011        21        10 open  14.11    0.016221         130       125           read  11.87    0.013651         506        27         8 stat64   5.88    0.006762         193        35           select   5.28    0.006072          76        80           fcntl64   3.47    0.003994          65        61           time   2.79    0.003205        3205         1           execve   1.71    0.001970          90        22         3 recv   1.62    0.001868          85        22           close   1.61    0.001856         169        11           shutdown   1.38    0.001586         144        11           accept   0.41    0.000470          94         5           mmap2   0.26    0.000301         100         3           mprotect   0.24    0.000281          94         3           brk   0.17    0.000194         194         1         1 access   0.13    0.000150         150         1           lseek   0.12    0.000141          47         3           uname   0.11    0.000132         132         1           listen   0.11    0.000128         128         1           socket   0.09    0.000105          53         2           fstat64   0.08    0.000097          97         1           munmap   0.06    0.000064          64         1           getcwd   0.05    0.000063          63         1           bind   0.05    0.000054          54         1           setsockopt   0.04    0.000048          48         1           rt_sigaction   0.04    0.000046          46         1           gettimeofday   0.03    0.000038          38         1           getpid ------ ----------- ----------- --------- --------- ----------- 100.00    0.114985                   624        22 total

This is a very useful way to get a high-level view of where your application is consuming time and where errors are occurring. Some errors might be a normal part of your application's operation, but others might be consuming time that you hadn't intended. From Listing 13-6, we can see that the syscall with the longest duration was the execve(), which is the call that the shell used to spawn the application. As you can see, it was called only once. Another interesting observation is that the send() system call was the most frequently used syscall. This makes sensethe application is a small web server.

Bear in mind that, like the other tools we have been discussing here, strace must be compiled for your target architecture. strace is executed on your target board, not your development host. You must use a version that is compatible with your architecture. If you purchase a commercial embedded Linux distribution, you should make sure that this utility is included for your chosen architecture.

13.4.3. ltrace

The ltrace and strace utilities are closely related. The ltrace utility does for library calls what strace does for system calls. It is invoked in a similar fashion: Precede the program to be traced by the tracer utility, as follows:

$ ltrace ./example

Listing 13-7 reproduces the output of ltrace on a small example program that executes a handful of standard C library calls.

Listing 13-7. Example `ltrace` Output

$ ltrace ./example __libc_start_main(0x8048594, 1, 0xbffff944, 0x80486b4, 0x80486fc <unfinished ...> malloc(256)                                        = 0x804a008 getenv("HOME")                                     = "/home/chris" strncpy(0x804a008, "/home", 5)                      = 0x804a008 fopen("foo.txt", "w")                               = 0x804a110 printf("$HOME = %s\n", "/home/chris"$HOME =  /home/chris )             = 20 fprintf(0x804a110, "$HOME = %s\n", "/home/chris")   = 20 fclose(0x804a110)                                  = 0 remove("foo.txt")                                  = 0 free(0x804a008)                                    = <void> +++ exited (status 0) +++ $

For each library call, the name of the call is displayed, along with varying portions of the parameters to the call. Similar to strace, the return value of the library call is then displayed. As with strace, this tool can be used on programs for which source code is not available.

As with strace, a variety of switches affect the behavior of ltrace. You can display the value of the program counter at each library call, which can be helpful in understanding your application's program flow. As with strace, you can use -c to accumulate and report count, error, and time statistics, making a useful simple profiling tool. Listing 13-8 displays the results of our simple example program using the -c option.

Listing 13-8. Profiling Using `ltrace`

$ ltrace -c ./example $HOME = /home/chris % time     seconds  usecs/call     calls      function ------ -----------  ----------- --------- ----------------  24.16    0.000231         231         1 printf  16.53    0.000158         158         1 fclose  16.00    0.000153         153         1 fopen  13.70    0.000131         131         1 malloc  10.67    0.000102         102         1 remove   9.31    0.000089          89         1 fprintf   3.35    0.000032          32         1 getenv   3.14    0.000030          30         1 free   3.14    0.000030          30         1 strncpy ------ ----------- ----------- --------- ---------------- 100.00    0.000956                     9 total

The ltrace tool is available only for programs that have been compiled to use dynamically linked shared library objects. This is the usual default, so unless you explicitly specify -static when compiling, you can use ltrace on the resulting binary. Again similar to strace, you must use an ltrace binary that has been compiled for your target architecture. These utilities are run on the target, not the host development system.

13.4.4. ps

With the possible exception of strace and ltrace, no tools are more often neglected by the embedded systems developer than top and ps. Given the myriad options available for each utility, we could easily devote an entire chapter to these useful system-profiling tools. They are almost universally available in embedded Linux distributions.

Both of these utilities make use of the /proc file system, as described in Chapter 9, "File Systems." Much of the information they convey can be learned from the /proc file system if you know what to look for and how to parse the resulting information. These tools present that information in a convenient human-readable form.

The ps utility lists all the running processes on a machine. However, it is very flexible and can be tailored to provide much useful data on the state of a running machine and the processes running on it. For example, ps can display the scheduling policy of each process. This is particularly useful for systems that employ real-time processes.

Without any options, ps displays all processes with the same user ID as the user who invoked the command, and only those processes associated with the terminal on which the command was issued. This is useful when many jobs have been spawned by that user and terminal.

Passing options to ps can be confusing because ps supports a wide variety of standards (as in POSIX versus UNIX) and three distinct options styles: BSD, UNIX, and GNU. In general, BSD options are single or multiple letters, with no dash. UNIX options are the familiar dash-letter combinations, and GNU uses long argument formats preceded by double dashes. Refer to the man page for details of your ps implementation.

Everyone who uses ps likely has a favorite invocation. One particularly useful general-purpose invocation is ps aux. This displays every process on the system. Listing 13-9 is an example from a running embedded target board.

Listing 13-9. Process Listing

$ ps aux USER      PID %CPU %MEM    VSZ   RSS TTY   STAT START   TIME COMMAND root        1  0.0  0.8   1416   508 ?     S    00:00   0:00 init [3] root        2  0.0  0.0      0     0 ?     S<   00:00   0:00 [ksoftirqd/0] root        3  0.0  0.0      0     0 ?     S<   00:00   0:00 [desched/0] root        4  0.0  0.0      0     0 ?     S<   00:00   0:00 [events/0] root        5  0.0  0.0      0     0 ?     S<   00:00   0:00 [khelper] root       10  0.0  0.0      0     0 ?     S<   00:00   0:00 [kthread] root       21  0.0  0.0      0     0 ?     S<   00:00   0:00 [kblockd/0] root       62  0.0  0.0      0     0 ?     S    00:00   0:00 [pdflush] root       63  0.0  0.0      0     0 ?     S    00:00   0:00 [pdflush] root       65  0.0  0.0      0     0 ?     S<   00:00   0:00 [aio/0] root       36  0.0  0.0      0     0 ?     S    00:00   0:00 [kapmd] root       64  0.0  0.0      0     0 ?     S    00:00   0:00 [kswapd0] root      617  0.0  0.0      0     0 ?     S    00:00   0:00 [mtdblockd] root      638  0.0  0.0      0     0 ?     S    00:00   0:00 [rpciod] bin       834  0.0  0.7   1568   444 ?     Ss   00:00   0:00 /sbin/portmap root      861  0.0  0.0      0     0 ?     S    00:00   0:00 [lockd] root      868  0.0  0.9   1488   596 ?     Ss   00:00   0:00 /sbin/syslogd -r root      876  0.0  0.7   1416   456 ?     Ss   00:00   0:00 /sbin/klogd -x root      884  0.0  1.1   1660   700 ?     Ss   00:00   0:00 /usr/sbin/rpc.statd root      896  0.0  0.9   1668   584 ?     Ss   00:00   0:00 /usr/sbin/inetd root      909  0.0  2.2   2412  1372 ?     Ss+  00:00   0:00 -bash telnetd   953  0.3  1.1   1736   732 ?     S    05:58   0:00 in.telnetd root      954  0.2  2.1   2384  1348 pts/0 Ss   05:58   0:00 -bash root      960  0.0  1.2   2312   772 pts/0 R+   05:59   0:00 ps aux

This is but one of the many ways to view output data using ps. The columns are explained in the following text.

The USER and process ID (PID) fields should be self-explanatory.
The %CPU field expresses the percent of CPU utilization since the beginning of the process's lifetime; thus, CPU usage will virtually never add up to 100 percent.
The %MEM field indicates the ratio of the process's resident memory footprint to the total available physical memory.
The VSZ field is the virtual memory size of the process in kilobytes.
RSS is resident set size and indicates the nonswapped physical memory that a process has used, also in kilobytes.
TTY is the controlling terminal of the process.

Most of the processes in this example are not associated with a controlling terminal. The ps command that generated Listing 13-9 was issued from a Telnet session, which is indicated by the pts/0 terminal device.

The STAT field describes the state of the process at the time this snapshot was produced. Here, S means that the process is sleeping, waiting on an event of some type, often I/O. R means that the process is in a runnable state (that is, the scheduler is free to give it control of the CPU if nothing of a higher priority is waiting). The left bracket next to the state letter is an indication that this process has a higher priority.

The final column is the command name. Those listed in brackets are kernel threads. Many more symbols and options are available; refer to the man page for ps for complete details.

13.4.5. top

Whereas ps is a one-time snapshot of the current system, top takes periodic snapshots of the state of the system and its processes. Similar to ps, top has numerous command line and configuration options. It is interactive and can be reconfigured while operating to customize the display to your particular needs.

Entered without options, top displays all running processes in a fashion very similar to the ps aux command presented in Listing 13-9, updated every 3 seconds. Of course, this and many other aspects of top are user configurable. The first few lines of the top screen display system information, also updated every 3 seconds. This includes the system uptime, the number of users, information on the number of processes and their state, and much more.

Listing 13-10 shows top in its default configuration, resulting from executing top from the command line without parameters.

Listing 13-10. top

top - 06:23:14 up  6:23,  2 users,  load average: 0.00, 0.00, 0.00 Tasks: 24 total,   1 running,  23 sleeping,   0 stopped,   0 zombie Cpu(s): 0.0% us,  0.3% sy,  0.0% ni, 99.7% id,  0.0% wa,  0.0% hi,  0.0% si Mem:     62060k total,    17292k used,    44768k free,        0k buffers Swap:       0k total,        0k used,        0k free,    11840k cached    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND    978 root      16   0  1924  952  780 R  0.3  1.5   0:01.22 top      1 root      16   0  1416  508  452 S  0.0  0.8   0:00.47 init      2 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0      3 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 desched/0      4 root      -2  -5     0    0    0 S  0.0  0.0   0:00.00 events/0      5 root      10  -5     0    0    0 S  0.0  0.0   0:00.09 khelper     10 root      18  -5     0    0    0 S  0.0  0.0   0:00.00 kthread     21 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/0     62 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush     63 root      15   0     0    0    0 S  0.0  0.0   0:00.00 pdflush     65 root      19  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0     36 root      25   0     0    0    0 S  0.0  0.0   0:00.00 kapmd     64 root      25   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0    617 root      25   0     0    0    0 S  0.0  0.0   0:00.00 mtdblockd    638 root      15   0     0    0    0 S  0.0  0.0   0:00.34 rpciod    834 bin       15   0  1568  444  364 S  0.0  0.7   0:00.00 portmap    861 root      20   0     0    0    0 S  0.0  0.0   0:00.00 lockd    868 root      16   0  1488  596  504 S  0.0  1.0   0:00.11 syslogd    876 root      19   0  1416  456  396 S  0.0  0.7   0:00.00 klogd    884 root      18   0  1660  700  612 S  0.0  1.1   0:00.02 rpc.statd    896 root      16   0  1668  584  504 S  0.0  0.9   0:00.00 inetd    909 root      15   0  2412 1372 1092 S  0.0  2.2   0:00.34 bash    953 telnetd   16   0  1736  736  616 S  0.0  1.2   0:00.27 in.telnetd    954 root      15   0  2384 1348 1096 S  0.0  2.2   0:00.16 bash

The default columns from Listing 13-10 are the PID, the user, the process priority, the process nice value, the virtual memory used by the process, the resident memory footprint, the amount of shared memory used by the task, and other fields that are identical to those described in the previous ps example.

Space permits only a cursory introduction to these useful utilities. You are encouraged to spend an afternoon with the man pages for top and ps to explore the richness of their capabilities.

13.4.6. mtrace

The mtrace package is a simple utility that analyzes and reports on calls to malloc(), realloc(), and free() in your application. It is easy to use and can potentially help spot trouble in your application. As with other userland tools we have been describing in this chapter, you must have the mtrace package configured and compiled for your architecture. mtrace is a malloc replacement library that is installed on your target. Your application enables it with a special function call. Your embedded Linux distribution should contain the mtrace package.

To demonstrate this utility, we created a simple program that creates dynamic data on a simple linked list. Each list item was dynamically generated, as was each data item we placed on the list. Listing 13-11 reproduces the simple list structure.

Listing 13-11. Simple Linear Linked List

struct blist_s {   struct blist_s *next;   char *data_item;   int item_size;   int index; };

Each list item was dynamically created using malloc() as follows and subsequently placed at the end of the linked list:

struct blist_s *p = malloc( sizeof(struct blist_s) );

Each variable-sized data item in the list was also dynamically generated and added to the list item before being placed at the end of the list. This way, every list item was created using two calls to malloc(), one for the list item itself, represented by struct blist_s just shown, and one for the variable data item. We then generated 10,000 records on the list containing variable string data, resulting in 20,000 calls to malloc().

To use mtrace, tHRee conditions must be satisfied:

A header file, mcheck.h, must be included in the source file.
The application must call mTRace() to install the handlers.
The environment variable MALLOC_TRACE must specify the name of a writeable file to which the trace data is written.

When these conditions are satisfied, each call to one of the traced functions generates a line in the raw trace file defined by MALLOC_TRACE. The trace data looks like this:

@ ./mt_ex:[0x80486ec] + 0x804a5f8 0x10

The @ sign signals that the trace line contains an address or function name. In the previous example, the program was executing at the address in square brackets, 0x80486ec. Using binary utilities or a debugger, we could easily associate this address with a function. The plus sign (+) indicates that this is a call to allocate memory. A call to free() would be indicated by a minus sign. The next field indicates the virtual address of the memory location being allocated or freed. The last field is the size, which is included in every call to allocate memory.

This data format is not very user friendly. For this reason, the mtrace package includes a utility^[7] that analyzes the raw trace data and reports on any inconsistencies. In the simplest case, the Perl script simply prints a single line with the message "No memory leaks". Listing 13-12 contains the output when memory leaks are detected.

^[7] The analysis utility is a Perl script supplied with the mTRace package.

Listing 13-12. `mtrace` Error Report

$ mtrace ./mt_ex mtrace.log Memory not freed: -----------------    Address     Size     Caller 0x0804aa70     0x0a  at /home/chris/temp/mt_ex.c:64 0x0804abc0     0x10  at /home/chris/temp/mt_ex.c:26 0x0804ac60     0x10  at /home/chris/temp/mt_ex.c:26 0x0804acc8     0x0a  at /home/chris/temp/mt_ex.c:64

As you can see, this simple tool can help you spot trouble before it happens, as well as find it when it does. Notice that the Perl script has displayed the filename and line number of each call to malloc() that does not have a corresponding call to free() for the given memory location . This requires debugging information in the executable file generated by passing the -g flag to the compiler. If no debugging information is found, the script simply reports the address of the function calling malloc().

13.4.7. dmalloc

dmalloc picks up where mTRace leaves off. The mtrace package is a simple, relatively nonintrusive package most useful for simple detection of malloc/free unbalance conditions. The dmalloc package enables the detection of a much wider range of dynamic memory-management errors. Compared to mTRace, dmalloc is highly intrusive. Depending on the configuration, dmalloc can slow your application to a crawl. It is definitely not the right tool if you suspect memory errors due to race conditions or other timing issues. dmalloc (and mtrace, to a lesser extent) will definitely change the timing of your application.

dmalloc is a very powerful dynamic memory-analysis tool. It is highly configurable and, therefore, somewhat complex. It takes some time to learn and master this tool. However, from QA testing to bug squashing, it could become one of your favorite development tools.

dmalloc is a debug malloc library replacement. These conditions must be satisfied to use dmalloc:

Application code must include the dmalloc.h header file.
The application must be linked against the dmalloc library.
The dmalloc library and utility must be installed on your embedded target.
Certain environment variables that the dmalloc library references must be defined before running your application on the target.

Although it is not strictly necessary, you should include dmalloc.h in your application program. This allows dmalloc to include file and line number information in the output.

Link your application against the dmalloc library of your choice. The dmalloc package can be configured to generate several different libraries, depending on your selections during package configuration. In the examples to follow, we have chosen to use the libdmalloc.so shared library object. Place the library (or a symlink to it) in a path where your compiler can find it. The command to compile your application might look something like this:

$ ppc_82xx-gcc -g -Wall -o mtest_ex -L../dmalloc-5.4.2/ \    -ldmalloc mtest_ex.c

This command line assumes that you've placed the dmalloc library (libdmalloc.so) in a location searched by the -L switch on the command linenamely, the ../dmalloc-5.4.2 directly just above the current directory.

To install the dmalloc library on your target, place it in your favorite location (perhaps /usr/local/lib). You might need to configure your system to find this library. On our example PowerPC system, we added the path /usr/local/lib to the /etc/ld.so.conf file and invoked the ldconfig utility to update the library search cache.

The last step in preparation is to set an environment variable that the dmalloc library uses to determine the level of debugging that will be enabled. The environment variable contains a debug bit mask that concatenates a number of features into a single convenient variable. Yours might look something like this:

DMALLOC_OPTIONS=debug=0x4f4ed03,inter=100,log=dmalloc.log

Here, debug is the debug-level bit mask, and inter sets an interval count at which the dmalloc library performs extensive checks on itself and the heap. The dmalloc library writes its log output to the file indicated by the log variable.

The dmalloc package comes with a utility to generate the DMALLOC_OPTIONS environment variable based on flags passed to it. The previous example was generated with the following dmalloc invocation. The documentation in the dmalloc package details this quite thoroughly, so we shall not reproduce that here.

$ dmalloc -p check-fence -l dmalloc.log -i 100 high

When these steps are complete, you should be able to run your application against the dmalloc debug library.

dmalloc produces a quite detailed output log. Listing 13-13 reproduces a sample dmalloc log output for an example program that intentionally generates some memory leaks.

Listing 13-13. `dmalloc` Log Output

2592: 4002: Dmalloc version '5.4.2' from 'http://dmalloc.com/' 2592: 4002: flags = 0x4f4e503, logfile 'dmalloc.log' 2592: 4002: interval = 100, addr = 0, seen # = 0, limit = 0 2592: 4002: starting time = 2592 2592: 4002: process pid = 442 2592: 4002: Dumping Chunk Statistics: 2592: 4002: basic-block 4096 bytes, alignment 8 bytes 2592: 4002: heap address range: 0x30015000 to 0x3004f000, 237568 bytes 2592: 4002:     user blocks: 18 blocks, 73652  bytes (38%) 2592: 4002:    admin blocks: 29 blocks, 118784 bytes (61%) 2592: 4002:    total blocks: 47 blocks, 192512 bytes 2592: 4002: heap checked 41 2592: 4002: alloc calls: malloc 2003, calloc 0, realloc 0, free 1999 2592: 4002: alloc calls: recalloc 0, memalign 0, valloc 0 2592: 4002: alloc calls: new 0, delete 0 2592: 4002:   current memory in use: 52 bytes (4 pnts) 2592: 4002:  total memory allocated: 27546 bytes (2003 pnts) 2592: 4002:  max in use at one time: 27546 bytes (2003 pnts) 2592: 4002: max alloced with 1 call: 376 bytes 2592: 4002: max unused memory space: 37542 bytes (57%) 2592: 4002: top 10 allocations: 2592: 4002:  total-size  count in-use-size  count  source 2592: 4002:       16000   1000          32      2  mtest_ex.c:36 2592: 4002:       10890   1000          20      2  mtest_ex.c:74 2592: 4002:         256      1           0      0  mtest_ex.c:154 2592: 4002:       27146   2001          52      4  Total of 3 2592: 4002: Dumping Not-Freed Pointers Changed Since Start: 2592: 4002:  not freed: '0x300204e8|s1' (10 bytes) from 'mtest_ex.c:74' 2592: 4002:  not freed: '0x30020588|s1' (16 bytes) from 'mtest_ex.c:36' 2592: 4002:  not freed: '0x30020688|s1' (16 bytes) from 'mtest_ex.c:36' 2592: 4002:  not freed: '0x300208a8|s1' (10 bytes) from 'mtest_ex.c:74' 2592: 4002:  total-size  count  source 2592: 4002:          32      2  mtest_ex.c:36 2592: 4002:          20      2  mtest_ex.c:74 2592: 4002:          52      4  Total of 2 2592: 4002: ending time = 2592, elapsed since start = 0:00:00

It is important to note that this log is generated upon program exit. (dmalloc has many options and modes of operation; it is possible to configure dmalloc to print output lines when errors are detected.)

The first half of the output log reports high-level statistics about the heap and the overall memory usage of the application. Totals are produced for each of the malloc library calls, such as malloc(), free(), and realloc(). Interestingly, this default log reports on the top 10 allocations and the source location where they occurred. This can be very useful for overall system-level profiling.

Toward the end of the log, we see evidence of memory leaks in our application. You can see that the dmalloc library detected four instances of memory that was allocated that was apparently never freed. Because we included dmalloc.h and compiled with debug symbols, the source location where the memory was allocated is indicated in the log.

As with the other tools we've covered in this chapter, space permits only a brief introduction of this very powerful debug tool. dmalloc can detect many other conditions and limits. For example, dmalloc can detect when a freed pointer has been written. It can tell whether a pointer was used to access data outside its bounds but within the application's permissible address range. In fact, dmalloc can be configured to log almost any memory transaction through the malloc family of calls. dmalloc is a tool that is sure to pay back many times the effort taken to become proficient with it.

13.4.8. Kernel Oops

Although not strictly a tool, a kernel oops contains much useful information to help you troubleshoot the cause. A kernel oops results from a variety of kernel errors from simple memory errors produced by a process (fully recoverable, in most cases) to a hard kernel panic. Recent Linux kernels support display of symbolic information in addition to the raw hexadecimal address values. Listing 13-14 reproduces a kernel oops from a PowerPC target.

Listing 13-14. Kernel Oops

$ modprobe loop Oops: kernel access of bad area, sig: 11 [#1] NIP: C000D058 LR: C0085650 SP: C7787E80 REGS: c7787dd0 TRAP: 0300  Not tainted MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11 DAR: 00000000, DSISR: 22000000 TASK = c7d187b0[323] 'modprobe' THREAD: c7786000 Last syscall: 128 GPR00: 0000006C C7787E80 C7D187B0 00000000 C7CD25CC FFFFFFFF 00000000 80808081 GPR08: 00000001 C034AD80 C036D41C C034AD80 C0335AB0 1001E3C0 00000000 00000000 GPR16: 00000000 00000000 00000000 100170D8 100013E0 C9040000 C903DFD8 C9040000 GPR24: 00000000 C9040000 C9040000 00000940 C778A000 C7CD25C0 C7CD25C0 C7CD25CC NIP [c000d058] strcpy+0x10/0x1c LR [c0085650] register_disk+0xec/0xf0 Call trace:  [c00e170c] add_disk+0x58/0x74  [c90061e0] loop_init+0x1e0/0x430 [loop]  [c002fc90] sys_init_module+0x1f4/0x2e0  [c00040a0] ret_from_syscall+0x0/0x44 Segmentation fault

Notice that the register dump includes symbolic information, where appropriate. Your kernel must have KALLSYSMS enabled for this symbolic information to be available. Figure 13-4 shows the configuration options under the General Setup main menu.

Figure 13-4. Symbol support for oops

Much of the information in a kernel oops message is directly related to the processor. Having some knowledge of the underlying architecture is necessary to fully understand the oops message.

Analyzing the oops in Listing 13-14, we see right away that the oops was generated due to a "kernel access of bad area, sig: 11". We already know from previous examples in this chapter that signal 11 is a segmentation fault.

The first section is a summary showing the reason for the oops, a few important pointers, and the offending task. In Listing 13-14, NIP is the next instruction pointer, which is decoded later in the oops message. This points to the offending code that led to the oops. LR is a PowerPC register and usually indicates the return address for the currently executing subroutine. SP is the stack pointer. REGS indicates the kernel address for the data structure containing the register dump data, and TRAP indicates the type of exception that this oops message relates to. Referring to the PowerPC architecture reference manual referenced at the end of Chapter 7, "Bootloaders," we see that a TRAP 0300 is the PowerPC Data Storage Interrupt, which is triggered by a data memory access error.

On the third line of the oops message, we see additional PowerPC machine registers, such as MSR (machine state register) and a decode of some of its bits. On the next line, we see the DAR (data access register), which often contains the offending memory address. The DSISR register contents can be used in conjunction with the PowerPC architecture reference to discover much detail about the specific reason for the exception.

An oops message also contains the task pointer and the decoded task name to quickly determine what task or thread was running at the time of the oops. We also see a detailed processor register dump, which can be used for additional clues. Again, we need knowledge of the architecture and compiler register usage to make sense of the clues from the register values. For example, the PowerPC architecture uses the r3 register for return values from C functions.

The last part of the oops message provides a stack backtrace with symbol decode if symbols are enabled in the kernel. Using this information, we can construct a sequence of events that led to the offending condition.

In this simple example, we have learned a great deal of information from this oops message. We know that it was a PowerPC Data Storage Exception, caused by an error in a data memory access (as opposed to an instruction fetch memory access). The DAR register tells us that the data address that generated this exception was 0x0000_0000. We know that the modprobe process produced the error. From the backtrace and NIP (next instruction pointer), we know that it was in a call to strcpy() that can be traced directly back to the loop_init() function in the loop.ko module, which modprobe was trying to insert at the time of the exception. Given this information, tracking down the source of this errant null pointer dereference should be quite trivial.