Troubleshooting


Solaris™ Operating Environment Boot Camp
By David Rhodes, Dominic Butler
Chapter 21.  Kernels and All About Them


Occasionally, problems or unusual behavior can arise in processes. Solaris provides a number of tools to help us troubleshoot problems.

Truss

Solaris provides a tool called truss that shows the system calls a program makes as it runs. For a simple example, consider the rm command, which deletes a file. To delete a file, a program must make a system call asking the kernel to perform the action. That system call is named "unlink", and the name is appropriate: if a file has several links and we remove one of them, the file itself remains. Only when the last link is removed does the file cease to exist (see Chapter 6, "The Filesystem and Its Contents"). The following command runs rm under truss and displays every system call made while it ran:


    hydrogen# ls -l testfile
    -rw-r--r-- 1 root other 583 Dec 22 17:46 testfile
    hydrogen# truss rm testfile
    execve("/usr/bin/rm", 0xEFFFFD0C, 0xEFFFFD18) argc = 2
    open("/dev/zero", O_RDONLY) = 3
    mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7C0000
    stat("/usr/bin/rm", 0xEFFFFA00) = 0
    open("/usr/lib/libc.so.1", O_RDONLY) = 4
    fstat(4, 0xEFFFF7BC) = 0
    mmap(0x00000000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF7B0000
    mmap(0x00000000, 770048, PROT_READ|PROT_EXEC, MAP_PRIVATE, 4, 0) = 0xEF6C0000
    munmap(0xEF764000, 61440) = 0
    mmap(0xEF773000, 27668, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 667648) = 0xEF773000
    mmap(0xEF77A000, 5480, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xEF77A000
    close(4) = 0
    open("/usr/lib/libdl.so.1", O_RDONLY) = 4
    fstat(4, 0xEFFFF7BC) = 0
    mmap(0xEF7B0000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 4, 0) = 0xEF7B0000
    close(4) = 0
    open("/usr/platform/SUNW,SPARCstation-LX/lib/libc_psr.so.1", O_RDONLY) Err#2 ENOENT
    close(3) = 0
    brk(0x00022C20) = 0
    brk(0x00024C20) = 0
    open("/usr/lib/locale/en_GB/en_GB.so.2", O_RDONLY) = 3
    fstat(3, 0xEFFFF19C) = 0
    mmap(0x00000000, 4096, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF7A0000
    mmap(0x00000000, 86016, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF6A0000
    munmap(0xEF6A4000, 61440) = 0
    mmap(0xEF6B3000, 5934, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 12288) = 0xEF6B3000
    close(3) = 0
    open("/dev/zero", O_RDONLY) = 3
    mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xEF790000
    close(3) = 0
    munmap(0xEF7A0000, 4096) = 0
    getrlimit64(RLIMIT_NOFILE, 0xEFFFFC98) = 0
    lstat64("testfile", 0xEFFFFBA0) = 0
    access("testfile", 2) = 0
    unlink("testfile") = 0
    llseek(0, 0, SEEK_CUR) = 2660
    _exit(0)
    hydrogen#

Each line of output displays a system call, along with any parameters being passed to it. The return code from the system call is shown after the equals sign. Generally, a return code of zero means the system call was successful and any other return code indicates that an error occurred, but this is not always the case. The system call open() tells the kernel that you wish to open a file. As we saw in Chapter 6, "The Filesystem and Its Contents," each time a file is opened a file descriptor is assigned to it, so on success this system call returns the value of that file descriptor rather than zero; in the output above, the first open() of /dev/zero returns descriptor 3. A call that fails is shown with an error code in place of a return value, such as the Err#2 ENOENT reported for the libc_psr.so.1 file that does not exist. When a program wants to close a file it calls the close() system call and passes the file descriptor as the parameter.
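The layout described above (call name, parameters, then either "= value" or an error code) is regular enough to pick apart mechanically. The following is only a rough sketch in Python: the pattern is an assumption derived from the sample output in this section, not a complete grammar for truss, and the function names are made up for illustration. Lines it does not recognize, such as the execve() line ending in "argc = 2", are simply skipped.

```python
import re

# Assumed line shapes, taken only from the sample output in this section:
#   open("/dev/zero", O_RDONLY) = 3
#   open("/some/file", O_RDONLY) Err#2 ENOENT
#   _exit(0)
LINE_RE = re.compile(
    r'^(?P<name>\w+)\((?P<args>[^)]*)\)\s*'
    r'(?:=\s*(?P<ret>\S+)|Err#(?P<errno>\d+)\s+(?P<errname>\w+))?\s*$'
)


def parse_truss_line(line):
    """Split one line into name, args, return value and error (hypothetical helper)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None  # not a shape this sketch understands
    return {
        "name": m.group("name"),
        "args": m.group("args"),
        "ret": m.group("ret"),
        "errno": int(m.group("errno")) if m.group("errno") else None,
        "errname": m.group("errname"),
    }


def failed_calls(lines):
    """Return only the calls that reported an error code."""
    parsed = (parse_truss_line(line) for line in lines)
    return [p for p in parsed if p is not None and p["errno"] is not None]


sample = [
    'open("/dev/zero", O_RDONLY) = 3',
    'open("/usr/platform/SUNW,SPARCstation-LX/lib/libc_psr.so.1", O_RDONLY) Err#2 ENOENT',
    'close(3) = 0',
    'unlink("testfile") = 0',
    '_exit(0)',
]

for call in failed_calls(sample):
    print(call["name"], call["args"], call["errname"])
```

Filtering for error codes in this way is a quick first pass over a long trace, although not every reported error is a real problem; the ENOENT above is harmless.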

It can be seen that even a relatively simple program can still make many system calls. The first system call comes from the truss process as it executes the rm process by calling the execve() system call (system calls are usually written with the empty brackets following their name).

Many other system calls follow before the one that does what we want, which is the call to unlink(). The command finishes with the _exit() call, passing zero because the command succeeded in deleting the file.
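The link behavior behind the name unlink() can be seen without truss. The sketch below uses Python's os module, whose link(), unlink() and stat() functions wrap the system calls of the same names; the file names are arbitrary and a temporary directory is used so that nothing of value is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "testfile")
    second = os.path.join(workdir, "testfile.link")

    with open(original, "w") as f:
        f.write("some data\n")

    # Give the same file a second name (a hard link).
    os.link(original, second)
    links_before = os.stat(original).st_nlink

    # Remove one name; the data is still reachable through the other.
    os.unlink(original)
    still_there = os.path.exists(second)
    links_after = os.stat(second).st_nlink

    # Remove the last name; only now does the file cease to exist.
    os.unlink(second)
    gone = not os.path.exists(second)

print(links_before, still_there, links_after, gone)
```

The link count drops by one with each unlink(), and the file disappears only when the count reaches zero.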

The truss command can be very useful for troubleshooting processes that are not doing what they should. If the program you are trying to run ends without doing anything, then you can use truss as in the above example. You may find that a program is terminating prematurely because a system call is failing, in which case you should see the offending system call near the end of the truss output. Possible problems you would pick up here include failure to create a file (maybe due to a lack of permissions) or failure to open a file (maybe it is not there or has incorrect permissions). Alternatively, you may find that a program you are running appears to be hanging for no apparent reason. In this case, you can use truss to examine an already running program by using the "-p" option:

    hydrogen# ps -ef | tail -5
      jsmith   537   535  0 15:38:34 pts/0    0:01 -sh
      jsmith   643   537  1 15:53:11 pts/0    0:00 pg
        root   579   577  0 15:44:39 pts/1    0:00 -sh
        root   644   579  2 15:53:17 pts/1    0:00 ps -ef
        root   577   173  0 15:44:38 ?        0:00 in.telnetd
    hydrogen#

If, for example, we were worried that process ID 643 (shown above) had hung, we could examine it using truss to see what it was doing:

    hydrogen# truss -p 643
    read(0, 0xEF73B150, 1024)       (sleeping...)
    hydrogen#

Here we see that the process is currently in the read() system call, but it is sleeping rather than actually reading any data. This means that the process (in this case pg) is trying to read data from a file, but there is no data to read and no end of file either, so it just sits there waiting. In this case, we can see what must have happened. The user jsmith has run the pg command without supplying a filename, so it is reading the standard input instead. The standard input is attached to the keyboard, so it will read whatever is typed until it receives the EOF character (which is usually <control-d>). The user who ran the command is not typing anything, so the process goes into a sleeping state while it waits to receive data. This is a very simple example, but it demonstrates the type of troubleshooting that can be performed using truss to examine the system calls that a process is making.
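The three states a reader can be in (nothing to read yet, data available, end of file) can be reproduced with a pipe standing in for the keyboard. This is a minimal sketch assuming a POSIX system; readable_now() is a made-up helper that uses select() with a zero timeout to ask whether a read() would return at once, so the example itself never sleeps the way pg does.

```python
import os
import select


def readable_now(fd):
    """Return True if a read() on fd would return immediately (hypothetical helper)."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


read_end, write_end = os.pipe()

# No data and no end of file: a read() here would sleep, as pg does.
state_empty = readable_now(read_end)

# Once the writer supplies data, read() returns it straight away.
os.write(write_end, b"hello\n")
state_data = readable_now(read_end)
data = os.read(read_end, 1024)

# Closing the last writer is the pipe's equivalent of typing the EOF
# character: read() returns immediately with zero bytes.
os.close(write_end)
state_eof = readable_now(read_end)
at_eof = os.read(read_end, 1024)

os.close(read_end)
print(state_empty, state_data, data, state_eof, at_eof)
```

A process shown by truss as sleeping in read() is in the first of these states.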

Pargs

This command was only introduced with Solaris 9, but provides a number of useful features that would make a system administrator wonder how (s)he got on without it.

The default action of pargs is to display all the arguments that were supplied to a running process. This is very useful, but can't we get this information from a ps listing? We can for most processes, but ps displays only a fixed-length portion of the command line (the first 80 characters), so we may not see all the arguments and parameters that a certain process was started with.

The following example shows the console login process. If we were a bit unsure of the arguments it was called with we could simply look using ps:

    hydrogen# ps -ft console
         UID   PID  PPID  C    STIME TTY      TIME CMD
        root   244     1  0 13:22:56 console  0:00 /usr/lib/saf/ttymon -g -h -p hydrogen console login:  -T sun -d /dev/console
    hydrogen#

However, if we look using pargs we see that some information was missing from the ps listing:

    hydrogen# pargs 244
    244:    /usr/lib/saf/ttymon -g -h -p hydrogen console login:    -T sun -d /dev/console
    argv[0]: /usr/lib/saf/ttymon
    argv[1]: -g
    argv[2]: -h
    argv[3]: -p
    argv[4]: hydrogen console login:
    argv[5]: -T
    argv[6]: sun
    argv[7]: -d
    argv[8]: /dev/console
    argv[9]: -l
    argv[10]: console
    argv[11]: -m
    argv[12]: ldterm,ttcompat
    hydrogen#

There are other options to pargs. Possibly the most useful is the "-e" option, which will display the environment variables of a process:

    hydrogen# pargs -e 244
    244:    /usr/lib/saf/ttymon -g -h -p hydrogen console login:    -T sun -d /dev/console
    envp[0]: PATH=/usr/sbin:/usr/bin
    envp[1]: TZ=Europe/Stockholm
    hydrogen#

If you look at the man page you will see that there are a few other options to pargs, but these are the most useful.
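To make the difference between the two views concrete, the sketch below models them in Python. It does not call ps or pargs: ps_style() and pargs_style() are invented names, the 80-character limit is an assumption about how much of the command line ps keeps, and the argument list is copied from the listing above (including a guessed trailing space in the login prompt).

```python
PS_ARGS_LIMIT = 80  # assumed size of the command-line summary kept for ps


def ps_style(argv, limit=PS_ARGS_LIMIT):
    """Join the arguments into one string and cut it off at a fixed length."""
    return " ".join(argv)[:limit]


def pargs_style(items, label="argv"):
    """List every item on its own numbered line, in the style of pargs."""
    return ["%s[%d]: %s" % (label, i, item) for i, item in enumerate(items)]


# Arguments of the console ttymon process, copied from the listing above.
ttymon_argv = [
    "/usr/lib/saf/ttymon", "-g", "-h", "-p", "hydrogen console login: ",
    "-T", "sun", "-d", "/dev/console", "-l", "console",
    "-m", "ldterm,ttcompat",
]
ttymon_env = ["PATH=/usr/sbin:/usr/bin", "TZ=Europe/Stockholm"]

print(ps_style(ttymon_argv))
for line in pargs_style(ttymon_argv):
    print(line)
for line in pargs_style(ttymon_env, label="envp"):
    print(line)
```

The single truncated string loses the trailing arguments, while the numbered list keeps every one and also makes it obvious where an argument containing spaces begins and ends.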

Prex

This command will get a mention here, but that is about all. It has existed in Solaris for a while, but the man page only appeared at Solaris 8. Prex is a very powerful tool that is much more informative than truss, but it does take some time to get used to. It enables you to control tracing and set probe points in running processes or even the kernel itself. If you are familiar with debugging tools, such as sdb, then you may want to have a play with prex to see what it can offer.


    Solaris Operating Environment Boot Camp
    ISBN: 0130342874
    Year: 2002
    Pages: 301
