We've found a deadlock situation involving 12 threads. On a system that may be running hundreds of threads, this doesn't necessarily hang the whole system. It just hangs those 12 threads. What we have to ask now is this: "Did the deadlock involve processes that gave the system a 'hung' feeling from the end users' point of view?" To find that out, we will check which processes were involved. Using offsets into the thread structure and the proc structure, we will take advantage of adb's ability to follow pointers to pull the commands for the 12 threads involved.

*(e18c9ec0+a0)+260/s
p0+0x260: sched
*(e1bd8ec0+a0)+260/s
p0+0x260: sched
*(f5ab7400+a0)+260/s
0xf620ca60: /bin/sh /local/bin/abc.sh dragon
*(f5a3ce00+a0)+260/s
0xf5d2fa60: /usr/lib/lpsched
*(f6756000+a0)+260/s
0xf66b2260: abc_job.x -s multiprocessor -g gateway1,gateway2,gateway3
*(f57a1c00+a0)+260/s
modlexec+0x7f3c: /usr/lib/saf/sac -t 300
*(e1b6bec0+a0)+260/s
p0+0x260: sched
*(e18d2ec0+a0)+260/s
p0+0x260: sched
*(f6226e00+a0)+260/s
0xf5ae1260: abc_daemon -e /export/dragon/abc_files

The next three are the deadlocked processes:

*(e190cec0+a0)+260/s
p0+0x260: sched
*(f600ec00+a0)+260/s
0xf626e260: abc_printer
*(f66d2800+a0)+260/s
0xf5d43a60: abc_job.x -s multiprocessor -g gateway1,gateway2,gateway3

The processes involved in the "hang" included a lot of third-party software. Also, it was surprising to see the kernel (always shown as the command sched) involved so many times. So, on a hunch, we decided to dig a bit more.
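For readers less familiar with adb expressions, the lookup used for each thread, *(thread+a0)+260/s, can be read as three steps: add 0xa0 to the thread address, dereference the pointer stored there to reach the proc structure, then print the string found 0x260 bytes into that structure. The Python sketch below imitates those steps against a toy memory image. It is only an illustration: SparseMemory and command_for_thread are hypothetical names, 32-bit big-endian pointers are an assumption, and the two offsets are simply the ones used in the adb commands for this particular dump, so they should not be expected to hold for other kernel builds.

```python
import struct

# Offsets taken from the adb commands in the text; specific to this dump.
THREAD_PROC_OFFSET = 0xA0   # where the thread structure holds its proc pointer
PROC_ARGS_OFFSET = 0x260    # where the proc structure holds the command string


class SparseMemory:
    """Toy stand-in for a crash dump: maps start addresses to byte blobs."""

    def __init__(self):
        self.regions = {}

    def write(self, addr, data):
        self.regions[addr] = bytes(data)

    def read(self, addr, length):
        # Find the region that fully contains the requested range.
        for start, blob in self.regions.items():
            if start <= addr and addr + length <= start + len(blob):
                offset = addr - start
                return blob[offset:offset + length]
        raise KeyError(hex(addr))

    def read_ptr(self, addr):
        # Assumption: 32-bit big-endian pointers.
        return struct.unpack(">I", self.read(addr, 4))[0]

    def read_cstring(self, addr, limit=80):
        # Read bytes up to the first NUL, like adb's /s format.
        out = bytearray()
        for i in range(limit):
            byte = self.read(addr + i, 1)
            if byte == b"\0":
                break
            out += byte
        return out.decode("ascii", "replace")


def command_for_thread(mem, thread_addr):
    """Equivalent of the adb expression *(thread_addr+a0)+260/s."""
    proc_addr = mem.read_ptr(thread_addr + THREAD_PROC_OFFSET)
    return mem.read_cstring(proc_addr + PROC_ARGS_OFFSET)


# Rebuild one lookup from the text: thread f5a3ce00, string at 0xf5d2fa60.
mem = SparseMemory()
proc_addr = 0xF5D2FA60 - PROC_ARGS_OFFSET
mem.write(0xF5A3CE00 + THREAD_PROC_OFFSET, struct.pack(">I", proc_addr))
mem.write(0xF5D2FA60, b"/usr/lib/lpsched\0")

print(command_for_thread(mem, 0xF5A3CE00))
```

The sample data reproduces the lpsched entry from the session above: the string address adb printed, 0xf5d2fa60, minus 0x260 gives the proc structure address that the thread's pointer must hold. The same reasoning explains the p0+0x260 lines, where the pointer leads to p0, the proc structure adb shows for the sched command.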