Look for patterns


When analyzing a series of UNIX system crash dumps from one machine, we look for patterns. These four crash dumps are strikingly similar; in fact, they are about as nearly identical as crashes can get. The most obvious similarity is the failure to access block 134678 on disk id010e. However, there is another very strong similarity. Did you notice it?

Look at the crash times. All four crashes occurred between 3:21 and 3:31 in the morning. They occurred on four different days of the week: Saturday, Wednesday, Monday, and Sunday. What does that make you think about?
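
With only four dumps the clustering is easy to spot by eye, but the same check can be done mechanically. The following is a minimal sketch, assuming the crash times have already been pulled out of the dumps as HH:MM strings. The function name crash_window and the sample times are hypothetical; the values are illustrative and are not the actual times recorded in these dumps.

```shell
#!/bin/sh
# crash_window (hypothetical helper): read HH:MM times on stdin, one per
# line, and print the earliest time, the latest time, and the spread in
# minutes. Assumes the times do not wrap around midnight.
crash_window() {
    awk -F: '
        {
            m = $1 * 60 + $2                  # minutes since midnight
            if (NR == 1 || m < min) min = m
            if (NR == 1 || m > max) max = m
        }
        END {
            if (NR == 0) exit 1               # no input, nothing to report
            printf "%02d:%02d %02d:%02d %d\n",
                int(min / 60), min % 60, int(max / 60), max % 60, max - min
        }'
}

# Illustrative values only, chosen to fall inside the 3:21 to 3:31 window.
window=$(printf '%s\n' 03:21 03:27 03:24 03:31 | crash_window)
echo "$window"    # prints: 03:21 03:31 10
```

A spread of a few minutes across crashes on different days of the week is the kind of result that points toward something scheduled rather than something random.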

Good old cron jobs, of course. A quick run of ps on any of the crash dumps reveals that the following command was being run at the time.

 find / -name .nfs* -mtime +7 -exec rm -f {} ; -o -fstype nfs -prune 
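
Note that ps shows the argument list after the shell has already processed it, which is why no quoting appears above. When the command is typed at a shell prompt, the .nfs* pattern and the semicolon that ends -exec both have to be protected from the shell. The following is a cautious sketch of that quoting, not the machine's actual cron entry: it runs against a scratch directory instead of /, substitutes echo for rm -f so nothing is deleted, and leaves out the -fstype nfs -prune clause. The file names are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Dry run in a scratch directory (assumes mktemp -d exists on this system).
scratch=$(mktemp -d)
touch -t 200001010000 "$scratch/.nfs0001"    # old leftover, mtime far past 7 days
touch "$scratch/.nfs0002"                    # fresh file, should not match
touch -t 200001010000 "$scratch/notes.txt"   # old, but the name does not match

# Quote the pattern so the shell passes it to find unexpanded, and escape
# the ';' that terminates -exec. echo stands in for "rm -f" here.
matches=$(find "$scratch" -name '.nfs*' -mtime +7 -exec echo {} \;)
echo "$matches"

rm -rf "$scratch"
```

Only the old .nfs file is reported, which is the set of files the real command would have removed.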

Another interesting note is that the system seems to have had trouble rebooting on at least two occasions. Look again at the time stamps on the crash dump files themselves. As you know, savecore creates the postmortem files during the execution of /etc/rc.local; in other words, when the system is nearing completion of the boot-up process.

It appears that crashes 12 and 13 didn't get saved until someone came into the office.



PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)
ISBN: 0131493868
Year: 1994
Pages: 289
Authors: Chris Drake
