What to do when your system has crashed


Depending on the cause of the system crash, the system may not have been able to reboot itself successfully. Cases where this would be true include:

  • Catastrophic hardware failure, such as faulty memory or a crashed disk

  • Major kernel configuration faults, such as a buggy device driver

  • Major kernel tuning errors, such as maxusers being much too big

  • Data corruption, including corruption of the operating system files

  • A need for manual intervention, for example, fsck needing answers to its queries

Was the system recently tuned?

If you just tuned your system, tried to reboot under the new kernel, and the system panicked, you already have a good idea where to start your search for the cause of the panic. If you named your new, untested kernel /vmunix on your Solaris 1 system or if you directly edited /etc/system on your Solaris 2 system, you will most likely find the system in an endless boot and panic loop. Rebooting the "generic" kernel for Solaris 1 will get the system back up. For a Solaris 2 system in this scenario, you can use boot -a and choose /dev/null as your /etc/system file to return to a generic kernel.

When tuning systems and testing the new kernel changes, it's a good idea not to use /vmunix or /etc/system until you know the changes are good. Instead, use /vmunix.test or /etc/system.test, for example. That way, should the system panic, it will at least have a better chance of coming back up under a known good kernel. This is particularly sound advice if you are planning on going on vacation right after tuning a new kernel and booting it up.
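Before booting a trial file such as /etc/system.test, it can help to list exactly which tunables differ from the known good /etc/system. The following is a minimal Python sketch of that comparison, not a tool from the book. It assumes only the common "set name=value" form and treats lines starting with * or # as comments; the function names and the sample values are hypothetical and chosen purely for illustration.

```python
def read_tunables(text):
    """Return {name: value} for each 'set name=value' line in /etc/system-style text."""
    tunables = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed to start with '*' or '#').
        if not line or line[0] in "*#":
            continue
        # Only 'set' directives are examined; other directives are ignored here.
        if line.startswith(("set ", "set\t")):
            name, sep, value = line[3:].partition("=")
            if sep:
                tunables[name.strip()] = value.strip()
    return tunables


def changed_tunables(current_text, test_text):
    """List (name, old, new) for tunables that differ between the two texts."""
    old = read_tunables(current_text)
    new = read_tunables(test_text)
    return [
        (name, old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    ]


if __name__ == "__main__":
    # Hypothetical file contents; in practice, read the two files from disk.
    current = "* known good settings\nset maxusers=64\n"
    candidate = "* trial settings\nset maxusers=2048\n"
    for name, before, after in changed_tunables(current, candidate):
        print(f"{name}: {before} -> {after}")
```

A value of None in the output means the tunable is present in only one of the two files. A short list of differences gives you a starting point if the trial kernel panics.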

Has anything else changed recently?

If the system had been running beautifully for the past year, suddenly died, and now won't come back up, you will need to read the messages that appear during the boot attempts. Look for messages that might point to hardware trouble. It would be a good idea to check all of the cables for proper connections. Also, make sure all the disk drives and other peripherals are still getting power. If everything seems to be in order, attempt to run diagnostics on the hardware.

On occasion, systems demonstrate sensitivity to their environment. With a workstation sitting on your desk next to your plants and your coffee mug, it's sometimes easy to forget that computers are ultrasensitive electronic devices. Always remember:

  • Proper air flow is required for cooling the electronic components.

  • If the environment is much too hot for you, it is probably also too hot for your computer. Power down your computer equipment if you expect the air-cooling systems in your area to be shut down.

  • Unless protected by an Uninterruptible Power Supply (UPS), your system can suffer damage during electrical storms and interruptions of power.

  • Dirt and dust inside some computers can lead to problems over time. Discuss with your vendor whether preventive maintenance visits are recommended.

  • Unless a system is designed to ruggedized standards, it can be damaged by high vibration and excessive movement.

  • Power down all components of the system whenever you need to do hardware repairs, replacements, or rearrangements. Don't, for example, change SCSI devices while the system is running.

  • Electrostatic discharge will easily damage your computer. Never touch or let anyone else touch the internal workings of your system without proper ESD protection.



PANIC! UNIX System Crash Dump Analysis Handbook (Bk/CD-ROM)
ISBN: 0131493868
Year: 1994
Pages: 289
Authors: Chris Drake
