What to do when your system has crashed
Depending on the cause of the system crash, the system may not have been able to reboot itself successfully. Cases where this would be true include:
- Catastrophic hardware failure, such as faulty memory or a crashed disk
- Major kernel configuration faults, such as a buggy device driver
- Major kernel tuning errors, such as maxusers being much too big
- Data corruption including corruption of the operating system files
- Manual intervention is needed, for example, fsck needing answers to its queries
Was the system recently tuned?
If you just tuned your system, tried to reboot under the new kernel, and the system panicked, you already have a good idea where to start your search for the cause of the panic. If you named your new, untested kernel /vmunix on your Solaris 1 system, or if you directly edited /etc/system on your Solaris 2 system, you will most likely find the system in an endless boot and panic loop. Rebooting the "generic" kernel for Solaris 1 will get the system back up. For a Solaris 2 system in this scenario, you can use boot -a and choose /dev/null as your /etc/system file to return to a generic kernel.
When tuning systems and testing the new kernel changes, it's a good idea not to use /vmunix or /etc/system until you know the changes are good. Instead, use /vmunix.test or /etc/system.test, for example. That way, should the system panic, at least the system will have a better chance of coming back up under a known good kernel. This is particularly sound advice if you are planning on going on vacation right after tuning a new kernel and booting it up.
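The staging advice above can be sketched as a small shell helper. This is only an illustration, not a vendor-supplied procedure: the function name stage_system_file is made up for this example, and the refusal to overwrite an existing test copy is a design choice of the sketch rather than anything the system requires.

```shell
# stage_system_file SRC DEST
# Copy the live tuning file SRC to a test copy DEST, so that edits are
# made to the copy and the known good file stays untouched.
# (Hypothetical helper name; not a standard Solaris command.)
stage_system_file() {
    src=$1
    dest=$2
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Refuse to clobber an earlier test copy that may hold unsaved work.
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # -p preserves the mode and timestamps of the original file.
    cp -p "$src" "$dest"
}

# Intended use on a Solaris 2 system, as root:
#   stage_system_file /etc/system /etc/system.test
# Then edit /etc/system.test only, and name that file when boot -a
# asks which system file to use.
```

Because the default /etc/system is never modified, an ordinary reboot after a panic still comes up with the known good settings.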
Has anything else changed recently?
If the system had been running beautifully for the past year, suddenly died, and now won't come back up, you will need to read the messages that appear during the boot attempts. Look for messages that might point to hardware trouble. It would be a good idea to check all of the cables for proper connections. Also, make sure all the disk drives and other peripherals are still getting power. If everything seems to be in order, attempt to run diagnostics on the hardware.
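If you have been able to capture the boot messages to a file, for instance from a terminal attached to the console, a simple search can help pick out lines worth a closer look. The sketch below assumes such a capture exists. The function name scan_boot_log and the keyword list are illustrative guesses, not a complete catalog of hardware error messages.

```shell
# scan_boot_log FILE
# Print, with line numbers, every line of FILE that mentions one of a
# few hardware-related keywords. The search ignores case.
# (Hypothetical helper; the keyword list is an assumption, extend it
# to suit your own hardware.)
scan_boot_log() {
    grep -n -i -E 'parity|scsi|timeout|retry|i/o error|offline|memory error' "$1"
}

# Intended use:
#   scan_boot_log /path/to/captured/console.log
```

A match is only a hint about where to look. A clean result does not prove the hardware is healthy, so the cable, power, and diagnostic checks still apply.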
On occasion, systems demonstrate sensitivity to their environment. With a workstation sitting on your desk next to your plants and your coffee mug, it's sometimes easy to forget that computers are ultrasensitive electronic devices. Always remember:
- Proper air flow is required for cooling the electronic components.
- If the environment is much too hot for you, it is probably also too hot for your computer. Power down your computer equipment if you expect the air-cooling systems in your area to be shut down.
- Unless protected by an Uninterruptible Power Supply (UPS), your system can suffer damage during electrical storms and interruptions of power.
- Dirt and dust inside some computers can lead to problems over time. Discuss with your vendor whether Preventative Maintenance visits are recommended.
- Unless a system is designed to ruggedized standards, it can be damaged by high vibration and excessive movement.
- Power down all components of the system whenever you need to do hardware repairs, replacements, or rearrangements. Don't, for example, change SCSI devices while the system is running.
- Electrostatic discharge will easily damage your computer. Never touch or let anyone else touch the internal workings of your system without proper ESD protection.