A stable system is able to do the job it is asked to do. An unstable system uses resources unnecessarily and may cause problems for other systems. It may also unexpectedly go offline, crash, or become unrecoverable. You should look for any evidence of instability and correct it. Availability is part of the security domain, and external denial-of-service attacks are not the only threat to continued operation. Two tasks are necessary:
Validate hardware operation.
Ensure that power is stable.
A transport company spent four days and over $35,000 installing a new firewall. Staff had been complaining about poor server performance. The administrator noticed that base system load had gone up dramatically following a major virus problem that affected all MS Windows desktop systems. During times of poor Linux server performance there was no notable network traffic. The assumption was made that the Linux system might have been infected by a virus or compromised by an intruder over the Internet connection. The administrator was instructed to put in a new firewall and then sort out the server. The real problem was not a virus or a compromise; it was a hardware malfunction. If the administrator had looked for hardware failure first, a great deal of frustration, expense, and loss of productivity could have been avoided.
Most hardware failures are easy to detect. Monitor, keyboard, mouse, and serial port failures are generally very obvious. Failures in storage media will almost certainly generate errors in the system log file. To see whether there may be a storage media problem, execute the following:
[root@linux /] # grep error /var/log/messages
A telltale hardware fault will produce an error message such as
ide: sector buffer error
ide: I/O error, dev XX:xx (hda), ...
It is not uncommon to find error messages pertaining to CD-ROM and DVD drive use. These can be treated symptomatically.
Hard disk storage-related errors must never be ignored. If a hard disk is defective, replace it. It never pays to gamble with file storage hardware integrity.
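The log check described above can be wrapped in a few small shell functions so that it is easy to repeat. The following is only a sketch: the function names are made up for illustration, and the search patterns are an assumption based on the sample messages shown earlier, so adjust them to the messages your own storage drivers produce.

```shell
#!/bin/sh
# Sketch: summarize storage-related error lines in a syslog-style file.
# The patterns are assumptions modeled on the sample messages above;
# they are not a complete list of what a kernel may log.

# Print every line that looks like a storage error (case-insensitive).
storage_errors() {
    logfile=$1
    grep -iE 'I/O error|sector buffer error' "$logfile"
}

# Print the number of matching lines. grep -c prints 0 but exits
# non-zero when nothing matches, so the exit status is normalized.
count_storage_errors() {
    logfile=$1
    grep -ciE 'I/O error|sector buffer error' "$logfile" || true
}

# Print "device count" for each device named in parentheses,
# for example the "hda" in "(hda)".
errors_by_device() {
    logfile=$1
    storage_errors "$logfile" |
        sed -n 's/.*(\([a-z][a-z0-9]*\)).*/\1/p' |
        sort | uniq -c | awk '{print $2, $1}'
}
```

After sourcing these functions as root, errors_by_device /var/log/messages lists each device that appears in a matching line along with the number of hits. Any hard disk on that list deserves immediate attention.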
Unstable power sources pose a great risk to data integrity. File system data can be damaged by power fluctuations, spikes, and surges. Brownouts and blackouts can also damage hardware. One fact is often overlooked: most damage to hardware is incurred as power is restored following a brownout or a blackout.
When the power supply is interrupted briefly, the computer may show no immediate symptoms. A switch-mode power supply holds a reserve of stored energy while it is in normal use. Depending on the design of the power unit, this stored energy may take a few milliseconds to dissipate fully after an unplanned interruption to the supply. When power is interrupted and then restored before the power unit has lost all of its stored energy, the condition known as a brownout has occurred. The computer generally does not need to be rebooted following a brownout.
A complete power loss, known as a blackout, means that the system will have to be rebooted. When power is restored to the electrical supply grid following a brownout or a blackout, there may be a sag or a surge in voltage as equipment comes back into service or recovers from the loss. The use of power conditioning equipment, such as an uninterruptible power supply (UPS), is essential to protect computer equipment from exposure to such events.
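The distinction drawn above between a brownout and a blackout comes down to one comparison: did power return before the power unit ran out of stored energy? A minimal sketch of that reasoning follows. The function names are hypothetical, and the hold-up time (how long the power unit can carry the load on stored energy alone) is an assumed input that you would take from the power supply's specifications.

```shell
#!/bin/sh
# Sketch only: applies the brownout/blackout definitions used in this
# section. Both arguments are whole milliseconds. The hold-up time is
# an assumption supplied by the caller, not a measured value.

# classify_interruption OUTAGE_MS HOLDUP_MS
# Prints "brownout" if power returned before the stored energy ran
# out, and "blackout" otherwise.
classify_interruption() {
    outage_ms=$1
    holdup_ms=$2
    if [ "$outage_ms" -lt "$holdup_ms" ]; then
        echo brownout
    else
        echo blackout
    fi
}

# needs_reboot OUTAGE_MS HOLDUP_MS
# Succeeds (exit status 0) only when the interruption was a blackout.
needs_reboot() {
    [ "$(classify_interruption "$1" "$2")" = blackout ]
}
```

For example, with a hypothetical hold-up time of 16 ms, classify_interruption 5 16 prints brownout, while classify_interruption 500 16 prints blackout. In either case the voltage sag or surge that may accompany the return of power is still a risk to the hardware.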
Table 1-1 lists the most common hardware failures.
Hard drives, monitors
Power sags and surges
Motherboards and peripherals
Serial ports and network interfaces
Power spikes and lightning strikes
UPSs come in many different types. The basic types sold today include:
Power conditioning (filters)
Failover battery backup
Always-on
Failover battery backup UPSs generally provide a filtered mains supply. When the mains supply fails, a battery-operated inverter cuts in to provide power continuity. On the whole, failover UPSs must experience an interruption in supply before cutting over. As a result, symptoms equivalent to a brownout pass through to the computer system.
Always-on UPSs supply the computer equipment with power that is generated by an inverter. This generally results in the best quality of power that can be obtained. The inverter will run off rectified mains power with an active online battery that operates in parallel. When the grid power supply fails, the battery continues to provide power to the inverter. When power is restored, the battery is simply recharged while online.
Uninterruptible power supply technology is a specialist art. Make sure you obtain sound advice in selecting the right type of UPS for your installation. Also, be certain to follow the manufacturer's advice regarding planned maintenance for the UPS.