


Some Tools to Become Familiar With

Most systems and devices have some form of commands or interfaces that can be used in troubleshooting. We will take a quick look at a few of them to help you better understand how the use of these additional tools can help resolve problems more rapidly. Many of these commands have equivalent commands on other operating systems.

Following are some of the basic operating system commands:

  • truss(1) - Trace system calls and signals on Sun Solaris

  • tusc - Trace system calls and signals on HP systems

  • par - Trace system calls and signals on Silicon Graphics systems

These commands show what is happening within a given process, which makes them useful when the backup application appears to be hung. By using truss or one of its equivalents, you can determine whether the operating system's access to a particular file on a filesystem is what causes the hang. You can then confirm this by reading the file manually with some other operating system command. In one case, reading the file with the system command od showed that it had an incorrect symbolic link to it.
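Once a trace points at a single path, the link chain behind that path can be checked directly. The following is a minimal Python sketch, not part of any backup product: the function name follow_links and the max_hops limit are our own assumptions, and it only inspects the final path component, not symbolic links in the parent directories.

```python
import os

def follow_links(path, max_hops=40):
    """Follow a symbolic link chain one hop at a time.

    Returns (status, chain), where status is one of
    'resolved', 'dangling', 'loop', or 'too_deep'.
    max_hops is an arbitrary safety limit chosen for this sketch.
    """
    chain = [os.path.abspath(path)]
    seen = {chain[0]}
    while os.path.islink(chain[-1]):
        if len(chain) > max_hops:
            return "too_deep", chain
        target = os.readlink(chain[-1])
        # A relative target is interpreted relative to the link's own directory.
        nxt = os.path.normpath(os.path.join(os.path.dirname(chain[-1]), target))
        chain.append(nxt)
        if nxt in seen:
            # We came back to a path already visited: a circular link.
            return "loop", chain
        seen.add(nxt)
    # The chain ended on something that is not a link; check that it exists.
    status = "resolved" if os.path.exists(chain[-1]) else "dangling"
    return status, chain
```

A status of 'loop' or 'dangling' for the file the process was last seen opening is a strong hint that the link, not the application, needs to be corrected.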

In another case, the od command was used to show that accessing a file on a drive caused the drive to stop responding. This type of troubleshooting made clear that the problem was not with the application, which was simply waiting for the operating system to return the requested data. The corrective action therefore had to be taken within the operating system.
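The same manual read can be scripted so that it does not tie up your own session if the drive never answers. The sketch below is one possible approach in Python, under our own assumptions: the name read_with_timeout and the 30-second default are illustrative, and the read is done in a child process so that the parent can give up on it.

```python
import subprocess
import sys

# Small program run in a child process: read the file sequentially to the end.
READER = (
    "import sys\n"
    "with open(sys.argv[1], 'rb') as f:\n"
    "    while f.read(1 << 20):\n"
    "        pass\n"
)

def read_with_timeout(path, timeout_s=30.0):
    """Return 'ok', 'error', or 'timeout' for a full sequential read of path."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", READER, path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # Caveat: if the child is stuck in an uninterruptible device wait,
        # the operating system may not be able to kill it, and this call
        # itself may not return promptly.
        return "timeout"
    return "ok" if result.returncode == 0 else "error"
```

A 'timeout' result for a file that a different reader also cannot finish supports the conclusion that the operating system or the drive, not the backup application, is where the read stalls.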

In the second example, the system logs should show the failure as a timeout posted by the drive. With the incorrect symbolic link in the first example, however, there were no errors: the system was happily spending its time trying to determine the endpoint of a circular link, so the application appeared to be hung when it tried to back up that particular file. The od command can also be used to read the data on a tape manually to help determine whether the backup utility is writing the expected headers and data to the tape.
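To make the idea of inspecting the first bytes of a file or tape concrete, here is a small Python sketch that produces od-like output (octal offset, hexadecimal bytes, printable characters). It is an illustration only, not a replacement for od: the function name dump and its parameters are assumptions of ours, and the layout only approximates what od prints.

```python
def dump(path, count=64, width=16, block_size=65536):
    """Return od-like lines for the first `count` bytes of `path`.

    block_size is the size of the single read request. Tape devices
    typically require a read at least as large as the block on the tape,
    so the default here is a guess that may need adjusting.
    """
    # buffering=0 issues one raw read instead of Python's buffered reads.
    with open(path, "rb", buffering=0) as f:
        data = f.read(block_size)[:count]
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as a dot.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:07o}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines
```

Calling dump on a no-rewind tape device path (the exact device name depends on the operating system) and comparing the result with the header layout documented by the backup vendor is one way to check what was actually written.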

While rare, it is possible for the backup application to cause a system to panic or crash, either directly or indirectly. In some instances, this is caused by interoperability problems between different vendors' hardware or firmware versions. In these cases, all the vendors involved must analyze the problem before its cause can be determined. By using commands such as crash, you can sometimes determine whether the problem is repetitive, which may signify a coding problem, or random, which might point to some type of hardware problem. It is not our intention to teach you system dump analysis; however, we want to make you aware of some of the tools that are available.

You also need to become familiar with the utilities that are incorporated in some peripheral devices, including those that are part of the SAN, such as switches, bridges, and routers. Although the terms bridge and router are sometimes used interchangeably, in the networking world a fine distinction may be made. A bridge allows a change of medium between devices. For example, the Chaparral FS1310 allows connections between SCSI devices and fiber networks; it is a bridge between the two technologies. With a router, there may be no transformation of medium.

No matter what you call the devices involved, failures at this level may result in the backup application reporting backup errors. Keep in mind that the backup application will probably be the first place where the error becomes visible to the user. Most devices on the SAN or network also run what one could consider a small operating system that is responsible for a particular type of activity. These devices usually provide a small command set that you can use to help determine the cause of a failure. The command set can show connections, errors, firmware levels, and so on, and can sometimes report the status of connections and the health of the physical medium, which helps you determine where a failure is happening.






Implementing Backup and Recovery: The Readiness Guide for the Enterprise
ISBN: 0471227145
Year: 2005
Pages: 176
