Self-Monitoring Analysis and Reporting Technology (SMART)




Compaq Computer Corporation (Houston, Texas) took this predictive failure idea and, in conjunction with other major manufacturers, developed the Self-Monitoring Analysis and Reporting Technology, or SMART as it is commonly known. With SMART, the controller takes data from sensors and provides that data, upon request, to the BIOS, operating system, or other software designed to monitor drives. The exact items monitored vary from one manufacturer to another but can include such things as head flying height (predicts future head crashes), disk spin-up time and temperature (disk motor problems), or the number of errors corrected. In total, SMART can monitor up to 30 different indicators. It rates each of these items on a scale of 0 to 255. Separate standards are available for ATA (IDE) and SCSI drives.

Even though system administrators have a vital need for such a tool, the solution turns out to be not quite as smart as it could be and unfortunately may have somewhat limited real-world usefulness. Why? When the hardware manufacturers established SMART, they all agreed on the data format, but it appears that they could not agree on much else. Every manufacturer, it turns out, decides which elements to monitor and report and what thresholds are meaningful. This means that end users can neither monitor their own drives directly nor establish which thresholds should be set to ensure reliability.

The SMART protocol, in fact, defines 30 attributes to be monitored. Each attribute is placed on a scale between 0 and 255. When the monitored value goes too high or too low, warnings are issued. Each manufacturer can pick and choose which attributes suit them (some use only a handful, others use most of them), and each manufacturer decides which thresholds to set. The worst part of all this for users is that they have no idea which manufacturer monitors what attributes or how the thresholds have been set. If a user wants to change these settings, too bad; they are not available. Users, then, really do not know what is happening on their disks. For example, is SMART monitoring head flying height, the number of remapped sectors, ECC use and error counts, spin-up time, temperature, data throughput, or any of the other 24 possible attributes? As noted, it is up to the manufacturer to program in what is considered to be an acceptable range for any or all of the above characteristics, depending on what the manufacturer regards as threatening.
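The attribute-and-threshold scheme described above can be illustrated with a minimal Python sketch. It is not any manufacturer's implementation: the attribute names, values, and thresholds are hypothetical, and the code assumes one common convention in which a lower value on the 0-to-255 scale is worse, so a warning is raised when the value falls to or below the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SmartAttribute:
    name: str       # illustrative label, not a vendor-defined identifier
    value: int      # current rating on the 0-255 scale
    threshold: int  # limit chosen by the manufacturer (hypothetical here)


def tripped(attr: SmartAttribute) -> bool:
    """Return True when the attribute has reached its threshold.

    Assumption: lower ratings are worse, so value <= threshold is a warning.
    """
    for number in (attr.value, attr.threshold):
        if not 0 <= number <= 255:
            raise ValueError(f"{attr.name}: {number} is outside the 0-255 scale")
    return attr.value <= attr.threshold


def tripped_names(attrs: Iterable[SmartAttribute]) -> List[str]:
    """List the names of all attributes that would raise a warning."""
    return [attr.name for attr in attrs if tripped(attr)]


# Hypothetical readings for one drive; real drives report different sets.
SAMPLE = [
    SmartAttribute("spin_up_time", 97, 21),
    SmartAttribute("remapped_sectors", 30, 36),
    SmartAttribute("temperature", 110, 0),
]

if __name__ == "__main__":
    print(tripped_names(SAMPLE))
```

Because the set of attributes and every threshold are chosen by the manufacturer, the same physical condition could trip a warning on one drive and pass silently on another, which is the user's difficulty described above.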

This situation can be verified by studying the fine print of the various industry pronouncements on SMART. According to an American Megatrends, Inc., white paper, for instance, no particular standard is associated with SMART or SMART devices. Just as different manufacturers' disk drives have different designs and architectures, the same is true when it comes to implementing SMART on those drives, and the degree of monitoring varies from manufacturer to manufacturer. No evidence exists, of course, to support the claim that manufacturers intentionally set SMART to make their own drives look good, nor does evidence suggest that the manufacturers are more interested in drives lasting out the warranty period than in providing a truly useful monitoring system. Anyone saying so would have a difficult time proving it; yet, some believe this to be the case.

Steve Gibson of Gibson Research Corporation (Laguna Hills, California) states on his Web site that the idea of SMART may sound cool, but drives do not like to "tattle on themselves" because that would make them look bad. As a result, it appears that, because each manufacturer decides what their drives will report about themselves and because those manufacturers are competing with each other, the SMART system has turned out to be rather dumb.

The fact that each manufacturer decides on its own standards does not bode well for buyers interested in finding out when their drives have a problem. Suppose a disk is operating at 90 percent but will not completely fail until after the warranty has expired; it is likely that the consumer will not be alerted to the problem, particularly if the problem becomes observable only after the warranty has expired. From an economic standpoint, this makes sense for the manufacturers, whether or not any of them have actually utilized SMART in this fashion. But, it would seem that in some situations, manufacturers may be acting against their own interests when warning users about potential or inevitable reductions in performance.

In the defense of disk manufacturers, it is easy to see why they have taken this approach. Seagate, for example, makes a disk analysis tool available to people to test drives before returning them. In many cases, the company finds that returned disks are actually working fine and that users returned them due to some other problem. Seagate reports that 40 percent of the disks returned to the company do not have a hardware problem. Its SeaTools diagnostic software (www.seagate.com) runs a series of tests on the drive and, when the disk is not at fault, offers advice on sorting out errors arising with the operating system, file system, or partitioning. It is available for download in either a desktop or enterprise edition. IBM similarly offers Windows and Linux versions of its free Drive Fitness Test 2.30 and several other disk utilities at www.storage.ibm.com.

So, faced with this tendency of users to attach blame to the wrong component, who can really blame manufacturers for not wanting to advertise potential problems, particularly those that are not noticeable and may never seriously affect disk performance? It could be a recipe for disaster in a business that already has the slimmest of margins.

Getting back to the main point, the real problem with SMART is that it does not appear to be reliable. It is common for drives to fail without SMART giving any type of warning; yet, having SMART can lull administrators into a false sense of security, thinking they do not have to monitor disk status because SMART is supposedly minding the shop. This may even lead to more unpredicted disk failures than would occur without SMART because at least then administrators would know they needed to pay more attention to disk health and possibly seek out an appropriate tool for effective monitoring. Sadly, though, most system administrators remember the hype that surrounded SMART a few years ago. Vendors boldly stated that hard drive failures were now a thing of the past and that, with SMART-enabled drives, most kinds of failure would be spotted in advance, thus allowing IT personnel enough time to replace the drives before any catastrophic data loss occurred.

Anecdotal evidence from users, IT consultants, and IT managers, as well as data from the disk vendors themselves, points to a general lack of results from SMART. This failure stems from many factors, including:

  • Failure to install SMART software to pick up alerts from drives

  • Lack of interoperability of SMART data among drive manufacturers

  • So many different implementations of SMART by manufacturers that users are never really sure what kind of alerts they might receive

  • Perhaps most surprisingly, shortcomings in the technology itself

How Often Do Disks Fail?

How often do disks fail? Hard drive failures seem to be growing in frequency, possibly related to factors such as the size of modern drives and their utilization in so many mission-critical activities. This is supported by a recent survey by San Jose, California-based Survey.com, which found that a majority of 1,293 IT staff surveyed had experienced computer downtime in the previous year due to disk drive failure. An even more worrying finding of the survey, though, was that fewer than 10 percent actually received any prior warning (Exhibit 4). This brings to light an important point. One oft-forgotten aspect of SMART is that it requires three elements to succeed:

  • SMART-capable hard drive

  • Operating system that accepts and relays SMART commands

  • SMART application that displays these commands to the user

Exhibit 4: Hard Drive Failures Experienced in Past Year


These days, the first two are often present, but the third is either omitted or inactive. Such has been the hype behind SMART, in fact, that many users and even a fair number of system managers think that just buying SMART drives is enough. They do not bother to read the fine print and even discard the CD that comes in the box. Many have failed to appreciate that a SMART drive itself does not generate any SMART alerts. The user must use software to check drive status for it to have any value. An IT manager in a Los Angeles legal firm, for example, experienced the unthinkable on RAID-5: two disks going down. He failed to check his alert lights and missed the first drive going down. In hindsight, he believes it had been down for weeks before the second disk died, as he had noticed a slight dip in performance. Afterwards, when he wondered why SMART had not warned him, he was embarrassed to realize that he had not configured the software required to receive the alerts.
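The point that a drive raises no alert unless software asks for its status can be sketched as a small polling routine. This is an illustration under stated assumptions, not a real monitoring product: `poll_drives`, `read_health`, and the drive names are hypothetical, and the stand-in reader below simply looks up canned answers where a real tool would query the drive through the operating system.

```python
from typing import Callable, Dict, Iterable, List


def poll_drives(drives: Iterable[str],
                read_health: Callable[[str], bool]) -> List[str]:
    """Ask every drive for its status and collect alert messages.

    read_health is a hypothetical callable that returns True for a healthy
    drive and raises OSError when the drive cannot be queried.
    """
    alerts = []
    for drive in drives:
        try:
            healthy = read_health(drive)
        except OSError as exc:
            # A drive that cannot be read is itself worth an alert.
            alerts.append(f"{drive}: status unavailable ({exc})")
            continue
        if not healthy:
            alerts.append(f"{drive}: drive reports a predicted failure")
    return alerts


# Canned answers standing in for real drive queries (assumed data).
FAKE_STATUS: Dict[str, bool] = {"disk0": True, "disk1": False}


def fake_read_health(drive: str) -> bool:
    if drive not in FAKE_STATUS:
        raise OSError("no SMART response")
    return FAKE_STATUS[drive]


if __name__ == "__main__":
    for line in poll_drives(["disk0", "disk1", "disk2"], fake_read_health):
        print(line)
```

In this sketch nothing is reported unless `poll_drives` is actually run, for example on a schedule. That mirrors the situation of the IT manager above, whose drives may well have been reporting trouble that no software was configured to collect.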

Not So SMART

Engineers at the Burbank, California-based disk management vendor Executive Software, Inc., also ran into trouble with the SMART technology when they tried to build a graphical user interface (GUI) to make full use of SMART. They created fail conditions for over 100 hard drives using a variety of methods. According to the director of research, even with disks being horribly mistreated and about to fail, not once did they receive a SMART alert.

Steve Speregen, MCP, MCSE, a network consultant who runs the LA Windows Networking Users Group, has worked with virtually every type of system and drive over the past decade. During that time he has experienced or witnessed over 100 disk failures. As he sees it, SMART is all smoke and mirrors, and he has yet to see it work. He reports that he put in all the settings and used special motherboards but not once was he notified of any errors. In his view, the newer SMART II is not any better.

Even some drive manufacturers will admit that SMART is not all they thought it would be. Maxtor Corporation's Web site provides a SMART Phase II white paper that states that SMART technology is only capable of predicting 20 to 50 percent of hard drive failures with sufficient time to allow a user or system manager to respond. This paper also states that one of the best kept secrets is that SMART is only an advisory service at best.
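To put the 20 to 50 percent prediction figure in practical terms, the short calculation below estimates how many failures would arrive with no usable warning. It is simple arithmetic on the percentages quoted above; the function name and the failure count of 100 are illustrative assumptions.

```python
from typing import Tuple


def unwarned_range(failures: int,
                   low_pct: int = 20,
                   high_pct: int = 50) -> Tuple[int, int]:
    """Return (fewest, most) failures expected to arrive without warning.

    low_pct and high_pct are the share of failures predicted in time,
    defaulting to the 20 and 50 percent figures quoted in the text.
    Results are rounded down to whole drives.
    """
    if failures < 0 or not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("invalid failure count or percentages")
    fewest = failures * (100 - high_pct) // 100
    most = failures * (100 - low_pct) // 100
    return fewest, most


if __name__ == "__main__":
    fewest, most = unwarned_range(100)
    print(f"Of 100 failures, {fewest} to {most} arrive unannounced")
```

On these figures, at least half of all failures would still come as a surprise even when SMART is fully configured and monitored.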

Products Using SMART

A few products have attempted to make the best of SMART as it is. EZ SMART by Phoenix Technologies (Louisville, Colorado) and Intelli-SMART by LC Technology (Clearwater, Florida) are two examples, but these applications are primarily one-machine products that are not meant for broad monitoring of multiple disks.

Compaq Insight Manager (CIM)

Compaq Insight Manager (CIM) offers enterprise functionality. CIM agents do a fine job of monitoring Compaq servers and, according to some user reports, actually manage to extract some value from SMART. CIM collects over 200 data items from Compaq ProLiant servers (of which SMART accounts for only a handful of data items) in order to monitor the performance of disks, CPUs, and various other devices. When it comes to disk monitoring, though, it basically relies on the SMART standard. In this case, however, CIM is probably the best existing usage of SMART as far as actually extracting some real value from it. Through its more comprehensive diagnostics and use of agents, CIM seems to do a better job of gathering SMART data. Some users even report it has alerted them to drive problems in advance. The downside, though, is that it works best with Compaq server drives. Fortunately, the latest version, CIM 7, integrates better with parallel monitoring systems in Hewlett-Packard and Dell server drives. It is not so good with clone servers, though. Additionally, some users report that CIM 7 can be quite complex to run, and setting it up to work with rival manufacturer systems can be a challenge. Further, as it relies only on SMART, its dependability is suspect.






Server Disk Management in a Windows Environment
Year: 2003
Pages: 197
