Trends in System Operations


With rapidly changing computer technology and the growth of the Internet, the duties of the system operator are becoming increasingly complex. New hardware platforms, new software products, and a larger number of systems and networks to manage make it difficult for system operators to rely on manual tools to do their jobs. Automation is clearly needed. For many tasks, the operator is already relying on software automation. For example, system backups used to be done with primitive commands; now sophisticated software packages such as Hewlett-Packard's OmniBack II and Legato's NetWorker are commonly used to simplify and automate the task of backing up a system or set of systems.

A number of trends in the computer industry are making system monitoring much more critical. For some companies, corporate data must be available 24 hours a day, 7 days a week, and the penalty of downtime can be measured in lost revenue. In these environments, operators must not only detect current problems, but must also be able to predict when failures will occur. This book describes some of the tools that can identify trends or events that could lead to downtime.
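To make the idea of predicting a failure concrete, here is a minimal sketch that fits a straight line to periodic disk-usage samples and estimates how long it will be until the file system is full. The function name `hours_until_full`, the sample format, and the choice of a simple least-squares line are all assumptions made for illustration; they are not taken from any tool described in this book, and real usage rarely grows this evenly.

```python
def hours_until_full(samples, capacity=100.0):
    """Estimate hours remaining until usage reaches `capacity`.

    `samples` is a list of (hour, percent_used) pairs, oldest first.
    Returns None when there is too little data or usage is not growing.
    This is an illustrative least-squares fit, not a vendor algorithm.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n

    # Least-squares slope: percent of capacity consumed per hour.
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / denom
    if slope <= 0:
        return None  # flat or shrinking usage: no predicted failure

    intercept = mean_u - slope * mean_t
    hour_full = (capacity - intercept) / slope
    last_hour = samples[-1][0]
    return max(0.0, hour_full - last_hour)


if __name__ == "__main__":
    # Hypothetical hourly readings of a file system's percent-used figure.
    readings = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
    print(hours_until_full(readings))
```

An operator could raise a warning event when the estimate drops below some threshold, such as the time needed to add storage, rather than waiting for the file system to fill.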

Another industry trend is that Information Technology (IT) departments are now being measured on the percentage of time that systems are up, the performance of the systems, and how quickly operators can resolve user problems. Management software needs to indicate not only what a problem is, but also how to fix it. System management is now the largest contributor to the total cost of ownership of UNIX servers.

While customer demands on the operations staff are increasing at a rapid rate, the evolution of management tools is struggling to keep pace. Tools that once were text-based and user-initiated are now graphical and event-driven. An operator can watch multiple systems from a single, centralized console. Systems can be shown as icons with colors showing status. This gives the operator a quick overview of the state of the data center. An increasing number of problems can be detected and reported to the console.

The problem-reporting tools, however, are overwhelming the typical operator. If one component shared by many systems fails, the result can be a storm of events at the management station. In addition, a component failure can have a cascading effect as it causes other components or products to fail, too, and again a large number of events arrive at the system or centralized console. An operator needs to be able to wade through these events to find the root causes. Some help is provided by enterprise management products, which are introduced in Chapter 3, "Using Monitoring Frameworks," and then referenced throughout the rest of the book. Chapter 9, "Enterprise Management," describes some of the emerging technologies to deal with this problem.
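The event-storm problem can be illustrated with a small sketch that groups incoming events under the most upstream failed component they depend on. Everything here is hypothetical: the event dictionaries, the `depends_on` map, and the function names are invented for illustration and do not represent the behavior of any particular enterprise management product.

```python
from collections import defaultdict


def find_root(component, depends_on, failed):
    """Follow the dependency chain upward while the parent has also failed."""
    seen = set()
    while component not in seen:  # guard against accidental cycles
        seen.add(component)
        parent = depends_on.get(component)
        if parent is None or parent not in failed:
            return component
        component = parent
    return component


def group_by_root_cause(events, depends_on):
    """Map each probable root-cause component to the events it explains."""
    failed = {event["component"] for event in events}
    groups = defaultdict(list)
    for event in events:
        root = find_root(event["component"], depends_on, failed)
        groups[root].append(event)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical dependency map: each component points to what it relies on.
    depends_on = {
        "web1:httpd": "nfs-server",
        "web2:httpd": "nfs-server",
        "nfs-server": "switch-a",
    }
    events = [
        {"component": "web1:httpd", "text": "service not responding"},
        {"component": "web2:httpd", "text": "service not responding"},
        {"component": "nfs-server", "text": "host unreachable"},
        {"component": "db1:disk", "text": "disk error"},
    ]
    for root, related in group_by_root_cause(events, depends_on).items():
        print(root, len(related))
```

With a grouping like this, the console could show one entry for the shared server and one for the unrelated disk, instead of four separate alarms. The approach only works as well as the dependency map is accurate, which is one reason root-cause analysis remains difficult in practice.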

Before describing the fault management tools, we must first define what we mean by "fault management," which we do in the next chapter. That chapter also describes the types of events you must be prepared to receive in order to protect your UNIX servers adequately.



UNIX Fault Management: A Guide for System Administrators
ISBN: 013026525X
Year: 1999
Pages: 90
