Error Logging and Tracking


In the event-driven programming environments that are a feature of all modern network operating systems, event logging is a standard capability. Not only does the operating system log events, but so do major server applications, which log events in their own proprietary logs. Events can be either hardware or software related, or they can be a function of both. The following kinds of events are stored in event logs:

  • Error condition

  • Alert or warning issued

  • Status update

  • Successful monitored condition

  • Unsuccessful monitored condition

Event logging imposes a certain amount of overhead on a server and on the applications that run on it. That's why many of the logging capabilities of a network operating system or a server application are not used. In some cases, entire logs are always on by default; this is the case with Windows Server logs. In other cases, you have to explicitly turn on logging; for example, you might have to do this with an application log.

The snippets of software code used to trap events and record their existence are called counters. Counters are available to measure CPU utilization, the number of active processing threads, aspects of memory usage, and much more. When you install an application, you may install with it a whole new set of counters, many of which aren't activated by default. If you logged every counter that was available, the overhead would slow your system to a crawl, which is why many counters are left off by default.

One set of counters you have to turn on yourself is the disk performance counters in Windows 2000; they are turned off by default in that operating system because using them affects disk I/O performance.

To turn on the disk performance counters in Windows 2000 Server, you do the following:

1. Select Start, Run, enter cmd in the dialog box that appears, and then press Enter.

2. Enter the command diskperf -y for standard IDE or SATA drives, or enter diskperf -ye for software RAID and mirrored drives. (Note that hardware RAID is not affected by this command.)

Note

The diskperf disk counters are turned on in Windows Server 2003 by default.

3. Open the Windows Performance Monitor by entering perfmon in the Command window and then pressing Enter.

4. Verify that the disk performance counters are now available in the list of counters.
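The command-line step in this procedure can also be scripted. The following is a minimal Python sketch, assuming a Windows server on which diskperf is on the path; the function names and the drive-type labels are hypothetical and exist only for illustration. The switches are written with a leading hyphen, as diskperf expects.

```python
import subprocess

# Switches from the procedure above: -y for standard drives,
# -ye for software RAID and mirrored drives.
DISKPERF_SWITCHES = {
    "standard": "-y",
    "software_raid": "-ye",
}


def build_diskperf_command(drive_type):
    """Return the diskperf command line for the given drive type."""
    try:
        switch = DISKPERF_SWITCHES[drive_type]
    except KeyError:
        raise ValueError("unknown drive type: %r" % (drive_type,))
    return ["diskperf", switch]


def enable_disk_counters(drive_type="standard"):
    """Run diskperf on the local machine (Windows only)."""
    # check=True raises CalledProcessError if diskperf exits with an error.
    return subprocess.run(build_diskperf_command(drive_type), check=True)
```

Calling enable_disk_counters("software_raid") on the server would run diskperf -ye. Building the command separately from running it lets you inspect or log the command line before anything is changed.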

Different operating systems and different applications offer additional counters that are turned on (and off) either using command-line tools or through the use of graphical interface tools. You need to consult those products' documentation for details.

The Purpose of Event Logging

The purpose of event logging is to provide a means of analyzing the state of your system as well as provide you with a historical record of the system's performance. When you examine an event log, you see the type of events listed, and each event is identified by its source and some kind of an ID number. The sequence of events is often a key to understanding why something on your system doesn't work properly or how you can change your system to make it perform better. Many people turn to event logs when things go wrong.
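To make the idea of an event record concrete, here is a small Python sketch of an event identified by its source and ID, with helpers that put events in time order and count how often each source/ID pair occurs. The field names, sources, and ID numbers are assumptions chosen for illustration; they are not taken from any particular operating system's log format.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str
    event_id: int
    kind: str       # for example "error", "warning", or "status"
    message: str


def in_sequence(events):
    """Return the events ordered by time, oldest first."""
    return sorted(events, key=lambda e: e.timestamp)


def count_by_identity(events):
    """Count how often each (source, event_id) pair occurs."""
    return Counter((e.source, e.event_id) for e in events)


# Made-up sample data.
events = [
    Event(datetime(2006, 3, 1, 10, 20), "Disk", 11, "error", "controller error"),
    Event(datetime(2006, 3, 1, 10, 15), "Dhcp", 1002, "warning", "lease denied"),
    Event(datetime(2006, 3, 1, 10, 25), "Disk", 11, "error", "controller error"),
]
ordered = in_sequence(events)
counts = count_by_identity(events)
```

Here ordered starts with the Dhcp warning, and counts shows that the ("Disk", 11) event occurred twice, which is the kind of pattern you look for when reading a log.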

Typical scenarios where you would want to examine your event logs include the following:

  • Hardware failure: If you are experiencing problems such as an intermittent network connection or a disk error, you should check your event logs.

  • System performance issues: If your system or application doesn't seem to run as well as it should, and you suspect some kind of resource issue such as a memory leak or errant daemon, you should check your event logs.

  • Security issue: If you want to check who has logged on to your network or one particular server or who has logged on to a database application, you should check your event logs.

  • Resource access: If you have a network share that can't be accessed and you want to find out whether the issue is a permissions problem, hardware problem, or some other issue, you should check your event logs.

  • State of an application: If you initiated a system backup and want to know how long it took and whether it is finished, you should check your event logs.

As you can see, event logs can be very useful even when your system seems to be running correctly.

Proper analysis of your event logs can yield all sorts of valuable information; the logs are veritable gold mines. And that's the real problem: With so many events that can be recorded, it is hard to keep track of them all, correlate them with behavior, manage the information they contain, and, most importantly, figure out which events are significant. No wonder the area of event analysis and management has given rise to a whole host of management tools that monitor events, collect events, analyze events, alert you when an event occurs, and more.

Many events just keep on happening. When your mouse button sticks, it sticks. If your event log records MouseDown events, your log is going to be getting a lot of MouseDown events to record. Chances are that you only need to see a few of these events to figure out the cause.
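One simple way to cope with a run of identical events is to collapse it into a single entry with a count. The following Python sketch shows the idea using plain event names; the function name is hypothetical, and a real log would compare richer records than strings.

```python
from itertools import groupby


def collapse_repeats(events):
    """Collapse consecutive identical events into (event, count) pairs."""
    return [(event, sum(1 for _ in run)) for event, run in groupby(events)]


log = ["MouseDown", "MouseDown", "MouseDown", "KeyPress", "MouseDown"]
summary = collapse_repeats(log)
# summary == [("MouseDown", 3), ("KeyPress", 1), ("MouseDown", 1)]
```

Only consecutive repeats are merged, so the order in which events occurred is preserved in the summary.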

Many network, network operating system, and application operations repeat themselves. Give a command, and chances are that if it doesn't successfully complete, the application will continue to execute the command, if only to determine whether the resource required is simply busy and not permanently unavailable. When you examine an event log, you will often see that it is loaded with large numbers of similar events, and this can be overwhelming at first. Any event log or product that manages events requires a good search and filter utility to aid in finding events of interest. Smart products must be able to differentiate between multiple instances of the same event and related events that occur in an escalating chain. Understanding these relationships is one of the best troubleshooting tools for a server administrator.
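A search filter of the kind described here can be sketched in a few lines of Python. The example assumes events are dictionaries with source, event_id, and severity keys and uses a three-level severity scale; those names, the scale, and the sample values are assumptions for illustration, not the format of any real log.

```python
# Assumed severity ranking, lowest to highest.
SEVERITY = {"information": 0, "warning": 1, "error": 2}


def filter_events(events, source=None, event_ids=None,
                  min_severity="information"):
    """Yield the events that match every criterion that was supplied."""
    threshold = SEVERITY[min_severity]
    for event in events:
        if source is not None and event["source"] != source:
            continue
        if event_ids is not None and event["event_id"] not in event_ids:
            continue
        if SEVERITY[event["severity"]] < threshold:
            continue
        yield event


# Made-up sample data.
events = [
    {"source": "Disk", "event_id": 11, "severity": "error"},
    {"source": "Disk", "event_id": 51, "severity": "warning"},
    {"source": "Dhcp", "event_id": 1002, "severity": "error"},
    {"source": "Disk", "event_id": 7, "severity": "information"},
]
disk_problems = list(filter_events(events, source="Disk",
                                   min_severity="warning"))
```

In this sample, disk_problems contains only the two Disk events at warning level or above. Criteria left at their defaults are ignored, so the same function serves for narrow and broad searches.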

You need to keep in mind that event logs of operating systems and event logs of applications are almost always in different formats. In most instances, you can export all event logs to database files and then analyze them in a database. However, when you try to pull all the information from multiple sources, you may find that you can't because all the different data isn't organized in the same way. Therefore, to work with the combined data from multiple event logs coming from different sources, you need to do considerable work up front to make the data compatible. This is an area where you need to rely on third-party tools to help you out, such as Kane Security Analyst's Event Log Analyzer Tool. One tool you can try is Microsoft's free LogParser 2.0, which is a command-line utility that lets you execute a SQL query against an event log file.
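The up-front work of making data compatible usually means mapping each source's fields onto one common layout before loading everything into a database. The following Python sketch illustrates this with two invented export formats and an in-memory SQLite database; the field names, table layout, severity labels, and sample rows are all assumptions, and this is not how LogParser or any specific product does it.

```python
import sqlite3
from datetime import datetime

# Hypothetical operating system export: dictionaries with ISO-style times.
os_events = [
    {"TimeGenerated": "2006-03-01 10:15:00", "SourceName": "Dhcp",
     "EventID": 1002, "Type": "Error"},
    {"TimeGenerated": "2006-03-01 10:20:00", "SourceName": "Service Control Manager",
     "EventID": 7036, "Type": "Information"},
]
# Hypothetical application export: tuples with U.S.-style dates.
app_events = [
    ("03/01/2006 10:16:30", "backup", "WARN", "tape drive not ready"),
]

APP_LEVELS = {"INFO": "information", "WARN": "warning", "ERR": "error"}


def normalize_os(row):
    """Map an OS export row onto (ts, origin, source, severity, detail)."""
    return (row["TimeGenerated"], "os", row["SourceName"],
            row["Type"].lower(), str(row["EventID"]))


def normalize_app(row):
    """Map an application export row onto the same five columns."""
    when, source, level, text = row
    ts = datetime.strptime(when, "%m/%d/%Y %H:%M:%S")
    return (ts.strftime("%Y-%m-%d %H:%M:%S"), "app", source,
            APP_LEVELS[level], text)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events "
             "(ts TEXT, origin TEXT, source TEXT, severity TEXT, detail TEXT)")
rows = [normalize_os(r) for r in os_events] + [normalize_app(r) for r in app_events]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# One SQL query now spans both sources, in time order.
problems = conn.execute(
    "SELECT ts, origin, source FROM events "
    "WHERE severity IN ('warning', 'error') ORDER BY ts"
).fetchall()
```

Because both sources share one timestamp format after normalization, sorting by ts interleaves operating system and application events correctly, which is what makes cross-log correlation possible.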

Event Logs

The Windows Server 2003/2000 operating system comes with several different event logs that system administrators can use. Which logs exist depends on the type of Windows server you have. All Windows servers come with three logs, and additional event logs are maintained for domain and DNS servers. Figure 21.4 shows the Windows Event Viewer.

Figure 21.4. The Windows Event Viewer.


These are the standard system logs for Windows servers:

  • Application log (AppEvent.evt): This log, found on all Windows servers, contains information for all counters that applications install and all others that they enable.

  • Directory Service log (NTDS.evt): This log, found on Windows domain controllers only, records the events associated with the Windows Active Directory service.

  • DNS Server log (DnsEvent.evt): This log is found on all DNS servers, whether they are domain servers, application servers, or even standalone servers.

  • File Replication log (NtFrs.evt): This log, found only on Windows domain servers, records replication events associated with domain controllers.

  • Security log (SecEvent.evt): This log, found on all Windows servers, contains information that is set by the Windows security audit policies.

  • System log (SysEvent.evt): This log, found on all Windows servers, contains information about the operating system and hardware components.

The log files themselves are stored in your Windows directory, at %systemroot%\system32\config.

By default, the Windows event logs maintain a fairly small log size and overwrite older events on a first-in, first-out basis. You may find that the default log size is too small and that a log doesn't retain events long enough to provide a useful historical record.
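The effect of a small, fixed-size log can be illustrated with a short Python sketch. The first function gives a rough estimate of how many hours of history a log holds, and the bounded queue shows first-in, first-out overwriting. The log size, average event size, and event rate used in the example are assumed figures, not Windows defaults.

```python
from collections import deque


def estimate_retention_hours(log_size_kb, avg_event_bytes, events_per_hour):
    """Rough number of hours of history a fixed-size log can hold."""
    capacity = (log_size_kb * 1024) // avg_event_bytes
    return capacity / events_per_hour


# A 512KB log with 512-byte events holds 1,024 events; at 256 events
# per hour, that is about four hours of history.
hours = estimate_retention_hours(512, 512, 256)

# A bounded deque discards its oldest entry when a new one arrives at capacity.
log = deque(maxlen=3)
for event_number in range(1, 6):
    log.append(event_number)
# log now holds events 3, 4, and 5; events 1 and 2 were overwritten.
```

If the estimate is shorter than the period you need to look back over, that is a sign the log size should be increased.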

To change the behavior of the different Windows event logs for Windows servers on a network, you can open the Windows Group Policy Editor and click the event log settings. These settings let you control the size, access to, retention methods, and other policies for the Application, Security, and System logs. Figure 21.5 shows the Group Policy Object Editor event log settings.

Figure 21.5. The Group Policy Object Editor event log settings.


You can access the Application, Security, and System logs from the Event Viewer utility. To open Event Viewer, you select Start, All Programs, Administrative Tools, Event Viewer (refer to Figure 21.4). To view a specific event, you click the log you want to view in the left panel and then double-click the specific event you want to view. Figure 21.6 shows a sample event from the System log, one that logs a DHCP error condition. The specific event properties list a variety of information, including when and where the error happened, the source of the error, and the specific type or ID of the error. Event Viewer is well known to almost all Windows administrators. A little less well known is what to do with specific errors after you view them.

Figure 21.6. A particular event's description.


The problem with many event logs, and with Microsoft's logs in particular, is that they offer explanations that can be difficult to decipher. Over the years, the messages have gotten somewhat better, but they still have the ability to confuse and confound. They also don't offer much in the way of practical advice on how to fix the problem in question. So the first step in most cases where you are diagnosing a Windows issue from the event log is to get more information about the event in question. At the end of an event's description (refer to Figure 21.6) is a hyperlink to Microsoft's event library. When you click the link, you are asked if you will allow the information to be sent to Microsoft so that it can match the event's ID. If you agree, the Windows Help and Support Center opens a browser window with more details on the event in question, as shown in Figure 21.7. If you liked the description of the event in the event log, chances are that you will like the description in the Help Center.

Figure 21.7. Windows Help and Support Center information on an event.


You might want to look at additional sources of information to help decipher different event IDs. One of them is EventID.Net (see www.eventid.net), which is run by the consulting group Altair Technologies. When you enter an event ID and its source into the EventID.Net search field, you get an alternative description of the event. The EventID.Net database contains a collection of event descriptions and also descriptions of the experiences of a number of contributors. If you scroll further down the page, you see comments, questions, and answers detailing experiences with this particular event, which can really be useful.

Novell maintains specialized event logs for security audits, for its management platform for Windows events, and for many of its applications, such as the GroupWise messaging server.

Solaris has several files that you may want to check:

  • /etc/system: This file lists your kernel configuration parameters.

  • /var/adm/messages: This is the log written by the syslog daemon. Problems are listed with flags such as warnings, errors, panics, reboots, and so on. This is the most important Solaris log to examine.

  • /etc/release: This file lists your OS version information.
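Scanning /var/adm/messages for flagged entries is easy to automate. The following Python sketch assumes the common Solaris line layout of timestamp, host, tag, an optional [ID number facility.level] field, and message text; the sample lines are made up, and the set of levels treated as problems is an assumption you would adjust for your own site.

```python
import re

# Assumed layout: "Mon DD HH:MM:SS host tag: [ID n facility.level] text"
LINE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d\d:\d\d:\d\d) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:]+): "
    r"(?:\[ID \d+ (?P<facility>\w+)\.(?P<level>\w+)\] )?"
    r"(?P<text>.*)$"
)

# Levels treated as worth a closer look (an assumption for this sketch).
FLAGGED_LEVELS = {"warning", "error", "crit", "alert", "emerg"}


def flagged(lines):
    """Yield (when, tag, level, text) for lines at a flagged level."""
    for line in lines:
        match = LINE.match(line.rstrip("\n"))
        if match and match.group("level") in FLAGGED_LEVELS:
            yield (match.group("when"), match.group("tag"),
                   match.group("level"), match.group("text"))


# Made-up sample lines in the assumed layout.
sample = [
    "Mar  1 10:15:01 host1 scsi: [ID 107833 kern.warning] WARNING: disk not responding",
    "Mar  1 10:16:12 host1 genunix: [ID 936769 kern.info] sd0 is online",
    "Mar  1 10:20:45 host1 sshd[412]: [ID 800047 auth.error] error: bind failed",
]
hits = list(flagged(sample))
```

On a real system you would pass an open file instead of the sample list, for example flagged(open("/var/adm/messages")). Lines that do not fit the assumed layout are skipped rather than reported.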

Note

Sun offers a concise performance monitoring tutorial at http://sunsolve.sun.com/pub-cgi/show.pl?target=content/content3&ttl=article.





Upgrading and Repairing Servers
ISBN: 078972815X
Year: 2006
Pages: 240
