What to Do Every Day | Microsoft Windows Server 2003 Insider Solutions

IT professionals know better than anyone that the moment they walk in the door there will be dozens of people vying for their attention. Everyone needs something done and they all need it done now. Although you want to be helpful and attentive, it's critical not to forget that there are daily tasks that need to be done to ensure the stability of the network.

Ensuring that Maintenance Tasks are Completed Regularly

Ensuring that maintenance tasks are completed regularly is critical to the stability of a network. When possible, distribute maintenance tasks among your IT staff according to their areas of expertise. This way qualified personnel perform each maintenance task, and no one feels as though their entire job is maintenance. Ensure that the people performing the maintenance tasks follow the written procedures to the letter. Have them sign off on each task on a checklist so that both employee and management can be sure the task was completed.

Read the Logs

The Event Viewer is the first thing you should monitor and review. The event logs keep track of critical events that can be telltale signs of system problems, and you should be in the habit of checking the event logs each and every day. This is especially critical if any changes have been made in the environment. The event logs will usually reveal potential problems before users notice them. Investigating each critical event before it becomes a problem allows you to resolve issues proactively. Eventually you will reach a point where some events are expected, and you can safely ignore those events. Any new issues should always be researched.
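The daily triage described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive tool: it assumes the event logs have already been exported into simple records, and the field names (source, event_id, level), the allow-list, and the sample events are all hypothetical.

```python
from collections import Counter

# Hypothetical allow-list: (source, event_id) pairs that have already been
# researched and judged safe to ignore in this environment.
EXPECTED_EVENTS = {("AppA", 1001)}

def triage(events, expected=EXPECTED_EVENTS):
    """Count Error/Warning events that are not on the allow-list."""
    unexpected = Counter()
    for ev in events:
        key = (ev["source"], ev["event_id"])
        if ev["level"] in ("Error", "Warning") and key not in expected:
            unexpected[key] += 1
    return unexpected

# Made-up sample records standing in for an exported event log.
sample = [
    {"source": "AppA", "event_id": 1001, "level": "Warning"},
    {"source": "AppB", "event_id": 2002, "level": "Error"},
    {"source": "AppB", "event_id": 2002, "level": "Error"},
    {"source": "AppC", "event_id": 3003, "level": "Information"},
]

for (source, event_id), count in triage(sample).items():
    print(f"{source} event {event_id}: {count} occurrence(s) to research")
```

Keeping the allow-list explicit means that an event is ignored only after someone has researched it once and deliberately added it.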

Checking on System Resources

Being aware of the available resources on all systems serves several purposes. Not only can reviewing resources potentially identify short-term problems, but it also allows a network administrator to do educated resource planning. Part of maintaining a network is to know when to add servers and when to consolidate servers. There are several key items to examine daily:

Available hard drive space. Ensure that there is sufficient space on the server system drives. Running low on available disk space will result in a noticeable reduction in system performance. In some instances it can cause the system to crash. If data is stored on separate drives, monitor the data drives as well.

Running out of space on log drives can lead to a server crash because most applications that maintain log files halt the application service when the log files are full.
Available system memory. If the system is running low on memory, a network administrator can expect a reduction of performance. When a server runs low on memory, the network administrator should consider adding more memory or examining the system more closely to determine if a particular application might warrant its own server. This is also a great way to spot memory leaks in an application.
CPU utilization. This is much easier to monitor if you log the performance counters, which lets you spot anomalies in CPU usage that can be a warning sign of upcoming system problems. The same information can assist you in capacity planning by showing when to add processing capability to a server or to add another server.

Checking these items daily is a good start on the path to effective system maintenance. Logging the counter information on a long-term basis can help you conduct trending on a server to improve ongoing planning. This type of monitoring is covered more in depth in Chapter 21, "Proactive Monitoring and Alerting."
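As an illustration of the first item in the list above, the following sketch checks free disk space using only the Python standard library. The 15 percent threshold and the list of volumes are assumptions chosen for the example, not recommendations from this chapter. Memory and CPU checks are left out because they need performance counter data that the standard library does not expose.

```python
import shutil

def free_fraction(path):
    """Fraction of the volume holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total

def is_low(fraction_free, minimum=0.15):
    """True when free space has dropped below the chosen minimum fraction."""
    return fraction_free < minimum

# Hypothetical list of volumes to check; on a Windows server this might be
# ["C:\\", "D:\\"]. The current directory is used so the sketch runs anywhere.
for volume in ["."]:
    frac = free_fraction(volume)
    status = "LOW" if is_low(frac) else "ok"
    print(f"{volume}: {frac:.1%} free [{status}]")
```

Writing each day's result to a file gives you the long-term record needed for trending.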

Any Event in the Event Viewer

For any event in the Event Viewer there will be a link that often leads to further insight on the issue and possibly a specific resolution.

For more information, go to the Help and Support Center at http://go.microsoft.com/fwlink/events.asp. Other helpful locations include Microsoft TechNet support at http://www.microsoft.com/technet, or Microsoft Support at http://support.microsoft.com.

Another way to find information on an event is to search for its Event ID with a search engine; someone who has encountered the same problem in the past may have posted useful information.

Although it might seem redundant to check multiple resources for information, a resolution is often noted on one site but not on another. Using multiple resources can therefore turn up solutions that might otherwise be missed.

If You See That Something Is Amiss

If you see that something is amiss with system resources, don't just take it at face value. Although it is easy to look at a drive that holds the logs, determine that the drive is full, and just delete files, you might want to investigate why the tape backup software is not deleting log files after a successful backup. If the backup software is supposed to be clearing log files after the system is successfully backed up, could the system possibly not be backed up successfully? Another question to ask is whether the drive demand has been growing at a steady rate or whether it jumped, for example from 10MB per day to 1GB per day. Sudden changes in system resource usage are sometimes related to failures in other subsystems or even viruses. Always get to the bottom of odd changes to identify the root cause rather than simply creating a short-term fix.
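The difference between steady growth and a sudden jump can be checked with simple arithmetic on daily readings. The sketch below assumes you already record the space used on a drive once per day. The readings and the factor of 10 are hypothetical values chosen to mirror the 10MB-to-1GB example above.

```python
def daily_growth(used_mb):
    """Day-over-day growth from a list of daily 'space used' readings (MB)."""
    return [later - earlier for earlier, later in zip(used_mb, used_mb[1:])]

def sudden_jumps(used_mb, factor=10):
    """Indexes of readings whose growth is at least `factor` times the
    growth seen on the previous day."""
    growth = daily_growth(used_mb)
    flagged = []
    for i in range(1, len(growth)):
        previous, current = growth[i - 1], growth[i]
        if previous > 0 and current >= factor * previous:
            flagged.append(i + 1)  # index into used_mb where the jump shows up
    return flagged

# Hypothetical readings: 10MB per day, then 1GB (1024MB) in a single day.
readings = [5000, 5010, 5020, 5030, 6054]
print(daily_growth(readings))  # [10, 10, 10, 1024]
print(sudden_jumps(readings))  # [4]
```

A flagged day is a prompt to look for the root cause, such as a failed backup that left log files in place, rather than a signal to delete files.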

Verify the Backups

Regular backups are vital to the recoverability of Windows Server 2003 and Active Directory. One of the most important maintenance tasks you can perform is to ensure that the backups are running properly. Most people think that this means checking the logs on the backup server and making sure there are no errors. Often, this is a task that is handled by the personnel who take care of the backup systems. Server administrators should request a copy of the backup logs and review them as well to ensure that the backups are working. This is a great time to compare the logs to your server's directory structure to ensure that no important directories are being missed.
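Comparing the backup log to the directory structure amounts to a set difference. The following is a minimal sketch that assumes you have already extracted two lists of paths, one from the server and one from the backup log. Backup log formats vary by product, so the parsing step is not shown, and the paths used are hypothetical.

```python
def normalize(path):
    """Windows paths are case-insensitive, so compare a normalized form."""
    return path.rstrip("\\/").lower()

def missed_directories(server_dirs, backed_up_dirs):
    """Directories present on the server that never appear in the backup log."""
    backed_up = {normalize(p) for p in backed_up_dirs}
    return sorted(d for d in server_dirs if normalize(d) not in backed_up)

# Hypothetical directory listing and backup log contents.
server_dirs = ["D:\\Data\\Finance", "D:\\Data\\HR", "D:\\Data\\Projects"]
backed_up_dirs = ["d:\\data\\finance\\", "D:\\Data\\Projects"]

for path in missed_directories(server_dirs, backed_up_dirs):
    print(f"Not in backup log: {path}")
```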

Check to ensure that the backups are doing everything they are supposed to be doing. It's not unusual for a backup job to stop a particular service before the backup occurs and restart it afterward. If you are using these kinds of functions it's important to make sure they are doing what they are supposed to. If your backup software clears log files after they are backed up, ensure that this is happening. This can prevent annoying outages later due to drives becoming full.

Starting in Windows 2000, Microsoft added the concept of the System State. The System State contains information such as the system boot files, the system Registry, disk quota information, File Replication Service information, as well as databases for COM+ Class Registration, Certificate Services, Terminal Services, and Clustering. Even the event logs and content indexing catalogs are contained in the System State. On domain controllers, the System State also contains the Active Directory database. As such it is critical to ensure that the System State is being backed up on each Windows 2000 and Windows Server 2003 server in the enterprise if you are to have the capability to recover Active Directory.

Try Adding Flags

If your backup jobs do additional tasks such as launching batches before or after a job, try adding flags to the batches to make it easier to see that they were done. Something as simple as:

 Net stop "Service A"
 Echo Prebackup batch "Stop Service A" ran on %date% at %time% successfully >> batchresults.txt

Or something as fancy as:

 mapisend -u "Default" -p "password" -r Administrator -s "Stopping Service A" -m "Prebackup batch 'Stop Service A' ran on %date% at %time% successfully"

will give you a single place to look to see that the batch jobs ran successfully.
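Once the flags are collected in one file, checking them can also be automated. The sketch below assumes flag lines in the form written by the Echo example, with the batch name in double quotes. The second batch name and the sample date and time are hypothetical.

```python
def missing_flags(lines, expected_batches):
    """Names of expected batches that have no flag line in the results file."""
    return [name for name in expected_batches
            if not any(f'"{name}"' in line for line in lines)]

# Hypothetical contents of the results file after one night's backup.
lines = [
    'Prebackup batch "Stop Service A" ran on Mon 01/05/2004 at 23:00:01.15 successfully',
]

for name in missing_flags(lines, ["Stop Service A", "Start Service A"]):
    print(f"No flag found for batch: {name}")
```

Note that the Echo line runs whether or not the net stop command succeeded, so a flag shows that the batch ran, not that the service actually stopped. This check also does not look at the date, so it is only meaningful if the results file is cleared or rotated between backup runs.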