Incident Response | Inside Network Perimeter Security (2nd Edition)

An incident, in the context of this chapter, is an anomalous event that can impact the confidentiality, integrity, or availability of the infrastructure. The anomaly might be malicious, or it might be an indicator of a system fault. In either case, we need to know how to set up alerts that warn us about a potentially threatening condition. Both IDS and system-monitoring mechanisms are useful for detecting suspicious events, but they are not of much help if administrators are not actually aware of conditions that these systems observe.

One way to remain apprised of the state of your resources is to periodically check the status screens of the monitoring system or the IDS console. Relying solely on this approach, however, does not let you detect and react to problems as soon as they occur. This is an especially significant concern for organizations that need to respond to incidents around the clock but cannot afford to hire personnel to watch the monitoring system's alert screen 24 hours a day. Configuring IDS and monitoring systems to send out alerts, and knowing how to respond to alarm conditions, are integral parts of effective security perimeter maintenance. In this section, we discuss how to configure notification options in a way that is consistent with your security policy and monitoring requirements. Additionally, we examine considerations for responding to detected system faults and malicious events.

Note

Relying solely on automation of IDS or of the monitoring systems to detect all suspicious events in real time is dangerous. We illustrated limitations of this approach in Chapter 8 in the discussion of false positives and false negatives. Therefore, be sure to combine automated alerting with manual examination of data that such systems collect.

Notification Options

An old slogan of a New York telephone company reminded people that "We're all connected." Accessibility of system administrators, wherever they are, allows systems to get in touch with them in close to real time whenever a device requires attention. When evaluating or deploying an IDS or a monitoring system, consider the following aspects of its notification functionality:

What means of alert notification do you require? Some of the more popular options are email, pager, and SMS-based messages. It might also be possible to integrate the notification mechanism with other applications already in use in your organization.
What alert acknowledgement options do you find useful? If a critical alert is sent to multiple administrators, you might want to have the ability to let your colleagues know that you are working on the problem so that your efforts do not overlap.
How configurable is the notification logic? You might want to send alerts to different people depending on the day of the week or the time of the day, or issue alerts to different devices depending on the event's severity. Also, plan for outages in your notification scheme because the primary notification mechanism might fall victim to a system fault or an attack.
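
To make the idea of configurable notification logic concrete, the following Python sketch selects recipients by day of the week, time of day, and event severity, and falls back to a backup contact when no rule matches. It is an illustration only: the Route fields, the numeric severity scale, and the select_routes function are hypothetical and do not correspond to any particular IDS or monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """One hypothetical routing rule: whom to contact, how, and when."""
    contact: str
    channel: str          # for example "email", "pager", or "sms"
    min_severity: int     # rule applies to alerts at or above this severity
    days: frozenset       # weekdays covered, Monday = 0 ... Sunday = 6
    start_hour: int       # first hour covered (inclusive, 0-23)
    end_hour: int         # hour at which coverage ends (exclusive, 1-24)


def matches(route, severity, when):
    """Return True if the rule covers this severity at this date and time."""
    in_hours = route.start_hour <= when.hour < route.end_hour
    return (severity >= route.min_severity
            and when.weekday() in route.days
            and in_hours)


def select_routes(routes, severity, when, fallback):
    """Return every matching rule, or the fallback rule if none match."""
    chosen = [r for r in routes if matches(r, severity, when)]
    return chosen or [fallback]
```

With rules like these, a low-severity event during business hours might go only to a mailbox, while a high-severity event at any hour also reaches a pager. The fallback rule is a simple stand-in for planning around gaps in the schedule; it does not by itself protect against an outage of the notification mechanism.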

Tip

Keep in mind that some of the cheaper paging devices, especially those that are purely numeric, do not guarantee that a message will be delivered. To ensure that an alert eventually reaches the administrator once he is back within communications range, you might want to subscribe to paging services that can queue messages to guarantee their delivery. Alternatively, or in conjunction with this, consider configuring the monitoring system to periodically reissue alerts until one of them is acknowledged.
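
The reissue-until-acknowledged behavior described in this tip can be sketched roughly as follows. This is a minimal Python illustration, not the mechanism of any specific product: the send and is_acknowledged callables, the interval, and the attempt limit are all assumptions that a real monitoring system would supply through its own configuration.

```python
import time


def alert_until_acknowledged(send, is_acknowledged, interval_seconds,
                             max_attempts, sleep=time.sleep):
    """Reissue an alert until it is acknowledged or attempts run out.

    send(attempt) delivers the alert; is_acknowledged() reports whether
    an administrator has responded. Both are hypothetical callables.
    Returns the number of alerts sent before the acknowledgement, or
    None if the alert was never acknowledged.
    """
    for attempt in range(1, max_attempts + 1):
        send(attempt)
        sleep(interval_seconds)      # give the administrator time to respond
        if is_acknowledged():
            return attempt
    return None                      # caller should escalate to someone else
```

Capping the number of attempts keeps an unanswered alert from repeating forever; a return value of None is the point at which the system might notify a different administrator.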

One of the advantages of having a centralized alerting configuration is that only a single host needs to have the ability to generate alerts destined for the mailboxes or pagers of system administrators. This eliminates the need to set up dial-out modems on every observed system for dialing a numeric pager or the need to enable each server to send email when its business purpose does not require such functionality. In Big Brother, for example, a central server consolidates performance and availability information, and only the host that is designated as BBPAGER issues notification alerts. To eliminate a single point of failure, you might consider setting up multiple alert-generating systems or configuring them to operate in a failover mode.

General Response Guidelines

Make sure your company's policies and procedures clearly explain what an administrator should do when responding to an alert. Creating and distributing a document that answers the following questions will help you make sure problems are resolved in a consistent and thought-out manner that has the support of management and technical personnel:

Who is responsible for responding to alerts? Defining this in advance helps ensure that the qualified staff members are available, according to relevant policies and business requirements. Also, this helps prevent administrators from failing to respond because they think somebody else will react to the alert.
Whom should the administrator notify of the problem? As we already discussed, it is often worthwhile to let other administrators know that someone is responding to the alert. Additionally, the company's management should probably be notified of severe or prolonged conditions.
What troubleshooting and investigative steps should the administrator take when resolving a problem? Consider creating a document that explains how to handle problems common to your infrastructure. (Chapter 21, "Troubleshooting Defense Components," addresses this aspect of perimeter maintenance.)
How should the administrator connect to the troubled system? VPN access might come in handy when the administrator is away from the hosting facility. You might also want to define transportation options if she needs to travel to the hosting facility.
When should the administrator call for help? It's possible that the person who is responding to an alert might not be the best person to resolve a particular problem. Specifying in advance when to involve a colleague, or even an external vendor, and empowering the administrator to do so even during off-hours helps to expediently resolve problems.
How should the administrator document the cause of the problem and its resolution in a way that might help the company learn from this experience?

It is not uncommon for the administrator who is responding to the alert to perform a preliminary examination and then to call in the heavy artillery for in-depth troubleshooting. One such scenario, perhaps most relevant to this book, is when the administrator suspects that the event is malicious in nature. Your security policy should account for the need to investigate such situations and define the roles and responsibilities for the staff involved in responding to malicious incidents.

Responding to Malicious Incidents

Any anomaly, whether reported by an IDS or a monitoring system or recognized by a human, might turn out to be a malicious incident. After the event is deemed to be malicious, the specialized procedures created for handling such situations should guide your staff's response. These procedures can be broken into several phases, as defined in Computer Security Incident Handling Step by Step, published by the SANS Institute.³ The following list presents a high-level overview of these phases:

Preparation: Tasks in this phase need to take place before an actual response to a malicious incident. They involve formalizing policies and procedures, training team members, and preparing communication channels for contacting people inside and outside your organization.
Identification: In this phase, a primary handler is assigned to the incident. He begins the investigation by determining whether the incident is, indeed, malicious. If it is, he assesses its scope, establishes a chain of custody for collected evidence, and notifies appropriate personnel.
Containment: The purpose of this phase is to set the stage for further analysis while preventing the incident from escalating. Here, the handler creates a backup of the affected systems to make sure that pristine evidence is available for later use. He also assesses the risk of continuing operations by reviewing logs, interviewing observers, and consulting with system owners. Determining the extent of the compromise and either taking appropriate systems offline or attempting to otherwise block the attackers' access achieves containment.
Eradication: At this point in the response effort, the handler determines the cause of the malicious incident, reinforces the system's defenses, and closes any vulnerabilities that might have allowed the attack to succeed.
Recovery: This phase is devoted to restoring and validating affected systems, returning to business as normal, and continuing to closely monitor systems that were compromised. At this point, the organization needs to decide whether it is ready to resume operations.
Follow-up: In this phase, the incident handler creates a report that consolidates the team's experiences that relate to the incident. This "lessons learned" process helps improve the organization's defenses by addressing factors that allowed the compromise to occur. At this stage, the team implements the follow-up actions management approved.

As you can see, responding to alerts, whether they relate to a system fault or a malicious event, is no easy matter. As we discuss in the following section, you might be able to automate responses to some of the simpler events that require immediate action and that can be addressed without directly involving the administrator.

Automating Event Responses

Automating responses to events that are unlikely to be false alarms helps to expedite problem resolution. For instance, a monitoring system might detect that a critical process on the observed host died, issue an alert to the administrator, and automatically start a new instance of the process. When configuring such functionality, you might want to set limits on the number of times the system attempts to take corrective action in a given time period. If the process repeatedly dies soon after being restarted, chances are good that the automated recovery mechanism cannot help in this situation, in which case an administrator should become directly involved. Even if the fault was automatically resolved, the administrator should still follow up to assess the scope of the problem, determine its cause, verify that the corrective action was acceptable, and attempt to prevent the fault from reoccurring.
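
As a rough illustration of limiting corrective actions within a time period, the Python sketch below allows at most a fixed number of restarts inside a sliding window and escalates to the administrator once that limit is reached. The RestartLimiter class, the handle_process_death function, and the restart and notify callables are hypothetical names introduced for this example; actual monitoring tools expose such limits through their own settings.

```python
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts automatic restarts per time window."""

    def __init__(self, max_restarts, window_seconds):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._restarts = deque()     # timestamps of recent restarts

    def allow_restart(self, now):
        # Forget restarts that have aged out of the window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False
        self._restarts.append(now)
        return True


def handle_process_death(limiter, now, restart, notify):
    """Alert the administrator, then restart unless the limit is reached."""
    notify("critical process died")
    if limiter.allow_restart(now):
        restart()
        return "restarted"
    notify("restart limit reached; manual intervention required")
    return "escalated"
```

The administrator is notified in both cases, which matches the point above that an automatically resolved fault still deserves follow-up.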

Another type of automated response can take place when an intrusion detection system or an intrusion prevention system detects a malicious event. Such products may allow you to automatically respond to the attack, for instance, by resetting the offending network stream or dynamically reconfiguring the firewall to block the attack. As we discussed in Chapter 8, "Network Intrusion Detection," and Chapter 11, "Intrusion Prevention Systems," such automated response carries the advantage of shunning the attacker as soon as malicious actions are observed, but it is often dangerous because it might deny service to a legitimate user. When deciding whether to enable such intrusion prevention functionality, weigh the risk of mistakenly blocking an authorized user against the risk of not blocking a particular attack right away.
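
One way to express this trade-off in code is to block automatically only when the detection is unlikely to be a false alarm and the source is not an address you cannot afford to cut off. The Python sketch below is an assumption-laden illustration: the confidence score, the threshold, and the list of protected networks are hypothetical inputs that this chapter does not prescribe, and real IDS or IPS products implement their own response logic.

```python
import ipaddress


def should_auto_block(source_ip, signature_confidence,
                      protected_networks, min_confidence):
    """Decide whether an automated block is an acceptable risk.

    protected_networks is a hypothetical list of ipaddress network
    objects (for example, business partners or internal ranges) that
    must never be blocked without a human decision.
    """
    addr = ipaddress.ip_address(source_ip)
    if any(addr in net for net in protected_networks):
        return False                 # leave this decision to an administrator
    return signature_confidence >= min_confidence
```

Events that fail either test would still generate an alert; they simply do not trigger the automated block.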

In our discussion about maintaining a security perimeter, so far we have looked at monitoring the infrastructure for faults and malicious events and discussed how to efficiently respond to alarming conditions. An effective way of decreasing the rate at which anomalies occur in the first place is to ensure that perimeter components are updated in a controlled manner. We examine the process of managing changes to the environment in the following section.