4.2 Incident Handling Process Overview

The term incident refers to an adverse event in an information system or network, or the threat of the occurrence of such an event. Examples of incidents include unauthorized access, unauthorized use of system privileges, and execution of malicious code that destroys data. Other adverse events include floods, fires, electrical outages, and excessive heat that cause system crashes. Adverse events such as natural disasters and power-related disruptions are not within the scope of this book.

4.2.1 Incident Handling

Incident handling includes the process of establishing an incident response function in the organization. A security manager must consider several key issues when establishing an Incident Response Group. What goals does the group need to accomplish? What should this team be relied on to do in a consistent and professional manner? Whom is the group intended to serve (i.e., what is the Incident Handling Group's constituency)? It is important to understand the constituency because what is provided for one audience may be inadequate for another. For example, if your constituency is a distributed data center operation, its incident response needs will be quite different from those of a retail Web site.

Once the constituency is known, the next step is to begin determining what the structure of the Incident Response Group will look like. Should it be a centralized organization or a decentralized, distributed organization? This decision greatly affects the staffing and funding requirements. Once you have determined the structure that is best suited to the needs of a constituency, your organization's management team must support the decision and agree to the funding requirements. As you begin to set up the operation, a centralized mechanism needs to be put in place for the constituency to report incidents or potential incidents. A team must be assembled to respond to those incidents, and the team should operate from a high-level "guidebook" or charter. Creating a charter for the team will get everyone on the team working toward achieving the same goals. How they go about achieving those goals is defined by processes and procedures, which are usually put in place by creating an Incident Response Group Operations Handbook. This handbook is considered the starting point for handling all incidents, and team members must be instructed to update it and make it a living document as business conditions change. Finally, when an incident is reported, investigated, and resolved, a management reporting function must be in place to let management understand what happened and the impact the incident had on the organization.

4.2.2 Types of Incidents

The term incident encompasses the following general categories of adverse events [2]:

  • Malicious code attacks. Malicious code attacks include attacks by programs such as viruses, Trojan horses, worms, and scripts used by crackers/hackers to gain privileges, capture passwords, and/or modify audit logs to exclude unauthorized activity. Malicious code is particularly troublesome in that it is typically written to mask its presence and, thus, is often difficult to detect. Self-replicating malicious code such as viruses and worms can furthermore replicate rapidly, thereby making containment an especially difficult problem.

  • Unauthorized access. Unauthorized access encompasses a range of incidents, from improperly logging into a user's account (e.g., when a hacker logs into a legitimate user's account) to unauthorized access to files and directories stored on a system or storage media by obtaining superuser privileges. Unauthorized access could also entail access to network data by planting an unauthorized "sniffer" program or device to capture all packets traversing the network at a particular point.

  • Unauthorized utilization of services. It is not absolutely necessary to access another user's account to perpetrate an attack on a system or network. An intruder can access information, plant Trojan horse programs, and so forth by misusing available services. Examples include using the network file system (NFS) to mount the file system of a remote server machine, using the file access listener on DEC's VMS operating system to transfer files without authorization, or using interdomain access mechanisms in Windows NT to access files and directories in another organization's domain.

  • Disruption of service. Users rely on services provided by networks and computing systems. Perpetrators and malicious code can disrupt these services in many ways, including erasing a critical program, "mail spamming" (flooding a user account with electronic mail), and altering system functionality by installing a Trojan horse program.

  • Misuse . Misuse occurs when someone uses a computing system for other than official purposes, such as when a legitimate user uses an organizational computer to store personal tax records.

  • Espionage . Espionage is stealing information to subvert the interests of a corporation or government. Many of the cases of unauthorized access to U.S. military systems during Operation Desert Storm and Operation Desert Shield were the manifestation of espionage activity against the U.S. government. This type of espionage activity can also occur in corporate settings where activities are focused on illegally obtaining competitive data.

  • Hoaxes . Hoaxes occur when false information about incidents or vulnerabilities is spread. In early 1995, for example, several users with Internet access distributed information about a virus called the Good Times Virus, even though the virus did not exist.

Note that these categories of incidents are not necessarily mutually exclusive. A saboteur from a remote country could, for example, obtain unauthorized access to information systems for the purpose of espionage.

4.2.3 Incident Handling Process Planning

A primary objective of the entire security planning process is preventing security incidents from occurring. Incident Handling Process Planning (IHPP) can help achieve this objective. IHPP can be accomplished in a five-step process:

Step 1. Identify measures to help prevent incidents from occurring, such as the use of antivirus software and firewalls and the institution of patch and upgrade policies.

Step 2. Define measures that will detect an incident when it occurs, such as intrusion detection systems (IDS), firewalls, router tables, and antivirus software.

Step 3. Establish procedures to report and communicate the occurrence of an incident. These procedures should notify all affected parties when an incident is detected, including parties internal and external to the affected organization.

Step 4. Define processes used to respond to a detected incident; these processes should minimize damage, isolate the problem, resolve it, and restore the affected system(s) to normal operation. Create a Computer Security Incident Response Team (CSIRT, see following discussion), and train that team to be responsible for incident response actions.

Step 5. Develop procedures for conducting a postmortem. During this postmortem, identify and implement lessons learned regarding the incident in order to prevent future occurrences.
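The five IHPP steps above can be sketched as a simple planning checklist. This is purely illustrative; the class name, fields, and example measures below are hypothetical, not part of any prescribed standard:

```python
# Illustrative sketch of the five-step IHPP as a checklist structure.
# Step goals and measures are examples only, drawn from the text above.
from dataclasses import dataclass, field


@dataclass
class IHPPStep:
    number: int
    goal: str
    measures: list = field(default_factory=list)


plan = [
    IHPPStep(1, "Prevent incidents",
             ["antivirus software", "firewalls", "patch/upgrade policies"]),
    IHPPStep(2, "Detect incidents",
             ["IDS monitoring", "firewall logs", "router tables"]),
    IHPPStep(3, "Report and communicate",
             ["internal notification list", "external notification list"]),
    IHPPStep(4, "Respond",
             ["isolate", "resolve", "restore", "CSIRT training"]),
    IHPPStep(5, "Postmortem",
             ["lessons-learned review", "prevention updates"]),
]

for step in plan:
    print(f"Step {step.number}: {step.goal} -> {', '.join(step.measures)}")
```

A structure like this makes it easy to audit whether each step of the plan actually has concrete measures assigned to it.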

4.2.4 Forming a Computer Security Incident Response Team

A CSIRT, sometimes shortened to Computer Incident Response Team (CIRT), is a group of security professionals within an organization who are trained and tasked to respond to a security incident. The CSIRT is trained in investigative procedure and forensics. The team should include security management personnel who are empowered to take specific actions when an incident occurs. The CSIRT technical personnel are armed with the knowledge and expertise to rapidly diagnose and resolve problems that result when an incident occurs. The team should have communications personnel who are tasked to keep the appropriate individuals and organizations properly informed regarding the status of the incident and develop public relations and crisis management strategies as appropriate.

The composition of the CSIRT, and the circumstances under which it is activated, must be clearly defined in advance as part of the IHPP. The response team should be available and on call at all times in order to respond to emergency situations. The CSIRT management personnel must possess the authority to make decisions in real time. Procedures that define the circumstances under which the CSIRT is activated must be clear and unambiguous. Activation for every minor incident (e.g., an employee's data entry error) can be costly and time-consuming. Conversely, if a serious incident (such as an intrusion attack) occurs, delaying activation of the CSIRT would likely allow even more damage to the organization to take place. Activation of the CSIRT should happen when information systems must be protected against serious compromise (e.g., an unexpected situation that necessitates immediate reaction in order to prevent the loss of organizational assets or capability). The individual who is authorized to activate the CSIRT should be clearly identified.

The planning process should consider which CSIRT team members will be needed for various types of incidents and how they are to be contacted when an emergency occurs. Finally, the members of the CSIRT need to be properly trained to handle their duties rapidly and effectively. Training should include both the procedures to be followed in responding to a serious security incident and the specific technical skills that individual team members must possess in order to perform their assigned tasks correctly. Periodic "breach exercises," or simulations of security incidents, should be conducted to maintain the team's effectiveness. Planning for incident response should include training the CSIRT in procedures to collect and protect all relevant information, to contain the incident and correct the problem leading to the incident, and to return the system to normal operation.

4.2.5 Train to Collect and Protect Incident Information

All information regarding an information system security incident should be captured and securely stored. This may include system and network log files, network message traffic, user files, intrusion detection tool results, analysis results, system administrator logs and notes, backup tapes, and so on. In particular, if the incident leads to a prosecution (e.g., of an intruder/hacker, a disgruntled employee, or a thief), it is necessary to have complete, thorough, and convincing evidence that has been protected through a verifiable and secure chain-of-custody procedure. In order to achieve this level of information protection and accountability, it is necessary that the following tasks be accomplished:

  • All evidence is accounted for at all times (i.e., evidentiary procedures are followed).

  • The passage of evidence from one party to the next is fully documented.

  • The passage of evidence from one location to the next is fully documented.

  • All critical information is duplicated and preserved both on and off site in a secure location.
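The documentation of evidence hand-offs can be sketched with cryptographic digests, as below. This is an illustration of the chain-of-custody idea only, not a substitute for legally vetted evidentiary procedure; the item names, parties, and log format are hypothetical:

```python
# Sketch of a chain-of-custody log: every hand-off records who passed the
# evidence to whom, when, and a SHA-256 digest proving it was unchanged.
# All identifiers below are invented for illustration.
import datetime
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest lets any later party verify the evidence is intact."""
    return hashlib.sha256(data).hexdigest()


custody_log = []


def transfer(item_id: str, data: bytes, from_party: str, to_party: str) -> None:
    custody_log.append({
        "item": item_id,
        "sha256": fingerprint(data),
        "from": from_party,
        "to": to_party,
        "time": datetime.datetime.utcnow().isoformat(),
    })


evidence = b"contents of suspicious log file"
transfer("LOG-001", evidence, "system administrator", "CSIRT lead")
transfer("LOG-001", evidence, "CSIRT lead", "evidence locker")

# The digests must match across hand-offs, or custody is broken.
assert custody_log[0]["sha256"] == custody_log[1]["sha256"]
```

The key design point is that each record is written at the moment of transfer, so gaps or mismatched digests immediately reveal a break in custody.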

4.2.6 Incident Identification

If an organization is not properly prepared to detect signs that an incident has occurred, is occurring, or is about to occur, it may be difficult or impossible to determine later if the organization's information systems were compromised. Failure to identify the occurrence of an incident in a timely manner can leave the organization vulnerable in several ways:

  1. Damage to systems can occur because of an inability to react in time to prevent the spread of the incident.

  2. Negative exposure can damage the organization's reputation and stature.

  3. Possible legal liability can result for failing to exercise adequate care if the organization's information systems are used to inadvertently or intentionally attack others.

  4. Damage can result from not taking timely action to contain and control an incident, resulting in loss of productivity, increased costs, and so on.

System and Network Logging Functions

Collecting data generated by system, network, application, and user activities is essential for analyzing the security of organizational assets. Data collection is critical for incident detection. Log files contain information about what activities have occurred over time on the system. These files are often the only record of suspicious behavior, and they may be used not only to detect an incident but also to help with system recovery and investigation. Log files can serve as evidence and may be used to substantiate insurance claims. Incident detection planning should include a process for identifying the types of logs and logging mechanisms available for each system asset and the data that is recorded in each log. If existing logging mechanisms are inadequate to capture the required information, they should be supplemented with other tools that are specifically designed to capture additional information. Logging functions should always be enabled.
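As a small illustration of how log files support incident detection, the sketch below scans synthetic authentication log lines for repeated failed logins. The log format, field layout, and alert threshold are all invented for the example; real formats vary by system:

```python
# Illustrative log scan: count failed logins per source address and alert
# when a (deliberately arbitrary) threshold is crossed. Log lines are
# synthetic stand-ins for a real authentication log.
import re
from collections import Counter

log_lines = [
    "Jul 19 10:01:12 host sshd: Failed password for root from 10.0.0.5",
    "Jul 19 10:01:15 host sshd: Failed password for root from 10.0.0.5",
    "Jul 19 10:01:19 host sshd: Failed password for root from 10.0.0.5",
    "Jul 19 10:02:03 host sshd: Accepted password for alice from 10.0.0.9",
]

failed_by_source = Counter()
for line in log_lines:
    m = re.search(r"Failed password for \S+ from (\S+)", line)
    if m:
        failed_by_source[m.group(1)] += 1

# Three or more failures from one source trips our example threshold.
alerts = [src for src, n in failed_by_source.items() if n >= 3]
print(alerts)  # ['10.0.0.5']
```

Even this trivial pass over the log turns raw records into an actionable signal, which is why the text insists that logging functions always be enabled.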

Detection Tools

It is important to supplement system and network logs with additional tools that watch for signs that an incident has occurred or has been attempted. These include tools that monitor and inspect system resource usage, network traffic, network connections, user accounts, and file access; virus scanners; tools that verify file and data integrity; vulnerability scanners; and tools to process log files. Examples of detection tools include the following:

  • Tools that report system events, such as password cracking, or the execution of unauthorized programs

  • Tools that report network events, such as access during nonbusiness hours or the use of Internet Relay Chat (IRC), a common means of communication used by intruders

  • Tools that report user-related events, such as repeated login attempts, or unauthorized attempts to access restricted information

  • Tools that verify data, file, and software integrity, including unexpected changes to the protections of files, or improperly set access control lists on system tools

  • Tools that examine systems in detail on a periodic basis, to check log file consistency or known vulnerabilities
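The file and data integrity checks in the list above typically work by comparing cryptographic digests against a known-good baseline, in the style of tools such as Tripwire. The sketch below illustrates the idea with invented paths and file contents rather than real system files:

```python
# Baseline-comparison sketch of file integrity checking: record digests
# while the system is known-good, then flag any path whose digest changes.
# Paths and contents are stand-ins for real system files.
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Baseline taken while the system is trusted.
baseline = {
    "/bin/login": digest(b"original login binary"),
    "/etc/passwd": digest(b"original passwd contents"),
}

# Later snapshot; /bin/login has been tampered with.
current = {
    "/bin/login": digest(b"trojaned login binary"),
    "/etc/passwd": digest(b"original passwd contents"),
}

modified = sorted(p for p in baseline if baseline[p] != current.get(p))
print(modified)  # ['/bin/login']
```

The baseline itself must be stored out of reach of an intruder, since a tampered baseline defeats the whole check.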

Detection Techniques

Incident detection is based on three simple steps: (1) observe and monitor information systems for signs of unusual activity; (2) investigate anything that appears to be unusual; and (3) if something is found that cannot be explained by authorized activity, immediately initiate predetermined incident response procedures.

Recommended Detection Practices

When looking for signs of an incident, administrators should ensure that the software used to examine systems has not been compromised. Additional steps to take in the detection process include looking for any unexpected modifications that have been made to system directories or files, inspecting logs, reviewing alert notifications from monitoring mechanisms, inspecting triggers that occur for unexpected behavior, investigating unauthorized hardware attached to the network, looking for signs of unauthorized access to physical resources, and reviewing reports submitted by users or external contacts about suspicious system behavior.

4.2.7 Incident Containment

Containment consists of immediate, short-term, tactical actions designed to remove access to compromised systems. Containment can help limit the extent of damage that occurs and prevent additional damage from occurring. The specific steps to be followed in a containment process often depend on the type of incident (e.g., intrusion, virus, theft) and whether the incident is ongoing (e.g., an intrusion) or is over (e.g., a theft of equipment). Considerations in planning for containment include the following:

  • Defining an acceptable level of risk to business processes and the systems and networks that support them, and to what extent these processes, systems, and networks must remain operational, even during a major security incident

  • Identifying methods for performing a rapid assessment of the situation as it currently exists (e.g., scope, impact, damage)

  • Determining whether to quickly inform users that an incident has occurred, or is occurring, that could affect their ability to continue work

  • Identifying the extent to which containment actions might destroy or mask information required later to assess the cause of the incident

  • If the incident is ongoing, identifying the extent to which containment actions might alert the perpetrator (e.g., an intruder, thief, or other individual with malicious intent)

  • Identifying when to involve senior management in containment decisions, especially when containment includes shutting systems down or disconnecting them from a network

  • Identifying who has the authority to make decisions in situations not covered by existing containment policy

Containment strategies include temporarily shutting down a system, disconnecting it from a network, disabling system services, changing passwords, disabling accounts, changing physical access mechanisms, and so on. Specific strategies should be developed for serious incidents, such as the following:

  • Denial of service caused by e-mail "spamming" (sending a large volume of electronic messages to a targeted recipient) or "flooding" (filling a channel with garbage, thereby denying others the ability to communicate across it)

  • Programmed threats, such as new viruses not yet detected and eliminated by antivirus software, or malicious applets, such as those using ActiveX or Java

  • Scanning, probing, or mapping performed by intruders as a prelude to system hacking attempts

  • Major password compromises (e.g., an intruder with a password sniffer tool), requiring the need to change all user or account passwords at a specific site or at a specific organizational level

In general, the containment objective should be to provide a reasonable security solution until sufficient information has been gathered to take more appropriate actions to address the vulnerabilities exploited during the incident.
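One way to make the incident-specific strategies above operational is a simple lookup from incident type to first containment actions, with anything outside policy escalated to a human decision-maker. The mapping below is a hypothetical sketch; a real handbook would define each action precisely:

```python
# Hedged sketch: incident types mapped to first containment actions.
# Action names are placeholders for site-specific procedures.
CONTAINMENT_ACTIONS = {
    "mail_spamming": ["rate-limit inbound mail", "block sending hosts"],
    "new_virus": ["disconnect infected hosts", "disable file shares"],
    "scanning": ["log and watch", "block scanning source at firewall"],
    "password_compromise": ["force password changes", "disable exposed accounts"],
}


def containment_plan(incident_type: str) -> list:
    # Unknown incident types escalate, matching the guidance that someone
    # must have authority over situations not covered by existing policy.
    return CONTAINMENT_ACTIONS.get(incident_type, ["escalate to CSIRT manager"])


print(containment_plan("new_virus"))
print(containment_plan("zero_day_worm"))  # not in policy -> escalate
```

The explicit escalation default encodes the planning consideration above about who decides when existing containment policy does not apply.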

4.2.8 Incident Eradication

Removal or eradication of the root cause of a security incident often requires a great deal of analysis, followed by specific corrective actions, such as the improvement of detection mechanisms, changes in reporting procedures, installation of enhanced protection mechanisms (such as firewalls), implementation of more sophisticated physical access controls, development of methods that improve user community awareness and provide training on what to do when an incident occurs, or specific changes to security policy and procedures to prevent recurrence of an incident.

4.2.9 Incident Recovery

Restoring a compromised information system to normal operation should be accomplished only after the root cause of the incident has been corrected. This prevents the same or a similar type of incident from occurring again and helps ensure that a recurring incident will be detected in a more timely fashion; however, business reality may require that the system be restored to operation before a full analysis can be conducted and all corrections are made. Such a risk needs to be carefully managed and monitored, recognizing that the system remains vulnerable to another occurrence of the same type of incident. Thus, an important part of the IHPP is determining the requirements and time frame for returning specific information systems to normal operation. The determination to return a system to normal operation before fully resolving the root problem should require the involvement of senior management. System restoration steps may include the following:

  • Using the latest trusted backup to restore user data. Users should review all restored data files to ensure that they have not been affected by the incident.

  • Enabling system and application services. Only those services actually required by system users should be enabled initially.

  • Reconnecting the restored system to its local area network. Validate the system by executing a known series of tests, where prior test results are available for comparison.

  • Being alert for problem recurrence. A recurrence of a viral or intrusion attack is a real possibility. Once a system has been compromised, especially by an intruder, the system will likely become a target for future attacks.
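The "known series of tests" mentioned in the restoration steps can be as simple as rerunning a fixed battery of checks and diffing the results against those recorded before the incident. The check names and values below are hypothetical:

```python
# Sketch of post-restoration validation: compare a rerun of fixed checks
# against results recorded before the incident. Checks are invented.
expected = {"disk_usage_ok": True, "services_up": 12, "checksum_pass": True}


def run_validation_tests() -> dict:
    # In practice each entry would be a real probe of the restored system;
    # here one service is simulated as having failed to restart.
    return {"disk_usage_ok": True, "services_up": 11, "checksum_pass": True}


actual = run_validation_tests()
failures = {k: (expected[k], actual.get(k))
            for k in expected if actual.get(k) != expected[k]}
print(failures)  # {'services_up': (12, 11)} -> one service did not restart
```

Any nonempty `failures` result means the system should not yet be declared fully restored.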

4.2.10 Incident Review and Prevention

It is extremely important to learn from the successful and unsuccessful actions taken in response to security incidents. Capturing and disseminating what did and did not work well will help reduce the possibility of similar incidents. This helps an organization improve its overall information system security posture. If an organization fails to heed lessons learned, its systems and applications will continue to operate at risk, and it will likely fall victim to the same or a similar type of incident again. Establishing a lessons learned capability includes four steps:

  1. Postmortem analysis

  2. Lessons learned implementation

  3. Risk assessment

  4. Reporting and communication

Postmortem Analysis

A postmortem analysis and review meeting should be held within three to five days of completion of the incident investigation. Waiting too long could result in people forgetting critical information. Questions to be asked include the following:

  • Did detection and response procedures work as intended? If not, why not?

  • Could any additional procedures have been taken that would have improved the ability to detect the incident?

  • What improvements to existing procedures and/or tools would have aided in the response process?

  • What improvements would have enhanced the ability to contain the incident?

  • What correction procedures would improve the effectiveness of the recovery process?

  • What updates to policies and procedures would have allowed the response and/or recovery processes to operate more smoothly?

  • How could user and/or system administrator preparedness be improved?

  • How could communication throughout the detection and response processes be improved?

The results of these and similar questions should be incorporated into a postmortem report for senior management review and comment.

Lessons Learned Implementation

When applicable, new and improved methods resulting from lessons learned should be included within current security plans, policies, and procedures. In addition, public, legal, and vendor information sources should be periodically reviewed regarding intruder trends, new virus strains, new attack scenarios, and new tools that could improve the effectiveness of response processes.

Risk Assessment

An information security risk assessment is used to determine the value of the information assets that exist in an organization, the scope of the vulnerabilities of its information systems, and the overall risk to the organization. Without knowing the current state of risk to an organization's information systems, it is impossible to effectively implement a proper security program to protect organizational assets. This is achieved by following the risk management approach. Once the risk has been identified and quantified, you can select cost-effective countermeasures to mitigate that risk. The goals of an information security risk assessment are as follows:

  • To determine the value of the information assets

  • To determine the threats to those assets

  • To determine the existing vulnerabilities inherent in the organization

  • To identify risks that expose information assets

  • To recommend changes to current practice to reduce the risks to an acceptable level

  • To provide a foundation on which to build an appropriate security plan

If the severity or impact of the incident is severe, a new risk assessment for the affected information system should be considered. Part of this risk assessment process should include deriving a financial cost associated with an incident, which will not only help those who may be prosecuting any suspected perpetrators, but will also help your organization justify its expenditures for security time and resources.
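One widely used way to derive the financial cost mentioned above is annualized loss expectancy (ALE): the single loss expectancy (SLE, the cost of one occurrence) multiplied by the annualized rate of occurrence (ARO). The asset values and rates below are invented examples, not recommendations:

```python
# Sketch of the standard ALE risk-quantification arithmetic.
# All dollar figures and rates are illustrative only.


def sle(asset_value: float, exposure_factor: float) -> float:
    """Single loss expectancy: cost of one occurrence of the incident."""
    return asset_value * exposure_factor


def ale(single_loss: float, annual_rate: float) -> float:
    """Annualized loss expectancy: expected yearly cost of the risk."""
    return single_loss * annual_rate


# Example: a $200,000 customer database, 25% damaged per incident,
# with an incident expected once every two years (ARO = 0.5).
loss_per_incident = sle(200_000, 0.25)      # $50,000 per incident
annual_loss = ale(loss_per_incident, 0.5)   # $25,000 per year
print(annual_loss)
```

A countermeasure is then cost-effective if its annual cost is less than the reduction in ALE it provides, which is one concrete way to justify security expenditures to management.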

Reporting and Communication

Designated organization personnel, as well as personnel outside of the organization, cannot execute their responsibilities if they are not notified in a timely manner that an incident is occurring or has occurred, and if they are not kept informed as the incident progresses. In addition, there are types of incidents wherein the public communications aspects, if mishandled, could result in serious negative publicity or loss of reputation for the organization. Hence, it is important that incident reporting and information dissemination procedures be established and periodically reinforced, so that all personnel are aware of how they are to participate when an incident occurs. Incident handling planning should specify who should be notified in the event of an intrusion, who does the notifying of whom, and in what order. The order of notification may depend on the type of incident or on other circumstances. Parties to be notified include the following:

  • The Information Systems Security Officer (ISSO) or the CSIRT, if one exists

  • Public Relations

  • System and network administrators

  • Responsible senior management

  • Human Resources

  • Legal counsel and law enforcement groups

  • System/network users
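Since the notification order may depend on the incident type, the plan can be captured as an ordered chain per incident type. The orderings and parties below are hypothetical; the IHPP should set them per organization:

```python
# Sketch of an ordered notification chain keyed by incident type.
# Orderings are invented examples, not recommended policy.
NOTIFY_ORDER = {
    "intrusion": ["ISSO/CSIRT", "system administrators", "senior management",
                  "legal counsel", "public relations"],
    "hoax": ["ISSO/CSIRT", "system/network users"],
}


def notify(incident_type: str) -> list:
    notified = []
    for party in NOTIFY_ORDER.get(incident_type, ["ISSO/CSIRT"]):
        # A real implementation would page, e-mail, or call each party here,
        # in order, recording the time of each notification.
        notified.append(party)
    return notified


print(notify("intrusion")[0])  # the ISSO or CSIRT is always told first
```

Making the order explicit in data, rather than leaving it to memory during a crisis, is the point of specifying "who notifies whom, and in what order" in the plan.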

4.2.11 Countering Cyberattacks

Computer hacking has recently become a preferred weapon for individuals who campaign for particular causes or protest against particular activities. These people are known as "hacktivists," and their goal is to inflict as much damage as possible and garner publicity for their actions. Hacktivists initiate various types of attacks that range from defacement of Web sites to data manipulation. They try to cover their tracks (at least the skilled ones do) in such activities so that the consequences of their actions can go unnoticed for months or even years. These attacks have a devastating impact on businesses, ranging from embarrassment to loss of reputation to liability for compromise of data. In the early 1990s, the problem started to trend upward significantly, and as a result, businesses and government began to seek solutions to this problem.

In an effort to mitigate the adverse effects of a significant cyberattack, former U.S. President Bill Clinton created the President's Commission on Critical Infrastructure Protection. This group was tasked to study the impact of a cyberattack on the critical infrastructure of our country. In October 1997, the commission's findings were made public. The report identified eight critical infrastructure areas: telecommunications, electrical power, oil and gas distribution and storage, banking and finance, water supply, transportation, emergency services, and government services. Each of these infrastructures was a potential cyberattack target. As a result of the information disclosed in this report, President Clinton created the National Infrastructure Protection Center (NIPC), the Critical Infrastructure Assurance Office (CIAO), the National Infrastructure Assurance Council (NIAC), and the private-sector Information Sharing and Analysis Centers (ISACs). These organizations were created to begin preparation and strategic planning to counter cyberattacks specifically targeting the U.S. critical infrastructure.

4.2.12 Real-World Cyberwar Example

On July 12, 2001, the CodeRed worm began to infect hosts running unpatched versions of Microsoft's IIS Web server. This worm used a static seed for its random number generator. At around 10:00 Greenwich Mean Time on July 19, 2001, a random-seed variant of the CodeRed worm (CRv2) appeared and spread. This second version shared almost all of its code with the first version, but spread much more rapidly. Later, on August 4, a new worm began to infect machines by exploiting the same vulnerability in Microsoft's IIS Web server as the original CodeRed worm. Although the new worm shared almost no code with the two versions of the original worm, it contained in its source code the string "CodeRedII" and was thus named CodeRedII.
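The significance of the static seed can be shown in miniature: every instance of the original worm seeded its generator identically, so every infected host probed the same sequence of targets and kept hitting machines that were already infected. The function below is a hypothetical stand-in for "pick a random address to attack," not the worm's actual code:

```python
# Demonstration of why a static RNG seed cripples a random-scanning worm:
# identical seeds produce identical target lists on every host.
# probe_targets is an invented stand-in for target selection.
import random


def probe_targets(seed: int, count: int = 5) -> list:
    rng = random.Random(seed)
    # Stand-in for "pick a random IPv4 address to scan."
    return [rng.randrange(2**32) for _ in range(count)]


static_a = probe_targets(seed=12345)   # two hosts with the static seed...
static_b = probe_targets(seed=12345)
assert static_a == static_b            # ...scan identical target lists

varied_a = probe_targets(seed=1)       # per-host seeding, as in CRv2, gives
varied_b = probe_targets(seed=2)       # each instance a different scan list,
assert varied_a != varied_b            # covering far more of the network
```

This is why CRv2, which shared almost all of the original's code, nonetheless spread much more rapidly.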

When CodeRedII infects a new host, it first determines if the system has already been infected. If not, the worm initiates its propagation mechanism, sets up a "backdoor" into the infected machine, becomes dormant for a day, and then reboots the machine. Unlike CodeRed, CodeRedII is not memory resident, so rebooting an infected machine does not eliminate CodeRedII. This particular worm infected more than 359,000 computers in less than 14 hours. At the peak of the incident, more than 2,000 new hosts were infected every minute.
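Spread figures like these are consistent with the logistic growth model commonly used for random-scanning worms: infection accelerates while many hosts remain vulnerable, then saturates as new probes increasingly hit already-infected machines. The parameters below are illustrative only, chosen to echo the numbers above, not fitted to the actual outbreak:

```python
# Illustrative logistic model of random-scanning worm spread.
# Parameters are invented for the example, not measured values.


def logistic_spread(total_vulnerable: int, initially_infected: int,
                    contact_rate: float, hours: int) -> list:
    infected = float(initially_infected)
    history = [infected]
    for _ in range(hours):
        # New infections per hour are proportional to the infected count
        # times the fraction of vulnerable hosts not yet infected.
        new = contact_rate * infected * (1 - infected / total_vulnerable)
        infected = min(total_vulnerable, infected + new)
        history.append(infected)
    return history


curve = logistic_spread(total_vulnerable=359_000, initially_infected=1,
                        contact_rate=2.0, hours=14)
print(round(curve[-1]))  # approaches the vulnerable population within 14 hours
```

The characteristic S-curve explains both observations in the text: the explosive per-minute infection rate mid-outbreak and the near-complete coverage of vulnerable hosts within hours.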

The CodeRedII worm was considerably more dangerous than the original CodeRed worm because CodeRedII installs a procedure that creates a means for obtaining remote, root-level access to the infected machine. Unlike CodeRed, CodeRedII does not deface Web pages on infected machines, and it does not launch a denial-of-service attack; however, the backdoor installed on the machine allows any code to be executed, so the machines could be used as zombies for future attacks (DoS or DDoS).

The trends being reported by industry experts indicate that organizations relying on the Internet to conduct business activities will continue to face significant challenges in protecting their infrastructures from cyberattack. With so many new variants of worms proliferating on the Internet, it has become a very dangerous place to conduct commerce. Many of the home personal computers connected by cyber-novices are routinely infected and even serve as launch points for cyberterroristic activities. The sad part of this is that the owner of the computer rarely even knows it is happening. For the uninitiated, the effects of CodeRed or Nimda or a myriad of other threats are devastating. The @#$! computer is blamed for locking up, failing, crashing, or otherwise coming to a grinding halt because of an infection. That is the reality of today's cyber-centric environment.




Wireless Operational Security
ISBN: 1555583172
Year: 2004
Pages: 153