A Six-Stage Methodology for Incident Response

Now that the reasons for following an incident response methodology are clear, it is time to become acquainted with the methodology advocated in this chapter. The particular methodology presented here is by no means the only one that has ever been invented, but it is certainly the oldest ^[1] and most time-honored methodology in the incident response arena. It consists of six stages: preparation, detection, containment, eradication, recovery, and follow-up. (The acronym PDCERF embodies the first letters of all six stages; see Figure 3.1.) The next sections cover each of these stages in detail.

^[1] The six-step methodology presented in this part of the book was created at the Invitational Workshop on Incident Response at the Software Engineering Institute in Pittsburgh, Pennsylvania, in July of 1989 by approximately one dozen workshop participants, the first author of this book included.

Figure 3.1. The PDCERF incident response methodology.

Preparation

The first stage is preparation, which means being ready to respond before an incident actually occurs. This stage is extremely important because so many of today's incidents are so complex and time consuming that preparation is a necessity, not a luxury. Here are the basic notions behind preparation:

Setting up a reasonable set of defenses/controls based on the threat that presents itself
Creating a set of procedures to deal with incidents as efficiently as possible
Obtaining the resources and personnel necessary to deal with the problem
Establishing an infrastructure to support incident response activity

We will now examine all four of these considerations in more detail.

Setting up Defenses/Controls

Setting up appropriate defenses/controls is one of the most important steps in establishing an effective incident response capability. Having wide-open systems that are completely vulnerable to attack but having a strong incident response capability is, to put it bluntly, downright stupid. On the other hand, having strong controls without any incident response capability is naive. It is thus important to achieve a balance between these two extremes. The trick is to allocate sufficient resources to achieve at least a baseline of security in systems, network devices, applications, databases, and so forth, so that incidents (particularly in areas in which risk is very high) are not likely to become commonplace. An appropriate part of the resources, however, also needs to be devoted to the operational side of security (what we have previously called the "vigilance" function) in case the defenses that are in place are breached.

There is another side to setting up defenses and controls. Too often, people forget to ensure that the systems and applications used in handling incidents are themselves resistant to attack. An attacker could access the intrusion-detection tool you use, for example, and alter its parameters, rendering it useless. The potential consequences in terms of damage, destruction, and corruption, as well as the potential legal impact, ^[2] are frightening. You and others with whom you work on incident response efforts might, for example, have downloaded instances of what you believe to be malicious code onto a particular server belonging to your organization. Now, just before you start to analyze this code, suddenly everything you have downloaded disappears and the system logs are erased. You now have nothing to analyze. The point here is that you also have to secure the systems and applications that are going to be used in dealing with incidents.

^[2] Chapter 7, "Legal Issues," covers this issue in much greater detail.

Procedures

Chapter 5, "Organizing for Incident Response," covers incident response procedures in detail. Suffice it to say here that procedures should cover the following at a minimum:

Specific steps to be taken by those involved in incident response and under what circumstances
Who should be contacted and under what circumstances
Types of information that can and cannot be shared outside of your immediate organization
Priorities in response activity
Division of labor: what roles will be assigned to each of the people who participate in an incident response effort
Acceptable risk limits and what kinds of activities, events, communications with others, and so forth must be documented (and how)

Obtaining Resources and Personnel

It goes without saying that resources and personnel are necessary if any incident response effort is to be successful. Resources are needed to pay the cost of labor, but resources are also needed for hardware, software, and training. An incident response effort almost invariably requires dedicated hardware platforms that can be used for purposes such as analysis and forensics. Hardware, such as personal digital assistants (PDAs), dictaphones, and locking combination safes or vaults, is also likely to be necessary. Sharing these sensitive platforms with other organizations is, for all practical purposes, out of the question.

Software such as intrusion detection software, reverse engineering tools (used to determine how an executable works when the source code is not available), forensics analysis software, and database server software is often also needed. Also, whoever is part of an incident response effort will need periodic training to ensure that each person has more than enough knowledge and skills to bring to each new situation. Forensics training in particular is becoming increasingly necessary in the world of incident response. The manager in charge of an incident response effort must ensure that sufficient attention is paid to obtaining the proper level of resources and personnel.

Building an Infrastructure to Support Incident Response

Ultimately, the process of responding to incidents works best if an infrastructure within an organization is established to support this process. An infrastructure provides a uniform, coherent way of organizing each element of the incident response process. In short, the business of incident response dictates that an organization should make incident response part of its overall business. To do this requires buy-in from senior management, which must ensure the following:

Suitable management oversight has been devised and put in place, including establishing lines of accountability, defining roles and delegation of authority, creating processes for evaluating the effectiveness of the incident response effort, and so forth.
Appropriate defenses/controls are chosen and implemented in systems, network devices, applications, databases, and so forth.
A set of procedures for incident handling is written, well distributed, and followed.
Appropriate tasks are assigned to each person in each incident response effort.
Sufficient resources are provided to ensure that the necessary hardware and software tools and technical personnel are available. Lack of funding is, all things considered, the biggest obstacle that an incident response capability is likely to face. The best solution is to prove (in terms of dollar figures) just how much this capability is being used and (again, in terms of dollar figures ^[3]) how much money it has saved.

^[3] The latter is potentially extremely helpful in obtaining and maintaining funding, but unfortunately, it is often also very difficult to calculate meaningfully.

Contact lists (for staff involved in incident response, cognizant managers, law enforcement, points of contact for other response teams, and so forth) are created and updated as needed (see the next sidebar).
Any evidence gathered during the course of an incident is adequately preserved.
Legal considerations are being adequately addressed.

The Importance of Contact Lists

Little things that are too easily overlooked can make the difference between success and failure in an incident response effort. Contact lists are a good example. A new incident response team was being put in place a number of years ago, when suddenly a massive worm attack was discovered during the early part of an evening. Management tried to call team members in to work, only to discover that some of the key team members had unlisted phone numbers or their home numbers had changed. Consequently, by the next morning, only a small portion of the team had been assembled.

Fortunately, this team learned from this experience. The solution was to create a laminated card with proper contact information (including work phone, home phone, mobile phone, pager number, and other contact information) for each primary and secondary team member. This card was the same size as the employee badge and could be hung on a badge holder; when each team member picked up the badge to go to work, he or she now also had the contact information. The team manager also ensured that contact information was reviewed regularly so that any changes could be quickly incorporated.

Much of the burden of preparation actually falls on system administrators.

Implementing system security measures, for example, is the responsibility of system administrators, who need to do the following:

Ensure that the password policy is implemented through password filters (that reject weak passwords when users try to enter them) and/or password cracking tools ^[4]

^[4] npasswd, a free UNIX password filter, is available at ftp://cerias.purdue.edu/tools/unix/ or http://www.utexas.edu/cc/unix/software/npasswd, and most flavors of Linux offer password filtering through PAM (Pluggable Authentication Modules), which is built in. StrongPass, a free Windows NT password filter, is available at www.ntsecurity.nu. John the Ripper (available from http://packetstormsecurity.org) is a free UNIX and Linux password cracker (although John the Ripper will not work against Linux systems if MD5 is used for password hashing in lieu of DES encryption). l0phtcrack (available from www.atstake.com) is the most popular password cracker for Windows NT, and lc3 will crack Windows 2000 systems.

Ensure that dormant and default accounts are removed or disabled
Install and maintain the appropriate security tools (for example, intrusion-detection tools, forensics tools, a secure email program, and others)
Run and regularly examine system logging/auditing
Install patches and fixes (after their integrity and functionality have been tested)
Check system files for integrity
Back up each system as needed (including during incidents)
Investigate suspicious occurrences (no one knows what is "normal" and "abnormal" in any system better than the system administrator)
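
A password filter of the kind mentioned in the first task above can be illustrated with a short sketch. The Python below is not the logic of npasswd, PAM, or any other real tool; the minimum length, the character-class rule, and the small word list are assumptions chosen only to show the idea of rejecting a weak password at the moment a user tries to set it.

```python
import string

# Hypothetical policy values; a real site would take these from its password policy.
MIN_LENGTH = 8
COMMON_WORDS = {"password", "letmein", "qwerty", "welcome", "admin"}


def password_problems(candidate, username=""):
    """Return a list of reasons the candidate is weak (empty list = acceptable)."""
    problems = []
    lowered = candidate.lower()
    if len(candidate) < MIN_LENGTH:
        problems.append("shorter than %d characters" % MIN_LENGTH)
    if username and username.lower() in lowered:
        problems.append("contains the account name")
    if any(word in lowered for word in COMMON_WORDS):
        problems.append("contains a common word")
    # Count how many character classes (lower, upper, digit, punctuation) appear.
    classes = [
        any(c in string.ascii_lowercase for c in candidate),
        any(c in string.ascii_uppercase for c in candidate),
        any(c in string.digits for c in candidate),
        any(c in string.punctuation for c in candidate),
    ]
    if sum(classes) < 3:
        problems.append("uses fewer than three character classes")
    return problems


if __name__ == "__main__":
    for pw in ("password1", "T4ble!Lamp-97"):
        print(pw, "->", password_problems(pw, username="jsmith") or "accepted")
```

Returning the reasons rather than a bare yes or no lets the filter tell the user why a password was refused, which tends to reduce repeated failed attempts.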

Detection

The second stage in the PDCERF methodology is detection. This section discusses the many considerations related to this stage.

About Detection

As far as incident response goes, detection and intrusion detection are not synonymous. Detection means determining whether malicious code is present, files or directories have been altered, or other symptoms of an incident are present and, if they are, what the problem as well as its magnitude is. Intrusion detection, in its most typical connotation, means determining whether unauthorized access to a system has transpired and (in a more complete definition) whether misuse ^[5] has occurred. A virus infection can be found using detection but not intrusion-detection software, for instance. Detection thus embraces a potentially much wider range of incidents than does intrusion detection.

^[5] Misuse can be defined as a violation of security policy. See Tuglular, T., and Spafford, E.H. "A framework for characterization of insider computer misuse." Unpublished paper, Purdue University, 1997; a white paper available at http://cerias.purdue.edu.

From an operational standpoint, all actions that transpire as part of the incident response process depend on detection. To be blunt, without detection, there is no meaningful incident response; detection triggers incident response. This elevates the relative importance of detection among the other five stages considerably.

Intrusion-Detection Systems (IDSs)

Many books have now been written about IDSs. For this reason, as well as the fact that the types of incidents discussed in this book are far broader than break-ins, this book does not cover IDSs in much detail. A quick summary, however, of the most important considerations will be helpful in understanding the second stage of the PDCERF methodology better.

Two basic types of IDSs, host-based and network-based IDSs, exist. Host-based IDSs must be installed on every system on which intrusion-detection capability is desired. Although much better suited to picking up attacks such as insider attacks, host-based IDSs can be quite expensive and can tie up system performance substantially. The alternative is network-based IDSs, which gather data from sensors and systems and process this data on a central host. Network-based IDSs tend to be less costly and do not affect system performance appreciably, but they generally are not as good as host-based IDSs in detecting attacks on systems. They also can be defeated by setting up encrypted links from victim machines to attacking machines and are more subject to denial-of-service attacks. The debate concerning host- or network-based IDSs will continue to rage, but a growing number of experts in the field of intrusion detection are simply advocating the use of both.

Most current IDSs base their recognition capability on signatures, characteristic patterns of attack. In Windows NT, for example, sending

 net use \\<IP address>\IPC$

to a Windows NT host is very suspicious in that the sender is setting up a null session on behalf of the anonymous user. Although signatures constitute a very intuitive approach to intrusion detection, critics argue that signatures are always post-hoc and are thus not capable of detecting new attacks when they first surface. A few IDSs also analyze the nature of protocol connections to determine whether or not they are normal.
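
Signature matching itself is simple to sketch. The Python below assumes that observed activity is available as text lines (for example, captured commands or log records) and checks each line against a small table of regular expressions. The signature names, the patterns, and the sample lines are illustrative assumptions, not rules taken from any actual IDS product.

```python
import re

# Hypothetical signature table: name -> compiled pattern.
SIGNATURES = {
    # A connection to the IPC$ share, as in the null-session example above.
    "ipc-share-connect": re.compile(r"net\s+use\s+\\\\[^\\\s]+\\IPC\$", re.IGNORECASE),
    # A request that names the UNIX password file.
    "passwd-file-request": re.compile(r"/etc/passwd"),
}


def match_signatures(line):
    """Return the sorted names of all signatures that match the given line."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(line))


if __name__ == "__main__":
    samples = [
        r"net use \\10.0.0.5\IPC$",
        "GET /cgi-bin/phf?Qalias=x%0a/bin/cat%20/etc/passwd",
        "GET /index.html",
    ]
    for sample in samples:
        print(sample, "->", match_signatures(sample) or "no match")
```

A line that matches nothing is not necessarily benign; an attack for which no pattern has yet been written passes straight through, which is exactly the limitation that critics of signatures point to.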

Detection Software

Given the sophistication of so many of today's attacks, detection software (such as virus detection software, IDS software, integrity-checking software, and so on) might, for all practical purposes, be necessary if an incident response effort is to be successful. For example, widely available vendor software can rapidly detect viruses in desktop systems and mail servers. This software usually can also detect worm infections and whether backdoor trojan horse programs (such as NetBus, Back Orifice 2000, and SubSeven) have been covertly installed in Windows systems. But buying and installing this type of software and then doing nothing more has few benefits; the software must also be regularly updated to include the latest signatures. It is also important to systematically assess the applications you use. An application (such as intrusion-detection software) that was effective three years ago might now be ineffective. Incorporating provisions for frequent updates of virus-protection software and regular evaluations of all applications you use in handling incidents as part of the incident response effort is thus another essential consideration.
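
Integrity-checking software generally works by recording a cryptographic hash of each monitored file while the system is known to be good and comparing later hashes against that baseline. The following Python sketch shows the idea using the standard hashlib module; the function names and the choice of SHA-256 are assumptions for illustration, and a real tool would also protect the baseline itself from tampering.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record a digest for every monitored path."""
    return {path: file_digest(path) for path in paths}


def changed_files(baseline):
    """Return monitored paths that are now missing or whose contents differ."""
    changed = []
    for path, expected in sorted(baseline.items()):
        if not os.path.exists(path) or file_digest(path) != expected:
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than on real system files.
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "login")
        with open(target, "wb") as handle:
            handle.write(b"original contents")
        baseline = build_baseline([target])
        print("before change:", changed_files(baseline))
        with open(target, "ab") as handle:
            handle.write(b" plus a back door")
        print("after change:", changed_files(baseline))
```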

Some kinds of incidents do not require detection software, however, because certain symptoms of incidents are rather obvious. Sources of information such as system and firewall logs might, in these cases, be sufficient. Some of the more obvious symptoms are as follows:

Failed login attempts. These are one of the most obvious symptoms that an attack has occurred.
Logins into dormant and default accounts. Dormant and default accounts (as well as orphan accounts, accounts that are not maintained) should be viewed suspiciously, especially if the system administrator knows that the user of any of these accounts has not been able to log in for some time.
Activity during nonworking hours. In and of itself, activity such as logins or connections to services during off hours does not prove that an intrusion has occurred, but the fact that attackers prefer to gain unauthorized access at times when system administrators and users are least likely to notice their presence is well established.
Presence of new accounts not created by the system administrator. One of the best examples was when an attacker from the Netherlands broke into systems, escalated privileges, and then created a new superuser account named "rgb" on each compromised system several years ago.
Unfamiliar files or programs. Often these files or programs are back-door programs, and they are given innocuous names such as /tmp/bob, /etc/inet.d/bootd, or even "..". ^[6]

^[6] In UNIX and Linux systems, entering the ls -a command produces a file and directory listing that displays "." first and then "..". Many, if not most, system administrators and users are not likely to notice the inclusion of a second, innocuous-looking entry such as ".." (which in reality is dot-dot-space or space-dot-dot).

Unexplained changes in file and directory permissions. File and directory permissions could be changed to give an intruder back-door access, or a program that an attacker has run could also have changed these permissions.
Unexplained elevation or use of privileges. As mentioned previously, one of the goals of many attackers is to gain superuser privileges.
An altered home page or other page(s) on a Web server. This is one of the most obvious signs of an attack on a Web server.
Presence of pornographic images on a system.
Use of commands or functions not normally associated with a user's job. A good real-life example is when logs in a UNIX system indicated that an administrative assistant's account, used only to read email until then, had been used to compile programs.
Presence of cracking utilities. When cracking utilities are found on a system, this usually means either that an attacker has planted them or that a legitimate user has downloaded them (knowingly or unknowingly).
Gaps in or erasure of system logs. This is one of the prime indications that a system has been compromised, given the prevalence of "rootkit" tools that mask the presence of the attacker.
Changes in DNS tables or router or firewall rules that cannot be accounted for.
Unusually slow system performance. Be careful about concluding that a security breach has occurred whenever system performance is sluggish, but problems such as the presence of rogue programs can cause unusually slow system performance.
System crashes. These could be due to deliberate DoS attacks, or an attacker might have broken into a system and performed some actions or executed some routines that did not work as expected, causing the system to crash.
Social engineering attempts. Reports of social engineering attempts usually mean that a concerted attempt to break the security of an organization's systems and/or networks is underway and that almost certainly one or more of these attempts has been successful.

Figure 3.2 shows real-life indications from the UNIX process accounting log (usually in the /var/adm/pacct path in UNIX systems and /var/log/pacct in Linux systems) that an attack on a UNIX system is occurring. In this case, the system administrator was already on the system, but someone attempted to log in directly as root at the same time.

Figure 3.2. Process accounting data showing an attempted attack on a UNIX system.

Figure 3.3 shows accounting logs from a VMS system that indicate that a security-related incident has occurred. In this case, MAILER, which is the mail program, is running with All Privileges, the highest level of privileges (see the third entry in this log). Someone has almost certainly gained access to the MAILER account and then escalated privilege in this system.

Figure 3.3. Accounting entries showing an attack on a VMS system.

Figure 3.4 captures the Windows NT Event Log entries for a series of brute force attacks. One of the administrator accounts was targeted in these attacks. You can view the Security Log by going to Start, Programs, Administrative Tools, Event Viewer. If the Security Log is not displayed right away, pull down the Log menu in the upper-left corner to Security Log. Note that Event Code 529 in Windows NT means an unsuccessful logon attempt. If the system administrator double-clicks on the highlighted entry in the Security Log, an Event Detail screen that provides some additional information is displayed (see Figure 3.5).

Figure 3.4. Security Log entries showing brute force attacks on a Windows NT system.

Figure 3.5. An Event Detail screen for the highlighted Security Log entry in Figure 3.4.

Figure 3.6 shows Windows 2000 Event Log entries for several brute force attacks (Event Code 677 ^[7]), followed by a successful logon to the Administrator account (Event Code 680), followed by a change in the password of another account (Event Code 577). You can view the Security Log by going to Start, Programs, Administrative Tools, Event Viewer. If the Security Log is not displayed right away, pull down the Log menu in the upper-left corner to Security Log.

^[7] For an explanation of event codes, see Schultz, E., Windows NT/2000 Network Security. Indianapolis: New Riders, 2000.

Figure 3.6. Security Log entries showing an attack on a Windows 2000 system.

It is important to realize that of all the suspicious indications that manifest themselves in real-life settings, a relatively small proportion will turn out to be security-related incidents. Being cautious about the meaning of a single suspicious symptom, such as multiple login failures, is thus a wise strategy. Multiple login failures, for example, are frequently caused by a user who has had to change a password but who cannot remember what the new password is. Alternatively, the user's Caps Lock key on the keyboard might have been accidentally pressed, causing the user to enter all capital letters during each logon attempt.
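
One way to act on this caution is to look at failed logons in aggregate rather than one at a time. The Python sketch below assumes that failed-logon events have already been extracted from a log as (account, timestamp) pairs; the five-failure threshold and ten-minute window are arbitrary assumptions, and an account that crosses them still deserves investigation rather than an automatic conclusion.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical tuning values; adjust to what is normal at your site.
THRESHOLD = 5
WINDOW = timedelta(minutes=10)


def accounts_with_bursts(events, threshold=THRESHOLD, window=WINDOW):
    """Return accounts with at least `threshold` failures inside any `window`.

    `events` is an iterable of (account, datetime) pairs for failed logons.
    """
    by_account = defaultdict(list)
    for account, when in events:
        by_account[account].append(when)

    flagged = []
    for account, times in by_account.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Slide the start forward until the span fits inside the window.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 >= threshold:
                flagged.append(account)
                break
    return sorted(flagged)


if __name__ == "__main__":
    base = datetime(2001, 6, 1, 2, 0, 0)
    events = [("administrator", base + timedelta(seconds=20 * i)) for i in range(8)]
    # Two failures hours apart look like a forgotten password, not an attack.
    events += [("jsmith", base), ("jsmith", base + timedelta(hours=3))]
    print(accounts_with_bursts(events))
```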

Suspicious events, such as multiple connection attempts to file systems, often end up being due to problems such as a misconfigured application. Remember, too, that many times, "obvious" symptoms of attacks (for example, repeated system crashes) are not good indicators, but small, multiple indicators (resulting from something such as perpetrators failing to completely cover their tracks) can be the best indicators. Clifford Stoll, in his now legendary book, The Cuckoo's Egg, ^[8] points out that a 75-cent discrepancy between computed charges for computer usage by a single user led to an investigation that led to the identification of four Germans who were paid by another country's intelligence service to glean information from U.S. computers.

^[8] Stoll, C. The Cuckoo's Egg. New York: Doubleday, 1989.

Having a team of the best technical personnel you can find (and that you can afford) is one of the best solutions to the challenge of identifying incidents, despite the fact that symptoms might be very non-obvious. Chapter 4, "Forming and Managing a Response Team," greatly expands on this theme.

Initial Actions and Reactions

If a security-related incident appears to have occurred, it is important to avoid panicking. It is also important to avoid causing panic in others. Granted, the incident could be very serious and potentially costly, but human error costs organizations far more than security-related incidents do. Gathering your wits and thinking carefully about your next course of action is the best strategy in almost every case.

The following are some of the actions that tend to have the highest payoff during the detection stage of dealing with incidents:

Taking the time to analyze all anomalies. As mentioned earlier, sometimes very small symptoms indicate that an incident is in progress, so analyzing every anomaly that can be found is a very good measure.
Enabling auditing (if it is not already enabled) or increasing the amount of audit information captured. If auditing on the possible victim system is not enabled, the moment you or others notice a possible incident is a good time to enable it; if only a small amount of audit data is currently being captured, increasing the amount would be wise.
Promptly obtaining a full backup of the system in which the incident has apparently occurred and gathering a copy of any compromised files/bogus code for analysis. This is a "showstopper action item" because if you don't take this action right away, an attacker might be able to erase or corrupt evidence that can be used for analysis as well as for legal purposes later.
Starting to document everything that happens. It is also extremely critical for every person involved in the possible incident to write down virtually everything that is at least marginally relevant to an incident (including names, phone numbers, and email addresses of everyone with whom you communicate). You and the others will not immediately know what information will ultimately be important and useful and what will not, so recording everything is essential.

Case Study: Making A UNIX Backup for Incident Response Purposes

Making a full backup of the victim system right away when it appears that an incident has occurred is an important thing to do. In some operating systems, however, some types of full backups are better than others. UNIX (and also Linux), for example, supports three types of backups: tar (tape archive), dump (the primary backup program), and dd (device-to-device copy). Because dd reads the files that are input to it on a block-by-block basis, it can capture data (such as deleted blocks of data) in a backup that tar and dump cannot. dd is thus most suitable for forensics analysis.

To run tar, enter the following:

 tar <flags> <device> <file system to be dumped>
 e.g., tar cvf /dev/rmt0 /home

To run the dump command, enter the following:

 dump -<flags> <device> <file system to be dumped>
 e.g., dump f /dev/rmt0 /home (where the f flag indicates that the device name follows)

To run dd, enter the following:

 dd if=<input file> of=<output file>
 e.g., dd if=/dev/hd01 of=/dev/rmt0

To find out more about these commands, check the man pages of your UNIX or Linux system. (Note that in most Linux systems, a dash (-) does not appear before any flag[s]).
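
The block-by-block behavior that distinguishes dd can be illustrated in a few lines of Python. This sketch is only an illustration of the idea, not a replacement for dd: it copies a source to a destination in fixed-size blocks without interpreting any file system structure, which is why, when the source is a raw device, blocks belonging to deleted files come along too. The function name and the 512-byte block size are assumptions.

```python
import os
import tempfile


def copy_blocks(source_path, dest_path, block_size=512):
    """Copy source to destination one block at a time; return bytes copied."""
    copied = 0
    with open(source_path, "rb") as source, open(dest_path, "wb") as dest:
        while True:
            block = source.read(block_size)
            if not block:
                break
            dest.write(block)
            copied += len(block)
    return copied


if __name__ == "__main__":
    # Use a temporary file as a stand-in for a device such as /dev/hd01.
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "disk.img")
        image = os.path.join(workdir, "copy.img")
        with open(source, "wb") as handle:
            handle.write(os.urandom(2000))
        print("bytes copied:", copy_blocks(source, image))
```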

Estimating the Scope of the Incident

After an incident has been detected, it is important to promptly determine the scope of the incident. This not only helps in determining what to do during the next stage, containment, but also can help management and technical staff assign a priority to handling this incident. Here are some of the considerations:

How many hosts have been compromised? The more that have been compromised, the wider the scope. Different intervention methods are generally required as the scope of the incident increases.
How many networks are involved? As in the case of systems, the more networks involved, the broader the scope and the greater the need for urgent action.
How far into the internal network did the attackers get? If they have attacked only a host or a host outside of the security perimeter, the implications are radically different than if the attackers have gotten well within the internal network.
What level of privileges did the attacker(s) gain? Unauthorized superuser privileges greatly escalate the scope of an incident.
What is at risk? How critical are compromised machines to an organization's business/operations? Are valuable applications and/or data highly at risk? The more critical the compromised machines and the more valuable the applications and data, the wider the scope of the incident.
How many avenues of attack are being used? If the attacker is using the Internet, dialing in through the public phone system, and using PBX routes, the scope is much wider than if the attacker is using only one attack avenue.
Who knows about the incident, and how or to what extent can that knowledge make the damage from the incident worse? Customers gaining knowledge about an attack might have serious business repercussions, for example.
How widespread is the vulnerability that the attacker(s) exploited? Are many machines vulnerable to the same kind of attack to which other systems have succumbed?
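
The considerations above can be captured as a simple checklist so that each one is answered explicitly and the answers can be handed to management. The Python sketch below is one hypothetical way to record them; the class, its field names, and the wording of each factor are assumptions, and the function deliberately lists the factors that widen the scope rather than inventing a numeric score.

```python
from dataclasses import dataclass


@dataclass
class ScopeAssessment:
    """Answers to the scoping questions; every field name is hypothetical."""
    hosts_compromised: int = 0
    networks_involved: int = 0
    inside_internal_network: bool = False
    superuser_gained: bool = False
    critical_assets_at_risk: bool = False
    attack_avenues: int = 1
    known_to_outsiders: bool = False
    vulnerability_widespread: bool = False


def widening_factors(a):
    """Return the answers that widen the scope, in the order of the questions."""
    checks = [
        (a.hosts_compromised > 1, "more than one host compromised"),
        (a.networks_involved > 1, "more than one network involved"),
        (a.inside_internal_network, "attackers reached the internal network"),
        (a.superuser_gained, "superuser privileges gained"),
        (a.critical_assets_at_risk, "critical machines or valuable data at risk"),
        (a.attack_avenues > 1, "more than one avenue of attack in use"),
        (a.known_to_outsiders, "incident known outside the organization"),
        (a.vulnerability_widespread, "exploited vulnerability is widespread"),
    ]
    return [label for hit, label in checks if hit]


if __name__ == "__main__":
    assessment = ScopeAssessment(hosts_compromised=3, superuser_gained=True)
    for factor in widening_factors(assessment):
        print("-", factor)
```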

The Reporting Process

Notification of the appropriate authorities when an incident is first identified is also a crucial component of the detection phase. Incident response efforts in which critical information does not get to those who need it are seldom successful because most incidents are not limited to a single host within one network. Incidents are truly international in origin nowadays. Timely reporting to staff and organizations that need to know about incidents to be able to fulfill their role in the overall incident response infrastructure is thus imperative.

Unfortunately, the reality is that people do not distribute information to those who need it nearly as much as they should. Creating and enforcing provisions for mandatory reporting in an organization's information security policy is the solution if incident reporting is to occur. Policy provisions should include the following at a minimum:

Types of information to be reported
To whom this information must be reported
How quickly it must be reported
The type of method (for example, secure email, hard copy, and so forth) to be used
The consequences of violating the policy provisions

Requiring that the Chief Information Security Officer (CISO) be notified is, in most cases, also a minimal requirement. Other notifications that might be required include the following:

Personnel (in situations such as insider attack or misuse)
Public affairs (because of the possibility of adverse publicity due to security-related incidents)
An incident response team (such as AFCERT in the U.S. Air Force ^[9])

^[9] See Schultz, E.E., et al. "What Do People Really Need to Know about Computer Security Incidents?" In Proceedings of the 16th Department of Energy Computer Security Group Training Conference, Denver, Colorado, 1994.

A government agency's chief of information security and/or the head of classified computing for the agency
The legal department (Chapter 7 will greatly expand on this and other considerations)

What Type of Information about Incidents Needs to Be Reported?

When incidents occur, the need to communicate information from one person or organization to another almost invariably presents itself. What kind of information is most needed by others? Many variables affect the answer to this question; more data would, for example, be appropriate if a massive set of intrusions occurred than if a single system were infected by a virus. Nevertheless, the following guidelines for reporting (derived from an empirical study on this issue) may be useful in determining what needs to be reported:

Basic information about the incident, including the type of attack, the attacker's apparent purpose, the type of operating system and version of the victim system, the particular networks/subnets involved, commands that the attacker has entered, the particular account(s) that have been compromised, and any information about the identity and characteristics of the attacker and/or any malicious programs
Information concerning the origin of the attack: the attacking host(s), type of connection to the victim(s), and any known route(s) across the network used in the attack
The consequences of the attack: whether the attacker has gotten superuser privileges, what particular data (if any) have been accessed without authorization, any integrity changes that have been made in the victim system, and so forth
Threat: how widespread the attack is, the other systems that have been compromised, how sensitive any compromised systems and data are, and the likelihood that damage and/or disruption will escalate
Status: the current status of the incident and when the incident is likely to be resolved
Critical personnel contacts (for the victim and source systems, networks, as well as for the incident response staff)

Other kinds of information, such as vulnerabilities that have been exploited and audit log data, may also be helpful. Creating a form that includes fields that correspond to each of the above bullets and making this form widely available is a good way to help ensure that information that needs to be reported is actually reported.
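One way to picture such a form is as a simple record with one field per guideline. The following is only a minimal sketch in Python; the class name, field names, and the missing_fields helper are our own illustrative assumptions rather than part of any standard reporting format.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class IncidentReport:
    """Hypothetical report form; a blank string means 'not yet supplied'."""
    basic_information: str = ""  # attack type, apparent purpose, OS/version, accounts, commands
    origin: str = ""             # attacking host(s), connection type, known route(s)
    consequences: str = ""       # privileges gained, data accessed, integrity changes
    threat: str = ""             # how widespread, sensitivity, likelihood of escalation
    status: str = ""             # current status and expected time of resolution
    contacts: str = ""           # critical personnel contacts


def missing_fields(report: IncidentReport) -> List[str]:
    """Return the names of the fields that are still blank, in form order."""
    return [f.name for f in fields(report) if not getattr(report, f.name).strip()]


report = IncidentReport(
    basic_information="Password-guessing attack against one UNIX host",
    status="Contained; affected account disabled",
)
print(missing_fields(report))
```

A check of this kind lets whoever receives a report see at a glance which of the guideline areas the reporter has not yet covered.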

Containment

The purpose of the third stage of incident handling, containment, is to limit the extent of an attack and thus the potential for damage or loss. Containment-related activity should, of course, occur only if the indications observed during the second stage conclusively show that an incident is occurring.

The Logic of Containment

After an incident has been confirmed, whoever is handling the incident should quickly determine reasonable ways to contain the incident and then decide on the one that appears to be best. Containment-related measures can, in some cases, be relatively simple and quick. If several bad login attempts against a single account have occurred, for example, a reasonable containment measure would be to disable that account (at least temporarily). On the other hand, if an intruder has gained superuser access to multiple hosts, containment will almost certainly be more extensive.
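The bad-login example can be sketched in a few lines of Python. This is an illustration only: the (account, succeeded) event format, the function name, and the threshold of five failures are all assumptions made for the sketch, and real login records will need to be parsed into this shape first.

```python
from collections import Counter


def accounts_to_disable(login_events, threshold=5):
    """Return accounts with at least `threshold` failed logins.

    login_events is an iterable of (account, succeeded) pairs, a
    hypothetical format chosen for this sketch.
    """
    failures = Counter(account for account, succeeded in login_events if not succeeded)
    return sorted(account for account, count in failures.items() if count >= threshold)


events = [("alice", False)] * 5 + [("bob", False)] * 2 + [("alice", True)]
print(accounts_to_disable(events))
```

The output is a list of candidates for temporary disabling; the decision itself still belongs to whoever is handling the incident.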

It is important to never take containment lightly because so many incidents can get out of hand so quickly. A good example is a worm infection; it is important to limit the spread of a worm within a network as quickly as possible. ^[10] Port scanning and vulnerability scanning attacks, even though they occur so much that they have become commonplace, also constitute a potentially significant threat in that they are often harbingers of determined attacks. Similarly, the discovery of zombie or handler software in one or more hosts should elevate the priority of containment-related activity substantially, given the potential amount of damage that has resulted from DDoS attacks in the past. The same principle applies to Web defacements, which are often followed by all-out attacks.

^[10] The way to do this depends on how the worm in question functions. If, like the Internet Worm of 1988, a worm spreads through password attacks, access to mail-forwarding files, and vulnerabilities in service programs such as sendmail, stopping the spread of this type of worm can be relatively difficult. One possible technique is to "inoculate" every system with the signature the worm looks for in determining whether it will infect a system. The Internet Worm (like so many others that have surfaced since) attempted to avoid multiple infections of any single system.

The Role of Users When Incidents Occur

Users are in many ways a "two-edged sword" as far as security-related incidents go. Users are often the first to notice security breaches and anomalies that need to be investigated. On the other hand, users often attempt to eradicate incidents (or, more often, perceived incidents that do not turn out to be bona fide incidents), often doing more damage to their systems and files than an attacker could ever have done. In general, providing users with the following instructions results in a reduction in damage and/or disruption:

Do not shut down your system or disconnect from the network without first consulting authorities.
Follow your organization's reporting procedures; report any suspicious occurrences to your security point of contact.
Continue to monitor and document suspicious occurrences until help arrives.
Do not modify system or application software.
Do not talk to the media without prior management approval.

Possible Containment Strategies

An essential part of containment is decision making (that is, deciding what to do to minimize the spread of damage and/or disruption). Possible decisions include any or all of the following:

Shutting a system down altogether (a drastic, but sometimes very advisable, decision to prevent further loss and/or disruption).
Disconnecting from a network. This at least allows local users to obtain some level of services, although it will prove disruptive.
Changing the filtering rules of any firewalls and routers to exclude traffic from hosts that appear to be launching the attacks.
Disabling or deleting login accounts that have been compromised.
Increasing the level of monitoring of system and/or network activity.
Setting traps such as decoy servers, as discussed in Chapter 12, "Traps and Deceptive Measures."
Disabling services, such as file transfer services, if vulnerabilities in services are being exploited.
Striking back at the attacker's system(s), although you should in general avoid doing this and should never do this without the explicit approval of top-level management.

Sometimes this decision is trivial; shutting down a compromised system if the system, its data, and/or its applications are classified, sensitive, or proprietary is an obvious thing to do. The same applies to a compromised system that holds proprietary information or applications. In other cases, it is worthwhile to risk a certain amount of damage to the compromised system if keeping the system up might enable you to identify an intruder.

Other Considerations

Other considerations during the containment stage are also extremely important. In general, we recommend the following:

Adhering to well-defined and detailed containment procedures. This helps maximize the probability of successful incident containment.
Continuing to record (in a notebook, on a dictaphone or PDA, or by other means) virtually everything that occurs during the course of the incident, including what those who are dealing with the incident have done, how much time each action has required, and other important details.
Defining acceptable risk limits in dealing with an incident well in advance. After all, making decisions such as leaving compromised systems running and connected to the network involves elevated risk.
Advising users of the status of an attacked system that they use if there is prolonged disruption, if data are destroyed, and so forth. Sooner or later they will figure out that something is wrong, but at least you will be able to control false rumors (and also possibly the panic) that are likely to spread.
Continuing to report (through secure channels such as secure email) any significant updates to appropriate organizations and people mentioned in the previous section of this chapter.
Following an organization's policy regarding contact with the media (see the following sidebar).

Dealing with the Press

Part of the containment process includes containment of possible public relations damage that a security-related incident can cause. One thing is certain: if an incident has a large impact and/or if it involves many hosts from many different organizations, at least one reporter will learn of it, then subsequently probe for details. Chapter 4, "Forming and Managing an Incident Response Team," covers how to deal with the press when a security-related incident occurs.

One of the fundamental parts of containing an incident is finding out whether attackers and/or malicious programs have installed any back-door programs that enable unauthorized reentry to compromised hosts and network devices. If any back doors have been installed, removing them is usually the best course of action. There might be exceptions (such as when those involved in an incident response effort are trying to gather evidence to be used in a legal prosecution effort), but when the back doors allow immediate superuser access, there is really little debate concerning what to do with them. Similarly, if someone has gained unauthorized superuser access to a host or has obtained a copy of the password file, it is also typically prudent to change all passwords as soon as possible (because the attacker might have cracked or reset passwords).

Eradication

The fourth stage in the PDCERF methodology is eradication. After the incident has been contained, it is now time to eradicate the cause of the incident.

The Logic of Eradication

The goal is to eliminate the cause of the incident. Software might be available to help you in this effort. For example, eradication software is available to eliminate most viruses that infect small systems (and often even larger ones). If any Trojan horse programs remain in the system at this time (other than back-door programs or ones that could cause an incident to spread, which should already have been eradicated during the containment stage), it is now time to delete them. In the case of infections by extremely malicious and dangerous programs, it is probably best at this time to clean and reformat any hard drives containing the infected files. Finally, ensure that all backups are clean. Many systems infected with viruses become periodically reinfected simply because people do not systematically eradicate the virus from backups.

If a classified system has been infected by a virus, a worm, or some other kind of malicious executable, it is essential to follow guidance issued by the department or agency that has jurisdiction over classified computing. We also strongly advise that, in this case, a low-level format be performed to ensure that whatever has caused the incident is fully eradicated. In classified environments, however, the department or agency with jurisdiction over classified computing might instead require destruction of media.

Eradication Procedures

As in virtually all the stages of the PDCERF model, preparing detailed procedures and then following them is critical in the eradication stage. Procedures are exceptionally important in the eradication stage because it is so easy to overlook a critical detail or two in the heat of battle. Overlooking any detail could result in undesirable outcomes such as another flare-up of the incident or destruction or tainting of forensic evidence. We will now examine some of the procedures that can be used to eradicate the cause of UNIX (and Linux), Windows NT, and Windows 2000 incidents:

Eradication in a UNIX System

The following steps constitute a sound basic set of minimal eradication procedures for compromised UNIX systems:

Ensure that no unauthorized entries exist in .forward files.
Use ps (with appropriate flags such as -elf in System V UNIX systems) to look for stray processes running.

Ensure that the following files are not modified:

/etc/dfs/dfstab (or /etc/exports)
.login
.logout
.profile
/etc/profile
.cshrc
All files in the /etc/rc path
.rhosts
/etc/hosts.equiv
at

Also examine the following for unauthorized changes:

netstat
ls
sum
find
diff
/etc/nsswitch.conf
/etc/resolv.conf
/var/spool/cron
/var/spool/cron/crontabs
kerb.conf

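To discover the real modification time for files, enter ls -lac. (The -c flag makes ls report the time the file's status was last changed, which an attacker cannot reset as easily as the ordinary modification time.)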

To discover SUID programs, enter either of the following:

  find / -type f -perm -4000 -ls

  find / -type f -perm -4000 -print
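A list of SUID programs is most useful when it can be compared with a list made before the incident. The following Python sketch walks a directory tree, collects regular files that have the set-user-ID bit on, and reports any that are absent from a known-good list. The function names and the idea of keeping the baseline as a plain list of paths are assumptions made for illustration.

```python
import os
import stat


def find_suid_files(root):
    """Return sorted paths of regular files under root with the set-user-ID bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or cannot be examined; skip it
            if stat.S_ISREG(mode) and mode & stat.S_ISUID:
                found.append(path)
    return sorted(found)


def unexpected_suid(root, baseline):
    """Return SUID files under root that do not appear in the known-good baseline."""
    known = set(baseline)
    return [path for path in find_suid_files(root) if path not in known]
```

Calling unexpected_suid("/", baseline) with a baseline recorded while the system was known to be clean would narrow the review to newly appearing SUID files, although a scan of the entire file system can take some time.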

Ensure that there are no modifications of the following:

/etc/passwd
The shadow password file
/etc/group
yppasswd
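Confirming that files such as these have not been modified requires something to compare them against. A minimal sketch of the idea, assuming that a table of SHA-256 hashes was recorded while the system was known to be good and has been stored where an attacker could not alter it, follows. The function names are hypothetical, and this is not a substitute for a full integrity-checking tool.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(paths):
    """Record the current hash of each path (run this on a known-good system)."""
    return {path: sha256_of(path) for path in paths}


def changed_files(baseline):
    """Return paths whose current hash differs from the baseline or that cannot be read."""
    changed = []
    for path, expected in sorted(baseline.items()):
        try:
            current = sha256_of(path)
        except OSError:
            current = None  # a missing or unreadable file counts as changed
        if current != expected:
            changed.append(path)
    return changed
```

Any path that changed_files returns deserves a closer look; a legitimate password change will also alter the password files, so a difference is a prompt for investigation rather than proof of tampering.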

Ensure that no unauthorized entries exist in .rhosts files and /etc/hosts.equiv. Enter either of the following:

  find / -name .rhosts -ls -o -name .forward -ls

  find / -name .rhosts -print -o -name .forward -print

Inspect the following to ensure that no unauthorized services are running:

/etc/inetd.conf
/etc/inittab
/etc/services
/etc/hosts.allow
/etc/hosts.deny

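Search for all files created or otherwise changed during the time you think the attack(s) occurred by entering

  find / -ctime -1 -ls

or

  find / -ctime -1 -print

and eradicate as necessary. (The -ctime test matches the time a file's status last changed; -1 covers the last 24 hours, so adjust the number to span the period in question.)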

Search for all files modified during the time you think the attack(s) occurred by entering

  find / -mtime -1 -ls

or

  find / -mtime -1 -print

and take the action you deem appropriate.
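When the suspected attack window does not line up with whole days before the present, it can be convenient to test file times against explicit start and end times. The following Python sketch does so for both the modification time and the status-change time. The function name is hypothetical, and the times are assumed to be given as seconds since the epoch.

```python
import os


def files_changed_between(root, start, end):
    """Return sorted paths under root whose mtime or ctime lies within [start, end].

    start and end are epoch timestamps (seconds), for example from time.mktime().
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: examine links themselves, not their targets
            except OSError:
                continue  # file vanished or cannot be examined; skip it
            if start <= info.st_mtime <= end or start <= info.st_ctime <= end:
                hits.append(path)
    return sorted(hits)
```

Because an attacker can set a file's modification time back to an innocent-looking value, checking the status-change time as well gives a somewhat more trustworthy picture.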

Use the strings command to inspect binaries because even though the binary might look like nonsense, cleartext strings within it might give some indication of what a modified binary is doing (for example, setting up an encrypted session using OpenSSH).
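To show what this kind of inspection does, the following Python sketch extracts runs of printable ASCII characters from binary data, roughly as the strings command does. It is an approximation for illustration only; the function name and the default minimum length of four characters are assumptions of the sketch.

```python
import re


def printable_strings(data, min_length=4):
    """Return runs of at least min_length printable ASCII bytes found in data."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.decode("ascii") for match in pattern.findall(data)]


sample = b"\x00\x01ssh-connect\x00ab\x00/bin/sh\xff"
print(printable_strings(sample))
```

Passing the contents of a suspect binary (read with open(path, "rb")) to such a function yields the readable fragments, such as host names, file paths, or command names, that hint at what the program does.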

Eradication in a Linux System

In most flavors of Linux, the commands and associated functions are generally the same as in UNIX. Note, however, that some Linux commands also accept BSD-style flags that are not preceded by a dash (-); ps aux is a common example.

Eradication in a Windows NT System

We recommend that you use the following basic procedures for eradication in compromised Windows NT systems:

Ensure that the following have not been modified:
- The Security Accounts Manager (SAM) database ^[11]
  
  ^[11] Using a tool such as Tripwire for NT (see www.tripwiresecurity.com) is one of the best ways to check the integrity of any file in Windows NT. Be sure, however, to guard the integrity of any program that checks file integrity. One trick of attackers is to corrupt integrity-checking programs so that they always display the same results, no matter the contents of the files they check. Tripwire for UNIX is also a great tool to check the integrity of most UNIX flavors.
- Services (check by going to the Control Panel and then double-clicking on the Services icon)
- All .dll files (especially MSGINA.DLL, which provides the graphical identification and authentication interface for user logons)
- Dial-in settings (especially the RAS.PBK ^[12] file)
  
  ^[12] This is the RAS phonebook, which contains the names of users allowed to dial in.
- at jobs (or, in SP5 and up, Task Scheduler entries)
- User Manager for Domains settings, particularly Account Policy, Audit Policy (remember that attackers love to disable auditing), and Trust Relationships settings
- All logon scripts
- The integrity (including ownership) of all Registry keys and values below HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon and HKLM\System\CurrentControlSet\Control\LSA
- HKLM\Software\Microsoft\Windows\CurrentVersion\Run (and RunOnce and RunOnceEx)
- Membership in all privileged groups but especially the Domain Administrators and Local Administrators groups
- System and user profiles

Eradication in a Windows 2000 System

We recommend that you use the following basic procedures for eradication in compromised Windows 2000 systems:

Ensure that the following have not been modified:
- The Security Accounts Manager (SAM) database (in mixed mode ^[13] ) or the ntds.dit file ^[14] (in native mode)
  
  ^[13] In mixed mode, domain controllers are both Windows NT and Windows 2000. In native mode, all domain controllers are Windows 2000.
  
  ^[14] This is the password file in Windows 2000 native mode.
- Services (check by going to the appropriate policy that controls services)
- All .dll files (especially MSGINA.DLL, which provides the graphical identification and authentication interface for user logons)
- Scheduler entries (go to the Scheduler icon in the Control Panel)
- Policy settings (particularly Password Policy, Account Lockout Policy, Kerberos Policy, Audit Policy, Dial-in Policy, and Trust Relationships) as well as the order of Group Policy Objects (GPOs)
- All logon scripts
- All Security Option settings
- All permissions for Active Directory
- Active Directory schemas
- All DNS settings
- The integrity (including ownership) of all Registry keys and values below HKLM\Software\Microsoft\Windows NT\CurrentVersion\Winlogon and HKLM\System\CurrentControlSet\Control\LSA
- HKLM\Software\Microsoft\Windows\CurrentVersion\Run (and RunOnce and RunOnceEx)
- Membership in all privileged groups but especially the Enterprise Administrators, Schema Administrators, Domain Administrators and Local Administrators groups
- Permissions and ownerships in %systemroot%\ntds and %systemroot%\SYSVOL\sysvol and below

Other Considerations

Other considerations during the eradication stage are also extremely important. In general, we recommend the following:

Adhering to eradication procedures such as the ones just described for UNIX, Windows NT, and Windows 2000 systems
Continuing to record everything that occurs
Continuing to keep users advised of the status of any compromised systems that they use (if appropriate)
Continuing to report any major updates to appropriate people and organizations
Continuing to follow any applicable policy requirements concerning contact with the media

Recovery

The fifth stage in the PDCERF incident response methodology is recovery. After the cause of an incident has been eradicated, the recovery phase defines the next stage of action. The goal of recovery is to return any compromised system and network device completely back to its normal mission status.

Recovery Procedures

Following detailed technical procedures for system recovery is every bit as important during the recovery stage as in any other PDCERF stage. Procedures will almost certainly vary across organizations; different organizations need different levels of assurance that the integrity of any compromised system has been fully restored. Additionally, procedures for recovery will necessarily differ from one operating system to the next.

One surefire recovery method is to perform a full system restore from known good media. This might be difficult and time consuming, especially if many systems have been compromised, but provided that the media used to restore compromised systems have been adequately safeguarded at all times, this strategy can provide a high level of assurance that systems and network components have been returned to their normal operational status. Note that a full restore, including changes to every password, should be mandatory if an attacker has gained superuser access to a system.

Data recovery can be tricky. One reasonably safe (albeit less than perfect) method is to restore data from the most recent full backup (or incremental backup, provided that there have been data changes since the time of the last full backup). Another is to use fault-tolerant system hardware such as a redundant array of independent disks (RAID) to recover mirrored or striped data that resides on the redundant hard drives. We are assuming, of course, that these files and data have not also been compromised. Recovery in classified computing systems is generally very detailed and time consuming, well outside the scope of this book. The agency or organization that has jurisdiction over classified computing will provide guidance concerning recovery in these systems.

Other Considerations

As in the other stages, other considerations during the recovery stage are critical to a successful outcome. You should continue to do the following:

Record everything that occurs (including the time required for each person working on incident response to perform each recovery-related task).
Keep users aware of the status of any compromised systems that they use (if appropriate). In particular, after recovery efforts are complete, it might be best to assure users that everything is back to normal.
Advise appropriate people and organizations of any major developments that might affect them.
Adhere to any applicable policy requirements concerning contact with the media.
Continue to log system and network activity, but in general, return logging to a more normal level.
In the case of a network-based attack, install patches for any operating system, firewall, or router vulnerability that was exploited. Patches should be installed in both compromised and other systems.

Finally, during the recovery stage, it is also important to remove any interim (stopgap) defensive measures that have been deployed as short-term containment measures. To stop a series of remote FTP attacks, for example, network administrators might block all incoming traffic bound for TCP ports 20 and 21. During the recovery stage it would be appropriate to go back to the original filtering rules for FTP traffic (or perhaps even to tighten these rules somewhat but not block all FTP traffic anymore).

Follow-Up

The final stage in the PDCERF methodology is follow-up. The overall goal is to review and integrate information related to an incident that has occurred. Follow-up is, unfortunately, the most likely stage to be overlooked (partially because resources are usually limited and personnel are often exhausted by the time recovery from an incident is complete). This stage is extremely critical, however, so critical that it is hard to envision a successful incident response effort if it is omitted.

The Importance of Following Up

Conducting follow-up activity after recovery is essential for several reasons:

It helps those involved in handling an incident develop a set of "lessons learned" to improve their skills and learning in future situations.
This stage also provides information (including metrics) that can help justify an organization's incident response effort to management, serving as "proof" that the team has been very active and is achieving its purpose.
Any lessons learned can serve as training material for new team members. A collection of previous "lessons learned" reports can acquaint new team members with the mistakes that can be made, the kinds of actions that do and do not work in a variety of situations, and so forth.
It can serve as a basis for team building.
It can yield information that might be useful in legal proceedings.

The Nature of Follow-Up Activity

The most important element of the follow-up stage is performing a postmortem analysis on each significant incident. Exactly what happened and at what times? How well did the staff involved in dealing with the incident do? What kind of information did the staff need sooner, and how could they have gotten that information sooner? What would the staff do differently next time? Did management prove to be part of the problem or part of the solution? Why?

A follow-up report should also provide information that can be used for reference if other, similar incidents occur. Constructing a timeline of events (including observed events as well as actions taken to mitigate them) is also important for learning as well as legal reasons. Similarly, rapidly obtaining a monetary estimate of the total damages resulting from the incident is critical. Monetary damage and its implications will be discussed in greater detail in Chapter 7, but for the time being, we can define monetary damage in terms of destruction of, damage to, or unauthorized copying or possession of software, data files, and/or hardware, as well as labor and travel costs incurred in responding to the incident. The estimate of monetary damage can serve as the basis for prosecution efforts aimed at convicting perpetrators and recouping damages.
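The arithmetic behind a first-cut estimate is simple if the time each person spent on the incident has been recorded as recommended in the earlier stages. The following Python sketch totals labor, travel, and asset losses; the function name, the three cost categories as parameters, and all figures shown are hypothetical and serve only to illustrate the calculation.

```python
def estimate_damage(labor_entries, travel_costs=0.0, asset_losses=0.0):
    """Return a breakdown of estimated monetary damage.

    labor_entries is an iterable of (hours, hourly_rate) pairs, one per
    person or task, taken from the records kept during the incident.
    """
    labor = sum(hours * rate for hours, rate in labor_entries)
    return {
        "labor": labor,
        "travel": travel_costs,
        "assets": asset_losses,
        "total": labor + travel_costs + asset_losses,
    }


# Hypothetical figures: two responders, one trip, one destroyed data set.
estimate = estimate_damage([(10, 50.0), (4, 75.0)], travel_costs=200.0, asset_losses=1000.0)
print(estimate["total"])
```

A breakdown by category, rather than a single total, makes it easier to defend the estimate later if it is used in legal proceedings.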

Another important part of the follow-up stage is reevaluation and modification of an organization's incident response procedures on the basis of the lessons learned. This is one of the more strategic (as opposed to tactical) benefits of this stage. Personnel who follow their organization's incident response procedures will invariably identify some gaps in the procedures. Certain steps might even cause problems that waste time and effort.

A good example of a procedures gap occurred a few years ago. Someone called a response team to report that the human genome database at a certain site had been broken into. The person who handled the phone call reported this to the team leader, who asked for more information. The team member reported that he had no information regarding how to contact the person who had originally called. As things turned out, the person who handled the phone call had followed the incident response procedures properly, but the procedures did not specify that everyone who comes in contact with someone who reports an incident should immediately record the phone number and email address of that person. This gap in the procedures was a topic of discussion at a follow-up meeting later; the organization's procedures were modified accordingly soon afterwards.

Another important part of follow-up activity is the process of having people involved in incident response interact with each other to improve the way they go about their business. This is potentially a very valuable activity in that it can promote team building. It can also sensitize management to the problems and challenges that people who deal with security-related incidents face.

