Managing Incidents


The actual management of incidents is covered throughout this book. There are, however, some issues that should be addressed early in the organization phase of the team's life cycle.

Obviously, the first step in managing and responding to an incident is to assemble the team. This might be either physical or virtual. Ideally, a core team can meet quickly in person and discuss the incident. It can then either meet with or call the affected persons and gather information quickly to formulate a response strategy.

Chapter 1, "An Introduction to Incident Response," defines incident response as the "actions taken to deal with an incident that occurs. These actions normally represent some form of intervention to negate or minimize the impact of the incident." The incident response team must develop a plan to contain and eradicate the incident as quickly and efficiently as possible.

The team might have specific objectives as well, including the following: [4]

[4] Other than the first bullet about the protection of human life, these are not in any suggested priority. Each organization must examine its own information assets, risks, and culture to establish priorities.

  • The preservation and protection of human life. Most organizations specifically address this as the highest priority in their incident response policy.

  • The preservation of evidence for litigation or prosecution.

  • The preservation or recovery of business data.

  • Resumption of computing and network services.

  • Containing and preventing escalation of the incident, including preventing the incident from spreading to other, unaffected computers or networks.

  • Cooperation with law enforcement, regulatory, or investigative agencies (including internal agencies such as internal audit).

  • Avoiding or minimizing damage to the company's reputation.

These priorities should be discussed during the initial team meeting. Some of them (for example, collection and preservation of evidence) might dictate many of the follow-up actions and might require that certain steps occur. It is always easier to gather the evidence early in the process and choose later not to use it; it is usually impossible to go back and try to collect usable (or admissible) evidence later.

This is also the time to begin assigning resources to the incident. The incident should be prioritized in relation to other incidents that might be occurring (and to other requirements such as training or operations support). When possible, key personnel should not be assigned to minor incidents; they should be saved for the major problems. This might not be an option in some teams, but the team leaders must be able to reassign personnel quickly if another incident breaks.

Surviving the Long Haul

There is a danger when managing large incidents that team members might burn out early in the process. It is natural to work extremely hard when the incident is first discovered. Members might work around the clock for several days. This is fine, provided the incident is of relatively short duration. If the incident drags on, however, those key personnel might not be able to keep up the pace.

Team members must plan (and leaders must enforce) an arrangement to sustain the incident management effort over time. Critical team members might find themselves neglecting other duties, home life, rest, and meals when working on the incident. This must not be allowed to continue over an extended period of time. The person will likely burn out and not be able to continue. Performance will certainly suffer, invariably at a time when critical decisions are required.

The team operating procedures should address this contingency and should have a procedure for assigning long-term tasks and managing the resources required to sustain them. For example, early in the incident, the team might be meeting twice a day. As the incident drags on for months (not unheard of in the event of large incidents involving prosecution), the meetings might drop to once a week.

Strategies for Sustained Operations

One organization discovered that team members were skipping meals to work on the incident. The company started furnishing breakfast (coffee, donuts, and bagels) as soon as the workers arrived, as well as lunch and dinner (typically fast food such as pizza or fried chicken) throughout the day. This was extremely helpful during the early phases of the incident, but it became unsupportable as the effort moved into the second and third months.

In this same organization, one key team member was working extremely long hours and even sleeping in his office to avoid the commute. The team leader recognized that this member would probably not scale back his efforts, even if told to do so. The company furnished the employee with a computer so that he could work from home. This allowed the employee to continue his efforts without completely neglecting his health or his family.

Assigning Incident Ownership

Another important consideration in organizing for incident response is ensuring that there is an owner for every significant incident (every incident in which severity ratings produce a score indicating elevated risk to an organization). An "owner" is a person assigned the responsibility of bringing the incident to as successful a conclusion as possible. Without ownership, incidents often "slip through the cracks," and when someone within an incident response team's constituency complains about the team's lack of action, a simple investigation usually reveals that lack of ownership is the reason. Persons assigned specific responsibilities are personally accountable for fulfilling those responsibilities and are usually thus motivated to do so.

We recommend assigning the roles of team lead and alternate lead for every significant incident. The team lead is the primary owner and is responsible for day-to-day progress in dealing with the incident until its resolution. This person also normally serves as the main point of contact with the person who reported the incident as well as with the incident response team manager. If insufficient manpower or technical expertise is available, the team lead should promptly inform the incident response team manager. The team lead should also keep the team manager and other key players (including other team members working on the incident as well as people within the response team's constituency) informed of progress and developments. The alternate lead is, in effect, the secondary owner. This person should be prepared to fill in for the team lead if the team lead must travel, becomes sick, is assigned to another, higher-priority incident, and so forth. The alternate lead can also serve as a member of the team that is dealing with the incident.

Assigning the right person the role of team lead and alternate lead is critical. If a significant incident involving a Unix system occurs, a person with technical expertise in Unix is the logical candidate for the team lead. The same applies to the alternate lead. At times, however, your team's expertise will be stretched thin. You may, in these cases, have to temporarily assign someone the team lead role even though that person has little relevant expertise. Later, someone with the relevant expertise can replace the temporary team lead, who then might be moved to the role of alternate lead. As mentioned in Chapter 4, sometimes the best course of action is to bring in outside expertise. In some of these cases it may be prudent to assign the consultant or contractor the team lead or alternate lead role.
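The ownership rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Ownership record, its field names, and the two helper functions are hypothetical and are not part of any tracking product or of the procedures described in this book. The sketch shows how a team might flag significant incidents that lack a team lead or alternate lead, and how the alternate lead takes over when the team lead is unavailable.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Ownership:
    """Owner assignment for one incident (hypothetical structure)."""
    incident_id: str
    significant: bool                      # True when severity indicates elevated risk
    team_lead: Optional[str] = None        # primary owner
    alternate_lead: Optional[str] = None   # secondary owner


def acting_owner(record: Ownership, unavailable: Set[str]) -> Optional[str]:
    """Return the team lead, or the alternate lead if the team lead is unavailable."""
    for person in (record.team_lead, record.alternate_lead):
        if person is not None and person not in unavailable:
            return person
    return None  # nobody currently owns the incident


def missing_owners(records: List[Ownership]) -> List[str]:
    """List significant incidents that lack a team lead or an alternate lead."""
    return [
        r.incident_id
        for r in records
        if r.significant and (r.team_lead is None or r.alternate_lead is None)
    ]
```

A team manager could run a check like missing_owners at each status meeting so that no significant incident is left without both roles filled.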

Tracking Charts

When the team is managing multiple incidents, some form of tracking is essential. As new incidents occur, the details of the earlier ones might be forgotten and required actions fail to occur.

Ideally, the team will have some sort of a "War Room" where it can conduct face-to-face meetings or teleconference with off-site personnel. Status charts on the walls allow the team leaders to quickly review the current status of any open incidents.

These charts can supplement an automated tracking system. Although automated systems are useful, especially in preparing reports to management, they are themselves a vulnerability. Because no one can predict ahead of time what systems might be compromised, there should always be a manual backup to any incident tracking and reporting system.

One chart that has proven useful in the past is illustrated in Figure 5.1. The format can, of course, be modified to fit the specific requirements of the organization, especially if the company already has incident tracking or priority criteria.

Figure 5.1. A sample incident operations tracking chart.

  • Incident Number. This field can contain any useful tracking number. If the company has an accounting system that allows charge backs, the charge number could be used. This is especially useful when gathering data about the costs of incidents.

  • Type. This is used to indicate information about the incident (virus, network penetration, internal investigation, and so on). The types of incidents outlined in the checklist in Appendix A, "RFC-2196," can be used, or the team can develop other categories.

  • Location. This could be physical (Singapore), logical (Internet banking), or both.

  • Point of Contact. This is the primary operations point of contact, often the system owner or manager of the affected business area.

  • Phone. The team's phone numbers, including work, home, pager, and mobile, plus email addresses.

  • Priority. This is the priority assigned to the incident. Priorities and severity models are addressed in the next section.

  • Status. This is the current status of the incident. This field can contain as much or as little information as the team desires. It can also contain notes about upcoming status meetings, required reports, and outstanding actions.

  • Last Update. This is the date and time the information on the chart was last changed.

  • Team Lead. This is the team member currently managing the incident.

  • Alternate Lead. This is the person responsible for assisting the team lead and/or picking up responsibility if the team lead cannot, for any reason, continue to manage the incident.
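For teams that also keep the chart in an automated system, the fields listed above map naturally onto a simple record. The following Python sketch is one possible representation, not the format of any particular tool: the class name TrackingEntry, the attribute names, and the format_chart helper are assumptions made for illustration. The helper renders the entries as plain text so that the chart can be printed and posted as the manual backup.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import List


@dataclass
class TrackingEntry:
    """One row of the tracking chart; attribute names mirror the fields above."""
    incident_number: str
    incident_type: str
    location: str
    point_of_contact: str
    phone: str
    priority: str
    status: str
    last_update: datetime
    team_lead: str
    alternate_lead: str

    def update_status(self, status: str, when: datetime) -> None:
        """Change the status and record when the chart was last changed."""
        self.status = status
        self.last_update = when


def format_chart(entries: List[TrackingEntry]) -> str:
    """Render a plain-text chart (header plus one line per incident)."""
    names = [f.name for f in fields(TrackingEntry)]
    header = " | ".join(n.replace("_", " ").title() for n in names)
    rows = [" | ".join(str(getattr(e, n)) for n in names) for e in entries]
    return "\n".join([header] + rows)
```

Updating the Last Update field in the same call that changes the status keeps the two from drifting apart, which matters when team leaders rely on the chart for a quick review.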

Prioritization

Not every incident is a major priority. A minor virus outbreak that merely requires that a couple of PCs be disinfected is not the same as a major worm spreading through the company's mail servers. A few "script kiddie" probes against the web server that are detected and blocked automatically are not the same as a major denial-of-service attack against the same server.

It is not possible to design a "one size fits all" solution to prioritization and severity. The relative importance of an incident depends on the platforms and systems affected, in the context of the business. For example, it is common in many government organizations to take servers offline in response to a virus outbreak. Arguably, the self-inflicted denial of service caused by taking the servers offline might be greater than that caused by the virus. However, this might be an acceptable solution if those servers are simply used to provide information to the public. The "company" can afford to make its "customers" wait for a day to get the address of the local tax office, for example.

If, however, the organization is a commercial enterprise, the server is conducting real-time e-commerce, and there are competitors for the customers, taking the system down for a day might not be a viable solution because it causes the company to lose hundreds of thousands of dollars a day in revenue. The web server, in that case, becomes a critical business asset, and incidents involving it gain a much higher priority.

As part of the risk analysis, companies should develop some idea of their information assets and the relative values of those assets. This is the first step in developing a prioritization plan. High-value targets must be assigned a high priority.

The severity and scope of the incident should also fit into the methodology. A virus or worm infecting one PC is different from one that has spread throughout the organization. A virus that merely spreads is less dangerous than one that also destroys data or emails files to random addresses.

For example, the organization could define a four-level severity model.

  1. The first level is an event that affects only one location (physical or virtual) and that has a relatively low impact. Examples would include a small virus incident that does not damage or destroy data and the unauthorized use of an account on a local file server.

  2. The second level would be a local event that has a major impact on operations. An example would be the compromise of a privileged account or the physical theft of critical equipment.

  3. The third level would be an event that affects two or more locations (again, defined either physically or logically) but that has a minor impact. Examples might include the proliferation of a nondestructive computer virus on the network or spamming of the email system.

  4. The fourth level would be a high-impact event that affects multiple sites. This would require major intervention and the highest priority. An example might be an intrusion into a critical global application.
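The four levels above amount to a two-by-two table of scope (one location versus two or more) and impact (low versus high). A minimal Python sketch of that table follows. The names Scope, Impact, and severity_level are hypothetical, and an organization that adopts a different model would need a different table.

```python
from enum import Enum


class Scope(Enum):
    LOCAL = "one location"             # physical or virtual
    MULTIPLE = "two or more locations"


class Impact(Enum):
    LOW = "low"
    HIGH = "high"


# Lookup table for the example four-level model described above.
_SEVERITY = {
    (Scope.LOCAL, Impact.LOW): 1,      # e.g. small, nondestructive virus incident
    (Scope.LOCAL, Impact.HIGH): 2,     # e.g. compromise of a privileged account
    (Scope.MULTIPLE, Impact.LOW): 3,   # e.g. nondestructive virus across the network
    (Scope.MULTIPLE, Impact.HIGH): 4,  # e.g. intrusion into a critical global application
}


def severity_level(scope: Scope, impact: Impact) -> int:
    """Return the severity level (1 to 4) for a given scope and impact."""
    return _SEVERITY[(scope, impact)]
```

Keeping the model as a lookup table rather than as nested conditions makes it easy to review with management and to replace if the organization already has an accepted severity model.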

Other models might be tied in with studies done by other organizations. For example, physical security might have already prioritized physical assets and assigned values to them. They also might have a severity model that can be modified to fit information assets. If the company has a working business continuity or disaster recovery plan, it might also have already assigned priorities to systems for the purposes of recovery. A model that is already in use and accepted by management might be better than introducing a new one.

Obviously, it is impossible to fully quantify the impact of an incident. In the earlier example, compromise of the web server might fit into the severity model as a Level 2 but might warrant a higher priority based on the potential impact on the business. A model will help the team develop a framework for prioritizing resources when responding to multiple incidents.
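One way to reflect business impact when ordering work is to combine the severity level with a rating of the affected asset. The Python sketch below uses a deliberately simple and purely hypothetical rule: the priority is the larger of the severity level and an asset criticality rating on the same 1-to-4 scale. The rating scale, the rule, and all names are assumptions for illustration; the text does not prescribe a formula, and a real team would substitute its own.

```python
from typing import List, NamedTuple


class OpenIncident(NamedTuple):
    name: str
    severity_level: int      # 1 to 4, from the severity model
    asset_criticality: int   # 1 to 4, hypothetical rating from the risk analysis


def priority(incident: OpenIncident) -> int:
    """Hypothetical rule: a high-value asset raises the priority above the severity level."""
    return max(incident.severity_level, incident.asset_criticality)


def work_queue(incidents: List[OpenIncident]) -> List[OpenIncident]:
    """Order open incidents from highest to lowest priority (ties keep input order)."""
    return sorted(incidents, key=priority, reverse=True)
```

Under this rule, a Level 2 compromise of a revenue-critical web server rated 4 would be handled ahead of a Level 3 nondestructive virus on low-value systems, which matches the example in the text.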



Incident Response: A Strategic Guide to Handling System and Network Security Breaches
ISBN: 1578702569
Year: 2002
Pages: 103
