The first, and probably most important, step in threat modeling a network is to document the network adequately. Without understanding exactly how the network works, we can never hope to secure it. The entire purpose of the documentation stage is communication. When the first author was a professor, he used to teach data and process modeling at the university. Invariably, students wanted to know the "correct" answer to the homework problems as soon as they were handed in. The answer was always "would your mother understand your picture?" In most cases, that statement was greeted with a surly answer to the effect that "my mother doesn't know anything about computers." That's the point! The purpose of the document is to communicate what the environment looks like. If someone who does not understand the environment could look at it and, with a minimum of knowledge, understand what it is doing, you have a good answer. There is no right and wrong, only good and bad. Communication proceeds more smoothly if you follow proper grammar, are consistent, and so on; but if you have a good answer, those other things come by themselves. The key is whether you are communicating the point.
You can use any technique you want to communicate the point. We prefer a modified version of a technique known as data flow diagrams (DFDs).
A data flow diagram is the result of a process modeling technique invented many years ago for modeling applications. The idea was to model how data flows through an application. Although you can adapt any modeling technique you like to modeling networks, we like DFDs for their simplicity. There are only four constructs in a standard DFD. (Some modeling tools, such as Microsoft Visio, have more, but there are only four core constructs. The remainder exist only for notational convenience, and we do not need them for network threat modeling.) Figure 9-2 shows the constructs.
We do not use data stores in network threat modeling, which is why they are missing from the list. Data stores in networks would be databases, but databases are really processes in their own right: they reside on a database server, and you have to ask that server to do something for you. Apart from that, the same constructs apply as on a normal DFD. Note also that, in deference to simplicity, we do not require the extended constructs you may be used to from tools such as Microsoft Visio. We use only the three shown in Figure 9-2. We do, however, duplicate entities and processes on a diagram to avoid crossing lines and making the diagram hard to read.
If you are starting with a brand new network, you would start by building a context diagram. A context diagram is simply a very high-level diagram that shows the network as a single process along with any entities with which the process can communicate. The beauty of DFDs lies in their ability to help you traverse ladders of abstraction. Think of an automobile factory. A context diagram is similar to the view you would have if you watched the factory from a helicopter. You see trucks with leather, tires, sheet metal, paint, and so on entering the factory on one end. On the other end, you see finished vehicles rolling out. The model tells you nothing about what is going on inside the factory. Now take a can opener and slice the roof off the factory. You see that the sheet metal and paint go to a metal processing facility, which produces fenders and body parts. The body parts come out of the metal processing facility painted and ready. The leather gets stitched into seats. They meet up with the tires at final assembly, and a finished vehicle is produced. This is a level-0 diagram. It shows us the major, high-level processes within the network. If you like, you may apply the can opener to the metal processing facility and create a level-1 diagram of just that process. On that diagram, you may find that sheet metal gets stamped, then rust-proofed, then painted, and finally baked before it goes to the assembly facility.
The context and level-0 diagrams can prove extremely useful during the requirements-gathering and initial threat modeling phases of designing a new network. A context diagram sets the stage for what the network needs to be doing and who may pose threats to it. A level-0 diagram gives you a view into the high-level pieces of the network. As the design gets more detailed, you can drill down as deeply as you need. At some point, you will get into the applications themselves and turn to application threat modeling (should you wish to do so). In other words, network threat modeling is really just a higher-level view of the same kinds of things application threat modeling examines. The threats discovered and the methods used by attackers are different, however.
In the remainder of this chapter, we will not try to design a network from scratch. Rather, to give you an idea of how to use network threat modeling, we model an existing network, represented by the diagram in Figure 9-3.
The network shown in Figure 9-3 is a relatively simple data center network. It consists of a data center that is a dual-screened subnet. Inside the data center, we have a domain, two Web farms with associated SQL Server clusters, a VPN server for connections into the data center, and a terminal server for remote management. Administrators can VPN into the data center and then open a Terminal Services session to the Terminal Server. From there, they can connect via Terminal Services to any other machine in the data center for remote management. Behind the data center, we have a corporate network with the standard clients and servers. There is also a corporate domain controller, which is trusted by the data center domain controller. Because the corporate DC is trusted, it becomes part of our network of interest.
Simply take a diagram of your network infrastructure and overlay a DFD on it. Anything that is a system in the network of interest becomes a process. Anything outside the network of interest is an external entity. Figure 9-4 shows the net result.
After we have a basic diagram describing the processes we are interested in, we can start analyzing the threats to our network. Threat generation in this way is an extremely difficult process. During the Windows Security Push at Microsoft, we learned something very fundamental to threat modeling: Most people do not have their minds tuned to doing it. During the Windows Security Push, we took 9,000 developers off their normal tasks and told them to go develop threat models. We explained to them that there are six kinds of threats, as defined by the STRIDE model (see the sidebar).
STRIDE stands for the six types of threat: spoofing identity, tampering with data, repudiation, information disclosure, denial of service, and elevation of privilege. It is used primarily to categorize threats so that you can evaluate the type of damage each may cause.
After explaining the threats, we asked people to go forth and enumerate all the threats against their component that fell into one of those six categories. After they had discovered the threats, they were to rank-order them based on the DREAD model (explained in depth in Howard and LeBlanc's Writing Secure Code, mentioned earlier), which would tell them exactly what they needed to fix. Having told them this, we now had 9,000 people staring at us like goats staring at a new fence. They simply had no idea how to even start!
The core problem is that people, in general, are not wired to think up threats. We are taught from a very early age to be nice people who try to make things work, not break them. To generate threats, you have to be able to think of ways to break things. To break things, you largely have to be really good at thinking like our adversaries, who in this case are criminals. This is quite a dilemma. Do we really want criminals in our employ? No, but we need some people who can think like them! To help you do this for application threat modeling, Swiderski and Snyder, as well as Howard and LeBlanc, recommend using threat trees.
A threat tree is essentially a specialized version of the fault tree that many of us are familiar with from various engineering disciplines. Just as it is hard to think about how to act like a criminal, it is hard for an engineer to think about how the artifacts she designed might break or malfunction. To aid in such analysis, the engineering disciplines have long turned to fault trees. A fault tree is essentially a tree-structured representation of sequences of occurrences that lead to some type of fault. In most cases, the individual nodes in the tree are faults themselves. This is the great power of a fault tree: it highlights relationships between faults and allows us to estimate interactions between them.
If you think about current computer vulnerabilities, they are almost exclusively single-point vulnerabilities. For example, send some particular type of input into a component, and the component fails. These vulnerabilities are easy to understand (albeit not necessarily easy to find). The really interesting issues, however, come from combining multiple vulnerabilities into a single, devastating result. For instance, being able to write arbitrary new files to the temporary Internet files (TIF) cache on a computer through a Web browser is an interesting issue, but unless you can execute or retrieve the file, what damage can it cause? Similarly, being able to locate a particular file in the TIF cache on a user's computer is an interesting problem, but so what? How about being able to execute arbitrary applications on a user's computer, but not being allowed to specify parameters to them? Unless there is some application already on the system that will do the attacker's evil bidding, that is not interesting. Hang on, though: what if we could write an arbitrary file to the TIF, then find it, and then execute it? Now we have a very interesting problem. Although each of the constituent issues is somewhat interesting in its own right, none of them by itself constitutes a severe breach of security. It is only taken together that they become truly malicious.
Fault trees are uniquely suited to this type of analysis. A fault tree allows us to analyze and understand the interactions between faults and to examine the pre- and post-conditions of a particular fault. We can start the analysis from either end: from the goal or from the current vantage point. Frequently, analysis from both ends turns out to be very fruitful. If this all seems a bit complex, an example will probably help. Consider the fault tree in Figure 9-5.
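To make the bookkeeping concrete, a fault tree can be represented as a small data structure. The following Python sketch is ours, not from the chapter; the names (FaultNode, gate) are illustrative. It assumes the combination rules described later in this chapter: "and" gates take the minimum of their children's probabilities, "or" gates take the maximum, and a fault that is a prerequisite for its parent multiplies into the parent's probability.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FaultNode:
    """One fault in a fault tree (illustrative sketch, not a standard API)."""
    name: str
    probability: float = 1.0   # probability of this fault occurring on its own
    gate: str = "or"           # how child subtrees combine: "and" or "or"
    children: List["FaultNode"] = field(default_factory=list)

    def aggregate(self) -> float:
        """Probability of this node, including its prerequisite subtree."""
        if not self.children:
            return self.probability
        child_probs = [c.aggregate() for c in self.children]
        combined = min(child_probs) if self.gate == "and" else max(child_probs)
        # A node's own probability multiplies with its prerequisites.
        return self.probability * combined

# Example: an "and" gate where one prerequisite is fully mitigated (p = 0.0),
# like the resolver overflow behind a firewall that blocks UDP 1434.
overflow = FaultNode("resolver overflow unpatched", 0.5)
port_open = FaultNode("UDP 1434 open in firewall", 0.0)
attack = FaultNode("exploit resolver service", gate="and",
                   children=[overflow, port_open])
print(attack.aggregate())   # 0.0: the firewall mitigates this entire path
```

The point of the structure is that mitigations propagate automatically: setting any probability under an "and" gate to zero zeroes out that whole attack path.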
In Figure 9-5, we started with a goal: to "root" (i.e., completely compromise) the SQL Server. We start by looking at the immediate ways we can do that. A couple of easy ways come to mind, such as exploiting a blank SA (system administrator) password on the SQL Server, or exploiting the buffer overflow in the SQL resolver service. ("1434" refers to the UDP port used by the resolver service. This is the vulnerability used by the Slammer worm. A patch was issued with Microsoft Security Bulletin MS02-039.)
We can also assign probabilities to the various nodes in the tree. For instance, given that SQL Server up until Service Pack 3 defaulted to a blank SA password, we may estimate that there is a 70 percent probability that the SA password is blank. We could also estimate that there is a 50 percent probability that the resolver buffer overflow is unpatched. However, to exploit either of those problems, we would need some ports open in the firewall. More specifically, to exploit the blank SA password, we must be able to send traffic to TCP port 1433 on the SQL Server. The resolver service, similarly, listens on UDP port 1434. This requirement is represented by the little arc that connects the two fault sets to the goal. Both parts connected to the arc have to be true for the fault to actually occur. Because we have a firewall in front of the SQL Server, the probability of UDP port 1434 being open is 0.0, and this threat is already mitigated. Likewise, we cannot exploit the blank SA password, because TCP port 1433 is not open in the firewall.
However, TCP port 80 is open in the firewall. This enables us to exploit a different issue, namely that write access is enabled on the Web app. This may not be a serious problem in and of itself because some sites use HTTP for uploading files. However, if we can write to the Web app, and the system is missing a particular patch, we could exploit how the operating system loads DLLs. Certain DLLs will always be loaded from the current directory. For example, if IIS called a particular DLL, it would load it from the Web content directory if found there. If an attacker is able to upload the DLL to the Web content directory, all he needs to do to execute it is request a file from the same directory. However, the chances that this particular Web server still has that vulnerability are really low, roughly 30 percent, since the patch was issued long ago.
What we can do, however, is exploit a misconfigured virtual root (vroot) in IIS that has both the execute and the write bit set. We deem the chances that one of those exists to be really good, because some of the samples in IIS 5.0 created them. Many sites have also been found to have one or more of these, sometimes because of poorly designed third-party software. Let us put the probability that one exists at 90 percent. If we can exploit that issue, we gain the ability to run arbitrary code on the Web server.
After we have the ability to run code on the Web server, we could exploit that blank SA password directly if we want. Alternatively, we could take a slightly more elegant route. If the vroot is configured to run as Local System (in process with IIS), we can now do whatever we want on the Web server. For instance, we could dump out the LSA Secrets. Recall from Chapter 2, "Anatomy of a Hack: The Rise and Fall of Your Network," and Chapter 8, "Security Dependencies," that the LSA Secrets store the passwords for all the service accounts.
After we have the service account passwords, we check whether any of them are also used on the SQL Server and, if so, we use that service account to connect to the SQL Server. At this point, we have achieved the goal of rooting the SQL Server. You may at this juncture calculate the probability of achieving the goal. This is done by calculating the probability at each level of the diagram. In our particular case, the first level consists of three "or" sets of faults: the two "and" fault pairs, and the single fault of port 80 being open. Of course, having port 80 open to the Web server is quite normal and not a true fault, but because we are considering it a node in an attack sequence, we will think of it as a fault for now.
The probability of a set of "and" faults at the same level is calculated by taking the minimum probability of the component faults. The aggregate probability for a set of "or" faults at the same level (i.e., for separate subtrees rooted in the same node) is calculated by taking the maximum probability of the component faults. Prerequisite fault probabilities (that is, two nodes where one is directly subordinate to the other) are calculated by multiplying the two probabilities together.
Using those basic rules, we can easily calculate the probability that the entire threat tree can be used in a successful attack. The probability for the lowest level, the shared service accounts, is simply 0.5. The next level is an "or" level consisting of the "LSA Secrets" and "blank SA password" faults. The max of those two subtrees is MAX[0.7, 1.0*0.5] = 0.7. That means it would be preferable for an attacker to exploit the blank SA password rather than the shared service account. Thus, the probability at this point is 0.7. The next level is another "or" level (the vroot with execute and the DLL-loading Trojan). Here we get an aggregate probability of MAX[(0.9*0.7), 0.3] = 0.63. Moving up one level, we have a 0.8 probability that we have write access, giving 0.8*0.63 = 0.504 aggregate probability so far. Finally, we have the top level, where we have three "or" faults, two of which are combined from two "and" faults. The probabilities of the two "and" faults are MIN[0.7, 0.0] = 0.0 and MIN[0.5, 0.0] = 0.0, respectively, but the probability that TCP port 80 is open in the firewall is 100 percent. Thus, the aggregate probability for the top level is 1.0, and we have a total probability for the entire sequence of faults of 0.504. This should now be taken into consideration for fixing, together with all the other threats for which we can calculate similar probabilities.
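The arithmetic above can be checked mechanically. The following Python sketch is ours, with the probabilities taken from the text as estimates; it applies the stated rules ("and" = minimum, "or" = maximum, prerequisite chain = product) and reproduces the 0.504 total:

```python
# Leaf probabilities from the worked example (estimates, not measurements).
p_blank_sa      = 0.7   # SA password is blank
p_1434_bug      = 0.5   # resolver overflow unpatched
p_tcp1433_open  = 0.0   # firewall blocks TCP 1433
p_udp1434_open  = 0.0   # firewall blocks UDP 1434
p_tcp80_open    = 1.0   # HTTP is allowed through
p_write_access  = 0.8   # write access enabled on the Web app
p_exec_vroot    = 0.9   # misconfigured vroot with write and execute
p_dll_trojan    = 0.3   # DLL-loading issue still unpatched
p_lsa_secrets   = 1.0   # can dump LSA Secrets once running as Local System
p_shared_accts  = 0.5   # service account shared with the SQL Server

# Level: reach SQL credentials from code running on the Web server ("or").
sa_or_lsa = max(p_blank_sa, p_lsa_secrets * p_shared_accts)        # 0.7

# Level: get code running on the Web server ("or").
run_code = max(p_exec_vroot * sa_or_lsa, p_dll_trojan)             # 0.63

# Prerequisite chain through the open HTTP port (product).
web_path = p_tcp80_open * p_write_access * run_code                # 0.504

# Top level: three "or" alternatives; the direct attacks are mitigated
# by the firewall ("and" with a 0.0-probability prerequisite).
direct_1433 = min(p_blank_sa, p_tcp1433_open)                      # 0.0
direct_1434 = min(p_1434_bug, p_udp1434_open)                      # 0.0
total = max(direct_1433, direct_1434, web_path)

print(round(total, 3))   # prints 0.504
```

Encoding the tree this way also makes "what if" analysis cheap: set `p_shared_accts` to 0.0 (segregated service accounts) or `p_exec_vroot` to 0.0 (audited vroots) and recompute to see how much each mitigation buys you.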
As you can see, our threat tree provides us with a way to calculate the probability of threats coming true for particular components of the network. After we have determined the threats to each piece of the network, it is time to move on to the next step: preventing those threats. As we saw in our threat model, there were interactions between components of the network. The Web server was compromised first, and it allowed us to get to the database server because of shared service accounts. Our security policy should take these types of design considerations into account. If we find that our policy does not include adequate requirements to mitigate the threats we find when we analyze the design, we need to go back and modify the policy. This is in accordance with the earlier statement that policy development and threat modeling form an iterative process. At this point, we may modify the policy to require network segmentation of sensitive servers that should not have dependencies on each other.