5.5. Assurance in Trusted Operating Systems

This chapter has moved our discussion from the general to the particular. We began by studying different models of protection systems. By the time we reached the last section, we had examined three principles (isolation, security kernel, and layered structure) used in designing secure operating systems, and we had looked in detail at the approaches taken by designers of particular operating systems. Now, we suppose that an operating system provider has taken these considerations into account and claims to have a secure design. It is time for us to consider assurance, ways of convincing others that a model, design, and implementation are correct.

What justifies our confidence in the security features of an operating system? If someone else has evaluated the system, how have the confidence levels of operating systems been rated? In our assessment, we must recognize that operating systems are used in different environments; in some applications, less secure operating systems may be acceptable. Overall, then, we need ways of determining whether a particular operating system is appropriate for a certain set of needs. Both in Chapter 4 and in the previous section, we looked at design and process techniques for building confidence in the quality and correctness of a system. In this section, we explore ways to actually demonstrate the security of an operating system, using techniques such as testing, formal verification, and informal validation. Snow [SNO05] explains what assurance is and why we need it.

Typical Operating System Flaws

Periodically throughout our analysis of operating system security features, we have used the phrase "exploit a vulnerability." Throughout the years, many vulnerabilities have been uncovered in many operating systems. They have gradually been corrected, and the body of knowledge about likely weak spots has grown.

Known Vulnerabilities

In this section, we discuss typical vulnerabilities that have been uncovered in operating systems. Our goal is not to provide a "how-to" guide for potential penetrators of operating systems. Rather, we study these flaws to understand the careful analysis necessary in designing and testing operating systems. User interaction is the largest single source of operating system vulnerabilities, for several reasons:

  • The user interface is performed by independent, intelligent hardware subsystems. The human-computer interface often falls outside the security kernel or security restrictions implemented by an operating system.

  • Code to interact with users is often much more complex and much more dependent on the specific device hardware than code for any other component of the computing system. For these reasons, it is harder to review this code for correctness, let alone to verify it formally.

  • User interactions are often character oriented. Again, in the interest of fast data transfer, the operating systems designers may have tried to take shortcuts by limiting the number of instructions executed by the operating system during actual data transfer. Sometimes the instructions eliminated are those that enforce security policies as each character is transferred.

A second prominent weakness in operating system security reflects an ambiguity in access policy. On one hand, we want to separate users and protect their individual resources. On the other hand, users depend on shared access to libraries, utility programs, common data, and system tables. The distinction between isolation and sharing is not always clear at the policy level, so the distinction cannot be sharply drawn at implementation.

A third potential problem area is incomplete mediation. Recall that Saltzer [SAL74] recommended an operating system design in which every requested access was checked for proper authorization. However, some systems check access only once per user interface operation, process execution, or machine interval. The mechanism is available to implement full protection, but the policy decision on when to invoke the mechanism is not complete. Therefore, in the absence of any explicit requirement, system designers adopt the "most efficient" enforcement; that is, the one that will lead to the least use of machine resources.
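
To make the policy choice concrete, the following minimal Python sketch (ours; the Policy class, the STORAGE dictionary, and the subject names are invented for illustration) contrasts complete mediation, in which every access repeats the check, with the "most efficient" enforcement, in which the check is hoisted out of the transfer path.

```python
class Policy:
    """Hypothetical access policy: a set of (subject, object, right) triples."""
    def __init__(self, grants):
        self.grants = set(grants)

    def allows(self, subject, obj, right):
        return (subject, obj, right) in self.grants


STORAGE = {"payroll": b"...sensitive records..."}   # stand-in for protected objects
POLICY = Policy({("alice", "payroll", "read")})

def read_mediated(subject, obj):
    # Complete mediation: the authorization check accompanies every access.
    if not POLICY.allows(subject, obj, "read"):
        raise PermissionError(f"{subject} may not read {obj}")
    return STORAGE[obj]

def open_checked_once(subject, obj):
    # "Most efficient" enforcement: checked once at open time; the returned
    # reader bypasses the policy, so a later revocation is never noticed.
    if not POLICY.allows(subject, obj, "read"):
        raise PermissionError(f"{subject} may not read {obj}")
    return lambda: STORAGE[obj]

print(read_mediated("alice", "payroll"))    # allowed while the grant is in force
reader = open_checked_once("alice", "payroll")
POLICY.grants.clear()                       # policy changes after the single check
print(reader())                             # still succeeds: the access is unmediated
```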

Generality is a fourth protection weakness, especially among commercial operating systems for large computing systems. Implementers try to provide a means for users to customize their operating system installation and to allow installation of software packages written by other companies. Some of these packages, which themselves operate as part of the operating system, must execute with the same access privileges as the operating system. For example, there are programs that provide stricter access control than the standard control available from the operating system. The "hooks" by which these packages are installed are also trapdoors for any user to penetrate the operating system.

Thus, several well-known points of security weakness are common to many commercial operating systems. Let us consider several examples of actual vulnerabilities that have been exploited to penetrate operating systems.

Examples of Exploitations

Earlier, we discussed why the user interface is a weak point in many major operating systems. We begin our examples by exploring this weakness in greater detail. On some systems, after access has been checked to initiate a user operation, the operation continues without subsequent checking, leading to classic time-of-check to time-of-use flaws. Checking access permission with each character transferred is a substantial overhead for the protection system. The command often resides in the user's memory space. Any user can alter the source or destination address of the command after the operation has commenced. Because access has already been checked once, the new address will be used without further checking; it is not checked each time a piece of data is transferred. By exploiting this flaw, users have been able to transfer data to or from any memory address they desire.

Another example of exploitation involves a procedural problem. In one system a special supervisor function was reserved for the installation of other security packages. When executed, this supervisor call returned control to the user in privileged mode. The operations allowable in that mode were not monitored closely, so the supervisor call could be used for access control or for any other high-security system access. The particular supervisor call required some effort to execute, but it was fully available on the system. Additional checking should have been used to authenticate the program executing the supervisor request. As an alternative, the access rights for any subject entering under that supervisor request could have been limited to the objects necessary to perform the function of the added program.

The time-of-check to time-of-use mismatch described in Chapter 3 can introduce security problems, too. In an attack based on this vulnerability, access permission is checked for a particular user to access an object, such as a buffer. But between the time the access is approved and the access actually occurs, the user changes the designation of the object, so that instead of accessing the approved object, the user now accesses another, unacceptable, one.
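
As a rough illustration, here is a file-based sketch in Python (our own example; the function names are hypothetical, and the scenario is the common Unix file-access race rather than the buffer example above). The check and the use both refer to a name rather than to the object that was checked, so the binding can change in the window between them.

```python
import os

def copy_if_readable_insecure(path, out):
    # Time of check: is the caller allowed to read this file?
    if not os.access(path, os.R_OK):
        raise PermissionError(path)
    # ...window: another process can rebind 'path' (for example, to a
    # symbolic link pointing at a file the caller should not see)...
    # Time of use: this open() trusts a check made on a possibly different object.
    with open(path, "rb") as src:
        out.write(src.read())

def copy_if_readable_safer(path, out):
    # Narrower window: open first, then reason about the object actually
    # opened (the descriptor), not about the name that was checked.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    with os.fdopen(os.open(path, flags), "rb") as src:
        out.write(src.read())
```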

Other penetrations have occurred by exploitation of more complex combinations of vulnerabilities. In general, however, security flaws in trusted operating systems have resulted from a faulty analysis of a complex situation, such as user interaction, or from an ambiguity or omission in the security policy. When simple security mechanisms are used to implement clear and complete security policies, the number of penetrations falls dramatically.

Assurance Methods

Once we understand the potential vulnerabilities in a system, we can apply assurance techniques to seek out the vulnerabilities and mitigate or eliminate their effects. In this section, we consider three such techniques, showing how they give us confidence in a system's correctness: testing, verification, and validation. None of these is complete or foolproof, and each has advantages and disadvantages. However, used with understanding, each can play an important role in deriving overall assurance of a system's security.

Testing

Testing, first presented in Chapter 3, is the most widely accepted assurance technique. As Boebert [BOE92] observes, conclusions from testing are based on the actual product being evaluated, not on some abstraction or precursor of the product. This realism is a security advantage. However, conclusions based on testing are necessarily limited, for the following reasons:

  • Testing can demonstrate the existence of a problem, but passing tests does not demonstrate the absence of problems.

  • Testing adequately within reasonable time or effort is difficult because the combinatorial explosion of inputs and internal states makes testing very complex.

  • Testing based only on observable effects, not on the internal structure of a product, does not ensure any degree of completeness.

  • Testing based on the internal structure of a product involves modifying the product by adding code to extract and display internal states. That extra functionality affects the product's behavior and can itself be a source of vulnerabilities or mask other vulnerabilities.

  • Testing real-time or complex systems presents the problem of keeping track of all states and triggers. This problem makes it hard to reproduce and analyze problems reported as testers proceed.

Ordinarily, we think of testing in terms of the developer: unit testing a module, integration testing to ensure that modules function properly together, function testing to trace correctness across all aspects of a given function, and system testing to combine hardware with software. Likewise, regression testing is performed to make sure a change to one part of a system does not degrade any other functionality. But for other tests, including acceptance tests, the user or customer administers tests to determine if what was ordered is what is delivered. Thus, an important aspect of assurance is considering whether the tests run are appropriate for the application and level of security. The nature and kinds of testing reflect the developer's testing strategy: which tests address what issues.

Similarly, it is important to recognize that testing is almost always constrained by a project's budget and schedule. The constraints usually mean that testing is incomplete in some way. For this reason, we consider notions of test coverage, test completeness, and testing effectiveness in a testing strategy. The more complete and effective our testing, the more confidence we have in the software. More information on testing can be found in Pfleeger and Atlee [PFL06a].

Penetration Testing

A testing strategy often used in computer security is called penetration testing, tiger team analysis, or ethical hacking. In this approach, a team of experts in the use and design of operating systems tries to crack the system being tested. (See, for example, [RUB01, TIL03, PAL01].) The tiger team knows well the typical vulnerabilities in operating systems and computing systems, as described in previous sections and chapters. With this knowledge, the team attempts to identify and exploit the system's particular vulnerabilities. The work of penetration testers closely resembles what an actual attacker might do [AND04, SCH00b].

Penetration testing is both an art and a science. The artistic side requires careful analysis and creativity in choosing the test cases. But the scientific side requires rigor, order, precision, and organization. As Weissman observes [WEI95], there is an organized methodology for hypothesizing and verifying flaws. It is not, as some might assume, a random punching contest.

Using penetration testing is much like asking a mechanic to look over a used car on a sales lot. The mechanic knows potential weak spots and checks as many of them as possible. It is likely that a good mechanic will find significant problems, but finding a problem (and fixing it) is no guarantee that no other problems are lurking in other parts of the system. For instance, if the mechanic checks the fuel system, the cooling system, and the brakes, there is no guarantee that the muffler is good. In the same way, an operating system that fails a penetration test is known to have faults, but a system that does not fail is not guaranteed to be fault-free. Nevertheless, penetration testing is useful and often finds faults that might have been overlooked by other forms of testing. One possible reason for the success of penetration testing is its use under real-life conditions. Users often exercise a system in ways that its designers never anticipated or intended. So penetration testers can exploit this real-life environment and knowledge to make certain kinds of problems visible.

Penetration testing is popular with the commercial community, which thinks skilled hackers will test (attack) a site and find problems in hours, if not days. These people do not realize that finding flaws in complex code can take weeks, if not months. Indeed, the original military red teams convened to test security in software systems were assembled for 4- to 6-month exercises. Anderson et al. [AND04] point out a limitation of penetration testing: to find one flaw in a space of 1 million inputs may require testing all 1 million possibilities; unless the space is reasonably limited, this search is prohibitive. Karger and Schell [KAR02] point out that even after they informed testers of a piece of malicious code they had inserted in a system, the testers were unable to find it. Penetration testing is not a magic technique for finding needles in haystacks.

Formal Verification

The most rigorous method of analyzing security is through formal verification, which was introduced in Chapter 3. Formal verification uses rules of mathematical logic to demonstrate that a system has certain security properties. In formal verification, the operating system is modeled and the operating system principles are described as assertions. The collection of models and assertions is viewed as a theorem, which is then proven. The theorem asserts that the operating system is correct. That is, formal verification confirms that the operating system provides the security features it should and nothing else.

Proving correctness of an entire operating system is a formidable task, often requiring months or even years of effort by several people. Computer programs called theorem provers can assist in this effort, although much human activity is still needed. The amount of work required and the methods used are well beyond the scope of this book. However, we illustrate the general principle of verification by presenting a simple example that uses proofs of correctness. You can find more extensive coverage of this topic in [BOW95], [CHE81], [GRI81], [HAN76], [PFL06a], and [SAI96].

Consider the flow diagram of Figure 5-22, illustrating the logic in a program to determine the smallest of a set of n values, A[1] through A[n]. The flow chart has a single identified beginning point, a single identified ending point, and five internal blocks, including an if-then structure and a loop.

Figure 5-22. Flow Diagram for Finding the Minimum Value.


In program verification, we rewrite the program as a series of assertions about the program's variables and values. The initial assertion is a statement of conditions on entry to the module. Next, we identify a series of intermediate assertions associated with the work of the module. We also determine an ending assertion, a statement of the expected result. Finally, we show that the initial assertion leads logically to the intermediate assertions that in turn lead logically to the ending assertion.

We can formally verify the example in Figure 5-22 by using four assertions. The first assertion, P, is a statement of initial conditions, assumed to be true on entry to the procedure.

n > 0 (P)

The second assertion, Q, is the result of applying the initialization code in the first box.

n > 0 and (Q)

i = 2 and

min = A[1]

The third assertion, R, is the loop assertion. It asserts what is true at the start of each iteration of the loop.

n > 0 and (R)

1 < i ≤ n + 1 and

for all j, 1 ≤ j ≤ i - 1, min ≤ A[j]

The final assertion, S, is the concluding assertion, the statement of conditions true at the time the loop exit occurs.

n > 0 and (S)

i = n + 1 and

for all j, 1 ≤ j ≤ n, min ≤ A[j]

These four assertions, shown in Figure 5-23, capture the essence of the flow chart. The next step in the verification process involves showing the logical progression of these four assertions. That is, we must show that, assuming P is true on entry to this procedure, Q is true after completion of the initialization section, R is true the first time the loop is entered, R is true each time through the loop, and the truth of R implies that S is true at the termination of the loop.

Figure 5-23. Verification Assertions.


Clearly, Q follows from P and the semantics of the two statements in the second box. When we enter the loop for the first time, i = 2, so i - 1 = 1. Thus, the assertion about min applies only for j = 1, which follows from Q. To prove that R remains true with each execution of the loop, we can use the principle of mathematical induction. The basis of the induction is that R was true the first time through the loop. With each iteration of the loop the value of i increases by 1, so it is necessary to show only that min ≤ A[i - 1] for this new value of i. That proof follows from the meaning of the comparison and replacement statements. Therefore, R is true with each iteration of the loop. Finally, S follows from the final iteration value of R. This step completes the formal verification that this flow chart exits with the smallest value of A[1] through A[n] in min.
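
To connect the proof to something executable, here is a Python rendering of the flow chart of Figure 5-22 (our sketch; the book presents only the flow chart and the assertions), with P, Q, R, and S written as assert statements at the points where they are claimed to hold. Running it on sample data exercises the assertions, but, as the discussion of testing makes clear, only the proof itself covers all inputs.

```python
def find_min(A):
    """Return the smallest of A[1..n]; A[0] is unused so indexing matches the text."""
    n = len(A) - 1
    assert n > 0                                                  # P: on entry

    min_val = A[1]
    i = 2
    assert n > 0 and i == 2 and min_val == A[1]                   # Q: after initialization

    while i <= n:
        # R: loop assertion, true at the start of each iteration
        assert 1 < i <= n + 1 and all(min_val <= A[j] for j in range(1, i))
        if A[i] < min_val:
            min_val = A[i]
        i += 1

    # S: concluding assertion, true at loop exit
    assert i == n + 1 and all(min_val <= A[j] for j in range(1, n + 1))
    return min_val

print(find_min([None, 7, 3, 9, 3]))   # prints 3
```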

The algorithm (not the verification) shown here is frequently used as an example in the first few weeks of introductory programming classes. It is quite simple; in fact, after studying the algorithm for a short time, most students convince themselves that the algorithm is correct. The verification itself takes much longer to explain; it also takes far longer to write than the algorithm itself. Thus, this proof-of-correctness example highlights two principal difficulties with formal verification methods:

  • Time. The methods of formal verification are time consuming to perform. Stating the assertions at each step and verifying the logical flow of the assertions are both slow processes.

  • Complexity. Formal verification is a complex process. For some systems with large numbers of states and transitions, it is hopeless to try to state and verify the assertions. This situation is especially true for systems that have not been designed with formal verification in mind.

These two difficulties constrain the situations in which formal verification can be used successfully. Gerhart [GER89] succinctly describes the advantages and disadvantages of using formal methods, including proof of correctness. As Schaefer [SCH89a] points out, too often people focus so much on the formalism and on deriving a formal proof that they ignore the underlying security properties to be ensured.

Validation

Formal verification is a particular instance of the more general approach to assuring correctness: verification. As we have seen in Chapter 3, there are many ways to show that each of a system's functions works correctly. Validation is the counterpart to verification, assuring that the system developers have implemented all requirements. Thus, validation makes sure that the developer is building the right product (according to the specification), and verification checks the quality of the implementation [PFL06a]. There are several different ways to validate an operating system.

  • Requirements checking. One technique is to cross-check each operating system requirement with the system's source code or execution-time behavior. The goal is to demonstrate that the system does each thing listed in the functional requirements. This process is a narrow one, in the sense that it demonstrates only that the system does everything it should do. In security, we are equally concerned about prevention: making sure the system does not do the things it is not supposed to do. Requirements checking seldom addresses this aspect of requirements compliance. (A small sketch of such a cross-check appears after this list.)

  • Design and code reviews. As described in Chapter 3, design and code reviews usually address system correctness (that is, verification). But a review can also address requirements implementation. To support validation, the reviewers scrutinize the design or the code to ensure traceability from each requirement to design and code components, noting problems along the way (including faults, incorrect assumptions, incomplete or inconsistent behavior, or faulty logic). The success of this process depends on the rigor of the review.

  • System testing. The programmers or an independent test team select data to check the system. These test data can be organized much like acceptance testing, so behaviors and data expected from reading the requirements document can be confirmed in the actual running of the system. The checking is done in a methodical manner to ensure completeness.
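
As a minimal illustration of the requirements cross-check described in the first item above, the following Python sketch (with invented requirement identifiers and artifact names) reports requirements that no design, code, or test artifact claims to satisfy. As the text notes, such a check says nothing about behavior the system should not exhibit.

```python
# Hypothetical requirements and trace links; real traceability data would come
# from a requirements tool, test reports, or code annotations.
requirements = {
    "REQ-01": "All file accesses are checked against the access control policy",
    "REQ-02": "Failed logins are written to the audit log",
    "REQ-03": "Passwords are never stored in cleartext",
}

# Map each requirement to the design, code, or test artifacts that address it.
trace = {
    "REQ-01": ["kernel/mediate.c", "tests/test_mediation.py"],
    "REQ-02": ["audit/logger.c"],
    # REQ-03 deliberately left untraced to show what the check reports.
}

def check_coverage(requirements, trace):
    """Return requirements with no supporting artifact: the 'does everything it
    should' half of validation, not the 'does nothing it should not' half."""
    return [rid for rid in requirements if not trace.get(rid)]

print(check_coverage(requirements, trace))   # ['REQ-03']
```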

Open Source

A debate has opened in the software development community over so-called open source operating systems (and other programs), ones for which the source code is freely released for public analysis. The arguments are predictable: With open source, many critics can peruse the code, presumably finding flaws, whereas closed (proprietary) source makes it more difficult for attackers to find and exploit flaws.

The Linux operating system is the prime example of open source software, although the source of its predecessor Unix was also widely available. The open source idea is catching on: According to a survey by IDG Research, reported in the Washington Post [CHA01], 27 percent of high-end servers now run Linux, as opposed to 41 percent for a Microsoft operating system, and the open source Apache web server outruns Microsoft Internet Information Server by 63 percent to 20 percent.

Lawton [LAW02] lists additional benefits of open source:

  • Cost: Because the source code is available to the public, if the owner charges a high fee, the public will trade the software unofficially.

  • Quality: The code can be analyzed by many reviewers who are unrelated to the development effort or the firm that developed the software.

  • Support: As the public finds flaws, it may also be in the best position to propose the fixes for those flaws.

  • Extensibility: The public can readily figure how to extend code to meet new needs and can share those extensions with other users.

Opponents of public release argue that giving the attacker knowledge of the design and implementation of a piece of code allows a search for shortcomings and provides a blueprint for their exploitation. Many commercial vendors have opposed open source for years, and Microsoft is currently being quite vocal in its opposition. Craig Mundie, senior vice president of Microsoft, says open source software "puts at risk the continued vitality of the independent software sector" [CHA01]. Microsoft favors a scheme under which it would share source code of some of its products with selected partners, while still retaining intellectual property rights. The Alexis de Tocqueville Institution argues that "terrorists trying to hack or disrupt U.S. computer networks might find it easier if the Federal government attempts to switch to 'open source' as some groups propose," citing threats against air traffic control or surveillance systems [BRO02].

But noted computer security researchers argue that open or closed source is not the real issue to examine. Marcus Ranum, president of Network Flight Recorder, has said, "I don't think making [software] open source contributes to making it better at all. What makes good software is single-minded focus." Eugene Spafford of Purdue University [LAW02] agrees, saying, "What really determines whether it is trustable is quality and care. Was it designed well? Was it built using proper tools? Did the people who built it use discipline and not add a lot of features?" Ross Anderson of Cambridge University [AND02] argues that "there are more pressing security problems for the open source community. The interaction between security and openness is entangled with attempts to use security mechanisms for commercial advantage, to entrench monopolies, to control copyright, and above all to control interoperability."

Anderson presents a statistical model of reliability that shows that after open or closed testing, the two approaches are equivalent in expected failure rate [AND05]. Boulanger [BOU05] comes to a similar conclusion.

Evaluation

Most system consumers (that is, users or system purchasers) are not security experts. They need the security functions, but they are not usually capable of verifying the accuracy or adequacy of test coverage, checking the validity of a proof of correctness, or determining in any other way that a system correctly implements a security policy. Thus, it is useful (and sometimes essential) to have an independent third party evaluate an operating system's security. Independent experts can review the requirements, design, implementation, and evidence of assurance for a system. Because it is helpful to have a standard approach for an evaluation, several schemes have been devised for structuring an independent review. In this section, we examine three different approaches: one from the United States, one from Europe, and a scheme that combines several national approaches.

U.S. "Orange Book" Evaluation

In the late 1970s, the U.S. Department of Defense (DoD) defined a set of distinct, hierarchical levels of trust in operating systems. Published in a document [DOD85] that has become known informally as the "Orange Book," the Trusted Computer System Evaluation Criteria (TCSEC) provides the criteria for an independent evaluation. The National Computer Security Center (NCSC), an organization within the National Security Agency, guided and sanctioned the actual evaluations.

The levels of trust are described as four divisions, A, B, C, and D, where A has the most comprehensive degree of security. Within a division, additional distinctions are denoted with numbers; the higher numbers indicate tighter security requirements. Thus, the complete set of ratings ranging from lowest to highest assurance is D, C1, C2, B1, B2, B3, and A1. Table 5-7 (from Appendix D of [DOD85]) shows the security requirements for each of the seven evaluated classes of NCSC certification. (Class D has no requirements because it denotes minimal protection.)

Table 5-7. Trusted Computer System Evaluation Criteria.

Evaluated classes: D, C1, C2, B1, B2, B3, A1

Security Policy
  • Discretionary access control
  • Object reuse
  • Labels
  • Label integrity
  • Exportation of labeled information
  • Labeling human-readable output
  • Mandatory access control
  • Subject sensitivity labels
  • Device labels

Accountability
  • Identification and authentication
  • Audit
  • Trusted path

Assurance
  • System architecture
  • System integrity
  • Security testing
  • Design specification and verification
  • Covert channel analysis
  • Trusted facility management
  • Configuration management
  • Trusted recovery
  • Trusted distribution

Documentation
  • Security features user's guide
  • Trusted facility manual
  • Test documentation
  • Design documentation

For each criterion, the table marks, class by class, whether there is no requirement, the same requirement as the previous class, or an additional (new or strengthened) requirement.


The table's pattern reveals four clusters of ratings:

  • D, with no requirements

  • C1/C2/B1, requiring security features common to many commercial operating systems

  • B2, requiring a precise proof of security of the underlying model and a narrative specification of the trusted computing base

  • B3/A1, requiring more precisely proven descriptive and formal designs of the trusted computing base

These clusters do not imply that classes C1, C2, and B1 are equivalent. However, there are substantial increases of stringency between B1 and B2, and between B2 and B3 (especially in the assurance area). To see why, consider the requirements for C1, C2, and B1. An operating system developer might be able to add security measures to an existing operating system in order to qualify for these ratings. However, security must be included in the design of the operating system for a B2 rating. Furthermore, the design of a B3 or A1 system must begin with construction and proof of a formal model of security. Thus, the distinctions between B1 and B2 and between B2 and B3 are significant.

Let us look at each class of security described in the TCSEC. In our descriptions, terms in quotation marks have been taken directly from the Orange Book to convey the spirit of the evaluation criteria.

Class D: Minimal Protection

This class is applied to systems that have been evaluated for a higher category but have failed the evaluation. No security characteristics are needed for a D rating.

Class C1: Discretionary Security Protection

C1 is intended for an environment of cooperating users processing data at the same level of sensitivity. A system evaluated as C1 separates users from data. Controls must appear sufficient to implement access limitation, allowing users to protect their own data. The controls of a C1 system may not have been stringently evaluated; the evaluation may be based more on the presence of certain features. To qualify for a C1 rating, a system must have a domain that includes security functions and that is protected against tampering. A keyword in the classification is "discretionary." A user is "allowed" to decide when the controls apply, when they do not, and which named individuals or groups are allowed access.

Class C2: Controlled Access Protection

A C2 system still implements discretionary access control, although the granularity of control is finer. The audit trail must be capable of tracking each individual's access (or attempted access) to each object.

Class B1: Labeled Security Protection

All certifications in the B division include nondiscretionary access control. At the B1 level, each controlled subject and object must be assigned a security level. (For class B1, the protection system does not need to control every object.)

Each controlled object must be individually labeled for security level, and these labels must be used as the basis for access control decisions. The access control must be based on a model employing both hierarchical levels and nonhierarchical categories. (The military model is an example: its hierarchical levels are unclassified, classified, secret, and top secret, and its nonhierarchical categories are need-to-know category sets.) The mandatory access policy is the Bell-La Padula model. Thus, a B1 system must implement Bell-La Padula controls for all accesses, with user discretionary access controls to further limit access.

Class B2: Structured Protection

The major enhancement for B2 is a design requirement: The design and implementation of a B2 system must enable a more thorough testing and review. A verifiable top-level design must be presented, and testing must confirm that the system implements this design. The system must be internally structured into "well-defined largely independent modules." The principle of least privilege is to be enforced in the design. Access control policies must be enforced on all objects and subjects, including devices. Analysis of covert channels is required.

Class B3: Security Domains

The security functions of a B3 system must be small enough for extensive testing. A high-level design must be complete and conceptually simple, and a "convincing argument" must exist that the system implements this design. The implementation of the design must "incorporate significant use of layering, abstraction, and information hiding."

The security functions must be tamperproof. Furthermore, the system must be "highly resistant to penetration." There is also a requirement that the system audit facility be able to identify when a violation of security is imminent.

Class A1: Verified Design

Class A1 requires a formally verified system design. The capabilities of the system are the same as for class B3. But in addition there are five important criteria for class A1 certification: (1) a formal model of the protection system and a proof of its consistency and adequacy, (2) a formal top-level specification of the protection system, (3) a demonstration that the top-level specification corresponds to the model, (4) an implementation "informally" shown to be consistent with the specification, and (5) formal analysis of covert channels.

European ITSEC Evaluation

The TCSEC was developed in the United States, but representatives from several European countries also recognized the need for criteria and a methodology for evaluating security-enforcing products. The European efforts culminated in the ITSEC, the Information Technology Security Evaluation Criteria [ITS91b].

Origins of the ITSEC

England, Germany, and France independently began work on evaluation criteria at approximately the same time. Both England and Germany published their first drafts in 1989; France had its criteria in limited review when these three nations, joined by the Netherlands, decided to work together to develop a common criteria document. We examine the British and German efforts separately and then look at their combined output.

German Green Book

The (then West) German Information Security Agency (GISA) produced a catalog of criteria [GIS88] five years after the first use of the U.S. TCSEC. In keeping with tradition, the security community began to call the document the German Green Book because of its green cover. The German criteria identified eight basic security functions, deemed sufficient to enforce a broad spectrum of security policies:

  • identification and authentication: unique and certain association of an identity with a subject or object

  • administration of rights: the ability to control the assignment and revocation of access rights between subjects and objects

  • verification of rights: mediation of the attempt of a subject to exercise rights with respect to an object

  • audit: a record of information on the successful or attempted unsuccessful exercise of rights

  • object reuse: reusable resources reset in such a way that no information flow occurs in contradiction to the security policy

  • error recovery: identification of situations from which recovery is necessary and invocation of an appropriate action

  • continuity of service: identification of functionality that must be available in the system and what degree of delay or loss (if any) can be tolerated

  • data communication security: peer entity authentication, control of access to communications resources, data confidentiality, data integrity, data origin authentication, and nonrepudiation

Note that the first five of these eight functions closely resemble the U.S. TCSEC, but the last three move into entirely new areas: integrity of data, availability, and a range of communications concerns.

Like the U.S. DoD, GISA did not expect ordinary users (that is, those who were not security experts) to select appropriate sets of security functions, so ten functional classes were defined. Classes F1 through F5 corresponded closely to the functionality requirements of U.S. classes C1 through B3. (Recall that the functionality requirements of class A1 are identical to those of B3.) Class F6 was for high data and program integrity requirements, class F7 was appropriate for high availability, and classes F8 through F10 related to data communications situations. The German method addressed assurance by defining eight quality levels, Q0 through Q7, corresponding roughly to the assurance requirements of U.S. TCSEC levels D through A1, respectively. For example,

  • The evaluation of a Q1 system is merely intended to ensure that the implementation more or less enforces the security policy and that no major errors exist.

  • The goal of a Q3 evaluation is to show that the system is largely resistant to simple penetration attempts.

  • To achieve assurance level Q6, it must be formally proven that the highest specification level meets all the requirements of the formal security policy model. In addition, the source code is analyzed precisely.

These functionality classes and assurance levels can be combined in any way, producing potentially 80 different evaluation results, as shown in Table 5-8. The region in the upper-right portion of the table represents requirements in excess of U.S. TCSEC requirements, showing higher assurance requirements for a given functionality class. Even though assurance and functionality can be combined in any way, a low-assurance multilevel system (for example, F5/Q1) may have limited applicability in practice. The Germans did not assert that all possibilities would necessarily be useful, however.

Table 5-8. Relationship of German and U.S. Evaluation Criteria.

(Rows are the German functionality classes F1 through F10; columns are the German quality levels Q0 through Q7. Only the combinations with a U.S. counterpart are marked.)

  • F1: equivalent to U.S. C1 at Q1; beyond U.S. A1 at Q7
  • F2: equivalent to U.S. C2 at Q2; beyond U.S. A1 at Q7
  • F3: equivalent to U.S. B1 at Q3; beyond U.S. A1 at Q7
  • F4: equivalent to U.S. B2 at Q4; beyond U.S. A1 at Q7
  • F5: equivalent to U.S. B3 at Q5; equivalent to U.S. A1 at Q6; beyond U.S. A1 at Q7
  • F6 through F10: new functional classes


Another significant contribution of the German approach was to support evaluations by independent, commercial evaluation facilities.

British Criteria

The British criteria development was a joint activity between the U.K. Department of Trade and Industry (DTI) and the Ministry of Defence (MoD). The first public version, published in 1989 [DTI89a], was issued in several volumes.

The original U.K. criteria were based on the "claims" language, a metalanguage by which a vendor could make claims about functionality in a product. The claims language consisted of lists of action phrases and target phrases with parameters. For example, a typical action phrase might look like this:

This product can [not] determine … [using the mechanism described in paragraph n of this document] …

The parameters product and n are, obviously, replaced with specific references to the product to be evaluated. An example of a target phrase is

… the access-type granted to a [user, process] in respect of a(n) object.

These two phrases can be combined and parameters replaced to produce a claim about a product.

This access control subsystem can determine the read access granted to all subjects in respect to system files.

The claims language was intended to provide an open-ended structure by which a vendor could assert qualities of a product and independent evaluators could verify the truth of those claims. Because of the generality of the claims language, there was no direct correlation of U.K. and U.S. evaluation levels.
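
One informal way to picture the metalanguage is as parameterized phrase templates that are filled in and joined to form a claim like the one quoted above. The Python sketch below is our own illustration, not the scheme's actual notation.

```python
# Hypothetical rendering of the claims metalanguage: an action phrase and a
# target phrase with named parameters, combined into a complete claim.
ACTION = "This {product} can {polarity}determine"
TARGET = "the {access_type} access granted to {subjects} in respect of {objects}."

def make_claim(product, access_type, subjects, objects, can=True):
    action = ACTION.format(product=product, polarity="" if can else "not ")
    target = TARGET.format(access_type=access_type, subjects=subjects, objects=objects)
    return f"{action} {target}"

print(make_claim("access control subsystem", "read", "all subjects", "system files"))
# This access control subsystem can determine the read access granted to
# all subjects in respect of system files.
```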

In addition to the claims language, there were six levels of assurance evaluation, numbered L1 through L6, corresponding roughly to U.S. assurance C1 through A1 or German Q1 through Q6.

The claims language was intentionally open-ended because the British felt it was impossible to predict which functionality manufacturers would choose to put in their products. In this regard, the British differed from Germany and the United States, who thought manufacturers needed to be guided to include specific functions with precise functionality requirements. The British envisioned certain popular groups of claims being combined into bundles that could be reused by many manufacturers.

The British defined and documented a scheme for Commercial Licensed Evaluation Facilities (CLEFs) [DTI89b], with precise requirements for the conduct and process of evaluation by independent commercial organizations.

Other Activities

As if these two efforts were not enough, Canada, Australia, and France were also working on evaluation criteria. The similarities among these efforts were far greater than their differences. It was as if each profited by building upon the predecessors' successes.

Three difficulties, which were really different aspects of the same problem, became immediately apparent.

  • Comparability. It was not clear how the different evaluation criteria related. A German F2/E2 evaluation was structurally quite similar to a U.S. C2 evaluation, but an F4/E7 or F6/E3 evaluation had no direct U.S. counterpart. It was not obvious which U.K. claims would correspond to a particular U.S. evaluation level.

  • Transferability. Would a vendor get credit for a German F2/E2 evaluation in a context requiring a U.S. C2? Would the stronger F2/E3 or F3/E2 be accepted?

  • Marketability. Could a vendor be expected to have a product evaluated independently in the United States, Germany, Britain, Canada, and Australia? How many evaluations would a vendor support? (Many vendors suggested that they would be interested in at most one because the evaluations were costly and time consuming.)

For reasons including these problems, Britain, Germany, France, and the Netherlands decided to pool their knowledge and synthesize their work.

ITSEC: Information Technology Security Evaluation Criteria

In 1991 the Commission of the European Communities sponsored the work of these four nations to produce a harmonized version for use by all European Union member nations. The result was a good amalgamation.

The ITSEC preserved the German functionality classes F1 through F10, while allowing the flexibility of the British claims language. There is similarly an effectiveness component to the evaluation, corresponding roughly to the U.S. notion of assurance and to the German E0 through E7 effectiveness levels.

A vendor (or other "sponsor" of an evaluation) has to define a target of evaluation (TOE), the item that is the evaluation's focus. The TOE is considered in the context of an operational environment (that is, an expected set of threats) and security enforcement requirements. An evaluation can address either a product (in general distribution for use in a variety of environments) or a system (designed and built for use in a specified setting). The sponsor or vendor states the following information:

  • system security policy or rationale: why this product (or system) was built

  • specification of security-enforcing functions: security properties of the product (or system)

  • definition of the mechanisms of the product (or system) by which security is enforced

  • a claim about the strength of the mechanisms

  • the target evaluation level in terms of functionality and effectiveness

The evaluation proceeds to determine the following aspects:

  • suitability of functionality: whether the chosen functions implement the desired security features

  • binding of functionality: whether the chosen functions work together synergistically

  • vulnerabilities: whether vulnerabilities exist either in the construction of the TOE or in how it will work in its intended environment

  • ease of use

  • strength of mechanism: the ability of the TOE to withstand direct attack

The results of these subjective evaluations determine whether the evaluators agree that the product or system deserves its proposed functionality and effectiveness rating.

Significant Departures from the Orange Book

The European ITSEC offers the following significant changes compared with the Orange Book. These variations have both advantages and disadvantages, as listed in Table 5-9.

Table 5-9. Advantages and Disadvantages of ITSEC Approach vs. TCSEC.

New functionality requirement classes
  Advantages:
    • Surpasses traditional confidentiality focus of TCSEC
    • Shows additional areas in which products are needed
  Disadvantages:
    • Complicates user's choice

Decoupling of features and assurance
  Advantages:
    • Allows low-assurance or high-assurance product
  Disadvantages:
    • Requires user sophistication to decide when high assurance is needed
    • Some functionality may inherently require high assurance but not guarantee receiving it

Permitting new feature definitions; independence from specific security policy
  Advantages:
    • Allows evaluation of any kind of security-enforcing product
    • Allows vendor to decide what products the market requires
  Disadvantages:
    • Complicates comparison of evaluations of differently described but similar products
    • Requires vendor to formulate requirements to highlight product's features
    • Preset feature bundles not necessarily hierarchical

Commercial evaluation facilities
  Advantages:
    • Subject to market forces for time, schedule, price
  Disadvantages:
    • Government does not have direct control of evaluation
    • Evaluation cost paid by vendor

U.S. Combined Federal Criteria

In 1992, partly in response to other international criteria efforts, the United States began a successor to the TCSEC, which had been written over a decade earlier. This successor, the Combined Federal Criteria [NSA92], was produced jointly by the National Institute of Standards and Technology (NIST) and the National Security Agency (NSA) (which formerly handled criteria and evaluations through its National Computer Security Center, the NCSC).

The team creating the Combined Federal Criteria was strongly influenced by Canada's criteria [CSS93], released in draft status just before the combined criteria effort began. Although many of the issues addressed by other countries' criteria were the same for the United States, there was a compatibility issue that did not affect the Europeans, namely, the need to be fair to vendors that had already passed U.S. evaluations at a particular level or that were planning for or in the middle of evaluations. Within that context, the new U.S. evaluation model was significantly different from the TCSEC. The combined criteria draft resembled the European model, with some separation between features and assurance.

The Combined Federal Criteria introduced the notions of security target (not to be confused with a target of evaluation, or TOE) and protection profile. A user would generate a protection profile to detail the protection needs, both functional and assurance, for a specific situation or a generic scenario. This user might be a government sponsor, a commercial user, an organization representing many similar users, a product vendor's marketing representative, or a product inventor. The protection profile would be an abstract specification of the security aspects needed in an information technology (IT) product. The protection profile would contain the elements listed in Table 5-10.

Table 5-10. Protection Profile.

Rationale
  • Protection policy and regulations
  • Information protection philosophy
  • Expected threats
  • Environmental assumptions
  • Intended use

Functionality
  • Security features
  • Security services
  • Available security mechanisms (optional)

Assurance
  • Profile-specific assurances
  • Profile-independent assurances

Dependencies
  • Internal dependencies
  • External dependencies

In response to a protection profile, a vendor might produce a product that, the vendor would assert, met the requirements of the profile. The vendor would then map the requirements of the protection profile in the context of the specific product onto a statement called a security target. As shown in Table 5-11, the security target matches the elements of the protection profile.

Table 5-11. Security Target.

Rationale
  • Implementation fundamentals
  • Information protection philosophy
  • Countered threats
  • Environmental assumptions
  • Intended use

Functionality
  • Security features
  • Security services
  • Security mechanisms selected

Assurance
  • Target-specific assurances
  • Target-independent assurances

Dependencies
  • Internal dependencies
  • External dependencies


The security target then becomes the basis for the evaluation. The target details which threats are countered by which features, to what degree of assurance and using which mechanisms. The security target outlines the convincing argument that the product satisfies the requirements of the protection profile. Whereas the protection profile is an abstract description of requirements, the security target is a detailed specification of how each of those requirements is met in the specific product.
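
The relationship between the two documents can be pictured with a small sketch; this is our own illustration in Python, and the field names and example threats are invented rather than drawn from the Federal Criteria themselves. A protection profile states abstract needs, and a security target records, for a specific product, how each need is claimed to be met.

```python
from dataclasses import dataclass, field

@dataclass
class ProtectionProfile:            # abstract statement of needs (cf. Table 5-10)
    expected_threats: list
    security_features: list
    assurance_requirements: list

@dataclass
class SecurityTarget:               # product-specific response (cf. Table 5-11)
    profile: ProtectionProfile
    countered_threats: dict = field(default_factory=dict)   # threat -> mechanism

    def uncovered_threats(self):
        """Threats in the profile that the target does not claim to counter."""
        return [t for t in self.profile.expected_threats
                if t not in self.countered_threats]

pp = ProtectionProfile(
    expected_threats=["unauthorized disclosure", "audit trail tampering"],
    security_features=["mandatory access control", "audit"],
    assurance_requirements=["design review", "penetration testing"],
)
st = SecurityTarget(pp, countered_threats={"unauthorized disclosure": "MAC labels"})
print(st.uncovered_threats())       # ['audit trail tampering']
```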

The criteria document also included long lists of potential requirements (a subset of which could be selected for a particular protection profile), covering topics from object reuse to accountability and from covert channel analysis to fault tolerance. Much of the work in specifying precise requirement statements came from the draft version of the Canadian criteria.

The U.S. Combined Federal Criteria was issued only once, in initial draft form. After receiving a round of comments, the editorial team announced that the United States had decided to join forces with the Canadians and the editorial board from the ITSEC to produce the Common Criteria for the entire world.

Common Criteria

The Common Criteria [CCE94, CCE98] approach closely resembles the U.S. Federal Criteria (which, of course, were heavily influenced by the ITSEC and Canadian efforts). It preserves the concepts of security targets and protection profiles. The U.S. Federal Criteria were intended to have packages of protection requirements that were complete and consistent for a particular type of application, such as a network communications switch, a local area network, or a stand-alone operating system. The example packages received special attention in the Common Criteria.

The Common Criteria defined classes, the broad topics of interest to security, shown in Table 5-12. Under each of these classes, they defined families of functions or assurance needs, and from those families, they defined individual components, as shown in Figure 5-24.

Table 5-12. Classes in Common Criteria.

Functionality classes
  • Identification and authentication
  • Trusted path
  • Security audit
  • Invocation of security functions
  • User data protection
  • Resource utilization
  • Protection of the trusted security functions
  • Privacy
  • Communication

Assurance classes
  • Development
  • Testing
  • Vulnerability assessment
  • Configuration management
  • Life-cycle support
  • Guidance documents
  • Delivery and operation


Figure 5-24. Classes, Families, and Components in Common Criteria.


Individual components were then combined into packages of components that met some comprehensive requirement (for functionality) or some level of trust (for assurance), as shown in Figure 5-25.

Figure 5-25. Functionality or Assurance Packages in Common Criteria.


Finally, the packages were combined into requirements sets, or assertions, for specific applications or products, as shown in Figure 5-26.

Figure 5-26. Protection Profiles and Security Targets in Common Criteria.
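
One way to picture the class, family, component, and package layering of Figures 5-24 through 5-26 is as nested groupings. The Python sketch below is ours; the family and component names are invented for illustration and are not actual Common Criteria catalog entries.

```python
# Hypothetical miniature of the Common Criteria structure: classes contain
# families, families contain components; packages and profiles select components.
catalog = {
    "Security audit": {                       # a functionality class
        "audit generation": ["basic audit records", "per-object audit records"],
        "audit review":     ["audit review tools"],
    },
    "Vulnerability assessment": {             # an assurance class
        "vulnerability analysis": ["developer analysis", "independent analysis"],
    },
}

# A package bundles components that together meet a broader requirement.
audit_package = ["basic audit records", "audit review tools"]

# A protection profile (or security target) combines packages and components
# for a particular kind of product.
profile = {
    "name": "illustrative operating system profile",
    "components": audit_package + ["independent analysis"],
}

all_components = [c for families in catalog.values()
                    for comps in families.values() for c in comps]
assert all(c in all_components for c in profile["components"])
print(profile)
```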


Summary of Evaluation Criteria

The criteria were intended to provide independent security assessments in which we could have some confidence. Have the criteria development efforts been successful? For some, it is too soon to tell. For others, the answer lies in the number and kinds of products that have passed evaluation and how well the products have been accepted in the marketplace.

Evaluation Process

We can examine the evaluation process itself, using our own set of objective criteria. For instance, it is fair to say that there are several desirable qualities we would like to see in an evaluation, including the following:

  • Extensibility. Can the evaluation be extended as the product is enhanced?

  • Granularity. Does the evaluation look at the product at the right level of detail?

  • Speed. Can the evaluation be done quickly enough to allow the product to compete in the marketplace?

  • Thoroughness. Does the evaluation look at all relevant aspects of the product?

  • Objectivity. Is the evaluation independent of the reviewer's opinions? That is, will two different reviewers give the same rating to a product?

  • Portability. Does the evaluation apply to the product no matter what platform the product runs on?

  • Consistency. Do similar products receive similar ratings? Would one product evaluated by different teams receive the same results?

  • Compatibility. Could a product be evaluated similarly under different criteria? That is, does one evaluation have aspects that are not examined in another?

  • Exportability. Could an evaluation under one scheme be accepted as meeting all or certain requirements of another scheme?

Using these characteristics, we can see that the applicability and extensibility of the TCSEC are somewhat limited. Compatibility is being addressed by combination of criteria, although the experience with the ITSEC has shown that simply combining the words of criteria documents does not necessarily produce a consistent understanding of them. Consistency has been an important issue, too. It was unacceptable for a vendor to receive different results after bringing the same product to two different evaluation facilities or to one facility at two different times. For this reason, the British criteria documents stressed consistency of evaluation results; this characteristic was carried through to the ITSEC and its companion evaluation methodology, the ITSEM. Even though speed, thoroughness, and objectivity are considered to be three essential qualities, in reality evaluations still take a long time relative to a commercial computer product delivery cycle of 6 to 18 months.

Criteria Development Activities

Evaluation criteria continue to be developed and refined. If you are interested in doing evaluations, in buying an evaluated product, or in submitting a product for evaluation, you should follow events closely in the evaluation community. You can use the evaluation goals listed above to help you decide whether an evaluation is appropriate and which kind of evaluation it should be.

It is instructive to look back at the evolution of evaluation criteria documents, too. Figure 5-27 shows the timeline for different criteria publications; remember that the writing preceded the publication by one or more years. The figure begins with Anderson's original Security Technology Planning Study [AND72], calling for methodical, independent evaluation. To see whether progress is being made, look at the dates when different criteria documents were published; earlier documents influenced the contents and philosophy of later ones.

Figure 5-27. Criteria Development Efforts.


The criteria development activities have made significant progress since 1983. The U.S. TCSEC was based on the state of best practice known around 1980. For this reason, it draws heavily from the structured programming paradigm that was popular throughout the 1970s. Its major difficulty was its prescriptive manner; it forced its model on all developments and all types of products. The TCSEC applied most naturally to monolithic, stand-alone, multiuser operating systems, not to the heterogeneous, distributed, networked environment based largely on individual intelligent workstations that followed in the next decade.

Experience with Evaluation Criteria

To date, the military has paid close attention to criteria efforts, but those efforts have not led to much commercial acceptance of trusted products. The computer security research community is heavily dominated by defense needs because much of the funding for security research is derived from defense departments. Ware [WAR95] points out the following about the initial TCSEC:

  • It was driven by the U.S. Department of Defense.

  • It focused on threat as perceived by the U.S. Department of Defense.

  • It was based on a U.S. Department of Defense concept of operations, including cleared personnel, strong respect for authority and management, and generally secure physical environments.

  • It had little relevance to networks, LANs, WANs, Internets, client-server distributed architectures, and other more recent modes of computing.

When the TCSEC was introduced, there was an implicit contract between the U.S. government and vendors, saying that if vendors built products and had them evaluated, the government would buy them. Anderson [AND82] warned how important it was for the government to keep its end of this bargain. The vendors did their part by building numerous products: KSOS, PSOS, Scomp, KVM, and Multics. But unfortunately, the products are now only of historical interest because the U.S. government did not follow through and create the market that would encourage those vendors to continue and other vendors to join. Had many evaluated products been on the market, support and usability would have been more adequately addressed, and the chance for commercial adoption would have been good. Without government support or perceived commercial need, almost no commercial acceptance of any of these products has occurred, even though they have been developed to some of the highest quality standards.

Schaefer [SCH04a] gives a thorough description of the development and use of the TCSEC. In his paper he explains how the higher evaluation classes became virtually unreachable for several reasons, and thus the world has been left with less trustworthy systems than before the start of the evaluation process. The TCSEC's almost exclusive focus on confidentiality would have permitted serious integrity failures (as obliquely described in [SCH89b]).

On the other hand, some major vendors are actively embracing low and moderate assurance evaluations: As of May 2006, there are 78 products at EAL2, 22 at EAL3, 36 at EAL4, 2 at EAL5, and 1 at EAL7. Product types include operating systems, firewalls, antivirus software, printers, and intrusion detection products. (The current list of completed evaluations (worldwide) is maintained at www.commoncriteriaportal.org.) Some vendors have announced corporate commitments to evaluation, noting that independent evaluation is a mark of quality that will always be a stronger selling point than so-called emphatic assertion (when a vendor makes loud claims about the strength of a product, with no independent evidence to substantiate those claims). Current efforts in criteria-writing support objectives, such as integrity and availability, as strongly as confidentiality. This approach can allow a vendor to identify a market niche and build a product for it, rather than building a product for a paper need (that is, the dictates of the evaluation criteria) not matched by purchases. Thus, there is reason for optimism regarding criteria and evaluations. But realism requires everyone to accept that the market, not a criteria document, will dictate what is desired and delivered. As Sidebar 5-7 describes, secure systems are sometimes seen as a marketing niche: not part of the mainstream product line, and that can only be bad for security.

It is generally believed that the market will eventually choose quality products. The evaluation principles described above were derived over time; empirical evidence shows us that they can produce high-quality, reliable products deserving our confidence. Thus, evaluation criteria and related efforts have not been in vain, especially as we see dramatic increases in security threats and the corresponding increased need for trusted products. However, it is often easier and cheaper for product proponents to speak loudly than to present clear evidence of trust. We caution you to look for solid support for the trust you seek, whether that support be in test and review results, evaluation ratings, or specialized assessment.

Sidebar 5-7: Security as an Add-On

In the 1980s, the U.S. State Department handled its diplomatic office functions with a network of Wang computers. Each American embassy had at least one Wang system, with specialized word processing software to create documents, modify them, store and retrieve them, and send them from one location to another. Supplementing Wang's office automation software was the State Department's own Foreign Affairs Information System (FAIS).

In the mid-1980s, the State Department commissioned a private contractor to add security to FAIS. Diplomatic and other correspondence was to be protected by a secure "envelope" surrounding sensitive materials. The added protection was intended to prevent unauthorized parties from "opening" an envelope and reading the contents.

To design and implement the security features, the contractor had to supplement features offered by Wang's operating system and utilities. The security design depended on the current Wang VS operating system design, including the use of unused words in operating system files. As designed and implemented, the new security features worked properly and met the State Department requirements. But the system was bound for failure because the evolutionary goals of VS were different from those of the State Department. That is, Wang could not guarantee that future modifications to VS would preserve the functions and structure required by the contractor's security software. Eventually, there were fatal clashes of intent and practice.




