Once it has been located and loaded by a ClassLoader other than the primordial class loader, a class still has another hurdle to cross before being available for execution within the JVM. At this point, we can be reasonably sure that the class file in question cannot supplant any of the core classes, cannot inveigle its way into the trusted packages, and cannot interfere with other safe classes already loaded. We cannot, however, be sure that the class itself is safe. The class might contain illegal bytecode, forge pointers to protected memory, overflow or underflow the program stack, or in some other way attempt to corrupt the integrity of the JVM and runtime.
A number of factors could cause a class to be unsafe.
Malicious compilers could cause a class to be unsafe. A well-behaved Java compiler produces well-behaved Java classes. There would not be any harm in running these classes within the JVM, as the Java language itself and the compiler define a high degree of safety. Unfortunately, there is no guarantee that everyone is using a well-behaved Java compiler. Hackers may be using corrupted compilers to produce bytecode designed to crash the JVM or, worse, subvert the security thereof. In fact, the source language may not have been Java in the first place; programs written in Common Business Oriented Language (COBOL) or Net Restructured Extended Executor (NetREXX) can be compiled to Java bytecode.
Class editors, decompilers, and disassemblers too may cause a class to be unsafe. The Java language is like any other high-level programming language, as it is created as source code in an English-like form. Before it can be executed, the source code has to be translated into a more efficient, machine-readable format. In general, to perform this conversion, the code is either compiled (converted once and stored as machine code) or interpreted (converted and executed at runtime). The Java language combines these two approaches, as shown in Figure 7.10.
Figure 7.10. Compilation and Interpretation of Java Programs
Before it can be used, the source code has to be compiled with a Java compiler, such as javac. This is a conventional compilation. However, the output that a Java compiler produces is not machine-specific code but instead is bytecode, a system-independent format. In order to execute, the bytecode has to be processed by an interpreter, such as java, which is part of the JVM. The bytecode is machine code written for the JVM instruction set. The JVM processes bytecode while the program is running and converts it to real machine code that it executes on the fly. The fact that Java programs are compiled to bytecode instead of to machine code makes them portable across platforms. However, bytecode is a much higher-level language than native machine code. As such, it lends itself to easy attacks. For example, Java bytecode can be easily edited by using a hexadecimal class editor. Listings 7.2 and 7.3 show the source code and the corresponding bytecode, respectively, of a simple HelloWorld program. As you can see, the Java bytecode contains several pieces of information in cleartext, and anyone with an average understanding of the structure of a class file could easily compromise the behavior of the class by editing the bytecode.
Listing 7.2. HelloWorld.java
    class HelloWorld {
        public static void main(String args[]) {
            System.out.println("Hello World");
        }
    }
Listing 7.3. Hello World Java Bytecode
      0: CA FE BA BE 00 00 00 2E 00 1D 0A 00 06 00 0F 09  Êþº¾............
     10: 00 10 00 11 08 00 12 0A 00 13 00 14 07 00 15 07  ................
     20: 00 16 01 00 06 3C 69 6E 69 74 3E 01 00 03 28 29  .....<init>...()
     30: 56 01 00 04 43 6F 64 65 01 00 0F 4C 69 6E 65 4E  V...Code...LineN
     40: 75 6D 62 65 72 54 61 62 6C 65 01 00 04 6D 61 69  umberTable...mai
     50: 6E 01 00 16 28 5B 4C 6A 61 76 61 2F 6C 61 6E 67  n...([Ljava/lang
     60: 2F 53 74 72 69 6E 67 3B 29 56 01 00 0A 53 6F 75  /String;)V...Sou
     70: 72 63 65 46 69 6C 65 01 00 0F 48 65 6C 6C 6F 57  rceFile...HelloW
     80: 6F 72 6C 64 2E 6A 61 76 61 0C 00 07 00 08 07 00  orld.java.......
     90: 17 0C 00 18 00 19 01 00 0B 48 65 6C 6C 6F 20 57  .........Hello W
     A0: 6F 72 6C 64 07 00 1A 0C 00 1B 00 1C 01 00 0A 48  orld...........H
     B0: 65 6C 6C 6F 57 6F 72 6C 64 01 00 10 6A 61 76 61  elloWorld...java
     C0: 2F 6C 61 6E 67 2F 4F 62 6A 65 63 74 01 00 10 6A  /lang/Object...j
     D0: 61 76 61 2F 6C 61 6E 67 2F 53 79 73 74 65 6D 01  ava/lang/System.
     E0: 00 03 6F 75 74 01 00 15 4C 6A 61 76 61 2F 69 6F  ..out...Ljava/io
     F0: 2F 50 72 69 6E 74 53 74 72 65 61 6D 3B 01 00 13  /PrintStream;...
    100: 6A 61 76 61 2F 69 6F 2F 50 72 69 6E 74 53 74 72  java/io/PrintStr
    110: 65 61 6D 01 00 07 70 72 69 6E 74 6C 6E 01 00 15  eam...println...
    120: 28 4C 6A 61 76 61 2F 6C 61 6E 67 2F 53 74 72 69  (Ljava/lang/Stri
    130: 6E 67 3B 29 56 00 20 00 05 00 06 00 00 00 00 00  ng;)V. .........
    140: 02 00 00 00 07 00 08 00 01 00 09 00 00 00 1D 00  ................
    150: 01 00 01 00 00 00 05 2A B7 00 01 B1 00 00 00 01  .......*·..±....
    160: 00 0A 00 00 00 06 00 01 00 00 00 01 00 09 00 0B  ................
    170: 00 0C 00 01 00 09 00 00 00 25 00 02 00 01 00 00  .........%......
    180: 00 09 B2 00 02 12 03 B6 00 04 B1 00 00 00 01 00  ..²....¶..±.....
    190: 0A 00 00 00 0A 00 02 00 00 00 05 00 08 00 06 00  ................
    1A0: 01 00 0D 00 00 00 02 00 0E                       .........
In addition to class editors, several decompilers and disassemblers can operate on Java bytecode. A decompiler can usually recreate the source code, except for the original comments; the decompiled code can then be modified and compiled back into malicious Java bytecode.
Although a regular compiler would refuse to compile maliciously modified Java source code back into bytecode, a hacker can use a corrupted compiler to generate harmful bytecode. A disassembler generates pseudo-assembly code, which can be maliciously modified and reassembled back into corrupted Java bytecode. In this case, there is not even the issue of modifying the compiler to force it to produce bytecode from illegal source code. Besides the security attacks just described, class editors, decompilers, and disassemblers can be used to perpetrate privacy and intellectual property attacks.
A break in release-to-release binary compatibility too can cause a class to be unsafe. When a new version of an API is released, programs that relied on that API may fail if some of the API members changed.
These conditions imply that the binary-code compatibility between the classes has been broken between releases. These problems exist with all forms of binary-distributable libraries. On most systems, this results in at best a system message and the application's refusing to run. At worst, the entire operating system could crash. The JVM has to perform at least as well as other systems in these circumstances and preferably better.
For all these reasons, an extra stage of checking is required before executing Java code, and this is where the class file verifier comes in. After loading an untrusted class via a ClassLoader instance, the class file is handed over to the class file verifier, which attempts to ensure that the class is fit to be run, as shown in Figure 7.1 on page 204. The class file verifier is itself a part of the JVM and as such cannot be removed or overridden without replacing the JVM.
7.3.1 The Duties of the Class File Verifier
After seeing what can make a Java class unsafe and before discussing what the class file verifier does, we want to look at the possible ways in which a class file might be unsafe. By understanding a threat, we can better understand how the Java architecture guards against it. Following are some of the things that a class file could do to compromise the integrity of the JVM:
By tagging each object with its type, the JVM could check for illegal casts. By checking the size of the stack before and after each method call, stack overflows and underflows can be caught. The JVM could also test the stack before each bytecode is executed and thus avoid illegal parameters. In fact, all these tests could be made at runtime, but the performance impact would be significant. Any work that the class file verifier can do in advance of runtime to reduce the performance burden will be done. With some idea of the magnitude of the task before the class file verifier, we now look at how it meets this challenge.
7.3.2 The Four Passes of the Class File Verifier
Before we go into any detail on how the class file verifier works, it is important to note that the Java specification requires the JVM to behave in a particular way when it encounters certain problems with class files, which is usually to throw an error and refuse to use the class. The precise implementation varies from one vendor to the next and is not specified. Thus, some vendors may make all checks prior to making a class file available; other vendors may defer some or all checks until runtime. The following process description is how Sun Microsystems' JVM works; this process has been adopted by most JVM writers, not least because it saves the effort of reinventing a complex process.
The class file verifier makes four passes over the newly loaded class file, each pass examining it in closer detail. Should any of the passes find fault with the code, the class file is rejected. Not all these tests are performed prior to executing the code. The first three passes are performed prior to execution; only if the code passes the tests here will it be made available for use. The fourth pass, really a series of ad hoc tests, is performed at execution time, once the code has already started to run.
7.3.2.1 File-Integrity Check
The first and simplest pass checks the structure of the class file.
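As a rough illustration of the simplest structural test in this pass, the following sketch checks whether a byte array begins with the class file signature 0xCAFEBABE. It is only a sketch: the class name MagicNumberCheck and the method hasClassFileMagic() are hypothetical names chosen for this example, and a real verifier goes on to validate every structure in the file.

```java
import java.nio.ByteBuffer;

final class MagicNumberCheck {
    // The four-byte signature that opens every class file.
    static final int MAGIC = 0xCAFEBABE;

    /** Hypothetical helper: true if the bytes start with the class file signature. */
    static boolean hasClassFileMagic(byte[] bytes) {
        // A file shorter than four bytes cannot carry the signature at all.
        if (bytes.length < 4) {
            return false;
        }
        // ByteBuffer reads big-endian by default, matching the class file format.
        return ByteBuffer.wrap(bytes).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] start = { (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 0x2E };
        System.out.println(hasClassFileMagic(start));
    }
}
```

The first eight bytes used in main() are the ones that open the HelloWorld class file dump shown earlier.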
This pass ensures that the file has the appropriate signature (the first four bytes must correspond to the hexadecimal magic number 0xCAFEBABE, as shown in the example of Listing 7.3 on page 225) and that each structure within the file is of the appropriate length. This pass checks that the class file itself is neither too long nor too short and that the constant pool contains only valid entries. Of course, class files may have varying lengths, but each of the structures, such as the constant pool, has its length included as part of the file specification. If a file is too long or too short, the class file verifier throws an error and refuses to make the class available for use.
7.3.2.2 Class-Integrity Check
The second pass performs all other checking that is possible without examining the bytecode instructions themselves. Specifically, it ensures that
Note that in this pass, no check is made as to whether fields, methods, or classes actually exist, merely that their names and signatures are legal according to the language specification.
7.3.2.3 Bytecode-Integrity Check
In this pass, the most complex pass of the class file verifier, the bytecode verifier runs. The individual bytecodes are examined to determine how the code will behave at runtime. This examination includes data-flow analysis, stack checking, and static type checking for method arguments and bytecode operands. The bytecode verifier is responsible for checking that the bytecodes have the correct number and type of operands, that data types are not accessed illegally, that the stack is not overflowed or underflowed, and that methods are called with the appropriate parameter types. Section 7.3.3 on page 231 gives the precise details of how the bytecode verifier operates. For now, it is important to state two points.
To summarize: Any class file is in one of three categories.
Clearly, the bytecode verifier should accept those class files in category 1 and reject those in category 2. The problem arises with category 3 class files, which may or may not contain code that will cause a problem at runtime, but it is impossible from static analysis of the code to determine which. The more complex the bytecode verifier becomes, the more it can reduce the number of cases that fall in category 3, but no matter how complex the verifier, it can never completely eliminate category 3. For this reason, there will always be bytecode programs that pass verification but that may contain illegal code. This means that simply having the bytecode verifier is not enough to prevent runtime errors in the JVM and that the JVM must perform some runtime checking of the executable code. Lest you begin panicking at this stage, you should comfort yourself with the thought that the level of verification performed by the JVM prior to executing bytecode is significantly higher than that performed by traditional runtime environments for native code: that is, none at all.
7.3.2.4 Runtime Integrity Check
As we have hinted, the JVM must make a trade-off between security and efficiency. For that reason, the bytecode verifier does not exhaustively check for the existence of fields, methods, and classes when it performs bytecode-integrity checks. If it did, the JVM would need to load all classes required by a program prior to running it, resulting in a very heavy overhead, which may not be strictly required. We will examine the following case, which has three classes: ClassA, ClassB, and ClassC. ClassC is a subclass of ClassA. ClassB has two public methods:
The architecture of this simple scenario is shown in Figure 7.11.
Figure 7.11. Runtime Integrity Check Scenario
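A minimal sketch of the three classes in this scenario might look as follows. The class and method names come from the discussion in this section; the method bodies are assumptions made only so that the example compiles and runs.

```java
// ClassA is the base class of the scenario.
class ClassA {
}

// ClassC is a subclass of ClassA.
class ClassC extends ClassA {
}

// ClassB offers two public methods, one per return type.
class ClassB {
    public ClassA methodReturningClassA() {
        return new ClassA(); // assumed body
    }

    public ClassC methodReturningClassC() {
        return new ClassC(); // assumed body
    }
}
```

Because ClassC extends ClassA, a value returned by either method can be assigned to a variable of type ClassA.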
Against this background, consider the following code snippet:
    ClassB b = new ClassB();
    ClassA a = b.methodReturningClassA();
When it did bytecode-integrity checks, the class file verifier ascertained that methodReturningClassA() is listed in the constant pool as a method of ClassB and that it is reachable from this code because it is public. The class file verifier also checked that the return type of methodReturningClassA() is ClassA. Having made this check and assuming that the classes and methods involved do exist, the assignment statement in the second line of code is perfectly legal. The bytecode verifier does not in fact need to load and check ClassA at this point. Now consider this similar code:
    ClassB b = new ClassB();
    ClassA a = b.methodReturningClassC();
In this case, the declared return type of the method is not ClassA, but the assignment is still legal, as the method returns a subclass of ClassA. This is not, however, obvious from the code alone: The verifier would need to load the class file for the return type ClassC and check that this is indeed a subclass of ClassA. Loading this class involves a possible network access and running the class file verifier for the class, and it may well be that these lines of code are never executed in the normal course of the program's execution, in which case loading and checking the subclass would be a waste of time. For this reason, class files are loaded only when they are required, that is, when a method call is executed or a field in an object of that class is modified. This is determined at runtime, which is when the fourth pass of the verifier is executed.
7.3.3 The Bytecode Verifier in Detail
The first stage of the bytecode verifier process is identifying bytecode instructions and their arguments. This operation is completed in two passes. The first pass locates the start of each instruction and stores it in a table.
Having found the start of each instruction, the verifier makes a second pass, parsing the instructions. For each instruction, this involves building a structure storing the instruction itself and its arguments. These arguments are checked for validity at this point.
Having established that the bytecodes are syntactically correct, the bytecode verifier now has the task of analyzing the runtime behavior of the code, within the limitations examined on page 233. To perform this analysis, the bytecode verifier has to keep track of two pieces of information for each instruction:
Where types are concerned in the preceding two points, the analyzer does not need to distinguish between the various normal integer types, as they all have the same internal representation. The first stage is the initialization of the data-flow analyzer.
Finally, the data-flow analyzer runs, looping through the following steps.
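The kind of bookkeeping involved can be illustrated with a deliberately tiny sketch. It is not the verifier's actual algorithm: it handles only straight-line code, only the eight integer instructions that appear in the add() method disassembled in Section 7.3.4, and it tracks just the stack depth and which local variables have been initialized. The class name TinyFlowCheck and its check() method are hypothetical.

```java
final class TinyFlowCheck {
    // Opcode values from the JVM instruction set.
    static final int ICONST_3 = 0x06, ICONST_4 = 0x07;
    static final int ILOAD_0 = 0x1A, ILOAD_1 = 0x1B;
    static final int ISTORE_0 = 0x3B, ISTORE_1 = 0x3C;
    static final int IADD = 0x60, IRETURN = 0xAC;

    /** Returns null if the code passes, otherwise a description of the first problem. */
    static String check(int[] code, int maxStack, int maxLocals) {
        int depth = 0;                                   // every value here is an int, so depth suffices
        boolean[] initialized = new boolean[maxLocals];  // which locals hold a value so far
        for (int pc = 0; pc < code.length; pc++) {
            int op = code[pc];
            switch (op) {
                case ICONST_3: case ICONST_4:
                    if (depth == maxStack) return "stack overflow at " + pc;
                    depth++;
                    break;
                case ISTORE_0: case ISTORE_1: {
                    int local = op - ISTORE_0;
                    if (local >= maxLocals) return "illegal local " + local + " at " + pc;
                    if (depth == 0) return "stack underflow at " + pc;
                    depth--;
                    initialized[local] = true;
                    break;
                }
                case ILOAD_0: case ILOAD_1: {
                    int local = op - ILOAD_0;
                    if (local >= maxLocals) return "illegal local " + local + " at " + pc;
                    if (!initialized[local]) return "uninitialized local " + local + " read at " + pc;
                    if (depth == maxStack) return "stack overflow at " + pc;
                    depth++;
                    break;
                }
                case IADD:
                    if (depth < 2) return "stack underflow at " + pc;
                    depth--;                             // pops two ints, pushes one
                    break;
                case IRETURN:
                    return depth < 1 ? "stack underflow at " + pc : null;
                default:
                    return "unknown opcode at " + pc;
            }
        }
        return "code falls off the end of the method";
    }

    public static void main(String[] args) {
        int[] add = { ICONST_3, ISTORE_0, ICONST_4, ISTORE_1, ILOAD_0, ILOAD_1, IADD, IRETURN };
        System.out.println(check(add, 2, 2));
    }
}
```

A real data-flow analyzer must additionally follow branches, merge the states that reach the same instruction along different paths, and track the type of each stack slot and local variable rather than a simple depth and a flag.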
If the data-flow analyzer runs on the method without reporting any failures, the method has been successfully verified by the class file verifier during the bytecode integrity checks (see Section 7.3.2.3 on page 229).
The bytecode verifier is a key component of Java security but can be improved by reducing its area of uncertainty. Can you eliminate uncertainty completely? Can you build a complete bytecode verifier that determines whether a program is safe before it runs? The answer is no. It is mathematically impossible. To demonstrate this, we focus on one aspect of bytecode verification: stack-underflow checking, which involves determining whether a bytecode program will underflow the stack, by removing more items from it than were ever placed on it. Then, we use the argument known as reductio ad absurdum. We assume that there is a complete stack-underflow checker and show that this assumption leads to a contradiction. This means that the assumption must have been false: a complete stack-underflow checker is impossible. Because a complete bytecode verifier must contain a complete stack-underflow checker, a complete bytecode verifier is impossible too.
Suppose, then, that there is such a thing as a complete stack-underflow checker. We write a method in standard Java bytecode, which takes as its argument the name of a class file and returns
We call this method doesNotUnderflow(). Figure 7.12 offers a graphical representation of the method's functionality.
Figure 7.12. Functionality of Method doesNotUnderflow()
We now consider the bytecode program Snarl, whose main() method's Java source code contains the lines in Listing 7.4.
Listing 7.4. Portion of main() Method in Java Program Snarl
    if (doesNotUnderflow(classFileName))
        while (true) pop();
    else {
    }
The pop() method, which removes the top element from the stack, may not be pure Java code but can certainly be written in bytecode. The bytecode program Snarl is compiled into the class file Snarl.class. What happens if we give Snarl itself as a parameter? The first thing it does is to invoke the method doesNotUnderflow() on Snarl.class.
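The two possible outcomes can be modeled with a small sketch. This is only a model of the argument, not a real checker: the class SnarlParadox and its methods are hypothetical, and the boolean parameter stands for whatever verdict doesNotUnderflow() returns for Snarl.class.

```java
final class SnarlParadox {
    /** What Snarl actually does, given the checker's verdict on Snarl.class. */
    static boolean snarlUnderflows(boolean verdictDoesNotUnderflow) {
        // Verdict true:  Snarl enters "while (true) pop();" and eventually underflows the stack.
        // Verdict false: Snarl takes the empty else branch and never touches the stack.
        return verdictDoesNotUnderflow;
    }

    /** True if the verdict agrees with Snarl's actual behavior. */
    static boolean verdictIsCorrect(boolean verdictDoesNotUnderflow) {
        boolean actuallyDoesNotUnderflow = !snarlUnderflows(verdictDoesNotUnderflow);
        return verdictDoesNotUnderflow == actuallyDoesNotUnderflow;
    }

    public static void main(String[] args) {
        // Whichever answer the checker gives, Snarl does the opposite.
        System.out.println(verdictIsCorrect(true));
        System.out.println(verdictIsCorrect(false));
    }
}
```

Neither verdict can be correct, because Snarl is constructed to behave in the opposite way to whatever the checker predicts.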
This contradiction means that there could never have been a method doesNotUnderflow() that worked for all class files. The quest for a way of determining statically that a class would behave itself at runtime was doomed. Complete checking for stack underflow must be done at runtime if it is to be done at all. This result can be generalized and applied to any aspect of bytecode verification for which you try to determine statically something that happens at runtime. So all bytecode verifiers are incomplete. This does not, of course, mean that they are not useful (they contribute significantly to Java security) or that they cannot be improved. It does mean, however, that some checking has to be left until runtime.
7.3.4 An Example of Class File Verification
As an example to show the effects of class file verification and to see when classes are subjected to verification, let us consider Listing 7.5, Java method add(), which adds two integers initialized to the values 3 and 4 and returns the answer, 7.
Listing 7.5. add() Method Code
    static int add() {
        int a, b;
        a = 3;
        b = 4;
        return (a + b);
    }
The add() method can be embedded in any Java program, such as an application, a servlet, or an enterprise bean, without any security restrictions. The same program could invoke add() through a line of code such as the following:
    System.out.println("3 + 4 = " + add());
This is the output that would be produced:
    3 + 4 = 7
We want to use the add() method to determine when and how verification of classes occurs. To do this, we would like to modify the initialization of variable b in method add(), using a hexadecimal editor to reinitialize variable a, so that variable b is never initialized. A regular compiler would refuse to compile a program in which the values of two variables are manipulated, if one of them has not been properly initialized. Therefore, after compiling the program embedding the add() method, we need to manually alter the bytecode.
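For reference, a minimal complete program embedding add() might look like the following sketch. The class name TestVerify is the one that appears in the disassembly caption of this section; the main() method is an assumption, built from the invocation line quoted above.

```java
class TestVerify {
    // The add() method of Listing 7.5, unchanged.
    static int add() {
        int a, b;
        a = 3;
        b = 4;
        return (a + b);
    }

    // Assumed entry point that invokes add() as described in the text.
    public static void main(String[] args) {
        System.out.println("3 + 4 = " + add());
    }
}
```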
To understand what should be changed, let us consider Listing 7.6, the disassembled code of method add() produced by the javap command line utility.
Listing 7.6. Disassembled Code of Method add() in Class TestVerify
    Method int add()
       0 iconst_3
       1 istore_0
       2 iconst_4
       3 istore_1
       4 iload_0
       5 iload_1
       6 iadd
       7 ireturn
Instruction 3 in Listing 7.6 shows an istore_1 instruction. This is the initialization of variable b and has the bytecode 3C. Figure 7.13 shows the bytecode of the add() method after we changed bytecode 3C via a hexadecimal editor to 3B, the bytecode for istore_0, which is the same as instruction 1 in Listing 7.6 and reinitializes variable a, thereby eliminating the initialization of variable b.
Figure 7.13. TestVerify Class with istore_1 Instruction Changed to istore_0
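The same one-byte change can be expressed programmatically. The sketch below is an illustration under stated assumptions rather than the procedure used in the text, which relies on a hexadecimal editor: it searches a byte array for the eight opcode bytes of add() and rewrites the fourth one from 3C to 3B. The class IstorePatch and its patch() method are hypothetical names.

```java
import java.util.Arrays;

final class IstorePatch {
    // Opcode bytes of add(): iconst_3 istore_0 iconst_4 istore_1 iload_0 iload_1 iadd ireturn.
    static final byte[] ADD_BODY = {
        0x06, 0x3B, 0x07, 0x3C, 0x1A, 0x1B, 0x60, (byte) 0xAC
    };

    /** Returns a copy of the input in which the istore_1 of add() has become istore_0. */
    static byte[] patch(byte[] classBytes) {
        for (int i = 0; i + ADD_BODY.length <= classBytes.length; i++) {
            byte[] window = Arrays.copyOfRange(classBytes, i, i + ADD_BODY.length);
            if (Arrays.equals(window, ADD_BODY)) {
                byte[] out = classBytes.clone();
                out[i + 3] = 0x3B;   // istore_1 (3C) becomes istore_0 (3B)
                return out;
            }
        }
        throw new IllegalArgumentException("add() body not found");
    }

    public static void main(String[] args) {
        // Stand-in data: two padding bytes around the method body instead of a real class file.
        byte[] sample = { 0x00, 0x06, 0x3B, 0x07, 0x3C, 0x1A, 0x1B, 0x60, (byte) 0xAC, 0x00 };
        System.out.println(String.format("%02X", patch(sample)[4]));
    }
}
```

To try this on a real class file, the bytes could be read and written back with java.nio.file.Files.readAllBytes() and Files.write(); the sketch operates on an in-memory array so that it stays self-contained.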
The change shown in Figure 7.13 prevents variable b from being initialized. Because method add() operates on the value of variable b, the fact that b has not been initialized implies an illegal memory access, which the class file verifier is responsible for detecting.
In a Java 2 system, the only classes that are exempted from verification are those loaded by the primordial class loader, which is responsible for loading classes from the boot class path. Therefore, when the corrupted version of the bytecode is loaded from the application or extension class path or from a remote network location, it always fails verification and does not run. In fact, an attempt to run it produces a VerifyError with the message:
    Accessing value from uninitialized register
In early Java versions, the JVM considered application code fully trusted and exempted it from class file verification. As a local application, a user class can be found only by searching the CLASSPATH system environment variable. As the current directory is always prepended to CLASSPATH, program classes in pre-Java 2 versions always ran as trusted and were exempted from class file verification. Running the modified version of the add() method on a JDK V1.1.6 platform produced a result similar to the following:
    3 + 4 = 26246588
A result like this could still be produced on a Java 2 platform. In the J2SE reference implementation, to prevent a class file from being verified by the class file verifier, the java command can be run with the -noverify option. Alternatively, the JVM can be forced to load the class files of an application from the boot class path via the primordial class loader, which does not pass loaded classes to the class file verifier. Section 7.2.2.1 on page 209 explains how to modify the boot class path. Note that the results produced by the corrupted version of the add() method when it is not verified may be different each time.
The reason is that the memory for integer b is never initialized, and the addition simply adds the value of a, which is 4 after the second istore_0, to whatever value happens to be left from some previous use of that memory location.