What you'd really like is for the verification algorithm to prove that your program does what you meant it to do, but because the computer can't read your mind (yet), this isn't possible. Failing that, you'd like the verification algorithm to reject any programs that could do something illegal, like overflow the stack or apply the wrong type of instruction to a value, and accept all of those that won't. Unfortunately, this guarantee is also a little bit too strong to make. (See the sidebar "Decidability and Java Virtual Machine Verification.")
In the end, you have to settle for the somewhat weaker promise that it will reject any class that may do something illegal but it may also reject some safe classes. The verification algorithm asks these questions:
The first requirement guarantees that each instruction will find the appropriate operands on the stack. The second requirement guarantees that the stack cannot overflow or underflow by adding or subtracting elements in a loop. To test these conditions, the verification algorithm draws a picture of what types the stack and local variables will hold at the beginning of each instruction. In forthcoming sections, we shall go over some examples. 6.5.1 Example 1: Hello, WorldThe "Hello, world" program in Oolong looks like this: .method static public main([Ljava/lang/String;)V .limit stack 2 .limit locals 1 getstatic java/lang/System/out ; Push the output stream Ljava/io/PrintStream; ldc "Hello, world" ; Set the argument invokevirtual java/io/PrintStream/println ; Call the function (Ljava/lang/String;)V return ; Return .end method The verification algorithm can prove that this program is safe by drawing pictures to show what the stack must look like before each instruction: The diagram shows what the stack looks like before each instruction. You always start at the beginning of the method with an empty stack. The getstatic instruction doesn't take anything off the stack, and it pushes a single value onto the stack. The type of that value is given by the last argument to the getstatic instruction. In this case, it's a PrintStream, so the resulting stack picture has just a PrintStream on it. The new stack picture is the input to the ldc instruction. The ldc instruction doesn't take anything off the stack, either. It pushes one item, the type of which is determined by the type of the argument. In this case, the argument is a String. Now the stack has two elements on it, which is the limit for this method. The next instruction is invokevirtual. It is expecting to find two operands on the stack: a PrintStream and a String, as described by the method descriptor. Since the return type is V, nothing is left on the stack. The stack is now empty. Notice that the verification algorithm doesn't have to actually trace into the method to determine its effect on the stack picture. The verification algorithm can determine the method invocation's effect on the stack just by looking at the method descriptor. When the system loads a class, it may delay loading classes that appear in field or method descriptors. The verification algorithm makes the temporary assumption that the other classes actually have all the fields and methods referenced in the code and that they have the appropriate descriptors. Later, the system may be forced to load the other classes. This happens when an instance is created or when it is used as the superclass of some other class that is loaded or when a static field or method is used. When these classes are actually loaded, they are checked to make sure that they actually have all the methods and fields required by other classes already loaded. If not, the system throws an exception, which prevents that class from being used. This delayed loading improves performance. If loading had to be done as soon as a class was loaded, the system would appear to hang until the classes were loaded. By delaying class loading, the system can spread out the load, which improves overall response times. The last instruction in this example is a return. This instruction doesn't do anything to the stack; it just returns from the method. It is important to check that the method uses the right kind of return for the descriptor of the method. Since main returns void and return is the right kind of return for void, everything is good. The stack does not have to be empty after returning from a method, but many programs do leave an empty stack. That's because JVM code is often arranged so that each unit of code expects an empty stack and leaves an empty stack, making it easy to combine units without risking violating stack constraints. Most Java compilers work this way. For more details, see chapter 10. 6.5.2 Example 2: gotos and ifsExample 1 omitted some important details, like how to handle gotos, ifs, and local variables. We explain some of them with a slightly more complicated example: a loop that adds up an array of numbers. .method public static addit ([I)V .limit stack 2 .limit locals 3 ; Variables: ; 0 is the array we're adding up (the parameter to the method) ; 1 is the running total ; 2 is a loop counter iconst_0 ; Initialize the total to 0 istore_1 iconst_0 ; Initialize the loop counter to 0 istore_2 loop: aload_0 ; Compute the length of the array arraylength iload_2 ; Test if the loop counter is greater than ; the length of the array if_icmpge end ; If it is, go to the end of the method ; Otherwise, do the body of the loop body: aload_0 ; Push the array iload_2 ; Push the counter iaload ; Take the counter'th element of the array iload_1 ; Push the total iadd ; Add that number to the running total istore_1 ; Store the total back into variable 1 iinc 2 1 ; Increment the counter goto loop ; Repeat the loop end: return This time there are local variables as well as a stack. A local variable picture looks a lot like the stack picture. The only difference is that instead of growing and shrinking like the stack, all the variables are allocated beforehand, though they may be uninitialized. Uninitialized variables are represented by an empty box. As before, we start at the top of the method with an empty stack. There are three variables. The first is automatically initialized to the first parameter; the others are uninitialized. The first seven lines are straightforward: elements are loaded onto the stack, then stored into local variables. The code, annotated with stack and local variable pictures, looks like this: The if_icmpge instruction pops two elements off the stack. After executing it, the stack is empty. Control will go to either body (if the test fails) or end (if it succeeds). You have to annotate both instructions with the new stack picture: You must check both paths, but it doesn't matter which one you choose to do first. In this example, we'll check the path through body first. We proceed as before until we hit the goto instruction. The picture now looks like this: At this point, the goto before end says that control returns to loop. Since we've already been to loop, we need to make sure that the stack and local variable pictures currently at loop are identical with the picture after executing the goto. They are; the stack is empty both times, and the local variable pictures are identical. Any flow of control that made it to the goto could continue at loop and be sure that the stack and variables will always contain the proper types. Since we have already checked the instruction at loop, we don't need to check any further down that flow of control. Actually, the stack pictures do not have to be identical. However, they must be compatible. See section 6.5.4 for an example. We still need to go back to the other instruction we marked, at end. This instruction is a return. Since it's compatible with the return type of this method (void), there are no errors. The verification algorithm affirms that this is a valid method. 6.5.3 Example 3: Code That Doesn't VerifyPerhaps a better way to see what's happening is to look at a method that the verification algorithm rejects. Here's an example, a fragment of code that is intended to produce a stack containing the numbers 0 through 4. ; Loop 5 times. Each time, push local var 0 onto the stack iconst_5 ; Initialize variable 0 to 5 istore 0 loop: iinc 0 -1 ; Decrement variable 0 iload 0 ; Push it dup ; Make a copy, leaving the number on the stack ifeq break ; When we reach 0, break the loop goto loop ; Otherwise, loop again break: ; More instructions Here is an analysis of the code up to the goto. We have annotated both the instruction at break and the goto with the picture that results after executing the ifeq: a single int on the stack. The ifeq instruction branches to break if the top of the stack is 0; otherwise, it continues at the goto. The goto is supposed to go back to the instruction at loop, but there's a problem. The last time we looked at the instruction at loop, the stack was empty. This time, there's an int on the stack. The verification algorithm will reject this method, because it can't be sure of a constant stack height. If it permitted this method to execute, the stack would grow by 1 each time. Eventually, the program might overflow the stack. Because you don't want that to happen, the verification algorithm rejects this code. 6.5.4 Example 4: Dealing with SubclassesOne more complication: it isn't necessary for two stack pictures to be identical when two different flows of control come to the same place. Here's a (somewhat contrived) example. The example depends on three classes: abstract class Person { abstract void printName(); } class Programmer { void printName() {/* Implementation goes here */ } } class Author { void printName() {/* Implementation goes here */ } } The code we wish to verify is .method public static print(ZLProgrammer;LAuthor;)V iload_0 ; Is the boolean false? ifeq false ; If not, true: aload_1 ; then push the programmer goto print false: aload_2 ; Otherwise, push the author print: invokevirtual Person/printName ()V ; Call printName on the Person ; This works whether it's an ; Author or a Programmer, ; since each is a Person done: return .end method This method takes three arguments: a boolean control, a Programmer, and an Author. If the control is true then it prints the name of the Programmer. Otherwise, it prints the name of the Author. The program arrives at print with either the Programmer or the Author on the stack. It is impossible to predict which will be there, but either way the value is certainly a Person. After executing the iload_0 on line 1, the program proceeds to either true or false, depending on the value. The two flows of control come together at done. In the next diagram, we have traced through the entire flow of control in the case that the ifeq succeeds, and control goes to false. This means that the program arrives at print with an Author on top of the stack. The invokevirtual at print is legal, because Author is a subclass of Person. This leaves an empty stack picture at done, which is fine for a return. Since the method returns void, the return instruction is valid, and we're done checking this flow of control. Returning to the mark at true, we begin with an empty stack. The aload_1 pushes the Programmer onto the stack. The picture now looks like Figure 6.3. (The local variable picture is shown only once, because it does not change in this code.) Figure 6.3. A possible conflict at printWhen we get to the mark in Figure 6.3, there is a slight hitch. We must go to print, but we've already visited print and the stack pictures are not identical: one has a Programmer and the other an Author. The print problem is resolved by unifying the two stack pictures. You can only unify two stack pictures that are the same height; stacks of differing height violate the rules, and cause verification failure. In Figure 6.3 at print, the stack height is 1 in both stack pictures. We unify the two stack pictures by taking each corresponding element and building a new picture that uses the most derived type that is super to both of them. We do that by listing the superclasses of each type in order and taking the lowest item that is the same on each list. This is what it looks like for Programmer and Author: The most specific type common to both Programmer and Author is Person. Eventually, every class derives from Object, so you can always unify two reference types. Now we have to redraw the stack picture, substituting Person for Author at print: Figure 6.4. After unifying the stack pictures at printBecause the picture at print has changed, we need to leave a mark there and continue checking. The invokevirtual instruction here is still legal, because the stack contains a Person. The new stack picture is empty. Control proceeds to done with an empty stack picture. Because we've already been to done, we need to unify the new stack picture with the old one. Since both pictures are empty, unification is trivial, and the unified picture is identical to the old one. This means that we don't need to check any further. Since there are no more marks and we haven't found any problems, the verification algorithm decides that this method is okay. Everything said here about the stack picture also applies to the local variable picture. Local variable pictures are unified the same way: find the least common supertype of the classes and recheck if anything is different. If a variable slot is uninitialized in either local variable picture, then it will be uninitialized in the unified picture. Note that you can't unify two non-reference types. You can't unify a float and an int or a float and a double. This code is illegal: .method public static print(ZIF)V iload_0 ; Is the boolean false? ifeq false ; If not, iload_1 ; then push the int goto done false: fload_2 ; Otherwise, push the float done: ; ILLEGAL! Can't unify the stack pictures here return .end method This example is identical to the last example, but it uses int and float in place of Programmer and Author. At done we would have to unify the stack containing a float with the stack containing an int. Because int and float are numeric types, not reference types, there is no unification between the two. The verification algorithm will reject this method.
6.5.5 AlgorithmThis algorithm summarizes verification:
While executing this algorithm, ask the following questions:
If the answer to any of these questions is "no," then the method is invalid. This algorithm is incomplete. For example, it does not describe what to do with ret and jsr instructions, or how to deal with exception handlers. For the complete algorithm, see The Java Virtual Machine Specification, section 4.9. |