6.7 Java Language and Verification Algorithm | Programming for the Javaв„ў Virtual Machine

The most common programming language for writing Java virtual machine programs is Java. One thing the language guarantees is that anything you can say in Java can be translated into virtual machine code that will pass the verification algorithm.

The verification rules state that the arithmetic instructions (iadd, ddiv, etc.) always take two values of the same type (two ints, two doubles, etc.). Say you have the following Java program fragment:

 int i = 70;              // Call this variable 1 float j = 111.1;         // Call this variable 2 double k = i+j;          // Call this variable 3 (and 4)

A naïve (and invalid) translation into bytecodes would be

 bipush 70        ; Initialize i (variable 1) to 70 istore_1 ldc 111.1        ; Initialize j (variable 2) to 111.1 fstore_1 iload_1          ; Push the integer 70 fload_2          ; Push the float 111.1 fadd             ; Add two floats (This is an error!) dstore_3         ; Store it in variables 3 and 4 (Another error!)

The verification algorithm rejects this, because it attempts to apply a fadd instruction when the top of the stack contains an int and a float instead of two floats. The attempt to store a float value with a dstore instruction is also invalid.

The Java language has a complex set of implicit and explicit type conversion rules, described in chapter 5 of The Java Programming Language Specification. One of those rules says that when you apply an arithmetic operator to a float and an int, you must first convert the int into a float. This is done automatically by the compiler. A proper Java compiler would instead generate

 bipush 70        ; Initialize i (variable 1) to 70 istore_1 ldc 111.1        ; Initialize j (variable 2) to 111.1 fstore_1 iload_1          ; Push the integer 70 i2f              ; Convert the integer to a float fload_2          ; Push the float 111.1 fadd             ; Now adding two floats is legal f2d              ; Convert the result to a double dstore_3         ; Store it in variable 3

The verification algorithm has no problem with this code.

The verification rules also require that no matter how you get to a particular instruction, whether by falling through from the previous instruction or by use of a goto or one of the if instructions, the stack will always have the same structure. A Java compiler can make sure that this is true by following two rules of its own:

Each Java statement translates so that it leaves nothing on the stack.
You can branch only to the beginning of a statement.

To see how these two rules help the compiler enforce the JVM verification algorithm rules, consider the following code from a class called Parser:

 if(isLegalToken)     getNextToken(); else     printErrorMessage("Invalid token");

with the following declarations

 static int getNextToken(); static void printErrorMessage();

A Java compiler generates code like

 line1: getstatic Parser/isLegalToken Z line2: ifeq l6                 ; if(isLegalToken)                                ; getNextToken(); line3: invokestatic Parser/getNextToken ()I line4: pop line5: goto l8 line6: ldc "Invalid token"     ; else                                ; printErrorMessage("Invalid                                   token"); line7: invokestatic Parser/printErrorMessage            (Ljava/lang/String;)V line8:                         ; The method continues here

The statement getnextToken(); translates into line3 and line4. The statement printErrorMessage("Invalid Token") translates into lines line6 and line7.

The call to getNextToken leaves its result on the stack (an int). The instruction at line4 pops that value off the stack, since the program doesn't use the value. The program pushes a message onto the stack on line6, but the call to printErrorMessage pops it off and leaves nothing in its place. In each case, the net effect is an empty stack.

Both branches of the if end up at line8 with nothing on the stack. That means that the if statement as a whole also works as a statement. You could put this if statement inside another if or inside a for loop. Since its net effect on each statement on the stack is zero, you'll never violate the virtual machine's rules.

A Java compiler is free to generate any virtual machine code it wants to, as long as the resulting code does the same thing as the original Java program. By breaking these rules in carefully controlled ways a compiler can generate much faster code with the same thing, without compromising safety.

6.7.1 Fooling the Virtual Machine with a Java Compiler

There are a number of ways to fool a Java compiler into generating code that will fail to verify. One of the verification rules states that references to methods and fields in other classes must be able to be resolved. If you go back and alter those class files after the Java compiler has turned your Java source into a class file, then that class file will fail to verify when you load it.

At the time the Java compiler wrote the code, it followed its own rules. However, by changing some Java code without recompiling all of it, you can potentially make changes that cause the program as a whole to break. That's one of the reasons the verification algorithm is there. Besides catching malicious code, it can also catch mistakes when you accidentally change something, even if you haven't recompiled those things that need to be compiled.

6.7.2 Other Languages and the Verification Algorithm

Java isn't the only language that can be successfully translated into virtual machine code. In chapter 11 we'll see how various non-Java language constructs can be translated into verifiable machine code. In chapters 12 and 13 we'll look in detail at Scheme and Prolog and produce working compilers that generate verifiable code.