The Java Bytecode | Practical Java Game Programming (Charles River Media Game Development)

A Java compiler such as javac.exe is responsible for translating source files written in the Java language to a machine language, or, more specifically, the Java VM’s language. The instructions, or bytecodes, emitted by the compiler do not depend on the Java language. In fact, many compilers can translate source files written in other languages to the Java VM’s language. The VM’s language is essentially an assembly language for a stack machine.

Every JVM instruction contains an opcode that specifies the operation that must be performed. The 8-bit opcode can be any of the predefined opcodes recognized by the VM. The JVM has more than 200 but fewer than 256 opcodes, which means that a single byte is enough to store the opcode. Most opcodes either read from or write to the operand stack. Some manipulate local variables, and others rely on the constant pool table to accomplish a task.

Bytecode Examples

The following are the bytecodes of a simple method that adds the values of two variables:

public static void arithmetic(){
    int var1 = 10;
    int var2 = var1;
    var1 = var1 + var2;
}

0:   bipush  10
2:   istore_0
3:   iload_0
4:   istore_1
5:   iload_0
6:   iload_1
7:   iadd
8:   istore_0
9:   return

The bytecodes of the method are rather straightforward. Before we go over the details, recall that every Java frame has a local variables array that stores the values of all the local variables, including the method's parameters. If a method has two local variables of type int, the local variables array will have a length of two. If the method is nonstatic, the length of the array will be three and the entry at index zero will be the reference to the object whose method was invoked. The reference corresponds to the this keyword. The length of the array is determined at compile time and stored in the max_locals field of the Code_attribute structure shown earlier.

The instruction at address zero pushes the constant 10 onto the stack using bipush. Then istore_0 pops an integer from the stack and stores it in the local variable at index zero, which corresponds to var1. Next, iload_0 pushes the value of var1 back onto the stack. The instruction istore_1, which is at address four, pops the value from the stack and saves it in var2. iload_0 and iload_1 push the values of var1 and var2 onto the stack, respectively. iadd pops two integers from the stack, adds them, and pushes the result back onto the stack. The result is then popped from the stack by the istore_0 instruction at address eight and stored in var1. The return instruction tells the VM that we are done in this method. Note that throughout this method the depth of the operand stack does not grow beyond two.
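The walkthrough above can be traced with a toy interpreter. The following is only a minimal sketch, not how a real VM is implemented: the class name StackMachineSketch and its run method are hypothetical, and only the seven opcodes used by arithmetic() are handled. The byte values are the standard encodings of those opcodes.

```java
final class StackMachineSketch {
    // The bytes of arithmetic(), laid out at the same addresses as the listing.
    static final byte[] CODE = {
        0x10, 10,     // 0: bipush 10
        0x3b,         // 2: istore_0
        0x1a,         // 3: iload_0
        0x3c,         // 4: istore_1
        0x1a,         // 5: iload_0
        0x1b,         // 6: iload_1
        0x60,         // 7: iadd
        0x3b,         // 8: istore_0
        (byte) 0xb1   // 9: return
    };

    /** Runs CODE and returns {var1, var2, deepest operand stack depth seen}. */
    static int[] run() {
        int[] locals = new int[2];   // max_locals = 2 (static method, two ints)
        int[] stack = new int[2];    // operand stack
        int sp = 0;                  // number of values currently on the stack
        int pc = 0;                  // address of the current instruction
        int maxDepth = 0;
        while (true) {
            int opcode = CODE[pc] & 0xFF;   // Java bytes are signed, opcodes are not
            switch (opcode) {
                case 0x10: stack[sp++] = CODE[pc + 1]; pc += 2; break; // bipush
                case 0x1a: stack[sp++] = locals[0]; pc++; break;       // iload_0
                case 0x1b: stack[sp++] = locals[1]; pc++; break;       // iload_1
                case 0x3b: locals[0] = stack[--sp]; pc++; break;       // istore_0
                case 0x3c: locals[1] = stack[--sp]; pc++; break;       // istore_1
                case 0x60: {                                           // iadd
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a + b;
                    pc++;
                    break;
                }
                case 0xb1:                                             // return
                    return new int[] {locals[0], locals[1], maxDepth};
                default:
                    throw new IllegalStateException("unhandled opcode " + opcode);
            }
            maxDepth = Math.max(maxDepth, sp);
        }
    }

    public static void main(String[] args) {
        int[] r = run();
        // prints: var1=20 var2=10 maxDepth=2
        System.out.println("var1=" + r[0] + " var2=" + r[1] + " maxDepth=" + r[2]);
    }
}
```

The recorded maximum depth of two matches the observation that the operand stack never grows beyond two entries in this method.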

As you might have noticed, the iload, istore, and iadd instructions all perform integer operations. If you want to add two longs together, you must use lload, lstore, and ladd instead. Similar instructions are available for other types, such as floats and doubles. If you have played with any assembly code before, you may find the redundancy of some of the instructions odd. Because the operand stack must store values of different lengths, the VM has type-specific opcodes so that it knows how many bytes to pop off the stack. For example, an integer occupies four bytes on the stack whereas a long needs eight. Therefore, each instruction must know how many bytes of the operand stack are occupied by its operands.

It is important to note that integers are the VM’s favorite type. For example, a number of conditional branch opcodes are specific to integers and not available for other types. Another example is the iinc opcode, which increments a local variable of type integer without having to explicitly load the value onto the stack, add a value to it, and then store the result. Such operations are more efficient in terms of both memory and speed because the length of a method is reduced.

The boolean type is arguably the VM’s least favorite type. In fact, the VM has no instructions dedicated to operations on boolean values. The Java compilers are responsible for converting booleans to integers. The same is partially true for bytes, chars, and shorts. They are converted to integers and treated as integers. This behavior is in part because if every type had its own instruction set, there would be more than 256 different opcodes. This means that the opcodes would have to be two bytes long, which would have a significant effect on memory consumption.

Instructions such as bipush and sipush push byte and short values onto the stack while at the same time converting them to integers. Note that in the example shown earlier, the first instruction pushed the constant 10 using the opcode bipush. This is because the compiler realized that the number 10 fits in one byte. Even though 10 is recognized as a byte, it is pushed onto the stack such that it occupies four bytes as an integer would. Again, this allows the VM to avoid having to explicitly understand the byte type.
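To make the widening performed by bipush and sipush concrete, here is a small sketch of how their operand bytes could be turned into the int that lands on the operand stack. The class and method names are hypothetical and exist only for illustration. The point is that the operand is sign-extended, so a negative byte stays negative once it occupies four bytes.

```java
final class PushWideningSketch {
    /** bipush carries one operand byte, which is sign-extended to an int. */
    static int bipush(byte operand) {
        return operand;   // implicit byte-to-int widening keeps the sign
    }

    /** sipush carries two operand bytes that form a signed short. */
    static int sipush(byte high, byte low) {
        return (short) (((high & 0xFF) << 8) | (low & 0xFF));
    }

    public static void main(String[] args) {
        System.out.println(bipush((byte) 0x0A));               // prints 10
        System.out.println(bipush((byte) 0xF6));               // prints -10
        System.out.println(sipush((byte) 0x01, (byte) 0x2C));  // prints 300
    }
}
```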

Another, slightly more involved example follows. You can view the bytecodes of a method by running javap.exe with the -c option on a class that contains the method of interest.

public int factorial(int number){
    int result = 1;

    for(int i=2; i<=number; i++)
        result *= i;

    return result;
}

0:   iconst_1
1:   istore_2
2:   iconst_2
3:   istore_3
4:   iload_3
5:   iload_1
6:   if_icmpgt 19
9:   iload_2
10:  iload_3
11:  imul
12:  istore_2
13:  iinc 3, 1
16:  goto 4
19:  iload_2
20:  ireturn

The first instruction pushes the constant 1 onto the stack. istore_2 pops the value off the stack and stores it in the local variable at index two of the local variables array, which corresponds to result. Note that the variable at index one is number, and because the method is an instance method, the variable at index zero holds the this reference. if_icmpgt at address six compares the top two integers on the stack, which are the values of the loop counter i and number. If the loop counter is greater than number, the program counter of the current thread is set to 19, which exits the loop. Note that the numbers next to each instruction are not line numbers but the addresses of the instructions. Because the opcode if_icmpgt is always followed by a two-byte signed branch offset, which javap displays as the resulting target address, the next instruction begins three bytes later: one byte stores the opcode and two store the offset.

The following code segment is the recursive version of the factorial method, which demonstrates how method invocations work:
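The role of instruction addresses and branch targets in the loop can be illustrated with a toy interpreter that keeps its own program counter. This is a sketch under stated assumptions rather than a real VM: the class BranchSketch is hypothetical, only the opcodes that appear in the factorial listing are handled, and the this reference in local variable zero is replaced by a placeholder 0 because the method never uses it.

```java
final class BranchSketch {
    // The bytes of factorial(), at the same addresses as the listing.
    static final byte[] CODE = {
        0x04,                                   // 0:  iconst_1
        0x3d,                                   // 1:  istore_2
        0x05,                                   // 2:  iconst_2
        0x3e,                                   // 3:  istore_3
        0x1d,                                   // 4:  iload_3
        0x1b,                                   // 5:  iload_1
        (byte) 0xa3, 0x00, 0x0d,                // 6:  if_icmpgt +13 (target 19)
        0x1c,                                   // 9:  iload_2
        0x1d,                                   // 10: iload_3
        0x68,                                   // 11: imul
        0x3d,                                   // 12: istore_2
        (byte) 0x84, 0x03, 0x01,                // 13: iinc 3, 1
        (byte) 0xa7, (byte) 0xff, (byte) 0xf4,  // 16: goto -12 (target 4)
        0x1c,                                   // 19: iload_2
        (byte) 0xac                             // 20: ireturn
    };

    /** Reads the signed two-byte offset that follows a branch opcode at pc. */
    static int branchOffset(int pc) {
        return (short) (((CODE[pc + 1] & 0xFF) << 8) | (CODE[pc + 2] & 0xFF));
    }

    static int factorial(int number) {
        int[] locals = {0, number, 0, 0};  // [this placeholder, number, result, i]
        int[] stack = new int[2];
        int sp = 0;
        int pc = 0;
        while (true) {
            int opcode = CODE[pc] & 0xFF;
            switch (opcode) {
                case 0x04: stack[sp++] = 1; pc++; break;                      // iconst_1
                case 0x05: stack[sp++] = 2; pc++; break;                      // iconst_2
                case 0x1b: case 0x1c: case 0x1d:                              // iload_1..3
                    stack[sp++] = locals[opcode - 0x1a]; pc++; break;
                case 0x3d: case 0x3e:                                         // istore_2..3
                    locals[opcode - 0x3b] = stack[--sp]; pc++; break;
                case 0x68: {                                                  // imul
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    pc++;
                    break;
                }
                case 0x84:                                                    // iinc
                    locals[CODE[pc + 1] & 0xFF] += CODE[pc + 2]; pc += 3; break;
                case 0xa3: {                                                  // if_icmpgt
                    int value2 = stack[--sp];
                    int value1 = stack[--sp];
                    pc += (value1 > value2) ? branchOffset(pc) : 3;
                    break;
                }
                case 0xa7: pc += branchOffset(pc); break;                     // goto
                case 0xac: return stack[--sp];                                // ireturn
                default: throw new IllegalStateException("unhandled opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(factorial(5));   // prints 120
    }
}
```

When the branch is not taken, the program counter advances by three, which is why the instruction after if_icmpgt sits at address nine.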

public int factorialRecursive(int number){
    if(number == 2)
        return 2;

    return number * factorialRecursive(number-1);
}

0:   iload_1
1:   iconst_2
2:   if_icmpne 7
5:   iconst_2
6:   ireturn
7:   iload_1
8:   aload_0
9:   iload_1
10:  iconst_1
11:  isub
12:  invokevirtual #9
15:  imul
16:  ireturn

The instruction aload_0 pushes the reference stored in local variable zero, which is the this reference, onto the stack. Because the method is an instance method, a reference to the object whose method is being called must be pushed onto the stack, followed by the arguments, before the method is invoked. The invokevirtual opcode is followed by a two-byte index into the constant pool (#9 in this listing); the entry at that index contains the symbolic reference to the method that should be called.
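The invocation sequence can be sketched in the same toy-interpreter style. The code below is illustrative only and rests on several assumptions: the class InvokeSketch is hypothetical, the this reference is modeled as a plain int handle, and constant pool entry #9 is assumed to refer to factorialRecursive itself, so invokevirtual simply starts a new frame over the same bytes. A real VM resolves the entry and selects the target method based on the object's class.

```java
final class InvokeSketch {
    // The bytes of factorialRecursive(), at the same addresses as the listing.
    static final byte[] CODE = {
        0x1b,                      // 0:  iload_1
        0x05,                      // 1:  iconst_2
        (byte) 0xa0, 0x00, 0x05,   // 2:  if_icmpne +5 (target 7)
        0x05,                      // 5:  iconst_2
        (byte) 0xac,               // 6:  ireturn
        0x1b,                      // 7:  iload_1
        0x2a,                      // 8:  aload_0
        0x1b,                      // 9:  iload_1
        0x04,                      // 10: iconst_1
        0x64,                      // 11: isub
        (byte) 0xb6, 0x00, 0x09,   // 12: invokevirtual #9
        0x68,                      // 15: imul
        (byte) 0xac                // 16: ireturn
    };

    /** Executes one frame: local 0 holds the receiver handle, local 1 the argument. */
    static int invoke(int thisRef, int number) {
        int[] locals = {thisRef, number};
        int[] stack = new int[4];   // deepest point is just before isub
        int sp = 0;
        int pc = 0;
        while (true) {
            int opcode = CODE[pc] & 0xFF;
            switch (opcode) {
                case 0x04: stack[sp++] = 1; pc++; break;            // iconst_1
                case 0x05: stack[sp++] = 2; pc++; break;            // iconst_2
                case 0x1b: stack[sp++] = locals[1]; pc++; break;    // iload_1
                case 0x2a: stack[sp++] = locals[0]; pc++; break;    // aload_0 (handle)
                case 0x64: {                                        // isub
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a - b;
                    pc++;
                    break;
                }
                case 0x68: {                                        // imul
                    int b = stack[--sp];
                    int a = stack[--sp];
                    stack[sp++] = a * b;
                    pc++;
                    break;
                }
                case 0xa0: {                                        // if_icmpne
                    int value2 = stack[--sp];
                    int value1 = stack[--sp];
                    int offset = (short) (((CODE[pc + 1] & 0xFF) << 8) | (CODE[pc + 2] & 0xFF));
                    pc += (value1 != value2) ? offset : 3;
                    break;
                }
                case 0xb6: {                                        // invokevirtual
                    int argument = stack[--sp];   // pushed last, popped first
                    int receiver = stack[--sp];   // the reference pushed by aload_0
                    stack[sp++] = invoke(receiver, argument);  // callee result is pushed
                    pc += 3;                      // opcode plus two index bytes
                    break;
                }
                case 0xac: return stack[--sp];                      // ireturn
                default: throw new IllegalStateException("unhandled opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke(0, 5));   // prints 120
    }
}
```

Each call to invoke gets its own locals array and operand stack, which mirrors how every method invocation receives a fresh frame. As in the original method, arguments smaller than two never reach the base case.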

As mentioned already, some bytecodes rely on the content of the constant pool table. The constant pool table must support types such as integers, longs, floats, and doubles, as well as strings. In addition to these basic types, entries such as CONSTANT_Methodref_info and CONSTANT_Fieldref_info are used to represent symbolic references to methods and fields, respectively. They both provide information about the class name and the corresponding field or method. Opcodes such as new, getfield, invokespecial, and instanceof are just a few examples of opcodes that rely on the constant pool table. When an instruction such as invokespecial is executed, the symbolic reference must be resolved before the method is invoked. That is, the symbolic reference must be validated to ensure that it is appropriate, and if it is, a reference to the method can be retrieved. After the method is resolved, a VM may choose to replace the opcode with one that uses the resolved reference thereafter.
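The idea of resolving a symbolic reference once and reusing the result can be approximated with reflection. This is only an analogy under stated assumptions: ConstantPoolSketch and its MethodRef class are hypothetical stand-ins, the pool is indexed from zero for simplicity, and java.lang.Math.abs is used merely as a convenient method to look up. A real VM performs resolution internally and does not go through the reflection API.

```java
import java.lang.reflect.Method;

final class ConstantPoolSketch {
    /** Hypothetical symbolic reference: names only, no direct pointer to code. */
    static final class MethodRef {
        final String className;
        final String methodName;
        final Class<?>[] parameterTypes;

        MethodRef(String className, String methodName, Class<?>... parameterTypes) {
            this.className = className;
            this.methodName = methodName;
            this.parameterTypes = parameterTypes;
        }
    }

    private final MethodRef[] entries;
    private final Method[] resolved;   // filled in lazily, one slot per entry
    int resolutions = 0;               // how many times a lookup was really done

    ConstantPoolSketch(MethodRef... entries) {
        this.entries = entries;
        this.resolved = new Method[entries.length];
    }

    /** Resolves the entry on first use and returns the cached result afterwards. */
    Method resolve(int index) {
        if (resolved[index] == null) {
            MethodRef ref = entries[index];
            try {
                // Validation step: the class and the method must actually exist.
                Class<?> owner = Class.forName(ref.className);
                resolved[index] = owner.getMethod(ref.methodName, ref.parameterTypes);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot resolve entry " + index, e);
            }
            resolutions++;
        }
        return resolved[index];
    }

    /** Looks up Math.abs(int) twice and invokes it; only one resolution happens. */
    static int demo(ConstantPoolSketch pool, int argument) {
        try {
            pool.resolve(0);
            return (Integer) pool.resolve(0).invoke(null, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ConstantPoolSketch pool =
            new ConstantPoolSketch(new MethodRef("java.lang.Math", "abs", int.class));
        System.out.println(demo(pool, -7));     // prints 7
        System.out.println(pool.resolutions);   // prints 1
    }
}
```

Caching the resolved Method plays the role of the opcode rewriting mentioned above: after the first execution, the symbolic lookup is skipped.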