6.3 Arithmetic Expressions

Probably the biggest shock to beginners facing assembly language for the very first time is the lack of familiar arithmetic expressions. Arithmetic expressions, in most high level languages, look similar to their algebraic equivalents, e.g.,

      X:=Y*Z;

In assembly language, you'll need several statements to accomplish this same task, e.g.,

      mov( y, eax );
      intmul( z, eax );
      mov( eax, x );

Obviously the HLL version is much easier to type, read, and understand. This point, more than any other, is responsible for scaring people away from assembly language. Although there is a lot of typing involved, converting an arithmetic expression into assembly language isn't difficult at all. By attacking the problem in steps, the same way you would solve the problem by hand, you can easily break down any arithmetic expression into an equivalent sequence of assembly language statements. By learning how to convert such expressions to assembly language in three steps, you'll discover there is little difficulty to this task.
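To see the three-instruction sequence run end to end, here is a minimal sketch of a complete HLA program built around it. The program name mulDemo and the initial values are illustrative assumptions, and the sketch assumes the HLA Standard Library header stdlib.hhf is available.

```hla
program mulDemo;
#include( "stdlib.hhf" )

static
    x: int32;
    y: int32 := 6;        // Sample values chosen for illustration.
    z: int32 := 7;

begin mulDemo;

    mov( y, eax );        // EAX := y
    intmul( z, eax );     // EAX := EAX * z
    mov( eax, x );        // x := EAX

    stdout.put( "x = ", x, nl );    // 6 * 7 = 42

end mulDemo;
```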

6.3.1 Simple Assignments

The easiest expressions to convert to assembly language are simple assignments. Simple assignments copy a single value into a variable and take one of two forms:

      variable := constant

      variable := variable

Converting the first form to assembly language is trivial; just use the assembly language statement:

      mov( constant, variable );

This mov instruction copies the constant into the variable.

The second assignment above is slightly more complicated because the 80x86 doesn't provide a memory-to-memory mov instruction. Therefore, to copy one memory variable into another, you must move the data through a register. By convention (and for slight efficiency reasons), most programmers tend to use AL/AX/EAX for this purpose. If the AL, AX, or EAX register is available, you should use it for this operation. For example,

      var1 := var2;

becomes

      mov( var2, eax );
      mov( eax, var1 );

This is assuming, of course, that var1 and var2 are 32-bit variables. Use AL if they are 8-bit variables; use AX if they are 16-bit variables.
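The 8-bit and 16-bit cases look the same apart from the register size. The following sketch shows both; the variable names b1, b2, w1, and w2 and their values are assumptions made up for this example.

```hla
program copyDemo;
#include( "stdlib.hhf" )

static
    b1: int8;
    b2: int8  := -5;      // 8-bit source value (illustrative).
    w1: int16;
    w2: int16 := 1234;    // 16-bit source value (illustrative).

begin copyDemo;

    // b1 := b2;  8-bit operands go through AL.
    mov( b2, al );
    mov( al, b1 );

    // w1 := w2;  16-bit operands go through AX.
    mov( w2, ax );
    mov( ax, w1 );

    stdout.put( "b1 = ", b1, ", w1 = ", w1, nl );   // b1 = -5, w1 = 1234

end copyDemo;
```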

Of course, if you're already using AL, AX, or EAX for something else, one of the other registers will suffice. Regardless, you must use a register to transfer one memory location to another.

Although the 80x86 does not support a memory-to-memory move, HLA does provide an extended syntax for the mov instruction that allows two memory operands. However, both operands have to be 16-bit or 32-bit values; 8-bit values won't work. Assuming you want to copy the value of a word or double word object to another variable, you can use the following syntax:

      mov( var2, var1 );

HLA translates this "instruction" into the following two-instruction sequence:

      push( var2 );
      pop( var1 );

Although this is slightly slower than the two mov instructions, it is convenient and it doesn't require the use of an intermediate register (important if you don't have any free registers available).
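As a rough illustration, the sketch below copies one 32-bit variable to two others, once with the extended mov syntax and once with the explicit push/pop pair the text describes. The names src, dst1, and dst2 are hypothetical; neither copy touches a general-purpose register.

```hla
program memCopyDemo;
#include( "stdlib.hhf" )

static
    src:  int32 := 12345;   // Illustrative value.
    dst1: int32;
    dst2: int32;

begin memCopyDemo;

    // Extended HLA syntax: both operands are 32-bit memory variables.
    mov( src, dst1 );

    // The equivalent sequence written out by hand.
    push( src );
    pop( dst2 );

    stdout.put( "dst1 = ", dst1, ", dst2 = ", dst2, nl );   // both 12345

end memCopyDemo;
```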

6.3.2 Simple Expressions

The next level of complexity is a simple expression. A simple expression takes the form:

      var₁ := term₁ op term₂;

var₁ is a variable, term₁ and term₂ are variables or constants, and op is some arithmetic operator (addition, subtraction, multiplication, and so on). Most expressions take this form. It should come as no surprise, then, that the 80x86 architecture was optimized for just this type of expression.

A typical conversion for this type of expression takes the following form:

      mov( term₁, eax );
      op( term₂, eax );
      mov( eax, var₁ );

op is the mnemonic that corresponds to the specified operation (e.g., "+" = add, "-" = sub, and so on).

There are a few inconsistencies you need to be aware of. Of course, when dealing with the (i)mul, (i)div, and (i)mod instructions on the 80x86, you must use the AL/AX/EAX and DX/EDX registers. You cannot use arbitrary registers as you can with other operations. Also, don't forget the sign extension instructions if you're performing a division operation and you're dividing one 16-/32-bit number by another. Finally, don't forget that some instructions may cause overflow. You may want to check for an overflow (or underflow) condition after an arithmetic operation.
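One way to check for signed overflow after an addition is to test the overflow flag immediately after the add instruction. The sketch below does this with HLA's @o flag condition; the operand values are chosen (as an assumption for this example) so that the 32-bit signed sum does not fit.

```hla
program overflowDemo;
#include( "stdlib.hhf" )

static
    x: int32;
    y: int32 := 2000000000;   // Illustrative values whose sum exceeds
    z: int32 := 2000000000;   // the largest int32 value (2,147,483,647).

begin overflowDemo;

    mov( y, eax );
    add( z, eax );            // Sets the overflow flag on signed overflow.
    if( @o ) then

        stdout.put( "Signed overflow computing y + z", nl );

    else

        mov( eax, x );
        stdout.put( "x = ", x, nl );

    endif;

end overflowDemo;
```

With these values the program takes the overflow branch. For unsigned arithmetic you would test the carry flag (@c) instead.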

Examples of common simple expressions:

 x := y + z;

           mov( y, eax );
           add( z, eax );
           mov( eax, x );

 x := y - z;

           mov( y, eax );
           sub( z, eax );
           mov( eax, x );

 x := y * z; {unsigned}

           mov( y, eax );
           mul( z, eax );        // Don't forget this wipes out EDX.
           mov( eax, x );

 x := y * z; {signed}

           mov( y, eax );
           intmul( z, eax );     // Does not affect EDX!
           mov( eax, x );

 x := y div z; {unsigned div}

           mov( y, eax );
           mov( 0, edx );        // Zero extend EAX into EDX.
           div( z, edx:eax );
           mov( eax, x );

 x := y idiv z; {signed div}

           mov( y, eax );
           cdq();                // Sign extend EAX into EDX.
           idiv( z, edx:eax );
           mov( eax, x );

 x := y mod z; {unsigned remainder}

           mov( y, eax );
           mov( 0, edx );        // Zero extend EAX into EDX.
           mod( z, edx:eax );
           mov( edx, x );        // Note that remainder is in EDX.

 x := y imod z; {signed remainder}

           mov( y, eax );
           cdq();                // Sign extend EAX into EDX.
           imod( z, edx:eax );
           mov( edx, x );        // Remainder is in EDX.
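Because the division instructions leave the quotient in EAX and the remainder in EDX, a single idiv can supply both results when a program needs them. The sketch below assumes that situation; the variable names q and r and the operand values are illustrative.

```hla
program divModDemo;
#include( "stdlib.hhf" )

static
    y: int32 := -17;      // Illustrative dividend.
    z: int32 := 5;        // Illustrative divisor.
    q: int32;             // Receives y idiv z.
    r: int32;             // Receives y imod z.

begin divModDemo;

    mov( y, eax );
    cdq();                // Sign extend EAX into EDX.
    idiv( z, edx:eax );   // Quotient in EAX, remainder in EDX.
    mov( eax, q );
    mov( edx, r );

    // Signed division truncates toward zero: q = -3, r = -2.
    stdout.put( "q = ", q, ", r = ", r, nl );

end divModDemo;
```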

Certain unary operations also qualify as simple expressions. A good example of a unary operation is negation. In a high level language negation takes one of two possible forms:

      var := -var or var1 := -var2

Note that var := -constant is really a simple assignment, not a simple expression. You can specify a negative constant as an operand to the mov instruction:

      mov( -14, var );

To handle "var := -var;" use the single assembly language statement:

      // var := -var;

      neg( var );

If two different variables are involved, then use the following:

      // var1 := -var2;

      mov( var2, eax );
      neg( eax );
      mov( eax, var1 );

6.3.3 Complex Expressions

A complex expression is any arithmetic expression involving more than two terms and one operator. Such expressions are commonly found in programs written in a high level language. Complex expressions may include parentheses to override operator precedence, function calls, array accesses, and so on. While the conversion of many complex expressions to assembly language is fairly straight-forward, others require some effort. This section outlines the rules you use to convert such expressions.

A complex expression that is easy to convert to assembly language is one that involves three terms and two operators, for example:

      w := w - y - z;

Clearly the straight-forward assembly language conversion of this statement will require two sub instructions. However, even with an expression as simple as this one, the conversion is not trivial. There are actually two ways to convert the statement above into assembly language:

      mov( w, eax );
      sub( y, eax );
      sub( z, eax );
      mov( eax, w );

and

      mov( y, eax );
      sub( z, eax );
      sub( eax, w );

The second conversion, because it is shorter, looks better. However, it produces an incorrect result (assuming Pascal-like semantics for the original statement). Associativity is the problem. The second sequence above computes W := W − (Y − Z) which is not the same as W := (W − Y) − Z. How we place the parentheses around the subexpressions can affect the result. Note that if you are interested in a shorter form, you can use the following sequence:

      mov( y, eax );
      add( z, eax );
      sub( eax, w );

This computes W:=W-(Y+Z). This is equivalent to W := (W − Y) − Z.
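A quick numeric check makes the difference concrete. The sketch below evaluates all three groupings with sample values (w = 10, y = 4, z = 3, chosen only for illustration) and stores each result in its own hypothetical variable so the original w is left untouched.

```hla
program assocDemo;
#include( "stdlib.hhf" )

static
    w:  int32 := 10;      // Illustrative operands.
    y:  int32 := 4;
    z:  int32 := 3;
    r1: int32;            // (w - y) - z
    r2: int32;            // w - (y - z)
    r3: int32;            // w - (y + z)

begin assocDemo;

    // (w - y) - z: the correct left-to-right evaluation.
    mov( w, eax );
    sub( y, eax );
    sub( z, eax );
    mov( eax, r1 );

    // w - (y - z): the shorter but incorrect grouping.
    mov( y, eax );
    sub( z, eax );
    mov( w, ebx );
    sub( eax, ebx );
    mov( ebx, r2 );

    // w - (y + z): the grouping that matches (w - y) - z.
    mov( y, eax );
    add( z, eax );
    mov( w, ebx );
    sub( eax, ebx );
    mov( ebx, r3 );

    // r1 = 3, r2 = 9, r3 = 3
    stdout.put( "r1 = ", r1, ", r2 = ", r2, ", r3 = ", r3, nl );

end assocDemo;
```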

Precedence is another issue. Consider the Pascal expression:

 X := W * Y + Z;

Once again there are two ways we can evaluate this expression:

      X := (W * Y) + Z;

      X := W * (Y + Z);

By now, you're probably thinking that this text is crazy. Everyone knows the correct way to evaluate these expressions is by the first form. However, you're wrong to think that way. The APL programming language, for example, evaluates expressions solely from right to left and does not give one operator precedence over another. Which way is "correct" depends entirely on how you define precedence in your arithmetic system.

Most high level languages use a fixed set of precedence rules to describe the order of evaluation in an expression involving two or more different operators. Such programming languages usually compute multiplication and division before addition and subtraction. Those that support exponentiation (e.g., FORTRAN and BASIC) usually compute that before multiplication and division. These rules are intuitive because almost everyone learns them before high school. Consider the expression:

      X op₁ Y op₂ Z

If op₁ takes precedence over op₂, then this evaluates to (X op₁ Y) op₂ Z; otherwise, if op₂ takes precedence over op₁, then this evaluates to X op₁ (Y op₂ Z). Depending upon the operators and operands involved, these two computations could produce different results. When converting an expression of this form into assembly language, you must be sure to compute the subexpression with the highest precedence first. The following example demonstrates this technique:

 // w := x + y * z;

           mov( x, ebx );
           mov( y, eax );      // Must compute y*z first because "*"
           intmul( z, eax );   // has higher precedence than "+".
           add( ebx, eax );
           mov( eax, w );

If two operators appearing within an expression have the same precedence, then you determine the order of evaluation using associativity rules. Most operators are left associative, meaning that they evaluate from left to right. Addition, subtraction, multiplication, and division are all left associative. A right associative operator evaluates from right to left. The exponentiation operator in FORTRAN and BASIC is a good example of a right associative operator:

      2^2^3 is equal to 2^(2^3) not (2^2)^3

The precedence and associativity rules determine the order of evaluation. Indirectly, these rules tell you where to place parentheses in an expression to determine the order of evaluation. Of course, you can always use parentheses to override the default precedence and associativity. However, the ultimate point is that your assembly code must complete certain operations before others to correctly compute the value of a given expression. The following examples demonstrate this principle:

 // w := x - y - z

           mov( x, eax );      // All the same operator, so we need
           sub( y, eax );      // to evaluate from left to right
           sub( z, eax );      // because they all have the same
           mov( eax, w );      // precedence and are left associative.

 // w := x + y * z

           mov( y, eax );      // Must compute Y * Z first because
           intmul( z, eax );   // multiplication has a higher
           add( x, eax );      // precedence than addition.
           mov( eax, w );

 // w := x / y - z

           mov( x, eax );      // Here we need to compute division
           cdq();              // first because it has the highest
           idiv( y, edx:eax ); // precedence.
           sub( z, eax );
           mov( eax, w );

 // w := x * y * z

           mov( y, eax );      // Multiplication is associative and
           intmul( z, eax );   // commutative, therefore the order
           intmul( x, eax );   // of evaluation does not matter.
           mov( eax, w );

There is one exception to the associativity rule. If an expression involves multiplication and division it is generally better to perform the multiplication first. For example, given an expression of the form:

      W := X/Y * Z          // Note: this is (x*z)/y, not x/(y*z)

It is usually better to compute X*Z and then divide the result by Y rather than divide X by Y and multiply the quotient by Z. There are two reasons this approach is better. First, remember that the imul instruction always produces a 64-bit result (assuming 32-bit operands). By doing the multiplication first, you automatically sign extend the product into the EDX register so you do not have to sign extend EAX prior to the division. This saves the execution of the cdq instruction. A second reason for doing the multiplication first is to increase the accuracy of the computation. Remember, (integer) division often produces an inexact result. For example, if you compute 5/2 you will get the value 2, not 2.5. Computing (5/2)*3 produces 6. However, if you compute (5*3)/2 you get the value 7 which is a little closer to the real quotient (7.5). Therefore, if you encounter an expression of the form

      w := x/y*z;

you can usually convert it to the following assembly code:

      mov( x, eax );
      imul( z, eax );          // Note the use of IMUL, not INTMUL!
      idiv( y, edx:eax );
      mov( eax, w );

Of course, if the algorithm you're encoding depends on the truncation effect of the division operation, you cannot use this trick to improve the algorithm. Moral of the story: always make sure you fully understand any expression you are converting to assembly language. Obviously if the semantics dictate that you must perform the division first, do so.
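The difference between the two orders is easy to observe with the values from the text (5, 2, and 3). The sketch below computes the expression both ways; the result variables divFirst and mulFirst are names invented for this example.

```hla
program mulDivDemo;
#include( "stdlib.hhf" )

static
    x:        int32 := 5;
    y:        int32 := 2;
    z:        int32 := 3;
    divFirst: int32;          // (x / y) * z
    mulFirst: int32;          // (x * z) / y

begin mulDivDemo;

    // Division first: needs CDQ and loses the fraction early.
    mov( x, eax );
    cdq();
    idiv( y, edx:eax );       // EAX = 2
    intmul( z, eax );         // EAX = 6
    mov( eax, divFirst );

    // Multiplication first: IMUL leaves the 64-bit product in EDX:EAX.
    mov( x, eax );
    imul( z, eax );           // EDX:EAX = 15
    idiv( y, edx:eax );       // EAX = 7
    mov( eax, mulFirst );

    // divFirst = 6, mulFirst = 7
    stdout.put( "divFirst = ", divFirst, ", mulFirst = ", mulFirst, nl );

end mulDivDemo;
```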

Consider the following Pascal statement:

      w := x - y * z;

This is similar to a previous example, except it uses subtraction rather than addition. Because subtraction is not commutative, you cannot compute y * z and then subtract x from this result. This complicates the conversion slightly. Rather than a straight-forward multiply-and-add sequence, you'll have to load x into a register, multiply y and z leaving their product in a different register, and then subtract this product from x, e.g.,

      mov( x, ebx );
      mov( y, eax );
      intmul( z, eax );
      sub( eax, ebx );
      mov( ebx, w );

This is a trivial example that demonstrates the need for temporary variables in an expression. This code uses the EBX register to temporarily hold a copy of x until it computes the product of y and z. As your expressions increase in complexity, the need for temporaries grows. Consider the following Pascal statement:

      w := (a + b) * (y + z);

Following the normal rules of algebraic evaluation, you compute the subexpressions inside the parentheses (i.e., the two subexpressions with the highest precedence) first and set their values aside. When you've computed the values for both subexpressions you can compute their product. One way to deal with complex expressions like this one is to reduce it to a sequence of simple expressions whose results wind up in temporary variables. For example, we can convert the single expression above into the following sequence:

      Temp1 := a + b;
      Temp2 := y + z;
      w := Temp1 * Temp2;

Because converting simple expressions to assembly language is quite easy, it's now a snap to compute the former, complex expression in assembly. The code is

      mov( a, eax );
      add( b, eax );
      mov( eax, Temp1 );
      mov( y, eax );
      add( z, eax );
      mov( eax, Temp2 );
      mov( Temp1, eax );
      intmul( Temp2, eax );
      mov( eax, w );

Of course, this code is grossly inefficient and it requires that you declare a couple of temporary variables in your data segment. However, it is very easy to optimize this code by keeping temporary variables, as much as possible, in 80x86 registers. By using 80x86 registers to hold the temporary results this code becomes:

      mov( a, eax );
      add( b, eax );
      mov( y, ebx );
      add( z, ebx );
      intmul( ebx, eax );
      mov( eax, w );

Yet another example:

      x := (y+z) * (a-b) / 10;

This can be converted to a set of four simple expressions:

      Temp1 := (y+z)
      Temp2 := (a-b)
      Temp1 := Temp1 * Temp2
      x := Temp1 / 10

You can convert these four simple expressions into the assembly language statements:

      mov( y, eax );      // Compute eax = y+z
      add( z, eax );
      mov( a, ebx );      // Compute ebx = a-b
      sub( b, ebx );
      imul( ebx, eax );   // This also sign extends eax into edx.
      idiv( 10, edx:eax );
      mov( eax, x );

The most important thing to keep in mind is that you should attempt to keep temporary values in registers. Remember, accessing an 80x86 register is much more efficient than accessing a memory location. Use memory locations to hold temporaries only if you've run out of registers to use.
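Putting the register-only approach into a complete program, the following sketch evaluates x := (y+z) * (a-b) / 10 with EAX and EBX as the only temporaries. The operand values are assumptions picked so that the division truncates.

```hla
program tempRegDemo;
#include( "stdlib.hhf" )

static
    x: int32;
    y: int32 := 7;        // Illustrative operands.
    z: int32 := 6;
    a: int32 := 9;
    b: int32 := 4;

begin tempRegDemo;

    mov( y, eax );        // EAX = y + z = 13
    add( z, eax );
    mov( a, ebx );        // EBX = a - b = 5
    sub( b, ebx );
    imul( ebx, eax );     // EDX:EAX = 13 * 5 = 65
    idiv( 10, edx:eax );  // EAX = 65 / 10 = 6 (remainder 5 in EDX)
    mov( eax, x );

    stdout.put( "x = ", x, nl );    // x = 6

end tempRegDemo;
```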

Ultimately, converting a complex expression to assembly language is little different than solving the expression by hand. Instead of actually computing the result at each stage of the computation, you simply write the assembly code that computes the result. Because you were probably taught to compute only one operation at a time, this means that manual computation works on "simple expressions" that exist in a complex expression. Of course, converting those simple expressions to assembly is fairly trivial. Therefore, anyone who can solve a complex expression by hand can convert it to assembly language following the rules for simple expressions.

6.3.4 Commutative Operators

If "@" represents some operator, that operator is commutative if the following relationship is always true:

      (A @ B) = (B @ A)

As you saw in the previous section, commutative operators are nice because the order of their operands is immaterial and this lets you rearrange a computation, often making that computation easier or more efficient. Often, rearranging a computation allows you to use fewer temporary variables. Whenever you encounter a commutative operator in an expression, you should always check to see if there is a better sequence you can use to improve the size or speed of your code. Tables 6-6 and 6-7, respectively, list the commutative and noncommutative operators you typically find in high level languages.

Table 6-6: Some Common Commutative Dyadic (Binary) Operators
Pascal	C/C++	Description

+	+	Addition
*	*	Multiplication
AND	&& or &	Logical or bitwise AND
OR	|| or |	Logical or bitwise OR
XOR	^	(Logical or) bitwise exclusive OR
=	==	Equality
<>	!=	Inequality

Table 6-7: Some Common Noncommutative Dyadic (Binary) Operators
Pascal	C/C++	Description

-	-	Subtraction
/ or DIV	/	Division
MOD	%	Modulo or remainder
<	<	Less than
<=	<=	Less than or equal
>	>	Greater than
>=	>=	Greater than or equal
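As one possible illustration of exploiting commutativity, a subtraction can sometimes be recast as an addition so that a single register suffices. The sketch below computes w := x - y * z as (-(y * z)) + x; because addition is commutative, x can be added directly from memory after negating the product. This rearrangement is an assumption-laden alternative rather than a general rule: it presumes the negation itself does not overflow, and the operand values are illustrative.

```hla
program commuteDemo;
#include( "stdlib.hhf" )

static
    w: int32;
    x: int32 := 20;       // Illustrative operands.
    y: int32 := 3;
    z: int32 := 4;

begin commuteDemo;

    mov( y, eax );
    intmul( z, eax );     // EAX = y * z = 12
    neg( eax );           // EAX = -(y * z) = -12
    add( x, eax );        // EAX = -(y * z) + x = 8
    mov( eax, w );

    stdout.put( "w = ", w, nl );    // w = 8

end commuteDemo;
```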