In programs, especially large ones, the same task often has to be performed repeatedly, which would mean writing the same group of commands many times. To avoid this repetition, a programmer usually writes the group once according to certain rules. At the appropriate points of the program, he or she simply passes control to these commands, which return control after they have executed. Such a group of commands, which implements a task and is designed so that it can be used any number of times from anywhere in the code, is called a subroutine or a procedure. In contrast to a subroutine, the rest of the program is usually called the main program. This chapter discusses the principles of building procedures in assembler. These procedures are saved in files with the ASM extension and assembled into individual object module files with the OBJ extension. To use a procedure in an object module, it is necessary to add the OBJ file to the C++ project and to call the procedure in accordance with the calling conventions adopted in C++ .NET.
An assembly subroutine corresponds to a C++ function, and a call to the subroutine corresponds to a function call. Implementing subroutines in assembly language is a more complicated process than declaring functions in C++ .NET. From now on, we will use the terms subroutine, procedure, and function synonymously.
In C++, many aspects of work with subroutines are hidden from a programmer, and their implementation is the compiler's job. In assembler, a programmer has to do much of this work on his or her own. Although it is more difficult to write programs in assembler than in C++, the use of assembler gives full control over the program code and makes it possible to achieve a higher optimization of the application as a whole. What follows is a discussion regarding the development of procedures in the context of conventions adopted for the Microsoft MASM 6.14 assembler, though the main principles are valid for any other assembler on the Intel platform.
How should you declare a procedure in the assembler? Its declaration looks like this:
<procedure name> proc <parameter>
    <procedure body>
<procedure name> endp
The body of the procedure (its commands) is preceded by the proc (procedure) directive and followed by the endp (end of procedure) directive. For example, a piece of code that declares the AsmSub procedure could look as follows:
AsmSub proc
    . . .
    ret
AsmSub endp
The procedure must be terminated with the ret command. One ASM file can contain several procedures. Here is an example of combining two procedures named AsmSub1 and AsmSub2 in one module:
    . . .
.code
AsmSub1 proc
    . . .
    ret
AsmSub1 endp
AsmSub2 proc
    . . .
    ret
AsmSub2 endp
end
The proc directive is considered the entry point to the procedure. Note that there is no colon after the name in the proc directive. Nevertheless, this name is considered a label and points to the first command of the procedure. The name of the procedure can be specified in a jump command. Then control will be passed to the first command of the procedure.
The proc directive has a parameter. It is either near or far. If the parameter is missing, it is considered near (this is why the near parameter is usually omitted). When the near parameter is used or the parameter is missing, the procedure is called near, and with the far parameter, it is called far. A near procedure can be called only from the code segment in which it was declared, while a far procedure can be called from any code segment (including the segment in which it was declared). This is the difference between near and far procedures. For the 32-bit applications considered in this book, all procedure calls are near.
It should be mentioned that the names and labels declared in an assembly procedure are not local. This is why they must be unique relative to the other names used in the program. In assembler, it is possible to declare a procedure inside another procedure. However, this does not provide any advantages, so programmers rarely use nested procedures.
Now, we will discuss how procedures are called, and how they return. When programming in C++, it is enough for you to specify the name and actual parameters of a procedure in order to execute it. The work of the procedure and return to the main program are hidden from you by the compiler. However, if you write a procedure in assembler, you will have to implement all interaction between the main program and the procedure by yourself. Here are instructions on how to do this.
Two problems arise: How can you make the procedure work from the main program, and how can you return from the procedure to the main program? The first problem is easily solved. Simply execute a jump command to the first command of the procedure. In other words, specify the name of the procedure in a jump command. The other problem is more complicated. The procedure can be called from different places within the main program; therefore, it should return to different places. The procedure itself does not know where to return, but the main program does. Therefore, when calling a procedure, the main program must tell it a so-called return address. This is the address of the command in the main program to which the procedure must return control after it completes. Usually, it is the address of the command that follows the call command. The main program tells this address to the procedure, and the procedure returns control to this address. Since different calls to a procedure tell it different return addresses, the procedure returns control to different places in the main program.
How can you tell a procedure the return address? This can be done in different ways. First, you can pass it via a register. The main program writes the return address to a register, and the procedure reads it and jumps accordingly. Second, you can use the stack. Before the main program calls a procedure, it pushes the return address on the stack, and the procedure pops it and uses it to jump. It is a common practice to pass the return address via the stack, so we will use only this method.
Passing the return address via the stack and returning to this address can be implemented with the commands that are already familiar to you. However, procedures are used in actual programs very often, so the processor's command set includes special commands that make it simpler to implement jumps between the main program and procedures. These are the call command and the ret command, which you have already seen in the examples above. The main variants of these commands are:
call <procedure name>
ret
The call command pushes the address of the next command on the stack and jumps to the first command of the specified procedure. The ret command pops the address from the top of the stack and jumps to this address.
Here is an example. Suppose you want to display an integer computed by the formula i1 - i2 - 100, where i1 and i2 are integers. To compute this, write two functions in assembler and save them in an ASM file. The source code in this file is shown in Listing 3.1.
    . . .
asmsub proc
    mov EAX, i1
    sub EAX, i2
    call sub100
    ret
asmsub endp
sub100 proc
    sub EAX, 100
    ret
sub100 endp
    . . .
The asmsub procedure begins by computing the difference i1 - i2 and puts the intermediate result in the EAX register. The call sub100 command pushes the address of the next command on the stack and passes control to the beginning of the sub100 procedure, i.e., to the sub EAX, 100 command. This procedure returns the final value (equal to i1 - i2 - 100) via the EAX register. After that, the ret command of sub100 pops the address from the stack and jumps to this address. Thus the asmsub procedure resumes execution from the command that follows the call sub100 command.
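For comparison, the logic of these two procedures can be written as ordinary C++ functions. This is only a sketch of the equivalent behavior, not the code a compiler would generate. The names follow the listing; passing i1 and i2 as arguments instead of reading them from memory is an assumption made to keep the example self-contained.

```cpp
// Counterpart of sub100: takes the value that the listing keeps in EAX
// and hands the result back as the function's return value.
constexpr int sub100(int value) {
    return value - 100;
}

// Counterpart of asmsub: computes i1 - i2, passes the intermediate result
// to sub100 (the "call sub100" step), and returns what comes back.
constexpr int asmsub(int i1, int i2) {
    return sub100(i1 - i2);
}
```

With these definitions, asmsub(150, 30) evaluates to 20. The call and the return that the assembly version spells out with call and ret are produced by the compiler here.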
There are a few versions of the call command. We demonstrated the main variant where the name of a procedure is specified as a parameter of the command. However, you can use a register as a parameter. In this case, the address of the called procedure is put in the register. Modify the previous example as shown in Listing 3.2.
    . . .
asmsub proc
    mov EAX, i1
    sub EAX, i2
    push EBX
    lea EBX, sub100
    call EBX
    pop EBX
    ret
asmsub endp
sub100 proc
    sub EAX, 100
    ret
sub100 endp
    . . .
The following four lines are most important for understanding the working principles of the procedures in this example:
push EBX
lea EBX, sub100
call EBX
pop EBX
The first command pushes the EBX register to the stack before modifying it. There are not very many registers in the PC, but almost every command uses one or another register. Therefore, it is likely that the main program and the procedure need the same registers, which would complicate using them. You could develop your application so that the main program and the procedure use different registers, but this would be rather difficult because of the limited number of processor registers. This is why the code of a procedure does not make any assumptions on what registers are used in the main program and simply saves the values of all registers.
The EBX register is one of the most frequently used in the main application, so it is important to save it and return its unchanged value to the caller. This is done with the commands push EBX and pop EBX.
After the EBX register is saved, the address of the sub100 procedure is loaded to it. Finally, the call EBX command calls the procedure, i.e., pushes the return address on the stack and passes control to the procedure.
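The closest C++ counterpart of a call through a register is a call through a function pointer. The following sketch illustrates the idea under that analogy; the IntProc type alias and the target variable are names introduced here for illustration and do not appear in the listings.

```cpp
// Type of a procedure that takes one integer and returns one integer.
using IntProc = int (*)(int);

constexpr int sub100(int value) {
    return value - 100;
}

// Counterpart of the modified asmsub: the address of the procedure is
// first stored in a variable (the role EBX plays in the listing), and
// the call is then made through that variable.
constexpr int asmsub(int i1, int i2) {
    IntProc target = sub100;    // lea EBX, sub100
    return target(i1 - i2);     // call EBX
}
```

As in the assembly version, the target of the call is decided by the value held in the variable, so the same call site could reach a different procedure if a different address were stored there.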
Now, we will consider another modified variant of the code fragment. To call a procedure, you can use the jmp command (Listing 3.3).
    . . .
asmsub proc
    mov EAX, i1
    sub EAX, i2
    lea EDX, ex
    push EDX
    jmp sub100
ex:
    ret
asmsub endp
sub100 proc
    sub EAX, 100
    ret
sub100 endp
    . . .
This use of the jmp command relies on the return address being pushed on the stack beforehand: before the jump, the code pushes the address of the command that follows jmp in the main program (the ex label). The sub100 procedure subtracts 100 from the value in the EAX register and passes control back to the calling procedure asmsub with the ret command. The ret command does not care how the return address got on the stack; it simply pops the value on top of the stack and jumps to it, exactly as if the procedure had been entered with the call command. After ret is executed, the address of the ex label is loaded into the instruction pointer. This address was earlier pushed on the stack with the commands:
lea EDX, ex
push EDX
In these examples, the EAX register was used to pass parameters and return the result. This is one of the simplest variants. Generally, the problem of passing parameters and returning the result cannot be solved so simply. Therefore, it is worth more detailed consideration.
There are different ways of passing actual parameters to a procedure. The simplest one, which was demonstrated in the previous examples, is to pass the parameters via registers: The main program writes the actual parameters to registers, and the procedure reads and uses them. You can deal with the result (if any) in the same manner: The procedure writes its result to a register, and the main program extracts it. Which registers can be used to pass parameters and return the result? You may choose them at will, although there are a few rules.
To pass parameters, the EAX, EBX, ECX, and EDX registers are used most frequently, and the EBP, ESI, and EDI registers are used less frequently. Usually, the EBP register is used with the stack pointer register (ESP) to access parameters in the stack. We will address this topic later. It is convenient to use the ESI and EDI registers as index registers for operations over arrays. However, nothing can prevent you from using them at your discretion.
Here, we will consider an example. Suppose you need to find the maximum of two integers and compute its absolute value. Develop two procedures in assembler (name them maxint and maxabs) that find the maximum and its absolute value.
The maxint procedure takes two integer parameters (i1 and i2) and can be declared as maxint (i1, i2). The maxabs procedure takes an integer as a parameter. Let it be intval, so its declaration can be maxabs (intval).
We will assume that the first parameter i1 is passed via the EAX register, while i2 is passed via the EBX register. The procedures will return the results in the EAX register. The result of the maxint procedure is an input parameter for the maxabs procedure. The source code of a fragment of the program that uses these procedures is shown in Listing 3.4.
;the main program
    . . .
    mov EAX, i1
    mov EBX, i2
    call maxint
    ; The maximum is in the EAX register
    mov intval, EAX
    call maxabs
    ; The absolute value of the maximum
    . . .
; The procedures are declared here
; maxint(i1, i2)
maxint proc
    cmp EAX, EBX
    jge ex
    mov EAX, EBX
ex:
    ret
maxint endp
; maxabs(intval)
maxabs proc
    mov EAX, intval
    cmp EAX, 0
    jge quit
    neg EAX
quit:
    ret
maxabs endp
    . . .
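When testing procedures like these, it can help to have a reference version in C++ to compare the results against. The following is a minimal sketch of the same logic; it assumes that the parameters are passed as ordinary function arguments and that the argument of maxabs is not the most negative integer, whose absolute value cannot be represented as an int.

```cpp
// Counterpart of maxint: returns the larger of two signed integers.
constexpr int maxint(int i1, int i2) {
    return (i1 >= i2) ? i1 : i2;     // cmp EAX, EBX / jge ex / mov EAX, EBX
}

// Counterpart of maxabs: returns the absolute value of its argument.
// Assumption: intval is not the most negative int.
constexpr int maxabs(int intval) {
    return (intval >= 0) ? intval : -intval;   // cmp EAX, 0 / jge quit / neg EAX
}
```

For example, maxabs(maxint(-9, -4)) evaluates to 4, which is the value the main program of the listing would find in the EAX register for i1 = -9 and i2 = -4.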
When passing parameters via registers, you must keep in mind one important point. The registers that the procedure uses can be used in other parts of the program, so if the procedure destroys their contents, the program can crash. It is important to save the contents of the registers before entering a procedure if you are not completely sure that their contents are not used by other procedures or by the program. To save registers, the stack is normally used. Thus, if the main program in the previous example uses the EBX register, you should modify the source code (Listing 3.5). The additional commands are push EBX and pop EBX.
;the main program
    . . .
    mov EAX, i1
    push EBX
    mov EBX, i2
    call maxint
    pop EBX
    ; The maximum is in the EAX register
    mov intval, EAX
    call maxabs
; The procedures are declared here
    . . .
; maxint(i1, i2)
maxint proc
    . . .
maxint endp
; maxabs(intval)
maxabs proc
    . . .
maxabs endp
    . . .
Another approach is often used. Both the main program and the procedure are allowed to use the same registers, but the procedure is obliged to save the values of the registers used by the main program. This is simple: The procedure must first push the values of the registers it needs on the stack, and then it can use them as it likes. Before it returns, it must restore the original values of the registers by popping them from the stack.
It is strongly recommended that you do this in every procedure, even if it is obvious that the main program and the procedure use different registers. The source code of the program might change later (in fact, this is very likely), and the main program might need the registers after the changes are made.
To make saving registers easier, the command set includes two special commands: pusha and popa. They push the values of all general-purpose registers on the stack and pop them from it.
Note that you do not have to save the register used by a procedure to return the result. Changing this register is the goal of the procedure.
It is convenient to pass parameters via registers, and this method is used frequently. It is effective when there are few parameters. However, if you have many parameters, you might be short of registers for them. In this case, another method of passing parameters is used: via the stack. The main program pushes actual parameters (their values or addresses) on the stack, and the procedure pops them from the stack.
Suppose a procedure (named myproc) has n parameters and is declared as myproc (x1, x2, ..., xn). We will assume that the main program pushes the parameters on the stack in a certain order before calling the procedure. There are only two practical ways to arrange the parameters on the stack. The first one involves pushing the parameters on the stack from left to right: First, the first parameter is pushed; then the second, etc. In 32-bit applications, each parameter has the size of a double word, so the main program's commands that implement a procedure call are as follows:
push x1
push x2
. . .
push xn
call myproc
According to the second variant, the parameters are pushed on the stack from right to left: The nth parameter is pushed first, then the (n - 1)th parameter, etc. In this case, you should execute the following commands to call the function:
push xn
. . .
push x1
call myproc
. . .
How does a procedure access its parameters? A commonly used method involves accessing the parameters by using the EBP register. You should put the address of the top of the stack (the contents of the ESP register) to the EBP register and then use an expression like [EBP+i] to access the parameters of the procedure. It is advisable to save the EBP register because it might be used in the main program. Therefore, first save the contents of this register and only then move the contents of the ESP register to it.
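The displacement i in an expression like [EBP+i] follows from the layout of the stack after the procedure has saved EBP and copied ESP to it: [EBP] holds the saved EBP, [EBP+4] holds the return address of a near call, and the parameters start at [EBP+8]. The following C++ sketch computes the displacement of each parameter for both pushing orders. The function names and the zero-based parameter index are assumptions introduced for illustration; the sketch assumes a near call and double-word parameters.

```cpp
// Bytes between EBP and the first stack parameter: 4 for the saved EBP
// plus 4 for the return address of a near call.
constexpr int kFrameOverhead = 8;
constexpr int kDwordSize = 4;

// Parameters pushed from right to left: the first parameter is pushed
// last, so it lies closest to the return address.
constexpr int offset_right_to_left(int index) {
    return kFrameOverhead + kDwordSize * index;
}

// Parameters pushed from left to right: the last parameter is pushed
// last, so the order of the displacements is reversed.
constexpr int offset_left_to_right(int index, int count) {
    return kFrameOverhead + kDwordSize * (count - 1 - index);
}
```

For a procedure with two parameters pushed from right to left, the first parameter is found at [EBP+8] and the second at [EBP+12]. With the left-to-right order, the two displacements are swapped.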
We will illustrate this with an example. Modify the previous example so that the stack is used for passing the parameters. Assume the parameters are passed to the maxint procedure from right to left, i.e., the i2 variable is pushed on the stack first, and then the i1 variable. The fragments of the source code of the main program and the procedures where the parameters are passed via the stack are shown in Listing 3.6.
; Main program
    . . .
    push i2
    push i1
    call maxint
    ; The maximum is in the EAX register
    mov intval, EAX
    push intval
    call maxabs
    ; The absolute value of the maximum
    . . .
; The procedures are declared here
; maxint(i1, i2)
maxint proc
    push EBP
    mov EBP, ESP
    ; Loading the i1 parameter to the EAX register
    mov EAX, DWORD PTR [EBP+8]
    ; Saving the EBX register
    push EBX
    ; Loading the i2 parameter to the EBX register
    mov EBX, DWORD PTR [EBP+12]
    cmp EAX, EBX
    jge ex
    mov EAX, EBX
ex:
    ; Restoring the EBX register saved above
    pop EBX
    pop EBP
    ret 8
maxint endp
; maxabs(intval)
maxabs proc
    push EBP
    mov EBP, ESP
    ; Loading a parameter to the EAX register
    mov EAX, DWORD PTR [EBP+8] ; intval
    cmp EAX, 0
    jge quit
    neg EAX
quit:
    pop EBP
    ret 4
maxabs endp
    . . .
When a procedure completes, it must perform certain actions, which we will describe. Note that by the moment of exiting the procedure, the stack should have the same state as before the procedure call.
When the procedure completes, the top of the stack will contain the old value of the EBP register. Pop it and restore EBP with the pop EBP command. Now, the top of the stack contains the return address. You might think that you can exit the procedure with the ret command, but this is not the case. You should clear the stack of the parameters that you no longer need. This can be done either in the calling program or in the procedure. Of course, the main program can do this by executing the add ESP, n command (where n is the number of bytes to clear) after the call myproc command.
However, it is best to clear the stack in the procedure. There can be many calls to the procedure; therefore, you will have to write the add command in the main program many times. In the procedure, you will write this command only once. Here is a useful rule for program optimization: If an action can be done either in the main program or in the procedure, it is best to do it in the procedure. In this case, you will need fewer commands.
Thus, the procedure should first clear the stack of the parameters and only then pass control to the return address. To make it simpler to implement these two actions, an extended version of the ret command was introduced to the command set. It has an immediate operand that is treated as an unsigned integer:
ret n
This command pops the return address from the stack first, then it clears n bytes in the stack, and finally it jumps to the return address.
A few additional notes: First, the ret command is actually the ret 0 command, i.e., it returns without clearing the stack. Second, the operand of this command tells how many bytes in the stack should be cleared. Finally, the operand should not take into account the return address because the ret command pops it before clearing the stack.
After the procedure returns control in such a manner, the stack will have the same state as before the call to the procedure, i.e., before the parameters were pushed on it. Thus, all traces of the call are covered up, and this is what you want.
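The whole sequence, from pushing the parameters to returning with ret 8, can be traced with a small model of the stack. The following C++ sketch is only an illustration: the StackModel type, the call_maxint function, and the return address value 0x1000 are hypothetical names and values introduced here, and the model covers only the stack operations discussed in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a 32-bit stack that grows toward lower addresses.
// Addresses are byte offsets into the modeled memory.
struct StackModel {
    std::vector<std::int32_t> memory;  // one element per double word
    std::uint32_t esp;                 // current top of the stack
    std::uint32_t ebp = 0;             // frame pointer

    explicit StackModel(std::size_t dwords)
        : memory(dwords, 0), esp(static_cast<std::uint32_t>(dwords * 4)) {}

    void push(std::int32_t value) {
        assert(esp >= 4);                  // the modeled stack must not overflow
        esp -= 4;
        memory[esp / 4] = value;
    }

    std::int32_t pop() {
        assert(esp / 4 < memory.size());   // the modeled stack must not be empty
        std::int32_t value = memory[esp / 4];
        esp += 4;
        return value;
    }

    std::int32_t load(std::uint32_t address) const {
        assert(address % 4 == 0 && address / 4 < memory.size());
        return memory[address / 4];
    }
};

// Models "push i2 / push i1 / call maxint" followed by a procedure body
// that ends with "pop EBP / ret 8".
std::int32_t call_maxint(StackModel& stack, std::int32_t i1, std::int32_t i2) {
    const std::int32_t kReturnAddress = 0x1000;   // made-up marker value

    // Caller: parameters from right to left, then the call itself.
    stack.push(i2);
    stack.push(i1);
    stack.push(kReturnAddress);                   // pushed by call

    // Procedure entry: push EBP / mov EBP, ESP.
    stack.push(static_cast<std::int32_t>(stack.ebp));
    stack.ebp = stack.esp;

    std::int32_t eax = stack.load(stack.ebp + 8);   // i1
    std::int32_t ebx = stack.load(stack.ebp + 12);  // i2
    if (eax < ebx) eax = ebx;

    // Procedure exit: pop EBP / ret 8.
    stack.ebp = static_cast<std::uint32_t>(stack.pop());
    std::int32_t return_address = stack.pop();    // ret pops the address first
    stack.esp += 8;                               // then discards 8 bytes
    assert(return_address == kReturnAddress);
    return eax;
}
```

If the value of esp is recorded before call_maxint is invoked, it is the same afterward, which is the property described above: after the return, the stack has the state it had before the parameters were pushed.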
This is a rough design of passing parameters via the stack. Remember that this method for passing parameters is universal, and it can be used with any number of parameters. On the other hand, it is more complicated than passing parameters via registers, so you should prefer passing parameters via registers because it is simpler and shorter. As for the result, it is very seldom passed via the stack and usually via a register. It is common that the result of a procedure is returned in the EAX register.
Many procedures do not have problems with storing local data (variables necessary only for procedure execution) because registers will do for this purpose. However, when there are many local variables in a procedure, the question arises: Where should they be stored? You can store the data in the data segment, the code segment, or the stack.
The MASM macro assembler allows you to use any of these options. To store data in the data segment, you must first declare the segment. This is done with the .data directive. The data type can be specified as DB (byte), DW (word), or DD (double word). If a sequence of elements is stored in the data segment, you can define its size with the $ operator, which denotes the current address: The expression $ - name, placed right after the sequence, yields its length in bytes.
For example, we will examine how the data segment can be used for processing a character string. Suppose the calling procedure, or main program, requires the seventh element of a string stored in a called procedure. The parameter passed to the called procedure is the position of the element within the string (in this case, six, because the first element index is zero). The procedure developed in the MASM macro assembler (we will name it findchar) returns the result in the EAX register. The result is a character or a zero if the specified position number is greater than the string length. The source code of the procedure is shown in Listing 3.7.
    . . .
.data
s1  DB 'STRING1!!!'
ls1 EQU $-s1
.code
findchar proc
    push EBP
    mov EBP, ESP
    mov EDX, DWORD PTR [EBP+8]
    cmp EDX, ls1
    jb next            ; valid positions are 0 .. ls1-1
    mov EAX, 0
    jmp ex
next:
    lea ESI, s1
    add ESI, EDX
    xor EAX, EAX
    mov AL, BYTE PTR [ESI]
ex:
    pop EBP
    ret
findchar endp
end
The position number is passed via the stack at the address of [EBP+8]. It is saved in the EDX register and compared to the ls1 string length. If the string length is less than the passed position number, the procedure writes zero to EAX and returns.
If the position number is within the string, the corresponding element is put to the AL register. The address of this element is computed by adding the initial offset to the position number with the following commands:
lea ESI, s1
add ESI, EDX
In this case, the stack should be cleared either in the caller program or in the procedure.
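As a cross-check for the assembly version, the behavior of findchar can be expressed in a few lines of C++. This is a sketch rather than a translation of the listing: it treats any position at or beyond the string length as out of range, and the unsigned parameter type is an assumption chosen to match the unsigned comparison in the procedure.

```cpp
// The string stored inside the procedure and its length without the
// terminating null character (the roles of s1 and ls1).
constexpr char s1[] = "STRING1!!!";
constexpr unsigned ls1 = sizeof(s1) - 1;

// Returns the character at the given zero-based position, or 0 when the
// position lies outside the string. The result is widened to an int,
// mirroring the procedure, which clears EAX before loading AL.
constexpr int findchar(unsigned position) {
    return (position < ls1) ? static_cast<unsigned char>(s1[position]) : 0;
}
```

For the position used in the text, findchar(6) yields the character '1', the seventh element of the string.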
To work with local data, you also can use the procedure's code segment.
Often, this is quite convenient because no data segment has to be declared, and performance can improve. It is very easy to modify the source code of the previous example for work with local data directly in the code segment (Listing 3.8).
    . . .
.code
findchar proc
    jmp strt
s1  DB 'STRING1!!!!'
ls1 EQU $-s1
strt:
    push EBP
    mov EBP, ESP
    mov EDX, DWORD PTR [EBP+8]
    cmp EDX, ls1
    jb next            ; valid positions are 0 .. ls1-1
    mov EAX, 0
    jmp ex
next:
    lea ESI, s1
    add ESI, EDX
    xor EAX, EAX
    mov AL, BYTE PTR [ESI]
ex:
    pop EBP
    ret
findchar endp
end
Here, we used a simple trick with the jmp strt command to jump to the main branch of the procedure. The local data are stored in the code segment. The stack memory can be allocated with a standard method.
Push the current value of the EBP register on the stack and then set this register to the address of the top of the stack. After that, decrease the value of the ESP stack pointer by the number of bytes you need. For example, if the mysub procedure requires three double words, write the following commands:
mysub proc
    push EBP
    mov EBP, ESP
    sub ESP, 12
    . . .
mysub endp
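Now, the local data can be accessed with expressions like [EBP - i], where i is the offset of the element in bytes. With three double words allocated, for example, they are located at [EBP - 4], [EBP - 8], and [EBP - 12]. After the procedure completes, you should execute the following commands: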
    . . .
    mov ESP, EBP
    pop EBP
    ret
    . . .
Allocating the stack memory is used in high-level programming languages when calling procedures (functions). For procedures in assembler, this method is less effective than, for example, using the data segment.
In most cases, especially when developing large programs, it is required to process the same data with several procedures or even individual programs. It would be very convenient to make these data available to several procedures. You might be wondering what point there is to use specific techniques for procedure interaction if data can be passed from one procedure to another as parameters. The examples above demonstrate this. However, when parameters are used for interaction between procedures, certain problems arise. Here are just a couple of them:
When the number of procedures that process the same data is relatively large, the performance of a procedure decreases. Suppose a procedure contains an array of numbers that is processed with a few procedures. Every time the procedure is called, the other programs have to compute the addresses of the elements of the array. If the stack is used, which is the case in most instances, it is necessary to access the stack to obtain the memory addresses of the array elements. When the procedure is called relatively seldom, this may present no problems. However, the performance of the application can decrease by increasing the complexity of the program structure and the data-processing algorithms.
When the same data is processed with different procedures, the structure of the program that uses the assembly procedures becomes harder to follow.
Using common, or global, variables allows you to work with them with the minimum use of the stack, which economizes the processor time. In addition, the fixed links to the common variables established at the link stage speed up access to them.
From now on, we will use the terms common variable and global variable synonymously. To work with common variables in assembly language, the public and extern directives are used. The public directive makes a variable or procedure accessible to other modules, and the extern directive indicates that the variable or procedure is defined in another module. Both directives are used for building the main program or a procedure from several object modules and are very convenient when building large programs.
Global variables are declared as follows:
In the object module that contains such a variable, specify the accessibility of the variable with the public directive.
In the object modules that access the common variable, declare it with the extern directive.
The following example demonstrates the technique of using common data in the work of two procedures. The first one (we will name it sub2) performs a subtraction on three integers: It subtracts the second and the third number from the first. The first two numbers are its input parameters, and the third number (named add2res) is external to the sub2 procedure and is located in another object module. The source code of the sub2 procedure is shown in Listing 3.9.
    . . .
extern add2res: DWORD
.code
sub2 proc
    push EBP
    mov EBP, ESP
    mov EAX, DWORD PTR [EBP+8]
    sub EAX, DWORD PTR [EBP+12]
    sub EAX, add2res
    pop EBP
    ret
sub2 endp
end
Note the line:
extern add2res: DWORD
The extern directive declares the add2res variable as an external double word which corresponds to an integer.
As usual, the result is put to the EAX register.
Where does the add2res variable come from? Its value is the result of the second procedure, a2, which computes the sum of the two integers passed to it as input parameters. The add2res variable is declared as a double word in the data segment and contains the sum of two integers. Since add2res should be available to the sub2 procedure from another object module, it should be declared as public. The source code of the a2 procedure is shown in Listing 3.10.
    . . .
public add2res
.data
add2res DD 0
.code
a2 proc
    push EBP
    mov EBP, ESP
    mov EAX, DWORD PTR [EBP+8]
    add EAX, DWORD PTR [EBP+12]
    mov add2res, EAX
    pop EBP
    ret
a2 endp
end
To work correctly, the main program should first call the a2 procedure and then the sub2 procedure.
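The public and extern directives have a close counterpart in C++, where a variable is defined in one source file and declared with the extern keyword in the others. The following sketch shows the same interaction in C++ under that analogy. Both functions are placed in one file to keep the example self-contained, and the call_in_order helper is a name introduced here for illustration.

```cpp
#include <cassert>

// Definition of the shared variable, as in the module that contains a2
// ("public add2res" plus "add2res DD 0" in the listing).
int add2res = 0;

// Declaration as another module would write it to reach the variable
// ("extern add2res: DWORD" in the listing). Repeating it in the same
// file is allowed and is done here only to show the syntax.
extern int add2res;

// Counterpart of a2: adds two integers and stores the sum in add2res.
int a2(int i1, int i2) {
    add2res = i1 + i2;
    return add2res;
}

// Counterpart of sub2: subtracts the second parameter and the shared
// variable from the first parameter.
int sub2(int i1, int i2) {
    return i1 - i2 - add2res;
}

// Mirrors the required call order: a2 first, then sub2.
int call_in_order(int a, int b, int c, int d) {
    int sum = a2(a, b);
    assert(add2res == sum);   // a2 has stored its result in the shared variable
    return sub2(c, d);
}
```

For example, call_in_order(2, 3, 20, 4) stores 5 in add2res and then returns 20 - 4 - 5, that is, 11. Calling sub2 before a2 would use whatever value add2res held at that moment, which is why the order of the calls matters.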
This example is very simple. It demonstrates the key aspects of using common variables. If you want to pass a procedure, a string, or array by using common variables, the task becomes complicated.
For example, suppose you want to display a particular symbol of a string in your main program. Develop two procedures interacting with the main program. The first one (name it rets) contains a null-terminated character string. Its only purpose is to pass the string to the other procedure (name it fchar). The fchar procedure uses this string to search for an element whose ordinal number is passed to it from the main program as a parameter. Below is a more detailed description of this procedure. Consider the rets procedure first. Its source code is shown in Listing 3.11.
public as1
.data
s1  DB 'STRING TO SEND'
    DB 0
as1 DD 0
.code
rets proc
    lea EAX, s1
    mov as1, EAX
    ret
rets endp
end
The s1 string is declared as a null-terminated string. As you know, a string can be passed by passing its address. Since the address is 32-bit, it is convenient to store it in the as1 double-word variable with the commands:
lea EAX, s1
mov as1, EAX
In addition, you should declare as1 with the public directive. The source code of the fchar procedure is shown in Listing 3.12.
    . . .
extern as1:DWORD
public fchar
.code
fchar proc
    push EBP
    mov EBP, ESP
    mov EDX, DWORD PTR [EBP+8]
    mov ESI, DWORD PTR as1
    add ESI, EDX
    xor EAX, EAX
    mov AL, BYTE PTR [ESI]
    pop EBP
    ret
fchar endp
end
To access the string from the fchar procedure, declare the variable that contains the string address with the extern directive:
extern as1:DWORD
Then the procedure searches for the element with the given number, and the result is returned in the EAX register, as usual.
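The same interaction can be sketched in C++ with a shared pointer variable that plays the role of as1. This is an illustration of the idea rather than a translation of the listings; both functions are kept in one file so that the example is self-contained, and the check that rets has been called is an addition that the assembly version does not contain.

```cpp
#include <cassert>

// The null-terminated string owned by the module that contains rets.
static const char s1[] = "STRING TO SEND";

// The shared variable that carries the address of the string.
const char* as1 = nullptr;

// Counterpart of rets: stores the address of the string in as1.
void rets() {
    as1 = s1;
}

// Counterpart of fchar: returns the character at the given position.
// Like the listing, it does not compare the position with the string
// length, so the caller is responsible for passing a valid position.
int fchar(unsigned position) {
    assert(as1 != nullptr);   // rets must have been called first
    return static_cast<unsigned char>(as1[position]);
}
```

After a call to rets, fchar(0) returns 'S' and fchar(7) returns 'T'. Position 14 refers to the terminating null character, so fchar(14) returns 0.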
In addition to common variables, assembly language allows you to use common (global) procedures shared by several modules. As with common variables, such procedures are declared as follows:
In the object module that contains a common procedure, specify the accessibility of the procedure with the public directive.
In the object modules that access the common procedure, declare it with the extern directive.
Here is an example. Suppose the main program displays the absolute value of the difference between two integers. The difference is computed with the subcom procedure, and the absolute value is computed with the abs procedure. The subcom procedure is used in the main program and is declared as public. The abs procedure is called from subcom and is declared as extern. The source code of the subcom procedure is shown in Listing 3.13.
    . . .
.model flat, C
public subcom
extern abs:proc
.code
subcom proc
    push EBP
    mov EBP, ESP
    mov EAX, DWORD PTR [EBP+8]
    sub EAX, DWORD PTR [EBP+12]
    push EAX
    call abs
    add ESP, 4
    pop EBP
    ret
subcom endp
end
This source code is simple. Note the following lines:
push EAX
call abs
add ESP, 4
The difference of the two numbers, held in the EAX register, is passed to the abs procedure via the stack. After the called procedure completes, the stack is cleaned up by the calling one, subcom in this case. The source code of the abs procedure is shown in Listing 3.14.
    . . .
public abs
.code
abs proc
    push EBP
    mov EBP, ESP
    mov EAX, DWORD PTR [EBP+8]
    cmp EAX, 0
    jge no_change
    neg EAX
no_change:
    pop EBP
    ret
abs endp
end
The abs procedure is declared as public, which makes it accessible in other modules.
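From the C++ side, a procedure assembled with the .model flat, C directive is normally declared with extern "C", which selects the C naming and calling conventions. The following sketch shows such a declaration for subcom. The definitions below it are stand-ins written in C++ so that the example builds without the OBJ files; in an actual project they would be omitted, and the linker would take the procedures from the assembled modules. The name abs_standin is hypothetical and is used instead of abs only to avoid a clash with the function of the same name in the C++ standard library.

```cpp
#include <cassert>
#include <climits>

// Declaration a C++ caller would use for the assembly procedure.
extern "C" int subcom(int i1, int i2);

// Stand-in for the abs procedure: returns the absolute value.
static int abs_standin(int value) {
    assert(value != INT_MIN);   // the most negative int has no positive counterpart
    return (value >= 0) ? value : -value;
}

// Stand-in for subcom: computes the difference and passes it on,
// as the listing does with push EAX / call abs / add ESP, 4.
extern "C" int subcom(int i1, int i2) {
    return abs_standin(i1 - i2);
}
```

A call such as subcom(3, 10) then returns 7. With the C convention, the calling code removes the parameters from the stack after the call, which matches the add ESP, 4 command that subcom itself executes after calling abs.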
This completes the discussion of the use of procedures written in the MASM 6.14 assembly language. As you can see from the given examples, the assembler offers many possibilities for the development of separate modules, and we will use them in the subsequent chapters.