Finding a Maximum Value | Programming from the Ground Up

Enter the following program as maximum.s:

  #PURPOSE:  This program finds the maximum number of a  #          set of data items.  #  #VARIABLES: The registers have the following uses:  #  # %edi - Holds the index of the data item being examined  # %ebx - Largest data item found  # %eax - Current data item  #  # The following memory locations are used:  #  # data_items - contains the item data.  A 0 is used  #              to terminate the data  #  .section .data data_items:                       #These are the data items  .long 3,67,34,222,45,75,54,34,44,33,22,11,66,0  .section .text  .globl _start _start:  movl $0, %edi             # move 0 into the index register  movl data_items(,%edi,4), %eax # load the first byte of data  movl %eax, %ebx           # since this is the first item, %eax is                            # the biggest start_loop:                   # start loop  cmpl $0, %eax             # check to see if we've hit the end  je loop_exit  incl %edi                 # load next value  movl data_items(,%edi,4), %eax  cmpl %ebx, %eax           # compare values  jle start_loop            # jump to loop beginning if the new                            # one isn't bigger  movl %eax, %ebx           # move the value as the largest  jmp start_loop            # jump to loop beginning loop_exit:  # %ebx is the status code for the exit system call  # and it already has the maximum number         movl $1, %eax             #1 is the exit() syscall         int  $0x80

Now, assemble and link it with these commands:

 as maximum.s -o maximum.o ld maximum.o -o maximum

Now run it, and check its status.

 ./maximum echo $?

You'll notice it returns the value 222. Let's take a look at the program and what it does. If you look in the comments, you'll see that the program finds the maximum of a set of numbers (aren't comments wonderful!). You may also notice that in this program we actually have something in the data section. These lines are the data section:

 data_items:                       #These are the data items         .long 3,67,34,222,45,75,54,34,44,33,22,11,66,0

Lets look at this. data_items is a label that refers to the location that follows it. Then, there is a directive that starts with .long. That causes the assembler to reserve memory for the list of numbers that follow it. data_items refers to the location of the first one. Because data_items is a label, any time in our program where we need to refer to this address we can use the data_items symbol, and the assembler will substitute it with the address where the numbers start during assembly. For example, the instruction movl data_items, %eax would move the value 3 into %eax. There are several different types of memory locations other than .long that can be reserved. The main ones are as follows:

.byte

Bytes take up one storage location for each number. They are limited to numbers between 0 and 255.

.int

Ints (which differ from the int instruction) take up two storage locations for each number. These are limitted to numbers between 0 and 65535.^[9]

.long

Longs take up four storage locations. This is the same amount of space the registers use, which is why they are used in this program. They can hold numbers between 0 and 4294967295.

.ascii

The .ascii directive is to enter in characters into memory. Characters each take up one storage location (they are converted into bytes internally). So, if you gave the directive .ascii "Hello there\0", the assembler would reserve 12 storage locations (bytes). The first byte contains the numeric code for H, the second byte contains the numeric code for e, and so forth. The last character is represented by \0, and it is the terminating character (it will never display, it just tells other parts of the program that that's the end of the characters). Letters and numbers that start with a backslash represent characters that are not typeable on the keyboard or easily viewable on the screen. For example, \n refers to the "newline" character which causes the computer to start output on the next line and \t refers to the "tab" character. All of the letters in an .ascii directive should be in quotes.

In our example, the assembler reserves 14 .longs, one right after another. Since each long takes up 4 bytes, that means that the whole list takes up 56 bytes. These are the numbers we will be searching through to find the maximum. data_items is used by the assembler to refer to the address of the first of these values.

Take note that the last data item in the list is a zero. I decided to use a zero to tell my program that it has hit the end of the list. I could have done this other ways. I could have had the size of the list hard-coded into the program. Also, I could have put the length of the list as the first item, or in a separate location. I also could have made a symbol which marked the last location of the list items. No matter how I do it, I must have some method of determining the end of the list. The computer knows nothing - it can only do what it is told. It's not going to stop processing unless I give it some sort of signal. Otherwise it would continue processing past the end of the list into the data that follows it, and even to locations where we haven't put any data.

Notice that we don't have a .globl declaration for data_items. This is because we only refer to these locations within the program. No other file or program needs to know where they are located. This is in contrast to the _start symbol, which Linux needs to know where it is so that it knows where to begin the program's execution. It's not an error to write .globl data_items, it's just not necessary. Anyway, play around with this line and add your own numbers. Even though they are .long, the program will produce strange results if any number is greater than 255, because that's the largest allowed exit status. Also notice that if you move the 0 to earlier in the list, the rest get ignored. Remember that any time you change the source file, you have to re-assemble and re-link your program. Do this now and see the results.

All right, we've played with the data a little bit. Now let's look at the code. In the comments you will notice that we've marked some variables that we plan to use. A variable is a dedicated storage location used for a specific purpose, usually given a distinct name by the programmer. We talked about these in the previous section, but didn't give them a name. In this program, we have several variables:

a variable for the current maximum number found
a variable for which number of the list we are currently examining, called the index
a variable holding the current number being examined

In this case,we have few enough variables that we can hold them all in registers. In larger programs, you have to put them in memory, and then move them to registers when you are ready to use them. We will discuss how to do that later. When people start out programming, they usually underestimate the number of variables they will need. People are not used to having to think through every detail of a process, and therefore leave out needed variables in their first programming attempts.

In this program, we are using %ebx as the location of the largest item we've found. %edi is used as the index to the current data item we're looking at. Now, let's talk about what an index is. When we read the information from data_items, we will start with the first one (data item number 0), then go to the second one (data item number 1), then the third (data item number 2), and so on. The data item number is the index of data_items. You'll notice that the first instruction we give to the computer is:

  movl $0, %edi

Since we are using %edi as our index, and we want to start looking at the first item, we load %edi with 0. Now, the next instruction is tricky, but crucial to what we're doing. It says:

  movl data_items(,%edi, 4), %eax

Now to understand this line, you need to keep several things in mind:

data_items is the location number of the start of our number list.
Each number is stored across 4 storage locations (because we declared it using .long)
%edi is holding 0 at this point

So, basically what this line does is say, "start at the beginning of data_items, and take the first item number (because %edi is 0), and remember that each number takes up four storage locations." Then it stores that number in %eax. This is how you write indexed addressing mode instructions in assembly language. The instruction in a general form is this:

 movl  BEGINNINGADDRESS(,%INDEXREGISTER,WORDSIZE)

In our case data_items was our beginning address, %edi was our index register, and 4 was our word size. This topic is discussed further in the Section called Addressing Modes.

If you look at the numbers in data_items, you will see that the number 3 is now in %eax. If %edi was set to 1, the number 67 would be in %eax, and if it was set to 2, the number 34 would be in %eax, and so forth. Very strange things would happen if we used a number other than 4 as the size of our storage locations.^[10] The way you write this is very awkward, but if you know what each piece does, it's not too difficult. For more information about this, see the Section called Addressing Modes

Let's look at the next line:

  movl %eax, %ebx

We have the first item to look at stored in %eax. Since it is the first item, we know it's the biggest one we've looked at. We store it in %ebx, since that's where we are keeping the largest number found. Also, even though movl stands for move, it actually copies the value, so %eax and %ebx both contain the starting value.^[11]

Now we move into a loop. A loop is a segment of your program that might run more than once. We have marked the starting location of the loop in the symbol start_loop. The reason we are doing a loop is because we don't know how many data items we have to process, but the procedure will be the same no matter how many there are. We don't want to have to rewrite our program for every list length possible. In fact, we don't even want to have to write out code for a comparison for every list item. Therefore, we have a single section of code (a loop) that we execute over and over again for every element in data_items.

In the previous section, we outlined what this loop needed to do. Let's review:

Check to see if the current value being looked at is zero. If so, that means we are at the end of our data and should exit the loop.
We have to load the next value of our list.
We have to see if the next value is bigger than our current biggest value.
If it is, we have to copy it to the location we are holding the largest value in.
Now we need to go back to the beginning of the loop.

Okay, so now lets go to the code. We have the beginning of the loop marked with start_loop. That is so we know where to go back to at the end of our loop. Then we have these instructions:

  cmpl $0, %eax  je loop_exit

The cmpl instruction compares the two values. Here, we are comparing the number 0 to the number stored in %eax This compare instruction also affects a register not mentioned here, the %eflags register. This is also known as the status register, and has many uses which we will discuss later. Just be aware that the result of the comparison is stored in the status register. The next line is a flow control instruction which says to jump to the loop_exit location if the values that were just compared are equal (that's what the e of je means). It uses the status register to hold the value of the last comparison. We used je, but there are many jump statements that you can use:

Jump if the values were equal

Jump if the second value was greater than the first value^[12]

jge

Jump if the second value was greater than or equal to the first value

Jump if the second value was less than the first value

jle

Jump if the second value was less than or equal to the first value

jmp

Jump no matter what. This does not need to be preceeded by a comparison.

The complete list is documented in Appendix B. In this case, we are jumping if %eax holds the value of zero. If so, we are done and we go to loop_exit.^[13]

If the last loaded element was not zero, we go on to the next instructions:

  incl %edi  movl data_items(,%edi, 4), %eax

If you remember from our previous discussion, %edi contains the index to our list of values in data_items.incl increments the value of %edi by one. Then the movl is just like the one we did beforehand. However, since we already incremented %edi, %eax is getting the next value from the list. Now %eax has the next value to be tested. So, let's test it!

  cmpl %ebx, %eax  jle start_loop

Here we compare our current value, stored in %eax to our biggest value so far, stored in %ebx. If the current value is less or equal to our biggest value so far, we don't care about it, so we just jump back to the beginning of the loop. Otherwise, we need to record that value as the largest one:

  movl %eax, %ebx  jmp start_loop

which moves the current value into %ebx, which we are using to store the current largest value, and starts the loop over again.

Okay, so the loop executes until it reaches a 0, when it jumps to loop_exit. This part of the program calls the Linux kernel to exit. If you remember from the last program, when you call the operating system (remember it's like signaling Batman), you store the system call number in %eax (1 for the exit call), and store the other values in the other registers. The exit call requires that we put our exit status in %ebx We already have the exit status there since we are using %ebx as our largest number, so all we have to do is load %eax with the number one and call the kernel to exit. Like this:

  movl $1, %eax  int $0x80

Okay, that was a lot of work and explanation, especially for such a small program. But hey, you're learning a lot! Now, read through the whole program again, paying special attention to the comments. Make sure that you understand what is going on at each line. If you don't understand a line, go back through this section and figure out what the line means.

You might also grab a piece of paper, and go through the program step-by-step, recording every change to every register, so you can see more clearly what is going on.

^[9]Note that no numbers in assembly language (or any other computer language I've seen) have commas embedded in them. So, always write numbers like 65535, and never like 65,535.

^[10]The instruction doesn't really use 4 for the size of the storage locations, although looking at it that way works for our purposes now. It's actually what's called a multiplier. basically, the way it works is that you start at the location specified by data_items, then you add %edi*4 storage locations, and retrieve the number there. Usually, you use the size of the numbers as your multiplier, but in some circumstances you'll want to do other things.

^[11]Also, the 1 in movl stands for move long since we are moving a value that takes up four storage locations.

^[12]notice that the comparison is to see if the second value is greater than the first. I would have thought it the other way around. You will find a lot of things like this when learning programming. It occurs because different things make sense to different people. Anyway, you'll just have to memorize such things and go on.

^[13]The names of these symbols can be anything you want them to be, as long as they only contain letters and the underscore character(_). The only one that is forced is _start, and possibly others that you declare with .globl. However, if it is a symbol you define and only you use, feel free to call it anything you want that is adequately descriptive (remember that others will have to modify your code later, and will have to figure out what your symbols mean).