Assembly Language Philosophy | Shellcoders Programming Uncovered (Uncovered series)

Assembly is a low-level language that operates with machine codes and machine concepts. Do not even try to find a command for displaying the "hello, world!" string; there is no such command here. I'll give a brief list of the actions that the processor is capable of carrying out: add, subtract, divide, multiply, or compare two numbers and, depending on the result of this operation, pass control to the appropriate program branch; send a number from one location to another; write a number to a port; or read a number from a port. Peripheral devices are controlled through the ports or through a special memory region (video memory, for example). To output a character to the terminal, it is necessary to first consult the technical documentation for the video adapter; to read a sector from the hard disk, consult the documentation supplied with that drive. Fortunately, this part of the job is delegated to hardware drivers, and the programmer doesn't have to carry it out manually. Furthermore, in normal operating systems, such as Windows NT, ports are not available from the application level.
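To make the list of primitive actions more concrete, here is a minimal sketch of a toy machine written in Python. It is not x86: the tuple-based instruction format, the run function, and the reduced opcode set (mov, add, sub, cmp, jne) are assumptions made purely for illustration. It only shows how moving numbers, doing arithmetic, comparing, and branching are enough to build a loop.

```python
def run(program, regs):
    """Execute a list of (opcode, operands...) tuples on a toy machine.

    Hypothetical instruction format, loosely modeled on Assembly mnemonics.
    """
    pc = 0             # index of the next instruction to execute
    zero_flag = False  # set by cmp, read by jne
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "mov":    # mov reg, immediate
            regs[args[0]] = args[1]
        elif op == "add":  # add reg, reg
            regs[args[0]] += regs[args[1]]
        elif op == "sub":  # sub reg, immediate
            regs[args[0]] -= args[1]
        elif op == "cmp":  # cmp reg, immediate
            zero_flag = regs[args[0]] == args[1]
        elif op == "jne":  # jump if the last comparison was "not equal"
            if not zero_flag:
                pc = args[0]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return regs


# Sum the numbers 5, 4, 3, 2, 1 using nothing but the primitives above.
program = [
    ("mov", "eax", 0),      # 0: running total
    ("mov", "ecx", 5),      # 1: loop counter
    ("add", "eax", "ecx"),  # 2: loop body starts here
    ("sub", "ecx", 1),      # 3
    ("cmp", "ecx", 0),      # 4
    ("jne", 2),             # 5: go back to instruction 2 while ecx != 0
]
result = run(program, {})
print(result)
```

Everything a high-level loop does is expressed here as a comparison followed by a conditional jump; there is no dedicated "loop" or "print" command in the instruction list.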

Another machine concept that needs to be mastered is the register. It is difficult to explain what a register is without sinning against the truth. The register is something that looks like a register but isn't one. In early computers, a register was a part of the data-processing device. The processor cannot add two numbers located in main memory; before carrying out this operation, it must load them into registers. This is the situation as it appears at the micro level. Above this level, there is the machine command interpreter, which no contemporary processor can do without. Yes, machine codes are interpreted. The PDP-11 didn't require the programmer to load the data into registers beforehand; it pretended that it was taking them directly from memory. In reality, the data were secretly loaded into internal registers. After the arithmetic operations were carried out, the result was written either to memory or into a "logical" register, which was actually a cell of fast memory.

In x86, registers are as virtual as they were on the PDP-11. However, in contrast to the PDP-11, they have partially retained their specialization. Some commands (mul, for example) work with a strictly defined set of registers that cannot be changed. This is the price of backward compatibility with previous versions. Another disappointing limitation is that x86 doesn't support "memory to memory" addressing: one of the numbers being processed must be loaded into a register or be an immediate value. Actually, 5 percent of an Assembly program is made up of data exchange commands.
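The two limitations just described can be modeled in a few lines of Python. This is a sketch under stated assumptions, not processor code: the regs dictionary, the memory dictionary, and the helper names mul32 and add_mem_to_mem are hypothetical. The first helper mimics the one-operand 32-bit form of mul, which always multiplies EAX by its operand and writes the 64-bit product to the EDX:EAX pair. The second shows why adding one memory cell to another takes an extra data exchange step through a register.

```python
MASK32 = 0xFFFFFFFF


def mul32(regs, operand):
    """Model of one-operand `mul`: EDX:EAX = EAX * operand (unsigned)."""
    product = (regs["eax"] & MASK32) * (operand & MASK32)
    regs["eax"] = product & MASK32          # low 32 bits of the product
    regs["edx"] = (product >> 32) & MASK32  # high 32 bits of the product


def add_mem_to_mem(memory, regs, dst, src):
    """memory[dst] += memory[src], staged through EAX.

    Corresponds roughly to:  mov eax, [src]  /  add [dst], eax
    """
    regs["eax"] = memory[src]
    memory[dst] = (memory[dst] + regs["eax"]) & MASK32


# mul silently overwrites EDX, whatever it held before.
regs = {"eax": 0x80000000, "edx": 0xDEADBEEF}
mul32(regs, 4)
mul_result = (regs["edx"], regs["eax"])

# Adding two memory cells needs a register in the middle.
memory = {0x1000: 7, 0x1004: 35}
add_mem_to_mem(memory, regs, 0x1000, 0x1004)
print(mul_result, memory[0x1000])
```

Note that the caller of mul32 has no say in where the result goes, which is exactly the kind of register specialization the text refers to.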

All these actions take place in an arena called the address space. The address space is simply the set of virtual memory cells available to the processor. Operating systems like Windows 9x and most UNIX clones create an individual 4-GB region of memory for each application, where it is possible to distinguish at least three areas: the code segment, the data segment, and the stack.

The stack is simply a method of storing data. It is something like a combination of a list and an array (see Donald Knuth's famous book The Art of Computer Programming). The push command places a new portion of data on top of the stack, and the pop command retrieves the contents of the stack top. This allows data to be stored in memory without the need to take care of their absolute addresses. It is convenient! Function calls are carried out in exactly this manner. The call func command pushes the address of the next command onto the stack, and ret pops it from the stack. The pointer to the current position of the stack top is stored in the ESP register. As for the stack bottom, only the length of the address space formally limits the stack. In practice, it is limited by the amount of memory allocated to it. The direction of stack growth is from higher addresses to lower ones. In other words, the stack grows from bottom to top.
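The stack mechanics described above can be sketched as a small Python model. The Cpu class, its field names, and the addresses used are assumptions chosen for illustration; the 5-byte instruction length corresponds to a near call with a 32-bit relative target. The sketch shows ESP moving toward lower addresses on push, and call/ret using the same stack to remember where to return.

```python
class Cpu:
    """Toy model of ESP, EIP, and a 32-bit stack (hypothetical helper)."""

    def __init__(self, esp, eip):
        self.esp = esp    # pointer to the current stack top
        self.eip = eip    # address of the command being executed
        self.memory = {}  # address -> 32-bit value

    def push(self, value):
        self.esp -= 4     # the stack grows toward lower addresses
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory[self.esp]
        self.esp += 4
        return value

    def call(self, target, instruction_length):
        # Save the address of the command that follows the call...
        self.push(self.eip + instruction_length)
        # ...and pass control to the function.
        self.eip = target

    def ret(self):
        self.eip = self.pop()


cpu = Cpu(esp=0x2000, eip=0x401000)
cpu.push(0x11111111)
esp_after_push = cpu.esp

cpu.call(0x402000, 5)      # a near call occupies 5 bytes
esp_inside_func = cpu.esp
eip_inside_func = cpu.eip

cpu.ret()
eip_after_ret = cpu.eip
value = cpu.pop()
print(hex(esp_after_push), hex(eip_after_ret), hex(value))
```

Neither push nor pop ever mentions an absolute address; the code only deals with "the top", which is the convenience the text points out.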

The EIP register contains the pointer to the next command to be executed. It is not available for direct modification. The EAX, EBX, ECX, EDX, ESI, EDI, and EBP registers are called general-purpose registers and can freely participate in any arithmetic operations or memory-access operations. There are seven such 32-bit registers in total. The first four registers (EAX, EBX, ECX, and EDX) can be accessed by their 16-bit halves storing the least significant words: AX, BX, CX, and DX, respectively. Each of these words, in turn, is divided into most significant and least significant bytes: AH/AL, BH/BL, CH/CL, and DH/DL, respectively. It is important to understand that AL, AX, and EAX are not three different registers but, on the contrary, three different parts of the same register.
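The point that AL, AX, and EAX are views of one register can be illustrated with bit masks. The Register32 class below is a hypothetical Python model, with the attribute names full, word, high, and low standing in for EAX, AX, AH, and AL. It is a sketch of the aliasing only, not of any real processor interface.

```python
class Register32:
    """One 32-bit register with 16-bit and 8-bit views onto the same bits."""

    def __init__(self, value=0):
        self.full = value & 0xFFFFFFFF  # plays the role of EAX

    @property
    def word(self):                     # AX: least significant 16 bits
        return self.full & 0xFFFF

    @word.setter
    def word(self, value):
        self.full = (self.full & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def low(self):                      # AL: least significant byte of AX
        return self.full & 0xFF

    @low.setter
    def low(self, value):
        self.full = (self.full & 0xFFFFFF00) | (value & 0xFF)

    @property
    def high(self):                     # AH: most significant byte of AX
        return (self.full >> 8) & 0xFF

    @high.setter
    def high(self, value):
        self.full = (self.full & 0xFFFF00FF) | ((value & 0xFF) << 8)


eax = Register32(0x12345678)
initial_views = (eax.word, eax.high, eax.low)

eax.low = 0xFF      # writing "AL" changes "EAX" as well
after_low = eax.full

eax.word = 0xBEEF   # writing "AX" leaves the upper 16 bits alone
after_word = eax.full
print(hex(after_low), hex(after_word))
```

Writing through any of the narrower views changes the value seen through the wider ones, because there is only one underlying 32-bit value.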

Furthermore, there are other registers: segment registers, multimedia registers, mathematical coprocessor registers, debug registers, etc. Without a comprehensive manual, beginners can easily become confused and get lost in this jungle. At the beginning, however, I won't dwell on them.