Section 2.2. Assembly


2.2. Assembly

Linux is an operating system. As such, sections of it are closely bound to the processor on which it is running. The Linux authors have done a great job of keeping the processor- (or architecture-) specific code to a minimum, striving for the maximum reuse of code across all the supported architectures. In this section, we look at the following:

  • How the same C function is implemented in x86 and PowerPC architectures.

  • The use of macros and inline assembly code.

This section's goal is to cover enough of the basics so you can trace through the architecture-specific kernel code having enough understanding so as not to get lost. We leave advanced assembly-language programming to other books. We also cover some of the trickiest architecture-specific code: inline assembler.

To discuss freely PPC and x86 assembly languages, let's look at the architectures of each processor.

2.2.1. PowerPC

The PowerPC is a Reduced Instruction Set Computing (RISC) architecture. The goal of RISC architecture is to improve performance by having a simple instruction set that executes in as few processor cycles as possible. Because they take advantage of the parallel instruction (superscalar) attributes of the hardware, some of these instructions, as we soon see, are far from simple. IBM, Motorola, and Apple jointly defined the PowerPC architecture. Table 2.1 lists the user set of registers for the PowerPC.

Table 2.1. PowerPC User Register Set

Register Name

Width for Arch.

Function

Number of Regs

 

32 Bit

64 Bit

  

CR

32

32

Condition register

1

LR

32

64

Link register

1

CTR

32

64

Count register

1

GPR[0..31]

32

64

General-purpose register

32

XER

32

64

Fixed-point exception register

1

FPR[0..31]

64

64

Floating-point register

32

FPSCR

32

64

Floating-point status control register

1


Table 2.2 illustrates the Application Binary Interface usage of the general and floating-point registers. Volatile registers are for use any time, dedicated registers have specific assigned uses, and non-volatile registers can be used but must be preserved across function calls.

Table 2.2. ABI Register Usage

Register

Type

Use

r0

Volatile

Prologue/epilogue, language specific

r1

Dedicated

Stack pointer

r2

Dedicated

TOC

r3-r4

Volatile

Parameter passing, in/out

r5-r10

Volatile

Parameter passing

r11

Volatile

Environment pointer

r12

Volatile

Exception handling

r13

Non-volatile

Must be preserved across calls

r14-r31

Non-volatile

Must be preserved across calls

f0

Volatile

Scratch

f1

Volatile

1st FP parm, 1st FP scalar return

f2-f4

Volatile

2nd4th FP parm, FP scalar return

f5-f13

Volatile

5th13th FP parm

f14-f31

Non-volatile

Must be preserved across calls


Application Binary Interface (ABI)

An ABI is a set of conventions that allows a linker to combine separately compiled modules into one unit without recompilation, such as calling conventions, machine interface, and operating-system interface. Among other things, an ABI defines the binary interface between these units. Several PowerPC ABI variations are in existence. They are often related to the target operating system and/or hardware. These variations or supplements are documents based on the UNIX System V Application Binary Interface, originally from AT&T and later from the Santa Cruz Operation (SCO). The benefits of conforming to an ABI are that it allows linking object files compiled by different compilers.


The 32-bit PowerPC architecture uses instructions that are 4 bytes long and word aligned. It operates on byte, half-word, word, and double-word accesses. Instructions are categorized into branch, fixed-point, and floating-point.

2.2.1.1. Branch Instructions

The condition register (CR) is integral to all branch operations. It is broken down into eight 4-bit fields that can be set explicitly by a move instruction, implicitly, as the result of an instruction, or most common, as the result of a compare instruction.

The link register (LR) is used by certain forms of the branch instruction to provide the target address and the return address after a branch.

The count register (CTR) holds a loop count decremented by specific branch instructions. The CTR can also hold the target address for certain branch instructions.

In addition to the CTR and LR above, PowerPC branch instructions can jump to a relative or absolute address. Using Extended Mnemonics, there are many forms of conditional branches along with the unconditional branch.

2.2.1.2. Fixed-Point Instructions

The PPC has no computational instructions that modify storage. All work must be brought into one or more of the 32 general-purpose registers (GPRs). Storage access instructions access byte, half-word, word, and double-word data in Big Endian ordering. With Extended Mnemonics, there are many load, store, arithmetic, and logical fixed-point instructions, as well as special instructions to move to/from system registers.

2.2.1.3. Floating-Point Instructions

Floating-point instructions can be broken down into two categories: computational, which includes arithmetic, rounding, conversion, and comparison; and non-computational, which includes move to/from storage or another register. There are 32 general-purpose floating-point registers; each can contain data in double-precision floating-point format.

Big Endian/Little Endian

In processor architecture, Endianness refers to byte ordering and operations. The PowerPC is said to be Big Endian, that is, the most significant byte is at the lower address and the least significant byte is 3 bytes later (for 32-bit words). Little Endian, adopted by the x86 architecture, is just the opposite. The least-significant byte is at the lower address and the most significant is 3 bytes later. Let's examine the representation of 0x12345678 (see Figure 2.5):

Figure 2.5. Big and Little Endian Byte Ordering


Discussion on which system is better is beyond the scope of this book, but it is important to know which system you are working with when writing and debugging code. An example pitfall to Endianness is writing a device driver using one architecture for a PCI device based on the other.

The terms Big Endian and Little Endian originate from Jonathan Swift's Gulliver's Travels. In the story, Gulliver comes to find two nations at war over which way to eat a boiled eggfrom the big end or the little end.


2.2.2. x86

The x86 architecture is a Complex Instruction Set Computing (CISC) architecture. Instructions are variable length, depending on their function. Three kinds of registers exist in the Pentium class x86 architecture: general purpose, segment, and status/control. The basic user set is as follows.

Here are the eight general-purpose registers and their conventional uses:

  • EAX. General purpose accumulator

  • EBX. Pointer to data

  • ECX. Counter for loop operations

  • EDX. I/O pointer

  • ESI. Pointer to data in DS segment

  • EDI. Pointer to data in ES segment

  • ESP. Stack pointer

  • EBP. Pointer to data on the stack

These six segment registers are used in real mode addressing where memory is accessed in blocks. A given byte of memory is then referenced by an offset from this segment (for example, ES:EDI references memory in the ES (extra segment) with an offset of the value in the EDI):

  • CS. Code segment

  • SS. Stack segment

  • ES, DS, FS, GS. Data segment

The EFLAGS register indicates processor status after each instruction. This can hold results such as zero, overflow, or carry. The EIP is a dedicated pointer register that indicates an offset to the current instruction to the processor. This is generally used with the code segment register to form a complete address (for example, CS:EIP):

  • EFLAGS. Status, control, and system flags

  • EIP. The instruction pointer, contains an offset from CS

Data ordering in x86 architecture is in Little Endian. Memory access is in byte (8 bit), word (16 bit), double word (32 bit), and quad word (64 bit). Address translation (and its associated registers) is discussed in Chapter 4, but for this section, it should be enough to know the usual registers for code and data instructions in the x86 architecture can be broken down into three categories: control, arithmetic, and data.

2.2.2.1. Control Instructions

Control instructions, similar to branch instructions in PPC, alter program flow. The x86 architecture uses various "jump" instructions and labels to selectively execute code based on the values in the EFLAGS register. Although many variations exist, Table 2.3 has some of the most common uses. The condition codes are set according to the outcome of certain instructions. For example, when the cmp (compare) instruction evaluates two integer operands, it modifies the following flags in the EFLAGS register: OF (overflow), SF (sine flag), ZF (zero flag), PF (parity flag), and CF (carry flag). Thus, if the cmp instruction evaluated two equal operands, the zero flag would be set.

Table 2.3. Common Forms of the Jump Instruction

Instruction

Function

EFLAGS Condition Codes

je

Jump if equal

ZF=1

jg

Jump if greater

ZF=0 and SF=OF

jge

Jump if greater or equal

SF=OF

jl

Jump if less

SF!=OF

jle

Jump if less or equal

ZF=1

jmp

Unconditional jump

unconditional


In x86 assembly code, labels consist of a unique name followed by a colon. Labels can be used anywhere in an assembly program and have the same address as the line of code immediately following it. The following code uses a conditional jump and a label:

 ----------------------------------------------------------------------- 100   pop eax 101 loop2: 102   pop ebx 103   cmp eax, ebx 104   jge loop2 ----------------------------------------------------------------------- 

Line 100

Get the value from the top of the stack and put it in eax.

Line 101

This is the label named loop2.

Line 102

Get the value from the top of the stack and put it in ebx.

Line 103

Compare the values in eax and ebx.

Line 104

Jump if eax is greater than or equal to ebx.

Another method of transferring program control is with the call and ret instructions. Referring to the following line of assembly code:

 -----------------------------------------------------------------------    call my_routine ----------------------------------------------------------------------- 

The call instruction transfers program control to the label my_routine, while pushing the address of the instruction immediately following the call instruction on the stack. The ret instruction (executed from within my_routine) then pops the return address and jumps to that location.

2.2.2.2. Arithmetic Instructions

Popular arithmetic instructions include add, sub, imul (integer multiply), idiv (integer divide), and the logical operators and, or, not, and xor.

x86 floating-point instructions and their associated registers move beyond the scope of this book. Recent extensions to Intel and AMD architectures, such as MMX, SSE, 3DNow, SIMD, and SSE2/3, greatly enhance math-intensive applications, such as graphics and audio. You are directed to the programming manuals for their respective architectures.

2.2.2.3. Data Instructions

Data can be moved between registers, between registers and memory, and from a constant to a register or memory, but not from one memory location to another. Examples of these are as follows:

 ----------------------------------------------------------------------- 100  mov eax,ebx 101  mov eax,WORD PTR[data3] 102  mov BYTE PTR[char1],al 103  mov eax,0xbeef 104  mov WORD PTR [my_data],0xbeef ----------------------------------------------------------------------- 

Line 100

Move 32 bits of data from ebx to eax.

Line 101

Move 32 bits of data from memory variable data3 to eax.

Line 102

Move 8 bits of data from memory variable char1 to al.

Line 103

Move the constant value 0xbeef to eax.

Line 104

Move the constant value 0xbeef to the memory variable my_data.

As seen in previous examples, push, pop, and the long versions pushl and popl move data to and from the stack (pointed to by SS:ESP). Similar to the mov instruction, the push and pop operations can be used with registers, data, and constants.




The Linux Kernel Primer. A Top-Down Approach for x86 and PowerPC Architectures
The Linux Kernel Primer. A Top-Down Approach for x86 and PowerPC Architectures
ISBN: 131181637
EAN: N/A
Year: 2005
Pages: 134

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net