2.4. Inline Assembly

The gcc compiler also supports inline assembly code. As its name implies, inline assembly does not require a call to a separately compiled assembler program. By using certain constructs, we can tell the compiler that code blocks are to be assembled rather than compiled. Although this makes for an architecture-specific file, the readability and efficiency of a C function can be greatly increased.

Here is the inline assembler construct:

 -----------------------------------------------------------------------
 1  asm (assembler instruction(s)
 2   : output operands   (optional)
 3   : input operands   (optional)
 4   : clobbered registers  (optional)
 5  );
 -----------------------------------------------------------------------

For example, in its most basic form,

 asm ("movl %eax, %ebx");

could also be written as

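 asm ("movl %%eax, %%ebx" :::);

Once the colons are present, gcc treats a single % as the start of an operand reference, so register names in this form are written with %%. In either form, we would be lying to the compiler because we are indeed clobbering ebx and the clobber list does not say so. Read on.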

What makes this form of inline assembly so versatile is the ability to take in C expressions, modify them, and return them to the program, all the while making sure that the compiler is aware of our changes. Let's further explore the passing of parameters.

2.4.1. Output Operands

On line 2, following the colon, the output operands are a comma-separated list. Each entry is a "constraint" followed by a C expression in parentheses. For output operands, the constraint usually has the = modifier, which indicates that the operand is write-only. The & modifier marks an "early clobber" operand: one that is written before the instructions have finished reading their input operands, so the compiler must not place it in a register that also holds an input.
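The two modifiers can be seen together in a minimal sketch. It assumes gcc and an x86 target (a plain C fallback is used elsewhere), and the function name add_early_clobber is hypothetical. The first instruction writes the output before the second one has read its input, so the output is marked =&r to keep gcc from placing it in the same register as an input.

```c
/* Hypothetical helper: returns a + b using two x86 instructions. */
int add_early_clobber(int a, int b)
{
    int sum;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"   /* sum = a: the output is written here ...   */
             "addl %2, %0"       /* ... before input b is read here           */
             : "=&r" (sum)       /* = write-only, & early clobber, r register */
             : "r" (a), "r" (b)
             : "cc");            /* addl changes the condition codes          */
#else
    sum = a + b;                 /* portable fallback for non-x86 builds      */
#endif
    return sum;
}
```

Without the & modifier, gcc would be free to assign sum and b to the same register, and the movl would then destroy b before the addl used it.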

2.4.2. Input Operands

The input operands on line 3 follow the same syntax as the output operands except for the write-only modifier.
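As a small illustration of the difference, the following sketch pairs one output with one input. It assumes gcc on x86 and falls back to plain C elsewhere; the function name negate is hypothetical.

```c
/* Hypothetical helper: returns -value. */
int negate(int value)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"   /* copy the input into the output register  */
             "negl %0"           /* result = -result                         */
             : "=r" (result)     /* output: write-only, hence the = modifier */
             : "r" (value)       /* input: only read, so no = modifier       */
             : "cc");            /* negl changes the condition codes         */
#else
    result = -value;             /* portable fallback for non-x86 builds     */
#endif
    return result;
}
```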

2.4.3. Clobbered Registers (or Clobber List)

In our assembly statements, we can modify various registers and memory. For gcc to know that these items have been modified, we list them here.
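A minimal sketch of an honest clobber list follows, assuming gcc on x86 (with a plain C fallback elsewhere). The function name times_three is hypothetical. The code uses eax as a scratch register by name, so eax is listed as clobbered and gcc will not assign either operand to it.

```c
/* Hypothetical helper: returns 3 * x, using eax as a named scratch register. */
int times_three(int x)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %%eax\n\t"      /* eax = x                       */
             "addl %%eax, %%eax\n\t"   /* eax = 2 * x                   */
             "addl %1, %%eax\n\t"      /* eax = 3 * x                   */
             "movl %%eax, %0"          /* result = eax                  */
             : "=r" (result)
             : "r" (x)
             : "eax", "cc");           /* we overwrote eax and the flags */
#else
    result = 3 * x;                    /* portable fallback for non-x86 builds */
#endif
    return result;
}
```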

2.4.4. Parameter Numbering

Each parameter is given a positional number starting with 0. For example, if we have one output parameter and two input parameters, %0 references the output parameter and %1 and %2 reference the input parameters.
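The numbering can be sketched with exactly that shape: one output and two inputs. The example assumes gcc on x86 and uses a plain C fallback elsewhere; the function name subtract is hypothetical. Subtraction is used because swapping %1 and %2 would change the result, which makes the positions easy to check.

```c
/* Hypothetical helper: returns minuend - subtrahend. */
int subtract(int minuend, int subtrahend)
{
    int diff;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("movl %1, %0\n\t"      /* %0 = %1 (diff = minuend)           */
             "subl %2, %0"          /* %0 -= %2 (diff -= subtrahend)      */
             : "=&r" (diff)         /* %0: the only output                */
             : "r" (minuend),       /* %1: first input                    */
               "r" (subtrahend)     /* %2: second input                   */
             : "cc");
#else
    diff = minuend - subtrahend;    /* portable fallback for non-x86 builds */
#endif
    return diff;
}
```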

2.4.5. Constraints

Constraints indicate how an operand can be used. The GNU documentation has the complete listing of simple constraints and machine constraints. Table 2.4 lists the most common constraints for the x86.

Table 2.4. Simple and Machine Constraints for x86
Constraint	Function
`a`	`eax` register.
`b`	`ebx` register.
`c`	`ecx` register.
`d`	`edx` register.
`S`	`esi` register.
`D`	`edi` register.
`I`	Constant value (0…31).
`q`	Dynamically allocates a register from `eax`, `ebx`, `ecx`, `edx`.
`r`	Same as `q` + `esi`, `edi`.
`m`	Memory location.
`A`	Same as `a` + `d`. `eax` and `edx` are allocated together to form a 64-bit register.
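To show a few of these constraints working together, here is a sketch that assumes gcc on x86 (with a plain C fallback elsewhere); the function name mul_wide is hypothetical. The x86 mull instruction multiplies eax by its operand and leaves the 64-bit product in edx:eax, so the operands are pinned to those registers with the a and d constraints.

```c
#include <stdint.h>

/* Hypothetical helper: full 32 x 32 -> 64-bit unsigned multiply. */
uint64_t mul_wide(uint32_t x, uint32_t y)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ ("mull %3"        /* edx:eax = eax * y                   */
             : "=a" (lo),     /* %0: low half comes back in eax      */
               "=d" (hi)      /* %1: high half comes back in edx     */
             : "a" (x),       /* %2: x must start out in eax         */
               "r" (y)        /* %3: y in any general register       */
             : "cc");
    return ((uint64_t)hi << 32) | lo;
#else
    return (uint64_t)x * y;   /* portable fallback for non-x86 builds */
#endif
}
```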

2.4.6. asm

In practice (especially in the Linux kernel), the keyword asm might cause errors at compile time because of other constructs of the same name. You often see this expression written as __asm__, which has the same meaning.

2.4.7. volatile

Another commonly used modifier is __volatile__. This modifier is important to assembly code. It tells the compiler not to optimize the inline assembly routine. Often, with hardware-level software, the compiler thinks we are being redundant and wasteful and attempts to rewrite our code to be as efficient as possible. This is useful for application-level programming, but at the hardware level, it can be counterproductive.

For example, say we are writing to a memory-mapped register represented by the reg variable. Next, we initiate some action that requires us to poll reg. The compiler simply sees this as consecutive reads of the same memory location and eliminates the apparent redundancy. When reg is declared with __volatile__, the compiler knows not to optimize away accesses to this variable. Likewise, when you see asm volatile (...) in a block of inline assembler code, the compiler should not optimize this block.
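The polling case can be sketched in plain C. This is only an illustration: the function name poll_until_set and its parameters are hypothetical, and in a test an ordinary variable can stand in for the memory-mapped register.

```c
/*
 * Hypothetical helper: polls *reg until any bit in mask is set.
 * Returns 1 if a bit was seen within max_polls reads, 0 otherwise.
 */
int poll_until_set(volatile unsigned int *reg, unsigned int mask,
                   unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        /* volatile forces a fresh load from memory on every iteration */
        if (*reg & mask)
            return 1;
    }
    return 0;
}
```

If the pointer were not qualified as volatile, the compiler would be entitled to read *reg once and reuse that value for every pass through the loop.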

Now that we have the basics of assembly and gcc inline assembly, we can turn our attention to some actual inline assembly code. Using what we just learned, we first explore a simple example and then a slightly more complex code block.

Here's the first code example in which we pass variables to an inline block of code:

 -----------------------------------------------------------------------
 6  int foo(void)
 7  {
 8  int ee = 0x4000, ce = 0x8000, reg;
 9  __asm__ __volatile__("movl %1, %%eax\n\t"
 10   "movl %2, %%ebx\n\t"
 11   "call setbits\n\t"
 12   "movl %%eax, %0"
 13   : "=r" (reg)   // reg [param %0] is output
 14   : "r" (ce), "r"(ee)  // ce [param %1], ee [param %2] are inputs
 15   : "%eax" , "%ebx"   // %eax and %ebx got clobbered
 16   );
 17  printf("reg=%x",reg);
 18  return reg;
 19 }
 -----------------------------------------------------------------------

Line 6

This line is the beginning of the C routine.

Line 8

ee, ce, and reg are local variables that will be passed as parameters to the inline assembler.

Line 9

This line is the beginning of the inline assembler routine. Move ce into eax.

Line 10

Move ee into ebx.

Line 11

Call some function from assembler.

Line 12

Return value in eax, and copy it to reg.

Line 13

This line holds the output parameter list. The parm reg is write only.

Line 14

This line is the input parameter list. The parms ce and ee are register variables.

Line 15

This line is the clobber list. The registers eax and ebx are changed by this routine, so the compiler knows not to rely on any values it had placed in them before the routine.

Line 16

This line marks the end of the inline assembler routine.
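The listing cannot be run as is because setbits is not shown. The following sketch is a self-contained variant under a loud assumption: it replaces the call with a single orl instruction, purely as a stand-in, since the real behavior of setbits is unknown. The name foo_sketch is hypothetical, and a plain C fallback is used on non-x86 targets.

```c
#include <stdio.h>

/* Hypothetical variant of foo() with "call setbits" replaced by an OR. */
int foo_sketch(void)
{
    int ee = 0x4000, ce = 0x8000, reg;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("movl %1, %%eax\n\t"     /* eax = ce                      */
                         "movl %2, %%ebx\n\t"     /* ebx = ee                      */
                         "orl %%ebx, %%eax\n\t"   /* ASSUMED stand-in for setbits  */
                         "movl %%eax, %0"         /* reg = eax                     */
                         : "=r" (reg)             /* reg [param %0] is output      */
                         : "r" (ce), "r" (ee)     /* ce [%1], ee [%2] are inputs   */
                         : "eax", "ebx", "cc");   /* registers and flags we changed */
#else
    reg = ce | ee;                                /* portable fallback             */
#endif
    printf("reg=%x\n", reg);
    return reg;
}
```

With the OR stand-in, the two inputs combine to 0xc000, which is what the function prints and returns.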

This second example uses the switch_to() function from include/asm-i386/system.h. This function is the heart of the Linux context switch. We explore only the mechanics of its inline assembly in this chapter. Chapter 9, "Building the Linux Kernel," covers how switch_to() is used:

 -----------------------------------------------------------------------
 include/asm-i386/system.h
 012  extern struct task_struct * FASTCALL(__switch_to(struct task_struct *prev, struct task_struct *next));
 ...
 015  #define switch_to(prev,next,last) do {      \
 016   unsigned long esi,edi;        \
 017   asm volatile("pushfl\n\t"        \
 018   "pushl %%ebp\n\t"         \
 019   "movl %%esp,%0\n\t"  /* save ESP */     \
 020   "movl %5,%%esp\n\t"  /* restore ESP */     \
 021   "movl $1f,%1\n\t"   /* save EIP */    \
 022   "pushl %6\n\t"   /* restore EIP */    \
 023   "jmp __switch_to\n"         \
 023   "1:\t"           \
 024   "popl %%ebp\n\t"         \
 025   "popfl"          \
 026   :"=m" (prev->thread.esp),"=m" (prev->thread.eip),   \
 027   "=a" (last),"=S" (esi),"=D" (edi)      \
 028   :"m" (next->thread.esp),"m" (next->thread.eip),   \
 029   "2" (prev), "d" (next));        \
 030  } while (0)
 -----------------------------------------------------------------------

Line 12

FASTCALL tells the compiler to pass parameters in registers.

The asmlinkage tag tells the compiler to pass parameters on the stack.

Line 15

do { statements...} while(0) is a coding method to allow a macro to appear more like a function to the compiler. In this case, it allows the use of local variables.
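The idiom is easy to try outside the kernel. The following is a minimal sketch in plain C; the macro SWAP_INT and the function sort_pair are hypothetical names chosen for illustration.

```c
/* Hypothetical macro: swaps two int lvalues using a local temporary. */
#define SWAP_INT(a, b) do {     \
        int tmp_ = (a);         \
        (a) = (b);              \
        (b) = tmp_;             \
} while (0)

/* Orders the pair so that *lo <= *hi. Returns 1 if a swap was needed. */
int sort_pair(int *lo, int *hi)
{
    if (*lo > *hi)
        SWAP_INT(*lo, *hi);   /* expands to one statement ending in ; */
    else
        return 0;             /* the else still pairs with the if     */
    return 1;
}
```

Had the macro been written with bare braces instead, the semicolon after the macro call would end the if statement and the else would fail to compile.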

Line 16

Don't be confused; these are just local variable names.

Line 17

This is the inline assembler; do not optimize.

Line 23

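The local label 1: is the return address. Line 21 stored its address ($1f) in parameter %1 (prev->thread.eip), so a task that is later switched back in resumes here.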

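Lines 17-24

\n\t has to do with the compiler/assembler interface. gcc hands the instruction string to the assembler as is, and each assembler instruction should be on its own line. The \n ends the current instruction, and the \t indents the next one.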

Line 26

prev->thread.esp and prev->thread.eip are the output parameters:

 [%0]= (prev->thread.esp), is write-only memory
 [%1]= (prev->thread.eip), is write-only memory

Line 27

The remaining output parameters are bound to specific registers:

 [%2]= (last), is write-only to register eax
 [%3]= (esi), is write-only to register esi
 [%4]= (edi), is write-only to register edi

Line 28

Here are the input parameters:

 [%5]= (next->thread.esp), is memory
 [%6]= (next->thread.eip), is memory

Line 29

The remaining input parameters are:

 [%7]= (prev), reuses parameter "2" (register eax) as an input
 [%8]= (next), is an input assigned to register edx

Note that there is no clobber list.
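The digit constraint used for prev on line 29 can be tried in isolation. This sketch assumes gcc on x86 (with a plain C fallback elsewhere); the function name increment is hypothetical. The input constraint "0" tells gcc that the input must live in the same place as parameter %0, which is how a value is passed in and out through one register.

```c
/* Hypothetical helper: returns value + 1. */
int increment(int value)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("incl %0"
             : "=r" (value)   /* %0: output, any general register      */
             : "0" (value)    /* %1: input, same location as %0        */
             : "cc");         /* incl changes the condition codes      */
#else
    value += 1;               /* portable fallback for non-x86 builds  */
#endif
    return value;
}
```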

The inline assembler for PowerPC is nearly identical in construct to x86. The simple constraints, such as "m" and "r," are used along with a PowerPC set of machine constraints. Here is a routine to exchange a 32-bit pointer. Note how similar the inline assembler syntax is to x86:

 -----------------------------------------------------------------------
 include/asm-ppc/system.h
 103  static __inline__ unsigned long
 104  xchg_u32(volatile void *p, unsigned long val)
 105  {
 106   unsigned long prev;
 107
 108   __asm__ __volatile__ ("\n\
 109  1:  lwarx  %0,0,%2 \n"
 110
 111  "  stwcx.  %3,0,%2 \n\
 112   bne-  1b"
 113   : "=&r" (prev), "=m" (*(volatile unsigned long *)p)
 114   : "r" (p), "r" (val), "m" (*(volatile unsigned long *)p)
 115   : "cc", "memory");
 116
 117   return prev;
 118  }
 -----------------------------------------------------------------------

Line 103

This subroutine is expanded in place; it will not be called.

Line 104

Routine name with parameters p and val.

Line 106

This is the local variable prev.

Line 108

This is the inline assembler. Do not optimize.

Lines 109-111

lwarx and stwcx. together form an "atomic swap." lwarx loads a word from memory and "reserves" the address for a subsequent store by stwcx., which succeeds only if the reservation is still intact.

Line 112

Branch if not equal to label 1 (b = backward). The branch is taken when the stwcx. store did not succeed, which retries the swap.

Line 113

Here are the output operands:

 [%0]= (prev), write-only, early clobber
 [%1]= (*(volatile unsigned long *)p), write-only memory operand

Line 114

Here are the input operands:

 [%2]= (p), register operand
 [%3]= (val), register operand
 [%4]= (*(volatile unsigned long *)p), memory operand

Line 115

Here is the clobber list. Clobbers are not numbered parameters and cannot be referenced from the assembler instructions:

 "cc": the condition code register is altered
 "memory": memory is clobbered
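For comparison, an x86 exchange can be sketched with the same operand layout. This is not the kernel's x86 code; it is an illustration that assumes gcc, and the function name exchange_u32 is hypothetical. On x86, xchgl with a memory operand is atomic by itself, so no retry loop is needed. On other targets the sketch falls back to gcc's __atomic_exchange_n builtin.

```c
#include <stdint.h>

/* Hypothetical helper: stores val in *p and returns the previous contents. */
uint32_t exchange_u32(volatile uint32_t *p, uint32_t val)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t prev;
    __asm__ __volatile__ ("xchgl %0, %1"   /* swap register with memory        */
                          : "=r" (prev),   /* %0: old value comes back here    */
                            "=m" (*p)      /* %1: the memory word is written   */
                          : "0" (val),     /* %2: new value starts in %0       */
                            "m" (*p)       /* %3: the memory word is also read */
                          : "memory");     /* keep other memory accesses from moving across */
    return prev;
#else
    return __atomic_exchange_n(p, val, __ATOMIC_SEQ_CST);
#endif
}
```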

This closes our discussion on assembly language and how the Linux 2.6 kernel uses it. We have seen how the PPC and x86 architectures differ and how general ASM programming techniques are used regardless of platform. We now turn our attention to the programming language C, in which the majority of the Linux kernel is written, and examine some common problems programmers encounter when using C.
