How Is It Done on a PC? | Embedded Systems Firmware Demystified (With CD-ROM)

To create a convenient execution context on the target machine, the developer needs to implement substitutes for many of the key operating system services that both users and programmers tend to take for granted in the desktop world. A PC relies on DOS and the BIOS to perform many of these tasks. You begin to realize just how much the PC does once you try to duplicate some of its functionality in an embedded systems project.

Consider the sample program shown in Listing 2.1.

Listing 2.1: A Simple Sample.

int PrintAMessage(void) {
    return(printf("This is a message\n"));
}

int main(int argc,char *argv[]) {
    int msize;

    sleep(1);
    msize = PrintAMessage();
    return(msize);
}

Typically, this program exists as an executable file with the extension .exe. When the user types in the file name, the shell searches the file system directories (specified by the PATH shell variable) for a match. The information in the file header is checked to verify that the file is a valid executable. Lastly, the executable file is loaded into the DRAM of the PC and executed.

This process requires four significant steps:

the shell takes the incoming program name from the keyboard and determines that it is not a command within the local shell;
the shell searches the file system until the program name is found;
the loader verifies the program and loads it into DRAM;
the loader transfers control to the program in DRAM, and the program executes.

These four steps describe the process of running a program from the highest level. If you really want to get inquisitive, there are certainly more details to consider. For example, how does the shell start up in the first place? How does a character get from the keyboard to the CPU inside the PC? How does the CPU retrieve the file from the disk drive? Some of these questions are relevant because they must be translated into some process or sequence of steps in our embedded system; others are not especially relevant to an embedded system. So, although a desktop PC is quite a bit different from a typical embedded system, there are many fundamental similarities. If you understand how the PC works, you will find it a lot easier to understand firmware.

Four concepts from the PC world are particularly helpful for an understanding of the embedded systems environment:

Command line interface: In DOS, the file command.com is one of the first files retrieved from the disk when the PC is turned on. The command.com program is a shell that provides an interface between the user and the PC hardware. In an embedded system, there is no insulation (shell) between the program and the hardware. The program must interact with the hardware directly. In other words, you need to create the equivalent of a command.com interpreter for your embedded system. You can't just write a printf("hello world\n") statement and take the rest for granted.
File system: The file system allows programs to be stored in some large-capacity device and pulled into the processor's memory space only when needed. In an embedded system, the program is typically stored in flash memory all the time, as there is seldom any other storage device from which you can retrieve a program. As you will see later, this doesn't mean that the flash device can't look and act like a file system.
Loader: Because the PC can have more than one application active at a time (through the use of interrupts and the vector table), there is no way to predict just where the program will reside in memory when it is executed. The loader takes care of the memory location details using the relocation information in the .exe file. In an embedded system, you can assume that an application runs at a specific point in memory, so you can build the program to reside at a specific location in target memory. A program that is built to run at some dynamically chosen point in system memory is called relocatable. A program built to run at a fixed location in memory is called absolute.
Services: On a PC, the application can assume that certain services are provided by trap handlers already loaded into the vector table of the CPU. These handlers are an important part of the service layer created by the BIOS. Because an embedded target has no BIOS, the application obviously can't assume there is one! The implication is that some of the standard libraries provided with the compilers cannot be used, or, if they are, they must not assume any underlying facilities exist in the platform. For example, your embedded application can't call fopen() and expect a file to magically open somewhere. You can't even count on being able to use putchar() because the fundamental system calls that interface to the serial port may depend on BIOS services (or some equivalent). If you wish to make these types of functions available on an embedded system, you must provide these services yourself.
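To make the last point concrete, here is a minimal sketch of a character-output service that a target without a BIOS might supply for itself. Everything named here is an assumption for illustration: the uart_regs layout, the UART_TX_READY bit, and the functions uart_putc() and target_puts() are hypothetical, and the "device" is simulated with an ordinary variable so the code can be tried on a host.

```c
#include <stdint.h>

/* Hypothetical UART register block; a real part defines its own layout. */
struct uart_regs {
    volatile uint8_t status;   /* bit 0 set when the transmitter is ready */
    volatile uint8_t txdata;   /* writing here sends one character */
};

#define UART_TX_READY 0x01u

/* Simulated device so the sketch runs on a host.  On real hardware this
 * pointer would hold the UART's address from the memory map instead. */
static struct uart_regs sim_uart = { UART_TX_READY, 0 };
static struct uart_regs *const uart = &sim_uart;

/* Send one character by polling the (assumed) ready bit. */
int uart_putc(int c)
{
    while ((uart->status & UART_TX_READY) == 0)
        ;                       /* busy-wait until the transmitter is free */
    uart->txdata = (uint8_t)c;
    return c;
}

/* Send a string; returns the number of characters written. */
int target_puts(const char *s)
{
    int n = 0;
    while (*s != '\0') {
        uart_putc(*s++);
        n++;
    }
    return n;
}
```

A putchar() replacement for the target could then simply forward to uart_putc(), which is the kind of service the BIOS would otherwise have provided.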

To summarize, many of the features you take for granted on a PC are features you must build for yourself on an embedded system. You'll learn how to build these features later in this book.

The program in Listing 2.1 makes several other implicit assumptions about the execution context, for example:

Upon entry into main() , the stack pointer must be pointing to memory space that can be used for temporary storage.
Something from somewhere must be able to pass arguments to main() (the function that is supposedly the starting point of the program).
Some notion of time must be established to allow functions like sleep() to work properly.
When main() returns, some supervisor such as the operating system (OS) will resume control.
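The assumptions listed above can be pictured as the job of a small piece of startup code. The following is only a sketch under stated assumptions: app_main() stands in for the application's main(), start_application() and last_exit_code are hypothetical names, the stack pointer is assumed to have been set up already (normally in assembly), and timekeeping for sleep() is left out entirely.

```c
#include <stddef.h>

/* Hypothetical application entry point, standing in for main(). */
int app_main(int argc, char *argv[])
{
    (void)argv;
    return argc;               /* trivial body for the sketch */
}

/* Records the value the application returned (for illustration only). */
static int last_exit_code = -1;

/* Called once the stack pointer is valid.  Supplies the arguments and
 * decides what happens when the application returns. */
int start_application(void)
{
    static char name[] = "myprog";
    static char *argv[] = { name, NULL };

    last_exit_code = app_main(1, argv);

    /* With no OS to return to, firmware might restart the application
     * or spin here; the sketch just hands the code back to its caller. */
    return last_exit_code;
}
```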

The Cross-Compilation Process

Within the context of this discussion, there are two different types of compilation: native and cross. Both kinds of compilation produce a file containing binary machine code (called an executable), but that's where the similarity ends.

You are probably most familiar with native compilation. With native compilation, the programmer writes a program on a PC, compiles it on a PC, and runs it on a PC. The environment in which the program was compiled is the same environment in which it is executed. Embedded systems developers, on the other hand, usually generate code using cross-compilation. You write a program on a PC, compile it on a PC, and run it on some other target. (The host platform for compilation doesn't have to be a PC; it can be a UNIX machine or a machine running some other operating system.) Not only is the host a different machine, but the target and host CPUs are probably totally different beasts.

The end result of the native compilation process is a file (the executable) in a format that the operating system's loader understands. (The loader is the tool in the host system that takes the executable and does what is necessary to execute it on the host system.) Depending on the host OS, the executable may be relocatable or absolute. If an executable is relocatable, it is built so that the loader can place it anywhere in the memory space of the host system. If the executable is absolute, it is built to run starting at some fixed address in memory (usually zero). Obviously, there is only one real location zero in the memory space, but the loader knows that the host's MMU will take the necessary steps so that the program thinks it is running from location zero. Consequently, in a native environment, the programmer cannot know where the program will be loaded in memory.

The end result of the cross-compilation process is also an executable file, and prior to some final processing, this file may actually look a lot like a standard executable file for a UNIX or DOS machine. When building a program that is to boot a target system, the file is not relocatable. ^[1] This is because boot code is always destined to reside at some fixed memory location in the target's memory space (not necessarily zero), and, when boot code is executed, the MMU (if there is one) is disabled. At boot time, there is no underlying loader; the only layer below the boot code is the solder and circuit board.

Various toolsets might compile the executable to any of several different commonly used file formats. Rather than covering each of these formats, this book examines a generic file format that represents most of the common file formats used today. The executable file is divided into two main parts: a series of headers, followed by a series of sections. The first header in the series describes the file. The remaining headers describe the sections that follow. Each section header contains information about the physical location of the section, the section size, whether the section is instruction or data, and, if it is data, whether it is assumed to be writable. The section header that corresponds to the RAM space needed for uninitialized data, usually called BSS, ^[2] does not actually have a corresponding section, because it does not have any associated data. This space is simply cleared at startup. Some of the other sections contain binary data destined for the boot device. Other sections contain symbolic information used by a debugger to debug the program. Since the current focus is on providing information to the boot device, I do not detail the sections that contain debug information.
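The generic header-plus-sections layout described above can be pictured with a pair of C structures. This is a sketch, not the layout of COFF, ELF, or any other real format: the field names, the SEC_* flag bits, and the helper loadable_bytes() are all invented for illustration.

```c
#include <stdint.h>

/* Hypothetical flag bits describing a section. */
#define SEC_CODE     0x1u   /* section holds instructions */
#define SEC_DATA     0x2u   /* section holds data */
#define SEC_WRITABLE 0x4u   /* data is expected to be writable */
#define SEC_NOLOAD   0x8u   /* no bytes in the file (BSS-style) */

struct file_header {
    uint16_t num_sections;  /* how many section headers follow */
};

struct section_header {
    char     name[8];       /* e.g. ".text" */
    uint32_t address;       /* physical location in target memory */
    uint32_t size;          /* size of the section in bytes */
    uint32_t file_offset;   /* where the raw bytes start in the file */
    uint32_t flags;         /* SEC_* bits above */
};

/* Total number of bytes that must end up in the boot device:
 * every section except those with no data in the file. */
uint32_t loadable_bytes(const struct section_header *sh, uint16_t count)
{
    uint32_t total = 0;
    for (uint16_t i = 0; i < count; i++) {
        if ((sh[i].flags & SEC_NOLOAD) == 0)
            total += sh[i].size;
    }
    return total;
}
```

Note how a BSS-style header still carries an address and a size, so the startup code knows what to clear, even though it contributes no bytes to the boot device.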

On its own, the file header information doesn't offer much help for the CPU. The CPU doesn't know what a file header is; it just wants to be able to fetch instructions from the reset location in target memory space and start executing them. The cross-compilation process requires an additional step to convert the executable file from the format illustrated in Figure 2.1 to one that presents the data in a way the CPU understands.

Figure 2.1: Example Executable File Format.

Typically, executable files are partitioned into sections, where each section represents a contiguous block of memory. For purposes of discussion, I assume executable files follow the simple structure shown in this illustration.

Since the executable file represents an absolute memory map, the code behind it has been compiled and linked to run at a specific location in memory. The linker extracts the necessary location information from the memory map input file. This file tells the linker where to put the different output sections (text, data and BSS).

The final step is to convert the executable file into a single binary image that will look to the processor like raw instructions and data. This is usually a fairly easy step, but its complexity depends on the complexity of the memory map. For the sake of our discussion, you can assume that the system's boot flash device starts at location 0x000000 and its RAM starts at location 0x800000 (assuming our CPU has a 24-bit address bus). In addition, you can assume that the .text section resides at the start of the boot flash device and all the other non-BSS sections are concatenated above this location. This layout makes the transition to a true binary image fairly simple. The raw data in the .text section becomes the beginning of the image file, and the raw data of the .data and .rodata sections is appended to it.

Note that even though you instruct the linker to place each of the sections back-to-back in memory, there still may be a hole between any two sections. You can determine the size of this hole by observing the starting address and size of each of the sections. If the .text section is 0xfff9 bytes, starts at address 0x0000, and has the .data section directly appended to it, the starting address for the .data section will be 0xfff9. If the address is higher, then for one reason or another (usually alignment), the linker has shifted the start address of the section. This shift must be accounted for with padding in the image file. For example, if the .data section starts at 0x10000, you must insert 0x10000-0xfff9 (that is, 7) bytes of padding before appending the content of the .data section to the new image file. You also need to take the shifting into account for each of the remaining sections. The end result is a binary file that looks something like Figure 2.2.
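The padding arithmetic described above is easy to get wrong by hand, so here is a hedged sketch of how a conversion tool might do it. The structure image_section and the functions gap_between() and build_image() are hypothetical names, not part of any toolset mentioned in this chapter; the sketch assumes the sections are supplied in ascending address order.

```c
#include <stdint.h>
#include <string.h>

struct image_section {
    uint32_t address;        /* start address assigned by the linker */
    uint32_t size;           /* number of raw bytes in the section */
    const uint8_t *data;     /* raw section contents */
};

/* Bytes of padding needed between the end of `prev` and start of `next`. */
uint32_t gap_between(const struct image_section *prev,
                     const struct image_section *next)
{
    return next->address - (prev->address + prev->size);
}

/* Copy sections (sorted by ascending address) into `image`, whose first
 * byte corresponds to address `base`.  Gaps are filled with `fill`.
 * Returns the image length in bytes, or 0 if a section overlaps the
 * previous one or does not fit in `image`. */
size_t build_image(uint8_t *image, size_t image_max, uint32_t base,
                   const struct image_section *sec, size_t count,
                   uint8_t fill)
{
    size_t length = 0;
    for (size_t i = 0; i < count; i++) {
        size_t offset = sec[i].address - base;
        if (offset < length || offset + sec[i].size > image_max)
            return 0;                                   /* overlap or overflow */
        memset(image + length, fill, offset - length);  /* padding */
        memcpy(image + offset, sec[i].data, sec[i].size);
        length = offset + sec[i].size;
    }
    return length;
}
```

For the example in the text, gap_between() on a .text section of 0xfff9 bytes at 0x0000 and a .data section at 0x10000 yields 7. A fill value of 0xff is a common choice because it matches the erased state of flash, but that is a design decision rather than a requirement.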

Figure 2.2: File Containing Raw Binary Data Destined for Boot Flash Memory.

This figure shows how the sections of Figure 2.1 map into the eventual binary image that is to be programmed into flash memory. Notice that the headers are gone!

The file of Figure 2.2 contains the actual instructions and data that the CPU will fetch out of memory and convert into some logical operation. Using a device programmer, a developer can transfer this binary image, unchanged, to the non-volatile storage space of the flash device. After the flash device is placed in the target system, on reset the CPU begins to fetch the binary data that represents the program.

You now know a little bit about what an embedded system executable looks like on the inside. Initially, the file looks similar to those on a UNIX or DOS machine, but the similarities fade abruptly on closer examination. There is no shell to invoke the loader and no file system from which to load the executable. Instead, the embedded system has an empty socket into which you insert a programmed memory device.

The loader (if you dare call it that) in a typical embedded system works something like this:

Transfer the raw binary file to a floppy disk.
Take the floppy disk to some flash device programmer.
Install the floppy disk and copy the file as binary data to a local buffer on the programmer.
Insert the flash device into the programmer socket.
Erase the flash memory and then transfer the content of the local buffer to the flash device.
Wait for completion, then remove the floppy disk and flash device and insert the flash device into the socket on the target board.
Turn on the power and pray.

It's quite a different loader from what was described for the PC earlier!

Today most, but not all, cross-development environments offer friendlier, faster alternatives to this device-substitution approach. Some device programmers have a network connection so that the file can be transferred over a network, thereby eliminating the floppy disk steps. If you are really lucky, the target system has a JTAG or BDM interface that you can use to connect directly to load the program image. These interfaces can eliminate the need for an external programming device.

Establishing the Memory Map

The memory map describes (in terms of address space ranges) where the designer has placed memory and peripheral devices. The memory map may be simple flash memory and RAM. On the other hand, the processor may boot out of the top of memory, and there may be multiple devices that are not necessarily in contiguous memory space.

Usually, the hardware is designed so that the firmware can use all the memory of any one type contiguously. All flash memory is located as a contiguous block, as is all RAM, EPROM, etc. The processor also requires a block of non-volatile memory in hardware at the location where it resets. This block of memory is typically called a boot block, and it is a requirement, not an option. The CPU must reset and access valid memory; hence, the boot block must be non-volatile.

The details of the memory map are determined by the hardware designers (hopefully with some input from the firmware developers!). The firmware developer communicates these design decisions to the linker and other tools through entries in a configuration file called the link editor file or memory map file. Depending on the target and some of the needs of the firmware, this file can get quite complicated.

The Link Editor File

The following example is basic, but complete. It shows how the link editor file tells the linker where the physical memory is (the MEMORY directive) and where within that memory each section is to be placed (the SECTIONS directive). Each line in the MEMORY description specifies a name, start address, and length for one block of memory within the system being mapped. In the case of Listing 2.2, there is 256K of flash memory (0x40000) starting at location zero and 512K of dynamic RAM (0x80000) starting at location 0x80000. Note that the names flash and dram have no meaning other than to tag the block of memory. The example code could just as easily call these memory blocks "bill" and "mary." Also, note that comments are usually allowed in these files, and you should use them to describe what you are trying to configure. As with any piece of code, other developers might need to modify it.

Listing 2.2: Link Editor File

/* Memory Map File for widget.  This hardware has .25Meg of FLASH
 * from 0-0x3ffff and .5Meg of DRAM from 0x80000-0xfffff.
 * This program is built such that initialized data is left in
 * ROM space.  This keeps things simple, but does require
 * that no initialized data be written at runtime.
 */
MEMORY
{
    flash : org = 0,        len = 0x40000
    dram  : org = 0x80000,  len = 0x80000
}

/* Note the use of boot_base, bss_start and bss_end to tag beginning
 * of flash, and boundaries of .bss space respectively.  These tags
 * can be used by the program to reference the hard-coded memory locations
 * they represent.
 */
SECTIONS
{
    .text   :
    {
        boot_base = .;
        *(.text)
    } >flash

    .data   :
    {
        *(.data)
    } >flash

    .bss    :
    {
        bss_start = .;
        *(.bss)
        bss_end = .;
    } >dram
}
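The address ranges quoted in the comment of Listing 2.2 follow directly from each block's origin and length. As a small sanity check, the same arithmetic can be sketched in C; the mem_block structure and the functions block_end() and blocks_overlap() are hypothetical helpers written for this illustration, not something the linker provides.

```c
#include <stdint.h>

struct mem_block {
    const char *name;
    uint32_t org;    /* start address */
    uint32_t len;    /* length in bytes */
};

/* Values taken from the MEMORY description in Listing 2.2. */
static const struct mem_block widget_map[] = {
    { "flash", 0x00000, 0x40000 },
    { "dram",  0x80000, 0x80000 },
};

/* Last address that belongs to the block. */
uint32_t block_end(const struct mem_block *b)
{
    return b->org + b->len - 1;
}

/* Nonzero when the two blocks share at least one address. */
int blocks_overlap(const struct mem_block *a, const struct mem_block *b)
{
    return a->org <= block_end(b) && b->org <= block_end(a);
}
```

With these values, flash ends at 0x3ffff and dram at 0xfffff, matching the comment in the listing, and the two blocks do not overlap.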

The blocks in the SECTIONS portion of the file establish where each of the fundamental sections is placed in real memory space (see Listing 2.2). The *(.text), *(.data), and *(.bss) lines tell the linker to put all of the text, data, and BSS portions (respectively) of the image into the referenced memory area. If needed, the user can also list the object file names in the order they should be placed in memory. The other alternative is to specify the order on the linker command line.

The boot_base = ., bss_start = ., and bss_end = . lines are used to create labels that can be referenced by C code. These labels look like variables located at the current location (.) in the memory map. In this example, boot_base corresponds to address 0x0000 (in the flash memory block). The bss_start tag maps to the beginning of BSS space, while bss_end marks the end of the DRAM allocated to BSS (because it is assigned after the *(.bss) line, it is the first address past the last BSS byte).

Note	Some toolsets provide these tags; others don't. So the consistent thing to do is provide your own set of tags and use them regardless of what the toolset provides.
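One typical use of the bss_start and bss_end labels is clearing BSS at startup. The sketch below shows the idea in a form that can be run on a host: clear_region() is a hypothetical helper, and it treats the end pointer as the first address past the region, which is how bss_end is positioned in Listing 2.2. On the target, the labels would be declared as externals and their addresses passed in, as the comment shows.

```c
#include <stddef.h>

/* On the target, the labels from the link editor file would be declared
 * like this and their ADDRESSES used (names as in Listing 2.2):
 *
 *     extern char bss_start, bss_end;
 *     clear_region(&bss_start, &bss_end);
 */

/* Zero every byte from `start` up to, but not including, `end`.
 * Returns the number of bytes cleared. */
size_t clear_region(char *start, char *end)
{
    size_t n = 0;
    char *p = start;
    while (p < end) {
        *p++ = 0;
        n++;
    }
    return n;
}
```

The labels have no storage of their own; only their addresses are meaningful, which is why the code takes &bss_start rather than reading bss_start.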

Text, Data, and BSS

So far this chapter has concentrated on the memory sections .text, .data, and .bss. There can actually be several more sections aside from these three; however, these are the basic sections (or section types). A text section usually contains code (instructions that are fetched from memory and executed by the CPU). A BSS section is a section of memory that does not contain any initialized data. Usually, this section is cleared to zero at run-time startup. A data section usually holds initialized data. The data section includes the space used by variables that are declared with an initial value. The data section can also house all the initialized strings used throughout the code.

What further complicates the data section is that some initialized data may also be writable. Nonwritable initialized data can be part of flash memory. The initial state of writable initialized data must also be part of flash memory, but for this data to be writable, its run-time value must be stored in RAM. To accommodate this dual personality, when the program starts, all writable initial values are copied from flash to matching data blocks in RAM. The program manipulates only the values in RAM.
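The copy step described above might look like the following sketch. The function name copy_initial_data() is hypothetical, and the source and destination would normally come from linker-defined labels that this chapter's example map does not define (Listing 2.2 deliberately leaves initialized data in flash). A hand-written loop is used on the assumption that library routines may not be usable this early in startup.

```c
#include <stddef.h>

/* Copy `len` bytes of initial values from their load location (flash)
 * to their run-time location (RAM).  Returns the RAM address. */
void *copy_initial_data(void *ram_dst, const void *flash_src, size_t len)
{
    unsigned char *d = ram_dst;
    const unsigned char *s = flash_src;
    while (len-- > 0)
        *d++ = *s++;
    return ram_dst;
}
```

After the copy, the program reads and writes only the RAM copy; the flash copy keeps the initial values for the next reset.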

The code of Listing 2.3 illustrates the varieties of data. The actual instructions that make up the function called func are placed in the text section. The strings passed to printf() represent initialized, read-only data, so this information is placed in a data section that does not need to be copied to RAM. The variable justStarted is initialized, but the code wants to be able to modify it, so the variable must be assigned to a data section that starts off in flash memory and is copied to RAM. Finally, the variable sysClock is not initialized but is writable, so it is assigned to the BSS section, which represents RAM space that is cleared to zero at startup.

Listing 2.3: A Program That Uses Text, Data, and BSS Sections.

int justStarted = 1;
int sysClock;

int func() {
    if (justStarted == 1) {
        printf("Hey, we just started!\n");
        justStarted = 0;
    }
    else {
        printf("Hello sysclock = %d\n",sysClock);
    }
    return 0;
}

One last note regarding these sections: in assembly language, you can put code in data space and data in code space if you want to. The directives in the assembler allow this; however, standard C compilers place the different types of binary data in the sections as described above, so it only makes sense to follow that lead.

Different Reset Vectors Equate to Different Memory Maps

It is worth noting that different CPUs branch to different points in memory as a result of a reset or power-up. Some jump to the bottom of memory, some to the top, and some jump somewhere in between. Wherever the reset vector goes, so goes the boot flash memory. For this and other reasons, the reset vector position can complicate the memory map.

Clearly, the CPU's reset philosophy plays a major role in establishing the memory map of the system. For top-boot CPUs, boot flash memory occupies the top of physical memory space, and everything else is below it. For mid-boot CPUs, additional memory could be on either side, and, for bottom-boot CPUs, all additional memory is above. Figure 2.3 shows these three different scenarios.

Figure 2.3: Reset Vector <-> Boot Flash Configuration.

The reset behavior of the processor determines the location (in address space) of the boot device and, therefore, of the flash memory.
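For the two simplest cases, the effect of the reset location on the flash address can be worked out with a line of arithmetic. The sketch below is an illustration only: the enum boot_style and the function boot_flash_base() are hypothetical, the flash is assumed to sit flush against the top or bottom of the address space, and the mid-boot case is omitted because it depends on the specific reset address of the CPU.

```c
#include <stdint.h>

enum boot_style { BOOT_BOTTOM, BOOT_TOP };

/* First address occupied by a boot flash of `flash_len` bytes in an
 * address space of `addr_bits` bits (addr_bits < 32 in this sketch). */
uint32_t boot_flash_base(enum boot_style style, unsigned addr_bits,
                         uint32_t flash_len)
{
    uint32_t space = (uint32_t)1 << addr_bits;   /* size of address space */
    return (style == BOOT_TOP) ? space - flash_len : 0;
}
```

For example, with a 24-bit address space and 0x40000 bytes of flash, a top-boot arrangement would place the flash at 0xfc0000, while a bottom-boot arrangement places it at zero, as in Listing 2.2.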

The make File

As with many of the examples in this book, I will discuss a simplified makefile. The goal of this section is to introduce the build procedure for a cross-compilation environment, not to provide a tutorial on make.

The sample makefile (Listings 2.4 through 2.7) assumes the programmer is using the GNU cross-compilation tools for a ColdFire 5272 microprocessor under the bash shell environment. This toolset generates an output file format known as Common Object File Format (COFF). COFF is one specific example of the many different output file formats that are available. Typically, the GNU tools produce COFF, ELF, or AOUT file formats. The bash shell provides an environment that looks very much like a typical UNIX shell; hence, you see commands like rm instead of delete and cp instead of copy.

I've divided the makefile into four sections: initialization, linkage, modules, and miscellaneous.

The basic format of the makefile is the same for cross-compilation as it would be in a native environment. On closer examination, however, you can see that there are quite a few subtle differences that make the typical cross-compilation makefile more complicated.

Listing 2.4: Initialization Section for the ColdFire 5272 Makefile.

###############################################################################
#
# Makefile for building M5272C3 based system.
#
PROG        = myprog
OBJCOPY     = m68k-coff-objcopy
OBJDUMP     = m68k-coff-objdump
NM          = m68k-coff-nm
CC          = m68k-coff-gcc
ASM         = m68k-coff-as
ASMCPP      = cpp -D ASSEMBLY_ONLY
LD          = m68k-coff-ld
CCOPTS      = -Wall -g -c -m5200 -o $@
ASMOPTS     = -m5200 -o $@

OBJS=obj/reset.o obj/start.o obj/cpuio.o obj/main.o

The initialization section of the makefile (Listing 2.4) shows the GNU/ColdFire-specific commands that replace the standard cc and ld commands. The code also defines additional commands to create a variety of different output files. The first definition, PROG, serves as a standard output file prefix. You will see shortly that there are a variety of different output files that can be created based on the final executable produced by ld, so the PROG definition specifies a common base name for these different files. Most of the tools have simple one-to-one mappings. For example, for the ColdFire, the CC macro is simply m68k-coff-gcc, LD is simply m68k-coff-ld, and so on. Most of the options are also pretty standard: -g to include symbolic information in the final output file, -c to tell the compiler that this module is incomplete and will be linked later with additional modules, and -o to tell the tools where to put the output. Notice that CCOPTS includes -m5200 to let the compiler know what type of ColdFire microprocessor will use the code.

Listing 2.5: Top Level all Target.

###############################################################################
#
# Top level target to create executable image:
#
all: $(OBJS) makefile
	$(LD) -Map=$(PROG).map -TROM.lnk -nostartfiles -ecoldstart \
		-o $(PROG) $(OBJS)
	coff -B $(PROG).bin $(PROG)
	$(NM) --numeric-sort $(PROG) > $(PROG).sym

The top-level all target (see Listing 2.5) depends on all of the individual object modules (OBJS) that must be compiled or assembled prior to performing the linkage (LD). The LD line takes all of the modules (reset, start, cpuio, and main) and links them together in the order of their listing. It establishes an absolute memory map based on the memory map specified by -TROM.lnk (where ROM.lnk is the name of the memory map file). The -Map option tells the linker to generate a file (myprog.map) that describes, in detail, the memory map of the image being created. Typically, a loader automatically includes code that does some initialization prior to turning over control to the application's main() function. Because embedded systems run on the bare metal, you must write custom code to initialize your particular hardware (in reset.s and start.c). The -nostartfiles option tells the linker to omit the default startup module (crt0.o). The -ecoldstart option tells the linker to use coldstart as the entry point (instead of the default entry point in crt0.o).

The coff command converts the output COFF file produced by ld into the binary image that the CPU requires. (The source and executable for the coff tool are supplied on the CD but are not part of the standard GNU toolset.) Finally, the NM command is a convenience that allows the developer to query the myprog.sym file immediately for symbolic information about individual variables in the final load image.

Listing 2.6: Rules Section.

###############################################################################
#
# Individual rules:
#
obj/cpuio.o: cpuio.c cpuio.h
	$(CC) $(CCOPTS) cpuio.c

obj/start.o: start.c config.h
	$(CC) $(CCOPTS) start.c

obj/main.o: main.c config.h
	$(CC) $(CCOPTS) main.c

obj/reset.o: reset.s config.h
	$(ASMCPP) reset.s >tmp.s
	$(ASM) $(ASMOPTS) tmp.s

The rules section (Listing 2.6) is pretty standard, except in the way that it uses both CPP and ASM for the reset.s file. Invoking both tools allows the developer to include header files in both C and assembly language. Though some assemblers support a CPP pass as an option, I have coded two separate steps to emphasize the two passes.
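To show why the ASMCPP macro defines ASSEMBLY_ONLY, here is a sketch of what a header shared by C and assembly sources might contain. The contents are hypothetical: the constant names and the cpu_init() declaration are invented for illustration (the addresses simply echo Listing 2.2), and the actual config.h used by the makefile is not shown in this chapter.

```c
/* config.h (hypothetical contents): shared by C and assembly sources. */
#ifndef CONFIG_H
#define CONFIG_H

/* Plain constants are safe for both the compiler and the assembler. */
#define FLASH_BASE  0x00000
#define FLASH_SIZE  0x40000
#define DRAM_BASE   0x80000
#define DRAM_SIZE   0x80000

#ifndef ASSEMBLY_ONLY
/* C-only declarations: hidden when cpp runs with -D ASSEMBLY_ONLY,
 * because the assembler would not understand them. */
extern int cpu_init(void);
#endif

#endif /* CONFIG_H */
```

When reset.s includes such a header and is run through cpp with -D ASSEMBLY_ONLY, only the #define lines survive into tmp.s, so the assembler never sees C syntax.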

The final section of the makefile (see Listing 2.7) shows some of the options that are typically not needed for native compilation. This section also demonstrates how defining a PROG prefix can keep things more organized. The make targets in this section correspond to different output files that can be generated from the output of the ld step. The S-record file format could be used in place of the binary format generated by the coff tool. The showmap target simply dumps the content of the section headers. Because the section headers contain information about each of the sections, the showmap target can be useful when trying to diagnose incorrect memory map file settings. Finally, the dis and disx targets dump a listing that includes the source and disassembly intermingled. This listing is useful when using a primitive debugger to set breakpoints within functions.

Listing 2.7: Miscellaneous Section.

###############################################################################
#
# Miscellaneous utilities:
#
clean:
	rm -f obj/*

clobber: clean
	rm -f $(PROG) $(PROG).map
	rm -f $(PROG).bin $(PROG).srec $(PROG).dis $(PROG).sym

gnusrec:
	$(OBJCOPY) -F srec $(PROG) $(PROG).srec

showmap:
	$(OBJDUMP) --section-headers $(PROG)

dis:
	$(OBJDUMP) --source --disassemble $(PROG) >$(PROG).dis

disx:
	$(OBJDUMP) --source --disassemble --show-raw-insn $(PROG) >$(PROG).dis
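The gnusrec target produces Motorola S-records, a text format in which each line carries a byte count, an address, the data bytes, and a checksum. As a rough illustration of what one such line contains, the sketch below formats a single S1 record (the variant with a 16-bit address). The function name srec_s1() is hypothetical, and this is not how objcopy is implemented; a complete file would also need header and termination records, which are omitted here.

```c
#include <stdint.h>
#include <stdio.h>

/* Format one S1 record (16-bit address) for `len` data bytes into `out`.
 * `out` must hold at least 2*len + 11 characters.
 * Returns the number of characters written, or -1 if len is too large. */
int srec_s1(char *out, uint16_t addr, const uint8_t *data, unsigned len)
{
    if (len > 252)               /* count byte = len + 3 must fit in 8 bits */
        return -1;

    unsigned count = len + 3;    /* 2 address bytes + data + 1 checksum */
    unsigned sum = count + (addr >> 8) + (addr & 0xff);
    int n = sprintf(out, "S1%02X%04X", count, (unsigned)addr);

    for (unsigned i = 0; i < len; i++) {
        sum += data[i];
        n += sprintf(out + n, "%02X", (unsigned)data[i]);
    }
    /* Checksum: ones' complement of the low byte of the sum. */
    n += sprintf(out + n, "%02X", (~sum) & 0xffu);
    return n;
}
```

Because every byte is written as two hexadecimal characters, an S-record file is more than twice the size of the equivalent raw binary image, but it carries its own addresses and checksums.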

^[1] In general, programs destined for embedded systems are not relocatable; however, if the program is destined for an embedded system that already has some underlying platform, relocation becomes an option if the platform supports it.

^[2] BSS means "block started by symbol." This term originates from the old IBM mainframe world and refers to a block of memory that is not initialized. Gintaras R. Gircys, Understanding and Using COFF, (Sebastopol, CA: O'Reilly & Associates, 1988), pg 9.