System Profiling | Embedded Systems Firmware Demystified (With CD-ROM)

Assume that you are on a firmware development project and that you are working on a hardware platform that has been completed and successfully deployed to the field, but there are new features yet to be delivered. Your assignment is to add some new feature, but when your feature is added, current features start to fail because you dont have enough horsepower to support the additional capability, or you dont have enough memory space left to add your feature.

One solution is to throw up your hands at the hardware people and tell them they need to give you more. This tactic often doesnt go over too well, so you need an alternative. You need to determine if you can rearrange things a bit so that you can get some more bandwidth out of the current implementation. You also need to determine if any functions in the system can be removed because they arent used. These questions are best answered through a technique known as system profiling (sometimes called runtime analysis ).

Function counting , task-ID activity , and basic block coverage are some basic system profiling capabilities. Some of the information you uncover through system profiling is useful for verifying that all code has been executed, and some information is useful for determining where to focus optimization time if you need to speed things up. Note that some of the more sophisticated RTOS packages provide various types of system profiling as a standard component or an optional extension. These tools can shed a lot of light on an RTOS-based firmware project, and, depending on what is used, system profiling might or might not insert additional load on the system. The following sections discuss some system profiling techniques for the developer who does not want to worry about implementing an externally purchased package.

Note

By using external hardware to analyze and display the runtime statistics of a system (based on tracing instruction fetches), you can profile a system without any runtime overhead. This hardware technique is only practical, though, if the address bus is externally accessible and there is some way to deal with the instruction cache.

Use the System Tick

If you are programming an embedded system, it is very likely that your target has some heartbeat or system tick interrupt. This interrupt is usually established in hardware as the highest-level interrupt in the system. You can use this interrupt to log what code segment or task was interrupted each time the interrupt occurs.

The simplest application of this technique is to gather task statistics. When any given task is in the running state, there is some global pointer in the system (usually referred to as a task ID, or TID) that can be made accessible to the system tick handler to let the system tick handler know what task was active at the time of the tick. The interrupt handler can then increment by one some counter (one per task) to keep track of task activity.

This information might or might not be valuable , depending on whats going on. Also, the value of this information depends on the degree of correlation between a given task and certain pieces of code. For example, if many tasks are active but most of the tasks call the same functions, knowing what task is most active doesnt do much good because it doesnt point you to the function that needs to be optimized. If, on the other hand, each task is primarily executing some unique block of code, then task ID logging might be enough to point you in the right direction.

An alternative to logging the task ID statistics is to build in the ability to log what function is active at each system tick. Following are two different ways to log the active function. One method puts the burden on the system tick, and the other method is dependent on the availability of a fairly large amount of RAM.

For the first case, start by building the application and gathering function statistics (location and size ). After you know the function statistics, insert into the system tick a function call that passes the address of the code that was interrupted by the system tick to some function that knows the location and size of all of the functions in the build. By comparing this address to what is known about the build, this function can identify which function was executing and can increment the associated counter. Because this technique adds a fairly complex search or lookup to every clock tick, it can seriously burden the processor.

The second technique offers better performance but requires more RAM. To use this technique, you must have a block of extra RAM equal in size to your .text space (the area that has the functions in it). The system tick would then pass the address of the code that was interrupted to a function that would treat this block of RAM as a table of counters. The function would use the incoming address as an offset into this table and would modify the counter at that offset. This solution is quick and gives a really clear picture of what code is most active.

A command in MicroMonitor called prof supports some of the preceding concepts. Listing 12.20 shows some of the associated code.

Listing 12.20: The prof Command.

 struct pdata {     ulong   data;      /* Start of symbol or tid. */     int     pcount;    /* Pass count.             */ }; /* prof_FuncConfig():  * This function builds a table of pdata structures based on the   * content of the symbol table.  * It assumes the file is a list of symbols and addresses listed  * in ascending address order.  */ void prof_FuncConfig(void) {     int     tfd, i;     struct  pdata *pfp;     char    line[80], *space;     tfd = prof_GetSymFile();     if (tfd < 0) {         return;     }     prof_FuncTot = 0;     pfp = prof_FuncTbl;          while(tfsgetline(tfd,line,sizeof(line)-1)) {         space = strpbrk(line,"\t ");         if (!space) {             continue;         }         *space++ = 0;         while(isspace(*space)) {             space++;         }         pfp->data = strtoul(space,0,0);         pfp->pcount = 0;         pfp++;         prof_FuncTot++;     }     tfsclose(tfd,0);     /* Add one last item to the list so that there is an upper limit for      * the final symbol in the table:      */     pfp->pdata = 0xffffffff;     pfp->pcount = 0;     /* Test to verify that all symbols are in ascending address order...      */     for (i=0;i<prof_FuncTot;i++) {         if (prof_FuncTbl[i].data > prof_FuncTbl[i+1].data) {             printf("Warning: function addresses not in order\n");             break;         }     }     prof_FuncTot++; }

The prof command calls the function prof_FuncConfig() when the user wants to build a table of pdata structures based on the content of the symtbl file. Similar to the stack trace facility, prof_FuncConfig() assumes that the symbols are in the file in ascending address order. The prof_FuncConfig() function then steps through the file and creates one pdata entry for each line in the symbol table file. The table is built at a location specified by prof_FuncTbl , which is also established by the user by using the prof command.

Listing 12.21: profiler() .

 #define HALF(m) (m >> 1) static int prof_Enabled;      /* If set, profiler runs; else return.     */  static int prof_BadSymCnt;    /* Number of hit, but not in a symbol.     */ static int prof_CallCnt;      /* Number of times profiler was called.    */ static int prof_FuncTot;      /* Number of functions being profiled.     */ static int prof_TidTot;       /* Number of TIDs being profiled.          */ static int prof_TidTally;     /* Number of unique TIDs logged so far.    */ static int prof_TidOverflow;  /* More TIDs than the table was built for. */ static struct pdata *prof_FuncTbl; static struct pdata *prof_TidTbl; static char prof_SymFile[TFSNAMESIZE+1]; void profiler(struct monprof *mpp) {     struct pdata *current, *base;     int nmem;     if (prof_Enabled == 0)         return;     if (mpp->type & MONPROF_FUNCLOG) {         nmem = prof_FuncTot;         base = prof_FuncTbl;         while(nmem) {             current = &base[HALF(nmem)];             if (mpp->pc < current->data) {                 nmem = HALF(nmem);             }             else if (mpp->pc > current->data) {                 if (mpp->pc < (current+1)->data) {                     current->pcount++;                     goto tidlog;                 }                 else {                     base = current + 1;                     nmem = (HALF(nmem)) - (nmem ? 0 : 1);                 }             }             else {                 current->pcount++;                 goto tidlog;             }         }         prof_BadSymCnt++;     } tidlog:     if (mpp->type & MONPROF_TIDLOG) {         /* First see if the tid is already in the table.  If it is,          * increment the pcount.  If it isn't add it to the table.          */         nmem = prof_TidTally;         base = prof_TidTbl;         while(nmem) {             current = &base[HALF(nmem)];             if (mpp->tid < current->data) {                 nmem = HALF(nmem);             }             else if (mpp->tid > current->data) {                 base = current + 1;                 nmem = (HALF(nmem)) - (nmem ? 0 : 1);             }             else {                 current->pcount++;                 goto profdone;             }         }         /* Since we got here, the tid must not be in the table, so          * do an insertion into the table.  Items are in the table in          * ascending order.          */         if (prof_TidTally == 0) {             prof_TidTbl->pcount = 1;             prof_TidTbl->data = mpp->tid;             prof_TidTally++;         }         else if (prof_TidTally >= prof_TidTot) {             prof_TidOverflow++;         }         else {             current = prof_TidTbl + prof_TidTally - 1;             while(current >= prof_TidTbl) {                 if (mpp->tid > current->data) {                     current++;                     current->pcount = 1;                     current->data = mpp->tid;                     break;                 }                 else {                     *(current+1) = *current;                     if (current == prof_TidTbl) {                         current->pcount = 1;                         current->data = mpp->tid;                         break;                     }                 }                 current--;             }             prof_TidTally++;         }     } profdone:     prof_CallCnt++;     return; }

The function profiler() (see Listing 12.21) would be called by the applications system tick (through the MicroMonitor API mon_ hookup discussed in Chapter 11). The application code would populate a monprof structure to tell the profiler to do TID and/or function profiling and to specify the TID and/or PC.

The function profiling ( MONPROF_FUNCLOG ) uses the table built by prof_FuncConfig(). For this service, profiler() searches the table (using a binary search) for a match between the incoming address and the address range of one of the symbols. After a match is found, the matching symbols pass count is incremented.

The TID profiling assumes that the user has established some maximum number of expected unique TID values. A table of pdata structures of that size is assumed to exist (and to be initialized to NULL ) starting at the location specified by prof_TidTbl. At each call to profiler() , the table is built with each unique TID value inserted in the table so that the entries are in ascending order. This approach allows the profiler to then do a quick binary search for a match between the incoming TID and the entries in the table already. If a match is found, that pass count is incremented; if no match is found, the TID is added to the list, and the pass count is set to one.

Listing 12.22: Printing the Profile Statistics.

 void prof_ShowStats(int minhit, int more) {     int     i, tfd, linecount;     ulong   notused;     char    symname[64];     struct  pdata   *pptr;     printf("FuncCount Cfg: tbl: 0x%08x, size: %d\n",         prof_FuncTbl, prof_FuncTot);     printf("TidCount  Cfg: tbl: 0x%08x, size: %d\n",         prof_TidTbl, prof_TidTot);     if (prof_CallCnt == 0) {         printf("No data collected\n");         return;     }     linecount = 0;     tfd = prof_GetSymFile();     if ((prof_FuncTbl) && (prof_FuncTot > 0)) {         printf("\nFUNC_PROF stats:\n");         pptr = prof_FuncTbl;         for(i=0;i<prof_FuncTot;pptr++,i++) {             if (pptr->pcount < minhit) {                 continue;             }             if ((tfd < 0)                  (AddrToSym(tfd,pptr->data,symname,&notused) == 0)) {                 printf(" %08x    :  %d\n",pptr->data,pptr->pcount);             }             else {                 printf(" %-12s:  %d\n",symname,pptr->pcount);             }             if ((more) && (++linecount >= more)) {                 linecount = 0;                 if (More() == 0) {                     goto showdone;                 }             }         }     }     if ((prof_TidTbl) && (prof_TidTot > 0)) {         printf("\nTID_PROF stats:\n");         pptr = prof_TidTbl;         for(i=0;i<prof_TidTot;pptr++,i++) {             if (pptr->pcount < minhit) {                 continue;             }             printf(" %08x    :  %d\n",pptr->data,pptr->pcount);             if ((more) && (++linecount >= more)) {                 linecount = 0;                 if (More() == 0) {                     goto showdone;                 }             }         }     } showdone:     putchar('\n');     if (prof_BadSymCnt) {         printf("%d out-of-range symbols\n",prof_BadSymCnt);     }     if (prof_TidOverflow) {         printf("%d tid overflow attempts\n",prof_TidOverflow);     }     printf("%d total profiler calls\n",prof_CallCnt);     if (tfd >= 0) {         tfsclose(tfd,0);     }     return; }

When profiling is completed, the statistics gathered by these two functions can be dumped to the user through the prof command. The prof command ultimately calls prof_ShowStats(). Function prof_ShowStats() (Listing 12.22) prints the contents of the two tables of pdata structures to the console. The user can specify the minimum number of hits so that only the big hitters are printed. Also, the user can specify some more throttling because the output could be fairly large, depending on the number of functions in the application.

Basic Block Coverage

The basic block coverage (BBC) capability depends on the compiler and certainly adds runtime overhead to the application. For each basic block in the code, the compiler inserts a counter. This counter can be used by the system test team to verify that all code under test has been covered. After you put the target through a test suite, you can use the results of the BBC to see what lines of code were missed. This solution is not perfect, but it is a step towards completion. The BBC approach does not cover all functional routes, just code.

Note	A basic block is a sequence of code that is always executed as a ( potentially interruptable) unit. In other words, if any one line of code within the block has been executed, then every line of code in the block was executed. In C, a basic block is generally enclosed in braces.

Listing 12.23: BBC Example.

 void funcX(void) {     if (aa == 15)     {         Block 1 of funcX().     }     if ((aa == 15)  (bb == 35))     {         Block 2 of funcX().     } }

In Listing 12.23, the BBC technique indicates that both blocks have been executed even if only the aa == 15 case was invoked; hence, the case of bb == 35 and aa != 15 (which would pass through funcX but only invoke Block 2 ) is not tested .