Hooking Functions in the Profiling API | Debugging Applications for Microsoft® .NET and Microsoft Windows® (Pro-Developer)

One of the most difficult parts of writing a real profiler in Microsoft Win32 was that hooking into the function call stream was nearly impossible without considerable help from the compiler or by changing the binary on disk. Without this capability, you couldn't even come close to getting accurate timings that related to anything in the user's application. Microsoft has built this function-call notification right into the Profiling API—yet another instance where they deserve great credit for doing the heavy lifting. Now tool developers can concentrate on solving hard profiling problems without having to spend six months or more developing just the infrastructure.

Requesting Enter and Leave Notifications

With the Profiling API, you'll get notified whenever a method is called and whenever that method returns. The /Gh and /GH switches (which enable the _penter and _pexit hook functions, respectively) in the native C++ compiler follow the same basic strategy as the Profiling API, but the Profiling API makes notifications even easier by also handing you the FunctionID of the executing function.

As with any of the other notifications, you first have to tell the run time that you'd like the notifications by ORing the COR_PRF_MONITOR_ENTERLEAVE flag into the event mask you pass to the ICorProfilerInfo::SetEventMask method. I would have bet that once you ask for enter and leave notifications, the setting would be immutable for the life of the process. However, the smart folks at Microsoft allow you to set and unset the enter and leave notifications as often as you want. Keep that in mind, because you could create some very interesting tools with this capability, for example, one that times only exception processing.
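To make the set-and-unset idea concrete, here is a minimal sketch of toggling the enter and leave flag while leaving the rest of the event mask alone. So that it compiles without CorProf.h, FakeProfilerInfo is a hypothetical stand-in for the one ICorProfilerInfo method used, and the value of kMonitorEnterLeave is a placeholder, not necessarily the real value of COR_PRF_MONITOR_ENTERLEAVE.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch builds without CorProf.h. The flag value is a
// placeholder; a real profiler uses COR_PRF_MONITOR_ENTERLEAVE.
using DWORD = std::uint32_t ;
const DWORD kMonitorEnterLeave = 0x1000 ;

// Hypothetical stand-in for ICorProfilerInfo; it only remembers the mask.
struct FakeProfilerInfo
{
    DWORD dwEventMask = 0 ;
    void SetEventMask ( DWORD dwMask ) { dwEventMask = dwMask ; }
} ;

// Turns enter/leave notifications on or off without disturbing the
// other flags already in the mask. Returns the mask that was set.
DWORD ToggleEnterLeave ( FakeProfilerInfo & info , bool bEnable )
{
    DWORD dwMask = info.dwEventMask ;
    if ( bEnable )
    {
        dwMask |= kMonitorEnterLeave ;
    }
    else
    {
        dwMask &= ~kMonitorEnterLeave ;
    }
    info.SetEventMask ( dwMask ) ;
    return ( dwMask ) ;
}
```

A tool that times only exception processing could call ToggleEnterLeave with true when an exception notification arrives and with false once the exception has been handled.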

After setting the options, you need to tell the run time which functions you want called, so you'll call ICorProfilerInfo::SetEnterLeaveFunctionHooks. The three parameters are pointers to the functions you want called: the entry function, the exit function, and the tailcall exit function. The first two functions are self-explanatory, but the third is a little weird. A tailcall is a call in which the current method's stack frame is removed before the actual call instruction is executed. In other words, the stack gets cleaned up before the called method executes because nothing on the stack is needed anymore. In the current version of the common language runtime (CLR), the tailcall exit function is never called, but future versions of the CLR will utilize the tailcall compiler optimization, so you'll need to supply it. Since for most Profiling API users the tailcall exit function does the same thing as the regular exit function, you can simply pass your regular exit function for both if you'd like.
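The registration step can be sketched as follows, with the regular exit hook doing double duty as the tailcall exit hook. FakeProfilerInfo is a hypothetical stand-in that only records the three pointers, and the hooks are ordinary functions so the sketch compiles anywhere; real hooks have to be naked functions, as described in the next section.

```cpp
#include <cassert>
#include <cstdint>

using FunctionID = std::uintptr_t ;      // stand-in for the CorProf.h type
using HookFn = void ( FunctionID ) ;     // same shape as the hook typedef

static int g_enterCount = 0 ;
static int g_leaveCount = 0 ;

// Ordinary functions used only for illustration.
void OnEnter ( FunctionID /*funcID*/ ) { ++g_enterCount ; }
void OnLeave ( FunctionID /*funcID*/ ) { ++g_leaveCount ; }

// Hypothetical stand-in for ICorProfilerInfo; it only stores the hooks.
struct FakeProfilerInfo
{
    HookFn * pEnter = nullptr ;
    HookFn * pLeave = nullptr ;
    HookFn * pTailcall = nullptr ;

    void SetEnterLeaveFunctionHooks ( HookFn * pE ,
                                      HookFn * pL ,
                                      HookFn * pT )
    {
        pEnter = pE ;
        pLeave = pL ;
        pTailcall = pT ;
    }
} ;

// Registers the hooks, reusing the exit hook for tailcall exits.
void InstallHooks ( FakeProfilerInfo & info )
{
    info.SetEnterLeaveFunctionHooks ( OnEnter , OnLeave , OnLeave ) ;
}
```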

Implementing the Hook Functions

The special part of the function hooking process is defining your actual hook functions. To keep performance as fast as possible, the Profiling API requires that you write the functions using the naked calling convention. In essence, your function is inlined right inside the Just In Time (JIT) compiler, so you have to handle the function prolog and epilog needs.

The typedef for any of the hook functions is as follows:

typedef void FunctionEnter ( FunctionID funcID ) ;

However, what's not too clear from the documentation, but fortunately is shown in the profiling samples, is that the hook functions behave like standard call (__stdcall) functions: the hook itself is responsible for popping the FunctionID parameter off the stack. The comments in CorProf.IDL, whose word you should always take over Profiling.DOC, mention that your hook functions are also required to save any registers your code will touch, including any floating-point registers.

Listing 11-1 shows you my enter hook function, so you can see one in action. The hook functions use the naked calling convention, so you're required to write your own prolog and epilog. CFlowTrace::FuncEnter is where the real work is being done, so the hook function is really just a wrapper to call it. The prolog (the first three PUSH instructions) preserves the registers that are modified by this function. The last four instructions are the epilog, which restores the saved registers and returns. RET 4 returns and clears the function ID passed to the enter hook function off the stack, saving me a POP instruction.

Listing 11-1: Hook function example

void __declspec ( naked ) NakedEnter ( FunctionID /*funcID*/ )
{
    __asm
    {
        PUSH EAX                        // Save off whacked registers.
        PUSH ECX
        PUSH EDX

        PUSH [ESP + 10h]                // Push the function ID as the
                                        //  parameter.
        MOV  ECX , g_pFlowTrace         // Push the instance data on the
        PUSH ECX                        //  stack.
        CALL CFlowTrace::FuncEnter      // Call the FuncEnter method.

        POP  EDX                        // Restore saved registers.
        POP  ECX
        POP  EAX

        RET 4                           // Return and clear the function
                                        //  ID off the stack as it was
                                        //  passed to this function.
    }
}

The middle four instructions call CFlowTrace::FuncEnter, passing it the class instance and the function ID. The function ID was passed to the enter hook function, so it now sits 16 (0x10) bytes up the stack, above the three registers we saved and the return address. PUSH [ESP + 10h] pushes a copy onto the stack to pass to CFlowTrace::FuncEnter. Eagle-eyed readers will note that the declaration of CFlowTrace::FuncEnter takes only a single parameter. That's because C++ class methods always pass the instance pointer (the this pointer) as the first, hidden parameter. I played around with the inline assembly language in the hook functions quite a bit to see whether I could get anything smaller, but what you see in Listing 11-1 is about as small as you can safely go.
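If you decide to save more registers in your own hook, the offset in the PUSH instruction has to change with them. The arithmetic can be sketched as a small compile-time check, assuming 32-bit x86 where every PUSH and the return address each occupy 4 bytes. The names here are mine, chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>

// 32-bit x86 sizes, written out so the sketch does not depend on the host.
constexpr std::size_t kSlotSize = 4 ;        // Each PUSH moves ESP 4 bytes.
constexpr std::size_t kReturnSlots = 1 ;     // Return address from the CALL.

// Offset from ESP to the FunctionID argument after the prolog has pushed
// the given number of registers.
constexpr std::size_t FunctionIdOffset ( std::size_t savedRegisters )
{
    return ( ( savedRegisters + kReturnSlots ) * kSlotSize ) ;
}

// Listing 11-1 saves EAX, ECX, and EDX, which gives PUSH [ESP + 10h].
static_assert ( FunctionIdOffset ( 3 ) == 0x10 ,
                "three saved registers put the FunctionID at ESP + 10h" ) ;
```

Saving a fourth register in the prolog would move the function ID to ESP + 14h, and the epilog would need a matching fourth POP.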

Inlining

One very important issue with the hooked function notifications is inlining. The CLR execution engine is highly optimized and will inline code like crazy to eke out a couple of clock cycle savings. This means that although you think you might be seeing everything that's going on in your program, you're seeing only the calls and returns for methods that were not inlined and nothing for the inlined methods.

If you want a complete graph of all calls actually made in a program, you have two options for turning off inlining. However, as you can imagine, disabling inlining can impose a dramatic performance hit on the managed code. The easiest way to turn off inlining is to OR the COR_PRF_DISABLE_INLINING flag into the event mask you pass to ICorProfilerInfo::SetEventMask when processing the ICorProfilerCallback::Initialize notification. The drawback is that COR_PRF_DISABLE_INLINING is an immutable flag, so you've turned inlining off for the entire life of the process, no matter where your code executes.

A second way, while allowing finer grained control, requires much more work on your part. One of the JIT notifications you get is JITInlining, which as you can tell by the name, indicates that a function is being inlined into another function. (You'll have to OR in COR_PRF_MONITOR_JIT_COMPILATION in the event mask to get the JIT notifications.) The JITInlining parameters, in order, are the caller FunctionID, the callee FunctionID (the function being inlined), and a pointer to a BOOL, which, if set to FALSE, will prevent inlining.

You could do some very interesting things with the JITInlining notification. For example, you could leave inlining on for the .NET Framework class library (FCL) classes but turn it off for non-FCL code. You have to be careful, however, because the CLR will call JITInlining billions of times, and if your code looks up the caller or callee FunctionID values on every call, it could cause a much worse performance hit than simply disabling inlining globally for the process. Although you might think about storing the FunctionID values you do look up, keep in mind that the CLR garbage collector can rejiggle those values, so you'll have to handle garbage collection notifications to keep your data tables straight.
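Here is a rough sketch of that FCL-only inlining policy with a cache in front of the expensive lookup. Everything in it is an assumption made for illustration: LookupClassName and its table stand in for the real metadata lookup, treating a "System." prefix as meaning FCL is a simplification, and the real JITInlining is an ICorProfilerCallback method that returns an HRESULT. The sketch also ignores the cache invalidation and the thread safety a real profiler would need.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using FunctionID = std::uintptr_t ;   // stand-ins for the CorProf.h types
using BOOL = int ;

// Hypothetical name table standing in for the real metadata lookup.
static std::unordered_map<FunctionID, std::string> g_fakeNames =
{
    { 1 , "System.String" } ,
    { 2 , "MyApp.Worker" }
} ;
static int g_lookupCount = 0 ;

// Pretend this is slow; the counter shows how often it really runs.
std::string LookupClassName ( FunctionID id )
{
    ++g_lookupCount ;
    auto it = g_fakeNames.find ( id ) ;
    return ( it == g_fakeNames.end ( ) ? std::string ( ) : it->second ) ;
}

// Remembers the decision for each callee so the lookup runs only once.
static std::unordered_map<FunctionID, bool> g_allowInline ;

// Same parameter order as JITInlining: caller, callee, veto flag.
void OnJITInlining ( FunctionID /*callerId*/ ,
                     FunctionID calleeId ,
                     BOOL * pfShouldInline )
{
    auto it = g_allowInline.find ( calleeId ) ;
    if ( it == g_allowInline.end ( ) )
    {
        std::string name = LookupClassName ( calleeId ) ;
        bool bFramework = ( name.compare ( 0 , 7 , "System." ) == 0 ) ;
        it = g_allowInline.emplace ( calleeId , bFramework ).first ;
    }
    // FALSE prevents the callee from being inlined into the caller.
    *pfShouldInline = it->second ? 1 : 0 ;
}
```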

The Function ID Mapper

In addition to the very cool hooking functions, I need to tell you about one other special function: FunctionIDMapper. The purpose of this function is to allow you to change the value of the FunctionID passed to the three separate hook functions. The CLR calls it right before any of the hook functions. You don't have to set the FunctionIDMapper function if you don't want to, but doing so can open up some very interesting development possibilities.

Setting the FunctionIDMapper is immutable and you should do it in the ICorProfilerCallback::Initialize method by passing the function pointer to ICorProfilerInfo::SetFunctionIDMapper. One thing that caused me some problems is that the Profiling.DOC discussion of the function prototype is wrong. The FunctionIDMapper function returns a UINT_PTR, the underlying type for a FunctionID, instead of the void documented. The correct prototype is this:

UINT_PTR __stdcall FunctionIDMapper ( FunctionID functionId ,
                                      BOOL *pbHookFunction ) ;

Interestingly, FunctionIDMapper is a normal standard call function, not one of the naked functions required by the other hook functions. The FunctionID parameter is the function that the CLR is about to call one of the hook functions for. The Boolean pointer parameter allows you to control whether the CLR calls the hook function. If you want to allow the hook call, set *pbHookFunction to TRUE. If you set it to FALSE, the hook function isn't called. If you want to change the value passed as the hook parameter, return that value from your FunctionIDMapper function.

I see FunctionIDMapper as being quite interesting for larger projects that use the Profiling API. For example, you have to look up the function class and method name on nearly all calls into the hooking functions. You could use the FunctionIDMapper function to handle the function lookup instead and pass the values onto the hook function. That way you have a single place in which you're performing the lookups.
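One way to picture the single lookup point is a mapper that returns the address of its own per-function record, so the hook receives ready-to-use data instead of a raw FunctionID. This is only a sketch under my own assumptions: FunctionRecord, ResolveName, and EnterHookBody are hypothetical names, the typedefs stand in for the CorProf.h types, __stdcall is left off so the code compiles anywhere, and there is no locking.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

using FunctionID = std::uintptr_t ;   // stand-ins for the CorProf.h types
using UINT_PTR = std::uintptr_t ;
using BOOL = int ;

// Hypothetical per-function data the profiler wants in its hooks.
struct FunctionRecord
{
    FunctionID id ;
    std::string name ;
    unsigned calls ;
} ;

static std::unordered_map<FunctionID, FunctionRecord> g_records ;

// Hypothetical stand-in for the class and method name lookup.
std::string ResolveName ( FunctionID id )
{
    return ( "func_" + std::to_string ( id ) ) ;
}

// Looks the function up once and hands the record's address to the hook.
UINT_PTR FunctionIDMapper ( FunctionID functionId , BOOL * pbHookFunction )
{
    auto it = g_records.find ( functionId ) ;
    if ( it == g_records.end ( ) )
    {
        FunctionRecord rec = { functionId , ResolveName ( functionId ) , 0 } ;
        it = g_records.emplace ( functionId , rec ).first ;
    }
    *pbHookFunction = 1 ;             // TRUE: go ahead and call the hook.
    return ( reinterpret_cast<UINT_PTR> ( &it->second ) ) ;
}

// What the enter hook's worker would do with the mapped value.
void EnterHookBody ( UINT_PTR mapped )
{
    FunctionRecord * pRec = reinterpret_cast<FunctionRecord *> ( mapped ) ;
    ++pRec->calls ;
}
```

I picked std::unordered_map for the table because pointers to its elements stay valid when later insertions grow the table, which is what makes handing out a record's address safe here.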

With the option to control whether the hook function is actually called, you have more power at your disposal. For example, when you want to do logging or analysis for only a single thread, you can use FunctionIDMapper to determine the thread ID, and if you're not interested in watching the thread, you can skip the hook function. The option to skip the hook function can make the design of your profiler much easier to implement. In fact, I took advantage of it in the FlowTrace program.
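The single-thread filter might look something like the following sketch. It assumes, as described above, that the mapper runs on the thread that is about to enter the function. The g_watchedThread variable and the ThreadFilterMapper name are hypothetical, the typedefs stand in for the CorProf.h types, and __stdcall is left off so the code compiles anywhere. This is not the FlowTrace code.

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

using FunctionID = std::uintptr_t ;   // stand-ins for the CorProf.h types
using UINT_PTR = std::uintptr_t ;
using BOOL = int ;

// The one thread to trace. A default-constructed id matches no running
// thread, so nothing is hooked until a thread is chosen.
static std::thread::id g_watchedThread ;

// Lets the hook run only on the watched thread; the FunctionID passes
// through unchanged.
UINT_PTR ThreadFilterMapper ( FunctionID functionId , BOOL * pbHookFunction )
{
    bool bWatched = ( std::this_thread::get_id ( ) == g_watchedThread ) ;
    *pbHookFunction = bWatched ? 1 : 0 ;
    return ( functionId ) ;
}
```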