Reed-Solomon Codes in Practical Implementations

  1. Run-time errors have the highest priority. Only another error with a higher priority can terminate error execution.

  2. Errors can ignore requests from the operating system.

  3. Requests from errors to the operating system cannot be ignored.

  4. When working with files, errors can use the file system of the basic OS and exploit its errors.

  5. On a computer with parallel architecture, several errors can be executed simultaneously.

V.Tikchonov. Theory of Errors

Applications, Illegal Operations, and Everything Else

Low-level control over equipment requires extreme care and caution. Even the smallest error can result in the Blue Screen of Death (BSOD) or the abnormal termination of one or more applications. Driver developers and combat engineers have much in common: neither profession is particularly forgiving of carelessness. The ASPI and SPTI interfaces, despite their high-level wrappers, are equally aggressive. They can freeze the system or shut it down with or without pretext. It takes a long time to master the skill of writing stable and simple code. Until that level has been reached, the only guarantee of survival is the ability to recover the system after critical errors and various kinds of malfunctions.

Different operating systems react to critical errors differently. For example, Windows NT reserves two regions of its address space for detecting stray pointers. One of them is located at the very bottom of the memory map and is intended for trapping null pointers. The other is located between the heap and the memory area allocated for the operating system itself. It catches accesses that cross the limits of the memory area allocated to user processes. Contrary to common opinion, it is in no way related to the WriteProcessMemory function (see MSDN article Q92764). Both regions are 64 KB each, and any attempt to access them is interpreted by the system as a critical error. In Windows 9x, there is only one 4 KB region for trapping stray pointers. Therefore, this system has significantly weaker control capabilities than Windows NT.
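To make the difference between the two trap regions concrete, here is a minimal sketch in C. It models only the region at the bottom of the memory map, using the sizes quoted above; the function and macro names are hypothetical, and the model deliberately ignores everything else the memory manager does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sizes of the null-pointer trap region, as stated in the text. */
#define NT_NULL_GUARD_SIZE   0x10000u  /* 64 KB on Windows NT */
#define W9X_NULL_GUARD_SIZE  0x1000u   /* 4 KB on Windows 9x  */

/* Returns true if addr falls inside a trap region that starts at
   address 0 and is guard_size bytes long (a simplified model). */
bool hits_null_guard(uint32_t addr, uint32_t guard_size)
{
    return addr < guard_size;
}

/* Hypothetical helper: the address produced by indexing a null base
   pointer, for example buf[offset] with buf == 0. */
uint32_t null_based_access(uint32_t offset)
{
    return 0u + offset;
}
```

Under this model, an access at offset 64h from a null pointer is trapped with either region size, while an access at offset 2000h falls inside the 64 KB region but outside the 4 KB one.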

In Windows NT, the critical error screen (Fig. 3.1) contains the following information:

  • The address of the machine instruction that caused the current exception

  • A brief description of the exception category (or its code, if the category is unknown)

  • The exception parameters (address of the invalid memory cell, type of operation, etc.)

Fig. 3.1: Critical error message displayed by Windows 2000

Operating systems of the Windows 9x family are considerably more informative in this respect (see Fig. 3.2). Besides the exception category, they display the contents of the CPU registers, the state of the stack, and the memory bytes located at the address CS:EIP (i.e., at the current execution address). However, the existence of the Doctor Watson tool, which is described later in this chapter, diminishes this difference between the two families of operating systems. Therefore, we can only point out that Windows 9x is more user-friendly and ergonomic, since it immediately provides the required minimum of error information, while in Windows NT error reports are created by a separate utility.

Fig. 3.2: Critical error message displayed by Windows 98

If no additional debugger has been installed in the system, the critical error message window has only one button: OK. After the user clicks this button, the application that carried out the illegal operation is terminated. If you wish, you can add a Cancel button to this window. Clicking it starts the debugger or any other utility intended for analyzing the situation. It is important to understand that clicking the Cancel button doesn't cancel automatic termination of the faulty application. However, with some skill, you can close the breach manually and continue working normally.

Start the Registry Editor and go to the following registry key: HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug. If there is no such key, create it. The Debugger value specifies the path to the debugger with all of the required command-line options; the Auto string value determines whether the debugger starts automatically (1) or the user is given a choice (0). Finally, the DWORD value UserDebuggerHotKey specifies the scan code of the hotkey for starting the debugger.

Doctor Watson

The Doctor Watson tool is the standard built-in debugger for critical errors that is included with all operating systems of the Windows family. In essence, it is a static tool for collecting all relevant information. Although Doctor Watson provides a detailed report on the causes of a failure, it lacks the active functions that would allow it to influence incorrectly operating programs. Thus, having only Doctor Watson at your disposal, you won't be able to make the application that caused an error continue operating as if nothing had happened. To achieve this, you'll have to use an interactive debugger. The Microsoft Visual Studio Debugger, supplied as part of Microsoft Visual Studio, is one such tool. It will be considered later in this chapter.

It is widely held that Doctor Watson is preferable on workstations, while interactive debuggers are best on servers. The usual argument is that end users cannot be expected to understand all of the mysteries of assembler. This opinion is partially true. However, it isn't wise to ignore the fact that not every cause of an error can be detected by static analysis tools. Furthermore, interactive tools simplify the analysis considerably. On the other hand, Doctor Watson is included with the operating system, while all other tools must be purchased separately. Therefore, it is up to you to choose the preferred debugger for handling critical errors.

To specify Doctor Watson as your default debugger, add the following entry to the system registry or issue the Drwtsn32.exe -i command (to carry out any of these operations, you must have administrative privileges):

Listing 3.1: Installing Doctor Watson as the default debugger
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
    "Auto"="1"
    "Debugger"="drwtsn32 -p %ld -e %ld -g"
    "UserDebuggerHotKey"=dword:00000000
 

From now on, any critical error will be followed by a report generated by Doctor Watson, containing a more or less detailed explanation of the error type and its cause.
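The Debugger value in Listing 3.1 is a template with two %ld placeholders that are filled in when the debugger is launched. The following C sketch shows the substitution. It assumes that the first placeholder receives the ID of the faulty process and the second receives the handle of an event that the debugger signals after attaching; the function name and the sample handle value are hypothetical.

```c
#include <stdio.h>

/* Expands the Debugger template from Listing 3.1.
   Assumption: the first %ld is the process ID, the second is the
   handle of the event signalled by the debugger once it has attached. */
int build_debugger_command(char *out, size_t out_size,
                           long pid, long event_handle)
{
    return snprintf(out, out_size, "drwtsn32 -p %ld -e %ld -g",
                    pid, event_handle);
}
```

For the process with pid 612 and an event handle of, say, 128, the resulting command line is drwtsn32 -p 612 -e 128 -g.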

Fig. 3.3: Reaction of Doctor Watson to a critical error

An example of a report created by Doctor Watson is provided below, with comments added by the author.

Listing 3.2: An example of a report produced by Doctor Watson (the author's comments begin with a semicolon)
    Exception in application:
        App:  (pid=612)
    ; pid of the process where the exception took place
        Time: 14.11.2003 @ 22:51:40.674
    ; Time when the exception took place
        Number: c0000005 (access rights violation)
    ; Code of the exception category.
    ; Code decoding can be found in WINNT.H,
    ; included with the SDK supplied with any Windows compiler.
    ; A detailed description of all exceptions can be found
    ; in the supplementary documentation
    ; for all Intel and AMD processors, distributed freely
    ; by the respective manufacturers.
    ; (Attention: To change the OS exception code to the CPU interrupt vector,
    ; you must reset the most significant word to zero.)
    ; In this case, this is 0x5: an attempt to access
    ; an invalid memory address.

    *----> System information <----*
        Computer name: KPNC
        User name: Kris Kaspersky
        Number of processors: 1
        Processor type: x86 Family 6 Model 8 Stepping 6
        Windows version: 2000: 5.0
        Current build: 2195
        Service pack: None
        Current type: Uniprocessor Free
        Registered organization:
        Registered user: Kris Kaspersky
    ; Brief info on the system

    *----> Task list <----*
        0 Idle.exe
        8 System.exe
        232 smss.exe
        ...
        1244 os2srv.exe
        1164 os2ss.exe
        1284 windbg.exe
        1180 MSDEV.exe
        1312 cmd.exe
        612 test.exe
        1404 drwtsn32.exe
        0 _Total.exe

        (00400000 - 00406000)
        (77F80000 - 77FFA000)
        (77E80000 - 77F37000)
    ; List of loaded DLLs.
    ; According to the documentation, the names of the appropriate modules
    ; must be listed to the right of the addresses. They are
    ; masked so well, however, that they have become practically invisible.
    ; Still, it is possible to extract their names from the log file,
    ; although not without a few tricks (see the symbol table below).

    Memory copy for thread 0x188
    ; Provided below is a copy of the state of the thread that caused the exception.

    eax=00000064 ebx=7ffdf000 ecx=00000000 edx=00000064 esi=00000000 edi=00000000
    eip=00401014 esp=0012ff70 ebp=0012ffc0 iopl=0         nv up ei pl nz na pe nc
    cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00000202
    ; Contents of registers and flags

    Function: <nosymbols>
    ; Printout of the failure environment

    00400ffc 0000      add     [eax], al              ds:00000064=??
    ; Adding the AL value to the cell referenced by EAX.
    ; The cell address computed by Doctor Watson is equal to 64h,
    ; which, obviously, doesn't correspond to reality:
    ; Doctor Watson substitutes into the expression the value
    ; that the EAX register had at the moment of failure,
    ; and this value is different from the one
    ; that this register had at the moment of execution!
    ; Unfortunately, neither we nor Doctor Watson
    ; know the run-time value of the EAX register.

    00400ffe 0000      add     [eax], al              ds:00000064=??
    ; Adding the AL value to the cell referenced by EAX.
    ; What, again? What a pain! Actually,
    ; it is the sequence 00 00 00 00 that is encoded this way.
    ; To all appearances, this sequence is a piece
    ; of some machine command incorrectly interpreted
    ; by the disassembling engine of Doctor Watson.

    00401000 8b542408  mov     edx, [esp+0x8]         ss:00f8d547=????????
    ; Loading the function argument into EDX.
    ; It is impossible to tell for certain which argument is loaded,
    ; since we do not know the address
    ; of the stack frame.

    00401004 33c9      xor     ecx, ecx
    ; Resetting ECX to zero

    00401006 85d2      test    edx, edx
    00401008 7e18      jle     00409b22
    ; If EDX <= 0, jumping to the 409B22h address

    0040100a 8b442408  mov     eax, [esp+0x8]         ss:00f8d547=????????
    ; Loading the above-mentioned argument into EAX

    0040100e 56        push    esi
    ; Saving ESI in the stack, thus moving the stack top pointer
    ; up by 4 bytes (into the area of lower addresses)

    0040100f 8b742408  mov     esi, [esp+0x8]         ss:00f8d547=????????
    ; Loading the next argument into ESI.
    ; Since ESP has just been changed, this isn't the argument
    ; with which we were dealing before.

    00401013 57        push    edi
    ; Saving the EDI register in the stack

    FAILURE -> 00401014 0fbe3c31  movsx   edi, byte ptr [ecx+esi]  ds:00000000=??
    ; Well, here is the instruction that caused the access violation.
    ; It accesses the cell referenced by the sum of the ECX and ESI registers.
    ; What are their values? Scroll the screen upwards slightly and find out that
    ; ECX and ESI are equal to 0, a fact about which
    ; Doctor Watson informs us: "ds:00000000".
    ; Note that this information can be trusted, since substitution
    ; of the effective address was carried out at run time.
    ; Now, let us recall that ESI contains
    ; the copy of the argument passed to the function
    ; and that ECX was explicitly reset to zero. Consequently,
    ; in the [ECX+ESI] expression,
    ; the ESI register is the pointer, and ECX is the index.
    ; Since ESI is equal to zero, this means that our function
    ; was passed a pointer to an unallocated memory area.
    ; This usually happens
    ; either because of an algorithmic error in a program
    ; or because the virtual memory has been exhausted.
    ; Unfortunately, Doctor Watson doesn't disassemble
    ; the parent function, and we have to guess which of the
    ; two possible variants is true.
    ; Although it is possible to disassemble the memory dump
    ; of the process (provided, of course, that it has been saved),
    ; this isn't what we actually need...

    00401018 03c7      add     eax, edi
    ; Add the contents of the EDI register
    ; to the EAX register and write the result to EAX.

    0040101a 41        inc     ecx
    ; Increase ECX by one

    0040101b 3bca      cmp     ecx, edx
    0040101d 7cf5      jl      00407014
    ; While ECX < EDX, jump to 407014
    ; (obviously, we are dealing with a loop controlled by the ECX counter).
    ; In the case of interactive debugging, we could forcibly exit the function,
    ; returning the error flag, so that the parent function
    ; (and the entire program along with it) can continue execution.
    ; In this case, only the last operation would be lost,
    ; while all the other data would remain correct.

    0040101f 5f        pop     edi
    00401020 5e        pop     esi
    00401021 c3        ret
    ; Exiting the function

    *----> Backward tracing of the stack <----*
    ; Stack contents at the moment of failure:
    ; prints addresses and parameters of previously executed functions.
    ; In the case of interactive debugging, we can simply pass control to one
    ; of the upper functions, which is equivalent to a return to the past.
    ; Only in reality is it impossible to fix smashed porcelain;
    ; in the computer universe, everything is possible!

    FramePtr ReturnAd Param#1  Param#2  Param#3  Param#4  Function Name
    ; FramePtr: points to the value of the stack frame;
    ;         above (i.e., in smaller addresses) are the function arguments,
    ;         below are its local variables.
    ;
    ; ReturnAd: stores the return address to the parent function.
    ;         If this location contains garbage and back-tracing of the stack
    ;         starts to make a characteristic noise,
    ;         then it is highly likely
    ;         that we are dealing with a stack overflow error
    ;         or, possibly, that your computer is under attack.
    ;
    ; Param#: the first four parameters of the function;
    ;         this is the number of parameters
    ;         that Doctor Watson displays on the screen.
    ;         This is an overly stringent limitation,
    ;         since most functions have dozens of parameters
    ;         and the first four do not provide sufficient information.
    ;         However, a missing parameter can be retrieved easily
    ;         from the copy of the unprocessed stack manually.
    ;         To do so, it is enough to go to the address specified in the
    ;         FramePtr field.
    ;
    ; Func Name: function name (if it is possible to detect it). In fact,
    ;         it displays only the names of functions imported from other DLLs,
    ;         since it is impossible to find a commercial program
    ;         compiled along with debug info.
    ;
    0012FFC0 77E87903 00000000 00000000 7FFDF000 C0000005 !<nosymbols>
    0012FFF0 00000000 00401040 00000000 000000C8 00000100 kernel32!SetUnhandledExceptionFilter
    ; Functions are listed in the order of their execution.
    ; The last one that was executed was the same
    ; kernel32!SetUnhandledExceptionFilter function that handles the current exception.

    *----> Copy of unprocessed stack <----*
    ; The copy of the unprocessed stack contains it "as is."
    ; It is very helpful when detecting buffer overflow attacks: the entire shellcode
    ; passed by the intruder will be printed out by Doctor Watson,
    ; and you'll only have to detect it (for further details,
    ; see my book "Technique and philosophy of network attacks").

    0012ff70 00 00 00 00 00 00 00 00 - 39 10 40 00 00 00 00 00 ........9.@.....
    0012ff80 64 00 00 00 f4 10 40 00 - 01 00 00 00 d0 0e 30 00 d.....@.......0.
    ...
    00130090 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00
    001300a0 00 00 00 00 00 00 00 00 - 00 00 00 00 00 00 00 00

    *----> Symbol table <----*
    ; The symbol table contains the names of all loaded DLLs, along with the names
    ; of imported functions. Using these addresses as the starting point,
    ; we can easily restore the list of loaded DLLs.

    ntdll.dll
    77F81106 00000000 ZwAccessCheckByType
    ...
    77FCEFB0 00000000 fltused

    kernel32.dll
    77E81765 0000003d IsDebuggerPresent
    ...
    77EDBF7A 00000000 VerSetConditionMask
    ;
    ; Thus, let us return to the list of loaded DLLs.
    ; (00400000 - 00406000)  - obviously,
    ; this is the memory area occupied by the program itself.
    ; (77F80000 - 77FFA000)  - this is NTDLL.DLL
    ; (77E80000 - 77F37000)  - this is KERNEL32.DLL
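Two of the calculations mentioned in the report comments are easy to check by hand. The sketch below, with hypothetical function names, shows both: zeroing the most significant word of the exception code (the rule of thumb from the comments), and computing the target of a two-byte conditional jump from its opcode bytes. For the second, the standard x86 rule is assumed: the target is the address of the next instruction plus the sign-extended 8-bit displacement.

```c
#include <stdint.h>

/* Rule of thumb from the report comments: reset the most significant
   word of the OS exception code to zero. */
uint32_t exception_low_word(uint32_t code)
{
    return code & 0xFFFFu;
}

/* Target of a two-byte short conditional jump (opcodes 70h..7Fh):
   address of the next instruction plus the sign-extended
   8-bit displacement taken from the second opcode byte. */
uint32_t short_jump_target(uint32_t insn_addr, uint8_t rel8)
{
    return insn_addr + 2u + (uint32_t)(int32_t)(int8_t)rel8;
}
```

Applied to the bytes shown in the report, 7e18 at 00401008 gives 00401022 (the byte right after RET), and 7cf5 at 0040101d gives 00401014, the address of the faulting instruction at the top of the loop. These differ from the targets printed in the report (00409b22 and 00407014), so the printed jump targets are best treated with caution and checked against the opcode bytes.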
 

Microsoft Visual Studio Debugger

When you install the Microsoft Visual Studio programming environment, it registers its debugger as the default one for handling critical errors. Although this debugger is very easy to use, its functions are very limited, and it doesn't even support such a simple operation as searching for a hex sequence in memory. Its only advantage over the most advanced (in every respect) option, Microsoft Kernel Debugger, is the ability to trace processes that have generated a critical exception.

In the hands of an experienced professional, Microsoft Visual Studio Debugger can work wonders. One such wonder is making applications that have executed an illegal operation continue their work, even though the operating system normally closes such applications without saving their data. In any case, an interactive debugger (and Microsoft Visual Studio Debugger is one) provides much more detailed information on the failure and considerably simplifies the process of finding its source. Unfortunately, the limited space allowed in this chapter (even though it already contains a large amount of off-topic information!) prevents the author from describing the entire methodology of debugging in detail. Instead, I must limit myself to a narrow range of the most interesting problems. For more details, see the section "Inhabitants of the Shadowy Zone, or From Morgue to Reanimation".

In order to set Microsoft Visual Studio Debugger as the default debugger for critical errors manually, add the following entries to the system registry:

Listing 3.3: Specifying Microsoft Visual Studio Debugger as your default debugger for critical errors
    [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug]
    "Auto"="1"
    "Debugger"="\"C:\Prg Files\MS VS\Common\MSDev98\Bin\msdev.exe\" -p %ld -e %ld"
    "UserDebuggerHotKey"=dword:00000000
 
Listing 3.4: A demo example that causes a critical exception
    // The function returns the sum of n char values.
    // If it is passed a null pointer, the function will crash,
    // although the source of the error is not the function itself
    // but the arguments passed to it by the parent function.
    int test(char *buf, int n)
    {
        int a, sum = 0;
        for (a = 0; a < n; a++)
            sum += buf[a];              // Here, the exception is thrown.
        return sum;
    }

    int main()
    {
        #define N 100
        char *buf = 0;                  // Initializing the pointer to the buffer
        /* buf = malloc(100); */        // "Forgetting" to allocate the memory,
                                        // which is the error
        test(buf, N);                   // Passing the null pointer to the function
        return 0;
    }
 

Inhabitants of the Shadowy Zone, or From Morgue to Reanimation

Would you like to know how to make an application continue normal operation after a critical error message has appeared? In fact, this is an important, and sometimes urgent, task. Suppose that an application containing unique data that has not been saved yet has crashed. In the best case, you'll have to enter this information once again, while in the worst case, you have lost the data for good. There are some utilities on the market aimed exactly at solving this problem (Norton Utilities is a typical example). Unfortunately, their abilities are far from comprehensive, and, on average, they turn out to be effective in only one case in ten. At the same time, manual reanimation of a faulty program is successful in 75 to 90 percent of all cases.

Strictly speaking, it is impossible to recover fully the functionality of a crashed program or to roll back all of the actions that preceded the crash. In the best case, you'll be able to save the data before the program totally loses control and starts to behave unpredictably. Even this achievement would have to be counted as a success!

There are at least three different methods of reanimation: a) forcibly exiting the function that has caused a critical exception; b) unwinding the stack and passing control back; c) passing control to the message handler function. Let us consider each of these methods using the example of the testt.exe application, a copy of which can be found on the companion CD.

Jumping ahead a few steps, note that only faults that are caused by algorithmic errors can be reanimated. Errors caused by hardware faults are irrecoverable. If information stored in RAM was corrupted because of a physical defect in the memory, you probably won't be able to recover the crashed application. If, however, the failure did not affect vitally important data structures, there is some hope for successful recovery even in this case.

Forcibly Exiting the Function

Start the test program, enter some text in one or more of the windows, then select the About TestCEdit command from the Help menu. When the dialog opens, click the Make error button. Oops! The program displays a critical error message. If we click OK, all unsaved data will be lost, which isn't what we planned. However, if a debugger has been installed in the system beforehand, we can still attempt to save the data. To be specific, let's suppose that we have Microsoft Visual Studio Debugger.

Click Cancel, and the debugger will immediately disassemble the function that caused the exception (see the listing provided below).

Listing 3.5: Microsoft Visual Studio Debugger has disassembled the function that has thrown an exception
    0040135C   push   esi
    0040135D   mov    esi, dword ptr [esp+8]
    00401361   push   edi
    00401362   movsx  edi, byte ptr [ecx+esi]
    00401366   add    eax, edi
    00401368   inc    ecx
    00401369   cmp    ecx, edx
    0040136B   jl     00401362
    0040136D   pop    edi
    0040136E   pop    esi
    0040136F   ret    8
 

Having analyzed the cause of the exception (the function has been passed a pointer to unallocated memory), we conclude that it is impossible to make the function continue execution, since we do not know the structure of the data passed to it. In such a case, we have to return forcibly to the parent function, not forgetting to set the error flag, which signals to the program that the current operation has not been completed. Unfortunately, there are no commonly adopted error flags, and different functions use different conventions. To find out what applies in each specific case, we must disassemble the parent function and determine which error code it expects.

Place the cursor on the dump window and enter the name of the pointer to the stack top, ESP register, into the address line. Then press <Enter>. The stack contents will be immediately displayed:

Listing 3.6: Searching for the return address from the current function (marked with a comment)
    0012F488  0012FA64  0012FA64  004012FF    ; <- return address (third double word)
    0012F494  00000000  00000064  00403458
    0012F4A0  FFFFFFFF  0012F4C4  6C291CEA
    0012F4AC  00000019  00000000  6C32FAF0
    0012F4B8  0012F4C0  0012FA64  01100059
    0012F4C4  006403C2  002F5788  00000000
    0012F4D0  00640301  77E16383  004C1E20
 

The first two double words correspond to the POP EDI/POP ESI machine commands. Therefore, they are of little or no importance to us. The next double word, 004012FFh, is the return address to the parent procedure. This is exactly what we need!

Press <Ctrl>+<D>, then enter 0x4012FF, and the debugger will display the following disassembled text:

Listing 3.7: Disassembled listing of the parent function
    004012FA   call     00401350
    004012FF   cmp      eax, 0FFh
    00401302   je       0040132D
    00401304   push     eax
    00401305   lea      eax, [esp+8]
    00401309   push     405054h
    0040130E   push     eax
    0040130F   call     dword ptr ds:[4033B4h]
    00401315   add      esp, 0Ch
    00401318   lea      ecx, [esp+4]
    0040131C   push     0
    0040131E   push     0
    00401320   push     ecx
    00401321   mov      ecx, esi
    00401323   call     00401BC4
    00401328   pop      esi
    00401329   add      esp, 64h
    0040132C   ret
    0040132C
    0040132D   ; This branch will get control if function 401350h returns FFh.
    0040132D   push     0
    0040132F   push     0
    00401331   push     405048h
    00401336   mov      ecx, esi
    00401338   call     00401BC4
    0040133D   pop      esi
    0040133E   add      esp, 64h
    00401341   ret
 

Look at this: If the EAX register is equal to FFh, the parent function passes control to the branch at 40132Dh and terminates execution after several machine commands, passing control to a higher-level function. If, however, EAX != FFh, its value is passed to the function called through 4033B4h. Consequently, we can assume that FFh is the error flag. Let us return to the function being tested by pressing <Ctrl>+<G> and entering EIP. Then switch to the Registers pane and change the value of EAX to FFh.
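Expressed in C, the convention that we have just deduced from the disassembly might look as follows. This is only a sketch: the names are hypothetical, and sum_chars is a defensive variant of the summing function, not the code of the test program itself (which, as we know, dereferences the pointer without checking it).

```c
#include <stddef.h>

#define ERROR_FLAG 0xFF  /* the value the parent compares EAX against */

/* Hypothetical defensive variant of the summing function: instead of
   dereferencing a null pointer, it reports failure via ERROR_FLAG. */
int sum_chars(const char *buf, int n)
{
    int sum = 0;
    if (buf == NULL)
        return ERROR_FLAG;
    for (int i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Mirrors the parent's check: cmp eax, 0FFh / je error_branch.
   Returns 1 if the result is used, 0 if the error branch is taken. */
int parent_accepts(int result)
{
    return result != ERROR_FLAG;
}
```

Writing FFh into EAX by hand in the debugger makes the parent behave exactly as if the function had returned ERROR_FLAG itself. Note the weakness of such an in-band flag: a legitimate sum equal to 255 would also be treated as an error.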

Now, it is necessary to find a suitable point of return from the function. It is not possible to simply go to the RET machine command, because the stack must be balanced before returning from the function. Otherwise, the program will crash irreversibly, throwing us off to some unpredictable location.

In the general case, the number of PUSH commands must correspond exactly to the number of POP commands. Also, take into account the fact that PUSH DWORD X is equivalent to SUB ESP, 4, and POP DWORD X to ADD ESP, 4. After analyzing the disassembled listing of the function, we can conclude that, to balance the stack in this case, we must pop two double words from the stack top. They correspond to the following machine commands: 40135C: PUSH ESI and 401361: PUSH EDI. This can be achieved by passing control to the 40136Dh address, where there are two POP commands that bring the stack to a balanced state. Move the cursor to that position, right-click, and choose the Set Next Statement command from the context menu. Alternatively, you can switch to the Registers window and change the EIP value from 401362h to 40136Dh.
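The balance check described above can be written down as a small C model of Listing 3.5. Each instruction is paired with its effect on ESP (PUSH is -4, POP is +4). The sketch assumes that the instructions between the function entry at 401350h and 40135Ch, which are not shown in Listing 3.5, do not touch the stack; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One instruction of Listing 3.5 and its effect on ESP in bytes
   (PUSH: -4, POP: +4, everything else up to RET: 0). */
struct insn { uint32_t addr; int esp_delta; };

static const struct insn listing_3_5[] = {
    { 0x0040135C, -4 },  /* push esi                          */
    { 0x0040135D,  0 },  /* mov esi, dword ptr [esp+8]        */
    { 0x00401361, -4 },  /* push edi                          */
    { 0x00401362,  0 },  /* movsx edi, byte ptr [ecx+esi]     */
    { 0x00401366,  0 },  /* add eax, edi                      */
    { 0x00401368,  0 },  /* inc ecx                           */
    { 0x00401369,  0 },  /* cmp ecx, edx                      */
    { 0x0040136B,  0 },  /* jl 00401362                       */
    { 0x0040136D, +4 },  /* pop edi                           */
    { 0x0040136E, +4 },  /* pop esi                           */
    { 0x0040136F,  0 },  /* ret 8 (treated as the end point)  */
};

/* Sums the ESP deltas of all instructions with from <= addr < to,
   in straight-line order (jumps are ignored). */
int esp_change(uint32_t from, uint32_t to)
{
    int total = 0;
    size_t count = sizeof listing_3_5 / sizeof listing_3_5[0];
    for (size_t i = 0; i < count; i++)
        if (listing_3_5[i].addr >= from && listing_3_5[i].addr < to)
            total += listing_3_5[i].esp_delta;
    return total;
}

/* Resuming at `resume` is balanced if everything pushed before the
   fault is popped between the resume point and RET. */
int resume_is_balanced(uint32_t fault, uint32_t resume)
{
    const uint32_t first = 0x0040135C, ret = 0x0040136F;
    return esp_change(first, fault) + esp_change(resume, ret) == 0;
}
```

At the faulting instruction (401362h) the stack is 8 bytes deeper than at entry. Resuming at 40136Dh pops exactly those 8 bytes before RET, while jumping straight to RET at 40136Fh would leave the stack unbalanced.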

Press <F5> to make the processor continue with program execution. Voila! The faulty program actually continues execution, and you can save your data. (A good-natured complaint about an error in the last operation can be ignored.)

Unwinding the Stack

It is not possible to forcibly exit from the function in every case. Some critical failures affect several nested functions simultaneously. In this case, in order to reanimate the dead program, we have to carry out a deep rollback, continuing program execution from a point at which nothing threatened its operability. The exact depth of rollback must be selected experimentally. As a rule, it will be from three to five steps. Bear in mind that if nested functions modify global data (for instance, heap data), then any attempt at carrying out a rollback can result in a total crash of the program being debugged. Therefore, it is desirable to guess the rollback depth on the first attempt. If you are in doubt, just remember that an excess is better than a shortage. On the other hand, excessive rollback results in the loss of all unsaved data...

The rollback procedure comprises the following three steps: a) building the tree of calls; b) determining the coordinates of the stack frame for each call; c) restoring the register context of the parent function. A really good debugger will carry out all of these operations for you. The only thing that remains is to write appropriate values into EIP and ESP. Unfortunately, Microsoft Visual Studio Debugger cannot be called a really effective debugger. It traces the stack well enough, omitting FPO functions (Frame Pointer Omission, that is, functions with an optimized frame), but it doesn't report the coordinates of the stack frame. Therefore, the most difficult part of your job must be carried out manually.

Even such a stack of calls is better than nothing. When unwinding the stack manually, we will rely on the fact that frame coordinates are determined naturally by the return addresses. Let's suppose that the contents of the Call Stack window appear as follows:

Listing 3.8: The contents of the Call Stack window displayed by Microsoft Visual Studio Debugger
    TESTCEDIT! 00401362()
    MFC42! 6c2922ae()
    MFC42! 6c298fc5()
    MFC42! 6c292976()
    MFC42! 6c291dcc()
    MFC42! 6c291cea()
    MFC42! 6c291c73()
    MFC42! 6c291bfb()
    MFC42! 6c291bba()
 

Let's try to find the addresses 6C2922AEh and 6C298FC5h, which correspond to the two most recent steps of execution, in the stack contents. Press <Alt>+<6> to switch to the dump window, then use the <Ctrl>+<G> hotkey combination to select the base address, and enter ESP. Scroll the dump window down, and you'll find both return addresses (see the listing provided below):

Listing 3.9: Stack content after unwinding
    0012F488  0012FA64  0012FA64  004012FF    ; 0040136F: ret 8 (the first return address)
    0012F494  00000000  00000064  00403458    ; 00401328: pop esi
    0012F4A0  FFFFFFFF  0012F4C4  6C291CEA
    0012F4AC  00000019  00000000  6C32EAF0
    0012F4B8  0012F4C0  0012EA64  01100059
    0012F4C4  00320774  002F5788  00000000
    0012F4D0  00320701  77E16383  004C1E20
    0012F4DC  00320774  002F5788  00000000
    0012F4E8  000003E8  0012EA64  004F8CD8
    0012F4F4  0012F4DC  002F5788  0012F560
    0012F500  77E61D49  6C2923D8  00403458    ; 0040132C: ret
    0012F50C  00000111  0012F540  6C2922AE    ; 6C29237E: pop ebx / pop ebp / ret 1Ch
    0012F518  0012FA64  000003E8  00000000
    0012F524  004012F0  00000000  0000000C
    0012F530  00000000  00000000  0012FA64
    0012F53C  000003E8  0012F564  6C298FC5
    0012F548  000003E8  00000000  00000000
    0012F554  00000000  000003E8  0012FA64
 

Memory cells located above the return addresses (at lower addresses) hold the register values that are saved when entering the function and restored when exiting it. Memory cells located below the return addresses are occupied by function arguments (if the function has any), or belong to the local variables of the parent function (if the nested function doesn't accept any arguments).
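Scrolling through the dump in search of a return address is nothing more than a linear search. The C sketch below does the same over the values of Listing 3.9 (the row for 0012F518 is taken once) and reports the stack address at which a given double word is stored. The function name and the convention of returning 0 when nothing is found are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define DUMP_BASE 0x0012F488u  /* ESP at the moment of the fault */

/* Stack contents from Listing 3.9, three double words per row. */
static const uint32_t dump[] = {
    0x0012FA64, 0x0012FA64, 0x004012FF,  /* 0012F488 */
    0x00000000, 0x00000064, 0x00403458,  /* 0012F494 */
    0xFFFFFFFF, 0x0012F4C4, 0x6C291CEA,  /* 0012F4A0 */
    0x00000019, 0x00000000, 0x6C32EAF0,  /* 0012F4AC */
    0x0012F4C0, 0x0012EA64, 0x01100059,  /* 0012F4B8 */
    0x00320774, 0x002F5788, 0x00000000,  /* 0012F4C4 */
    0x00320701, 0x77E16383, 0x004C1E20,  /* 0012F4D0 */
    0x00320774, 0x002F5788, 0x00000000,  /* 0012F4DC */
    0x000003E8, 0x0012EA64, 0x004F8CD8,  /* 0012F4E8 */
    0x0012F4DC, 0x002F5788, 0x0012F560,  /* 0012F4F4 */
    0x77E61D49, 0x6C2923D8, 0x00403458,  /* 0012F500 */
    0x00000111, 0x0012F540, 0x6C2922AE,  /* 0012F50C */
    0x0012FA64, 0x000003E8, 0x00000000,  /* 0012F518 */
    0x004012F0, 0x00000000, 0x0000000C,  /* 0012F524 */
    0x00000000, 0x00000000, 0x0012FA64,  /* 0012F530 */
    0x000003E8, 0x0012F564, 0x6C298FC5,  /* 0012F53C */
    0x000003E8, 0x00000000, 0x00000000,  /* 0012F548 */
    0x00000000, 0x000003E8, 0x0012FA64,  /* 0012F554 */
};

/* Returns the stack address where `value` first occurs,
   or 0 if it is not present in the dump. */
uint32_t find_in_dump(uint32_t value)
{
    size_t count = sizeof dump / sizeof dump[0];
    for (size_t i = 0; i < count; i++)
        if (dump[i] == value)
            return DUMP_BASE + (uint32_t)(i * 4u);
    return 0;
}
```

With this data, 6C2922AEh is found at 0012F514h and 6C298FC5h at 0012F544h, while the first return address, 004012FFh, sits at 0012F490h.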

Returning to Listing 3.5, note that the two double words on the top of the stack correspond to the POP EDI and POP ESI machine commands, while the address that directly follows them, 4012FFh, is the one to which the RET 8 command at 40136Fh passes control. To continue stack unwinding, we must disassemble the code at this address:

Listing 3.10: Disassembled listing of the grandmother function
    004012FA   call     00401350
    004012FF   cmp      eax, 0FFh
    00401302   je       0040132D
    00401304   push     eax
    00401305   lea      eax, [esp+8]
    00401309   push     405054h
    0040130E   push     eax
    0040130F   call     dword ptr ds:[4033B4h]
    00401315   add      esp, 0Ch
    00401318   lea      ecx, [esp+4]
    0040131C   push     0
    0040131E   push     0
    00401320   push     ecx
    00401321   mov      ecx, esi
    00401323   call     00401BC4
    00401328   pop      esi
    00401329   add      esp, 64h
    0040132C   ret                 ; SS:[ESP] = 6C2923D8
 

By scrolling the window downwards, we will notice the ADD ESP, 64h instruction that closes the current stack frame. Eight more bytes are popped by the 40136Fh: RET 8 instruction, and four bytes are taken by 401328: POP ESI. Thus, the position of the return address in the stack is equal to current_ESP + 64h + 8 + 4 == current_ESP + 70h. Going down 70h bytes, you'll see:

Listing 3.11: Return address from the grandmother function
 0012F500  77E61D49  6C2923D8  00403458    00401328:POP ESI/ret;


The first double word is the value of the ESI register, which we will have to restore manually; the second is the return address from the function. Press <Ctrl>+<G>, enter 0x6C2923D8, and continue to unwind the stack:

Listing 3.12: Disassembled listing of the great-grandmother function
 6C2923D8   jmp   6C29237B
 6C29237B   mov   eax, ebx
 6C29237D   pop   esi
 6C29237E   pop   ebx
 6C29237F   pop   ebp
 6C292380   ret   1Ch


Now, we have finally got to restoring registers! Move to the right by one double word (it was just popped from the stack by the RET command), switch to the Registers window, and restore the ESI , EBX , and EBP registers by retrieving their saved values from the stack:

Listing 3.13: The contents of the registers saved in the stack along with the return address
 0012F500  77E61D49  6C2923D8  00403458    6C29237D:pop esi
 0012F50C  00000111  0012F540  6C2922AE    6C29237E:pop ebx/pop ebp/ret 1Ch


As an alternative, you can set the EIP register to the 6C29237Dh address and the ESP register to the 12F508h address, and then press <F5> to continue program execution. This technique actually works. Moreover, the reanimated program doesn't report an execution error from the last operation (as was the case when restoring by means of forcibly exiting the function). Instead, it simply doesn't execute that command. Very well!
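The manual unwinding steps of this walkthrough can be double-checked with a small model of the stack. The following C sketch is only an illustration: the helper names (bytes_to_return_slot, stack_view, regs, run_epilogue) are hypothetical and are not part of any debugger. It computes the distance to the return address from the values read off the disassembly, and then emulates the pop esi / pop ebx / pop ebp / ret 1Ch sequence over a copy of the dumped double words.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance from the current ESP to the slot holding the return address:
   bytes freed by ADD ESP, by RET imm, and by each POP (4 bytes apiece). */
static uint32_t bytes_to_return_slot(uint32_t frame_bytes, uint32_t ret_imm,
                                     uint32_t pop_count)
{
    return frame_bytes + ret_imm + 4u * pop_count;
}

/* A read-only window onto the stack dump; base is the address of words[0]. */
struct stack_view {
    uint32_t base;
    const uint32_t *words;
    size_t count;
};

struct regs {
    uint32_t esi, ebx, ebp, eip, esp;
};

/* Fetch the double word stored at addr; returns 0 on success, -1 if the
   address is misaligned or lies outside the dump. */
static int read_dword(const struct stack_view *s, uint32_t addr, uint32_t *out)
{
    if (addr < s->base || (addr - s->base) % 4u != 0)
        return -1;
    size_t index = (addr - s->base) / 4u;
    if (index >= s->count)
        return -1;
    *out = s->words[index];
    return 0;
}

/* Emulate "pop esi / pop ebx / pop ebp / ret ret_imm" starting at r->esp. */
static int run_epilogue(const struct stack_view *s, struct regs *r,
                        uint32_t ret_imm)
{
    uint32_t *targets[4] = { &r->esi, &r->ebx, &r->ebp, &r->eip };
    for (int i = 0; i < 4; i++) {
        if (read_dword(s, r->esp, targets[i]) != 0)
            return -1;
        r->esp += 4;
    }
    r->esp += ret_imm;  /* RET imm also discards the callee's arguments */
    return 0;
}
```

With frame_bytes = 64h, ret_imm = 8, and one POP, bytes_to_return_slot gives 70h. Fed the six double words of Listing 3.13 (base 12F500h) with ESP set to 12F508h and ret_imm = 1Ch, run_epilogue leaves ESI = 00403458h, EBX = 00000111h, EBP = 0012F540h, EIP = 6C2922AEh, and ESP = 12F534h.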

Passing Control to the Message Handler Function

Neither of the methods of reanimating faulty applications described above is free from limitations and drawbacks. If the stack is seriously damaged by buffer overflow attacks or by algorithmic errors, the contents of vitally important processor registers will be corrupted. In this case, we won't be able to roll back (because stack contents have been lost) or exit the current function (because EIP points to some unknown location, probably somewhere in outer space). For console applications, there is actually very little that can be done in such situations. GUI applications, however, are a different matter. The concept of event-driven architecture provides any windowing application with some server functions. Even if the current execution context is irreversibly lost, we can pass control to the message-handling loop, thus making the program continue processing user commands.

A classic message-handling loop appears as follows:

Listing 3.14: A classic message-handling loop
 while (GetMessage(&msg, NULL, 0, 0))
 {
     TranslateMessage(&msg);
     DispatchMessage(&msg);
 }


All you need to do is pass control to the while loop, without even caring about tuning the stack frame, since optimized programs (which make up the overwhelming majority) address their local variables via ESP rather than via EBP. Of course, when accessing the msg variable, the function will ruin the stack contents located below its top. However, this is of little or no importance to us.

You should, however, realize that when you exit the application, it will definitely die (because instead of the return address, the RET machine command will find some unpredictable trash on top of the stack). However, this will happen after you have saved all of your data, so this crash doesn't present any threat. The only exception is a group of freaky applications that forget to close all opened files and delegate this job to the ExitProcess function. Even in this case, though, there is a way out: You can modify the return address so that it points to the ExitProcess function!

Let us create the simplest Windows application and experiment with it. Start Visual Studio, choose New > Project > Win32 Application, and then select the Typical "Hello, World" application option. Add a new item to the menu, and add the following code to it: char *p; *p = 0; Then compile this project with debug info.

Crash the application, then start the debugger. Move the cursor to the first line of the message-handling loop, right-click, and select Set Next Statement from the context menu. Press <F5> to continue program execution, and it will actually continue to work!

Now, compile the project as a release (i.e., without debug info) and try to reanimate the application in naked machine code. Taking advantage of the fact that Windows is a truly multitasking environment, in which the crashing of one process doesn't interfere with the operation of others, start your favorite disassembler (IDA Pro, for instance) and analyze the import table of the program being debugged. Even freeware programs such as dumpbin are able to do this. However, the report produced by dumpbin is not as clear and illustrative as the results produced by fully functional disassemblers.

The main goal of our search will be the TranslateMessage/DispatchMessage functions and cross-references to the message-handling loop.

Listing 3.15: Searching TranslateMessage/DispatchMessage functions in the import table
 .idata:004040E0  ; BOOL __stdcall TranslateMessage(const MSG *lpMsg)
 .idata:004040E0        extrn TranslateMessage:dword    ; DATA XREF: _WinMain@16+71 r
 .idata:004040E0                                        ; _WinMain@16+8D r
 .idata:004040E4  ; LONG __stdcall DispatchMessageA(const MSG *lpMsg)
 .idata:004040E4        extrn DispatchMessageA:dword    ; DATA XREF: _WinMain@16+94 r
 .idata:004040E8


The DispatchMessage function has only one cross-reference, which obviously leads to the message-handling loop we are after. The disassembled listing of this loop appears as follows:

Listing 3.16: The disassembled listing of the message-handling function
 .text:00401050       mov    edi, ds:GetMessageA
 .text:00401050  ; The first call to GetMessageA
 .text:00401050  ; (this isn't the loop itself yet, it is only its threshold).
 .text:00401050
 .text:00401056       push   0             ; wMsgFilterMax
 .text:00401058       push   0             ; wMsgFilterMin
 .text:0040105A       lea    ecx, [esp+2Ch+Msg]
 .text:0040105A  ; ECX points to the memory area, through which GetMessageA
 .text:0040105A  ; will return the message. The current ESP value can be any value.
 .text:0040105A  ; The most important thing here is that it must
 .text:0040105A  ; point to the actually allocated memory area.
 .text:0040105A  ; (See memory map, if the ESP value turns out
 .text:0040105A  ; to be corrupted so that it points nowhere.)
 .text:0040105A  ;
 .text:0040105E       push   0             ; hWnd
 .text:00401060       push   ecx           ; lpMsg
 .text:00401061       mov    esi, eax
 .text:00401063       call   edi           ; GetMessageA
 .text:00401063  ; Calling GetMessageA
 .text:00401063
 .text:00401065       test   eax, eax
 .text:00401067       jz     short loc_4010AD
 .text:00401067  ; Checking if there are unprocessed messages in the queue
 .text:00401067
 .text:00401077  loc_401077:               ; CODE XREF: _WinMain@16+A9 j
 .text:00401077  ; Starting point of the message loop
 .text:00401077
 .text:00401077       mov    eax, [esp+2Ch+Msg.hwnd]
 .text:0040107B       lea    edx, [esp+2Ch+Msg]
 .text:0040107B  ; EDX points to the memory area used for passing the messages.
 .text:0040107B
 .text:0040107F       push   edx           ; lpMsg
 .text:00401080       push   esi           ; hAccTable
 .text:00401081       push   eax           ; hWnd
 .text:00401082       call   ebx           ; TranslateAcceleratorA
 .text:00401082  ; Calling the TranslateAcceleratorA function
 .text:00401082
 .text:00401084       test   eax, eax
 .text:00401086       jnz    short loc_40109A
 .text:00401086  ; If the message was an accelerator, skip translation and dispatching
 .text:00401086
 .text:00401088       lea    ecx, [esp+2Ch+Msg]
 .text:0040108C       push   ecx           ; lpMsg
 .text:0040108D       call   ebp           ; TranslateMessage
 .text:0040108D  ; Calling the TranslateMessage function, if there is anything to translate
 .text:0040108D
 .text:0040108F       lea    edx, [esp+2Ch+Msg]
 .text:00401093       push   edx           ; lpMsg
 .text:00401094       call   ds:DispatchMessageA
 .text:00401094  ; Dispatching the message
 .text:0040109A
 .text:0040109A  loc_40109A:               ; CODE XREF: _WinMain@16+86 j
 .text:0040109A       push   0             ; wMsgFilterMax
 .text:0040109C       push   0             ; wMsgFilterMin
 .text:0040109E       lea    eax, [esp+34h+Msg]
 .text:004010A2       push   0             ; hWnd
 .text:004010A4       push   eax           ; lpMsg
 .text:004010A5       call   edi           ; GetMessageA
 .text:004010A5  ; Reading the next message from the message queue
 .text:004010A5
 .text:004010A7       test   eax, eax
 .text:004010A9       jnz    short loc_401077
 .text:004010A9  ; Running the message-handling loop
 .text:004010A9
 .text:004010AB       pop    ebp
 .text:004010AC       pop    ebx
 .text:004010AD
 .text:004010AD  loc_4010AD:               ; CODE XREF: _WinMain@16+67 j
 .text:004010AD       mov    eax, [esp+24h+Msg.wParam]
 .text:004010B1       pop    edi
 .text:004010B2       pop    esi
 .text:004010B3       add    esp, 1Ch
 .text:004010B6       retn   10h
 .text:004010B6  _WinMain@16 endp


We can see that the message-handling loop starts from the address 401050h. This is the address to which it is necessary to pass control in order to continue the execution of the crashed program. Try it. The program works!

Naturally, the task of reanimating a real-world application is much more complicated, because the message-handling loop in this case will be distributed over a large number of functions, and it is very difficult to identify all of them in the course of superficial disassembling. Nevertheless, applications based on standard libraries (such as MFC or OVL) have a predictable architecture. Therefore, the reanimation of such applications isn't a hopeless task.

Let's consider the structure of the message-handling loop in MFC. MFC applications spend most of their time in the following function: CWinThread::Run(void). This function periodically polls the queue for the arrival of new messages and sends them to the appropriate handlers. If one of the handlers has caused a critical fault, program execution can be continued using the Run function. This is its main advantage!

The function has no explicit arguments, but accepts a hidden this argument, pointing to an instance of the CWinThread class or of a class derived from it, without which the function will be unable to work. Fortunately, the virtual method tables of the CWinThread class contain enough distinguishing marks to allow us to recreate the this pointer manually.

Let's load the Run function into the disassembler and mark all of the calls to the table of virtual methods addressed via the ECX register.

Listing 3.17: A fragment of the disassembled listing of the Run function
 .text:6C29919D n2k_Trasnlate_main:                 ; CODE XREF: MFC42_5715+1F j
 .text:6C29919D                                     ; MFC42_5715+67 j ...
 .text:6C29919D        mov    eax, [esi]
 .text:6C29919F        mov    ecx, esi
 .text:6C2991A1        call   dword ptr [eax+64h]   ; CWinThread::PumpMessage(void)
 .text:6C2991A4        test   eax, eax
 .text:6C2991A6        jz     short loc_6C2991DA
 .text:6C2991A8        mov    eax, [esi]
 .text:6C2991AA        lea    ebp, [esi+34h]
 .text:6C2991AD        push   ebp
 .text:6C2991AE        mov    ecx, esi
 .text:6C2991B0        call   dword ptr [eax+6Ch]   ; CWinThread::IsIdleMessage(MSG*)
 .text:6C2991B3        test   eax, eax
 .text:6C2991B5        jz     short loc_6C2991BE
 .text:6C2991B7        push   1
 .text:6C2991B9        mov    [esp+14h], ebx
 .text:6C2991BD        pop    edi
 .text:6C2991BE
 .text:6C2991BE loc_6C2991BE:                       ; CODE XREF: MFC42_5715+51 j
 .text:6C2991BE        push   ebx                   ; wRemoveMsg
 .text:6C2991BF        push   ebx                   ; wMsgFilterMax
 .text:6C2991C0        push   ebx                   ; wMsgFilterMin
 .text:6C2991C1        push   ebx                   ; hWnd
 .text:6C2991C2        push   ebp                   ; lpMsg
 .text:6C2991C3        call   ds:PeekMessageA
 .text:6C2991C9        test   eax, eax
 .text:6C2991CB        jnz    short n2k_Trasnlate_main
 .text:6C2991CD


Thus, the Run function expects to receive a pointer to the double word pointing to the table of virtual methods, elements 0x19 and 0x1B of which represent the PumpMessage and IsIdleMessage functions (or stubs to them), respectively. If the DLL was not relocated, the addresses of imported functions can be found using the same disassembler. Otherwise, they should be reconstructed using the base address of the module, which is displayed by the debugger in response to the Modules command. Provided that these two functions were not overridden by the programmer, searching for the needed virtual table should be a trivial task.
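The element numbers 0x19 and 0x1B follow directly from the displacements in the two call instructions: on 32-bit x86 every virtual table slot is a 4-byte pointer, so the index is the displacement divided by four. A minimal C sketch of this conversion follows; the names POINTER_SIZE, vtable_index, and vtable_displacement are hypothetical and serve only as an illustration.

```c
#include <stdint.h>

#define POINTER_SIZE 4u   /* bytes per virtual table slot on 32-bit x86 */

/* Convert the displacement in "call dword ptr [eax+disp]" to a slot index. */
static uint32_t vtable_index(uint32_t displacement)
{
    return displacement / POINTER_SIZE;
}

/* Convert a slot index back to the displacement used in the call. */
static uint32_t vtable_displacement(uint32_t index)
{
    return index * POINTER_SIZE;
}
```

For the two calls in the Run function, vtable_index(0x64) gives 0x19 and vtable_index(0x6C) gives 0x1B.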

For some unknown reason, the MFC42.DLL library doesn't export symbolic names for these functions, so we must get this information on our own. After processing the MFC42.LIB library using the dumpbin utility with the /ARCH command-line option, we will get the ordinals of both functions (5307 for PumpMessage and 4079 for IsIdleMessage). Now, it remains to find these values in the export list of MFC42.DLL (dumpbin /EXPORTS mfc42.dll > mfc42.txt), from which we will discover that the address of the PumpMessage function is 6C291194h, while the address of IsIdleMessage is 6C292583h.
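Addresses obtained this way hold only while the DLL sits at the base address for which they were computed. If the debugger reports a different base for the module, each address has to be shifted by the same amount. The following C sketch shows the arithmetic; the function name rebase is hypothetical, and the base addresses mentioned after the code are assumptions chosen purely for illustration, not values taken from MFC42.DLL.

```c
#include <stdint.h>

/* Translate an address that is valid at preferred_base into the address
   of the same code when the module is loaded at actual_base. */
static uint32_t rebase(uint32_t addr, uint32_t preferred_base,
                       uint32_t actual_base)
{
    uint32_t rva = addr - preferred_base;   /* offset inside the module */
    return actual_base + rva;
}
```

If, hypothetically, the preferred base were 6C290000h and the module had been loaded at 10000000h, the PumpMessage address 6C291194h would become 10001194h. When both bases coincide, the address is returned unchanged.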

Now, it is necessary to find the pointers to the PumpMessage/IsIdleMessage functions in memory, or, to be more precise, in the data section, the base address of which is contained in the header of the PE file. Bear in mind that in x86 processors, the least significant byte is located at the lower address, which means that all numbers are written in inverse order. Unfortunately, Microsoft Visual Studio Debugger doesn't support the memory-searching operation. Therefore, we must bypass this limitation by copying the content of the dump onto the clipboard, pasting it into a text file, and searching for addresses there by pressing <F7>. Finally, the required pointers are found at the addresses 403044h/40304Ch (naturally, in your system these addresses may be different). Note that the distance between the pointers is exactly equal to the distance between [EAX + 64h] and [EAX + 6Ch], while the order in which they appear in memory is inverse to the order in which the virtual methods are declared. This is a good sign, which indicates that we are likely on the right path.
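The byte-order remark is easier to see in code. Below is a minimal C sketch of the search that the debugger lacks, assuming that the data section has already been copied into a byte buffer; the names to_little_endian and find_dword are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value as four bytes, least significant byte first. */
static void to_little_endian(uint32_t value, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Return the offset of the first occurrence of value (stored as a
   little-endian double word) in buf, or -1 if it is not present. */
static long find_dword(const uint8_t *buf, size_t len, uint32_t value)
{
    uint8_t pattern[4];
    to_little_endian(value, pattern);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (buf[i] == pattern[0] && buf[i + 1] == pattern[1] &&
            buf[i + 2] == pattern[2] && buf[i + 3] == pattern[3])
            return (long)i;
    }
    return -1;
}
```

The address 6C291194h is therefore looked for as the byte sequence 94 11 29 6C. Adding the returned offset to the base address of the dumped region gives the address of the pointer in the process.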

Listing 3.18: The addresses of the IsIdleMessage/PumpMessage functions located in the data section
 00403044  6C2911D4  6C292583  6C291194  ; IsIdleMessage/PumpMessage
 00403050  6C2913D0  6C299144  6C297129
 0040305C  6C297129  6C297129  6C291A47


The pointers referring to the 403048h/40304Ch addresses, obviously, are the candidates for membership in the virtual methods table of the CWinThread class, for which we are looking. By extending the search range to the entire address space of the process being debugged, we will find the following two stubs:

Listing 3.19: Stubs to the IsIdleMessage/PumpMessage functions located in the data segment
 00401A20  jmp dword ptr ds:[403044h] ; IsIdleMessage
 00401A26  jmp dword ptr ds:[403048h] ;
 00401A2C  jmp dword ptr ds:[40304Ch] ; PumpMessage


We are getting closer! We have found the stubs to the virtual functions rather than the functions themselves. To unravel this puzzle further, let us try to find the references to 401A26h/401A2Ch, which pass control to the code provided above:

Listing 3.20: Virtual table of the CWinThread class
 00403490  00401A9E  00401040  004015F0    0x0,  0x1,  0x2  elements
 0040349C  00401390  004015F0  00401A98    0x3,  0x4,  0x5  elements
 004034A8  00401A92  00401A8C  00401A86    0x6,  0x7,  0x8  elements
 004034B4  00401A80  00401A7A  00401A74    0x9,  0xA,  0xB  elements
 004034C0  00401010  00401A6E  00401A68    0xC,  0xD,  0xE  elements
 004034CC  00401A62  00401A5C  00401A56    0xF,  0x10, 0x11 elements
 004034D8  00401A50  00401A4A  00401A44    0x12, 0x13, 0x14 elements
 004034E4  00401A3E  004010B0  00401A38    0x15, 0x16, 0x17 elements
 004034F0  00401A32  00401A2C  00401A26    0x18, 0x19, 0x1A elements (PumpMessage)
 004034FC  00401A20  00401A1A  00401A14    0x1B, 0x1C, 0x1D elements (IsIdleMessage)


Even a beginner will easily recognize the virtual function table in this data structure. The pointers to the stubs of PumpMessage/IsIdleMessage are separated by exactly one element, as required by the task conditions. Let us suppose that this virtual table is the one that we need. To check whether this assumption is correct, count 0x19 elements upwards from 4034F4h, and try to find the pointer that refers to the starting point of the table. If you are lucky and it turns out to belong to an instance of the CWinThread class, the program will be able to continue its operation correctly:

Listing 3.21: The instance of CWinThread, manually located in memory
 004050B8  00403490  00000001  00000000
 004050C4  00000000  00000000  00000001


Indeed, something very similar to the truth can be found in memory. Let us write the 4050B8h value into the ECX register and locate the Run function in memory (as already mentioned, its address, 6C299164h, is known, provided that it hasn't been overridden). Then press <Ctrl>+<G>, enter "0x6C299164", and choose the Set Next Statement command from the right-click menu. The program, having escaped with a slight fright, continues execution, while you have a good reason to be happy and go have a rest.
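The two steps of this search, stepping back from a known slot to the start of the virtual table and then looking for a double word that points to that start, can be sketched in C as follows. This is an illustration under the assumption that the relevant memory has been copied into an array of double words; the names vtable_base and find_pointer_to are hypothetical.

```c
#include <stdint.h>

/* Address of element 0 of a virtual table, given the address of a known
   slot and that slot's index (4-byte pointers on 32-bit x86). */
static uint32_t vtable_base(uint32_t slot_addr, uint32_t slot_index)
{
    return slot_addr - 4u * slot_index;
}

/* Scan count double words that start at start_addr for one equal to
   target; return the address where it is stored, or 0 if none is found. */
static uint32_t find_pointer_to(const uint32_t *words, uint32_t count,
                                uint32_t start_addr, uint32_t target)
{
    for (uint32_t i = 0; i < count; i++) {
        if (words[i] == target)
            return start_addr + 4u * i;
    }
    return 0;
}
```

Counting 0x19 elements back from 4034F4h gives 403490h, the first row of Listing 3.20. Scanning the double words of Listing 3.21 for that value returns 4050B8h, the candidate for the this pointer. A match alone does not prove that the object is a CWinThread instance; it only narrows down the candidates.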

Hung applications that react neither to keyboard entry nor to mouse clicks can be reanimated in a similar way.

How to Process Memory Dump

In the software department, the entire floor was sown with the confetti from punch cards, and there were some guys crawling over the printout of a crash dump about 20 meters in length, trying to locate an error in the memory manager. The head of the department approached the president and informed him that there was some hope that the task could be achieved before dinner.

J.Antonov. The Youth of Gates

A memory dump (also known as a core dump or crash dump), which is saved by the system in the event of a critical error, isn't the most useful tool for detecting the cause of the crash. However, there is often nothing else at the disposal of the system administrator. What is the crash dump? This is the last moan of the operating system at the moment of irreversible fault, before it dies altogether. Digging through it is unlikely to please you. On the contrary, it is highly probable that you won't be able to detect the actual cause of the failure. Suppose, for instance, that an incorrectly written driver has invaded the memory region belonging to another driver and ruined its data structures, turning all of the numbers there topsy-turvy. By the time the victim dies, the faulty driver may already have been stopped, in which case it will be practically impossible to determine from the memory dump alone that it was the one that actually crashed the system.

Nevertheless, it doesn't make any sense to ignore the dump's existence. After all, it provided the only debugging method before the arrival of interactive debuggers. Contemporary programmers are spoiled by the availability of visual analysis tools. However, this doesn't give them much self-confidence in situations where pitiless entropy leaves them alone, face to face with their errors. But enough waxing lyrical. Let's take a closer look at this question.

First and foremost, it is necessary to edit the system configuration (Control Panel > System) and make sure that the dump settings correspond to our requirements (Advanced > Startup and Recovery). Windows 2000 supports three types of memory dumps: small memory dump, kernel memory dump, and complete memory dump. To change the dump settings, you must have administrative privileges.

Small memory dump uses only 64 K (instead of 2 MB, as the context menu states) and includes: a) a copy of the BSOD; b) a list of loaded drivers; c) the context of the crashed process with all of its threads; d) the first 16 K of the kernel stack of the crashed process. It's a disappointingly small amount of information, isn't it? Direct dump analysis provides us only with the address at which the error occurred and the name of the driver to which that address belongs. Provided that the system configuration hasn't changed since the moment of failure, we can start the debugger and disassemble the suspected driver. However, this is unlikely to produce a valuable result. After all, the content of the data segment at the moment of failure is unknown to us. Furthermore, we cannot even say for sure that we see the same machine commands as those that caused the failure. Therefore, the small memory dump might be useful only for system administrators, for whom it is sufficient to know the name of the unstable driver. As practice has shown, this information is sufficient in the vast majority of cases. The administrator is expected to send complaints, along with an error report and memory dump, to the driver developers, and to replace the driver with a newer, more stable and reliable one. By default, the small memory dump is written to the %SystemRoot%\Minidump directory, where it is assigned a name starting with the string Mini, followed by the current date and the number of the failure for the current day. For example, Mini110701-69.dmp is the 69th system dump saved on November 7, 2001.
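The naming scheme can be illustrated with a short C sketch that builds and parses such a name. It assumes that the date is encoded as MMDDYY and that the failure number is padded to at least two digits, which is what the single example above suggests; the function names minidump_name and parse_minidump_name are hypothetical.

```c
#include <stdio.h>

/* Build "MiniMMDDYY-NN.dmp" from a date and the failure number for that
   day. Returns the length of the name, or -1 if buf is too small. */
static int minidump_name(char *buf, size_t size, int month, int day,
                         int year, int seq)
{
    int n = snprintf(buf, size, "Mini%02d%02d%02d-%02d.dmp",
                     month, day, year % 100, seq);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Split a name of that form into its parts; returns 0 on success. */
static int parse_minidump_name(const char *name, int *month, int *day,
                               int *yy, int *seq)
{
    return sscanf(name, "Mini%2d%2d%2d-%d.dmp", month, day, yy, seq) == 4
               ? 0 : -1;
}
```

Called with month 11, day 7, year 2001, and failure number 69, minidump_name produces Mini110701-69.dmp. Sorting a directory by the parsed date and number is one simple way to find the most recent dump.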

Kernel memory dump contains significantly more comprehensive information about the failure. It includes the entire memory allocated to the system kernel and its components (drivers, the Hardware Abstraction Layer (HAL), and so on), as well as a copy of the BSOD. The size of the kernel dump depends on the number of installed drivers and varies from system to system. The Help system states that this value can vary from 50 to 800 MB. Eight hundred MB seems too much to be realistic; a size of approximately 50 to 100 MB seems more likely. The technical documentation states that the approximate size of the kernel dump is about one third of the amount of RAM physically installed in the computer. This is the best compromise between disk space overhead, the speed of dump creation, and the information value of the dump. This option does actually provide you with the required minimum of information. Using it, it is possible to locate practically all typical errors of drivers and other kernel components, including those that are due to hardware malfunction (however, the investigator must have some experience with studying crash dumps). By default, the kernel dump is written into the file named %SystemRoot%\Memory.dmp. Depending on the current settings, the new dump will either overwrite the existing one or be added to its tail.

Full memory dump includes the entire content of the physical memory, both the memory occupied by kernel components and that occupied by application processes. Full memory dump turns out to be especially useful when debugging ASPI/SPTI applications, which, due to their specific features, are capable of crashing the kernel even from the application level. Despite its large size, the full memory dump is the favorite option of all system programmers (most administrators prefer the small memory dump). This isn't surprising, if we recall that hard disks passed the 100 GB threshold long ago. From the programmer's point of view, it is much better to have an unneeded full memory dump than to end up suffering because of its absence. By default, the full memory dump is saved in the file named %SystemRoot%\Memory.dmp. Depending on the current system settings, it will either overwrite the existing file or be appended to its end.

Having chosen the preferred type of memory dump, let's simulate a system crash for testing purposes. This will help us to acquire the skills required for recovering the system under fire. For this purpose, we'll need the following:

  • Windows Driver Development Kit (DDK), distributed by Microsoft for free and providing detailed technical documentation of the system kernel; several different C/C++ compilers, assembler, and some advanced tools for memory dump analysis.

  • The W2K_KILL.SYS or any other killer driver, such as BSOD.EXE by Mark Russinovich, which allows you to get the dump at any given time, without needing to wait for a critical error to occur (the freeware version of BSOD.EXE can be downloaded from http://www.sysinternals.com).

  • Symbol files, which are required for kernel debuggers to function normally and which make the disassembled code more readable. Symbol files are included in the green MSDN distribution set. In principle, you can get by without them. However, the environment variable _NT_SYMBOL_PATH must be defined anyway, otherwise the i386kd.exe debugger won't work.

  • One or more of the books describing the system kernel architecture. The best is Windows 2000 Internals by Mark Russinovich and David Solomon. This book will be interesting both for system programmers and for administrators.

After installing DDK on your computer, close all applications and start the killer driver. The system will crash, display a BSOD informing of the causes of failure (see Fig. 3.4), and write the dump (the process might be accompanied by a rattling sound).

Fig. 3.4: Blue Screen Of Death (BSOD), signaling the irrecoverable system failure and providing brief information about it

For most administrators, the appearance of BSOD means only one thing: the system was feeling so bad that it preferred death to the infamy of unstable operation. As for the enigmatic characters, they remain a total mystery, but not for true professionals!

Let's start from the top left position on the screen, and trace all BSOD elements, one by one.

  • *** STOP: actually means that the system has stopped. It doesn't carry any other useful information.

  • 0x0000001E: this is the Bug Check code that classifies the failure. Decoding of the Bug Check codes is provided in the DDK. In our case, the code is 0x1E, KMODE_EXCEPTION_NOT_HANDLED, which is specified by the line directly below. Brief explanations of the most typical Bug Check codes are provided in Table 3.1. Of course, it cannot serve as a replacement for the companion documentation. It will, however, spare you the need to download 70 MB of the DDK.

  • Numbers in brackets are the four Bug Check parameters, the meaning of which depends on the specific Bug Check code; they have no meaning outside its context. With regard to KMODE_EXCEPTION_NOT_HANDLED, the first Bug Check parameter contains the number of the exception that was thrown. According to Table 3.1, this is STATUS_ACCESS_VIOLATION, that is, access to an invalid memory address. The fourth Bug Check parameter specifies the exact address. In this case, it is equal to zero, which means that some machine instruction attempted to access memory via a null pointer, which usually corresponds to an uninitialized pointer or to a pointer referencing an unallocated memory region. The address of that instruction is contained in the second Bug Check parameter. The third Bug Check parameter is undefined in this case.

  • *** Address 0xBE80B00: this is the address at which the failure took place. In this particular case, it is identical to the second Bug Check parameter. This, however, isn't always the case (Bug Check codes are not actually intended to store any addresses).

  • base at 0xBE80A00: contains the base loading address of the module that violated the system operating order, by which it is possible to restore the data about that module. (Attention: It isn't always possible to determine the base address correctly.) Using any suitable debugger (for instance, Soft-Ice from NuMega or i386kd from Microsoft), let's issue a command that produces the listing of all loaded drivers with their brief characteristics (in i386kd, this is achieved using the !drivers command). As a possible alternative, you can use the drivers.exe utility supplied as part of the NT DDK. No matter which method you choose, the result will be approximately as follows:

    kd> !drivers
    Loaded System Driver Summary
    Base       Code Size          Data Size          Driver Name    Creation Time
    80400000   142dc0 (1291 kb)    4d680 ( 309 kb)   ntoskrnl.exe   Wed Dec 08 02:41:11 1999
    80062000     cc20 (  51 kb)     32c0 (  12 kb)   hal.dll        Wed Nov 03 04:14:22 1999
    f4010000     1760 (   5 kb)     1000 (   4 kb)   BOOTVID.DLL    Thu Nov 04 04:24:33 1999
    bffd8000    21ee0 ( 135 kb)     59a0 (  22 kb)   ACPI.sys       Thu Nov 11 04:06:04 1999
    be193000    16f60 (  91 kb)     ccc0 (  51 kb)   kmixer.sys     Wed Nov 10 09:52:30 1999
    bddb4000    355e0 ( 213 kb)    10ac0 (  66 kb)   ATMFD.DLL      Fri Nov 12 06:48:40 1999
    be80a000      200 (   0 kb)      a00 (   2 kb)   w2k_kill.sys   Mon Aug 28 02:40:12 2000
    TOTAL:     835ca0 (8407 kb)   326180 (3224 kb)   (0 kb 0 kb)
  • Note the highlighted string w2k_kill.sys, located at the base address 0xBE80A00. This driver is exactly the one that we need! This step, though, isn't necessary, since the name of the faulty driver is displayed on the BSOD anyway.

  • Two lines at the bottom of the screen display the progress of the dump creation, entertaining the administrator by displaying a sequence of swiftly changing digits.
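The two lookups described in this walkthrough, naming the exception number from the first Bug Check parameter and finding the driver that owns the faulting address, can be sketched in C as follows. This is an illustration only: the struct driver type and both function names are hypothetical, and the size field is an assumption, since the amount of memory actually mapped for a driver is not shown in the listing above (the usage note below approximates it by the sum of the code and data sizes).

```c
#include <stddef.h>
#include <stdint.h>

struct driver {
    const char *name;
    uint32_t base;   /* base loading address */
    uint32_t size;   /* bytes assumed to be mapped from base */
};

/* Return the driver whose [base, base + size) range contains addr,
   or NULL if no listed driver owns that address. */
static const struct driver *find_driver(const struct driver *list,
                                        size_t count, uint32_t addr)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= list[i].base && addr - list[i].base < list[i].size)
            return &list[i];
    }
    return NULL;
}

/* Name two exception numbers that can appear in the first parameter
   of Bug Check 0x1E; anything else is reported as unknown. */
static const char *exception_name(uint32_t code)
{
    switch (code) {
    case 0x80000003u: return "STATUS_BREAKPOINT";
    case 0xC0000005u: return "STATUS_ACCESS_VIOLATION";
    default:          return "unknown";
    }
}
```

With a table built from the driver listing, for example { "w2k_kill.sys", 0xbe80a000, 0x200 + 0xa00 }, any address inside that range resolves to w2k_kill.sys, while an address that belongs to no listed driver yields NULL, which is a hint that the base address was determined incorrectly.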

Below, you will find the physical meanings of the most common Bug Check hex codes with brief explanations. The popularity rating of the Bug Check codes was composed by counting the number of times they were referenced in Internet conferences (thanks to Google).

  • 0x0A - symbolic name: IRQL_NOT_LESS_OR_EQUAL

    A driver attempted to access a memory page at DISPATCH_LEVEL or a higher level, which resulted in a crash, since the Virtual Memory Manager (VMM) operates at a lower level.

    The possible source of the failure can be the BIOS, a driver, or a system service (this is especially typical of anti-virus scanners and FM tuners).

    As a possible remedy, check the cable terminators on SCSI drives and the Master/Slave settings on IDE drives. Try to disable the memory caching option in BIOS.

    If this doesn't help, check the four Bug Check parameters, which contain the referenced memory address, the IRQ level, the access type (read/write), and the address of the driver's machine instruction.

  • 0x1E - symbolic name: KMODE_EXCEPTION_NOT_HANDLED

    A kernel component has thrown an exception and then failed to handle it; the number of the exception is contained in the first Bug Check parameter. It usually takes one of the following values:

    • 0x80000003 (STATUS_BREAKPOINT): A software breakpoint was encountered, which is a debugging leftover that the driver developer neglected to remove.

    • 0xC0000005 (STATUS_ACCESS_VIOLATION): Access to an invalid address (the fourth Bug Check parameter specifies the exact address); this is an error by the developer.

    • 0xC000021A (STATUS_SYSTEM_PROCESS_TERMINATED): Failure of the CSRSS and/or Winlogon processes. Both kernel components and user-mode applications can cause this error. As a rule, this happens if the machine is infected by a virus or when the integrity of system files has been violated.

    • 0xC0000221 (STATUS_IMAGE_CHECKSUM_MISMATCH): The integrity of one or more system files has been violated. The second Bug Check parameter contains the address of the machine command that has thrown the exception.

  • 0x24, symbolic name: NTFS_FILE_SYSTEM

    There is a problem with the NTFS.SYS driver. As a rule, this happens as a result of physical disk corruption or, more rarely, under conditions of an acute shortage of physical memory.

  • 0x2E, symbolic name: DATA_BUS_ERROR

    The driver accessed a non-existent physical address. If this isn't the driver's fault, it means that the RAM or the processor cache memory (or video memory) is malfunctioning or was overclocked to unsupported frequency values.

  • 0x35, symbolic name: NO_MORE_IRP_STACK_LOCATIONS

    A higher-level driver called a lower-level driver via the IoCallDriver interface, but there was no free space in the IRP stack and it was impossible to pass the entire IRP. This is a deadly situation that has no direct solution. The only way out is to try removing some of the least important drivers, in which case you may hope to get the system up and running again.

  • 0x3F, symbolic name: NO_MORE_SYSTEM_PTES

    Excessive fragmentation of the PTE table makes it impossible to allocate the memory block requested by the driver. As a rule, this situation is characteristic of audio/video drivers that manipulate vast memory blocks. Usually, such drivers fail to release allocated memory blocks in due time. To solve the problem, try to increase the number of PTEs (up to a maximum of 50,000) by editing the following registry entry: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\SystemPages.

  • 0x50, symbolic name: PAGE_FAULT_IN_NONPAGED_AREA

    An attempt to access a non-existent memory page. This is usually caused by a hardware malfunction (as a rule, the faulty component is a RAM chip or video/cache memory), by an incorrectly designed service (this is typical for many anti-virus scanners), or by corruption of an NTFS-formatted volume (run chkdsk with the /f and /r command-line options). Also try disabling memory caching in the BIOS.

  • 0x58, symbolic name: FTDISK_INTERNAL_ERROR

    Failure in the course of loading a RAID array. When trying to boot the system from the primary disk, the system has detected its corruption, after which it tried to access the mirror, but there was no partition table there.

  • 0x76, symbolic name: PROCESS_HAS_LOCKED_PAGES

    The driver failed to release locked pages after completing the I/O operation. To find the name of the faulty driver, open the HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management branch of the system registry, find the TrackLockedPages DWORD parameter, and set its value to 1. Reboot the system, and it will then save the traced stack. If the faulty driver causes the error again, there will be a BSOD with a Bug Check code equal to 0xCB, which will help identify the driver that causes this error.

  • 0x77, symbolic name: KERNEL_STACK_INPAGE_ERROR

    The memory page with the kernel data is not available for technical reasons. If the first Bug Check parameter is not equal to zero, it can take one of the following values:

    • (0xC000009A) STATUS_INSUFFICIENT_RESOURCES: system resources are not sufficient.

    • (0xC000009C) STATUS_DEVICE_DATA_ERROR: disk read/write error (possibly a bad sector).

    • (0xC000009D) STATUS_DEVICE_NOT_CONNECTED: the system cannot see the drive (controller malfunction, bad contact).

    • (0xC000016A) STATUS_DISK_OPERATION_FAILED: disk I/O error (bad sector or malfunctioning controller).

    • (0xC0000185) STATUS_IO_DEVICE_ERROR: incorrect termination of a SCSI drive or an IRQ conflict between IDE drives.

      A zero value of the first Bug Check parameter indicates an unknown hardware problem.

      Such messages can appear if the system is infected by viruses, in the event of disk corruption, or in the case of RAM failure. Start the Recovery Console and run the chkdsk command with the /r command-line option.

  • 0x7A, symbolic name: KERNEL_DATA_INPAGE_ERROR

    A kernel memory page is not available for technical reasons. The second Bug Check parameter contains the exchange status, and the fourth contains the virtual address of the page that couldn't be loaded.

    Possible reasons for the failure are bad sectors occupied by the pagefile.sys file, failures of the disk controller, or virus infection.

  • 0x7B, symbolic name: INACCESSIBLE_BOOT_DEVICE

    The boot device is unavailable because the partition table is corrupted or doesn't correspond to the content of the boot.ini file.

    This message may appear after the replacement of a motherboard with an integrated IDE controller or the replacement of a SCSI controller, because each controller requires its native drivers. Thus, after installing a hard disk with the Windows NT operating system on a computer containing incompatible equipment, the OS won't start and needs to be reinstalled. Experienced administrators, however, can reinstall the disk drivers after booting into the Recovery Console.

    It is also recommended to test the hardware and scan the system for viruses.

  • 0x7F, symbolic name: UNEXPECTED_KERNEL_MODE_TRAP

    A processor exception was left unhandled by the operating system. As a rule, this situation is caused by a hardware malfunction, incorrect CPU overclocking, hardware incompatibility with the installed drivers, or algorithmic errors in drivers.

    Check that your hardware is working properly and remove all unnecessary drivers. The first Bug Check parameter contains the exception number and can take the following values:

    • 0x00: attempt to divide by zero

    • 0x01: system debugger exception

    • 0x03: breakpoint exception

    • 0x04: overflow

    • 0x05: generated by the BOUND instruction

    • 0x06: invalid opcode

    • 0x08: double fault

      Descriptions of all other exceptions can be found in the technical documentation for Intel and AMD processors.

  • 0xC2, symbolic name: BAD_POOL_CALLER

    The current thread has made an incorrect pool request, which is usually due to an algorithmic error by the driver developer. However, to all appearances, the system itself isn't bug-free, since to eliminate this error, Microsoft recommends the installation of SP2.

  • 0xCB, symbolic name: DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS

    After completing the input/output procedure, the driver failed to release locked pages (see PROCESS_HAS_LOCKED_PAGES).

    The first Bug Check parameter contains the called address, while the second Bug Check parameter specifies the calling address. The last, fourth, parameter points to the Unicode string with the driver name.

  • 0xD1, symbolic name: DRIVER_IRQL_NOT_LESS_OR_EQUAL

    Same as IRQL_NOT_LESS_OR_EQUAL.

  • 0xE2, symbolic name: MANUALLY_INITIATED_CRASH

    A manually generated system failure initiated by pressing the <Ctrl>+<Scroll Lock> hotkey combination, provided that the registry parameter CrashOnCtrlScroll located under HKLM\System\CurrentControlSet\Services\i8042prt\Parameters contains a nonzero value.

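The list above can be kept at hand as a small lookup table. The following Python sketch only restates the codes and symbolic names given in this section; the names BUG_CHECK_NAMES and bug_check_name are hypothetical helpers introduced for illustration and are not part of any Windows tool.

```python
# Bug Check codes and symbolic names, copied from the list in this section.
BUG_CHECK_NAMES = {
    0x0A: "IRQL_NOT_LESS_OR_EQUAL",
    0x1E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x24: "NTFS_FILE_SYSTEM",
    0x2E: "DATA_BUS_ERROR",
    0x35: "NO_MORE_IRP_STACK_LOCATIONS",
    0x3F: "NO_MORE_SYSTEM_PTES",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x58: "FTDISK_INTERNAL_ERROR",
    0x76: "PROCESS_HAS_LOCKED_PAGES",
    0x77: "KERNEL_STACK_INPAGE_ERROR",
    0x7A: "KERNEL_DATA_INPAGE_ERROR",
    0x7B: "INACCESSIBLE_BOOT_DEVICE",
    0x7F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xC2: "BAD_POOL_CALLER",
    0xCB: "DRIVER_LEFT_LOCKED_PAGES_IN_PROCESS",
    0xD1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0xE2: "MANUALLY_INITIATED_CRASH",
}


def bug_check_name(code):
    """Return the symbolic name for a Bug Check code (int or hex string)."""
    if isinstance(code, str):
        code = int(code, 16)  # accepts "0x1E", "1e", "0x0000001e"
    return BUG_CHECK_NAMES.get(code, "UNKNOWN (0x%08X)" % code)


print(bug_check_name("0x0000001e"))
```

Codes that are not in the table are reported as unknown rather than guessed, since the list in this section covers only the most frequently mentioned failures.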

Recovering the System after Critical Failure

Unnatural, practically sexual inclination to the F8 button appeared in Rabbit with a good reason.

14,400 bauds and 19,200 users

Operating systems of the Windows NT family can tolerate even critical faults, even if they occur at the most inconvenient moments (for example, in the course of disk defragmentation). The fault-tolerant file system driver takes care of everything on its own (although it is wise to run chkdsk anyway).

If you have chosen the Full memory dump or Kernel memory dump option, then, after the next successful boot, the hard disk will stay busy for a long time, even if there are no attempts to access it. Don't worry! Windows is simply relocating the dump from virtual memory to its permanent location. After starting Task Manager, you'll see a new process in the list, SaveDump.exe, which carries out this task. The need for such a two-step scheme of saving the dump is explained by the fact that the operability of file system drivers isn't guaranteed at the moment of a critical error, and the operating system can't risk using them. Instead, it limits itself to temporarily storing the dump in virtual memory. By the way, if the available amount of virtual memory turns out to be insufficient (Advanced > Performance > Virtual memory), it will be impossible to save the dump.

If the system fails to boot, and this error is persistent, don't forget that you have the <F8> key at your disposal. Choose the Last Known Good Configuration menu option. Starting the system in safe mode, with the required minimum of vitally important system services and drivers, is a more radical step. System reinstallation is the last resort, and it isn't recommended unless absolutely necessary. It is better to try to start the Recovery Console and relocate the dump to another machine, where you'll be able to investigate it.

Loading the Crash Dump

To load the crash dump into the Windows Debugger (windbg.exe), choose the Crash Dump option from the File menu, or press the <Ctrl>+<D> hotkey combination. If you are working with the i386kd.exe debugger, use the -z command-line option followed by the fully qualified path name to the dump file. The name of the dump file must be separated from the option by one or more blanks, and the _NT_SYMBOL_PATH environment variable must specify the full path to the symbol files. Otherwise, the debugger will terminate abnormally. As an alternative, you can use the -y command-line option. In this case, the command line will look approximately as follows: i386kd -z C:\WINNT\memory.dmp -y C:\WINNT\Symbols. Note that it is necessary to call the debugger from the Checked Build Environment/Free Build Environment console located in the Windows 2000 DDK folder. Otherwise, the attempt will fail.
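The command line described above can also be assembled by a small script, for instance when DMP files are opened from a file manager. The following Python sketch assumes only the -z and -y options and the _NT_SYMBOL_PATH variable named in the text; the function kd_command and its environ parameter are hypothetical helpers introduced for illustration.

```python
import os


def kd_command(dump_path, symbol_path=None, environ=None):
    """Build the argument list for opening a crash dump in i386kd.

    -z, -y and _NT_SYMBOL_PATH are taken from the text above; everything
    else here is an illustrative assumption.
    """
    environ = os.environ if environ is None else environ
    argv = ["i386kd", "-z", dump_path]
    if symbol_path is not None:
        argv += ["-y", symbol_path]  # symbols passed explicitly
    elif not environ.get("_NT_SYMBOL_PATH"):
        # No source of symbols at all: fail early instead of letting
        # the debugger terminate abnormally.
        raise ValueError("pass symbol_path or set _NT_SYMBOL_PATH")
    return argv


print(kd_command(r"C:\WINNT\memory.dmp", r"C:\WINNT\Symbols"))
```

The resulting list can be handed to subprocess.run from a console where the debugger is available on the path.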

Associating DMP files with the i386kd debugger is a good idea. After you do so, you'll be able to call the debugger by simply pressing the <Enter> key in FAR Manager. The choice of debugging tools, though, is a matter of personal preference. Some people prefer KAnalyze, while others are quite content with the simple DumpChk. The range of analysis tools to choose from is broad (for instance, the DDK contains four such tools). For the sake of definiteness, let us choose i386kd.exe, also known as Kernel Debugger.

As soon as the Kernel Debugger console appears on the screen (Kernel Debugger is the console application preferred by those who spent their youth sitting at terminals), the debugger will disassemble the current machine instruction and drag us into the depths of machine code. Enter the u command to make the debugger continue disassembling.

According to the symbolic identifiers PspUnhandledExceptionInSystemThread and KeBugCheckEx, we are somewhere deep in the kernel or, to be more precise, somewhere near the code that displays the BSOD:

Listing 3.22: The results of disassembling the memory dump from the current address
 8045249c 6a01            push    0x1
 kd>u
 _PspUnhandledExceptionInSystemThread@4:
 80452484 8B442404        mov     eax, dword ptr [esp+4]
 80452488 8B00            mov     eax, dword ptr [eax]
 8045248A FF7018          push    dword ptr [eax+18h]
 8045248D FF7014          push    dword ptr [eax+14h]
 80452490 FF700C          push    dword ptr [eax+0Ch]
 80452493 FF30            push    dword ptr [eax]
 80452495 6A1E            push    1Eh
 80452497 E8789AFDFF      call    _KeBugCheckEx@20
 8045249C 6A01            push    1
 8045249E 58              pop     eax
 8045249F C20400          ret     4

There is nothing interesting in the stack either (see for yourself; to view the stack contents, issue the kb command):

Listing 3.23: The stack contents don't provide any clues to the actual nature of the critical error
 kd> kb
 ChildEBP RetAddr  Args to Child
 f403f71c 8045251c f403f744 8045cc77 f403f74c ntoskrnl!PspUnhandledExceptionInSystemThread+0x18
 f403fddc 80465b62 80418ada 00000001 00000000 ntoskrnl!PspSystemThreadStartup+0x5e
 00000000 00000000 00000000 00000000 00000000 ntoskrnl!KiThreadStartup+0x16

This turn of events is mystifying. You can disassemble the kernel as many times as you like, but it won't bring you any closer to the solution. This is logical, since the current address ( 8045249Ch ) is far beyond the limits of the killer driver ( 0BE80A00h ). So let's go another way. Do you recall the address that was displayed on the BSOD? If you don't, this isn't a problem! Unless the system settings explicitly prohibit it, copies of all BSODs are saved in the system log. Let's open it (Control Panel > Administrative Tools > Event Viewer):

Listing 3.24: A BSOD copy saved in the system log
 The system was rebooted after a critical error:
 0x0000001e (0xc0000005, 0xbe80b000, 0x00000000, 0x00000000).
 Microsoft Windows 2000 [v15.2195]
 Memory dump was saved: C:\WINNT\MEMORY.DMP.

Based on the category of the critical error ( 0x1E ), we can easily determine the address of the killer instruction: 0xBE80B000 (the second parameter in the listing above). Now issue the u BE80B000 command to view its contents, and you'll see:

Listing 3.25: The results of disassembling the memory dump at the address reported by the BSOD
 kd>u 0xBE80B000
 be80b000 a100000000      mov     eax, [00000000]
 be80b005 c20800          ret     0x8
 be80b008 90              nop
 be80b009 90              nop
 be80b00a 90              nop
 be80b00b 90              nop
 be80b00c 90              nop
 be80b00d 90              nop

This looks much closer to the truth. The first instruction, mov eax, [00000000], reads the memory cell at address zero, which causes the critical exception that crashes the system. Now we know for certain which branch of the program has caused this exception.
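The manual step of reading the Bug Check record from the system log can be sketched in a few lines of Python. The record format is assumed to be exactly the one shown in Listing 3.24 (a code followed by four parameters in parentheses); parse_bug_check_line is a hypothetical helper, and the interpretation of the parameters applies only to the 0x1E code discussed here.

```python
import re


def parse_bug_check_line(text):
    """Extract the Bug Check code and its parameters from a logged BSOD record."""
    match = re.search(r"(0x[0-9a-fA-F]+)\s*\(([^)]*)\)", text)
    if match is None:
        raise ValueError("no Bug Check record found")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


line = "0x0000001e (0xc0000005,  0xbe80b000,  0x00000000, 0x00000000)"
code, params = parse_bug_check_line(line)
if code == 0x1E:  # KMODE_EXCEPTION_NOT_HANDLED
    # For this code, the first parameter is the exception code and the
    # second is the address of the instruction that raised it.
    print("exception 0x%08X at address 0x%08X" % (params[0], params[1]))
```

For the record from Listing 3.24, this yields the exception code 0xC0000005 and the address 0xBE80B000, which is the value to pass to the u command.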

What should we do if we don't have a copy of the BSOD at our disposal? In fact, a copy of the BSOD is always available. You only need to know where to look for it. Try opening the dump file using any hex editor, and you'll find the following strings.

Listing 3.26: A copy of a BSOD in the program dump header
[Hex dump of the dump file header; shown as an image in the original.]

All the main Bug Check parameters can be recognized immediately: 1E 00 00 00 is the failure category code, 0x1E (in x86 processors, the least significant byte is located at the lower address, which means that all numbers are written in reverse byte order); 05 00 00 C0 is the STATUS_ACCESS_VIOLATION exception code; and 00 B0 80 BE specifies the address of the machine instruction that has thrown this exception. The combination 0F 00 00 00 93 08 can be recognized easily as the system Build number (just write it in decimal notation).
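The byte-order conversion described above is easy to check with Python's struct module. This is only a sketch: the helper names le_dword and le_word are hypothetical, and splitting the build bytes into a 32-bit value (0F 00 00 00) and a 16-bit value (93 08) is an assumption based on the "v15.2195" string from the system log.

```python
import struct


def le_dword(hex_bytes):
    """Decode four hex bytes stored least significant byte first."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]


def le_word(hex_bytes):
    """Decode two hex bytes stored least significant byte first."""
    return struct.unpack("<H", bytes.fromhex(hex_bytes))[0]


print(hex(le_dword("1E 00 00 00")))   # Bug Check code
print(hex(le_dword("05 00 00 C0")))   # exception code
print(hex(le_dword("00 B0 80 BE")))   # faulting instruction address
print(le_dword("0F 00 00 00"), le_word("93 08"))  # version and build, decimal
```

The last line gives 15 and 2195, which matches the version string recorded in the system log.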

To view the Bug Check parameters in a more readable format, use the following debugger command: dd KiBugCheckData.

Listing 3.27: Bug Check parameters displayed in more readable format
 kd> dd KiBugCheckData
 dd KiBugCheckData
 8047e6c0  0000001e c0000005 be80b000 00000000
 8047e6d0  00000000 00000000 00000001 00000000
 8047e6e0  00000000 00000000 00000000 00000000
 8047e6f0  00000000 00000000 00000000 00000000
 8047e700  00000000 00000000 00000000 00000000
 8047e710  00000000 00000000 00000000 00000000
 8047e720  00000000 00000000 00000000 00000000
 8047e730  00000000 e0ffffff edffffff 00020000
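If the debugger output is captured to a file, the Bug Check code and its parameters can be pulled out of it automatically. The following Python sketch assumes the dd output format shown in Listing 3.27 (an address column followed by 32-bit values) and that the first five values are the code and its four parameters, as the listing suggests; parse_dd is a hypothetical helper.

```python
def parse_dd(output):
    """Collect the DWORD values from dd output, skipping prompts and addresses."""
    dwords = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows look like: <address> <dword> <dword> <dword> <dword>
        if len(fields) < 2 or not all(len(f) == 8 for f in fields):
            continue
        try:
            values = [int(f, 16) for f in fields]
        except ValueError:
            continue  # eight characters, but not a hex number
        dwords.extend(values[1:])  # drop the address column
    return dwords


sample = """kd> dd KiBugCheckData
dd KiBugCheckData
8047e6c0  0000001e c0000005 be80b000 00000000
8047e6d0  00000000 00000000 00000001 00000000
"""
code, p1, p2, p3, p4 = parse_dd(sample)[:5]
print("Bug Check 0x%X (0x%08X, 0x%08X, 0x%08X, 0x%08X)" % (code, p1, p2, p3, p4))
```

The prompt line and the echoed command are skipped because their fields are not eight-character hex values.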
 

The list of other useful commands includes:

  • !drivers: displays the list of drivers that were loaded at the moment of failure

  • !arbiter: displays all arbiters along with their arbitration ranges

  • !filecache: displays information about the file system cache and PT

  • !vm: produces a report on virtual memory usage, etc.

Unfortunately, it is impossible to provide a complete listing of the commands here. If you need it, you'll find such a listing in the manual for your preferred debugger.

Naturally, it is much more difficult to detect the actual cause of a system crash in the real world. This is because any real driver consists of a large set of functions interacting with one another according to some intricate scheme. These functions form complicated hierarchies, sometimes crossed by tunnels of global variables, turning the driver into a labyrinth. Let us consider an example. The construction mov eax, [ebx], where ebx == 0, behaves exactly as designed by obediently throwing an exception, and there is no point in arguing with it! It is necessary to locate the code that writes a zero value into EBX, which isn't an easy task. Of course, it is possible to scroll the screen upwards, hoping that the program code executes linearly in this section, but no one can guarantee that this is actually the case. There is no way to trace backwards, either: roughly speaking, the address of the previously executed machine instruction is unknown, so it isn't recommended to rely on screen scrolling.

Having loaded the driver being tested into any intelligent disassembler that automatically restores cross-references (such as IDA Pro), we will get a more or less complete idea of the topology of the program's control branches. Naturally, disassembling, because of its static nature, doesn't guarantee that control hasn't been passed somewhere else. It does, however, narrow the search range. Generally speaking, there are lots of good books about disassembling (for instance, I have written one myself: Hacker Disassembling Uncovered by Kris Kaspersky); therefore, I won't concentrate on this topic here. I'll simply wish you good luck.

Fig. 3.5: The i386kd debugger at work; despite its minimalistic interface, it is a powerful and convenient instrument, allowing you to carry out prodigious tasks by pressing a couple of shortcut keys or keyboard combinations (one of which calls up your own script)

Fig. 3.6: Windbg with a loaded memory dump. Note that the debugger automatically highlights the Bug Check codes without waiting for us to instruct it to do so, and when attempting to disassemble the instruction that has caused the critical exception, the screen displays the string specifying the name of the killer driver: Module Load: W2K KILL.SYS, a nice touch


CD Cracking Uncovered. Protection against Unsanctioned CD Copying
ISBN: 1931769338
Year: 2003
Pages: 60
