Exploiting Unicode-Based Vulnerabilities | Hacking Ubuntu: Serious Hacks Mods and Customizations (ExtremeTech)

In order to exploit a Unicode-based buffer overflow, we first need a mechanism to transfer the process's path of execution to the user-supplied buffer. By the very nature of the vulnerability, an exploit will overwrite the saved return address or the exception handler with a Unicode value. For example, if our buffer can be found at address 0x00310004, then we'd overwrite the saved return address/exception handler with 0x00310004. If one of the registers contains the address of the user-supplied buffer (and if you're very lucky), you may be able to find a "jmp register" or "call register" opcode at or near a Unicode-style address. For example, if the EBX register points to the user-supplied buffer, then you may find a jmp ebx instruction, perhaps at address 0x00770058. If you have even more luck, you may also get away with having a jmp or call ebx instruction above a Unicode-form address. Consider the following code:

 0x007700FF     inc ecx
 0x00770100     push ecx
 0x00770101     call ebx

We'd overwrite the saved return address/exception handler with 0x007700FF, and execution would transfer to this address. When execution picks up at this point, the ECX register is incremented by 1 and pushed onto the stack, and then the address held in EBX is called. Execution would then continue in the user-supplied buffer. This is a one-in-a-million likelihood, but it's worth bearing in mind. If there's nothing in the code that will cause an access violation before the call/jmp register instruction, then it's definitely usable.
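To make the idea of a "Unicode-style" address concrete, here is a minimal Python sketch that tests whether a 32-bit value has the form 0x00nn00nn. The function name is hypothetical, and the requirement that both nn bytes be non-zero is an assumption of this sketch (based on the idea that a null character would end the string early), not something stated in the text.

```python
def is_unicode_style_address(addr: int) -> bool:
    """Return True if addr has the form 0x00nn00nn.

    Assumption of this sketch: both nn bytes must be non-zero.
    """
    if not 0 <= addr <= 0xFFFFFFFF:
        return False
    # The top byte and the second-lowest byte must both be zero.
    zero_bytes_ok = (addr & 0xFF00FF00) == 0
    low = addr & 0xFF
    high = (addr >> 16) & 0xFF
    return zero_bytes_ok and low != 0 and high != 0


# Addresses taken from the discussion above.
for address in (0x00310004, 0x00770058, 0x007700FF, 0x00770100, 0x00770101):
    print(f"0x{address:08X}", is_unicode_style_address(address))
```

Under these assumptions, 0x007700FF passes the check while 0x00770100 and 0x00770101 do not, which is why the example enters at the inc ecx instruction and falls through to the call ebx rather than targeting the call directly.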

Assuming you do find a way to return to the user-supplied buffer, the next thing you need is either a register that contains the address of somewhere in the buffer, or an address that you know in advance. The Venetian Method uses this address when it creates the shellcode on the fly. We'll later discuss how to get a fix on the address of the buffer.

The Available Instruction Set in Unicode Exploits

When exploiting a Unicode-based vulnerability, the arbitrary code executed must be of a form in which every second byte is null and the others are non-null. This obviously limits the set of instructions available to you. The instructions available to the Unicode exploit developer include all the single-byte operations, such as push, pop, inc, and dec. Also available are the instructions with a byte form of

 nn00nn

such as:

 imul eax, dword ptr [eax], 0x00nn

Alternatively, you may find

 nn00nn00nn

such as:

 imul eax, dword ptr [eax], 0x00nn00nn

Or, you could find many add-based instructions of the form

 00nn00

Two single-byte instructions are often used one after the other, as in this code fragment:

 00401066 50                   push        eax
 00401067 59                   pop         ecx

The instructions must be separated with a nop-equivalent of the form 00 nn 00 to make the sequence Unicode in nature. One such choice could be:

 00401067 00 6D 00             add         byte ptr [ebp],ch

Of course, for this method to succeed, the address pointed to by EBP must be writable. If it isn't, choose another nop-equivalent; we've listed many more later in this section. When embedded between the push and the pop, we get:

 00401066 50                   push        eax
 00401067 00 6D 00             add         byte ptr [ebp],ch
 0040106A 59                   pop         ecx

These bytes are Unicode in nature:

 \x50\x00\x6D\x00\x59
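The byte-pattern constraint can be checked mechanically. The following Python sketch tests whether a byte string alternates non-null and null bytes, and shows that the sequence above is what a simple widening of the ASCII characters "PmY" produces. The function name is hypothetical, and the widening is modeled here as plain UTF-16LE encoding, which is an assumption of this sketch; a real conversion depends on the code page in use.

```python
def is_unicode_form(code: bytes) -> bool:
    """True if even offsets hold non-null bytes and odd offsets hold nulls."""
    if not code:
        return False
    return all(
        (byte != 0) if index % 2 == 0 else (byte == 0)
        for index, byte in enumerate(code)
    )


# push eax / add byte ptr [ebp],ch / pop ecx, as listed above.
sequence = bytes.fromhex("50006D0059")
# push eax / pop ecx with no separator between them.
bare = bytes.fromhex("5059")
# Assumption: ASCII input is widened as UTF-16LE.
widened = "PmY".encode("utf-16-le")

print(is_unicode_form(sequence))
print(is_unicode_form(bare))
print(widened[: len(sequence)] == sequence)
```

The bare push/pop pair fails the check because its second byte is non-null, while the version separated by the add instruction passes.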