Chapter 21: Main Symptoms of Virus Infection


Overview

Most viruses use a rather specific set of machine commands and data structures practically never encountered in "normal" applications. The virus developer, if desired, can conceal these, in which case the infected code would become impossible to detect. However, this is true only in theory. Practice has shown that viruses are usually so dumb that detecting them becomes possible in seconds.

Corruption of the executable file structure is a typical but insufficient symptom of virus infection. If you encounter such files, this doesn't necessarily mean that they are infected. The unusual structure might be caused by some cunning protection or by self-expression on the part of the application developer. Furthermore, some viruses invade files practically without damaging their structures. A certain and unambiguous answer can be obtained only by fully disassembling the file being investigated. However, this method is too labor-intensive, requiring assiduity, fundamental knowledge of the operating system, and an unlimited amount of free time. Therefore, hackers compromise, briefly viewing the disassembled listing to find the main indications of virus infection.

To infect the target file, the virus must find it, choosing only the files of "its own" type from possible candidates. Consider ELF files. To make sure that the possible target actually is an ELF file, the virus must read its header and compare the first 4 bytes to the ELF signature, which corresponds to the byte sequence 7F 45 4C 46. If the virus body is encrypted, or if the virus uses a hash comparison or another cunning programming trick, there will be no ELF string in the body of the infected file. Nevertheless, this string is present in more than half of all existing UNIX viruses, and this technique, despite its striking simplicity, works excellently.

Load the file being investigated into any HEX editor and try to find the ELF string. In an infected file, there will be two such strings: one directly in the header and another in the code section or data section. Do not search the disassembled listings! Most viruses convert the ELF string into the 32-bit integer constant 464C457Fh, so in a disassembly it appears as a number rather than as text, which conceals the virus's presence. However, if you switch to the dump mode, it will immediately appear on the screen. Fig. 21.1 shows the dump of the file infected with the VirTool.Linux.Mmap.443 virus, which uses this technique when searching for targets suitable for infection.
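The manual search described above can be approximated with a few lines of Python. This is only a minimal sketch: the function names are made up for illustration, and a second signature is a hint rather than proof, because archives, installers, and tools that legitimately handle ELF files may also contain it.

```python
ELF_MAGIC = b"\x7fELF"  # bytes 7F 45 4C 46


def find_elf_signatures(data):
    """Return the file offsets of every occurrence of the ELF signature."""
    offsets = []
    pos = data.find(ELF_MAGIC)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(ELF_MAGIC, pos + 1)
    return offsets


def has_extra_elf_signature(data):
    """True if the signature appears anywhere other than the header (offset 0)."""
    return any(offset != 0 for offset in find_elf_signatures(data))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `has_extra_elf_signature`. Because the search works on raw bytes, it finds the signature whether the disassembler would show it as text or as the constant 464C457Fh.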

Figure 21.1: Fragment of a file infected with the VirTool.Linux.Mmap.443 virus. When viewing the file in the HEX dump mode, the ELF string used by the virus for searching possible targets for infection is clearly visible

The Linux.Winter.343 virus (also known as Lotek) cannot be disclosed using this technique, because it uses a special mathematical transformation to encrypt the ELF string (Listing 21.1).

Listing 21.1: Fragment of the Lotek virus that carefully conceals its interest in ELF files
.text:08048473    MOV   EAX, 0B9B3BA81h   ; -"ELF" (minus "ELF")
.text:08048478    ADD   EAX, [EBX]        ; The first 4 bytes of the target
.text:0804847A    JNZ   short loc_804846E ; This is not an ELF file.
 

The immediate value B9B3BA81h (stored in the file as the bytes 81 BA B3 B9) is nothing but the ELF string converted into a 32-bit constant and multiplied by negative one. By adding this value to the first 4 bytes of the potential target, the virus obtains zero if the strings are identical and a nonzero value if they are not.

As a variant, the virus might convert the ELF reference string to its one's complement (invert all the bits), in which case its body will contain the 80 BA B3 B9 sequence. Cyclic shifts from one to seven positions in different directions, incomplete checks (checks of two or three matching bytes, etc.), and some other operations are encountered more rarely.
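The arithmetic behind these disguised constants is easy to verify. The sketch below (helper names are illustrative, not taken from any virus) derives the negated and the bit-inverted forms of the signature, shows the byte order in which each would appear in a hex dump, and reproduces the add-and-test-for-zero check.

```python
import struct

MASK32 = 0xFFFFFFFF

# The 4-byte signature read as a little-endian 32-bit integer: 0x464C457F.
ELF_CONST = struct.unpack("<I", b"\x7fELF")[0]

NEGATED = (-ELF_CONST) & MASK32   # multiplied by -1:  0xB9B3BA81
INVERTED = (~ELF_CONST) & MASK32  # all bits inverted: 0xB9B3BA80


def as_file_bytes(value):
    """Byte sequence of a 32-bit constant as stored on a little-endian x86."""
    return struct.pack("<I", value)


def add_check(first4):
    """Add the negated constant to the first dword; zero means a match."""
    dword = struct.unpack("<I", first4)[0]
    return (NEGATED + dword) & MASK32 == 0


print(as_file_bytes(NEGATED).hex(" "))   # 81 ba b3 b9
print(as_file_bytes(INVERTED).hex(" "))  # 80 ba b3 b9
```

When hunting for a concealed signature check, these two byte sequences are worth searching for alongside the plain 7F 45 4C 46.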

The system-call mechanism is harder to conceal. The virus cannot afford to drag the entire LIBC library along by linking it statically into its body, because such a monster can hardly remain unnoticed. There are several ways of solving this problem, the most popular of which is to use the native API of the operating system. Because the native API is an implementation detail of each specific system, UNIX developers have abandoned attempts at standardizing it. In particular, in System V and its multiple clones, system functions are invoked by a far call to the address 0007:00000000, whereas in Linux the same is done with the INT 80h interrupt.

Note  

The /usr/include/asm/unistd.h file lists the system-call numbers.

Thus, the use of native API considerably narrows the natural habitat of the virus, making it unportable.

Normal programs rarely work on the basis of native API (although utilities from the FreeBSD 4.5 distribution set behave in this way). Therefore, the presence of a large number of machine commands such as int 80h/call 0007:00000000 (CD 80 / 9A 00 00 00 00 07 00) is likely evidence of a virus. To prevent false positives (in other words, reporting a virus where there is none), you must not only detect native API calls but also analyze the sequence of these calls. The following sequence of system calls is typical for viruses: sys_open, sys_lseek, old_mmap/sys_munmap, sys_write, sys_close, sys_exit. The exec and fork calls are used more rarely. In particular, they are used by the STAOG.4744 virus. Viruses such as VirTool.Linux.Mmap.443, VirTool.Linux.Elfwrsec.a, PolyEngine.Linux.LIME.poly, and Linux.Winter.343 do without these calls.
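A first pass over a suspicious file can simply count the two opcode sequences mentioned above. The following is a rough sketch with hypothetical helper names. It works on raw bytes, so the same patterns occurring inside data or inside longer instructions are counted as well, and the result is only a reason to look closer. The small table holds the Linux i386 numbers of the calls named in the text, as found in unistd.h.

```python
INT_80H = bytes.fromhex("CD80")                # int 80h
FAR_CALL_7_0 = bytes.fromhex("9A000000000700")  # call 0007:00000000

# Linux i386 system-call numbers for the calls named in the text.
LINUX_I386_SYSCALLS = {
    1: "sys_exit", 2: "sys_fork", 4: "sys_write", 5: "sys_open",
    6: "sys_close", 11: "sys_execve", 19: "sys_lseek",
    90: "old_mmap", 91: "sys_munmap",
}


def native_api_stats(data):
    """Count raw occurrences of the native API opcode sequences."""
    return {
        "int 80h": data.count(INT_80H),
        "call 0007:00000000": data.count(FAR_CALL_7_0),
    }


def syscall_names(numbers):
    """Translate system-call numbers (EAX values) into names where known."""
    return [LINUX_I386_SYSCALLS.get(n, "unknown(%d)" % n) for n in numbers]
```

For example, `syscall_names([5, 19, 90, 4, 6, 1])` spells out the open, seek, map, write, close, exit sequence that the text describes as typical for viruses.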

Fig. 21.2 shows a fragment of a file infected by the VirTool.Linux.Mmap.443 virus. The presence of unconcealed int 80h calls easily discloses the aggressive nature of the program code, indicating its inclination for self-reproduction.

Figure 21.2: Fragment of a file infected with the VirTool.Linux.Mmap.443 virus, which discloses its presence by direct calls to the native API of the operating system

For comparison, consider how the system calls of a normal program appear. For illustration, I have chosen the cat utility supplied as part of the FreeBSD 4.5 distribution set (Fig. 21.3). The interrupt instructions are not spread over the entire code; instead, they are grouped in their own wrapper functions. The virus also can "wrap" system calls in layers of wrapper code. However, it is unlikely that it will succeed in forging the nature of wrappers of the specific target file.

Figure 21.3: Fragment of a normal file (the cat utility from the FreeBSD distribution set). Note that native API calls are carefully enclosed in wrapper functions

A few viruses do not surrender as easily and use various techniques that complicate their detection and analysis. The most talented (or, perhaps, more careful) virus writers dynamically generate the int 80h/call 0007:00000000 instructions and then push these onto the top of the stack, secretly passing control to them. Consequently, the int 80h/call 0007:00000000 calls will be missing from the disassembled listing of the program being investigated. Such viruses can be detected only by multiple indirect calls to subroutines located in the stack. This task is difficult because indirect calls are present in abundance even in normal programs. Therefore, determining the values of the called addresses is a serious problem (at least, in the case of static analysis). On the other hand, such viruses are few (and existing ones are mostly lab viruses), so for the moment there is no reason for panic. More often, viruses encrypt the individual fragments of their bodies that are critical for detection. However, for the IDA Pro disassembler, this problem doesn't present a serious obstacle, and even multilayered encryption can be removed without any serious mental effort.

Nevertheless, even a wise man stumbles, and IDA Pro is no exception. Normally, IDA Pro automatically determines the names of the called functions, formatting them as comments. Because of this favorable circumstance, there is no need to constantly consult the reference manual when analyzing algorithms. Such viruses as Linux.ZipWorm cannot resign themselves to such a situation and actively use special programming techniques that confuse and blind the disassembler. For example, Linux.ZipWorm forcibly pushes the numbers of the called functions through the stack, which confuses IDA, depriving it of the capability of determining the function names (Listing 21.2).

Listing 21.2: Fragment of the Linux.ZipWorm virus that confuses IDA Pro
.text:080483C0  PUSH 13h
.text:080483C2  PUSH 2
.text:080483C4  SUB  ECX, ECX
.text:080483C6  POP  EDX   ; EDX := 2 (the value pushed last)
.text:080483C7  POP  EAX   ; EAX := 13h (19). This is the lseek call.
.text:080483C8  INT  80h   ; Linux - IDA failed to determine the call name!
 

The virus has achieved the desired goal: a disassembled listing without automatic comments cannot be understood at a glance. However, consider the situation from another viewpoint. Applying antidebugging techniques is in itself evidence of an abnormal situation, if not of an infection. Thus, the virus pays for its antidebugging technologies with weakened concealment (it is said that the virus's "ears" are protruding from the infected file).

This weakness also occurs because most viruses never care about creating startup code, or they imitate it poorly. At the entry point of a normal program, a normal function with classical prologue and epilogue is almost always present. Such a function is automatically recognized by the IDA Pro disassembler (Listing 21.3).

Listing 21.3: An example of a normal start-up function with classical prologue and epilogue
.text:080480B8 start          PROC  NEAR
.text:080480B8
.text:080480B8                PUSH  EBP
.text:080480B9                MOV   EBP, ESP
.text:080480BB                SUB   ESP, 0Ch
...
.text:0804813B                RET
.text:0804813B start          ENDP
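Whether the entry point begins with the classical prologue can also be checked without a disassembler. The sketch below is an illustration under stated assumptions: it handles only 32-bit little-endian ELF files, the function names are invented, and it recognizes only the two common encodings of PUSH EBP / MOV EBP, ESP. A negative result is not proof of infection, since legitimate start-up code may begin differently.

```python
import struct

# push ebp; mov ebp, esp (the two encodings assemblers commonly emit)
PROLOGUES = (bytes.fromhex("5589E5"), bytes.fromhex("558BEC"))
PT_LOAD = 1


def entry_file_offset(data):
    """Map e_entry of a 32-bit little-endian ELF image to a file offset.

    Returns None if no loadable segment contains the entry point.
    """
    if len(data) < 52 or data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF file")
    e_entry, e_phoff = struct.unpack_from("<II", data, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 42)
    for i in range(e_phnum):
        p_type, p_offset, p_vaddr, _p_paddr, p_filesz = struct.unpack_from(
            "<5I", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD and p_vaddr <= e_entry < p_vaddr + p_filesz:
            return p_offset + (e_entry - p_vaddr)
    return None


def starts_with_classic_prologue(data):
    """True if the bytes at the entry point are push ebp; mov ebp, esp."""
    offset = entry_file_offset(data)
    return offset is not None and data[offset:offset + 3] in PROLOGUES
```

The program header table is used rather than the section table because it is what the loader itself relies on and is present in every executable ELF file.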
 

In some cases, the start-up function passes control to __libc_start_main and terminates using hlt without ret (Listing 21.4). This is normal; however, bear in mind that many viruses written in Assembly obtain the same start-up code as a "gift" from the linker. Therefore, the presence of the start-up code in the file being investigated is no reason to consider this file healthy.

Listing 21.4: Alternative example of the normal start-up function
.text:08048330      public start
.text:08048330      start    PROC  NEAR
.text:08048330               XOR   EBP, EBP
.text:08048332               POP   ESI
.text:08048333               MOV   ECX, ESP
.text:08048335               AND   ESP, 0FFFFFFF8h
.text:08048338               PUSH  EAX
.text:08048339               PUSH  ESP
.text:0804833A               PUSH  EDX
.text:0804833B               PUSH  offset sub_804859C
.text:08048340               PUSH  offset sub_80482BC
.text:08048345               PUSH  ECX
.text:08048346               PUSH  ESI
.text:08048347               PUSH  offset loc_8048430
.text:0804834C               CALL  ___libc_start_main
.text:08048351               HLT
.text:08048352               NOP
.text:08048353               NOP
.text:08048353      start    ENDP
 

Most infected files appear differently. In particular, the start-up code of the PolyEngine.Linux.LIME.poly virus appears as shown in Listing 21.5.

Listing 21.5: Start-up code of the PolyEngine.Linux.LIME.poly virus
.data:080499C1 LIME_END:                   ; Alternative name is "main".
.data:080499C1   MOV   EAX, 4
.data:080499C6   MOV   EBX, 1
.data:080499CB   MOV   ECX, offset gen_msg ; "Generates 50 [LiME] encrypted"
.data:080499D0   MOV   EDX, 2Dh
.data:080499D5   INT   80h                 ; Linux - sys_write
.data:080499D7   MOV   ECX, 32h
 


Shellcoder's Programming Uncovered
ISBN: 193176946X
Year: 2003
Pages: 164
