General Structure and Strategy of Virus Behavior | Shellcoders Programming Uncovered (Uncovered series)

The individual structure of the virus code depends on the imagination of its developer. In general, it appears approximately the same as in Windows viruses. As a rule, in the beginning of the virus there is the decryptor, which is followed by the module responsible for searching suitable targets of infection, the injector of the virus code, and the procedure of passing control to the carrier file.

For most ELF viruses, the following sequence of system calls is typical: sys_open (mov eax, 05h/int 80h) opens the file; sys_lseek (mov eax, 13h/int 80h) moves the file pointer to the required position; old_mmap (mov eax, 5Ah/int 80h) maps the file into memory; sys_munmap (mov eax, 5Bh/int 80h) removes the image from memory and writes all modifications to the disk; and sys_close (mov eax, 06h/int 80h) closes the file (Fig. 19.1). The system call numbers provided here relate to Linux.

Figure 19.1: Typical structure of the virus code

The mapping technique considerably simplifies working with large files. It is no longer necessary to allocate a buffer and copy the file into it fragment by fragment. All unskilled labor can be delegated to the operating system, and all effort can be concentrated directly on the process of infection. It is necessary to mention, however, that when infecting a file several gigabytes long (for example, a self-extracting distribution of some software product), the virus will have to view the file through a "window," mapping different parts of the file into the 4-GB address space, or simply abandon the idea of infecting this file and search for a more suitable target. Most viruses take the latter approach.
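From the analyst's side, the same "window" idea is handy for inspecting a large suspect file without reading it whole. The following is a minimal sketch in Python using the standard mmap module; the function name read_window and its parameters are illustrative assumptions, not something taken from the book.

```python
import mmap
import os

def read_window(path, offset, length):
    """Return up to `length` bytes at `offset` by mapping only a window of the file."""
    # Mapping offsets must be multiples of the allocation granularity.
    gran = mmap.ALLOCATIONGRANULARITY
    base = (offset // gran) * gran
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if offset >= size:
            return b""  # nothing to read past the end of the file
        length = min(length, size - offset)
        span = (offset - base) + length
        with mmap.mmap(f.fileno(), span, access=mmap.ACCESS_READ, offset=base) as view:
            return bytes(view[offset - base:offset - base + length])
```

The mapping is read-only (ACCESS_READ), so the file under investigation cannot be modified by accident.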

Infection by Merging

Viruses of this type are mainly the creations of programming beginners who have not yet mastered the basic concepts of operating system architecture but still strive to play dirty tricks on someone. In the most generalized form, the infection algorithm appears as follows: The virus finds a suitable target, makes sure that it has not been infected yet, and ensures that this file has all attributes required for modification. Then the virus reads the target file into memory (or into a temporary file) and overwrites the target file with its body. The original file is then written into the tail of the virus as an overlay or is placed into the data segment (Fig. 19.2).

Figure 19.2: Typical method of infecting an executable file by merging

Having captured control, the virus retrieves the contents of the original file from its body, writes them into a temporary file, assigns that file the executable attribute, and starts the "healed" file for execution, after which it removes the file from the disk again. Because such manipulations rarely remain unnoticed, some viruses might undertake "manual" loading of the infected file from the disk. To tell the truth, writing a procedure for correct loading of an ELF file is not a trivial task, and debugging such a procedure is even more difficult; therefore, such viruses are unlikely to appear. After all, ELF is not the same thing as a.out.

A typical feature of such viruses is a tiny code segment followed by an enormous data segment (overlay), which represents an independent executable file (Fig. 19.3). Try to find an ELF, COFF, or a.out header using a context search, and you will find two such headers in the infected file. However, do not try to disassemble the overlay or data segment, because no meaningful code will be obtained as a result. To obtain meaningful code, it is necessary to first know the exact location of the entry point and then to place the tail of the file being disassembled at its legal addresses. In addition, the original contents of the file might be intentionally encrypted by the virus, in which case the disassembler would return meaningless garbage. Nevertheless, this doesn't seriously complicate the analysis. Virus code is unlikely to be large; therefore, recovering the encryption algorithm (if the virus used one) won't take a long time.
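The context search for a second header can be automated. Below is a minimal sketch in Python that handles only the ELF signature (COFF and a.out magic values are left out); the function name find_elf_headers is an illustrative assumption. To reduce false hits, it also checks that the two bytes after the signature hold plausible EI_CLASS and EI_DATA values.

```python
ELF_MAGIC = b"\x7fELF"

def find_elf_headers(data):
    """Return the offsets of all plausible ELF headers inside a byte string."""
    offsets = []
    pos = data.find(ELF_MAGIC)
    while pos != -1:
        # EI_CLASS (1 = 32-bit, 2 = 64-bit) and EI_DATA (1 = LSB, 2 = MSB)
        # directly follow the four signature bytes.
        if len(data) >= pos + 6 and data[pos + 4] in (1, 2) and data[pos + 5] in (1, 2):
            offsets.append(pos)
        pos = data.find(ELF_MAGIC, pos + 1)
    return offsets
```

A result with more than one offset does not prove infection (installers and self-extracting archives also embed executables), but together with a very small code segment it is a strong hint.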

Figure 19.3: An example illustrating an executable file merged by the UNIX.a.out virus. A tiny code section (about 300 bytes) indicates a high probability of infection

The situation becomes much worse if the virus moves part of the original file into the data segment and another part into the code segment. Such a file appears like a normal program except that the main part of the code segment is made up of "dead" code that never gains control. At first glance, the data segment appears normal; however, careful investigation reveals that all cross-references (for example, references to text strings) are shifted in relation to their "native" addresses. As you can easily guess, the value of this offset is equal to the virus length.

Disassembling reveals three functions typical for this kind of virus: the exec and fork functions are used for starting the "healed" file, and the chmod function is used for assigning the executable attribute to this file.

Infection by Extending the Last Section of the File

The simplest method of nondestructive infection of the file is extending the last section or segment of the target file and writing the virus body to its end.

Note

Traditionally, the term "section" was used for describing this kind of virus, and from now on I will follow this tradition. It should be pointed out, however, that in relation to ELF files this is not quite correct, because the system loader of executable ELF files works exclusively with segments and ignores sections.

Strictly speaking, this statement is not quite right. As a rule, the last section of the file is the .bss section intended for storing uninitialized data. Principally, it is possible to insert the virus code here; however, it doesn't make any sense. The system loader is not so dumb that it will spend precious processor time loading uninitialized data from the slow disk. Therefore, it would be more correct to say the "last meaningful section." However, I suggest ignoring these minor terminological inconsistencies. After all, this is not a doctoral thesis.

The .bss section is usually preceded by the .data section, which contains initialized data. It is this section that becomes the main target of the virus attack. Load the file being investigated into a disassembler and see in which section the entry point resides. If this is the .data section, as shown in Fig. 19.4, then it is highly probable that the file being investigated is infected with a virus. In the example shown in this illustration, the PolyEngine.Linux.LIME.poly virus has inserted its body into the end of the .data section and set the entry point there. The presence of executable code in the .data section is a sure sign of virus infection.
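This check can be approximated without a disassembler. The sketch below, in Python, works at the segment level (which is what the loader actually uses) and assumes a 32-bit little-endian ELF image; the function names and the exact rule (entry point in a writable or non-executable PT_LOAD segment) are illustrative assumptions rather than a definitive detector.

```python
import struct

PT_LOAD = 1
PF_X, PF_W = 1, 2  # segment permission bits from the ELF specification

def load_segments(data):
    """Return (e_entry, [(vaddr, memsz, flags), ...]) for a 32-bit LSB ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 1 or data[5] != 1:
        raise ValueError("not a 32-bit little-endian ELF image")
    fields = struct.unpack_from("<16sHHIIIIIHHHHHH", data, 0)
    e_entry, e_phoff = fields[4], fields[5]
    e_phentsize, e_phnum = fields[9], fields[10]
    segments = []
    for i in range(e_phnum):
        (p_type, _p_offset, p_vaddr, _p_paddr,
         _p_filesz, p_memsz, p_flags, _p_align) = struct.unpack_from(
            "<8I", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_memsz, p_flags))
    return e_entry, segments

def entry_looks_suspicious(data):
    """True if the entry point lies in a writable or non-executable segment."""
    e_entry, segments = load_segments(data)
    for vaddr, memsz, flags in segments:
        if vaddr <= e_entry < vaddr + memsz:
            return bool(flags & PF_W) or not (flags & PF_X)
    return True  # entry point is outside every loadable segment
```

A positive result only flags the file for manual inspection; some packers and unusual linkers also produce such layouts.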

Figure 19.4: The file infected with the PolyEngine.Linux.LIME.poly virus, which has inserted its body into the end of the .data section and set the entry point there

When inserting into the a.out file, the virus generally must carry out the following actions:

Read the file header to make sure that this actually is an a.out file.
Increase the length of the a_data field by the value equal to the size of its body.
Copy its body into the end of the file.
Correct the contents of the a_entry field to capture control (if the virus captures control in such a way).

The algorithm of insertion into ELF files is somewhat more complicated (Fig. 19.5):

The virus opens the file, reads its header, and makes sure that this is an ELF file.
By viewing the program header table, the virus searches for the segment most suitable for infection. Note that practically any segment with the PT_LOAD attribute is suitable for infection. Other segments also are suitable; however, the virus code would look somewhat strange there.
The located segment is extended to the end of the file and increased by the value equal to the size of the virus body. This is achieved by synchronous correction of the p_filesz and p_memsz fields.
The virus writes itself into the end of the file to be infected.
To capture control, the virus either corrects the entry point of the file (e_entry) or inserts a jmp to its body at the true entry point. The technique of capturing control is a topic for a separate section, and it will be covered in Chapter 20.

Figure 19.5: Typical method of infecting an executable file by extending its last section

Now it is time to make a small technical note. The .data section, as a rule, has only two attributes: read and write. By default, it doesn't have the execute attribute. Does this mean that it is impossible to execute the virus code there? This question has an ambiguous answer. Everything depends on the implementation details of the specific processor and the specific operating system. Some processors and operating systems ignore the lack of the execute attribute, considering that the right to execute directly follows from the right to read. Other processors and operating systems throw an exception, thus abnormally terminating execution of the infected program. To work around this, viruses might assign the execute attribute to the .data section, which gives the virus away. However, such viruses are rarely encountered, and most virus writers leave the .data section with the default attributes.

Here is another important issue, which is not obvious at first glance. Have you ever considered how the behavior of the infected file would change if the virus is inserted into a .data section that is not the last one and is followed by .bss? It won't change in any way. Although the last section will be mapped to different addresses, the program code won't "know" about it and will continue to access uninitialized data at their previous addresses, now occupied by the virus code, which by this time would have already completed its operation and returned control to the original file. Provided that the program code is designed correctly and doesn't rely on the initial values of uninitialized variables, the virus's presence won't render the program unusable.

However, under the austere conditions of reality, this elegant technique of infection ceases to operate because the statistically average UNIX application contains about a dozen different sections.

For example, consider how the ls utility from the Red Hat 5.0 distribution is organized (Listing 19.5).

Listing 19.5: Typical memory map of a typical UNIX executable file

 Name    Start    End      Align Base Type Class 32 es   ss   ds   fs   gs
 .init   08000A10 08000A18 para  0001 publ CODE  Y  FFFF FFFF 0006 FFFF FFFF
 .plt    08000A18 08000CE8 dword 0002 publ CODE  Y  FFFF FFFF 0006 FFFF FFFF
 .text   08000CF0 08004180 para  0003 publ CODE  Y  FFFF FFFF 0006 FFFF FFFF
 .fini   08004180 08004188 para  0004 publ CODE  Y  FFFF FFFF 0006 FFFF FFFF
 .rodata 08004188 08005250 dword 0005 publ CONST Y  FFFF FFFF 0006 FFFF FFFF
 .data   08006250 08006264 dword 0006 publ DATA  Y  FFFF FFFF 0006 FFFF FFFF
 .ctors  08006264 0800626C dword 0007 publ DATA  Y  FFFF FFFF 0006 FFFF FFFF
 .dtors  0800626C 08006274 dword 0008 publ DATA  Y  FFFF FFFF 0006 FFFF FFFF
 .got    08006274 08006330 dword 0009 publ DATA  Y  FFFF FFFF 0006 FFFF FFFF
 .bss    080063B8 08006574 qword 000A publ BSS   Y  FFFF FFFF 0006 FFFF FFFF
 extern  08006574 08006624 byte  000B publ       N  FFFF FFFF FFFF FFFF FFFF
 abs     0800666C 08006684 byte  000C publ       N  FFFF FFFF FFFF FFFF FFFF

The .data section is located in the middle of the file. To reach it, the virus must take care of the modification of the other seven sections by correcting their offset fields as appropriate (p_offset in the program header table and sh_offset in the section header table, both counted from the start of the file). Some viruses do not care about this, and as a result the infected files cease to start.

On the other hand, the .data section of the file under consideration contains only 14h bytes (08006250h through 08006264h in the listing), because the lion's share of the program data is located in the .rodata section (which is available only for reading). This is a typical practice of contemporary linkers, and most executable files are organized in this way. The virus cannot place its code into the .data section, because this immediately discloses the infection, and it cannot insert its code into the .rodata section, because it will be unable to decrypt itself (do not suggest allocating the stack memory and copying the virus body there, because this task is beyond the capabilities of contemporary virus writers; furthermore, there won't be any practical use in doing so). Because the virus must insert its body into the middle of the file instead of the end, it would be much better to insert its code into the .text section containing machine code rather than into the .data section. The virus won't be too noticeable there. This issue will be covered in more detail later in this chapter (see "Infection by Extending the Code Section of the File").

Infection by Compressing Part of the Original File

Programs become increasingly larger and viruses grow more intricate every day. No matter how ugly Microsoft's code might be, it is much better than some UNIX counterfeits. For example, the cat utility supplied as part of FreeBSD 4.5 takes as much as 64 KB. Isn't this too much for such a primitive utility?

Viewing this file using a HEX editor reveals a large number of regular sequences (mostly chains of zeros) that are either unused or provide the possibility of efficient compression. The virus, being tempted by the availability of free space, can copy its body there, even when to achieve this goal it would have to divide its body into several dozen fragments. Even if there is no free space, this is not a problem. Practically every executable file contains a large number of text strings, which also can be efficiently compressed. At first glance, it might seem that such an infection algorithm is too complicated. Believe me, the task of implementing a Huffman packer is much simpler than the shamanism with the separation of sections that the virus must carry out to insert its body into the middle of the file. Furthermore, when using this method of infection, the file length remains unchanged, which partially conceals the virus's presence.

Consider how the virus inserts its body into the code segment. In the simplest case, the virus scans the file for a long sequence of NOP commands used for alignment of the program code by addresses divisible by the page size. It writes a fragment of its own body there and adds the command for jumping to the next fragment. This process continues until the virus writes its entire body into the file. At the final stage of infection, the virus writes the addresses of the fragments it has "captured" and then passes control to the carrier file (without doing so, the virus won't be able to copy its body into the next file being infected). There are several intricate viruses that contain built-in tracers, which automatically assemble the virus body; however, these are exotic lab viruses that are never encountered running wild.

Different programs contain different amounts of free space used for alignment. In particular, programs that are part of the basic distribution set of FreeBSD 4.5 are mainly aligned by 4 bytes. Taking into account that the unconditional jump command in x86 systems takes at least 2 bytes, it is an unrealistic job for the virus to fit within this small space. The situation is different with the Red Hat 5.0 system. Alignment is set to values from 08h to 10h bytes, which allows easy infection of files with a virus of an average size.

For example, Listing 19.6 provides a fragment of the disassembled listing of the ping utility infected by the UNIX.NuxBe.quilt virus, representing a modification of the well-known NuxBee virus published in the e-zine released by the #29A hack group.

Listing 19.6: Fragment with UNIX.NuxBe.quilt, whose body "spreads" over the code section

 .text:08000BD9         XOR   EAX, EAX
 .text:08000BDB         XOR   EBX, EBX
 .text:08000BDD         JMP   short loc_8000C01

 .text:08000C01 loc_8000C01:                      ; CODE XREF:   j .text:08000BDD
 .text:08000C01         MOV   EBX, ESP
 .text:08000C03         MOV   EAX, 90h
 .text:08000C08         INT   80h                 ; Linux - sys_msync
 .text:08000C0A         ADD   ESP, 18h
 .text:08000C0D         JMP   loc_8000D18

 .text:08000D18 loc_8000D18:                      ; CODE XREF:   j .text:08000C0D
 .text:08000D18         DEC   EAX
 .text:08000D19         JNS   short loc_8000D53
 .text:08000D1B         JMP   short loc_8000D2B

 .text:08000D53 loc_8000D53:                      ; CODE XREF:   j .text:08000D19
 .text:08000D53         INC   EAX
 .text:08000D54         MOV   [EBP + 8000466h], EAX
 .text:08000D5A         MOV   EDX, EAX
 .text:08000D5C         JMP   short loc_8000D6C

Even a beginning researcher will easily detect the presence of the virus in the program body. A characteristic chain of jmp instructions, stretched through the entire code section, simply cannot help attracting attention. In normal programs, such constructs are practically never encountered (tricky enveloped protection mechanisms and packers of executable files based on polymorphic engines are not covered here).
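The "chain of jmp instructions" symptom can be turned into a rough number. The sketch below, in Python, assumes that the disassembly has already been exported as (address, mnemonic, operand) tuples; that input format, the function name jump_density, and any threshold applied to the result are illustrative assumptions, not a proven heuristic.

```python
def jump_density(instructions):
    """Fraction of instructions that are unconditional jumps.

    `instructions` is a list of (address, mnemonic, operand) tuples,
    for example exported from a disassembler listing.
    """
    if not instructions:
        return 0.0
    jumps = sum(1 for _addr, mnemonic, _op in instructions
                if mnemonic.lower() == "jmp")
    return jumps / len(instructions)
```

For the fragment in Listing 19.6, 4 of the 15 instructions are unconditional jumps, a ratio far higher than compiler-generated code usually shows.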

Note that the virus fragments need not form a linear sequence. On the contrary, a virus, provided that its creator wasn't stupid, will take all measures to conceal its existence. You must be prepared to encounter a situation in which jmp instructions jump all over the file, using "illegal" epilogues and prologues for merging with the surrounding functions. However, this trick can be easily disclosed by cross-references automatically generated by the IDA Pro disassembler (cross-references to fictitious prologues and epilogues are missing).

By the way, the algorithm considered here is not quite correct. A chain of NOP instructions can be encountered in any location within the program (for instance, within some function), in which case the infected file will cease to operate. To avoid this situation, some viruses carry out a range of additional checks. In particular, they make sure that NOP operations are located between two functions by recognizing functions using the prologue and epilogue commands.

Insertion into the data section is carried out in an even simpler way. The virus searches for a long chain of zeros separated by printable ASCII characters. Having found such a chain, it assumes that this is a "neutral" territory generated because of alignment of text strings that doesn't belong to anyone. Because text strings most frequently are located in the .rodata section available only for reading, the virus must be prepared to save in the stack and/or in dynamic memory all cells that it has modified.

Curiously, viruses of this type are difficult to locate. Nonprintable ASCII characters between text strings are encountered frequently, and this is normal. For instance, these might be offsets, some data structures, or even garbage left by the linker.

Consider Fig. 19.6, showing the cat utility before (a) and after (b) infection. Certainly, infection is not self-evident.

Figure 19.6: The cat utility before (a) and after (b) infection

Investigators that have some experience with IDA might object that there are no problems here. It is enough to move the cursor to the first character following the end of an ASCIIZ string and press the <C> key; the disassembler will immediately display the virus code, picturesquely twisted into the text strings (see Listing 19.7). However, this happens only in theory. In practice, printable characters are encountered among nonprintable ones. The heuristic IDA analyzer, having erroneously interpreted these printable characters as "actual" text strings, won't allow you to disassemble them. Well, it won't allow this at least until they are explicitly "depersonalized" by pressing the <U> key. In addition, the virus might insert a special character into the beginning of each of its fragments, which represents the part of some machine command, thus confusing the disassembler. As a result, IDA will disassemble only a fragment of the virus (and even this will be done incorrectly), after which it will fail, causing the investigator to draw the false conclusion that this is a legal data structure containing no malicious machine code.

Listing 19.7: Fragment with UNIX.NuxBe.jullet, whose body "spreads" over the data section

 .rodata:08054140 aFileNameTooLon db 'File name too long', 0
 .rodata:08054153 ; -------------------------------------------
 .rodata:08054153        MOV   EBX, 1
 .rodata:08054158        MOV   ECX, 8049A55h
 .rodata:0805415D        JMP   loc_80541A9
 .rodata:08054160 ; -------------------------------------------
 .rodata:08054160 aTooManyLevelsO db 'Too many levels of symbolic links', 0
 .rodata:08054182 aConnectionRefu db 'Connection refused', 0
 .rodata:08054195 aOperationTimed db 'Operation timed out', 0
 .rodata:080541A9 ; -------------------------------------------
 .rodata:080541A9 loc_80541A9:
 .rodata:080541A9        MOV   EDX, 2Dh
 .rodata:080541AE        INT   80h
 .rodata:080541B0        MOV   ECX, 51000032h
 .rodata:080541B5        MOV   EAX, 8
 .rodata:080541BA        JMP   loc_80541E2
 .rodata:080541BA ; -------------------------------------------
 .rodata:080541BF        db    90h                  ; P
 .rodata:080541C0 aTooManyReferen db 'Too many references: can', 27h, 't splice', 0
 .rodata:080541E2 ; -------------------------------------------
 .rodata:080541E2 loc_80541E2:
 .rodata:080541E2        MOV   ECX, 1FDh
 .rodata:080541E7        INT   80h                  ; Linux - sys_creat
 .rodata:080541E9        PUSH  EAX
 .rodata:080541EA        MOV   EAX, 0
 .rodata:080541EF        ADD   [EBX + 8049B43h], bh
 .rodata:080541F5        MOV   ECX, 8049A82h
 .rodata:080541FA        JMP   near ptr unk_8054288
 .rodata:080541FA ; -------------------------------------------
 .rodata:080541FF        DB    90h                  ; P
 .rodata:08054200 aCanTSendAfterS DB 'Can', 27h, 't send after socket shutdown', 0

Alas! However powerful IDA might be, it is not omnipotent, and the hacker will have to work a great deal over the resulting listing. Provided that the investigator has some disassembling experience, most machine commands can be recognized in a HEX dump at first glance.
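One command that is especially easy to spot in a dump is int 80h, which is encoded as the byte pair CD 80. A minimal sketch in Python of such a search over the raw bytes of a data section follows; the function name find_int80 is an illustrative assumption, and a hit is only a hint, because the same two bytes can occur in ordinary data.

```python
INT_80H = b"\xcd\x80"  # machine encoding of the x86 instruction "int 80h"

def find_int80(section_bytes):
    """Return the offsets of every CD 80 byte pair in a section's raw contents."""
    offsets = []
    pos = section_bytes.find(INT_80H)
    while pos != -1:
        offsets.append(pos)
        pos = section_bytes.find(INT_80H, pos + 1)
    return offsets
```

Several hits inside a section that should hold only text strings, such as .rodata, are a good reason to undefine the surrounding bytes and disassemble them by hand.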

However, it is not always possible to scrape up the required number of interstring bytes in every executable file. In this case, the virus might search for a regular area to compress. In the simplest case, it searches for a chain made up of identical bytes and compresses it according to the RLE algorithm. When carrying out this action, the virus must take care not to run up against the contact mine of relocatable elements (it should be mentioned, however, that not a single virus among those that I investigated did this). After gaining control and carrying out all that it planned to do, the virus pushes the unpacker of the compressed code onto the stack. This unpacker is responsible for restoring the original state of the file. As can be easily seen, only sections available both for reading and for writing are infected using this method. This means that the most promising and tempting sections, such as .rodata and .text, are not suitable for this purpose unless the virus ventures to change their attributes, thus giving itself away.
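For readers who have not met RLE before, the scheme itself is tiny. The sketch below, in Python, uses one common variant that stores each run as a (count, value) byte pair with runs capped at 255; this particular encoding is an assumption chosen for illustration, and nothing in the text says which variant a given virus uses.

```python
def rle_encode(data):
    """Encode bytes as (run_length, value) pairs; runs are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(packed):
    """Expand (run_length, value) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(packed), 2):
        out += bytes([packed[k + 1]]) * packed[k]
    return bytes(out)
```

A chain of 300 zero bytes shrinks to 4 bytes under this variant, which shows why long runs of identical bytes are such an attractive source of free space, and why a region that unexpectedly lacks them deserves a second look.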

The most troublesome viruses can also infect the section of uninitialized data. No, this is not an error or misprint; such viruses exist. Their arrival is possible because a fully-functional virus is still hard to fit in the "holes" that remain after alignment; however, a virus loader fits there adequately. Sections of uninitialized data are not bound to be loaded from the disk into the memory (although some UNIX clones still load them); they might even be missing from the file and be dynamically created by the system loader. However, the virus is not going to search for them in the memory. Instead, it manually reads them directly from the infected file (although in some cases the operating system providently blocks access to the currently executed file).

At first glance, by placing its body into the section of uninitialized data, the virus doesn't gain any advantages (perhaps, it even discloses itself). However, any attempts at catching such a virus produce the same result: The virus escapes. The section of uninitialized data is visually no different from all other sections of the file, and it might contain anything: from a long sequence of zeros to the developer's copyright. For instance, developers of the FreeBSD 4.5 distribution set proceed as in Listing 19.8.

Listing 19.8: The .bss section of most files supplied as part of the FreeBSD distribution set

 000CE530:  00 00 00 00 FF FF FF FF  00 00 00 00 FF FF FF FF
 000CE540:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 000CE550:  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
 000CE560:  00 47 43 43 3A 20 28 47  4E 55 29 20 63 20 32 2E    GCC: (GNU) c 2.
 000CE570:  39 35 2E 33 20 32 30 30  31 30 33 31 35 20 28 72   95.3 20010315 (r
 000CE580:  65 6C 65 61 73 65 29 20  5B 46 72 65 65 42 53 44   elease) [FreeBSD

 000CF2B0:  4E 55 29 20 63 20 32 2E  39 35 2E 33 20 32 30 30   NU) c 2.95.3 200
 000CF2C0:  31 30 33 31 35 20 28 72  65 6C 65 61 73 65 29 20   10315 (release)
 000CF2D0:  5B 46 72 65 65 42 53 44  5D 00 08 00 00 00 00 00   [FreeBSD]