It is hard to imagine anything simpler than a computer virus. Even the Tetris game is more sophisticated! Nevertheless, novice programmers run into significant difficulties when they start writing viruses. How is it best to insert the virus code into the target file? Which fields should be changed, and which ones are better left untouched? What tools should be used for debugging viruses, and is it possible to use high-level languages for this purpose?
Throughout UNIX's evolution, many formats have been proposed for binary executable files. At present, however, only three of them survive in more or less usable form: a.out, Common Object File Format (COFF), and Executable and Linkable Format (ELF).
The a.out format (short for "assembler and link editor output") is the simplest and oldest of the three. It appeared when the PDP-11 and VAX were the prevailing computers. A file in this format comprises three segments: .text (the code segment), .data (the initialized data segment), and .bss (the uninitialized data segment). It also has two relocation tables (one for the code segment and one for the data segment), a symbol table containing the addresses of exported and imported functions, and string tables containing the names of the exported and imported functions. The a.out format is considered obsolete and is practically out of use. A brief manual, sufficient for understanding the format, can be found in the FreeBSD man pages. It is also worth studying the a.out.h include file supplied with any UNIX compiler.
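The header layout described above can be illustrated with a minimal sketch that reads the fixed-size a.out header. It assumes the 32-bit variant found in the Linux and BSD a.out.h files (eight 32-bit fields, with the magic number in the low 16 bits of the first field) and little-endian byte order by default; the original PDP-11 header used 16-bit fields and is not covered. The function name parse_aout_header is hypothetical.

```python
import struct

# Classic a.out magic numbers (octal), as defined in a.out.h.
AOUT_MAGICS = {0o407: "OMAGIC", 0o410: "NMAGIC", 0o413: "ZMAGIC"}

# Field order assumed from the 32-bit `struct exec`; the first field is
# called a_info on Linux and a_midmag on FreeBSD.
AOUT_FIELDS = ("a_info", "a_text", "a_data", "a_bss",
               "a_syms", "a_entry", "a_trsize", "a_drsize")


def parse_aout_header(data, byteorder="<"):
    """Return the header fields as a dict, or None if this is not a.out."""
    if len(data) < 32:
        return None
    values = struct.unpack(byteorder + "8I", data[:32])
    header = dict(zip(AOUT_FIELDS, values))
    magic = header["a_info"] & 0xFFFF  # low 16 bits hold the magic number
    if magic not in AOUT_MAGICS:
        return None
    header["magic_name"] = AOUT_MAGICS[magic]
    return header
```

The a_text, a_data, and a_bss fields give the sizes of the three segments, a_trsize and a_drsize give the sizes of the two relocation tables, and a_syms gives the size of the symbol table. An analyst can compare these values against the actual file length to spot data appended after the structures the header accounts for.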
COFF is the direct successor of the a.out format and a considerably improved version of it. It contains many new sections, and the header format has been changed (for example, a length field was introduced, which allows a virus to insert its body between the header and the first section of the file). In addition, every section can now be mapped at any address in virtual memory, which matters for viruses that insert their bodies at the beginning or in the middle of the file. COFF is popular in the Windows NT world (PE files are slightly modified COFF files); in contemporary UNIX, however, this format is practically out of use. Contemporary UNIX systems give preference to ELF.
ELF is similar to COFF. It is even said that it got its euphonious name from UNIX developers, among whom there have always been many J. R. R. Tolkien fans. It is essentially a variation of COFF, designed to ensure compatibility between 32- and 64-bit architectures. Nowadays, it is the main executable file format in the UNIX family of operating systems. This format has not always satisfied everyone (for instance, FreeBSD resisted the invasion of elves as long as it could; however, with the release of version 3.0 its developers were forced to declare ELF the default format). This happened mainly because newer versions of the most popular C compiler, GNU C, ceased to support the older formats. ELF thus became the de facto standard, recognized as such by everyone, whether they liked it or not, and it is the main file format described in this chapter. To withstand virus attacks efficiently, you'll have to study the finest details of ELF. To do this, I recommend that you read two excellent manuals covering this topic: http://www.ibiblio.org/pub/historic-linux/ftp-archives/sunsite.unc.edu/Nov-06-1994/GCC/ELF.doc.tar.gz ("Executable and Linkable Format: Portable Format Specification") and http://www.nai.com/common/media/vil/pdf/mvanvoers_VB_conf%202000.pdf ("Linux Viruses: ELF File Format").
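As a first step in studying the format, the following minimal sketch reads the identification bytes at the start of an ELF file and reports whether the file is a 32- or 64-bit image, its byte order, and its entry point. The field offsets follow the ELF specification; the function name describe_elf and the shape of the returned dictionary are my own choices for illustration.

```python
import struct


def describe_elf(data):
    """Return basic identification fields of an ELF image, or None."""
    if len(data) < 24 or data[:4] != b"\x7fELF":
        return None
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    bits = 32 if ei_class == 1 else 64          # ELFCLASS32 / ELFCLASS64
    order = "<" if ei_data == 1 else ">"        # ELFDATA2LSB / ELFDATA2MSB
    # e_type, e_machine, and e_version follow the 16-byte e_ident array.
    e_type, e_machine, _e_version = struct.unpack_from(order + "HHI", data, 16)
    # e_entry is 4 bytes wide in ELF32 and 8 bytes wide in ELF64.
    entry_fmt = order + ("I" if bits == 32 else "Q")
    if len(data) < 24 + struct.calcsize(entry_fmt):
        return None
    (e_entry,) = struct.unpack_from(entry_fmt, data, 24)
    return {
        "bits": bits,
        "byte_order": "little" if ei_data == 1 else "big",
        "type": e_type,
        "machine": e_machine,
        "entry": e_entry,
    }
```

Passing the first 64 bytes of a file (for example, open(path, "rb").read(64)) is enough for either class. Note that the same header serves both 32- and 64-bit files, differing only in the width of the address fields.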
There are at least three principally different methods of infecting files distributed in the a.out format:
"Merging" with the original file, which is subsequently written into a temporary file that is removed after execution terminates (as a variant, the target file can be loaded manually)
Extending the last section of the file and writing the virus body at its end
Compressing part of the original file and inserting the virus body into the space that has been freed
Migration to the ELF and COFF file formats adds four more methods of infection:
Extending the code section and inserting the virus body into the freed space
Shifting the code section down and writing the virus body at its beginning
Creating a custom section at the beginning, at the end, or in the middle of the file
Inserting the virus body between the header and the rest of the file
Having inserted its body into the file, the virus must gain control. This can be achieved using the following methods:
Creating a custom header and custom code or data segment overlapping the existing ones
Correcting the entry point in the header of the target file
Inserting a jump command into the body of the target file, which would pass control to the virus body
Modifying the import table (in a.out terminology, this table is called the symbol table) to replace functions, a technique especially important for stealth viruses
Except for the merging technique, all these tricks are hard to conceal. In most cases, infection can be easily detected by visually inspecting the disassembled listing of the file being analyzed. All these issues will be covered in more detail later in this chapter. For the moment, however, it is necessary to concentrate on the mechanisms of system calls, which most viruses use to provide the minimum functionality required for their vital activities.
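Part of the visual inspection mentioned above can be automated. The following is a minimal sketch, not a complete detector, that checks where the ELF entry point lands relative to the loadable segments. It assumes that a clean file starts executing in a segment that is executable, not writable, and not the last loadable one; this is a heuristic of my own choosing, and legitimate files can violate it. The names load_segments and entry_point_findings are hypothetical.

```python
import struct

PT_LOAD, PF_X, PF_W = 1, 0x1, 0x2  # values from the ELF specification


def load_segments(data):
    """Return (e_entry, [(vaddr, memsz, flags), ...]) for PT_LOAD segments."""
    if data[:4] != b"\x7fELF" or data[4] not in (1, 2) or data[5] not in (1, 2):
        raise ValueError("not a recognizable ELF image")
    is64 = data[4] == 2
    order = "<" if data[5] == 1 else ">"
    if is64:
        e_entry, e_phoff = struct.unpack_from(order + "QQ", data, 24)
        e_phentsize, e_phnum = struct.unpack_from(order + "HH", data, 54)
    else:
        e_entry, e_phoff = struct.unpack_from(order + "II", data, 24)
        e_phentsize, e_phnum = struct.unpack_from(order + "HH", data, 42)
    segments = []
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        if is64:  # ELF64 places p_flags right after p_type
            p_type, p_flags, _, p_vaddr, _, _, p_memsz, _ = struct.unpack_from(
                order + "IIQQQQQQ", data, off)
        else:     # ELF32 places p_flags near the end of the entry
            p_type, _, p_vaddr, _, _, p_memsz, p_flags, _ = struct.unpack_from(
                order + "8I", data, off)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_memsz, p_flags))
    return e_entry, segments


def entry_point_findings(data):
    """Return a list of warnings about the location of the entry point."""
    e_entry, segments = load_segments(data)
    for index, (vaddr, memsz, flags) in enumerate(segments):
        if vaddr <= e_entry < vaddr + memsz:
            findings = []
            if not flags & PF_X:
                findings.append("entry point lies in a non-executable segment")
            if flags & PF_W:
                findings.append("entry point lies in a writable segment")
            if index > 0 and index == len(segments) - 1:
                findings.append("entry point lies in the last loadable segment")
            return findings
    return ["entry point lies outside every loadable segment"]
```

An empty list means only that none of these particular checks fired. A non-empty list is a reason to look at the disassembly more closely, not proof of infection.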
For normal operation, a virus requires at least four main functions that operate on files: open, close, read, and write. Optionally, it might also implement a search operation to look for files on local disks or over the network. Without these functions, the virus will be unable to reproduce itself, and the program will be simply a Trojan, not a virus.
There are at least three ways for a virus to obtain access to these functions:
Using system functions of the target program (if it has any)
Supplementing the import table of the target program with all required system functions
Using the native API of the operating system
Finally, it is necessary to mention that viruses written in assembly language (which prevail among UNIX viruses) differ strikingly from compiled programs in their laconic and excessively straightforward style, atypical for high-level languages. Because packers of executable files are practically never used in the UNIX world, any extraneous "additions" and "patches" are most likely to be Trojan components or viruses.