2.2 DOS Technologies | Malicious Mobile Code: Virus Protection for Windows (O'Reilly Computer Security)



	Malicious Mobile Code: Virus Protection for Windows By Roger A. Grimes

	Chapter 2. DOS Computer Viruses

2.2 DOS Technologies

A DOS PC boots up, places DOS in control, and then runs a myriad of possible files and programs. Booting is the group of processes a PC executes to check itself for basic configuration errors and to load the operating system. A lot happens during the first minute after a PC is turned on.

2.2.1 PC Boot Sequence

The following explanation assumes an Intel PC running MS-DOS with one hard drive.

On every PC, many processes and checks must run before any program or user can execute the first command. Much of the initial boot sequence, as shown in Figure 2-1, is dedicated to performing simple hardware self-checks and is the same regardless of the operating system. Once the operating system (OS) begins to boot, the sequence differs according to the particular needs of the OS.

Figure 2-1. Normal PC boot sequence (regardless of operating system)

After you flip the power switch, the power supply does a quick self-check and sends a signal to the CPU to start. The CPU initializes itself and starts executing hardware self-check code located in the read-only memory basic input/output system (ROM BIOS) chip on the motherboard. The ROM BIOS chip contains instructions that are "burned into" the chip and aren't normally changed. Early on, rewriting the ROM BIOS took special equipment, including an ultraviolet light to erase the chip. Today, the "burn-in" process can be as simple as running a specially designed program that writes the new BIOS code to the chip.

The ROM BIOS is used for three functions:

It remembers hardware and configuration settings (e.g., enable or disable booting from drive A, enable shadow RAM cache, remember that the PC has a slave CD-ROM drive on IDE port 1).
It contains interrupt code subroutines that allow the operating system or software to access hardware devices. For example, software can issue interrupt 13h (the h indicates hexadecimal notation) to access the hard drive.
Lastly, it contains the instructions to find and start the operating system boot process.

The CPU always executes the first instruction located at ROM address FFFF0h. ROM chip manufacturers and CPU makers have agreed that the first instruction will always be located at this same memory address. That instruction, typically a jump, starts the rest of the ROM code running. The code begins testing video memory and looking for other ROM chips (e.g., on SCSI controller cards) to initialize. The CPU then checks a scratchpad location in system memory to see whether the PC was started from a powered-down state (a cold boot) or simply restarted from the keyboard (a warm boot). This check will become more important later on. A cold boot results in a test of system random access memory (RAM) and a further set of ROM self-checks, often referred to as the power-on self-test (POST). Any errors found usually produce audible error beeps and, if possible, an on-screen message.
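The FFFF0h address follows from real-mode x86 segment:offset arithmetic, where the physical address is the segment times 16 plus the offset. The Python sketch below shows that the reset vector, conventionally written F000:FFF0, lands on FFFF0h. It also computes where the warm-boot "scratchpad" flag sits, assuming the commonly documented IBM-compatible BIOS data area location 0040:0072 and the magic value 1234h; treat those two values as a BIOS convention, not something stated in this chapter.

```python
RESET_SEGMENT, RESET_OFFSET = 0xF000, 0xFFF0

def physical_address(segment: int, offset: int) -> int:
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits (1 MB)."""
    return ((segment << 4) + offset) & 0xFFFFF

# Where the CPU fetches its first instruction after reset.
RESET_VECTOR = physical_address(RESET_SEGMENT, RESET_OFFSET)

# Assumption (IBM-compatible BIOS convention): the word at 0040:0072 holds
# 1234h after a keyboard restart, telling the BIOS to skip the memory test.
WARM_BOOT_FLAG_ADDR = physical_address(0x0040, 0x0072)
WARM_BOOT_MAGIC = 0x1234

def is_warm_boot(flag_word: int) -> bool:
    return flag_word == WARM_BOOT_MAGIC

print(f"{RESET_VECTOR:05X}h")         # FFFF0h
print(f"{WARM_BOOT_FLAG_ADDR:05X}h")  # 00472h
```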

The system then searches for the first boot device defined in the system ROM, commonly checking the floppy disk drive first, even though the floppy drive is usually not intentionally being used to boot. If drive A is empty, the CPU checks for the first physical hard drive and reads its first sector (cylinder 0, head 0, sector 1) into memory. This sector contains the master boot record (MBR), closely followed by the partition table. The MBR tells the CPU which partition, and which sector, to continue the boot process from.
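The "cylinder 0, head 0, sector 1" address is in CHS (cylinder/head/sector) form, where sectors count from 1 but cylinders and heads count from 0. A minimal sketch of the standard CHS-to-logical-block conversion shows why that address is the disk's very first sector; the 16-head, 63-sector geometry in the example is hypothetical.

```python
def chs_to_lba(cylinder: int, head: int, sector: int,
               heads_per_cylinder: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a zero-based logical block address.

    Cylinders and heads are numbered from 0, but sectors are numbered from 1.
    """
    if sector < 1 or sector > sectors_per_track:
        raise ValueError("sector must be in 1..sectors_per_track")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

# Hypothetical geometry: 16 heads, 63 sectors per track.
print(chs_to_lba(0, 0, 1, 16, 63))  # 0  -> the MBR is the disk's first sector
print(chs_to_lba(0, 1, 1, 16, 63))  # 63 -> first sector under head 1
```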

The partition table keeps track of logical hard drive partitions. Every physical hard drive is divided into one or more logical partitions. Although one partition per hard drive can make computing life easier, there are many reasons people choose, or are forced, to create several logical partitions on one physical hard drive. Every operating system has a maximum partition size, which can force large hard drives to be subdivided into smaller partitions. For example, a Windows NT 4.0 initial boot partition has a maximum size of 8GB because of its reliance on the older DOS file allocation table (FAT) file storage system during the early stages of installation.

Because floppy disks can't be partitioned in DOS, they don't have MBRs or partition tables.

Logical partitions create flexibility. As shown in Figure 2-2, multiple partitions allow users to have several distinct drive letters (C, D, E, and so forth) on one hard drive, or to run several different operating or file storage systems. I run Windows NT, Windows 98, and Linux from one hard drive. A partition table has one entry per partition, with a maximum of four entries per physical disk in the standard MBR layout. Each entry marks whether the partition is bootable (there is usually only one bootable partition per physical hard disk), where that partition starts and ends on the hard disk, what type of file system it holds (e.g., DOS FAT, OS/2 HPFS, NTFS, Linux, etc.), and how many sectors it uses.

Figure 2-2. Example of PC with two hard drives with several logical partitions
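The entry layout just described can be read with a few lines of Python. This sketch assumes the conventional MBR format: four 16-byte entries starting at byte 446 (1BEh) and the signature bytes 55h AAh at the end of the sector. It parses a copy of the sector held in memory rather than reading a live disk, and the sample partition values are made up.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass

SECTOR_SIZE = 512
TABLE_OFFSET = 0x1BE          # 446: the partition table follows the MBR boot code
ENTRY_SIZE = 16               # four 16-byte entries fill bytes 446-509
SIGNATURE = b"\x55\xAA"       # bytes 510-511 of a valid MBR

@dataclass
class Partition:
    bootable: bool
    type_id: int
    start_lba: int
    sector_count: int

def parse_partition_table(mbr: bytes) -> list[Partition]:
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for slot in range(4):
        start = TABLE_OFFSET + slot * ENTRY_SIZE
        entry = mbr[start:start + ENTRY_SIZE]
        # Byte 0: status (80h = bootable); bytes 1-3: start CHS; byte 4: type;
        # bytes 5-7: end CHS; bytes 8-11: first sector (LBA); bytes 12-15: length.
        status, type_id = entry[0], entry[4]
        start_lba, sector_count = struct.unpack_from("<II", entry, 8)
        if type_id == 0:
            continue  # empty slot
        partitions.append(Partition(status == 0x80, type_id, start_lba, sector_count))
    return partitions

# Synthetic MBR with one bootable FAT16 (type 06h) partition; values are made up.
mbr = bytearray(SECTOR_SIZE)
mbr[TABLE_OFFSET] = 0x80
mbr[TABLE_OFFSET + 4] = 0x06
struct.pack_into("<II", mbr, TABLE_OFFSET + 8, 63, 204_800)
mbr[510:512] = SIGNATURE
print(parse_partition_table(bytes(mbr)))
```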

The CPU loads the MBR and partition table into memory, and the MBR code reads which sector of the bootable partition contains the operating system boot code. The MBR code then loads the operating system's boot sector into memory and begins to execute it. The first few bytes of the boot sector contain a jump instruction to the rest of the boot code. When you really get to know the boot process, you will see that a lot of time is spent pointing to the next location without any real work being done. Up until now, everything is the same regardless of the operating system. The rest of our example follows a 16-bit FAT DOS boot sector.

We will examine the Windows 98 and NT boot processes later on in Chapter 3. Among other things, the boot sector identifies the OS that formatted the partition (e.g., MS or IBM and version), bytes per sector, sectors per cluster, sectors per track, and heads on the hard drive. It also holds boot error messages (e.g., "Nonsystem disk or disk error") and the software routines that load the rest of the operating system. Those routines look for the files needed to continue the boot process. The boot sector is little more than some basic error checking and programs that load the real operating system code.
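The fields listed above live in the boot sector's BIOS Parameter Block (BPB). The sketch below assumes the standard FAT12/FAT16 layout: a 3-byte jump, an 8-byte OEM name such as "MSDOS5.0", then the BPB starting at offset 0Bh. The sample boot sector and all its values are hypothetical.

```python
import struct

BPB_FORMAT = "<HBHBHHBHHH"   # little-endian BPB fields starting at offset 0Bh

def parse_fat_boot_sector(sector: bytes) -> dict:
    """Read the OEM name and BIOS Parameter Block from a FAT12/FAT16 boot sector."""
    if len(sector) < 512:
        raise ValueError("a boot sector is 512 bytes")
    (bytes_per_sector, sectors_per_cluster, reserved_sectors, fat_count,
     root_entries, total_sectors, media_descriptor, sectors_per_fat,
     sectors_per_track, heads) = struct.unpack_from(BPB_FORMAT, sector, 0x0B)
    return {
        "jump": sector[0:3].hex(),   # typically EB xx 90: jump over the BPB to the boot code
        "oem_name": sector[3:11].decode("ascii", errors="replace").rstrip(),
        "bytes_per_sector": bytes_per_sector,
        "sectors_per_cluster": sectors_per_cluster,
        "reserved_sectors": reserved_sectors,
        "fat_count": fat_count,
        "root_entries": root_entries,
        "total_sectors": total_sectors,
        "media_descriptor": media_descriptor,
        "sectors_per_fat": sectors_per_fat,
        "sectors_per_track": sectors_per_track,
        "heads": heads,
    }

# Synthetic boot sector; every value here is hypothetical.
sector = bytearray(512)
sector[0:3] = b"\xEB\x3C\x90"
sector[3:11] = b"MSDOS5.0"
struct.pack_into(BPB_FORMAT, sector, 0x0B, 512, 4, 1, 2, 512, 0, 0xF8, 200, 63, 16)
print(parse_fat_boot_sector(bytes(sector)))
```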

With DOS, the next file found is IO.SYS, which contains ROM BIOS extensions and initialization code. IO.SYS searches for and loads MSDOS.SYS into memory (IBM PCs running PC-DOS use IBMBIO.COM and IBMDOS.COM instead). MSDOS.SYS is then executed and begins to run the low-level DOS routines. These files don't appear in normal directory listings because they have the hidden file attribute set. At this point, DOS is technically started.
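A file's "hidden" status is just one bit in its directory-entry attribute byte. This sketch decodes that byte using the standard DOS bit values; the 07h value used for IO.SYS (read-only, hidden, system) is an assumption about a typical installation, not something stated above.

```python
from __future__ import annotations

# DOS directory-entry attribute bits.
ATTRIBUTE_BITS = {
    0x01: "read-only",
    0x02: "hidden",
    0x04: "system",
    0x08: "volume label",
    0x10: "directory",
    0x20: "archive",
}

def describe_attributes(attr: int) -> list[str]:
    """List the attribute names set in a DOS attribute byte."""
    return [name for bit, name in ATTRIBUTE_BITS.items() if attr & bit]

def shows_in_plain_dir(attr: int) -> bool:
    """A plain DIR (without /A) omits hidden and system files."""
    return not attr & (0x02 | 0x04)

io_sys = 0x07   # assumption: read-only + hidden + system, typical for IO.SYS
print(describe_attributes(io_sys))  # ['read-only', 'hidden', 'system']
print(shows_in_plain_dir(io_sys))   # False
```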

If a CONFIG.SYS file exists in the root directory, all the commands and device drivers it contains are processed. Only then is COMMAND.COM executed. Some people are surprised that CONFIG.SYS is processed before COMMAND.COM, because COMMAND.COM is the only file users see when they make a disk bootable. It is often assumed that COMMAND.COM is DOS. However, COMMAND.COM is simply the user's interface to DOS (i.e., how files are copied, deleted, displayed, etc.). Many other DOS "command interpreters" are available and used, but because Microsoft and IBM load COMMAND.COM by default, it was easily accepted as the de facto standard. In Unix, these user interfaces to the operating system are called shells (e.g., CSH, BSH). You can define an alternate DOS shell in a CONFIG.SYS file with the SHELL= statement. Finally, the AUTOEXEC.BAT file is processed. The DOS boot process is summarized in Figure 2-3.

Figure 2-3. DOS boot process
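To make the CONFIG.SYS step concrete, here is a deliberately simplified parser that reports which device drivers will load and which command interpreter will run. It only understands DEVICE=, DEVICEHIGH=, SHELL=, and REM lines, and it assumes the last SHELL= line wins; real DOS handles many more directives. The shell path in the example is hypothetical.

```python
from __future__ import annotations

def parse_config_sys(text: str) -> tuple[list[str], str]:
    """Return (device driver lines, command interpreter) from CONFIG.SYS text.

    Simplified sketch: handles only DEVICE=, DEVICEHIGH=, SHELL=, and REM.
    """
    drivers: list[str] = []
    shell = "COMMAND.COM"              # used when no SHELL= line is present
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition("=")
        key = key.strip().upper()
        if not sep or key.startswith("REM"):
            continue                   # blank lines, comments, non-assignments
        if key in ("DEVICE", "DEVICEHIGH"):
            drivers.append(value.strip())
        elif key == "SHELL" and value.strip():
            shell = value.split()[0]   # program path, without its switches
    return drivers, shell

config = r"""DEVICE=C:\DOS\HIMEM.SYS
REM SHELL=C:\OLD\OLDSHELL.COM
SHELL=C:\ALT\ALTSHELL.COM /P
"""
drivers, shell = parse_config_sys(config)
print(drivers)  # ['C:\\DOS\\HIMEM.SYS']
print(shell)    # C:\ALT\ALTSHELL.COM
```

Every line this parser returns names a program that runs before the user sees a prompt, which is one reason the startup files are worth inspecting.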

The first half of the boot process serves to find the operating system's boot code; that boot code then loads the operating system itself. The DOS boot sector loads the system files, which parse CONFIG.SYS (running the programs and commands found there), execute COMMAND.COM, and then parse the AUTOEXEC.BAT file and run its programs and commands. Besides setting environmental and operational parameters, any files or programs the user wants executed every time the PC starts are placed in AUTOEXEC.BAT. Viruses can insert themselves at nearly every step of the boot-up routine.

2.2.2 .EXE and .COM Files

Now that DOS is booted and running, programs and applications can be executed and opened. In the DOS world, most programs are stored in .EXE or .COM files.

2.2.2.1 .COM files

.COM files are easier to write and modify than .EXE files, but have built-in limitations. A .COM program is limited to 64KB, with its code and data sharing that single 64KB memory segment. Although this technique was not initially documented by Microsoft, larger programs can be created by using overlay files and swapping different portions in and out of memory. This explains why overlay files (.OVL) are targets for virus infection, too. A .COM file in memory is an exact copy of the binary image located on disk, with one exception. As shown in Figure 2-4, DOS places a 256-byte header called the program segment prefix (PSP) in front of every .COM program it loads. DOS makes a lot of assumptions and sets up the PSP with no input from the file.

Figure 2-4. Simplified structure of a .COM file
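The .COM loading rule (a 256-byte PSP, then the file copied byte for byte) can be simulated in a few lines. This sketch models one 64KB segment as a bytearray, fills in only the first PSP field (the INT 20h terminate instruction at offset 00h), and ignores the stack space a real program also needs, so its size limit is an upper bound rather than the practical one.

```python
PSP_SIZE = 0x100        # 256-byte program segment prefix built by DOS, not stored in the file
SEGMENT_SIZE = 0x10000  # one 64KB segment holds the PSP, code, data, and stack

def load_com_image(image: bytes) -> bytearray:
    """Simulate DOS laying out a .COM program in its single 64KB segment.

    Simplified: real DOS also needs stack room, so the practical limit is a bit lower.
    """
    if len(image) > SEGMENT_SIZE - PSP_SIZE:
        raise ValueError(".COM image does not fit in one segment after the PSP")
    segment = bytearray(SEGMENT_SIZE)
    segment[0:2] = b"\xCD\x20"                       # PSP offset 00h: INT 20h (terminate)
    segment[PSP_SIZE:PSP_SIZE + len(image)] = image  # file copied byte for byte at 100h
    return segment

# A tiny .COM image: MOV AH,4Ch / INT 21h (exit to DOS).
program = b"\xB4\x4C\xCD\x21"
memory = load_com_image(program)
print(memory[0x100:0x104] == program)   # True
```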

2.2.2.2 .EXE files

.EXE files, by contrast, are not loaded into memory as exact copies of their binary image on disk. Programs and data are not limited to a single 64KB memory segment, as each can have its own segment. DOS still sets up the PSP, but all other initializing information must be stored in the .EXE header, which is commonly 512 bytes long. Among other things, the header tells DOS where the different segments (code, data, and stack) are located and where the actual program code starts.

Figure 2-5 shows a simplified example of an .EXE file.

Figure 2-5. Simplified structure of an .EXE file

.EXE-infecting viruses must be able to manipulate the header information to account for their modifications. There are at least 10 recalculations performed in rewriting the header alone, not to mention the program modifications. One of the header calculations makes sure the newly modified program requests enough minimum memory to run: the virus must read how much memory the original program requested and then add its own requirements. The address of the first instruction to run (the code segment and instruction pointer, or CS:IP) must be recalculated to point to the virus code, and the original CS:IP saved so the original host file can still be executed. Similarly, the initial stack segment and stack pointer (SS:SP) must be saved, recalculated, and rewritten. There is even a file size value that must be rewritten to include the increase in file size from the virus. If virus writers make one mistake, the infected file crashes. Because .COM files are easier to program, there are more .COM-infecting viruses than .EXE-infecting bugs.
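For an analyst, the same header fields are simply values to read and inspect. The sketch below is read-only: it unpacks the standard 28-byte fixed portion of a DOS .EXE header (minimum memory, SS:SP, CS:IP, and the size stored as 512-byte pages plus the bytes used in the last page). The field names are my own labels and the sample header is hypothetical.

```python
import struct

MZ_FIELDS = (
    "magic", "last_page_bytes", "page_count", "relocation_count",
    "header_paragraphs", "min_alloc", "max_alloc", "init_ss", "init_sp",
    "checksum", "init_ip", "init_cs", "relocation_table_offset", "overlay_number",
)
MZ_FORMAT = "<2s13H"   # 28-byte fixed part of the DOS .EXE header

def parse_mz_header(data: bytes) -> dict:
    """Read (not modify) the fixed .EXE header fields."""
    if len(data) < struct.calcsize(MZ_FORMAT):
        raise ValueError("too short for an .EXE header")
    header = dict(zip(MZ_FIELDS, struct.unpack_from(MZ_FORMAT, data)))
    if header["magic"] not in (b"MZ", b"ZM"):
        raise ValueError("missing MZ signature")
    pages, last = header["page_count"], header["last_page_bytes"]
    # Size is stored as 512-byte pages; a nonzero last_page_bytes means the final page is partial.
    header["size_from_header"] = pages * 512 if last == 0 else (pages - 1) * 512 + last
    header["header_bytes"] = header["header_paragraphs"] * 16   # 16-byte paragraphs
    header["entry_cs_ip"] = f"{header['init_cs']:04X}:{header['init_ip']:04X}"
    return header

# Hypothetical header: 3 pages, 80 bytes used in the last one, 512-byte header.
sample = struct.pack(MZ_FORMAT, b"MZ", 80, 3, 0, 32, 0x10, 0xFFFF,
                     0x0010, 0x0100, 0, 0x0000, 0x0000, 0x1E, 0)
info = parse_mz_header(sample)
print(info["size_from_header"], info["header_bytes"], info["entry_cs_ip"])  # 1104 512 0000:0000
```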

Another incidental quirk of .EXE files is that they always begin with the letters MZ or ZM, the initials of Mark Zbikowski, the Microsoft programmer who developed the .EXE file structure. The MZ initials can easily be seen by using the TYPE or EDIT command on an .EXE file (see Figure 2-6). Many viruses look for the presence of MZ at the start of a file when they're looking for .EXE files.

Figure 2-6. Example .EXE file contents with MZ initials visible
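Checking those two bytes is a more reliable way to recognize .EXE-format files than trusting the extension, since DOS's program loader is generally described as deciding by this signature rather than by the file name. A minimal sketch:

```python
from pathlib import Path

def has_mz_signature(data: bytes) -> bool:
    """True when the bytes start with the .EXE signature MZ (or the rare ZM)."""
    return data[:2] in (b"MZ", b"ZM")

def looks_like_exe(path: Path) -> bool:
    """Check the signature rather than trusting the file extension."""
    with open(path, "rb") as f:
        return has_mz_signature(f.read(2))
```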

Although it may not be immediately visible to the untrained eye, the presence of *.COM in the file contents shown in Figure 2-6 would make me very suspicious if I didn't already know I was looking at a virus. Viruses often use text strings like *.COM or *.EXE to search for new host files.
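That observation can be turned into a crude heuristic: search a file's bytes, case-insensitively, for the wildcard strings a file-infector might pass to a find-file call. The sketch below only reports where the strings occur; plenty of legitimate utilities contain them too, so a hit is a reason to look closer, nothing more.

```python
from __future__ import annotations

SEARCH_PATTERNS = (b"*.COM", b"*.EXE")

def find_search_wildcards(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, pattern) for each file-search wildcard string in the data.

    Case-insensitive, since DOS filenames are. This is only a hint: legitimate
    utilities such as file managers may contain the same strings.
    """
    upper = data.upper()
    hits = []
    for pattern in SEARCH_PATTERNS:
        pos = upper.find(pattern)
        while pos != -1:
            hits.append((pos, pattern))
            pos = upper.find(pattern, pos + 1)
    return sorted(hits)

print(find_search_wildcards(b"MZ\x00\x00*.com\x00*.EXE"))
# [(4, b'*.COM'), (10, b'*.EXE')]
```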

2.2.3 Software to Hardware

In order for most MMC to work, it must manipulate hardware. It might read, write, and delete files, erase disks, or print screen messages. Most software programs don't directly manipulate PC hardware. That's left to the operating system or to machine language routines in the BIOS chip (called BIOS interrupts, and discussed later). Low-level operations, such as directing the hard drive to seek a file on a particular track at a particular sector, would be cumbersome if left up to each application. Each software program would need its own machine language routines to manipulate the hard drive, floppy drive, screen, modem, and other peripherals. It would take years longer to write a word processor, spreadsheet, or game, and every time a new type of hardware came out, the application would have to be updated. If every programmer had to learn the detail needed to talk to each piece of hardware that could be plugged into a PC, we'd never have the incredible software we have today, not to mention the compatibility.

Most programs can call the BIOS interrupt routines, but as Windows matures and pulls away from its DOS origins, it uses fewer of them. Windows operating systems have their own software device drivers that talk to the hardware, although BIOS routines are still used for particular tasks. For example, although Windows NT uses its own native drivers to communicate with and direct hardware, it uses interrupt routines when it first starts to load files from the disk. Thus, a command or action in an application can take one of several paths to the PC's underlying hardware. The software routine can write directly to the hardware, use BIOS interrupt routines, or use the operating system's routines and drivers. Malicious coders can use any combination of the three when writing rogue code programs. Figure 2-7 shows the pathway choices an MMC program can use.

Figure 2-7. Software to hardware pathway

Which software/hardware interface a program uses affects its ability to run in the face of changing hardware and software. Programs using BIOS interrupt routines are the most flexible: they can work across a wide range of hardware devices and operating systems. However, some operating systems, like Windows NT and 2000, prevent programs from using BIOS routines unless they gain special access (discussed in Chapter 3 and Chapter 4). And not all hardware has related BIOS routines. Programs using the operating system to communicate with hardware will almost certainly work on that platform, but are not guaranteed to work on others. For example, many viruses written for Windows 95 will not work on Windows NT, and vice versa. Also, the operating system predefines what a program can and can't do. NT's default system protection is an example: most application programs running under NT cannot write to protected areas of memory or manipulate system files. Lastly, an MMC program can talk directly to hardware, but writing these low-level routines is complex work and can make for buggy programs. Malicious code writers take all of this into account when creating programs. When DOS was king, writing in assembly language and using BIOS and DOS interrupt routines was the norm for MMC writers. It allowed maximum flexibility and worked across many operating systems.

2.2.4 Interrupts

Interrupts are low-level software routines explicitly designed to be called by higher-level programs. Each interrupt performs a particular function. Some write to the screen, some print to the printer, others write bytes to the serial port. Each operating system has its own series of interrupts: DOS has one set, and NT, Novell, and OS/2 each have their own. The BIOS chip has another set, possibly the most important one. It is the BIOS chip routines that determine the level of IBM compatibility a particular PC has. DOS programs can freely "call" interrupts from the operating system or BIOS depending on what they are trying to accomplish. This is not true of all operating systems. NT, for instance, significantly limits the interrupts a program may call outside of the operating system.

Interrupt software subroutines are stored in memory. A software program or operating system calls an interrupt, which points to a predefined low-level routine located in a particular place in system memory. The memory locations of these subroutines are kept in a simple table in memory called the interrupt vector table. Programs are free to write their own routines or modify existing routines by simply changing the interrupt's memory address in the vector table. Sometimes this arrangement is all too easy to manipulate.
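In real mode the interrupt vector table sits at the very bottom of memory (0000:0000): 256 entries of 4 bytes each, every entry holding a handler's offset word followed by its segment word. This sketch works on a simulated 1,024-byte table rather than real memory, and the handler address used in the example is hypothetical. It shows why the vector for interrupt 21h lives at address 84h and interrupt 13h at 4Ch.

```python
from __future__ import annotations

import struct

IVT_ENTRIES = 256
IVT_SIZE = IVT_ENTRIES * 4          # 1,024 bytes starting at physical address 0

def vector_address(int_number: int) -> int:
    """Physical address of an interrupt's 4-byte vector in the real-mode IVT."""
    if not 0 <= int_number < IVT_ENTRIES:
        raise ValueError("interrupt numbers run from 00h to FFh")
    return int_number * 4

def read_vector(ivt: bytes, int_number: int) -> tuple[int, int]:
    """Return (segment, offset); memory holds the offset word first, then the segment."""
    offset, segment = struct.unpack_from("<HH", ivt, vector_address(int_number))
    return segment, offset

def write_vector(ivt: bytearray, int_number: int, segment: int, offset: int) -> None:
    struct.pack_into("<HH", ivt, vector_address(int_number), offset, segment)

ivt = bytearray(IVT_SIZE)                   # simulated table, not real memory
write_vector(ivt, 0x21, 0x0116, 0x109E)     # hypothetical DOS handler address
print(f"{vector_address(0x21):03X}h")       # 084h
print([f"{w:04X}" for w in read_vector(ivt, 0x21)])  # ['0116', '109E']
```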

Many malicious programs gain control of a PC's functionality by rewriting an interrupt routine or pointing the vector table to a new memory location. The process of taking over an interrupt is called hooking. For example, a virus might insert its own file-copying subroutine so that whenever a file is copied, it can check for and infect any program files. Malicious programs often expend considerable effort to make sure they have hooked the interrupts necessary to intercede in normal processing. Good antivirus programs try just as hard to make sure that there are no inappropriate interrupt hooks before they run.
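One simple way to look for inappropriate hooks is to compare the current vectors against a baseline captured at a known-clean moment. This is a minimal sketch of that idea, not a description of how any particular antivirus product works: the watched interrupts, the baseline, and every address below are hypothetical, and a changed vector only means "look closer," since legitimate resident programs and drivers hook interrupts too.

```python
from __future__ import annotations

WATCHED = (0x13, 0x21)   # BIOS disk services and DOS services

def changed_vectors(baseline: dict[int, tuple[int, int]],
                    current: dict[int, tuple[int, int]],
                    watched=WATCHED) -> dict[int, tuple]:
    """Report watched interrupts whose (segment, offset) differs from the baseline."""
    return {n: (baseline.get(n), current.get(n))
            for n in watched if baseline.get(n) != current.get(n)}

# Hypothetical addresses captured at a known-clean moment.
baseline = {0x13: (0xF000, 0xEC59), 0x21: (0x0116, 0x109E)}
current = dict(baseline)
current[0x21] = (0x9F00, 0x0100)   # simulated hook: Int 21h now points elsewhere

for n, (old, new) in changed_vectors(baseline, current).items():
    print(f"Int {n:02X}h: {old[0]:04X}:{old[1]:04X} -> {new[0]:04X}:{new[1]:04X}")
# Int 21h: 0116:109E -> 9F00:0100
```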

Each interrupt is identified by a unique number. Interrupt 21h is reserved for DOS operating system services; any call to it initiates a DOS interrupt routine. Lower-numbered interrupts are for BIOS and hardware-level routines. Interrupt 17h handles parallel port services, interrupt 10h interacts with your video card, and interrupt 12h reports the amount of base memory. Interrupt 13h provides the BIOS disk read and write routines, a favorite of boot virus writers.

Interrupt numbers, and many computer components, are identified using hexadecimal notation. Hexadecimal notation uses the base 16 numbering system, where A equals 10 and F equals 15. Hexadecimal numbers are often followed by an h to indicate that the base 10 numbering system is not being used.
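As a worked example, 3Dh is 3 × 16 + 13 = 61 in decimal, and 21h is 2 × 16 + 1 = 33. These small helpers convert between the trailing-h style used in this chapter and ordinary integers; the function names are my own.

```python
def from_h_notation(text: str) -> int:
    """Convert assembler-style hex such as '21h' or '3Dh' to an integer."""
    text = text.strip()
    if len(text) < 2 or text[-1] not in "hH":
        raise ValueError("expected digits followed by h")
    return int(text[:-1], 16)

def to_h_notation(value: int) -> str:
    return f"{value:02X}h"

print(from_h_notation("21h"))   # 33  (2*16 + 1)
print(from_h_notation("3Dh"))   # 61  (3*16 + 13)
print(to_h_notation(19))        # 13h
```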

A program can use interrupt 21h or interrupt 13h to write to a disk, and thus to damage data. Interrupt 21h writes to files, and 13h writes to disk sectors. Virus writers can take their pick when they want to disable a PC. Many early antivirus programs were written to catch virus activity only at the DOS level, so viruses writing at the BIOS level with 13h had no trouble bypassing the protection. Each interrupt set has function (and sometimes subfunction) identifiers to indicate which particular action to perform. For example, interrupt 21h, function 3Dh opens a file, and interrupt 21h, function 41h deletes a file. Table 2-1 shows a small list of DOS interrupts that are of special interest to the virus writer.

Table 2-1. Common DOS interrupts used by viruses.

Interrupt, function	Purpose
Int 21h,31h	Terminate and stay resident
Int 21h,3Ch	Create a file
Int 21h,3Dh	Open a file
Int 21h,3Eh	Close a file
Int 21h,40h	Write to a file
Int 21h,41h	Delete a file
Int 21h,43h	Get/set file attributes
Int 21h,4Eh	Find a file (first match)
Int 21h,57h	Get/set file date and time
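A program selects an Int 21h service by loading the function number into the AH register before the call. A behavior monitor that intercepted Int 21h would therefore see requests as AX values; this hypothetical helper turns such a value into a readable label using the functions from Table 2-1 (descriptions paraphrased).

```python
# Table 2-1 as data; the function number travels in the AH register.
INT21_FUNCTIONS = {
    0x31: "Terminate and stay resident",
    0x3C: "Create a file",
    0x3D: "Open a file",
    0x3E: "Close a file",
    0x40: "Write to a file",
    0x41: "Delete a file",
    0x43: "Get/set file attributes",
    0x4E: "Find a file (first match)",
    0x57: "Get/set file date and time",
}

def describe_int21(ax: int) -> str:
    """Label an Int 21h request from the AX value a caller loaded (AH = function)."""
    ah, al = (ax >> 8) & 0xFF, ax & 0xFF
    name = INT21_FUNCTIONS.get(ah, "not listed in Table 2-1")
    return f"Int 21h, {ah:02X}h (AL={al:02X}h): {name}"

print(describe_int21(0x3D02))   # Int 21h, 3Dh (AL=02h): Open a file
print(describe_int21(0x4300))   # Int 21h, 43h (AL=00h): Get/set file attributes
```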

In order to write a DOS virus, a programmer must understand the relationships between the ROM BIOS, DOS, and the other interrupts so he can call the appropriate mechanisms in his code. A virus programmer has to have a better-than-average understanding of how computers operate under the hood, how files are really saved to disk, and how to make DOS work for his creation. The next section explains how DOS viruses are written and defines the different types.


