Assembly Language

Chapter 12 - Malware Code Analysis
Honeypots for Windows
by Roger A. Grimes
Apress 2005

In early computer history, assembly language was one of the only ways to program a computer. You could also do it in binary (with ones and zeros) or by physically changing electronic jumper switches. Assembly language is a short step up from binary programming. It takes bits (the ones and zeros) and works on them a byte at a time (eight bits make up a byte in most of today’s computer hardware). Actually, in assembly language, you are often working with data and programs a bit or half a byte (called a nibble) at a time.
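To make the bit, nibble, and byte relationship concrete, here is a minimal sketch in Python (used only for illustration; the function names are made up for this example). It splits one byte into its two nibbles and shows its eight bits.

```python
def nibbles(byte_value):
    """Split one byte (0-255) into its high and low nibbles."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("a byte holds values 0 through 255")
    high = (byte_value >> 4) & 0x0F  # upper four bits
    low = byte_value & 0x0F          # lower four bits
    return high, low


def bits(byte_value):
    """Return the eight bits of a byte as a string of ones and zeros."""
    return format(byte_value, "08b")


if __name__ == "__main__":
    value = 0xA7
    print(bits(value))     # 10100111
    print(nibbles(value))  # (10, 7)
```

The shift and mask operations used here are the same kind of bit-level work an assembly language programmer does by hand.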

Learning assembly language is the hardest part of being a disassembler. If you don’t find the task of learning and using assembly language too daunting, it opens up an entire new world of understanding computers and malware. Knowing assembly language means you know what really can and cannot be done by a program or hacker.

Years ago, I had hackers tell me that they could write to write-protected (write-protect tab in the open position) floppy diskettes, set monitors on fire, and break hard drives using malicious code. At the time, write-protecting floppy diskettes when they weren’t being written to was a common antivirus recommendation. A virus that could defeat our antivirus advice would be big news. Because I was an assembly language programmer, I knew this to be a false claim, like the others. That’s because there is no assembly language instruction on the Intel PC CPU that allows data to be written to a write-protected floppy diskette. The assembly language instruction for writing to a floppy diskette is essentially “write data.” There is no “unprotect a write-protected floppy diskette and then write” instruction. Write-protecting a floppy diskette is a physical mechanism that a software instruction cannot override. Write protection is detected by the drive the diskette is in, and no writing will be allowed because of the physical condition.

It was only because I knew what the CPU on the motherboard was and wasn’t capable of, per its assembly language instruction set, that I could confidently deny the hackers’ claims. I fell back on my assembly language background to debunk myths of malware: living in modem memory, breaking hard drive read/write heads by banging them against the hard drive platter, super-exciting pixels to make monitors catch fire, and trojans able to spin the power-supply fan speeds into lethal killing machines.

Conversely, knowing assembly language means being able to do anything programmatically possible by your computer, and do it quickly. Programmers are creative creatures by nature, but assembly language programmers are more so. And because assembly language doesn’t need to go through layers of conversions to speak directly with the CPU, it is very fast.

Steve Gibson, of SpinRite fame (http://www.grc.com), still writes all his programs in assembly language. In an era when nearly every programmer writing applications (even OSs) uses a high-level language, Steve wants to write tight, fast code. And his applications show the fruits of his efforts. SpinRite can recover damaged hard drive data when everything seems hopeless. He wrote an entire assembler program in under 20KB of code. Even his freeware screensaver, ChromaZone, packs a lot of functionality and graphics into one executable. There are no setup routines or support files to install. Steve’s personal conviction of continuing to use assembly language in a Windows 32-bit world helped him gather a nice page of links to assembly language resources (http://www.grc.com/smgassembly.htm).

Programming Interfaces

A program can interact with several different software-to-machine-language interfaces to do its job. It can use assembly language, BIOS interrupts, the Windows Application Programming Interface (API), and third-party APIs, and a program can also write directly to the hardware. Figure 12-2 illustrates the programming interface choices.

Figure 12-2: Programming interface choices

Note 

Application Programming Interfaces (APIs) are the programming methods provided by OSs, languages, and programs as a way for external programs to interact with their internal routines.

The programming interface used depends on the computer platform (IBM-compatible, Macintosh, and so on), OS, programming language, desired functionality, and personal preference. All of these programming interfaces eventually break down their own language instructions into machine language to be executed in the CPU. A disassembler will reveal which APIs were used in the compiled program, leading to clues about the program’s behavior.

Using BIOS Interrupt Routines

All BIOS chips have interrupt functions (stored routines) that can be used to manipulate data. They work below the OS level, although high-level interfaces can call them to do the underlying task. The BIOS routines are often called software interrupts, because the CPU keeps doing what it is busy doing until a routine is invoked. Then the CPU stops what it is doing—gets interrupted—and runs the requested routine.

The BIOS chips for a particular computer platform (such as IBM-compatible, Macintosh, Amiga, and so on) are usually similar, even among different hardware vendors. For instance, when you hear the term IBM-compatible, a large part of the compatibility is determined by the BIOS interrupt routines present. If the BIOS routines were not standardized, then it would essentially mean that each programmer would need to specifically write distinctive code for different pieces of hardware. Imagine if programmers had to write different pieces of code for all the diverse types of hard drives, printers, and mice in existence, or try to predict every piece of hardware that could ever interact with their programs. It would be impossible, or we would have a lot less hardware to choose from.

BIOS interrupt routines can be called from within many programming languages, including assembly language. Each BIOS interrupt is identified with a hexadecimal number and most have functions to initiate specific actions. BIOS interrupts range from 00h to FFh, with each interrupt handling a different task. For instance, BIOS interrupt 13h calls routines that interact with the disk system—reading, writing, and so on. Depending on the function called along with it, interrupt 13h can reset a disk (function 0h), read data (function 02h), or write to the disk (function 03h). Other BIOS interrupts handle everything from screen output (Int 10h) to capturing keyboard input (Int 16h). You can find a list of BIOS interrupts at http://www.delorie.com/djgpp/doc/rbinter/ix.
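When reading a disassembly, it helps to annotate each INT instruction with what it does. The following is a minimal sketch of such a lookup, using only the interrupt and function numbers mentioned above. The table and function names are hypothetical, and a real reference list is far larger.

```python
# Interrupt and function numbers taken from the paragraph above.
BIOS_INTERRUPTS = {
    0x10: "screen output",
    0x13: "disk system",
    0x16: "keyboard input",
}

INT13_FUNCTIONS = {
    0x00: "reset disk",
    0x02: "read data",
    0x03: "write to disk",
}


def describe_interrupt(int_no, function_no=None):
    """Return a human-readable note for an INT instruction.

    int_no is the interrupt number; function_no is the function
    selector, when the analyst knows it.
    """
    if not 0x00 <= int_no <= 0xFF:
        raise ValueError("BIOS interrupts range from 00h to FFh")
    name = BIOS_INTERRUPTS.get(int_no, "not in this small table")
    note = f"Int {int_no:02X}h: {name}"
    if int_no == 0x13 and function_no is not None:
        action = INT13_FUNCTIONS.get(function_no, "function not in this small table")
        note += f", function {function_no:02X}h: {action}"
    return note
```

For example, describe_interrupt(0x13, 0x03) produces a note saying the code is asking the disk system to write, which is exactly the kind of clue an analyst looks for.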

Using the Windows API

Most OSs also have their own API that developers can use to write OS-specific applications. The Windows API allows the programmer to be further isolated from the specifics of hardware variation. For example, a programmer can call Microsoft’s “print-to” feature, and as long as the printer is defined in Windows, the programmer’s application can print to it. The programmer’s application doesn’t need to know what type of printer it is (laser printer or inkjet, or HP versus Canon) or the types of fonts supported. Windows handles all of that housekeeping, so programmers can concentrate on their application’s specific features. Other housekeeping tasks include handling error conditions, storing and retrieving files, displaying dialog boxes, capturing mouse or keyboard input, and so on.

Windows implements its runtime APIs in a series of dynamic link library (DLL) files, with tons of routines stored inside them. Different DLL files contain different APIs. These DLL files are mostly predefined C/C++ program routines.

Most of Windows, as the end user knows it, is really just many different applications that use various Windows APIs. To a programmer, Windows is a collection of APIs waiting to be called. For example, most users interact with the Application log file using the Event Viewer program, but developers can write directly to the Application log using Windows-supplied APIs. Programmers call the RegisterEventSource function to obtain a handle to the log, and then call ReportEvent to write an entry.

The Windows API (also called the Win32 API) first appeared in Windows NT and reached home users with Windows 9x (a subset, Win32s, was available earlier as an add-on for Windows 3.1x). The main DLL files’ functionality changes with every Windows version, and sometimes with service packs and hot fixes. Windows XP and later versions have more than 1,000 different routines that can be called and used by any Windows program or process. The three main Win32 API files are as follows:

  • Kernel32.dll contains file operations, memory management, and many other routines sought after by hackers.

  • User32.dll handles user interface, menus, timers, and so on.

  • Gdi32.dll is involved in graphical displays.

There are many more API DLLs than just these three, but Windows’s core functionality is represented by these files. For that reason, you’ll often see viruses or worms interacting with these three DLLs, particularly Kernel32.dll.

To see the larger list of Windows API files available, search for DLL files in the System32 directory. Other API libraries are stored in system files with non-DLL extensions, such as OCX (ActiveX control or COM object), DRV, and CPL files.

Microsoft also has Microsoft Foundation Classes (MFC), which are C++ API libraries for coders to use. These MFC files are located in the System32 directory and begin with MFC. Search using MFC*.* to see the related files. MFC files can be used and referenced when programming. Coders not using the MFC files consider the core Windows API files the “raw API.” Everything an MFC API is able to do can also be done with the core Windows API files, albeit the MFC files do things more elegantly at times. Many programmers don’t want to reinvent the wheel, so they use the MFC files as their core components.

Although not every file found with these names and extensions is a native Windows API (many DLL files are added by third-party programs), many are. Knowing what the different DLL API files represent can be helpful for code analysis. For instance, if you find malware using Wsock.dll, Winsock.dll, or Wsock32.dll (which are API interfaces to network connections), you can strongly suspect that the rogue code is communicating with the network. You’ll need to do more investigating to find out why the code is connecting to the network, but at least you have a start.
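This kind of first-pass triage is easy to automate. The sketch below maps DLL names found in a disassembly to rough behavior hints, using only the DLLs and descriptions given in this section. The names DLL_HINTS, behavior_hints, and talks_to_network are made up for this example, and a match is only a starting point for investigation, not a verdict.

```python
DLL_HINTS = {
    "kernel32.dll": "file operations and memory management",
    "user32.dll": "user interface, menus, timers",
    "gdi32.dll": "graphical display",
    "wsock.dll": "network connections",
    "winsock.dll": "network connections",
    "wsock32.dll": "network connections",
}


def behavior_hints(imported_dlls):
    """Map DLL names seen in a disassembly to rough behavior hints."""
    hints = {}
    for name in imported_dlls:
        key = name.strip().lower()  # Windows file names are not case-sensitive
        if key in DLL_HINTS:
            hints[key] = DLL_HINTS[key]
    return hints


def talks_to_network(imported_dlls):
    """True if any listed DLL is one of the network-related ones above."""
    hints = behavior_hints(imported_dlls)
    return any("network" in hint for hint in hints.values())
```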

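When a program wants to use a Windows API, it calls (declares) the DLL file that has the routine it needs, and then uses the routine along with any parameters it needs to pass. For example, the following lines mimic something a malware writer might code in order to write to a file. WriteFile takes a file handle rather than a file name, so the file is first opened with CreateFile:

 hFile = CreateFile("malware.exe", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
 WriteFile(hFile, pBuffer, lBytes, &lBytesWritten, NULL);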

Here’s another longer Declare example with the full syntax that might appear in a program using a message dialog box:

 Declare Auto Function MBox Lib "user32.dll" Alias "MessageBox" (ByVal hWnd As Integer, ByVal txt As String, ByVal caption As String, ByVal Typ As Integer) As Integer

Windows API routines can be pulled into any program and become a permanent part of the program (called static linking), or be externally called when needed (called dynamic linking). Dynamic linking makes the resulting code smaller, but will cause errors if the expected DLL files aren’t found (as when you run a new program and it generates a “DLL file not found” error). When DLLs are dynamically linked, the API is called at runtime from the appropriate DLL instead of being compiled into the program. DLLs have a standard entry point called DllMain, which is invoked when processes and threads attach to and detach from the DLL.
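The runtime nature of dynamic linking can be sketched with Python’s standard ctypes module, which asks the OS loader for a library by name while the program is running. This is only an illustration of the failure mode described above. The helper name try_load and the missing library name are made up for this example.

```python
import ctypes


def try_load(library_name):
    """Try to load a shared library at runtime.

    Returns the loaded library object, or None when the loader cannot
    find or open the file (the "DLL file not found" case).
    """
    try:
        return ctypes.CDLL(library_name)
    except OSError:
        return None


if __name__ == "__main__":
    # "no_such_library_12345.dll" is a made-up name chosen to be missing.
    for name in ("kernel32.dll", "no_such_library_12345.dll"):
        status = "loaded" if try_load(name) is not None else "not found"
        print(name, "->", status)
```

A statically linked routine cannot fail this way, because its code is already inside the executable. That is the trade-off for the larger file size.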

The following are some useful resources for learning how to use the Windows APIs:

  • If you can afford it ($699 to $2,799), a Microsoft Software Development Kit (SDK) contains Microsoft’s official documentation on the Win32 APIs. The SDK is a part of the Microsoft Developer Network (MSDN) quarterly subscription (http://msdn.microsoft.com/subscriptions). It also comes with many of Microsoft’s development languages, like Visual C++.

  • The Win32 API FAQ (http://www.iseran.com/Win32/FAQ/faq.htm) answers the most common questions about Windows APIs.

  • Developer.com Windows API Tutorial (http://www.developer.com/net/vb/article.php/1539721) is a short but instructive article on Windows API programming.

  • AllAPI (http://www.mentalis.org) lists nearly every Windows API, its syntax, and its use. Its tutorial page (http://www.mentalis.org/vbtutor/tutmain.shtml) has several excellent Windows API tutorials.

Using Third-Party APIs

Most programming languages also come with their own APIs that can be joined into an application or referenced for runtime execution. Using APIs shortens program development time and standardizes the look and feel of a program. Microsoft has Windows APIs available in its own programming languages. For example, Microsoft’s Visual C++ uses the Kernel32.lib library file to host dozens of APIs. When programming in Visual C++, you can call the various API routines from within the Kernel32.lib file. Other available libraries include Crypt32.lib (cryptographic functions), Mapi32.lib (messaging), Wsock32.lib (network connectivity), and dozens more.

For example, a virus written in C++ may contain the following statement:

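 #include <fstream>

 int main()
 {
     // Open (or create) the file for output; the backslash must be escaped.
     std::fstream file_op("c:\\malware.exe", std::ios::out);
     file_op << "[maliciouscodehere]";  // placeholder for the payload
     file_op.close();
     return 0;
 }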

C++’s fstream class can read and write files. In this example, the malware writer is writing a file to disk. The instructions in higher-level languages, like C++, are compiled into machine-language instructions when the executable is created. The high-level language instruction can replace the instructions of the lower Windows API or call one of the lower-language instructions. For instance, in the Windows API, the WriteFile API function (which instructs the Windows OS to write the file to disk, including how much data to write and from what memory location) can be used instead of C++’s own fstream class. C++ can call the Windows API function and use it instead of its own file-writing routines using syntax similar to this:

 WriteFile(hFile, pBuffer, lBytes, &lBytesWritten, NULL);

Here, hFile is a file handle previously returned by the CreateFile function. Including the windows.h header file declares WriteFile to the compiler, and the linker resolves the call against Kernel32.dll.

Why would a programmer use one function over the other? The choice depends on intent and flexibility. The higher-level language file-writing routine is usually sufficient to accomplish all the tasks that are needed, but not always. Occasionally, the programmer wants to do something the high-level language doesn’t support. That’s when using a lower-level API or assembly language comes in handy.

Using Assembly Language

Unlike with higher-level APIs, nothing can be taken for granted when you use assembly language. Every variable must be fed to the function. When writing a file, the file must be located by name and opened (which yields a number called the file handle), the location in the file where the data is to be written must be explicitly directed, the number of bytes to be written must be determined, where in memory the data bytes are must be communicated, and drive status must be queried, all before a single byte of data is written. After the file is opened for writing, an assembly language program must even pass the file’s handle correctly before the file can be closed.

Every group of related tasks must be checked for success or failure. If an error happens and the programmer did not write a subroutine to check for it and handle the resulting output, the program will crash. Malware writers often don’t bother with error handling in their code, so their creations are usually not very reliable across a wide range of computer platforms.
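The sequence of steps, and the need to check each one, can be sketched at a slightly higher level using Python’s low-level os functions, which expose the same open, seek, write, and close stages. This is an illustration under stated assumptions, not how any particular program does it. The function name write_bytes and its -1 error convention are made up for this example.

```python
import os


def write_bytes(path, data, offset=0):
    """Write data at a byte offset, checking each step for failure.

    Returns the number of bytes written, or -1 if any step fails.
    """
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_BINARY", 0)
    try:
        # Step 1: open the file by name; the OS hands back a handle.
        handle = os.open(path, flags, 0o600)
    except OSError:
        return -1
    try:
        # Step 2: position the write location inside the file.
        os.lseek(handle, offset, os.SEEK_SET)
        # Step 3: pass the handle and the data bytes to be written.
        written = os.write(handle, data)
        # Step 4: confirm every byte was actually written.
        if written != len(data):
            return -1
        return written
    except OSError:
        return -1
    finally:
        # Step 5: the handle, not the name, is what closes the file.
        os.close(handle)
```

Code that skips these checks, as much malware does, simply crashes or silently fails when the file cannot be opened or the disk is full.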

Assembly language can accomplish anything the computer is capable of with great versatility, but this double-edged sword also requires detailed and accurate programming instructions.

A very simple assembly language file-writing routine, without using any error checking, might look something like this:

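 MOV BX, [filehandle]     ; BX = handle returned when the file was opened
 MOV CX, [numberofbytes]  ; CX = number of bytes to write
 MOV DX, OFFSET buffer    ; DS:DX = address of the data to write
 MOV AH, 40h              ; DOS function 40h: write to file or device
 INT 21h                  ; call the DOS services interrupt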

As with most compiled programs, the source code is written in a text editor program, and then converted (assembled and linked) to its final executable form in one or more steps.

From an ease-of-use standpoint, only writing directly to the hardware can be more challenging than using assembly language.

Writing Directly to Hardware

Most legitimate programmers avoid writing directly to a PC’s hardware. Some programs might need to do this, especially when Windows doesn’t directly support the hardware, but it is highly discouraged. Some OSs, such as the Windows NT family, attempt to prevent programs from directly accessing the hardware.

Malware, on the other hand, likes to write at layers under the OS, so the OS doesn’t get in the way of its maliciousness. If malware can write directly to the disk, it can bypass Windows protection mechanisms and antivirus tools, and write to the slack area between files on the disk. Fortunately, malware writing directly to hardware devices is rare, since there is already so much other damage a hacker can cause writing code using one of the other methods.

SLACK SPACE

The slack area between files normally refers to the leftover space unwritten to by normal file systems. Every file system (FAT32, NTFS, and so on) has a cluster size (such as 4KB, 8KB, or 16KB) that is the minimum size disk storage unit to which any file can be written. All files are stored as one or more clusters. Most files don’t take up all the available space in their final cluster, creating slack space.

Some hackers and malware programs store malicious code in the slack space, which is usually unavailable to any tools that run on the OS. Programs written in assembly language can write to the slack space.
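The amount of slack space follows directly from the file size and the cluster size. Here is a minimal sketch of the arithmetic. The function names are made up for this example, and the 4KB default is just one of the cluster sizes mentioned above.

```python
def slack_bytes(file_size, cluster_size=4096):
    """Bytes left unused in a file's final cluster."""
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file size must be >= 0 and cluster size > 0")
    remainder = file_size % cluster_size
    return 0 if remainder == 0 else cluster_size - remainder


def clusters_used(file_size, cluster_size=4096):
    """Whole clusters needed to store the file (rounded up)."""
    return -(-file_size // cluster_size)


if __name__ == "__main__":
    # A 10,000-byte file on 4KB clusters occupies 3 clusters (12,288 bytes),
    # leaving 2,288 bytes of slack in the last one.
    print(clusters_used(10000), slack_bytes(10000))  # 3 2288
```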


API Enforcement

Not all OSs and platforms allow all APIs to be used. In the Windows 9x family and earlier versions, a program could use BIOS, DOS, Windows, third-party, hardware, or assembly language interfaces. Starting with the Windows NT family, Microsoft specifically tried to prevent regular programs from writing directly to hardware (using the BIOS interrupt routines), with limited success. Programs executing in the user’s security context (called user mode programs) cannot directly access hardware. They can do so only indirectly by using Windows APIs, which reside in the user mode part of the OS.

In the Windows NT family of OSs, calls to the Win32 and most other Windows-accessible APIs end up calling NTDLL.DLL, known as the Windows Native API. The NTDLL.DLL file interfaces with the Windows kernel and the hardware. For nearly every Win32 API, there is an NTDLL.DLL counterpart. Because the NTDLL is largely undocumented, many programmers think it contains optimized routines designed to give Microsoft an unfair advantage over outside programmers. An analysis done by Mark Russinovich (http://www.sysinternals.com/ntw2k/info/ntdll.shtml) revealed that it does contain some unique features, but nothing earth-shattering. Still, because NTDLL sits between other APIs and the Windows kernel, it has elevated access.

Windows APIs may not allow a program to do everything it wants to do. However, a kernel mode program resides in the Windows Executive layer, and it has access to the hardware and OS system files (see http://www.microsoft.com/technet/archive/ntwrkstn/evaluate/featfunc/kernelwp.mspx). A malicious kernel mode program can bypass Windows protection mechanisms and more easily escape detection. Malware writers have discovered various ways to bypass Windows security mechanisms and gain kernel mode access, although doing so took them many years after the release of Windows NT.

Although a kernel mode program can use virtually any API, including the normal Windows API, a malicious program often wants to accomplish tasks the Windows API isn’t capable of performing. For this reason, many malware programs are written using assembly language. And regardless of which API layer a program uses, the resulting compiled executable can always be analyzed down at the assembly level.

Assembly Language Instructions on Computer Platforms

Assembly language works by using the machine-language commands available with each particular processor. Different processors have different processor instruction sets, features, data storage areas, and so on.

An assembly language programmer must learn which instructions are available on the processors on which they will be programming. Good assembly language programmers, and good disassemblers, must also be able to create, read, and understand what the assembly language instructions are doing when they view the data in memory. This isn’t always easy. The following are two useful resources for learning assembly language:

  • Webster’s (http://webster.cs.ucr.edu) Art of Assembly Language. This free online book (http://webster.cs.ucr.edu/AoA/index.html) is often recommended as a tutorial for first-time assembly language programmers. There are specific sections on DOS, Windows, and Linux programming.

  • Another short assembly language tutorial can be found at http://www.xs4all.nl/~smit/asm01001.htm.

Registers

Most CPUs have a series of very small memory areas for data storage and program execution called registers. All program execution and data is run through the CPU registers. What is in the registers is constantly being swapped out with programs and data stored in various random-access memory areas and other storage devices (hard drive, USB key, and so on). There are even registers to keep track of the registers and what data and programs are in the registers. No matter which computer platform you are working on, you must learn which registers are available to manipulate to write an assembly program. There is nothing that happens on a PC that doesn’t run through the registers.

The Intel 80x86 CPU family shares a common set of registers that have expanded as the processors gain speed and functionality. The 32-bit family of Intel processors has eight general-purpose registers called EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP. Sixteen-bit programs can use a subset of those same registers named AX, BX, CX, DX, SI, DI, BP, and SP. Eight-bit operations can further break down the first four of those registers into AH (the high eight bits of AX) and AL (the low eight bits of AX), BH and BL, CH and CL, and DH and DL. In reality, these are the same registers; it just depends on how many bits the program executing can use (or decides to use). The EAX register contains the AX register, the AX register contains AH and AL, and so on for the other named registers.
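The way one register nests inside another can be sketched with simple masks and shifts. In this illustration, AX is the low 16 bits of EAX, and AH and AL are the high and low 8 bits of AX. The function names are made up for this example.

```python
def sub_registers(eax):
    """Return the AX, AH, and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF      # keep only 32 bits
    ax = eax & 0xFFFF      # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF  # high 8 bits of AX
    al = ax & 0xFF         # low 8 bits of AX
    return {"EAX": eax, "AX": ax, "AH": ah, "AL": al}


def set_al(eax, value):
    """Write an 8-bit value into AL, leaving the rest of EAX unchanged."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


if __name__ == "__main__":
    views = sub_registers(0x12345678)
    print({name: hex(v) for name, v in views.items()})
    # {'EAX': '0x12345678', 'AX': '0x5678', 'AH': '0x56', 'AL': '0x78'}
    print(hex(set_al(0x12345678, 0xFF)))  # 0x123456ff
```

Because they are the same storage, changing AL changes AX and EAX as well, which is something to watch for when tracing values through a disassembly.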

You can see an example of the 16-bit registers available in Windows by using the Debug.exe program. Open a command shell (Cmd.exe), type Debug, and then press Enter. At the dash prompt, type r and press Enter. The r is the register command, and it will display the values of the 16-bit registers, as shown in Figure 12-3. Figure 12-3 also shows some of the machine-language commands being executed in memory (using the u command). Type the q command and press Enter to exit Debug.

Figure 12-3: Using the Debug register command

Caution 

If you are unfamiliar with Debug.exe and its commands, do not type in any commands beyond what is instructed in this text. Doing so can cause system instability and data loss if you are not careful.

Table 12-1 describes the registers shown in Figure 12-3.

Table 12-1: 8086 Register Types and Common Functions

Register  Name                  Common Functions
--------  --------------------  --------------------------------------------------

General-Purpose Registers
AX        Accumulator Register  General purpose; mostly used for calculations and for input/output
BX        Base Register         Index register
CX        Count Register        Used for counting loop passes
DX        Data Register         Used for multiplying and dividing

Segment Registers
CS        Code Segment          Points to the active code segment
DS        Data Segment          Points to the active data segment
SS        Stack Segment         Points to the active stack segment
ES        Extra Segment         Points to the active extra segment

Pointer Registers
IP        Instruction Pointer   Points to memory offset of the next instruction to be executed
SP        Stack Pointer         Memory offset to where the stack is located
BP        Base Pointer          Used to pass data to and from the stack

Index Registers
SI        Source Index          Used by string operations as the source
DI        Destination Index     Used by string operations as the destination

Machine instructions are constantly moving data and memory information in and out of the registers. The IP (Instruction Pointer) register is a particularly interesting register for malicious hackers. A buffer overflow writes more data than a buffer can hold, which often crashes the program and can spill rogue code into adjacent memory. If the overflow can overwrite a saved return address so that the IP register is loaded with the memory location of the rogue code, the rogue code will be executed next.

Machine/Assembly Language Instructions

The directives manipulating the registers are in machine language, which assembly language most closely resembles. Every CPU has a core set of machine instructions that it supports. Every assembler also has a core set of instructions that approximate and map to those of the processor. The 80x86 CPU family has more than 100 machine instructions (some resources say over 1,000, but they are defining them at a more granular level), although most programmers use fewer than 50. Table 12-2 shows some common machine instructions.

Table 12-2: Common 80x86 Instructions

Instruction  Description
-----------  ------------------------------------------------------------
MOV          Copies data from one register or memory location to another
ADD          Adds two registers or values together
SUB          Subtracts one register or value from another
POP          Removes a piece of data from the top of the stack and puts it in a register
PUSH         Stores a piece of data from a register on the top of the stack
JMP          Jumps code execution to another instruction

Most programs use a few dozen different instructions. (The MOV instruction probably accounts for a quarter to a half of all instructions in an assembly language program.)

Every application you can think of, no matter how simple or sophisticated, works by using machine instructions to move data into and out of register locations. Every Windows dialog box, every prompt, every sound, and every database query can happen only because of hundreds to hundreds of thousands of instructions running every second in the background.

The stack is a temporary memory location for storing data and program instructions. Machine language instructions are constantly popping (getting) and pushing (putting) information from and to the stack. Stacks are often described as a stack of electronic plates, in a last-in/first-out pathway. The last piece of information pushed to the stack is the first piece of information popped off the stack and back into a register. Stack-based buffer overflows will try to overwrite the stack pointer (and have it point to the rogue code in memory) or overwrite the stack so the original legitimate stack pointer now points to malicious code. If a buffer overflow can reliably predict where in memory the buffer overflow will place data, it means the exploit can be crafted to take over complete control of the computer rather than just performing a temporary DoS attack.
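To see how a handful of instructions, a few registers, and a stack work together, consider the toy interpreter below. It is only a teaching sketch. It models four 16-bit registers and the six instructions from Table 12-2, uses a list index in place of the real IP register, and ignores flags, memory, and segments. The tuple-based program format is an assumption made for this example.

```python
def run(program, max_steps=1000):
    """Interpret a tiny subset of 80x86-style instructions.

    program is a list of tuples such as ("MOV", "AX", 5). Operands are
    register names or integer literals. JMP targets are list indexes.
    """
    regs = {"AX": 0, "BX": 0, "CX": 0, "DX": 0}
    stack = []
    ip = 0  # index of the next instruction, standing in for the IP register

    def value(operand):
        return regs[operand] if isinstance(operand, str) else operand

    steps = 0
    while ip < len(program) and steps < max_steps:
        op, *args = program[ip]
        ip += 1
        steps += 1
        if op == "MOV":
            regs[args[0]] = value(args[1]) & 0xFFFF
        elif op == "ADD":
            regs[args[0]] = (regs[args[0]] + value(args[1])) & 0xFFFF
        elif op == "SUB":
            regs[args[0]] = (regs[args[0]] - value(args[1])) & 0xFFFF
        elif op == "PUSH":
            stack.append(value(args[0]) & 0xFFFF)
        elif op == "POP":
            regs[args[0]] = stack.pop()
        elif op == "JMP":
            ip = args[0]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return regs, stack


if __name__ == "__main__":
    program = [
        ("MOV", "AX", 1),
        ("MOV", "BX", 2),
        ("PUSH", "AX"),      # stack: [1]
        ("PUSH", "BX"),      # stack: [1, 2]
        ("POP", "AX"),       # last in, first out: AX = 2
        ("POP", "BX"),       # BX = 1
        ("ADD", "AX", 10),   # AX = 12
        ("JMP", 9),          # jump past the end, skipping the SUB
        ("SUB", "AX", 100),  # never executed
    ]
    print(run(program))
```

Pushing AX and then BX, and popping them back in the same order, swaps the two values. This is the last-in/first-out behavior described above. The JMP shows why control of the next-instruction pointer is so valuable to an attacker: whatever it points to is what runs next.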

Portable Executables

Windows 32-bit executables are also known as Portable Executables, or PE files. PE files have a somewhat predictable structure that a disassembler should know. Besides the header and setup information, each PE file contains one or more segments, as listed in Table 12-3.

Table 12-3: PE File Segments

Segment           Description
----------------  ------------------------------------------------------------
.code (or .text)  Executable program code, including the program entry point (in assembly source, any dynamically linked APIs and imported code are declared using the EXTERN directive)
.data             Program data that should be initialized, like local variables, when the application starts (this is usually not data viewed by the end user, but rather data used by the executable itself)
.udata (or .bss)  Uninitialized data
.rdata            Read-only data that cannot normally be modified
.rsrc             Resources
.edata            Export information: code the file makes available to other programs
.idata            Import information: code the file uses from other files
.reloc            Relocation information, used when the PE file cannot be loaded at its preferred base memory address (either because of memory limitations or some other issue)

When reviewing code or disassembling programs, you will often see these code segment references. Excellent tutorials on PE files and their structure can be found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndebug/html/msdn_peeringpe.asp and http://www.deinmeister.de/w32asm5e.htm.
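The predictable structure is what makes PE files easy to inspect with a few lines of code. The sketch below lists the segment names in a PE image by following the documented layout: the MZ signature, the PE header offset stored at 3Ch, the 20-byte COFF header, and then one 40-byte table entry per segment. The helper fake_pe is a hypothetical stand-in that builds a stripped-down, non-runnable image for demonstration. With a real file, you would pass the bytes read from disk to list_sections instead.

```python
import struct


def list_sections(image):
    """Return the segment names found in the bytes of a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset 3Ch of the DOS header holds the file offset of the PE header.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # COFF header (20 bytes): Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader,
    # Characteristics.
    coff = pe_offset + 4
    _, count, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", image, coff)
    table = coff + 20 + opt_size  # the table follows the optional header
    names = []
    for i in range(count):
        start = table + i * 40    # each table entry is 40 bytes long
        raw = image[start:start + 8]  # the name is the first 8 bytes
        names.append(raw.rstrip(b"\0").decode("ascii", "replace"))
    return names


def fake_pe(section_names):
    """Build a minimal PE-like image (headers only) for demonstration."""
    dos = bytearray(64)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # PE header starts at byte 64
    coff = struct.pack("<HHIIIHH", 0x14C, len(section_names), 0, 0, 0, 0, 0)
    table = b"".join(n.encode("ascii").ljust(40, b"\0") for n in section_names)
    return bytes(dos) + b"PE\0\0" + coff + table


if __name__ == "__main__":
    print(list_sections(fake_pe([".text", ".data", ".rsrc"])))
    # ['.text', '.data', '.rsrc']
```

An unexpected or oddly named segment in this list is often an early sign that an executable has been packed or tampered with, and is worth a closer look in the disassembler.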



Honeypots for Windows
Honeypots for Windows (Books for Professionals by Professionals)
ISBN: 1590593359
EAN: 2147483647
Year: 2006
Pages: 119
