UNIX 101 | The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities

The UNIX family of operating systems has been around for a long time (in computing terms) and undergone many variations and changes. Ken Thompson developed the first incarnation of UNIX in 1969. His employer, Bell Labs, had just withdrawn from a joint venture to develop the Multiplexed Information and Computing Service (Multics) system: a large-scale, ambitious project to create a time-sharing system. The design turned out to be unwieldy and mired in complexity, however. Bell Labs worked on the project for four years but then withdrew, as it was still far from completion with no end in sight.

Ken Thompson then decided to write his own operating system, and he took a completely different approach. He focused on simplicity and pragmatic compromise, and he designed and implemented the system in an incremental fashion, one piece at a time. Over time, he would periodically implement a new tool or new subsystem and synthesize it into the existing code. Eventually, it shaped up to form a real operating system, and UNIX was born.

Note

The name UNIX is actually a play on the name Multics. There are a few funny explanations of the genesis of the name. One amusing quote is "UNIX is just one of whatever it was that Multics had lots of." There's the obligatory "UNIX is Multics without balls." There's also a commonly repeated anecdote that UNIX was originally spelled Unics, which stood for the slightly non sequitur Uniplexed Information and Computing Service. Comedy gold.

UNIX systems generally feature simple and straightforward interfaces between small, concise modules. As you'll see, the file abstraction is used heavily throughout the system to access just about everything. At the core of a UNIX system is the kernel, which manages system devices, performs process maintenance and scheduling, and shares system resources among multiple processes. The userland portion of a UNIX system is typically composed of hundreds of programs that work in concert to provide a robust user interface. UNIX programs are typically small and designed around simple, easily accessible text-based interfaces. This tool-oriented approach to system design is often referred to as the "UNIX design philosophy," which can be summed up as "Write simple tools that do only one thing and do that one thing well, and make them easily interoperable with other tools."

The following sections explain the basics of a typical UNIX system, and then you jump into the details of privilege management.

Users and Groups

Every user in a UNIX system has a unique numeric user ID (UID). UNIX configurations typically have a user account for each real-life person who uses the machine as well as several auxiliary UIDs that facilitate the system's supporting functionality. These UIDs are used by the kernel to decide what privileges a given user has on the system, and what resources they may access. UID 0 is reserved for the superuser, which is a special user who, in essence, has total control of the system. The superuser account is typically given the name "root."

UNIX also has the concept of groups, which are used for defining a set of related users that need to share resources such as files, devices, and programs. Groups are also identified with a unique numeric ID, known as a group ID (GID). GIDs assist the kernel in access control decisions, as you will see throughout this chapter. Each user can belong to multiple groups. One is the user's primary group, or login group, and the remaining groups are the user's supplemental groups, or secondary groups.

The users of a system are typically defined in the password file, /etc/passwd, which can be read by every local user on the system. There's usually also a corresponding shadow password file that can be read only by the superuser; it contains hashes of user passwords for authentication. Different UNIX implementations store this information in different files and directories, but there's a common programmatic interface to access it.

The password file is a line-based database file that records some basic details about each user on the system, delimited by the colon character. An entry in the password file has the following format:

bob:x:301:301:Bobward James Smithington:/home/bob:/bin/bash

The first field contains a username that identifies the user on the system. The next field traditionally contained a one-way hash of the user's password. However, on contemporary systems, this field usually just has a placeholder and the real password hash is stored in the shadow password database. The next two fields indicate the user's UID and primary GID, respectively. Supplemental groups for users are typically defined in the group file, /etc/group. The next field, known as the GECOS field, is a textual representation of the user's full name. It can also contain additional information about the user such as their address or phone number.

Note

GECOS actually stands for "General Electric Comprehensive Operating System," which was an old OS originally implemented by General Electric, and shortly renamed thereafter to GCOS. The GECOS field in the password file was added in early UNIX systems to contain ID information needed to use services exposed by GCOS systems. For a more detailed history of GECOS, consult the wikipedia entry at http://en.wikipedia.org/wiki/GECOS.

Each user also has a home directory defined in the password file (/home/bob in this case), which is usually a directory that's totally under the user's control. Finally, each user also has a default shell, which is the command-line interface program that runs when the user logs in.

Files and Directories

Files are an important part of any computer system, and UNIX-based ones are no exception. The kernel provides a simple interface for interacting with a file, which allows a program to read, write, and move around to different locations in the file. UNIX uses this file abstraction to represent other objects on the system as well, so the same interface can be used to access other system resources. For example, a pipe between programs, a device driver, and a network connection all can be accessed through the file-based interface exposed by the kernel.

On a UNIX system, files are organized into a unified hierarchical structure. At the top of the hierarchy is the root directory (named /). Files are uniquely identified by their name and location in the file system. A location, or pathname, is composed of a series of directory names separated by the slash (/) character. For example, if you have an internetd.c file stored in the str directory, and the str directory is a subdirectory of /home, the full pathname for the file is /home/str/internetd.c.

A typical UNIX system has a number of directories that are set up by default according to certain historical conventions. The exact directory structure can vary slightly from system to system, but most directory structures approximate the Filesystem Hierarchy Standard (available, along with bonus Enya lyrics, at www.pathname.com/). A standard UNIX system includes the following directories:

/etc This directory usually contains configuration files used by various subsystems. Among other things, the system password database is located in this directory. If it's not there, it's somewhere strange, such as /tcb.
/home Home directories for users on the system to store their personal files and applications are typically located here. Sometimes home directories are stored at a different location, such as /usr/home.
/bin This directory contains executables ("binaries," hence the directory name) that are part of the OS. They are usually the files needed to operate the system in single-user mode before mounting the /usr file system. The rest of the OS binaries are usually in /usr/bin.
/sbin This directory contains executables intended for use by superusers. Again, /sbin contains the core utilities useful for managing a system in single-user mode, and /usr/sbin contains the rest of the administrative programs.
/var This directory is used primarily to keep files that change as programs are running. Log files, data stores, and temporary files are often stored under this directory.

Although the visible hierarchy appears to users to be a single file system, it might in fact be composed of several file systems, which are grafted together through the use of mount points. Mount points are simply empty directories in the file system that a new file system can be attached to. For example, the /mnt/cdrom directory could be reserved for use when mounting a CD. If no CD is mounted, it's a normal directory. After the CD is mounted, you can access the file system on the CD through that directory. So you could view the test.txt file in the CD's root directory by accessing the /mnt/cdrom/test.txt file. Each file system that's mounted has a corresponding kernel driver responsible for managing file properties and data on the storage media, and providing access to files located on the file system. Typically, a file system module handles access to files on a partition of a physical disk, but plenty of virtual file systems also exist, which do things such as encapsulate network resources or RAM disks.

Every file on the system belongs to a single user and a single group; it has a numeric user ID (UID) indicating its owner and a numeric group ID (GID) indicating its owning group. Each file also has a simple set of permissions, a fixed-size bit mask that indicates which actions are permissible for various classes of users. File permissions are covered in "File Security" later in this chapter.

Processes

A program is an executable file residing on the file system. A process is an instance of a program running on a system. A process has its own virtual memory environment that is isolated from all other processes on the system. Most modern UNIX systems also provide mechanisms for multiple execution flows to share the same address space to support threaded programming models.

Each process on a UNIX system has a unique process ID (PID), and runs with the privileges of a particular user, known as its effective user. The privileges associated with that user determines which resources and files the process has access to. Usually, the effective user is simply the user that runs the application. In certain situations, however, processes can change who they're running as by switching to an effective user with different privileges, thus expanding or reducing their current access capabilities to system resources.

When the UNIX kernel checks to see whether a process has permission to perform a requested action, it usually does a simple test before examining the relevant user and group permissions: If the process is running as the superuser, the action is categorically allowed. This makes the superuser a special entity in UNIX; it's the one account that has unfettered access to the system. Several actions can be performed only by the superuser, such as mounting and unmounting disks or rebooting the system (although systems can be configured to allow normal users to perform these tasks as well).

In some situations, a normal user needs to perform actions that require special privileges. UNIX allows certain programs to be marked as set-user-id (setuid), which means they run with the privileges of the user who actually owns the program file, as opposed to running with the privileges of the user who starts the application. So, if a program is owned by root, and the permissions indicate that it's a setuid file, the program runs as the superuser regardless of who invokes it. There's a similar mechanism for groups called set-group-id (setgid), which allows a program to run as a member of a specific group.