Tenet 1: Small is beautiful | Linux and the Unix Philosophy

2.1 Tenet 1: Small is beautiful

If you're going to write a program, start small and keep it small. Whether you're crafting a simple filter tool, a graphics package, or a gargantuan database, work to reduce it to the tiniest piece of software practicable. Resist the temptation to turn it into a monolith. Strive for simplicity.

Traditional programmers often harbor a secret desire to write the Great American Program. When they embark on a development project, it seems as though they want to spend weeks, months, or even years trying to solve all of the world's problems with one program. Not only is this costly from a business standpoint, it ignores reality. In the real world, few problems exist that cannot be surmounted using small solutions. We choose to implement such massive solutions because we don't fully understand the problem.

The science fiction writer Theodore Sturgeon once wrote, "90 percent of science fiction is crud. But then 90 percent of everything is crud." The same applies to most traditional software. A large portion of the code in any program is devoted to something other than actually performing its stated task.

Skeptical? Let's look at an example. Suppose you wanted to write a program to copy file A to file B. These are some steps that a typical file copy program might perform:

1. Query the user for the name of the source file.
2. Check whether the source file exists.
3. If the source file doesn't exist, notify the user.
4. Query the user for the name of the destination file.
5. Check whether the destination file exists.
6. If the destination file exists, ask the user if he wants to replace it.
7. Open the source file.
8. Inform the user if the source file is empty. If necessary, exit.
9. Open the destination file.
10. Copy the data from the source file to the destination file.
11. Close the source file.
12. Close the destination file.

Note that only step 10 actually copies the file. The other steps perform functions that, although necessary, have little to do with copying the file. Under closer scrutiny, you'll find that those other steps apply equally well to many tasks besides file copying. They happen to be used here, but they're not really part of the task.

A good Unix program should provide capabilities similar to step 10 and little else. Carrying this notion further, a program strictly following the Unix philosophy would expect to have been given valid source and destination file names at invocation. It would be solely responsible for copying the data. Obviously, if all the program had to do were copy the data, it would be a very small program indeed.
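As a rough sketch of how small such a program can be, consider a shell function that does nothing but move the bytes. The name copy_data is hypothetical, chosen only for illustration, and the function assumes it has been handed two valid file names, exactly as described above.

```sh
# copy_data SOURCE DEST
# Hypothetical helper: copies the data and does nothing else.
# It assumes the caller has already supplied valid file names.
copy_data() {
    cat < "$1" > "$2"
}
```

Invoked as `copy_data A B`, it copies the contents of A to B. It asks no questions and performs no checks of its own; if the shell cannot open either file, the function simply returns a nonzero status.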

This still leaves us with the question of where the valid source and destination file names come from. The answer is simple: from other small programs. These other programs perform the functions of obtaining a file name, checking whether the file exists, and determining whether it contains more than zero bytes of data.
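One way to picture this division of labor is to chain the standard test and cp utilities so that the copy runs only when every check before it succeeds. The function name checked_copy is hypothetical, and refusing to overwrite an existing destination is an assumption made for this sketch; it stands in for the "ask the user" step in the list above.

```sh
# checked_copy SOURCE DEST
# Hypothetical wrapper: each check is a separate small program (test),
# and cp runs only if all of the checks succeed.
checked_copy() {
    test -f "$1" &&      # source exists and is a regular file
    test -s "$1" &&      # source contains more than zero bytes
    test ! -e "$2" &&    # destination does not exist yet (assumed policy)
    cp -- "$1" "$2"      # the only step that actually copies
}
```

The wrapper contains no copying logic and the copy command contains no checking logic. Either piece can be reused in other combinations without change.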

"Now wait a minute," you may be thinking. "Are we saying that Unix contains programs that only check whether a file exists?" In a word, yes. The standard Unix distribution comes with hundreds of small commands and utility programs that by themselves do little. Some, such as the test command, perform apparently mundane functions like determining whether a file is readable. If that doesn't sound very important, realize that the test command is one of the most heavily used Unix commands.^[1]

By themselves, small programs don't do very much. They often perform one or two functions and little else. Combine them, however, and you begin to experience real power. The whole becomes greater than the sum of the parts. Large, complex tasks can be handled with ease. You can write new applications by simply entering them on the command line.
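To make this concrete, here is a sketch of a word-frequency counter built entirely from standard utilities joined by pipes. The function name top_words is hypothetical, and the pipeline is only one of many ways to combine these tools.

```sh
# top_words N
# Hypothetical helper: reads text on standard input and prints the
# N most frequent words, each preceded by its count.
top_words() {
    tr -cs 'A-Za-z' '\n' |   # put each word on its own line
    tr 'A-Z' 'a-z' |         # fold everything to lowercase
    sort |                   # bring identical words together
    uniq -c |                # count each run of identical words
    sort -rn |               # order by count, largest first
    head -n "$1"             # keep only the top N lines
}
```

A call such as `top_words 10 < chapter.txt` (the file name is a placeholder) lists the ten most common words in the file. None of the six programs knows anything about word frequencies; the application exists only in the way they are combined.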

^[1]Some Unix/Linux shells (command interpreters), such as bash, have made the test command part of the shell itself, which eliminates the need to start a new process to run it and so reduces overhead. The downside is that if you keep adding commands to the shell, it eventually grows to the point where nonshell commands become costly to execute because of the way Unix/Linux creates new processes. It may be better to rely on the fact that frequently used commands are typically already sitting in the kernel's buffer cache, so they need not be read from the disk, which would be prohibitively slow.