The Sin Explained

Command injection problems occur when untrusted data is placed into data that is passed to some sort of compiler or interpreter, where the data might, if it's formatted in a particular way, be treated as something other than data.

The canonical example for this problem has always been API calls that directly call the system command interpreter without any validation. For example, the old IRIX login screen (mentioned previously) was doing something along the lines of:

 char buf[1024];
 snprintf(buf, sizeof(buf), "lpr -P %s", user_input); /* size is the second argument */
 system(buf);

In this case, the user was unprivileged, since it could be absolutely anyone wandering by a workstation. Yet, simply by typing the text FRED; xterm &, the user could make a terminal pop up. The ; would end the original command in the system shell, and the xterm command would create a whole new terminal window ready for commands, with the & telling the system to run the process without blocking the current process. (In the Windows shell, the ampersand metacharacter acts the same as a semicolon on a UNIX box.) And, since the login process had administrative privileges, the terminal it created would also have administrative privileges!
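To see why the shell treats that input as two commands, the following Python sketch tokenizes the resulting string roughly the way a POSIX shell would. It relies on the standard shlex module as an approximation of shell parsing, not on a real shell, and build_command and shell_tokens are hypothetical helpers that mimic the C code above.

```python
import shlex

def build_command(user_input):
    # Mimics the vulnerable C code: untrusted text is pasted into a shell command.
    return "lpr -P %s" % user_input

def shell_tokens(command):
    # Approximate shell tokenization. punctuation_chars=True makes shlex
    # return metacharacters such as ; and & as separate tokens.
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)

tokens = shell_tokens(build_command("FRED; xterm &"))
print(tokens)  # ['lpr', '-P', 'FRED', ';', 'xterm', '&']
```

The ; and & come out as tokens of their own rather than as part of the printer name, which is how a single input field ends up supplying a second command.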

There are plenty of functions across many languages that are susceptible to such attacks, as you'll see later. But a command injection attack doesn't require a function that calls a system shell. For example, an attacker might be able to leverage a call to a language interpreter. This is pretty popular in high-level languages such as Perl and Python. For example, consider the following Python code:

 def call_func(user_input, system_data):
     exec('special_function_%s("%s")' % (system_data, user_input))

In the preceding code, the Python % operator acts much like the *printf format specifiers in C: it matches up the values in the parentheses with the %s placeholders in the string. As a result, this code is intended to call a function chosen by the system, passing it the argument from the user. For example, if system_data were sample and user_input were fred, Python would run the code:

 special_function_sample("fred")

This code runs in the same scope as the exec that invokes it.

Attackers who control user_input can execute any Python code they want with that process, simply by adding a quote, followed by a right parenthesis and a semicolon. For example, the attacker could try the string:

 fred"); print ("foo

This will cause the function to run the following code:

 special_function_sample("fred"); print ("foo")

This will not only do what the programmer intended, but will also print foo. Attackers can literally do anything here, including erase files with the privileges of the program, or even make network connections. If this flexibility gives attackers access to more privileges than they otherwise had, this is a security problem.
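One way to keep both the function name and the argument out of the interpreter is to look the function up in a table and call it directly. What follows is a minimal sketch of that idea, not a fix taken from the text; special_function_sample, its return value, and the HANDLERS table are hypothetical.

```python
def special_function_sample(value):
    # Hypothetical handler; it only ever sees user_input as a plain string.
    return "sample handled %r" % value

# Explicit map from system-chosen names to callables.
HANDLERS = {"sample": special_function_sample}

def call_func(user_input, system_data):
    # No code is built from strings, so quotes, parentheses, and semicolons
    # in user_input are never parsed as Python. Unknown names raise KeyError.
    handler = HANDLERS[system_data]
    return handler(user_input)

result = call_func('fred"); print ("foo', "sample")
print(result)
```

With this structure the attack string from earlier is simply passed to the handler as an odd-looking argument; nothing in it is evaluated.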

Many of these problems occur when control constructs and data are juxtaposed, and attackers can use a special character to change the context back to control constructs. In the case of command shells, there are numerous magical characters that can do this. For example, on most UNIX-like machines, if the attackers were to add a semicolon (which ends a statement), a backtick (data between backticks gets executed as code), or a vertical bar (everything after the bar is treated as another, related process), they could run arbitrary commands. There are other special characters that can change the context from data to control; these are just the most obvious.
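Because the list of dangerous characters is open-ended, a check that accepts only known-good input is generally easier to get right than one that tries to enumerate every bad character. The sketch below assumes, purely for illustration, that a valid printer name starts with a letter or digit and contains only ASCII letters, digits, underscores, and hyphens; is_safe_printer_name is a hypothetical helper.

```python
import re

# Assumed policy: starts with a letter or digit, then up to 63 more
# letters, digits, underscores, or hyphens.
_PRINTER_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")

def is_safe_printer_name(name):
    # fullmatch requires the whole string to match, so a metacharacter
    # (;, `, |, &, whitespace, and so on) anywhere in the input is rejected.
    return _PRINTER_NAME.fullmatch(name) is not None

for candidate in ["FRED", "FRED; xterm &", "`id`", "FRED | mail"]:
    print(repr(candidate), is_safe_printer_name(candidate))
```

Only the first candidate passes; the other three contain characters outside the allowed set.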

One common technique for mitigating problems with running commands is to use an API to call the command directly, without going through a shell. For example, on a UNIX box, there's the execv() family of functions, which skip the shell and call the program directly, giving the arguments as strings.
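Python's subprocess module offers a comparable interface: given a list of arguments and no shell=True, it starts the program directly. As a rough illustration, the sketch below uses the current Python interpreter as a stand-in for the spawned program (an assumption made only so the example runs anywhere) and has the child echo back the argument it received; run_with_argument is a hypothetical helper.

```python
import subprocess
import sys

def run_with_argument(user_input):
    # The argument list goes straight to the child process; no shell parses it.
    # sys.executable stands in for a real program such as lpr.
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    completed = subprocess.run(
        [sys.executable, "-c", child_code, user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout

echoed = run_with_argument("FRED; xterm &")
print(echoed)
```

The child receives the whole string, semicolon and ampersand included, as one ordinary argument, and no xterm is started.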

This is a good thing, but it doesn't always solve the problem, particularly because the spawned program itself might put data right next to important control constructs. For example, calling execv() on a Python program that then passes the argument list to an exec would be bad. We have even seen cases where people execv()'d /bin/sh (the command shell), which totally misses the point.

Related Sins

A few of the sins can be viewed as specific kinds of command injection problems. SQL injection is clearly a specific kind of command injection attack. Format string problems can be seen as a kind of command injection problem, too. This is because the attacker takes a value that the programmer expected to be data, and then inserts read and write commands (for example, the %n specifier is a write command). Those particular cases are so common that we've treated them separately.
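The parallel with SQL injection is easy to see in code. The sketch below uses Python's built-in sqlite3 module with a made-up users table; it contrasts a query built by string formatting with one that passes the value as a bound parameter, so the database never parses that value as SQL.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("fred",), ("alice",)])

user_input = "nobody' OR '1'='1"

# Vulnerable: the quote in user_input closes the string literal, and the
# rest of the input is parsed as part of the WHERE clause.
injected = conn.execute(
    "SELECT name FROM users WHERE name = '%s'" % user_input
).fetchall()

# Safer: the ? placeholder keeps user_input as data.
bound = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()

print(len(injected), len(bound))  # 2 0
```

The formatted query returns every row, while the parameterized one looks for a user literally named nobody' OR '1'='1 and finds none.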

This is also the core problem in cross-site scripting, where attackers can choose data that looks like particular web control elements if that data is not properly validated.