Chapter 10: All Input Is Evil | Writing Secure Code, Second Edition

Chapter 10

All Input Is Evil!

If someone you didn't know came to your door and offered you something to eat, would you eat it? No, of course you wouldn't. So why do so many applications accept data from strangers without first evaluating it? It's safe to say that most security exploits involve the target application checking incoming data incorrectly or, in some cases, not checking it at all. So let me be clear about this: you should not trust data until the data is validated. Failure to do so will render your application vulnerable. Or, put another way: all input is evil until proven otherwise. That's rule number one. Typically, the moment you forget this rule is the moment you are attacked.

Rule number two is: data must be validated as it crosses the boundary between untrusted and trusted environments. By definition, trusted data is data that you, or an entity you explicitly trust, have complete control over; untrusted data is everything else. In short, any data submitted by a user is initially untrusted. I bring this up because many developers balk at checking input: they are positive that the data is checked by some other function that eventually calls their code, and they don't want to take the performance hit of validating the data more than once. But what happens if the input comes from a source that is not checked, or if the code you depend on is changed because it assumes some other code performs a validity check? And here's a somewhat related question: what happens if an honest user simply makes an input mistake that causes your application to fail? Keep this in mind when I discuss some potential vulnerabilities and exploits.
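As a minimal sketch of rule number two, the following C fragment checks a user name at the point where it crosses from untrusted to trusted code. Everything named here is hypothetical and chosen only for illustration: the function names `is_valid_username`, `handle_request`, and `internal_lookup`, the 32-character limit, and the letters-and-digits allow list. A real application would substitute its own definition of valid data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_USERNAME_LEN 32   /* hypothetical limit, for illustration only */

/* Allow list: ASCII letters and digits only. */
static bool is_ascii_alnum(char c)
{
    return (c >= 'a' && c <= 'z') ||
           (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9');
}

/* Returns true only if name is 1..MAX_USERNAME_LEN characters long
   and every character is on the allow list. */
static bool is_valid_username(const char *name)
{
    size_t i;

    if (name == NULL)
        return false;

    for (i = 0; name[i] != '\0'; i++) {
        if (i >= MAX_USERNAME_LEN)
            return false;            /* too long */
        if (!is_ascii_alnum(name[i]))
            return false;            /* character not on the allow list */
    }
    return i > 0;                    /* reject the empty string */
}

/* Trusted internal routine (hypothetical). It relies on its caller
   having validated the data, so it only asserts its own invariants. */
static size_t internal_lookup(const char *name)
{
    assert(name != NULL);
    return strlen(name);
}

/* Boundary function: the one place untrusted data enters.
   Returns 0 if the request was accepted, -1 if it was rejected. */
static int handle_request(const char *untrusted_name)
{
    if (!is_valid_username(untrusted_name))
        return -1;                   /* reject before any trusted code runs */

    (void)internal_lookup(untrusted_name);
    return 0;
}
```

With this arrangement, `handle_request("alice42")` is accepted, while `handle_request("bob; rm -rf /")`, `handle_request("")`, and `handle_request(NULL)` are all rejected before the trusted routine sees them. The check lives in the boundary function itself, so it does not depend on some other caller having performed it.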

I once reviewed a security product that had a security flaw: a small chance existed that invalid user input would cause a buffer overrun and stop the product's Web service. The development team claimed that it could not check all the input because of potential performance problems. On closer examination, I found that the application was a critical network component, so the potential damage from an exploit was immense. I also found that it already performed many time-intensive and CPU-intensive operations, including public-key encryption, heavy disk I/O, and authentication. I very much doubted that a half dozen lines of input-checking code would lead to a performance problem, especially because the code was not called often. As it turned out, the input-checking code caused no performance problems, and the flaw was fixed. Performance is rarely a problem when checking user input. Even if it is, no system is less reliably responsive than a hacked system.

IMPORTANT
It's difficult to find a system less reliably responsive than a hacked system!
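To give a sense of scale for the "half dozen lines" in the anecdote above, here is a hedged sketch of the kind of check that prevents a buffer overrun. It is not the code from the product I reviewed; the function name `copy_field` and its parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src_len bytes of untrusted input into dst, which has room for
   dst_size bytes, and NUL-terminates the result.
   Returns 0 on success, or -1 if the input is rejected. */
static int copy_field(char *dst, size_t dst_size,
                      const char *src, size_t src_len)
{
    /* Caller bugs, not user input: these are internal invariants. */
    assert(dst != NULL && dst_size > 0);

    /* The input checks: roughly half a dozen lines. */
    if (src == NULL)
        return -1;
    if (src_len >= dst_size)                  /* no room for data plus NUL */
        return -1;
    if (memchr(src, '\0', src_len) != NULL)   /* embedded NUL in the data */
        return -1;

    memcpy(dst, src, src_len);
    dst[src_len] = '\0';
    return 0;
}
```

The checks amount to two pointer or length comparisons and one scan of data that is about to be copied anyway. Next to public-key encryption or disk I/O, that cost is unlikely to be measurable, which is the point of the anecdote.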

Hopefully, by now, you understand that all input is suspicious until proven otherwise and that your application should validate direct user input before using it. This chapter serves as an introduction to the next four chapters, which outline canonical representation issues, database and Web-specific input issues, and internationalization issues.

Let's now look at some high-level strategies for handling hostile input.

More Info
If you still don't believe all input should be treated as unclean, I suggest you randomly choose any ten past vulnerabilities. You'll find that in the majority of cases the exploit relies on malicious input. I guarantee it!