HTML is simply plain text, like <b>, that Web browsers give special meaning to (for example, by making text bold). Because of this, your Web site's users could easily add HTML or JavaScript to their form data, as in the comments field in the previous example (Figure 10.8). What's wrong with that, you might ask?

Figure 10.8. The malicious and savvy user can enter HTML, CSS, and JavaScript into text inputs.

Many dynamically driven Web applications take the information submitted by a user, store it in a database, and then redisplay that information on another page. Think of a forum, as just one example. At the very least, if a user enters HTML code in their data, such code could throw off the layout and aesthetic of your site. Worse yet, bad code could create pop-up windows (Figure 10.9) or redirections to other sites. In the worst-case scenario, HTML and JavaScript could be used for what's called cross-site scripting (XSS), a common type of attack.

Figure 10.9. The JavaScript entered into the comments field (see Figure 10.8) would create this alert window when the comments were displayed in the Web browser.

PHP includes a handful of functions for handling HTML and other code found within strings. These include:

htmlspecialchars(), which turns &, ", <, > and (depending on how quotation marks are set to be handled) ' into an HTML entity format (&amp;, &quot;, etc.)
htmlentities(), which turns all applicable characters into their HTML entity format
strip_tags(), which removes all HTML and PHP tags

These three functions are listed roughly in order from least disruptive to most. Which one you'll want to use depends upon the application at hand. To demonstrate this concept, I'll apply the htmlspecialchars() function to the submitted name and comments data.

To handle HTML in form submissions

1. Open handle_comments.php (Script 10.3) in your text editor or IDE.
2. Change the assignment of the $n variable (Script 10.4).

$n = escape_data(htmlspecialchars ($_POST['name']));

Script 10.4. Calls to the htmlspecialchars() function help to sanitize the submitted name and comments values.

Presumably the name data would be something that might be displayed in a Web application (as on a view_comments.php page). To keep submitted information from messing up such a page, it's run through the htmlspecialchars() function prior to escape_data(). So, any double quotation mark in $_POST['name'] will be turned into &quot;, and < and > will become &lt; and &gt;, respectively. Then the escape_data() function will do its thing, escaping problematic characters like the single quotation mark.
3. Repeat the change for the comments field.

$c = escape_data(htmlspecialchars ($_POST['comments']));

As you've already seen (Figures 10.8 and 10.9), the comments field could be used maliciously, so its value should be run through htmlspecialchars() as well.
4. Alter the thank-you message so that it also prints out the safe version of the user's comments.

echo '<p>Thank you for your comments: <br />' . stripslashes($c) . '</p>';

To demonstrate how the htmlspecialchars() function affects a string, I'll print out the value of $c. But since $c has also been run through the escape_data() function, I want to strip any slashes from it first.
5. Save the page as handle_comments.php, upload it to your Web server, and test it in your Web browser (Figure 10.10).

Figure 10.10. Thanks to the htmlspecialchars() function, malicious code entered into the comments field (see Figure 10.8) is rendered inert.
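To see how the three functions differ, here is a minimal sketch that runs each one on sample strings. The strings and variable names are hypothetical stand-ins for submitted form data, not part of Script 10.4, and the "\u{e9}" escape (an accented e) assumes PHP 7 or later. The quote-handling flag and character set are passed explicitly so the result does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical comment text, standing in for malicious form input.
$comment = '<b>Nice</b> site! <script>alert("Boo");</script>';

// Least disruptive: the markup is kept but shown as literal text.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
// &lt;b&gt;Nice&lt;/b&gt; site! &lt;script&gt;alert(&quot;Boo&quot;);&lt;/script&gt;

// htmlentities() also converts characters such as accented letters.
$name = "Caf\u{e9} <b>au lait</b>";
$entities = htmlentities($name, ENT_QUOTES, 'UTF-8');
// Caf&eacute; &lt;b&gt;au lait&lt;/b&gt;

// Most disruptive: the tags are removed, but the text between them stays.
$stripped = strip_tags($comment);
// Nice site! alert("Boo");

echo $escaped, "\n", $entities, "\n", $stripped, "\n";
```

Note that strip_tags() leaves the contents of the script element behind as plain text; it is harmless without its tags, but it will still appear on the page.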
Tips

Both htmlspecialchars() and htmlentities() take an optional parameter indicating how quotation marks should be handled. See the PHP manual for specifics.

The strip_tags() function takes an optional parameter indicating which tags should not be stripped. List the allowed tags in their plain form; allowing <br> covers <br /> as well.

$var = strip_tags ($var, '<p><br>');

The strip_tags() function will remove even invalid HTML tags, which may cause problems. For example, strip_tags() will yank out all of the code it thinks is an HTML tag, even if it's improperly formed, like <b I forgot to close the tag.

Unrelated to security but quite useful is the nl2br() function. It inserts an HTML <br /> tag at every return (such as those entered into a text area). If applied in this example, you wouldn't see the rn after </b> in Figure 10.10 (the rn is the stripped version of \r\n, which was created by the Return after </b> in Figure 10.8).
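The optional parameters mentioned in these tips can be sketched as follows. The sample strings and variable names are made up for illustration; the flags (ENT_NOQUOTES, ENT_COMPAT, ENT_QUOTES) are the standard PHP constants for quote handling.

```php
<?php
// Hypothetical input containing both kinds of quotation mark.
$quote = "It's \"quoted\"";

// ENT_NOQUOTES leaves both kinds of quotation mark alone.
$none = htmlspecialchars($quote, ENT_NOQUOTES, 'UTF-8');
// It's "quoted"

// ENT_COMPAT converts double quotation marks only.
$compat = htmlspecialchars($quote, ENT_COMPAT, 'UTF-8');
// It's &quot;quoted&quot;

// ENT_QUOTES converts both single and double quotation marks.
$both = htmlspecialchars($quote, ENT_QUOTES, 'UTF-8');
// It&#039;s &quot;quoted&quot;

// Keep paragraph and line-break tags, strip everything else.
$html = '<p>One<br>Two <b>bold</b></p>';
$kept = strip_tags($html, '<p><br>');
// <p>One<br>Two bold</p>

// nl2br() inserts <br /> before each newline; the newline itself stays.
$text = "Line one\r\nLine two";
$broken = nl2br($text);
// Line one<br />\r\nLine two

echo $none, "\n", $compat, "\n", $both, "\n", $kept, "\n", $broken, "\n";
```

Because nl2br() produces HTML, it would normally be applied after htmlspecialchars(), when the text is being displayed; otherwise the <br /> tags it adds would be escaped too.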