There are two steps on the road to XSS redemption:
1. Restrict the input to valid input only. Most likely you will use regular expressions for this.
2. HTML encode the output.
You really should do both steps in your code; the following code examples outline how to perform one or both steps.
Calling a function like the one below before writing data out to the browser will HTML encode the output.
#include <cstring>
#include <string>

///////////////////////////////////////////////////////////////////
// HtmlEncode
// Converts a raw HTML stream to an HTML-encoded version
// Args
//   strRaw: Pointer to the HTML data
//   result: A reference to the result, held in std::string
// Returns
//   false: failed to encode all HTML data
//   true: encoded all HTML data
bool HtmlEncode(const char *strRaw, std::string &result) {
    size_t iLen = 0;
    size_t i = 0;
    if (strRaw && (iLen = strlen(strRaw))) {
        for (i = 0; i < iLen; i++)
            switch (strRaw[i]) {
                case '\0' : break;
                case '<'  : result.append("&lt;");   break;
                case '>'  : result.append("&gt;");   break;
                case '('  : result.append("&#40;");  break;
                case ')'  : result.append("&#41;");  break;
                case '#'  : result.append("&#35;");  break;
                case '&'  : result.append("&amp;");  break;
                case '"'  : result.append("&quot;"); break;
                default   : result.append(1, strRaw[i]); break;
            }
    }
    return i == iLen;
}
If you want to use regular expressions in C/C++, you should use either Microsoft's CAtlRegExp class or Boost.Regex, which is explained at http://boost.org/libs/regex/doc/syntax.html.
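As a rough sketch of the first step (restricting the input) in C++, the check below uses std::regex from the C++11 standard library. This is a substitution made for portability, not one of the two libraries named above. The helper name IsValidName is hypothetical; the pattern is the same ^\w{5,25}$ used by the other examples in this section.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original text): returns true only if
// the whole input is 5 to 25 "word" characters (letters, digits, underscore).
bool IsValidName(const std::string &input) {
    static const std::regex pattern(R"(^\w{5,25}$)");
    // regex_match succeeds only when the entire string matches the pattern
    return std::regex_match(input, pattern);
}
```

Input that passes this check would still be run through an encoder such as HtmlEncode before being written to the browser, so that both steps are applied.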
Use a combination of regular expressions (in this case, the VBScript RegExp object) and HTML encoding to sanitize the incoming HTML data:
<%
name = Request.Querystring("Name")
Set r = new RegExp
r.Pattern = "^\w{5,25}$"
r.IgnoreCase = True
Set m = r.Execute(name)
' Check the match count first; indexing m(0) fails when nothing matched
If m.Count > 0 Then
    Response.Write(Server.HTMLEncode(name))
End If
%>
This code is similar to the above example, but it uses the .NET Framework libraries and C# to perform the regular expression and HTML encoding.
using System.Web; // Make sure you add the System.Web.dll assembly
using System.Text.RegularExpressions; // Needed for Regex
...
private void btnSubmit_Click(object sender, System.EventArgs e)
{
    // Anchor both ends so the whole value must match
    Regex r = new Regex(@"^\w{5,25}$");
    if (r.Match(txtValue.Text).Success) {
        Application.Lock();
        Application[txtName.Text] = txtValue.Text;
        Application.UnLock();
        lblName.Text = "Hello, " + HttpUtility.HtmlEncode(txtName.Text);
    } else {
        lblName.Text = "Who are you?";
    }
}
In JSP, you would probably use a custom tag. This is the code for an HTML encoder tag:
import java.io.IOException;
import javax.servlet.jsp.JspException;
import javax.servlet.jsp.tagext.BodyTagSupport;

public class HtmlEncoderTag extends BodyTagSupport {
    public HtmlEncoderTag() {
        super();
    }

    public int doAfterBody() throws JspException {
        if (bodyContent != null) {
            System.out.println(bodyContent.getString());
            String contents = bodyContent.getString();
            // The backslash must be doubled inside a Java string literal
            String regExp = "^\\w{5,25}$";
            // Do a regex to find the good stuff
            if (contents.matches(regExp)) {
                try {
                    bodyContent.getEnclosingWriter().write(contents);
                } catch (IOException e) {
                    System.out.println("Io Error");
                }
            } else {
                try {
                    bodyContent.getEnclosingWriter().write(encode(contents));
                } catch (IOException e) {
                    System.out.println("Io Error");
                }
                System.out.println("Content: " + contents);
            }
        }
        // doAfterBody should return SKIP_BODY (or EVAL_BODY_AGAIN to loop)
        return SKIP_BODY;
    }

    // JSP has no HTML encode function
    public static String encode(String str) {
        if (str == null)
            return null;
        StringBuffer s = new StringBuffer();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '<': s.append("&lt;"); break;
                case '>': s.append("&gt;"); break;
                case '(': s.append("&#40;"); break;
                case ')': s.append("&#41;"); break;
                case '#': s.append("&#35;"); break;
                case '&': s.append("&amp;"); break;
                case '"': s.append("&quot;"); break;
                default: s.append(c);
            }
        }
        return s.toString();
    }
}
And finally, here is some sample JSP that calls the tag code defined above:
<%@ taglib uri="/tags/htmlencoder" prefix="htmlencoder"%>
<html>
<head>
<title>Watch out you sinners...</title>
</head>
<body bgcolor="white">
<htmlencoder:htmlencode><script type="javascript">BadStuff()</script></htmlencoder:htmlencode>
<htmlencoder:htmlencode>testin</htmlencoder:htmlencode>
<script type="badStuffNotWrapped()"></script>
</body>
</html>
Just like in the earlier examples, you're applying both remedies: checking validity, and then HTML encoding the output using htmlentities():
<?php
// Check that the parameter exists before reading it
if (isset($_GET['name'])) {
    $name = $_GET['name'];
    if (preg_match('/^\w{5,25}$/', $name)) {
        echo "Hello, " . htmlentities($name);
    } else {
        echo "Go away!";
    }
}
?>
This is the same idea as in the previous code samples: restrict the input using a regular expression, and then HTML encode the output.
#!/usr/bin/perl
use CGI;
use HTML::Entities;
use strict;

my $cgi = new CGI;
print CGI::header();
my $name = $cgi->param('name');
if ($name =~ /^\w{5,25}$/) {
    print "Hello, " . HTML::Entities::encode($name);
} else {
    print "Go away!";
}
If you don't want to load, or cannot load, HTML::Entities, you could use the following code to achieve the same task:
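sub html_encode {
    my $in = shift;
    # Replace & first so the entities added below are not re-encoded
    $in =~ s/&/&amp;/g;
    $in =~ s/</&lt;/g;
    $in =~ s/>/&gt;/g;
    $in =~ s/\"/&quot;/g;
    # Replace # before ( and ), whose entities contain a #
    $in =~ s/#/&#35;/g;
    $in =~ s/\(/&#40;/g;
    $in =~ s/\)/&#41;/g;
    return $in;
}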
Like all the code above, this example checks that the input is valid and well formed, and if it is, encodes the output.
#!/usr/bin/perl
use Apache::Util;
use Apache::Request;
use strict;

my $apr = Apache::Request->new(Apache->request);
my $name = $apr->param('name');
$apr->content_type('text/html');
$apr->send_http_header;
if ($name =~ /^\w{5,25}$/) {
    # Apache::Util names its HTML encoder escape_html
    $apr->print("Hello, " . Apache::Util::escape_html($name));
} else {
    $apr->print("Go away!");
}
Simply HTML encoding all output is a little draconian for some web sites, because some tags, such as <I> and <B>, are harmless. To temper things a little, consider unencoding known safe constructs. The following C# code shows an example: it un-HTML encodes italic, bold, paragraph, emphasis, and heading tags.
s = Regex.Replace(s, @"&lt;(/?)(i|b|p|em|h\d{1})&gt;", "<$1$2>", RegexOptions.IgnoreCase);
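The same allow-list idea can be sketched in C++ with std::regex_replace. This is only an illustration under stated assumptions: the helper name RestoreSafeTags is hypothetical, and the input is assumed to have already been HTML encoded so that tags appear as &lt;...&gt;.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original text): restores a small
// allow-list of harmless tags (i, b, p, em, h plus one digit) in text
// that has already been HTML encoded.
std::string RestoreSafeTags(const std::string &encoded) {
    static const std::regex safeTag(R"(&lt;(/?)(i|b|p|em|h\d)&gt;)",
                                    std::regex::icase);
    // $1 is the optional closing slash, $2 is the tag name
    return std::regex_replace(encoded, safeTag, "<$1$2>");
}
```

Because the pattern requires &gt; immediately after the tag name, a tag that carries attributes (for example an encoded <b onclick=...>) does not match and stays encoded.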