Problem

You need a quick list from which to choose regular expression patterns that match standard items. These standard items could be a Social Security Number, a zip code, a word containing only characters, an alphanumeric word, an email address, a URL, dates, or one of many other possible items used throughout business applications. These patterns can be useful in making sure that a user has input the correct data and that it is well-formed. These patterns can also be used as an extra security measure to keep hackers from attempting to break your code by entering strange or malformed input (e.g., SQL injection or cross-site scripting attacks). Note that these regular expressions are not a silver bullet that will stop all attacks on your system; rather, they are an added layer of defense.

Solution
Discussion

Regular expressions are effective at finding specific information, and they have a wide range of uses. Many applications use them to locate specific information within a larger body of text, as well as to filter out bad input. Filtering is very useful in tightening the security of an application and in preventing an attacker from using carefully crafted input to gain access to a machine on the Internet or a local network. By using a regular expression to allow only good input to be passed to the application, you can reduce the likelihood of many types of attacks, such as SQL injection or cross-site scripting.

The regular expressions presented in this recipe provide only a small cross-section of what can be accomplished with them. By taking these expressions and manipulating parts of them, you can easily modify them to work with your application. Take, for example, the following expression, which allows between 1 and 10 characters as input, each of which must be an alphanumeric character, one of a few symbols (., +, or -), or whitespace:

    ^([\w\.+-]|\s){1,10}$

By changing the {1,10} part of the regular expression to {0,200}, this expression will now match a blank entry or an entry of up to and including 200 of the allowed characters.

Note the use of the ^ character at the beginning of the expression and the $ character at the end of the expression. The ^ anchors the match at the beginning of the text, and the $ anchors it at the end of the text. Adding these characters forces the regular expression to match the entire string or none of it. By removing these characters, you can search for specific text within a larger block of text. For example, the following regular expression matches only a string containing nothing but a U.S.
zip code (there can be no leading or trailing spaces):

    ^\d{5}(-\d{4})?$

This version also accepts a zip code with leading or trailing whitespace (notice the addition of \s* at the start and end of the expression):

    ^\s*\d{5}(-\d{4})?\s*$

However, this modified expression matches a zip code found anywhere within a string (including a string containing just a zip code):

    \d{5}(-\d{4})?

Use the regular expressions in this recipe and modify them to suit your needs.

See Also

Two good books that cover regular expressions are Regular Expression Pocket Reference by Tony Stubblebine (O'Reilly) and Mastering Regular Expressions, Second Edition, by Jeffrey Friedl (O'Reilly).
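
As a rough illustration of the anchoring behavior described in the Discussion, the following sketch runs the three zip code patterns against a few inputs. Python is an assumption here, since this section does not name a host language, and the helper names (is_zip_only, is_zip_padded, find_zips) are hypothetical. Treat it as a minimal sketch rather than a definitive implementation.

```python
import re

# The three zip code patterns from the Discussion, copied unchanged.
ZIP_ONLY = re.compile(r"^\d{5}(-\d{4})?$")
ZIP_PADDED = re.compile(r"^\s*\d{5}(-\d{4})?\s*$")
ZIP_ANYWHERE = re.compile(r"\d{5}(-\d{4})?")


def is_zip_only(text):
    """True if the string is a zip code with no surrounding whitespace."""
    # Caveat specific to Python: "$" also matches just before a single
    # trailing newline, so "12345\n" is accepted by this pattern.
    return ZIP_ONLY.search(text) is not None


def is_zip_padded(text):
    """True if the string is a zip code, optionally surrounded by whitespace."""
    return ZIP_PADDED.search(text) is not None


def find_zips(text):
    """Return every zip-code-shaped substring found anywhere in the text."""
    # finditer with group(0) yields the full match; findall would return
    # only the contents of the capturing group (the optional "-dddd" part).
    return [m.group(0) for m in ZIP_ANYWHERE.finditer(text)]


if __name__ == "__main__":
    for sample in ["12345", " 12345-6789 ", "Ship to 12345-6789 today"]:
        print(repr(sample), is_zip_only(sample), is_zip_padded(sample),
              find_zips(sample))
```

One design point worth noting: without anchors, the pattern also matches the first five digits of a longer run of digits (for example, it finds 12345 inside 1234567). If that is not wanted, adding word boundaries (\b) around the unanchored pattern is one possible adjustment.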