A regular expression defines a pattern of character strings. Once the pattern is defined, you can use functions in PHP, methods in ASP, and commands in other programming systems to determine if a given string matches the pattern.
Regular expressions are made up of constants, classes of characters, and notation for defining logical combinations, groupings, and position. The fact that you can use the notation to build regular expressions out of component parts is what gives this technology its power.
The simplest regular expression is a literal string of characters. The following is a regular expression:
bird
As given, this pattern will match any string that contains the characters “bird” anywhere in the string.
You can combine two regular expressions to produce a third expression that matches either of the constituent parts. You do this with the vertical bar. The pattern:
bird|frog
will match any string that contains either ‘bird’ or ‘frog’ or both.
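As a quick check of the two ideas so far (a literal pattern, and alternation with the vertical bar), here is a minimal JavaScript sketch. The sample strings and variable names are illustrative assumptions, not taken from the text.

```javascript
// Literal pattern: matches if "bird" appears anywhere in the string.
const bird = /bird/;
// Alternation: matches if either "bird" or "frog" appears anywhere.
const birdOrFrog = /bird|frog/;

// Hypothetical sample strings.
const samples = ["bluebird", "a frog and a bird", "frogs", "cat"];

const results = samples.map((text) => ({
  text,
  bird: bird.test(text),
  birdOrFrog: birdOrFrog.test(text),
}));

console.log(results);
```

Only "cat" fails both patterns, and "frogs" matches the alternation but not the literal pattern.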
For any pattern, you can specify a new pattern made up by repeating the base pattern any number of times, including no times at all. You do this with an asterisk. The pattern:
ca*t
will match a string that contains a ‘c’ followed by any number of a’s followed by a ‘t’. So, the following strings would match:
ct cat caat caaaaaaaaaaaaaaaat
Strings such as cot or cut would not match, because a character other than an a comes between the c and the t.
There are other combination notations. A plus sign stands for one or more repetitions. The pattern:
ca+t
would not match ct, but would match cat, caat, and so on.
A question mark stands for zero or one repetition. To specify an exact range of possible repetitions, you use curly brackets. The pattern:
(cat){2,5}
would match on:
catcat catcatcat catcatcatcat catcatcatcatcat
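The repetition notations can be compared side by side. The following is a small JavaScript sketch, with test strings chosen for illustration; the patterns are left unanchored, so each one succeeds if it occurs anywhere in the string.

```javascript
// Each entry pairs a pattern with a few strings to try against it.
const cases = [
  { name: "ca*t", re: /ca*t/, inputs: ["ct", "cat", "caaat", "cot"] },
  { name: "ca+t", re: /ca+t/, inputs: ["ct", "cat", "caaat"] },
  { name: "ca?t", re: /ca?t/, inputs: ["ct", "cat", "caat"] },
  { name: "(cat){2,5}", re: /(cat){2,5}/, inputs: ["cat", "catcat", "catcatcat"] },
];

// Build a table of pattern name -> list of true/false results.
const table = {};
for (const { name, re, inputs } of cases) {
  table[name] = inputs.map((s) => re.test(s));
}

console.log(table);
// "ca*t":       [true, true, true, false]
// "ca+t":       [false, true, true]
// "ca?t":       [true, true, false]
// "(cat){2,5}": [false, true, true]
```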
You can specify that the pattern must start at the start of the string by using the caret symbol. The pattern:
^cat
would match the following strings:
cat
cat and not dog
cat cat cat
but would not match:
The dog chased the cat.
Similarly, you can specify that the pattern must be at the end of the string by using the dollar sign. The pattern:
cat$
would match:
The dog chased the cat
If you needed to check for an exact match, you would use:
^cat$
To check that the string contained only dog or cat, use the regular expression:
^(dog|cat)$
Fixing the pattern to start or end the string is called anchoring.
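Anchoring is easy to try out. The sketch below is a minimal JavaScript illustration; the variable names are made up for the example, and the grouped form ^(dog|cat)$ is one way (an assumption here) to express "only dog or cat".

```javascript
const startsWithCat = /^cat/;       // pattern must begin the string
const endsWithCat = /cat$/;         // pattern must end the string
const exactlyCat = /^cat$/;         // the whole string must be "cat"
const onlyDogOrCat = /^(dog|cat)$/; // the whole string must be "dog" or "cat"

console.log(startsWithCat.test("cat and not dog"));         // true
console.log(startsWithCat.test("The dog chased the cat.")); // false
console.log(endsWithCat.test("The dog chased the cat"));    // true
console.log(endsWithCat.test("The dog chased the cat."));   // false (the period is last)
console.log(exactlyCat.test("cat cat cat"));                // false
console.log(onlyDogOrCat.test("dog"));                      // true
console.log(onlyDogOrCat.test("dogcat"));                   // false

// Without the parentheses the anchors attach to one alternative each:
// "starts with dog" OR "ends with cat".
const ungrouped = /^dog|cat$/;
console.log(ungrouped.test("dogfish"));                     // true
```

The parentheses matter: they make both anchors apply to the whole alternation rather than to one branch each.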
You can specify a set of characters using square brackets. The pattern:
[0-9]
specifies one of the 10 digits.
[a-z]
specifies the lowercase letters.
[a-zA-Z]
specifies all letters.
A period specifies exactly one character of any kind. So the pattern:
c.t
would match any string containing the character c, followed by exactly one character, followed by a t. These include:
cat cot cit c2t
You can combine all these features. To specify any number of letters, you would use:
[a-zA-Z]*
You also can specify that a character not be in a set by using the caret symbol inside the square brackets:
[^a-z]
stands for any character that is not a lowercase letter.
You can combine patterns sequentially:
(cat)+(dog)+
matches a string with one or more instances of the string cat followed by one or more instances of the string dog.
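The character-set notations can be exercised with a short JavaScript sketch. The variable names and test strings are illustrative. One assumption worth flagging: the letters-only pattern is anchored with ^ and $ here, because an unanchored [a-zA-Z]* succeeds on every string (it is satisfied by zero letters).

```javascript
const digit = /[0-9]/;               // any one digit, anywhere in the string
const lettersOnly = /^[a-zA-Z]*$/;   // whole string is zero or more letters
const cAnyT = /c.t/;                 // c, exactly one character, then t
const notLowercase = /[^a-z]/;       // some character that is not a lowercase letter
const catsThenDogs = /(cat)+(dog)+/; // one or more "cat", then one or more "dog"

console.log(digit.test("room 7"));             // true
console.log(lettersOnly.test("Hello"));        // true
console.log(lettersOnly.test("Hello2"));       // false
console.log(cAnyT.test("c2t"));                // true
console.log(cAnyT.test("ct"));                 // false
console.log(notLowercase.test("abc"));         // false
console.log(notLowercase.test("abC"));         // true
console.log(catsThenDogs.test("catcatdog"));   // true
console.log(catsThenDogs.test("dogcat"));      // false
```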
The particular implementation of regular expressions can have classes of characters predefined for you.
What do you do if you need to match one of the special symbols such as the period or dollar sign? The answer is to put a backslash in front of the special symbol. If you need to match a backslash itself, you must use two backslashes.
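Escaping can be seen in a short JavaScript sketch. The price format and variable names are assumptions made for the example. Note the extra layer when a pattern is built from a string: the string literal itself consumes one level of backslashes.

```javascript
// Literal dollar sign, digits, literal period, then two digits.
const price = /\$[0-9]+\.[0-9][0-9]/;

// Two backslashes in the pattern match one backslash in the text.
const backslash = /\\/;

// The same price pattern built from a string: each backslash is doubled
// so that one backslash survives into the regular expression.
const priceFromString = new RegExp("\\$[0-9]+\\.[0-9][0-9]");

console.log(price.test("Total: $12.50"));    // true
console.log(price.test("Total: 12.50"));     // false (no dollar sign)
console.log(price.test("$12x50"));           // false (x is not a period)
console.log(backslash.test("C:\\temp"));     // true (the text contains one backslash)
console.log(backslash.test("C:/temp"));      // false
console.log(priceFromString.test("$3.99"));  // true
```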
In addition to determining if a regular expression has a match within a given string, you might want to replace the matched parts with other strings. For example, you could construct a regular expression to represent common misspellings and then replace the mistakes with the proper spelling. Most implementations of regular expressions include facilities for replacement. The replacements can be complex, such as replacing certain groups in the pattern but not others.
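Both kinds of replacement can be sketched in JavaScript. The misspelling list, the function names, and the "last, first" name format are hypothetical, chosen only to illustrate the idea; \b is JavaScript's word-boundary notation, which the text has not introduced.

```javascript
// Hypothetical table of common misspellings and their corrections.
const corrections = { recieve: "receive", teh: "the", seperate: "separate" };

// One pattern that matches any listed misspelling as a whole word.
const misspelled = new RegExp(
  "\\b(" + Object.keys(corrections).join("|") + ")\\b",
  "g" // replace every occurrence, not just the first
);

function fixSpelling(text) {
  // The callback receives the matched word and returns its correction.
  return text.replace(misspelled, (word) => corrections[word]);
}

// Group references: $1 and $2 stand for the first and second
// parenthesized groups, so the comma is dropped and the names swap.
function swapName(text) {
  return text.replace(/([A-Za-z]+), ([A-Za-z]+)/, "$2 $1");
}

console.log(fixSpelling("I recieve teh mail")); // "I receive the mail"
console.log(swapName("Smith, Ann called"));     // "Ann Smith called"
```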
While it is important, even critical, that scripts check data submitted by users, it is wrong to focus on checking as the only way to ensure valid input. The design of the interface should make it easy for people to do the right thing and difficult to do the wrong thing before their input reaches the validation code. Consider the task of asking customers or clients to enter a date. One approach would be to present a text field in a form. However, a better approach, and the one that appears to be prevalent on the Web today, is to present three pull-down menus with the full names of the months, numbers for the day of the month, and the years that are appropriate for the question. This makes special sense for a global audience, since people in the United States use a month/day/year format, and people in many other places use day/month/year or year/month/day. It also makes sense to force a person to pick this year or later if the date is to be something like a credit card expiration. Restricting what your user can do is a good tactic. In the examples given for adding a warning to the database for a favorite show, the interface required clicking on the name of the show, not typing a name into a text field. Of course, if the list of favorites was very long, scrolling through all the shows might take too much time. However, you still would want to try to design something that does not require people to type an unrestricted entry.
One fault to avoid in your validation code is stopping the validation as soon as you find the first error. It might not always be possible, but you should try to report all the problems to the person entering the data at once. The method shown in the Examples section uses what can be called “okay so far” coding. The code defines a variable and sets it to true. After each check, if the input fails, this variable is set to false and a message describing the specific error is output to the HTML document. The variable might be set to false several times. After all the checks, if the variable is still true, then processing can continue.
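The “okay so far” approach can be sketched as follows in JavaScript. This is an illustration under stated assumptions, not the code from the Examples section: the field names (name, email, quantity), the validation patterns, and the function name are all hypothetical, and the messages are collected in an array rather than written directly into the HTML document.

```javascript
// Checks every field and reports all problems, instead of stopping at the first.
function validateOrder(input) {
  let okaySoFar = true; // stays true only if every check passes
  const messages = [];  // one entry per failed check

  if (!/^[a-zA-Z ]+$/.test(input.name)) {
    okaySoFar = false;
    messages.push("Name must contain only letters and spaces.");
  }
  // Deliberately loose check: something@something.something, no spaces.
  if (!/^[^@ ]+@[^@ ]+\.[^@ ]+$/.test(input.email)) {
    okaySoFar = false;
    messages.push("Email address does not look valid.");
  }
  if (!/^[0-9]+$/.test(input.quantity)) {
    okaySoFar = false;
    messages.push("Quantity must be a whole number.");
  }

  return { ok: okaySoFar, messages };
}

const good = validateOrder({ name: "Ann Lee", email: "ann@example.com", quantity: "3" });
const bad = validateOrder({ name: "Ann2", email: "nope", quantity: "x" });
console.log(good.ok, good.messages.length); // true 0
console.log(bad.ok, bad.messages.length);   // false 3
```

Because no check returns early, a submission with three mistakes produces three messages in one pass.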
It is as important to test the validation code as any other code in your application. This means that you need to submit data with errors and, moreover, combinations of good and bad input.
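The PHP system has several regular expression functions. The ereg and eregi functions return true or false, depending on whether a pattern is found in a string or not. The difference between the two is that eregi is not case-sensitive. (Both belong to the POSIX regex extension, which was deprecated in PHP 5.3 and removed in PHP 7; preg_match is the current equivalent.) If: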
$source = "cats and dogs";
$pattern = "dog";
$pattern1 = "^dog";
then:
ereg($pattern, $source)
would return true, but:
ereg($pattern1, $source)
would return false since the string dog is not at the start of the tested string. If:
$source1 = "Cats and Dogs";
then:
ereg($pattern, $source1)
would return false because of the uppercase D. However:
eregi($pattern, $source1)
would return true.
The PHP system also has functions for finding parts of strings using regular expressions and replacing them. The call:
ereg_replace($pattern, "kitten", $source)
would return a new string with the value: “cats and kittens”. A common way of using ereg_replace would be:
$source = ereg_replace($pattern, $replacement ,$source);
where $replacement holds a correction for the piece of the string matched by the pattern in $pattern.
The eregi_replace function acts like the ereg_replace function, except that the initial matching is done with upper- and lowercase treated as being the same.
In ASP scripts written in JavaScript, a regular expression is written between slashes:
var pattern = /dog/;
var patterni = /dog/i;
The i at the end in the definition of patterni indicates that matches are to be done independent of upper- and lower-case.
Assume the following:
var source = "cats and dogs";
var source1 = "Cats and Dogs";
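All strings have match() and replace() methods. The match() method returns the details of the match if the pattern is found and null otherwise, so its result can be used directly as a true-or-false condition. The following:
source.match(pattern)
will succeed, so a test on it is true.
source1.match(pattern)
will return null, which tests as false, because of the capital D. However:
source.match(patterni)
source1.match(patterni)
will both succeed since patterni specifies a case-insensitive match.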
source1 = source1.replace(patterni, "kitten");
would set source1 to be “Cats and kittens”.
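The JavaScript calls above can be run as a short script. This sketch uses const declarations and adds the test() method of the pattern object, which is not mentioned in the text but is standard JavaScript and returns a plain true or false.

```javascript
const pattern = /dog/;
const patterni = /dog/i;
const source = "cats and dogs";
const source1 = "Cats and Dogs";

// match() returns an array describing the match, or null if there is none.
const found = source.match(pattern);
const missing = source1.match(pattern);
console.log(found[0], found.index); // "dog" 9
console.log(missing);               // null

// test() gives a strict boolean, which is convenient in conditions.
console.log(pattern.test(source1));  // false
console.log(patterni.test(source1)); // true

// replace() returns a new string; the original is unchanged.
const replaced = source1.replace(patterni, "kitten");
console.log(replaced); // "Cats and kittens"
console.log(source1);  // "Cats and Dogs"
```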
The SQL standard provides the LIKE operator for specifying a limited set of matches. Specifically, LIKE accepts two so-called wildcard symbols. The underscore stands for any single character, and the percent sign stands for zero or more of any character. To return all records in which the field named firstname starts with “Jea,” use the SQL query:
SELECT * from tablename where firstname LIKE 'Jea%'
To select only records in which the field named code has values of exactly three characters, use:
SELECT * from tablename where code LIKE '___'
The pattern is exactly three underscore symbols, which can look like one long dash in print.
To return all records in which the field named desserts contains “chocolate,” use:
SELECT * from tablename where desserts LIKE '%chocolate%'
This last query will return records with desserts field values such as:
chocolate
chocolate cake
dark chocolate candy
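To make the relationship between LIKE wildcards and regular expressions concrete, here is a JavaScript sketch that translates a LIKE pattern into an equivalent regular expression. The function name likeToRegExp is hypothetical. The sketch assumes a case-sensitive comparison and ignores ESCAPE clauses; in a real database, case sensitivity depends on the column's collation.

```javascript
// Translate a SQL LIKE pattern into an anchored regular expression.
function likeToRegExp(like) {
  let body = "";
  for (const ch of like) {
    if (ch === "%") {
      body += ".*"; // percent: zero or more of any character
    } else if (ch === "_") {
      body += ".";  // underscore: exactly one character
    } else {
      // Any other character is literal; escape it if it is special in a regex.
      body += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  // LIKE must match the whole value, so anchor both ends.
  // The "s" flag lets the period match newline characters as well.
  return new RegExp("^" + body + "$", "s");
}

console.log(likeToRegExp("Jea%").test("Jean"));                         // true
console.log(likeToRegExp("Jea%").test("Jea"));                          // true
console.log(likeToRegExp("___").test("abcd"));                          // false
console.log(likeToRegExp("%chocolate%").test("dark chocolate candy"));  // true
console.log(likeToRegExp("chocolate").test("chocolate cake"));          // false
```

The last two lines show why a "contains" search needs a percent sign on both sides: without wildcards, LIKE only matches the exact value.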
MySQL supports an additional operator, REGEXP, that uses regular expressions. Its use will be demonstrated in the Examples section. This is an example of how the open-source development process can make enhancements to standard techniques. However, it is not standard SQL.