A modestly ambitious regular expression language allows the user to express sequences, alternations , and repetitions of characters . The following grammar is based on rules given in Introduction to Automata Theory, Languages and Computation [Hopcroft and Ullman]. The rules recognize conventional operator precedence, and they avoid the problem of left recursion.
expression = term ('' term)*; term = factor factor*; factor = phrase (phrase '*'); phrase = letterOrDigit '(' expression ')'; letterOrDigit = Letter Digit;
Note that there is a major difference between the meaning of a vertical bar by itself and a vertical bar in single quotes. There is also a major difference between an asterisk and an asterisk in single quotes. A character in single quotes means that the quoted character must appear in text that the grammar is matching. Without these quotes, a vertical bar means alternation , and an asterisk means repetition.
The function of the term rule would be more apparent if you required an ampersand, say, to mean sequence. Then, instead of typing the expression "jaJaJA" the user would have to type "j&aJ&aJ&A" . The grammar rule would appear as:
term = factor ('&' factor)*;
The given grammar follows the convention of using juxtaposition to indicate sequence. A sequence of letters such as "ja" implies a sequence to match, and no operator is necessary.