Recipe16.8.Checking Whether a String Has Balanced Parentheses


Recipe 16.8. Checking Whether a String Has Balanced Parentheses

Credit: Peter Cogolo

Problem

You need to check whether a certain string has balanced parentheses, but regular expressions are not powerful enough for this task.

Solution

We want a "true" parser to check a string for balanced parentheses, since parsing theory proves that a regular expression is not sufficient. Choosing one out of the many Python parser generators, we'll use David Beazley's classic but evergreen PLY:

# define token names, and a regular expression per each token tokens = 'OPEN_PAREN', 'CLOS_PAREN', 'OTHR_CHARS' t_OPEN_PAREN = r'\(' t_CLOS_PAREN = r'\)' t_OTHR_CHARS = r'[^( )]+'          # RE meaning: one or more non-parentheses def t_error(t): t.skip(1) # make the lexer (AKA tokenizer) import lex lexer = lex.lex(optimize=1) # define syntax action-functions, with syntax rules in docstrings def p_balanced(p):     ''' balanced : balanced OPEN_PAREN balanced CLOS_PAREN balanced                  | OTHR_CHARS                  | '''     if len(p) == 1:         p[0] = ''     elif len(p) == 2:         p[0] = p[1]     else:         p[0] = p[1]+p[2]+p[3]+p[4]+p[5] def p_error(p): pass # make the parser (AKA scanner) import yacc parser = yacc.yacc( ) def has_balanced_parentheses(s):     if not s: return True     result = parser.parse(s, lexer=lexer)     return s == result

Discussion

Here's an example of use of this recipe's code:

>> s = 'ba(be, bi(bo, bu))' >> print s, is_balanced(s) ba(be, bi(bo, bu)) True >> s = 'ba(be, bi(bo), bu))' >> print s, is_balanced(s) ba(be, bi(bo), bu)) False

The first string has balanced parentheses, but the second one has an extra closed parenthesis; therefore, its parentheses are not balanced.

"How do I check a string for balanced parentheses?" is a frequently asked question about regular expressions. Programmers without a computer science background are often surprised to hear that regular expressions just aren't powerful enough for this apparently simple task and a more complete form of grammar is required. (Perl's regular expressions plus arbitrary embedded expressions kitchen sink does sufficewhich just proves they aren't anywhere near "regular" expressions any more!)

For this very simplified parsing problem we're presenting, any real parser is overkilljust loop over the string's characters, keeping a running count of the number of open and yet unclosed parentheses encountered at this point, and return False if the running count ever goes negative or doesn't go back down to exactly 0 at the end:

def has_bal_par(s):     op = 0     for c in s:         if c=='(':             op += 1         elif c==')':             if op == 0:                 return False             op -= 1     return op == 0

However, using a parser when you need to parse is still a better idea, in general, than hacking up special-purpose code such as this has_bal_par function. As soon as the problem gets extended a bit (and problems invariably do grow, in real life, in often unpredictable directions), a real parser can grow gracefully and proportionally with the problem, while ad hoc code often must be thrown away and completely rewritten.

All over the web, you can find oodles of Python packages that are suitable for lexing and parsing tasks. My favorite, out of all of them, is still good old PLY, David Beazley's Python Lexx and Yacc, which reproduces the familiar structure of Unix commands lexx and yacc while taking advantage of Python's extra power when compared to the C language that those Unix commands support.

You can find PLY at http://systems.cs.uchicago.edu/ply/. PLY is a pure Python package: download it (as a .tgz compressed archive file), decompress and unarchive it (all reasonable archiving tools now support this subtask on all platforms), open a command shell, cd into the directory into which you unarchived PLY, and run the usual python setup.py install, with the proper privileges to be able to write into your Python installation's site-packages directory (which privileges those are depends on how you installed Python, and on what platform you're running). Briefly, install it just as you would install any other pure Python package.

As you can see from this recipe, PLY is quite easy to use, if you know even just the fundamentals of lexing and parsing. First, you define your grammar's tokensmake a tuple or list of all their names (conventionally uppercase) bound to name tokens at your module's top level, define for each token a regular expression bound to name t_token_name (again at the module's top level), import lex, and call lex.lex to build your tokenizer (lexer). Then, define your grammar's action functions (each of them carries the relevant syntax ruleproductionin its docstring in BNF, Backus-Naur Form), import yacc, and call yacc.yacc to build your parser (scanner). To parse any string, call the parse method of your parser with the string as an argument.

All the action is in your grammar's action functions, as their name implies. Each action function receives as its single argument p a list of production elements corresponding to the production that has been matched to invoke that function; the action function's job is to put into p[0] whatever you need as "the result" of that syntax rule getting matched. In this recipe, we use as results the very strings we have been matching, so that function is_balanced just needs to check whether the whole string is matched by the parse operation.

When you run this script the first time, you will see a warning about a shift/reduce conflict. Don't worry: as any old hand at yacc can tell you, that's the yacc equivalent of a rite of passage. If you want to understand that message in depth, and maybe (if you're an ambitious person) even do something about it, open with your favorite browser the doc/ply.html file in the directory in which you unpacked PLY. That file contains a rather thorough documentation of PLY. As that file suggests, continue by studying the contents of the examples directory and then read a textbook about compilersI suggest Dick Grune and Ceriel J.H. Jacobs, "Parsing Techniques, a Practical Guide." The first edition, at the time of this writing, is freely available for download as a PDF file from http://www.cs.vu.nl/~dick/PTAPG.html, and a second edition should be available in technical bookstores around the middle of 2005.

See Also

PLY web page at http://systems.cs.uchicago.edu/ply/; Dick Grune and Ceriel J.H. Jacobs, "Parsing Techniques, a Practical Guide," a PDF, downloadable from http://www.cs.vu.nl/~dick/PTAPG.html.



Python Cookbook
Python Cookbook
ISBN: 0596007973
EAN: 2147483647
Year: 2004
Pages: 420

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net