The tokenize module splits a Python source file into individual tokens. It can be used for syntax highlighting or for various kinds of code-analysis tools.
In Example 13-17, we simply print the tokens.
File: tokenize-example-1.py

import tokenize

file = open("tokenize-example-1.py")

def handle_token(type, token, (srow, scol), (erow, ecol), line):
    print "%d,%d-%d,%d: %s %s" % (
        srow, scol, erow, ecol,
        tokenize.tok_name[type], repr(token)
        )

tokenize.tokenize(
    file.readline,
    handle_token
    )

1,0-1,6: NAME 'import'
1,7-1,15: NAME 'tokenize'
1,15-1,16: NEWLINE '\012'
2,0-2,1: NL '\012'
3,0-3,4: NAME 'file'
3,5-3,6: OP '='
3,7-3,11: NAME 'open'
3,11-3,12: OP '('
3,12-3,35: STRING '"tokenize-example-1.py"'
3,35-3,36: OP ')'
3,36-3,37: NEWLINE '\012'
...
Note that the tokenize function takes two callable objects: the first argument is called repeatedly to fetch the next line of source code, and the second is called once for each token.
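The same protocol can drive a simple code-analysis handler. Here's a minimal sketch (not one of the book's examples; the grab_comment name and the reuse of the example file are assumptions for illustration) that collects COMMENT tokens, a pseudo-token type that the tokenize module defines in addition to the standard types from the token module:

import tokenize

comments = []

def grab_comment(type, token, start, end, line):
    # sketch: start is a (row, column) tuple; keep the line
    # number and text of every comment token reported
    if type == tokenize.COMMENT:
        comments.append((start[0], token))

file = open("tokenize-example-1.py")
tokenize.tokenize(file.readline, grab_comment)

for lineno, comment in comments:
    print lineno, comment

Since the handler receives the token type on every call, any analysis that filters on type (names, strings, operators) follows the same pattern.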