The tokenize Module

The tokenize module splits a Python source file into individual tokens. It can be used for syntax highlighting or for various kinds of code-analysis tools.

In Example 13-17, we simply print the tokens.

Example 13-17. Using the tokenize Module

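import tokenize

file = open("")

def handle_token(type, token, (srow, scol), (erow, ecol), line):
    # print the token's start and end positions, its type name, and its text
    print "%d,%d-%d,%d:\t%s\t%s" % \
        (srow, scol, erow, ecol, tokenize.tok_name[type], repr(token))

# file.readline supplies source lines; handle_token is called for each token
tokenize.tokenize(file.readline, handle_token)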


1,0-1,6: NAME 'import'
1,7-1,15: NAME 'tokenize'
1,15-1,16: NEWLINE '\012'
2,0-2,1: NL '\012'
3,0-3,4: NAME 'file'
3,5-3,6: OP '='
3,7-3,11: NAME 'open'
3,11-3,12: OP '('
3,12-3,35: STRING '""'
3,35-3,36: OP ')'
3,36-3,37: NEWLINE '\012'

Note that the tokenize function takes two callable objects: the first argument is called repeatedly to fetch new code lines, and the second argument is called for each token.
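
The callback interface described above belongs to the Python version this book covers. As a rough sketch for readers on Python 3, where tokenize.generate_tokens takes only the line-fetching callable and returns the tokens as an iterator, the same listing can be produced as follows. The helper name list_tokens and the use of io.StringIO as the line source are assumptions made for illustration; they are not part of the original example.

```python
import io
import tokenize

def list_tokens(source):
    """Return (start, end, type name, text) for each token in a source string."""
    # readline is called repeatedly by the tokenizer to fetch new code lines
    readline = io.StringIO(source).readline
    result = []
    for tok in tokenize.generate_tokens(readline):
        # tok.start and tok.end are (row, column) pairs
        result.append((tok.start, tok.end, tokenize.tok_name[tok.type], tok.string))
    return result

if __name__ == "__main__":
    for start, end, name, text in list_tokens("import tokenize\n"):
        # start + end concatenates the two pairs into (srow, scol, erow, ecol)
        print("%d,%d-%d,%d:\t%s\t%r" % (start + end + (name, text)))
```

Here the loop body plays the role of the second callable: instead of the tokenizer calling a handler for each token, the caller iterates over the tokens and does the handling itself.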

Python Standard Library (Nutshell Handbooks)
ISBN: 0596000960
Year: 2000
Pages: 252
Authors: Fredrik Lundh