The tokenize module splits a Python source file into individual tokens. It can be used for syntax highlighting or for various kinds of code-analysis tools.
In Example 13-17, we simply print the tokens.
File: tokenize-example-1.py

import tokenize

file = open("tokenize-example-1.py")

def handle_token(type, token, (srow, scol), (erow, ecol), line):
    print "%d,%d-%d,%d: %s %s" % (
        srow, scol, erow, ecol,
        tokenize.tok_name[type], repr(token)
        )

tokenize.tokenize(
    file.readline,
    handle_token
    )

1,0-1,6: NAME 'import'
1,7-1,15: NAME 'tokenize'
1,15-1,16: NEWLINE '\012'
2,0-2,1: NL '\012'
3,0-3,4: NAME 'file'
3,5-3,6: OP '='
3,7-3,11: NAME 'open'
3,11-3,12: OP '('
3,12-3,35: STRING '"tokenize-example-1.py"'
3,35-3,36: OP ')'
3,36-3,37: NEWLINE '\012'
...
Note that the tokenize function takes two callable objects: the first argument is called repeatedly to fetch the next line of source code, and the second is called once for each token.
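The same protocol can drive a simple code-analysis handler. Here's a minimal sketch (not one of the book's examples; the grab_comment name and the reuse of the example file are assumptions for illustration) that collects COMMENT tokens, a pseudo-token type that the tokenize module defines in addition to the standard types from the token module:

import tokenize

comments = []

def grab_comment(type, token, start, end, line):
    # sketch: start is a (row, column) tuple; keep the line
    # number and text of every comment token reported
    if type == tokenize.COMMENT:
        comments.append((start[0], token))

file = open("tokenize-example-1.py")
tokenize.tokenize(file.readline, grab_comment)

for lineno, comment in comments:
    print lineno, comment

Since the handler receives the token type on every call, any analysis that filters on type (names, strings, operators) follows the same pattern.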