Credit: Jürgen Hermann

15.2.1 Problem

You need to convert Python source code into HTML markup, rendering comments, keywords, operators, and numeric and string literals in different colors.

15.2.2 Solution

tokenize.tokenize does most of the work and calls us back for each token found, so we can output it with appropriate colorization:

    """ MoinMoin - Python Source Parser """
    import cgi, string, sys, cStringIO
    import keyword, token, tokenize

    # Python Source Parser (does highlighting into HTML)

    _KEYWORD = token.NT_OFFSET + 1
    _TEXT    = token.NT_OFFSET + 2

    _colors = {
        token.NUMBER:       '#0080C0',
        token.OP:           '#0000C0',
        token.STRING:       '#004080',
        tokenize.COMMENT:   '#008000',
        token.NAME:         '#000000',
        token.ERRORTOKEN:   '#FF8080',
        _KEYWORD:           '#C00000',
        _TEXT:              '#000000',
    }

    class Parser:
        """ Send colorized Python source as HTML to an output file
            (normally stdout).
        """

        def __init__(self, raw, out = sys.stdout):
            """ Store the source text. """
            self.raw = string.strip(string.expandtabs(raw))
            self.out = out

        def format(self):
            """ Parse and send the colorized source to output. """
            # Store line offsets in self.lines
            self.lines = [0, 0]
            pos = 0
            while 1:
                pos = string.find(self.raw, '\n', pos) + 1
                if not pos: break
                self.lines.append(pos)
            self.lines.append(len(self.raw))

            # Parse the source and write it
            self.pos = 0
            text = cStringIO.StringIO(self.raw)
            self.out.write('<pre><font face="Lucida,Courier New">')
            try:
                tokenize.tokenize(text.readline, self)  # self as handler callable
            except tokenize.TokenError, ex:
                msg = ex[0]
                line = ex[1][0]
                self.out.write("<h3>ERROR: %s</h3>%s\n" % (
                    msg, self.raw[self.lines[line]:]))
            self.out.write('</font></pre>')

        def __call__(self, toktype, toktext, (srow,scol), (erow,ecol), line):
            """ Token handler """
            if 0:  # You may enable this for debugging purposes only
                print "type", toktype, token.tok_name[toktype], "text", toktext,
                print "start", srow,scol, "end", erow,ecol, "<br>"

            # Calculate new positions
            oldpos = self.pos
            newpos = self.lines[srow] + scol
            self.pos = newpos + len(toktext)

            # Handle newlines
            if toktype in [token.NEWLINE, tokenize.NL]:
                self.out.write('\n')
                return

            # Send the original whitespace, if needed
            if newpos > oldpos:
                self.out.write(self.raw[oldpos:newpos])

            # Skip indenting tokens
            if toktype in [token.INDENT, token.DEDENT]:
                self.pos = newpos
                return

            # Map token type to a color group
            if token.LPAR <= toktype <= token.OP:
                toktype = token.OP
            elif toktype == token.NAME and keyword.iskeyword(toktext):
                toktype = _KEYWORD
            color = _colors.get(toktype, _colors[_TEXT])

            style = ''
            if toktype == token.ERRORTOKEN:
                style = ' style="border: solid 1.5pt #FF0000;"'

            # Send text
            self.out.write('<font color="%s"%s>' % (color, style))
            self.out.write(cgi.escape(toktext))
            self.out.write('</font>')

    if __name__ == "__main__":
        import os, sys
        print "Formatting..."

        # Open own source
        source = open('python.py').read()

        # Write colorized version to "python.html"
        Parser(source, open('python.html', 'wt')).format()

        # Load HTML page into browser
        if os.name == "nt":
            os.system("explorer python.html")
        else:
            os.system("netscape python.html &")

15.2.3 Discussion

This code is part of MoinMoin (see http://moin.sourceforge.net/) and shows how to use the built-in keyword, token, and tokenize modules to scan Python source code and re-emit it with appropriate color markup but no changes to its original formatting ("no changes" is the hard part!).

The Parser class's constructor saves the multiline string that is the Python source to colorize and the file object, open for writing, to which the colorized results are sent. The format method then prepares a self.lines list that holds the offset (the index into the source string, self.raw) of each line's start, and calls tokenize.tokenize, passing self as the callback. Thus, the __call__ method is invoked for each token, with arguments specifying the token type, the token text, and the starting and ending positions in the source (each expressed as a line number and an offset within the line).

The body of the __call__ method reconstructs the exact position within the original source code string self.raw, so it can emit exactly the same whitespace that was present in the original source. It then picks a color code from the _colors dictionary (which uses HTML color coding), with help from the keyword standard module to determine if a NAME token is actually a Python keyword (to be emitted in a different color than that used for ordinary identifiers).

The test code at the bottom of the module formats the module itself and launches a browser with the result. To remain compatible with stone-age versions of Python, it does not use the standard Python module webbrowser.
If you have no such worries, you can change the last few lines of the recipe to:

    # Load HTML page into browser
    import webbrowser
    webbrowser.open("python.html", 0, 1)

and enjoy the result in your favorite browser.

15.2.4 See Also

Documentation for the webbrowser, token, tokenize, and keyword modules in the Library Reference; the colorizer is available at http://purl.net/wiki/python/MoinMoinColorizer, as part of MoinMoin (http://moin.sourceforge.net).
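
The recipe as printed targets old Python 2 releases: it relies on the callback form of tokenize.tokenize, tuple parameters in the signature of the token handler, cStringIO, and cgi.escape, none of which exist in current Python 3. The offset bookkeeping described in the Discussion can still be sketched there. The following is a minimal sketch, not MoinMoin's code: the names COLORS, classify, and colorize are hypothetical, the palette only borrows the recipe's color values, and span tags with inline styles stand in for the recipe's font tags. It assumes the source tokenizes cleanly; errors raised by the tokenizer are not handled.

```python
import html
import io
import keyword
import token
import tokenize

# Hypothetical palette for this sketch; the values mirror the recipe's _colors.
COLORS = {
    "keyword": "#C00000",
    "comment": "#008000",
    "string": "#004080",
    "number": "#0080C0",
    "op": "#0000C0",
}


def classify(tok):
    """Return a key of COLORS for a token, or None to leave it as plain text."""
    if tok.type == token.NAME and keyword.iskeyword(tok.string):
        return "keyword"
    if tok.type == tokenize.COMMENT:
        return "comment"
    if tok.type == token.STRING:
        return "string"
    if tok.type == token.NUMBER:
        return "number"
    if tok.type == token.OP:
        return "op"
    return None


def colorize(source):
    """Return colorized HTML for source, preserving its original whitespace."""
    # offsets[row] is the index in source at which 1-based line `row` starts,
    # playing the role of self.lines in the recipe.
    offsets = [0, 0]
    for line in io.StringIO(source):
        offsets.append(offsets[-1] + len(line))

    out = ["<pre>"]
    pos = 0  # index of the first character of source not yet written
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        kind = classify(tok)
        if kind is None:
            continue  # plain text is written later as part of a gap
        start = offsets[tok.start[0]] + tok.start[1]
        end = offsets[tok.end[0]] + tok.end[1]
        # Write the untouched text between the previous colored token and this one.
        out.append(html.escape(source[pos:start]))
        out.append('<span style="color:%s">%s</span>'
                   % (COLORS[kind], html.escape(source[start:end])))
        pos = end
    out.append(html.escape(source[pos:]))  # whatever follows the last colored token
    out.append("</pre>")
    return "".join(out)
```

Calling colorize on a source string returns the markup as a single string, which can then be written to a file and opened with webbrowser.open. Unlike the recipe, this sketch does not color ordinary identifiers or error tokens individually; anything it does not classify is copied through verbatim with the surrounding whitespace, which keeps the "no changes to the original formatting" property without special cases for NEWLINE, INDENT, and DEDENT tokens.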