Pragmatic Programmers manipulate text the same way woodworkers shape wood. In previous sections we discussed some specific tools that we use: shells, editors, debuggers. These are similar to a woodworker's chisels, saws, and planes, tools specialized to do one or two jobs well. However, every now and then we need to perform some transformation not readily handled by the basic tool set. We need a general-purpose text manipulation tool.
Text manipulation languages are to programming what routers [8] are to woodworking. They are noisy, messy, and somewhat brute force. Make mistakes with them, and entire pieces can be ruined. Some people swear they have no place in the toolbox. But in the right hands, both routers and text manipulation languages can be incredibly powerful and versatile. You can quickly trim something into shape, make joints, and carve. Used properly, these tools have surprising finesse and subtlety. But they take time to master.
[8] Here router means the tool that spins cutting blades very, very fast, not a device for interconnecting networks.
There is a growing number of good text manipulation languages. Unix developers often like to use the power of their command shells, augmented with tools such as awk and sed. People who prefer a more structured tool like the object-oriented nature of Python [URL 9]. Some people use Tcl [URL 23] as their tool of choice. We happen to prefer Perl [URL 8] for hacking out short scripts.
These languages are important enabling technologies. Using them, you can quickly hack up utilities and prototype ideas: jobs that might take five or ten times as long using conventional languages. And that multiplying factor is crucially important to the kind of experimenting that we do. Spending 30 minutes trying out a crazy idea is a whole lot better than spending five hours. Spending a day automating important components of a project is acceptable; spending a week might not be. In their book The Practice of Programming [KP99], Kernighan and Pike built the same program in five different languages. The Perl version was the shortest (17 lines, compared with C's 150). With Perl you can manipulate text, interact with programs, talk over networks, drive Web pages, perform arbitrary precision arithmetic, and write programs that look like Snoopy swearing.
Tip 28
Learn a Text Manipulation Language
To show the wide-ranging applicability of text manipulation languages, here's a sample of some applications we've developed over the last few years.
Database schema maintenance. A set of Perl scripts took a plain text file containing a database schema definition and from it generated:
The SQL statements to create the database
Flat data files to populate a data dictionary
C code libraries to access the database
Scripts to check database integrity
Web pages containing schema descriptions and diagrams
An XML version of the schema
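The core of such a generator is small: parse the schema once, then walk the parsed structure to emit each output. The following is a minimal sketch in Python rather than Perl, and the plain-text schema format it reads (a `table <name>` line followed by indented column lines) is a hypothetical one invented for illustration. It produces only the first output listed, the SQL statements; the other generators would walk the same parsed structure.

```python
# Hypothetical plain-text schema format (an assumption for this sketch):
#   table <name>
#     <column> <sql type and constraints...>
SCHEMA = """\
table employee
  id integer primary key
  name varchar(40) not null
table department
  id integer primary key
  title varchar(80)
"""


def parse_schema(text):
    """Return a list of (table_name, [(column, definition), ...])."""
    tables = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("table "):
            tables.append((line.split()[1], []))
        else:
            # First word is the column name; the rest is its SQL definition.
            column, definition = line.split(None, 1)
            tables[-1][1].append((column, definition))
    return tables


def create_statements(tables):
    """Generate one CREATE TABLE statement per parsed table."""
    statements = []
    for name, columns in tables:
        body = ",\n".join(f"  {col} {defn}" for col, defn in columns)
        statements.append(f"CREATE TABLE {name} (\n{body}\n);")
    return statements


if __name__ == "__main__":
    for stmt in create_statements(parse_schema(SCHEMA)):
        print(stmt)
```

Keeping the parser separate from the emitters is what makes the single schema file the one source of truth: adding an XML or HTML output means adding another function over the same list of tables.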
Java property access. It is good OO programming style to restrict access to an object's properties, forcing external classes to get and set them via methods. However, in the common case where a property is represented inside the class by a simple member variable, creating a get and set method for each variable is tedious and mechanical. We have a Perl script that modifies the source files and inserts the correct method definitions for all appropriately flagged variables.
Test data generation. We had tens of thousands of records of test data, spread over several different files and formats, that needed to be knitted together and converted into a form suitable for loading into a relational database. Perl did it in a couple of hours (and in the process found a couple of consistency errors in the original data).
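Knitting records together usually means reading each format into a common structure, joining on a shared key, and reporting anything that fails to join. The Python sketch below shows that shape on two tiny hypothetical inputs: customers in CSV and orders as `key=value` blocks. Both formats and all field names are assumptions for illustration, not the data described above.

```python
import csv
import io

# Hypothetical inputs for this sketch: customers in CSV, orders as
# "key=value" lines with records separated by blank lines.
CUSTOMERS_CSV = """id,name
1,Acme Ltd
2,Widget Co
"""

ORDERS_TXT = """order=100
customer=1
amount=25.00

order=101
customer=3
amount=9.50
"""


def read_customers(text):
    """Map customer id to customer name."""
    return {row["id"]: row["name"] for row in csv.DictReader(io.StringIO(text))}


def read_orders(text):
    """Parse blank-line-separated blocks of key=value lines into dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        records.append(dict(line.split("=", 1) for line in block.splitlines()))
    return records


def knit(customers, orders):
    """Join orders to customers; collect orders that reference unknown customers."""
    rows, errors = [], []
    for order in orders:
        name = customers.get(order["customer"])
        if name is None:
            errors.append(f"order {order['order']}: unknown customer {order['customer']}")
        else:
            rows.append((order["order"], name, order["amount"]))
    return rows, errors


if __name__ == "__main__":
    rows, errors = knit(read_customers(CUSTOMERS_CSV), read_orders(ORDERS_TXT))
    writer = csv.writer(io.StringIO())
    for row in rows:
        print(",".join(row))  # a flat form suitable for a bulk database load
    for err in errors:
        print("ERROR:", err)
```

The consistency check falls out of the join for free: order 101 refers to a customer that does not exist, so it is reported rather than silently loaded.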
Book writing. We think it is important that any code presented in a book should have been tested first. Most of the code in this book has been. However, using the DRY principle (see The Evils of Duplication) we didn't want to copy and paste lines of code from the tested programs into the book. That would have meant that the code was duplicated, virtually guaranteeing that we'd forget to update an example when the corresponding program was changed. For some examples, we also didn't want to bore you with all the framework code needed to make our example compile and run. We turned to Perl. A relatively simple script is invoked when we format the book: it extracts a named segment of a source file, does syntax highlighting, and converts the result into the typesetting language we use.
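The extraction step can be sketched in a few lines of Python. The marker convention here (comment lines containing `START:<name>` and `END:<name>`) is a hypothetical one chosen for illustration; the syntax highlighting and typesetting conversion described above are not shown.

```python
import re


# Hypothetical marker convention for this sketch: a named segment sits between
# lines containing "START:<name>" and "END:<name>", typically inside comments.
def extract_segment(source, name):
    """Return the lines strictly between the START and END markers for `name`."""
    start = re.compile(rf"\bSTART:{re.escape(name)}\b")
    end = re.compile(rf"\bEND:{re.escape(name)}\b")
    lines, inside = [], False
    for line in source.splitlines():
        if not inside:
            inside = bool(start.search(line))
        elif end.search(line):
            return "\n".join(lines)
        else:
            lines.append(line)
    # Fail loudly so a renamed or deleted segment breaks the book build.
    raise ValueError(f"segment {name!r} not found or not terminated")


if __name__ == "__main__":
    program = "int main() {\n  // START:loop\n  for (;;) {}\n  // END:loop\n  return 0;\n}\n"
    print(extract_segment(program, "loop"))
```

Because the markers live in the tested program itself, the book always shows whatever code last compiled and ran; the framework code outside the markers never reaches the page.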
C to Object Pascal interface. A client had a team of developers writing Object Pascal on PCs. Their code needed to interface to a body of code written in C. We developed a short Perl script that parsed the C header files, extracting the definitions of all exported functions and the data structures they used. We then generated Object Pascal units with Pascal records for all the C structures, and imported procedure definitions for all the C functions. This generation process became part of the build, so that whenever the C header changed, a new Object Pascal unit would be constructed automatically.
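A toy version of the struct half of that translation, written in Python, shows the idea. It handles only simple `struct name { type field; ... };` declarations, and the C-to-Pascal type table is an assumed mapping for a few scalar types; it ignores pointers, arrays, typedefs, alignment, and the imported function declarations that a real generator would also need.

```python
import re

# Assumed mapping from a few C scalar types to Object Pascal types.
# Any other type raises KeyError, which is preferable to guessing.
TYPE_MAP = {
    "int": "Integer",
    "long": "LongInt",
    "short": "SmallInt",
    "char": "AnsiChar",
    "double": "Double",
    "float": "Single",
}

STRUCT = re.compile(r"struct\s+(\w+)\s*\{(.*?)\}\s*;", re.S)  # struct name { body };
MEMBER = re.compile(r"(\w+)\s+(\w+)\s*;")                    # type field;


def to_pascal_records(header_text):
    """Translate each simple C struct in a header into a Pascal record type."""
    records = []
    for name, body in STRUCT.findall(header_text):
        fields = [f"    {field}: {TYPE_MAP[ctype]};" for ctype, field in MEMBER.findall(body)]
        pascal_name = "T" + name[0].upper() + name[1:]
        records.append(f"  {pascal_name} = record\n" + "\n".join(fields) + "\n  end;")
    return "type\n" + "\n".join(records)


if __name__ == "__main__":
    print(to_pascal_records("struct point {\n  int x;\n  double y;\n};\n"))
```

Run as a build step, a generator like this regenerates the Pascal declarations whenever the C header changes, so the two sides cannot silently drift apart.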
Generating Web documentation. Many project teams are publishing their documentation to internal Web sites. We have written many Perl programs that analyze database schemas, C or C++ source files, makefiles, and other project sources to produce the required HTML documentation. We also use Perl to wrap the documents with standard headers and footers, and to transfer them to the Web site.
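The wrapping step is the simplest of these jobs. This Python sketch surrounds generated HTML fragments with a standard header and footer; the header and footer text and the use of the file name as the page title are hypothetical choices for illustration, and the transfer to the Web site is not shown.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical site-wide header and footer; $title is filled in per page.
HEADER = Template("<html>\n<head><title>$title</title></head>\n<body>\n")
FOOTER = "</body>\n</html>\n"


def wrap(title, body_html):
    """Surround an HTML fragment with the standard header and footer."""
    return HEADER.substitute(title=html.escape(title)) + body_html + FOOTER


def wrap_directory(src_dir, out_dir):
    """Wrap every .html fragment in src_dir, using the file stem as the title."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for fragment in sorted(Path(src_dir).glob("*.html")):
        page = wrap(fragment.stem, fragment.read_text(encoding="utf-8"))
        (out / fragment.name).write_text(page, encoding="utf-8")
```

Keeping the header and footer in one place means a change to the site's look is a one-line edit followed by a rerun, rather than a hand edit of every page.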
We use text manipulation languages almost every day. Many of the ideas in this book can be implemented more simply in them than in any other language of which we're aware. These languages make it easy to write code generators, which we'll look at next.