As Pragmatic Programmers, our base material isn't wood or iron, it's knowledge. We gather requirements as knowledge, and then express that knowledge in our designs,
Plain text is made up of printable
Field19=467abe
The reader has no idea what the significance of 467abe may be. A better choice would be to make it understandable to humans:
DrawingType=UMLActivityDrawing
Plain text doesn't mean that the text is unstructured; XML, SGML, and HTML are great examples of plain text that has a
Plain text tends to be at a higher level than a straight binary encoding, which is usually derived directly from the implementation. Suppose you wanted to store a property called uses_menus that can be either TRUE or FALSE. Using text, you might write this as
myprop.uses_menus=FALSE
Contrast this with 0010010101110101.
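To make the contrast concrete, here is a minimal Python sketch that reads the same fact from both forms. The bit position used for the binary case (`USES_MENUS_BIT`) is purely an assumption for illustration; nothing in the bit string itself says which bit, if any, holds the property, and that is exactly the point.

```python
import struct

# Plain text: the name travels with the value, so the record explains itself.
text_record = "myprop.uses_menus=FALSE"
key, _, value = text_record.partition("=")
uses_menus = value.strip().upper() == "TRUE"

# Binary: the same fact hidden in one bit of a 16-bit word.
# ASSUMPTION (hypothetical layout): bit 3 holds uses_menus.
# Without separate documentation, a reader could not know this.
USES_MENUS_BIT = 3
packed = struct.pack(">H", 0b0010010101110101)
(word,) = struct.unpack(">H", packed)
uses_menus_from_bits = bool(word & (1 << USES_MENUS_BIT))

print(key, uses_menus)
print(packed.hex(), uses_menus_from_bits)
```

The text version can be read and edited with nothing but an editor. The binary version needs the layout knowledge embedded in `USES_MENUS_BIT`, which lives in the program rather than in the data.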
The problem with most binary formats is that the context necessary to understand the data is separate from the data itself. You are artificially divorcing the data from its meaning. The data may as well be encrypted; it is
Tip 20
Keep Knowledge in Plain Text
There are two major drawbacks to using plain text: (1) It may take more space to store than a compressed binary format, and (2) it may be
Depending on your application, either or both of these situations may be unacceptable, for example when storing satellite telemetry data, or as the internal format of a relational database.
But even in these situations, it may be acceptable to store metadata about the raw data in plain text (see Metaprogramming).
Some developers may worry that by
[1] MD5 is often used for this purpose. For an excellent introduction to the wonderful world of cryptography, see [Sch95].
Since larger and slower aren't the most frequently
Insurance against obsolescence
Leverage
Easier testing
Human-readable forms of data, and self-describing data, will outlive all other forms of data and the applications that created them. Period.
As long as the data survives, you will have a chance to be able to use it.
You can parse such a file with only partial knowledge of its format; with most binary files, you must know all the details of the entire format in order to parse it successfully.
Consider a data file from some legacy system [2] that you are given. You know little about the original application; all that's important to you is that it
[2] All software becomes legacy as soon as it's written.
<FIELD10>123-45-6789</FIELD10>
...
<FIELD10>567-89-0123</FIELD10>
...
<FIELD10>901-23-4567</FIELD10>
Recognizing the format of a Social Security number, you can quickly write a small program to extract that data, even if you have no information on anything else in the file.
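Such a small program might look like the following Python sketch. It assumes only that the numbers appear in the familiar ddd-dd-dddd form; the function name `extract_ssns` and the sample string are ours, and the code deliberately knows nothing about the tags or the rest of the file.

```python
import re

# Three digits, two digits, four digits, separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def extract_ssns(text):
    """Return every SSN-shaped string found in text, in order of appearance."""
    return SSN_PATTERN.findall(text)

sample = """<FIELD10>123-45-6789</FIELD10>
...
<FIELD10>567-89-0123</FIELD10>
...
<FIELD10>901-23-4567</FIELD10>
"""

print(extract_ssns(sample))
```

Because the pattern keys on the shape of the data rather than on the file format, it keeps working even if the surrounding fields change.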
But imagine if the file had been formatted this way instead:
AC27123456789B11P
...
XY43567890123QTYL
...
6T2190123456788AM
You may not have recognized the significance of the numbers quite as easily. This is the difference between human readable and human understandable.
While we're at it, FIELD10 doesn't help much either. Something like
<SSNO>123-45-6789</SSNO>
makes the exercise a no-brainer, and ensures that the data will outlive any project that created it.
Virtually every tool in the computing universe, from source code management systems to compiler environments to editors and stand-alone filters, can
The Unix Philosophy
Unix is famous for being designed around the philosophy of small, sharp tools, each intended to do one thing well. This philosophy is enabled by using a common underlying format: the line-oriented, plain text file. Databases used for system administration (users and passwords, networking configuration, and so on) are all kept as plain text files. (Some systems, such as Solaris, also maintain a binary form of certain databases as a performance optimization. The plain text version is kept as an interface to the binary version.) When a system crashes, you may be faced with only a minimal environment to restore it (you may not be able to access graphics drivers, for instance). Situations such as this can really make you appreciate the simplicity of plain text.
For instance, suppose you have a production deployment of a large application with a complex site-specific configuration file (sendmail comes to mind). If this file is in plain text, you could place it under a source code control system (see Source Code Control), so that you automatically keep a history of all changes. File comparison tools such as diff and fc allow you to see at a glance what changes have been made, while sum allows you to generate a checksum to monitor the file for accidental (or malicious) modification.
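The same leverage is available from a few lines of script. The Python sketch below stands in for diff and sum using the standard difflib and hashlib modules; it uses SHA-256 rather than the checksum that sum computes, and the configuration keys and file labels shown are invented for illustration.

```python
import difflib
import hashlib

def config_diff(old_text, new_text):
    """Return a unified diff of two configuration texts (empty if identical)."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="config.old", tofile="config.new"
        )
    )

def config_checksum(text):
    """Return a SHA-256 hex digest that changes if the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Hypothetical configuration contents, not taken from any real application.
old = "relay_host=mail.example.com\nmax_message_size=10485760\n"
new = "relay_host=mail.example.com\nmax_message_size=20971520\n"

print(config_diff(old, new))
print("modified:", config_checksum(old) != config_checksum(new))
```

None of this requires knowing what the settings mean. The tools work on lines of text, so they work on any plain text file.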
If you use plain text to create synthetic data to drive system tests, then it is a simple matter to add, update, or modify the test data without having to create any special tools to do so.
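As a rough illustration, the Python sketch below drives a test from a block of plain text. The "input | expected" layout, the `parse_cases` helper, and the `normalize_phone` function under test are all hypothetical; the point is only that adding a test case means adding a line of text.

```python
def parse_cases(text):
    """Parse 'input | expected' lines, skipping blank lines and # comments."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        raw_input, expected = (part.strip() for part in line.split("|"))
        cases.append((raw_input, expected))
    return cases

def normalize_phone(raw):
    """Hypothetical function under test: format ten digits as ddd-ddd-dddd."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

# Synthetic test data kept as plain text; edit it with any editor.
TEST_DATA = """
# raw input        | expected
(919) 555-0100     | 919-555-0100
919.555.0101       | 919-555-0101
9195550102         | 919-555-0102
"""

failures = [
    (raw, want, normalize_phone(raw))
    for raw, want in parse_cases(TEST_DATA)
    if normalize_phone(raw) != want
]
print("failures:", failures)
```

In practice the test data would more likely live in its own file under source code control, where the same diff tools apply to it.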
Similarly, plain text output from regression tests can be trivially
Even in the future of XML-based
Source Code Control
Code Generators
Metaprogramming
Blackboards
Ubiquitous Automation
It's All Writing
Design a small address book database (
Translate that format into a plain text format using XML.
For each version, add a new, variable-length field called directions in which you might enter directions to each person's house.
What issues come up regarding versioning and extensibility? Which form was easier to modify? What about converting existing data?