To give this chapter some context, I'm going to invent a fictional company, WidgetCo, and you are going to take on the role of programmer. WidgetCo is a small (10-employee) shop that sells retail replacement parts for antique bicycles. WidgetCo sells primarily through catalogs to bicycle enthusiasts and repair shops. Your boss, Mr. Widget, has decided that the company will now take orders over the Internet, and he would like you, as Chief Programmer, to streamline the process. Initially, orders will arrive as email sent by someone else's web site. (The exact mechanism for taking such orders over the web will be covered in Hours 21 through 24.) For now, your concern is dealing with incoming mail messages. So what can you expect to see?

Unstructured Data

One kind of data representation is called unstructured. For example, the web site could simply have users type what they want in a text box and mail the results to you. What you'll likely see is something like the following mail message:
This message contains all the relevant information, but there's no realistic hope of having a computer program dissect it. A human still has to read the message and re-key the information into your ordering system. Unstructured data is not suitable for any kind of automation.

Table Data

Most likely, the web site will be designed to send data in a structured form. The most common representation is a table. Your email order might arrive and look like the following:
This is more typical of the kind of data that would be sent by a web application (or any other kind of program). If you've ordered products from an online retailer, you've probably received a confirmation email that looked very much like this. It's still human-readable, but it's structured so that software can process it easily.

Hierarchical Data

Hierarchical data is structured, like table data, but its items are organized into parent-child (container) relationships. The Table of Contents of this book is an example of hierarchical data. An individual line in the Table of Contents isn't really meaningful unless you know its context. There are 24 entries called "Summary," and each one is meaningful only if you know which chapter it belongs to. The Index of this book is also hierarchical. There are at least a half-dozen entries for the word "arrays," sometimes as a minor topic (data types/arrays, references/arrays, and so on) and once as a major topic. Which entry applies depends on the context in which arrays are being discussed. You'll return to hierarchical data shortly, specifically to XML, a method of representing it.

Binary Data

The last category of data is binary data. It's usually unreadable by humans: loading a binary file into your text editor produces gibberish. Binary data is useful for interchanging data between programs when space and speed are concerns and human readability is not. Normally, to decode binary data, you need a description of its layout from whoever is supplying the data. Binary data is just as structured as the table data in Figure 19.1; the structure simply isn't apparent. JPEG images, ZIP files, and MP3 music files are all examples of binary data.

Figure 19.1. Marked-up incoming order
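To make the point about layouts concrete, here is a minimal Python sketch using the standard `struct` module. The record layout (a 32-bit part number, a 16-bit quantity, and a 32-bit unit price in cents, little-endian) is a hypothetical format invented purely for illustration; it is not a WidgetCo format or one described in this chapter. The helper names `encode_order_line` and `decode_order_line` are likewise hypothetical. The ten bytes it produces are compact, but they only make sense if you know the layout.

```python
import struct

# Hypothetical layout for one order line (an illustrative assumption, not a real format):
#   "<"  little-endian byte order, no padding between fields
#   "I"  part number, unsigned 32-bit integer
#   "H"  quantity, unsigned 16-bit integer
#   "I"  unit price in cents, unsigned 32-bit integer
ORDER_LINE = struct.Struct("<IHI")


def encode_order_line(part_number: int, quantity: int, price_cents: int) -> bytes:
    """Pack one order line into ORDER_LINE.size (10) bytes."""
    return ORDER_LINE.pack(part_number, quantity, price_cents)


def decode_order_line(data: bytes) -> dict:
    """Unpack bytes back into named fields; this only works if you know the layout."""
    part_number, quantity, price_cents = ORDER_LINE.unpack(data)
    return {"part_number": part_number, "quantity": quantity, "price_cents": price_cents}


raw = encode_order_line(1234, 2, 1995)
print(len(raw))                # 10
print(raw.hex())               # d20400000200cb070000
print(decode_order_line(raw))  # {'part_number': 1234, 'quantity': 2, 'price_cents': 1995}

# Reading the same bytes with the wrong layout (big-endian) gives nonsense values.
print(struct.unpack(">IHI", raw))
```

The last line reads the same ten bytes as big-endian and gets numbers that have nothing to do with the original order, which is why the layout description has to accompany binary data.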