Hour 19. Data Processing


What You'll Learn in This Hour:

  • How to determine what kind of data you're dealing with

  • Basic techniques for breaking down common data types in Perl

  • How to parse table data and XML

If you're old enough, the phrase "data processing" probably conjures up images of magazine ads from the 1970s proclaiming that you could "make big money as a Data Processor," or minicomputers with whirling tape drives from the 1980s.

But that's not really what we're talking about in this hour. This hour is about taking data from its presented form and making it more useful. This takes a few different stages.

First, you have to examine the raw data to see if it can be cut apart, sliced up, stretched, and massaged into the final form that you need. Usually the answer is obvious, but the step should not be skipped. If you've got a CD collection, is it possible to assemble a (partial) discography for each band in the collection? Sure. Is it possible to take that and assemble a telephone directory for your company? No, because the raw data you need just isn't in there.

Next you need to pick your tools to read the data, pull it apart, and reassemble it. Our tool of choice, of course, is Perl.

Finally, and this is the part that requires some creativity, examine the data to determine how to pull it apart. Should you cut it into vertical slices (columns)? Horizontal slices (rows)? Make new tables and manipulate those? Do you have to glue two different sources of data together?
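To make those questions concrete, here is a minimal sketch of each kind of cut. It assumes a made-up, pipe-delimited CD list in the form band|album|year; the records, the %home_town table, and every variable name are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical raw data: one "band|album|year" record per element.
my @records = (
    'Band A|First Album|1991',
    'Band A|Second Album|1993',
    'Band B|Debut|1995',
);

# Horizontal slice (rows): keep only the records for one band.
my @rows = grep { (split /\|/)[0] eq 'Band A' } @records;

# Vertical slice (column): pull the album title out of every record.
my @albums = map { (split /\|/)[1] } @records;

# New table: group the albums by band to form a small discography.
my %discography;
for my $record (@records) {
    my ($band, $album) = split /\|/, $record;
    push @{ $discography{$band} }, $album;
}

# Gluing two sources together: a second, hypothetical table keyed by band.
my %home_town = ('Band A' => 'Springfield', 'Band B' => 'Shelbyville');
my @joined = map {
    my ($band, $album, $year) = split /\|/;
    my $town = exists $home_town{$band} ? $home_town{$band} : 'unknown';
    join '|', $band, $album, $year, $town;
} @records;

print "$_\n" for @joined;
```

Real data is rarely this tidy, so treat the split on a single delimiter as an assumption that holds only when the delimiter never appears inside a field.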

This hour will show you some basic techniques for pulling apart your data and reassembling it into a useful form.

By the Way

For further reading, an entire book has been written on the subject of using Perl to manipulate table data, XML data, and parse unstructured data: Data Munging with Perl by David Cross.




Sams Teach Yourself Perl in 24 Hours (3rd Edition)
ISBN: 0672327937
Year: 2005
Pages: 241
