B.3 A Data Set Example

For purposes of this appendix, let us start with a specific hypothetical data representation. Here is an easy-to-understand example. In the town of Greenfield, MA, the telephone prefixes are 772-, 773-, and 774-. (For non-USA readers: In the USA, local telephone numbers are seven digits and are conventionally represented in the form ###-####; prefixes are assigned in geographic blocks.) Suppose also that the first prefix is the mostly widely assigned of the three. The suffix portions might be any other digits, in fairly equal distribution. The data set we are interested in is "the list of all the telephone numbers currently in active use." One can imagine various reasons why this might be interesting for programmatic purposes, but we need not specify that herein.

Initially, the data set we are interested in comes in a particular data representation: a multicolumn report (perhaps generated as output of some query or compilation process). The first few lines of this report might look like:

 ============================================================= 772-7628     772-8601     772-0113     773-3429     774-9833 773-4319     774-3920     772-0893     772-9934     773-8923 773-1134     772-4930     772-9390     774-9992     772-2314 [...] 


Text Processing in Python
Text Processing in Python
ISBN: 0321112547
EAN: 2147483647
Year: 2005
Pages: 59
Authors: David Mertz

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net