As an example of this process, I'm going to use U.S. federal government budget authorization data, which the Office of Management and Budget (OMB) publishes in a variety of equivalent flat formats, even though the data itself is relatively unflat. This is a good example of the sort of legacy data developers often have to deal with. The complete document [http://w3.access.gpo.gov/usbudget/fFY2002/db.html] consists of 3,185 line items. Each line item consists of 43 separate fields. In the comma-separated values (CSV) version of the file, a typical line item looks like this: "418","National Endowment for the Humanities","00","National Endow- ment for the Humanities","0200","National Endowment for the Humani- ties: grants and administration","59","503","Research and general education aids","Discretionary","On-budget", 0, 0, 0, 121275, 145231, 150100, 151299, 130560, 135447, 140118, 139478, 132582, 138890, 140435, 153000, 156910, 170002, 175955, 177413, 177491, 172000, 110000, 110000, 111000, 112000, 115000, 120000, 121000, 124000, 126000, 129000, 132000 Each field is separated from the following field by a comma. Strings are enclosed in double quotes, and may contain commas that do not delimit fields. Dollar amounts are expressed as integers divided by 1,000. That is, the last value in the above line is 132 million dollars, not 132 thousand dollars. Table 4.1 identifies the 43 separate fields. Table 4.1. Public Budget Database Field Descriptions Field Number | Field Name | Description | 1 | Agency code | A unique, three-digit , numerical code for the cabinet department or independent agency. | 2 | Agency name | The name of the cabinet-level department (e.g., Department of Defense) or independent agency (e.g., Peace Corps). Even though only the executive branch has true agencies, for purposes of the budget, offices within the legislative branch are given the agency name "Legislative Branch" and offices within the judicial branch are given the agency name "Judicial Branch." Agency names have a maximum of 89 characters . | 3 | Bureau code | A two-digit numerical code for the bureau within the cabinet department or independent agency; bureau codes are unique only within an agency. | 4 | Bureau name | The name of the suboffice within the agency (e.g., Coast Guard or Federal Aviation Administration). Budget amounts for the agency as a whole that are not part of a specific bureau are generally categorized under a fictional bureau with the same name as the agency. Bureau names have a maximum of 89 characters. | 5 | Account code | A four-digit code (outlays) or six-digit code (offsetting receipts) for the account within the bureau. | 6 | Account name | The name of the budgeted function within the bureau. Account names have a maximum of 160 characters. | 7 | Treasury Agency code | A two-digit numerical code for the agency, assigned by the Treasury Department. | 8 | Subfunction code | A three-digit numerical code for the subfunction within an account. | 9 | Subfunction title | The name of the subfunction within the account. Subfunction titles have a maximum of 72 characters. | 10 | BEA category | Budget Enforcement Act category: "Mandatory," "Discretionary," or "Net interest." | 11 | On- and off-budget indicator | "On-budget" or "Off-budget"; Social Security trust funds and the Postal Service are off-budget; all other accounts are on-budget. | 12 | 1976 value | Actual amounts, in thousands of dollars, for fiscal year (FY) 1976. Budget authority is usually shown as a positive value. Offsetting receipts are usually negative values. | 13 | TQ value | Actual amount, in thousands of dollars, for the "transitional quarter" in 1976 when the government shifted the start of its fiscal year ahead from July to October. | 1437 | 19772000 | Actual amounts, in thousands of dollars, for each fiscal year from 1977 to 2000. | 38end | 20012006 | Estimated amounts, in thousands of dollars, for FY 2001 through FY 2006. | Source: Adapted from U.S. Office of Management and Budget. Budget Analysis Branch. Public Budget Database User's Guide: Budget of the United States Government, Fiscal Year 2002 [http://a257.gakamaitech.net/7/257/2422/09apr20010800/www.gpo.gov/usbudget/FY2002/pdf/db_guide.pdf], April 9, 2001, 8. | |