3 4
When you import data from text files, especially files generated from mainframe databases, the data often needs some processing so that it can be appropriately formatted for use in Access databases. Mainframe databases often lack features (such as input masks in Access tables) that prevent data entry errors and enforce consistency. Also, Access fields have more (or different) data type selections than mainframe databases, so some fields might need to be converted to a different data type, such as dates stored in a mainframe text field.
If you regularly download data to update your files and therefore routinely import text files of identical structure, the best way to massage that data is to import it without attempting to fix it in the Import dialog box (or in the saved spec). Then use a query (or perhaps several queries, depending on how much reworking the data needs) to reformat the data, appending cleaned-up data to another table. Let’s look at an example.
The table of raw imported data shown in Figure 16-33 is typical of data downloaded from mainframe databases.
Figure 16-33. This table of raw data imported from a mainframe has formatting problems in several fields.
This table has several formatting problems that need to be dealt with before you store the data in an Access table, as follows:
You can clean up the raw data by taking steps similar to the following. (The Stage1 table and the query that fills it won’t be needed if the raw table doesn’t need so much cleanup.) You’ll find the tables and queries listed in these steps in the Link and Import.mdb database on the companion CD.
Parsing out a name field containing up to five components is too complex for a single query. If you need to split a name into that many components, the name parsing should be done in two stages. The first stage (qappCurrentEmployeesStage1) parses out the LastName, FirstNamePlus, and MiddleNamePlus fields; parses the extension out from the BUSPHONE field; and converts the HIREDATE field to a Date field. The output fields are written to tblCurrentEmployeesStage1. The expressions used to parse and format fields in qappCurrentEmployeesStage1 are shown here:
FormattedSSN: Format([SSN],"@@@-@@-@@@@") LastName: Mid([EMPLOYEENAME],1,InStr([EMPLOYEENAME],Chr$(44))-1) FirstNamePlus: Mid([EMPLOYEENAME],InStr([EMPLOYEENAME],Chr$(44))+2) MiddleNamePlus: IIf(InStr([FirstNamePlus],Chr$(32))>0, Mid([FirstNamePlus],InStr([FirstNamePlus],Chr$(32))+1),"") BusinessPhone: IIf(InStr(UCase([BUSPHONE]), Chr$(88))-1>0,Mid([BUSPHONE],1,InStr(UCase([BUSPHONE]), Chr$(88))-1),[BUSPHONE]) Extension: IIf(InStr(UCase([BUSPHONE]), Chr$(88))-1>0,Mid([BUSPHONE],InStr(UCase([BUSPHONE]), Chr$(88))+1),"") FormattedHireDate: CDate(Format([HireDate],"@@\/@@\/@@"))
The second query, qappCurrentEmployees, parses out the FirstName and MiddleName fields, converts all name fields to the proper case (first letter capitalized), and formats the phone number fields with dashes. The expressions used to perform these conversions are shown here:
LastNameFirstFormatted: StrConv([LastNameFirst],3) MiddleName: StrConv(IIf(InStr([MiddleNamePlus],Chr$(32))>0, Mid([MiddleNamePlus],1,InStr([MiddleNamePlus],Chr$(32))-1), [MiddleNamePlus]),3) Suffix: StrConv(IIf(InStr([MiddleNamePlus],Chr$(32))>0, Mid([MiddleNamePlus],InStr([MiddleNamePlus],Chr$(32))+1),""),3) LastNameFormatted: StrConv([LastName],3) FirstName: StrConv(IIf(InStr([FirstNamePlus],Chr$(32))>0, Mid([FirstNamePlus],1,InStr([FirstNamePlus],Chr$(32))-1), [FirstNamePlus]),3) BusinessPhoneFormatted: Format([BusinessPhone],"@@@-@@@-@@@@") HomePhoneFormatted: Format([HomePhone],"@@@-@@@-@@@@")
Figure 16-34 shows the final table with data properly parsed and formatted.
Figure 16-34. The final table, with data properly parsed and formatted, looks like this.
A sample database (Split and Concatenate 2000) with queries that use various functions to split and concatenate name and address data can be downloaded from the Code Samples page of my Web site, at http://www.helenfeddema.com; it’s Code Sample #37 in the Access section. This database is also available on the companion CD.
The expressions in the previous queries (qappCurrentEmployeesStage1 and qappCurrentEmployees) use a number of functions to format the data. The following functions are useful for parsing and formatting data:
After you prepare and test the preformatted tables and the append queries, the final touch is to create a macro or function to run the necessary actions in the correct order. You can run this macro or function whenever you download a new text file from the mainframe. The following function does the import from the text file and then copies the backup tables to tblCurrentEmployeesStage1 and tblCurrentEmployees and runs the two append queries to fill tblCurrentEmployees with properly formatted data. To make this function easier to use, you could run it from a macro or place it on a toolbar as a command button. (See Chapter 13, "Customizing Access Toolbars and Menus (CommandBars)," for more information.)
note
Function ImportEmployeeData() On Error Resume Next 'Turn warnings off. DoCmd.SetWarnings False 'Delete old tables. DoCmd.DeleteObject objecttype:=acTable, _ objectname:="tblCurrentEmployeesRaw" DoCmd.DeleteObject objecttype:=acTable, _ objectname:="tblCurrentEmployeesStage1" DoCmd.DeleteObject objecttype:=acTable, _ objectname:="tblCurrentEmployees" 'Import data from text file using a saved import spec. DoCmd.TransferText transfertype:=acImportDelim, _ specificationname:="CurEmp Import Specification", _ tablename:="tblCurrentEmployeesRaw", _ filename:="D:\Documents\CurEmp.txt", _ hasfieldnames:=True 'Copy blank tables to be filled with data from 'the imported table. DoCmd.CopyObject , newname:="tblCurrentEmployeesStage1", _ sourceobjecttype:=acTable, _ sourceobjectname:="zstblCurrentEmployeesStage1" DoCmd.CopyObject , newname:="tblCurrentEmployees", _ sourceobjecttype:=acTable, _ sourceobjectname:="zstblCurrentEmployees" 'Run queries to parse and format imported data. DoCmd.OpenQuery "qappCurrentEmployeesStage1" DoCmd.OpenQuery "qappCurrentEmployees" End Function