Using Queries to Reformat Imported Mainframe Data

3 4

When you import data from text files, especially files generated from mainframe databases, the data often needs some processing so that it can be appropriately formatted for use in Access databases. Mainframe databases often lack features (such as input masks in Access tables) that prevent data entry errors and enforce consistency. Also, Access fields have more (or different) data type selections than mainframe databases, so some fields might need to be converted to a different data type, such as dates stored in a mainframe text field.

If you regularly download data to update your files and therefore routinely import text files of identical structure, the best way to massage that data is to import it without attempting to fix it in the Import dialog box (or in the saved spec). Then use a query (or perhaps several queries, depending on how much reworking the data needs) to reformat the data, appending cleaned-up data to another table. Let’s look at an example.

The table of raw imported data shown in Figure 16-33 is typical of data downloaded from mainframe databases.

figure 16-33. this table of raw data imported from a mainframe has formatting problems in several fields.

Figure 16-33. This table of raw data imported from a mainframe has formatting problems in several fields.

This table has several formatting problems that need to be dealt with before you store the data in an Access table, as follows:

  • The SSN field is a numeric field. It should be a text field, and it needs dashes in the appropriate places.
  • The EMPLOYEENAME field contains the employee’s entire name. It needs to be broken up into Last Name, First Name, Middle Name, and Suffix fields. This field is also all caps; it needs to be converted to initial caps.
  • The BUSPHONE field needs dashes in the phone number and also needs to have the extension parsed out into a separate field.
  • The HOMEPHONE field needs dashes in the phone number.
  • The HIREDATE field is a text field. It needs to be converted to a Date field.

You can clean up the raw data by taking steps similar to the following. (The Stage1 table and the query that fills it won’t be needed if the raw table doesn’t need so much cleanup.) You’ll find the tables and queries listed in these steps in the Link and Import.mdb database on the companion CD.

  1. Create two empty tables to receive the imported data (or just one, if the cleanup is simple) with fields set up with the appropriate Access data types. Copy these tables to make new tables every time the data is downloaded. In this example, the tables to be copied are named zstblCurrentEmployeesStage1 and zstblCurrentEmployees. The zs prefix indicates a system table.
  2. Run two update queries (or one, if you have cleaner data) to parse and reformat the data as needed. The query qappCurrentEmployeesStage1 appends data from tblCurrentEmployeesRaw to tblCurrentEmployeesStage1 (a copy of zstblCurrentEmployeesStage1), and qappCurrentEmployees appends data from tblCurrentEmployeesStage1 to tblCurrentEmployees.

Parsing out a name field containing up to five components is too complex for a single query. If you need to split a name into that many components, the name parsing should be done in two stages. The first stage (qappCurrentEmployeesStage1) parses out the LastName, FirstNamePlus, and MiddleNamePlus fields; parses the extension out from the BUSPHONE field; and converts the HIREDATE field to a Date field. The output fields are written to tblCurrentEmployeesStage1. The expressions used to parse and format fields in qappCurrentEmployeesStage1 are shown here:

 FormattedSSN: Format([SSN],"@@@-@@-@@@@") LastName: Mid([EMPLOYEENAME],1,InStr([EMPLOYEENAME],Chr$(44))-1) FirstNamePlus: Mid([EMPLOYEENAME],InStr([EMPLOYEENAME],Chr$(44))+2) MiddleNamePlus: IIf(InStr([FirstNamePlus],Chr$(32))>0, Mid([FirstNamePlus],InStr([FirstNamePlus],Chr$(32))+1),"") BusinessPhone: IIf(InStr(UCase([BUSPHONE]), Chr$(88))-1>0,Mid([BUSPHONE],1,InStr(UCase([BUSPHONE]), Chr$(88))-1),[BUSPHONE]) Extension: IIf(InStr(UCase([BUSPHONE]), Chr$(88))-1>0,Mid([BUSPHONE],InStr(UCase([BUSPHONE]), Chr$(88))+1),"") FormattedHireDate: CDate(Format([HireDate],"@@\/@@\/@@")) 

The second query, qappCurrentEmployees, parses out the FirstName and MiddleName fields, converts all name fields to the proper case (first letter capitalized), and formats the phone number fields with dashes. The expressions used to perform these conversions are shown here:

 LastNameFirstFormatted: StrConv([LastNameFirst],3) MiddleName: StrConv(IIf(InStr([MiddleNamePlus],Chr$(32))>0, Mid([MiddleNamePlus],1,InStr([MiddleNamePlus],Chr$(32))-1), [MiddleNamePlus]),3) Suffix: StrConv(IIf(InStr([MiddleNamePlus],Chr$(32))>0, Mid([MiddleNamePlus],InStr([MiddleNamePlus],Chr$(32))+1),""),3) LastNameFormatted: StrConv([LastName],3) FirstName: StrConv(IIf(InStr([FirstNamePlus],Chr$(32))>0, Mid([FirstNamePlus],1,InStr([FirstNamePlus],Chr$(32))-1), [FirstNamePlus]),3) BusinessPhoneFormatted: Format([BusinessPhone],"@@@-@@@-@@@@") HomePhoneFormatted: Format([HomePhone],"@@@-@@@-@@@@") 

Figure 16-34 shows the final table with data properly parsed and formatted.

figure 16-34. the final table, with data properly parsed and formatted, looks like this.

Figure 16-34. The final table, with data properly parsed and formatted, looks like this.

A sample database (Split and Concatenate 2000) with queries that use various functions to split and concatenate name and address data can be downloaded from the Code Samples page of my Web site, at http://www.helenfeddema.com; it’s Code Sample #37 in the Access section. This database is also available on the companion CD.

The expressions in the previous queries (qappCurrentEmployeesStage1 and qappCurrentEmployees) use a number of functions to format the data. The following functions are useful for parsing and formatting data:

  • CDate. Converts text to a date. The text must be recognizable as a date.
  • Chr$. Returns an ASCII character. Useful for finding or inserting punctuation characters.
  • Format. Formats a string as specified. Useful for tasks such as inserting dashes and parentheses in social security numbers and phone numbers.
  • InStr. Finds a character or string within another string. Useful for finding a space or comma in a concatenated name field and for parsing out name components.
  • LCase. Converts a string to all lowercase.
  • Left. Returns the specified number of characters, starting at the left side of a string.
  • Mid. Returns the specified number of characters, starting at the specified point in a string. Used together with InStr, this function is useful for parsing out name components.
  • StrConv. Converts a string to the specified format. If you use 3 for the Conversion parameter, this function capitalizes the first character of the text string (sometimes called proper case).
  • UCase. Converts a string to all uppercase.

After you prepare and test the preformatted tables and the append queries, the final touch is to create a macro or function to run the necessary actions in the correct order. You can run this macro or function whenever you download a new text file from the mainframe. The following function does the import from the text file and then copies the backup tables to tblCurrentEmployeesStage1 and tblCurrentEmployees and runs the two append queries to fill tblCurrentEmployees with properly formatted data. To make this function easier to use, you could run it from a macro or place it on a toolbar as a command button. (See Chapter 13, "Customizing Access Toolbars and Menus (CommandBars)," for more information.)

note


Although it isn’t necessary to do so, I put the argument names in the procedure to simplify keeping track of the arguments of these complex methods.

 Function ImportEmployeeData() On Error Resume Next     'Turn warnings off.     DoCmd.SetWarnings False     'Delete old tables.     DoCmd.DeleteObject objecttype:=acTable, _         objectname:="tblCurrentEmployeesRaw"     DoCmd.DeleteObject objecttype:=acTable, _         objectname:="tblCurrentEmployeesStage1"     DoCmd.DeleteObject objecttype:=acTable, _         objectname:="tblCurrentEmployees"     'Import data from text file using a saved import spec.     DoCmd.TransferText transfertype:=acImportDelim, _         specificationname:="CurEmp Import Specification", _         tablename:="tblCurrentEmployeesRaw", _         filename:="D:\Documents\CurEmp.txt", _         hasfieldnames:=True     'Copy blank tables to be filled with data from      'the imported table.     DoCmd.CopyObject , newname:="tblCurrentEmployeesStage1", _         sourceobjecttype:=acTable, _         sourceobjectname:="zstblCurrentEmployeesStage1"     DoCmd.CopyObject , newname:="tblCurrentEmployees", _         sourceobjecttype:=acTable, _         sourceobjectname:="zstblCurrentEmployees"     'Run queries to parse and format imported data.     DoCmd.OpenQuery "qappCurrentEmployeesStage1"     DoCmd.OpenQuery "qappCurrentEmployees" End Function 



Microsoft Access Version 2002 Inside Out
Microsoft Access Version 2002 Inside Out (Inside Out (Microsoft))
ISBN: 0735612838
EAN: 2147483647
Year: 2005
Pages: 172
Authors: Helen Feddema

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net