Working with Data in Text Files


Excel can read data in fixed-width as well as delimited text files. (A delimited file is one that uses some particular character or combination of characters to mark the boundaries between fields. A fixed-width file is one that uses space characters-as many as necessary-to achieve field alignment.) You can either open a text file (by clicking the Microsoft Office Button and clicking Open) or import it (by clicking the Data tab and then clicking From Text in the Get External Data group). If you want to be able to maintain a refreshable link to the source file, you need to do the latter.

When you open a comma-separated-values (.csv) file, Excel parses the data immediately into columns. If you open or import any other kind of text file, Excel presents the Text Import Wizard, described next.

Note 

When you ask Excel to open or import a text file, the program looks for files with the extensions .prn, .txt, and .csv. If you want a file with a different extension, select All Files in the list to the right of the File Name box. Excel determines a file's type by its content, so it doesn't matter what the extension is.

Using the Text Import Wizard

With the Text Import Wizard, you can show Excel how to parse your text file. You get to tell the program what character or character combination (if any) is used to delimit columns, what kind of data appears in each column, and what character set or language was used to create the original file. You can also use the Text Import Wizard to exclude one or more rows at the top of your file-an option that's particularly useful if your file begins with some kind of noncolumnar descriptive information.

The first page of the wizard, shown in Figure 23-5, presents a preview of the data that Excel is about to import. It also indicates the best estimation of whether your file is delimited or fixed-width. You'll find that the wizard is usually correct with this first guess-but if it's mistaken, you can set it straight. (If you're not sure, just go to the second page. When you get there, you'll know whether the program was wrong, and you can return to the first page to fix the problem.)

image from book
Figure 23-5: You can use the Text Import Wizard to tell Excel how to parse your text file.

While you're still on the first page of the wizard, use the Start Import At Row text box to eliminate any header rows that you can live without in Excel. Header rows make it hard for Excel to parse your file correctly, so you can help the program (and yourself) by lopping them off here. Click Next.

The second page of the Text Import Wizard looks something like either Figure 23-6 or Figure 23-7, depending on whether your file is delimited or fixed-width. In both cases, the vertical lines in the Data Preview section show how Excel proposes to split your file into columns. The Data Preview section regrettably shows a paltry 5 rows at a time and 65 characters per row. You cannot make it show more, but you can look at other parts of the file by using the scroll bars.

image from book
Figure 23-6: If your file is delimited, the second page of the wizard indicates what character Excel has recognized as the delimiter, and the Data Preview section shows how Excel will parse your file.

image from book
Figure 23-7: If your file is fixed-width, be sure to look at the Data Preview box; you can fix any mistakes by manipulating the vertical bars.

If your file is delimited, the second page of the wizard indicates what character Excel regards as the delimiter. In Figure 23-6, for example, the program has correctly divined that the file in question is tab-delimited. Most of the time, Excel gets this right. If it does not in your case, you can select a different check box and see the effect immediately in the Data Preview section. You can also select more than one check box to indicate that your file is delimited by multiple characters. If you select two or more check boxes, Excel breaks to a new column whenever it sees any of your choices.

A separate check box lets you stipulate that Excel should regard consecutive delimiting characters as a single delimiter. You'll find that this option sometimes saves the day with tab-delimited files. The original creator of the file might occasionally have used two or more tabs to skip to the next column when the current column's contents were short. That strategy could disrupt your alignment in Excel unless you tell the program to treat consecutive delimiters as a single delimiter.

Excel is much more likely to introduce errors when trying to parse a fixed-width file. In the file shown in Figure 23-7, for example, the program initially fails to recognize that the first column break should occur between ID and Category to create a Product ID column and a Category ID column. It also erroneously plants a column break in the middle of the Category ID heading. Fortunately, it's easy to fix parsing problems of this sort. (However, if the file is long and the mistakes are many, catching them all is likely to become a trial; if you have a choice between opening a fixed-width file and an equivalent delimited file, by all means go with the latter.)

To fix parsing errors, drag vertical lines to the left or right to reposition the column breaks. To create a column break where one doesn't yet exist, click once at the appropriate place. To remove a column break that shouldn't be there at all, double-click it. When you have finished, click Next.

The third page of the wizard, shown in Figure 23-8, lets you specify the data type of each column. Your choices are limited to General (which treats text as text, numbers as numbers, and dates in recognizable formats as dates), Text (which treats everything as text), Date, and Do Not Import Column (Skip). Excel initially assigns the General description to all columns, and you'll probably want to override that presumption in some cases. For example, if your file happens to have a text field that begins with a hyphen, Excel will regard the hyphen as a minus sign and attempt to turn your text into a formula. You can avoid errors by indicating that the field is Text.

image from book
Figure 23-8: You can use the third page of the wizard to control the data type of each column.

image from book
Opening dBase Files

Excel no longer gives you the option of saving files in any of the dBase formats, but you can still open such files. To open a dBase file, click the Microsoft Office Button, click Open, and then select dBase Files in the list to the right of the File Name box:

image from book

An imported dBase file becomes an unrefreshable range in Excel.

image from book

The third page also includes an Advanced button. By clicking it, you can change the way the wizard handles commas and periods in numeric data. By default, Excel uses the settings specified in the Regional And Language Options item of Control Panel. If your text file was created under other assumptions, you'll need to make some adjustments in the Advanced Text Import Settings dialog box. Click OK, click Finish, and then click OK to import the text file.

Parsing Clipboard Text

Occasionally, when working with text data, you might find long text strings that you need to break into separate columns. This can happen, for example, if you paste text into Excel from the Clipboard. To parse such data, select it, click the Data tab, and then click Text To Columns in the Data Tools group.



Microsoft Office Excel 2007 Inside Out
MicrosoftВ® Office ExcelВ® 2007 Inside Out (Inside Out (Microsoft))
ISBN: 073562321X
EAN: 2147483647
Year: 2004
Pages: 260

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net