Chapter 33: Importing Data from a Text File or Document


Overview

  • How can I import data from a text file into Excel so that I can analyze it?

Jeff Sagarin, the creator of the USA Today basketball and football ratings, and I have developed a system to rate NBA players for the Dallas Mavericks team and its owner Mark Cuban. Every day during the season, Jeff’s FORTRAN program produces a multitude of information, including ratings for each Dallas Maverick lineup during each game. Jeff’s program produces this information in the form of a text file.

  • How can I import data from a text file into Excel so that I can analyze it?

  • We often receive data in a Microsoft Office Word document or in a text (.txt) file that we need to import into Microsoft Office Excel for numerical analysis. To import a Word document into Excel 2007, you should first save it as a text file. You can then use the Text Import Wizard to import the text file into Excel. With the Text Import Wizard you can break data in a text file into columns by using one of the following approaches.

    • If you choose the fixed-width option, Excel guesses where the data should be broken into columns. You can easily modify Excel’s assumptions.

    • If you choose the delimited option, you pick a character (common choices are a comma, a space, or a plus sign), and Excel breaks the data into columns wherever it encounters the character you chose.

  • As an example, the file Lineupsch33.docx (a sample of the data is shown below) contains the length of time each lineup played for Dallas in several games during the 2002– 2003 season. The file also contains the “rating” of the lineup. For example, the first two lines tell us that against Sacramento, the lineup of Bell, Finley, LaFrentz, Nash, and Nowitzki were on the court together for 9.05 minutes and that the lineup played at a level of 19.79 points (per 48 minutes), worse than an average NBA lineup.

     Bell    Finley   LaFrentz  Nash    Nowitzki  - 19.79 695# 9.05m SAC DAL* Finley   Nash    Nowitzki  Van Exel  Williams  - 11.63 695# 8.86m SAC DAL* Finley   LaFrentz  Nash    Nowitzki  Van Exel  102.98 695# 4.44m SAC DAL* Bradley  Finley   Nash    Nowitzki  Van Exel  - 44.26 695# 4.38m SAC DAL* Bradley  Nash    Nowitzki  Van Exel  Williams   9.71 695# 3.05m SAC DAL* Bell    Finley   LaFrentz  Nowitzki  Van Exel  - 121.50 695# 2.73m SAC DAL* Bell    LaFrentz  Nowitzki  Van Exel  Williams   39.35 695# 2.70m SAC DAL* Bradley  Finley   Nowitzki  Van Exel  Williams   86.87 695# 2.45m SAC DAL* Bradley  Nash    Van Exel  Williams  Rigaudeau  - 54.55 695# 2.32m SAC DAL*

  • We’d like to import this lineup information into Excel so that, for each lineup, we would have the following information listed in different columns.

    • Each player’s name

    • Minutes played by the lineup

    • Rating of the lineup

  • The player Van Exel (actually Nick Van Exel) raises a problem. If we choose the delimited option and use a space character to break the data into columns, Van Exel will occupy two columns. For lineups that include Van Exel, the numerical data will be located in a different column than the column in which the data is located for lineups that don’t include Van Exel. To remedy this problem, I’ve used the Replace command in Word to change each occurrence of Van Exel to Exel. Now, when we break up the data where a space occurs, Van Exel will require only one column. The first few rows of our data now look like the following.

     Bell    Finley   LaFrentz  Nash    Nowitzki  - 19.79 695# 9.05m SAC DAL* Finley   Nash    Nowitzki  Exel  Williams  - 11.63 695# 8.86m SAC DAL* Finley   LaFrentz  Nash    Nowitzki  Exel  102.98 69 5# 4.44m SAC DAL* Bradley  Finley   Nash    Nowitzki  Exel  - 44.26 695# 4.38m SAC DAL* Bradley  Nash    Nowitzki  Exel  Williams   9.71 69 5# 3.05m SAC DAL* Bell    Finley   LaFrentz  Nowitzki  Exel  - 121.50 695# 2.73m SAC DAL* Bell    LaFrentz  Nowitzki  Exel  Williams   39.35 69 5# 2.70m SAC DAL* Bradley  Finley   Nowitzki  Exel  Williams   86.87 69 5# 2.45m SAC DAL* Bradley  Nash    Exel  Williams  Rigaudeau  - 54.55 695# 2.32m SAC DAL*

  • The trick in importing data from a Word or text file into Excel is to use the Excel Text Import Wizard. As I mentioned earlier, you first need to save the Word file (Lineupsch33.docx in this example) as a text file. To do this, simply open the file in Word, click the Microsoft Office Button followed by File, Save As, and then select Plain Text in the Save As Type list. In the File Conversion dialog box, select the Windows (Default) option, and then click OK. Your file will now be saved with the name Lineupsch33.txt. Close the Word document. In Excel, open the file Lineupsch33.txt. You’ll see Step 1 of the Text Import Wizard, which is shown in Figure 33-1.

    image from book
    Figure 33-1: Step 1 of the Text Import Wizard

  • Clearly, we want to select the Delimited option and break the data at each space. However, let’s suppose that we choose Fixed Width. Then Step 2 of the Text Import Wizard appears, shown in Figure 33-2. As you can see, you can create, move, or delete a break line. For many data import operations, changing column breaks can be a hit-or-miss adventure.

    image from book
    Figure 33-2: Step 2 of the Text Import Wizard after selecting the Fixed Width option

  • If we select Delimited in Step 1, you’ll see the second step of the Text Import Wizard that’s shown in Figure 33-3 on the next page. In this example, I’ve selected Space as the delimiter. Selecting the Treat Consecutive Delimiters As One option ensures that consecutive spaces will result in only a single column break. I recommend keeping Tab selected because many Excel add-ins do not work properly if Tab is deselected.

    image from book
    Figure 33-3: Step 2 of the Text Import Wizard after selecting Delimited Option

  • When you click Next, you’re sent to the third step in the wizard, which is shown in Figure 33-4. By selecting the General option as the format, we have Excel treat numerical data as numbers and other values as text.

    image from book
    Figure 33-4: Step 3 of the wizard, in which you can select a format to apply to the data you’re importing

  • When you click Finish, the wizard imports the data into Excel, as shown in Figure 33-5.

    image from book
    Figure 33-5: Excel file with lineup information

  • Each player is listed in a separate column (columns A–E); column F contains the rating of each lineup, column G contains the game number, column H contains the minutes played by each lineup, and columns I and J list the two teams playing in the game. After saving the file as an Excel workbook (.xlsx), you can use all of the analytic capabilities of Excel to analyze the performance of Dallas’s lineups. For example, we could calculate the average performance of the team when Dirk Nowitzki is on or off the court.




Microsoft Press - Microsoft Office Excel 2007. Data Analysis and Business Modeling
MicrosoftВ® Office ExcelВ® 2007: Data Analysis and Business Modeling (Bpg -- Other)
ISBN: 0735623961
EAN: 2147483647
Year: 2007
Pages: 200

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net