CASE STUDY

   

You've volunteered to help out in a mayoral election one of the candidates is your next door neighbor, so you suppress your distaste for the political process and offer your expertise in data management to your friend's election effort.

The campaign team has purchased a database containing voter rolls and it's up to you to help manage it. The first step is to get the data out of the database and into an Excel workbook where you can more easily analyze and summarize the information.

The data, shown in Figure 4.1, is in an Access database named Voters.mdb. The database is stored at C:\2005\CAMPAIGN.

Specifying the Data's Location and Format

You begin the first general step to importing data by choosing Data, Get External Data, New Database Query. The Choose Data Source dialog box shown in Figure 4.2 appears.

Figure 4.2. Starred data sources are Registry DSNs, either User DSNs or System DSNs. You can hide them by clicking Options.

graphics/04fig02.gif


NOTE

This is one of those annoying areas where Microsoft changes menu commands for no apparent reason other than sheer boredom. In Excel 97 and Excel 2000, the sequence starts with Data, Get External Data. In Excel 2002 and Excel 2003, it's Data, Import External Data. To add to the fun, Excel 97 has you continue with Create New Query, whereas in Excel 2000, 2002, and 2003 you continue with New Database Query.


TIP

If you don't see the window shown in Figure 4.2, you'll probably instead see an error message to the effect that you haven't installed Microsoft Query. To install it, you'll have to run the Office installation procedure again, choosing to add a new feature. In Office 97, Microsoft Query is available from the Data Access option. In Office 2000, XP, and 2003, Microsoft Query is available under Office Tools.


As you'll see, each choice you make in the process of importing data has implications for what happens later, and this book deals with each of them. As a good starting place, select <New Data Source> from the list box shown in Figure 4.2. When you then click OK, the Create New Data Source dialog box shown in Figure 4.3 appears.

Figure 4.3. The controls in this dialog box become enabled in turn as you use each one.

graphics/04fig03.gif


When you choose to create a new data source, you create a new file that contains information about the source of your data. For the purposes of your friend's run for the mayoralty, that file will specify an Access database named Voters.mdb, located in the C:\2005\CAMPAIGN path. Subsequently, you can use this new data source over and over to pick up information about newly registered voters, perhaps, or because you're using another workbook to put together a different analysis.

The name of the new file you're creating is based on what you type into the What Name Do You Want to Give Your Data Source? box seen in Figure 4.3. Suppose that you type Next Election in that box. As soon as you begin typing, the box labeled Select a Driver for the Type of Database You Want to Access: becomes enabled. Use its dropdown to display a list of drivers that have been installed on your computer, as shown in Figure 4.4.

Figure 4.4. The name you give the data source is later used as the name of the Data Source Name (or DSN) file.

graphics/04fig04.jpg


NOTE

If you know at the outset that you want to import data from Access, you can avoid having to make these choices by selecting MS Access Database in the Choose Data Source window (refer to Figure 4.2).


Because the data on voters is in an Access database, you would select Microsoft Access Driver (*.mdb). When you click on a driver, the Connect button is enabled. Clicking it displays the ODBC Microsoft Access Setup dialog box shown in Figure 4.5.

Figure 4.5. Use the Advanced button to supply a username and password, if one is needed to open the database.

graphics/04fig05.gif


From the ODBC Microsoft Access Setup dialog box, clicking Select displays the Select Database dialog box (see Figure 4.6).

Figure 4.6. Use the Network button to browse to a database that's stored on a network server.

graphics/04fig06.gif


At last, you're in a position to select the database that you want to use. From the Select Database window, use the Drives dropdown, if necessary, to choose the drive where your database is stored. Use the Directories list box to navigate to the folder on that drive that contains the database. In this example, you would choose C: from the Drives dropdown, and 2005\CAMPAIGN from the Directories list box. Click the Voters database in the Database Name list box, and click OK.

When you do so, you're returned to the ODBC Microsoft Access Setup window. The path to and name of the database are now shown on the window itself (see Figure 4.7).

Figure 4.7. The path and database given on the Setup dialog box help you confirm that you have selected the correct database.

graphics/04fig07.gif


Click OK to return to the Create New Data Source dialog box, and click OK again to return to the Choose Data Source dialog box.

This case study continues later in the chapter on page 94. First, though, it's necessary to look more closely at the issue of multiple simultaneous users.

Using the Exclusive and Read Only Check Boxes

The Select Database dialog box shown in Figure 4.6 has two check boxes, Exclusive and Read Only, whose use is a little obscure.

Unlike Excel workbooks, databases are designed to be used by more than one person simultaneously. Yes, you can share an Excel workbook, but this feature came along relatively late in Excel's development and was tacked onto the application almost as an afterthought. It just does not work as well or as reliably as it does in an application such as Access or SQL Server that was designed from the ground up for multiple simultaneous usage.

Bearing in mind that databases can be opened in what's termed shared mode, you'd think that filling the Exclusive check box would prevent that. You might well have a situation in which you would want to prevent others from opening the database while your query is running. In that case, you would want to open the database in exclusive rather than shared mode.

But that doesn't happen here. If you fill the Exclusive check box only in the Select Database dialog box, when you return to the ODBC Microsoft Access Setup dialog box (see Figure 4.7) and click OK, you get an error message. The error states that Query Could not use '(unknown)'; file already in use.

Microsoft states that the reason for this behavior is that Microsoft Query attempts to make two connections to the database as it's creating the DSN. If you've specified exclusive mode in the Select Database dialog box, two connections can't be made and Microsoft Query reports an error.

The most straightforward way around this foolishness is as follows:

  1. Create your data source as described in the prior section. Do not fill the Exclusive check box in the Select Database dialog box.

  2. Start Notepad, or any other application that is able to open, edit, and save text files.

  3. With Notepad active, choose File, Open. In the Files of Type dropdown, choose All Files.

  4. Browse to the location where your DSN files are stored. The location depends on your version of Excel. For example, in Excel 97, it's C:\Program Files\Microsoft Office\Office; in Excel 2003, it's C:\Program Files\Common Files\ODBC\Data Sources.

  5. Open the DSN file you created in step 1. You'll see statements much like those shown in Figure 4.8. Click at the end of any statement, including the final one, and press Enter to create a new, blank line.

    Figure 4.8. DSN files can be read by any application that can open a text file, including Notepad.

    graphics/04fig08.jpg


  6. Type this statement:

     Exclusive=1 

  7. Choose File, Save and then File, Exit.

Now when you use that DSN as the basis for a query, the database will be opened in exclusive mode.

As usual, there's a downside. If you try to use a DSN as the basis for a query, and the DSN specifies exclusive mode, and someone already has the database open, you'll get an error message. The message tells you which workstation has the database open. If you know who is using, for example, the workstation named WSMS256, you can ask that person to close the database. Or you can use Notepad to remove the Exclusive=1 statement from the DSN file.

The Read Only check box in the Select Database dialog box might seem superfluous. After all, you can't write anything back to a data source using a worksheet's external data query.

But DSN files can be used for purposes other than importing data into an Excel worksheet. As you'll see in Chapter 8, "Opening Databases," you can use a DSN to establish a connection to a database for the purpose of updating, adding, or deleting records, and even modifying the database's structure.

So, one way of preventing that sort of access using your DSN is to fill the Read Only check box. That causes this statement to be put in the DSN file:

 ReadOnly=1 

That statement means that the DSN will not allow changes to be made to the data or structure of the database.

But as you saw earlier in this section, it's easy to modify a DSN. You can remove a ReadOnly=1 statement as easily as you can add an Exclusive=1 statement. If you're really concerned about preventing changes to a database, you're much better off using its security features.

Using DSN Files

After you've created a new data source, it's easy to use it as the basis for a new query. You can dispense with choosing a driver, naming the data source, and navigating to the location of the data just select the source's name (instead of <New Data Source>) in the Choose Data Source window (refer to Figure 4.2). The information you supplied is stored in a file, known informally as a DSN. (The abbreviation DSN stands for data source name.) Excel displays it, along with other available data sources, in the Choose Data Source dialog box. Figure 4.8 shows the contents of a typical DSN file.

If you review Figure 4.2, notice the Browse button in the Choose Data Source dialog box. It enables you to navigate to other locations where you might have stored DSN files. In Office 2003, they are stored by default in C:\Program Files\Common Files\ODBC\Data Sources.

NOTE

You might have wondered about the other two tabs on the Choose Data Source dialog box, Queries and OLAP Cubes. Query files are similar to DSNs, except that in addition to information such as path, name, and type of database, they also include SQL that defines a query's tables, fields, filtering criteria, and so on. They can be more convenient but less flexible than DSNs. Queries usually have the file extension .dqy. Figure 4.9 shows a typical DQY file.

Figure 4.9. Note that the first part of the query file resembles a DSN, whereas the second part is standard SQL.

graphics/04fig09.gif


OLAP Cubes tend to be very large, multidimensional data structures. (OLAP stands for Online Analytic Processing.) Because they are so large, some values are pre-calculated so as to save data retrieval time.


Suppose that someone has sent you a DSN file via email. You might have saved it on your desktop and in that case it won't show up in the Choose Data Source dialog box. To get to it, just click the Browse button and navigate to your desktop.

This is different from what you see if you click the Options button in Choose Data Source. Then the dialog box shown in Figure 4.10 appears.

Figure 4.10. This Browse button has a different effect than the Browse button on the Choose Data Source dialog box.

graphics/04fig10.gif


The Data Source Options dialog box enables you to make longer-lasting changes to the list of available data sources. If you click the Browse button shown in Figure 4.10, you can use the Select Directory dialog box to navigate to a folder where you store, or intend to store other DSN and query files (see Figure 4.11).

Figure 4.11. Files with qy in their extensions, such as .dqy and .oqy, usually contain SQL statements; that is, they are queries.

graphics/04fig11.gif


After you've navigated to a folder, clicking OK will add that folder to the list that Microsoft Query searches. Any data source files in that folder will appear in the list of available sources in the Choose Data Source dialog box. In brief

  • Click the Browse button on the Choose Data Source dialog box to browse to a data source file that you want to use for this query. That file will not show up subsequently in the Choose Data Source dialog box.

  • Click the Options button on the Choose Data Source dialog box, and then the Browse button on the Data Source Options dialog box, to browse to a folder that you want to use now and in the future. Data source files in that folder will appear subsequently in the Choose Data Source dialog box.

NOTE

Of course, if at another time you want to use a different data source, you'll need to set it up as you did this one. If you want to use an Oracle database, for example, you need to create a DSN with a different driver; or, if you want to use a different Access database, you need a DSN that names it.


This completes the first of two general steps involved in creating a new data source: identifying the type and location of the data. We return you now to the mayor's race, where, with the data source identified, it's time to start creating a new query.

CASE STUDY
Building the Query

You have just finished creating a new data source (or, equivalently, you might have used an existing source by choosing Data, Import External Data, New Database Query). You now see the Choose Data Source dialog box shown previously in Figure 4.2.

With the Choose Data Source dialog box active, click the name of the data source you're interested in from the Databases list box, and click OK. The Microsoft Query window appears, as shown in Figure 4.12.

Figure 4.12. Use the Options button to choose between showing tables or views (view is sometimes used as another term for query).

graphics/04fig12.jpg


Although this is the first time in this sequence that the Microsoft Query window appears, Query's been running the show in the background, walking you through the creation of a new data source. The Add Tables dialog box also appears automatically when the Query window opens; all there is for you to do at this point is add one or more tables or existing queries from the database.

The next chapter goes into detail about various ways to use Microsoft Query to automatically get data out of databases and into Excel. If you're unfamiliar with true relational databases, just bear these points in mind for the present:

  • A table closely resembles an Excel worksheet or list. It has records, and each record occupies a different row. It has fields, and each field occupies a different column. The current example shows how to bring data from a database table into an Excel worksheet.

  • You can use this method to bring data into Excel from more than one table at once even from more than one database.

  • A query is a series of statements that acts on a table in some way. A query can use a table's data to do calculations (for example, "If the Sex field is 1, display 'Male' and otherwise display 'Female'"). It's often efficient to bring data into the worksheet directly from the query, and only indirectly from the table that the query uses.

The present example continues by getting data on voters directly from a single table into a worksheet.

  1. In the Add Tables window shown in Figure 4.12, select the table named Affiliation and then click Add. When you do so, the table appears in the Microsoft Query window. If they exist, you can continue by adding more tables and queries. When you're through adding, click Close. The Microsoft Query window then appears as shown in Figure 4.13.

    Figure 4.13. If you don't see the Criteria pane, choose Criteria from the View menu.

    graphics/04fig13.jpg


    The Query window has three panes:

    • The Table Pane This is where tables that your query uses appear. As you can see in Figure 4.13, the individual fields in the table are also shown.

    • The Criteria Pane Here you can specify which records should be returned from the database table. Suppose that you want to return records of voters whose value on the Party field is either Republican or Independent. You could drag the Party field from its table into the first row of the criteria pane, labeled Criteria Field. In the same column, you would enter "Republican" (with the quote marks) in the Value row. Again in the same column, you would enter "Independent" in the Or row.

    • The Data Pane This is where you specify which fields you want to return. You need not (although you can if you want) return all the fields in a table from the database. In the Table Pane, just click a field that you want and drag it into a column in the Data Pane (see Figure 4.14).

      Figure 4.14. Double-clicking a field in the table copies it to the first available column in the Data Pane.

      graphics/04fig14.gif


  2. Double-click a field in the table to copy it to the Data Pane, or click on a field and drag it to a specific column in the Data Pane. When you have put as many fields as you want into the Data Pane, choose File, Return Data to Microsoft Excel (or just click the Return Data button). When you do, the Import Data dialog box shown in Figure 4.15 appears.

    Figure 4.15. The exact appearance of this window depends on the version of Excel you're using.

    graphics/04fig15.gif


  3. Click OK to return the data to the active worksheet, beginning in the cell that was active when you began the process. Of course, if there is data that you want to avoid overwriting, you can supply a different cell address or even return the data to a new worksheet.

TIP

Step 2 noted that you can either double-click a field or click and drag it to get it into the Data Pane. Double-clicking the field puts it in the rightmost available column of the Data Pane. If you drag a field, you can place it in any column you want in the Data Pane. If you drag the asterisk at the top of the field list into the Data Pane, you get all the fields.


Figure 4.16 shows the appearance of the worksheet after the data has been returned from the source.

Figure 4.16. Notice that the data comes to the worksheet in the form of a list, complete with field names.

graphics/04fig16.jpg


Your friend's campaign can now concentrate its advertising and the get-out-the-vote effort on the subgroups most likely to vote in the next election.

There are several useful properties about the range that contains the returned data call it the external data range. Among other properties, you can

  • Cause the range to refresh itself automatically whenever you open the workbook

  • Specify how you want to manage an increase or a decrease in the number of records

  • Copy down formulas that depend on the external data range to accommodate additional records

NOTE

Excel names the external data range automatically. The name depends in part on which version you're using. Excel 97 names it ExternalData1, and if you have other external data ranges on the same worksheet, it names them ExternalData2, ExternalData3, and so on. Subsequent versions name data ranges according to the data source name that you selected in the Choose Data Source dialog box; for example, Query_from_Employee_DB. Regardless of the version, the range names are sheet-level.


graphics/arrow_icon.gif For a discussion of sheet-level versus book-level names, see Chapter 3, "Excel's Lists, Names, and Filters," p. 55.


To see what options are available to you, right-click in any cell in the external data range, and choose Properties (or, depending on your version, Data Range Properties) from the shortcut menu.

graphics/arrow_icon.gif More information on the available options can be found in "Using Microsoft Query:," p. 113.


Using the Query Wizard

As you accumulate more and more experience with Microsoft Query, you'll find that (within limits) it's a useful way of defining how you want to retrieve data from a database: which records, which fields, in which order.

Until you gain a measure of confidence, though, you might want to rely on an adjunct to Microsoft Query, called the Query Wizard. Have another look at Figure 4.2: There's a check box in the Choose Data Source window, labeled Use the Query Wizard to Create/Edit Queries.

If you fill this check box, Excel will present a series of windows much like the Chart Wizard that enables you to define your query, step by step. Figure 4.17 shows the first step of the Query Wizard, after you've finished defining the data source.

Figure 4.17. Using the Query Wizard, this window appears instead of the Microsoft Query window.

graphics/04fig17.jpg


The Query Wizard's Choose Columns window combines the Add Tables window (refer to Figure 4.12) with the act of dragging fields to the Query window's Data Pane. In Figure 4.17, the user has expanded the Affiliation table by clicking the plus sign that's just left of the table's name. The list then expands to show the names of the fields in the table, and the plus sign by the table's name changes to a minus sign.

To continue, just click a field name and then click the button with the greater-than (>) sign. The field is moved into the Columns in Your Query list. If you change your mind, select a field name in that list and click the button with the less-than (LESSTHAN) sign (see Figure 4.18).

Figure 4.18. Click the double less-than (<<) sign to remove all previously selected fields.

graphics/04fig18.jpg


When you've finished selecting fields, click the Next button to move to the wizard's next step, shown in Figure 4.19.

Figure 4.19. The Query Wizard refers to fields as columns and to records as rows.

graphics/04fig19.jpg


In this step, you identify which fields, if any, you want to use to restrict the records that are returned to the worksheet. To do so, take these steps:

  1. Select a field in the Column to Filter list.

  2. Click the first left-side dropdown and choose an appropriate operator: equals, is less than, begins with, and so on.

  3. Click the first right-side dropdown and choose an appropriate criterion value (see Figure 4.20).

    Figure 4.20. To add more criteria, you need to first select And or Or between the criteria sets the And is the default choice.

    graphics/04fig20.jpg


    NOTE

    You can also type a criterion value in the box instead of selecting one from a dropdown. If you do, make sure that you type a value that exists in the field. If the query cannot match the value you type, it will not return any records. These criteria are not case sensitive: "green" is treated in the same way as "Green".

  4. If you want to add more operators and criteria, use the second and third pairs of dropdowns. If you establish three criteria, the dialog box displays a vertical scrollbar and another criteria set. You're not limited to the three criteria sets that are initially visible. When you're through, click Next.

Step 4 shows that the Query Wizard is both easier to use and less powerful than Query itself. The step enables you to choose among available operators, rather than relying on you to know what they are and how to specify them. It also displays the values that you can use as criteria, rather than relying on you to supply them.

On the other hand, you can filter on one field only. You could not, for example, specify that you want to return male Republicans; that would require you to use two fields: Sex and Party. Microsoft Query, on the other hand, enables you to filter on as many fields as you want.

When you click Next in step 4, the Sort Order window appears (see Figure 4.21).

Figure 4.21. The user has selected Age from the dropdown and clicked Descending. The default is an ascending sort.

graphics/04fig21.gif


Select one or more fields and specify whether you want an ascending or descending sort for each. The first field you specify becomes the primary sort key; the second field becomes the secondary sort key. That is, if you select Sex first and then Party, and specify an ascending sort for each, the records might come to Excel in this order (depending on the actual values for party affiliation): Female Democrat, Female Green, Female Independent, Female Republican, Male Democrat, Male Green, Male Independent, Male Republican.

When you click Next, the Query Wizard's final step appears as shown in Figure 4.22.

Figure 4.22. Viewing the query in Microsoft Query is a good way to see how to build the same query without using the Query Wizard.

graphics/04fig22.gif


If you click Save Query, you'll save the query in a text-format data file. This can be useful, but it can also become a problem.

Suppose first that you click the Return Data to Microsoft Excel button and then click Finish. If you do, the query's definition is saved in the active workbook (in a hidden name, one that neither you nor anyone else has direct access to; this is very different from a readable DSN file). Subsequently, you can refresh the data in the external data range by re-executing the query.

TIP

You can refresh an external data range by right-clicking any cell in the range and choosing Refresh External Data from the shortcut menu (your version might instead say Refresh Data).


If you want to put the same query in another workbook, you'll either have to copy the external data range and paste it into that workbook, or you'll have to re-create the query with the other workbook active.

NOTE

It might seem too easy, but to copy the external data range and the query definition to another location, all you need to do is copy and paste the entire external data range. The query definition, as well as the name as defined, both follow along with the data.


On the other hand, suppose that you save the query instead of returning the data directly to the workbook. Then, with another workbook active, you could choose Data, Import External Data, and click Import Data. You would see your saved query as one of the available data sources. That makes it convenient to re-establish the query in a new workbook whenever you want, without copying and pasting it, and without re-creating it.

The hitch is that the query is stored in text format, and anyone might be able to use it. (The query is saved as a DQY file, of the type mentioned earlier in this chapter and shown in Figure 4.9.) Queries often need to have passwords saved with them, so if you're thinking of saving a query that gets data from a password-protected source, you might want to think twice.

You can choose View Data or Edit Query in Microsoft Query before clicking Finish. This choice is useful if you're learning how to use Microsoft Query, or if you simply find it convenient to begin by using the Query Wizard and then to fine-tune with Microsoft Query.

In most cases, though, you'll choose Return Data to Microsoft Excel. When you click Finish, the Import Data dialog box shown in Figure 4.15 appears. Click OK to finish the process. Figure 4.23 shows how your worksheet appears when you do so.

Figure 4.23. Notice that the order of the records conforms to the sort chosen in Figure 4.21.

graphics/04fig23.gif


This is a great situation for dynamic range names, which were introduced in Chapter 3, "Excel's Lists, Names, and Filters." When your external data range is refreshed with new data, its definition automatically changes.

Suppose that the range that contains the external data is named Query_From_Next_Election, and that it occupies the range A1:D100. You add five new records to the database records that qualify to be returned by the query. When the data range is refreshed next, it will include those five new records and will extend an additional five rows in the worksheet, occupying A1:D105.

If you establish two dynamic range names that use Query_From_Next_Election as a basis, you can easily arrange other parts of your workbook to update along with the external data range. An XY (Scatter) chart is a good example: You might have one dynamic range that acts as the chart's x-axis values, and one that acts as the chart's y-axis values. For example, you might define the range name AgesToChart as

 =OFFSET(Sheet1!Query_from_Next_Election,1,2, _ ROWS(Sheet1!Query_from_Next_Election),1) 

and the range name LikelihoodToChart as

 =OFFSET(AgesToChart,0,1) 

Applied to the external data range shown in Figure 4.23, the dynamic range named AgesToChart is offset one row (so it starts in row 2), two columns (so it starts in column C), has as many rows as the external data range, and is one column wide. When the external data range gets more records, it has more rows, and so does AgesToChart.

The dynamic range named LikelihoodToChart is based on AgesToChart, but it is offset by zero rows and one column from AgesToChart. Because its definition does not specify the number of rows and columns in its range, it has by default the same number of rows and columns as its basis range.

The chart shown in Figure 4.24 has a single data series, defined as

 =SERIES(,Sheet1!AgesToChart,Sheet1!LikelihoodToChart,1) 

Figure 4.24. The chart updates automatically if it's based on dynamic range names.

graphics/04fig24.gif


When the external data range is refreshed, new records enter the external data range. It occupies more rows, and the dynamic ranges that depend on it also grow. Because the chart's data series refers to the dynamic range names, it captures and displays the newly added records.



Managing Data with Microsoft Excel
Managing Data with Microsoft Excel
ISBN: 789731002
EAN: N/A
Year: 2003
Pages: 134

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net