Definitive Guide to Excel VBA
Authors: Kofler M.
Published year: 2002
The idea of the following example is fairly simple: A survey is to be conducted . Instead of the participants being given a printed form to fill out whose results later have to be tabulated by hand, the questionnaire is to be formulated in the form of an Excel worksheet. Thus various controls (listboxes, check boxes) can be used to make input as simple as possible; see the example questionnaire survey.xls in Figure 12-21.
Figure 12-21: Example questionnaire
Thus instead of receiving a carton of fill-out questionnaires, you will have a folder full of Excel files. At this point the database aspect of the example kicks in. The file analyzedata.xls offers a function that transfers from all the Excel files in a given folder all the responses into an Access database. In a second step you can immediately evaluate all available responses; see Figure 12-22.
Figure 12-22: Evaluation of the questionnaires
The entire project can be found in the sample files in the directory survey . Filledout surveys must be placed in the directory incoming . After they have been read in, the files are reduced (unnecessary worksheets are deleted) and moved into the directory archive .
The questionnaire is located in the file survey.xls , a supplementary template of this questionnaire in survey_template.xls . This file is used to generate test files with random data. (Thus you do not have to fill out ten questionnaires in order to try out the program.)
The file analyzedata.xls is for evaluating the questionnaires. This file contains all the VBA code of the example. The survey data are stored in the Access database dbsurvey.mdb .
As with most problems, here there is not a single solution but many. To prevent my setting you, dear reader, loose with this one example without your considering some alternatives, the following list gives some suggestions as to alternatives, variants, and possible improvements.
The example presented here uses MS Forms controls in the questionnaire. This requires at least Excel 9. If you would prefer to use the older controls (toolbar "Forms"), your Excel file will be backward compatible to Excel 5. If you would like to do without controls entirely, then you might be able to achieve compatibility with even older versions of Excel.
Our example uses no VBA code in the questionnaire. (The controls are not linked to code.) The advantage of this modus operandi is, of course, that the macro virus warning is thereby avoided. On the other hand, with additional code you could construct a much more " intelligent " form, in which, for example, certain questions could be posed that depended on the answers to previous questions.
The question of data flow is left open in this example. How do the participants in the survey receive their questionnaires? How do the Excel files find their way back to the host computer? How is it ensured that an Excel file is not accidentally read more than once into the database? Or that a participant in the survey who attempts to skew the survey by submitting multiple files is thwarted?
Possible solutions depend greatly on the particular application. If you are collecting data on hospital patients , for example, you could place the questionnaire on a few computers. If you can assume that survey participants have Internet access, then e-mail could become your medium of communication. Theoretically, your Excel files could be provided with a serial number, to avoid duplicates, but that would jeopardize the anonymity of the participants.
If not the participant but a third person transmits the answers (as in a telephone survey), there is the possibility of linking the input form directly to the database (say, with Access). Excel files have the advantage that they are independent of the database and place few demands on the computer on which the data are recorded.
The technologically most attractive variant for an Excel questionnaire as described here is one conducted over the Internet, though this would involve considerable effort in installing such an Internet survey. The danger of unwanted manipulation of the data is also greater in such a case.
If the anonymity of the data is a decisive criterion, then the oldfashioned paper questionnaire is (alas) still the most secure variant. Office 97 made headlines in the computer press because all documents created in it had to be given new ID numbers. These numbers make it possible (at least within a network) to identify the computer on which the document was created. Microsoft spoke of an error, provided an update, and promised that this would not happen with Office 2000, but one can no longer speak of confidence with respect to Microsoft.
In order not to make our example overly bloated, the questionnaire has been made relatively simple. There are only six questions. The answer to three of the questions can be input directly into an Excel cell , which is the easiest solution both in setting up the questionnaire and the later evaluation of the results.
The file survey.xls consists of three worksheets (Figure 12-23), of which normally only the first is visible. The sheet "listdata" contains the entries of the two listboxes, "results" contains a summary of the result cells .
Figure 12-23: The internal structure of survey.xls
Here are a few explanatory remarks: In the two listboxes ListFillRange was set in such a way that the data can be read from "listdata" (for the first listbox we have ListFillRange="listdata:A1:A3" ). With BoundColumn=0 we have achieved that the result of the selection is a number (0 for the first entry, etc.). LinkedCell refers to a result cell in "results," so that there the number of the active list entry is displayed. Finally, the setting fmStyleDropDownList (2) prevents the participant from inserting text into the listbox. The check boxes for input of the preferred computer book publisher are also linked to the corresponding cells in "results" via LinkedCell . In questions 1, 5, and 6 simple formulas have been placed in "results" (for example, =survey!$B for the age).
The main reason for the separation between the questionnaire table "survey" and the result table "results" is that you can edit the questionnaire easily (for example, by inserting a new question) without mixing up the order of result cells in "results." All of the code for evaluating the questionnaire is connected to "results" and depends on the structure of this table being constant. (Any change here would involve difficult changes in the program code.)
If you fill out the questionnaire file, save it, close it, and then later reopen it, the listboxes will be reset. That is, the settings that have been made appear to have been lost. Fortunately, however, the information on the selected entries is retained in the cell associated with the listbox (for example, [B3] in the worksheet "results" for the profession). Since during evaluation only the cells linked to the control elements are read, the automatic resetting of the listboxes does not represent a limitation for the program.
Both the worksheet survey and the Excel file as a whole are protected. Previously, the sheets "listdata" and "results" were invisible (FormatSheet). Thus the user can make changes only in particular cells or by means of particular controls. (The protection in this example is not backed up with a password; in practice, of course, this would be recommended.)
In question 5 the input cell B29 is protected by DataValidation. In this cell only whole numbers between 0 and 10 can be entered. An attempt to input any other value leads to an error message.
The function of the database dbsurvey.mdb is to save the results of the survey. The database consists of a single table, dbsurveydata , and there are no relations. The database was created with Access 2000. Figure 12-24 shows the table under construction, while Figure 12-25 shows some saved data records.
Figure 12-24: The table surveydata under construction
Figure 12-25: Some data records from the table surveydata
In addition to the database fields that arise directly from the questionnaire, the table also contains an id field of type Increment . This field has the task of identifying the data records and simplifying the internal management of the data. (It is part of the "good housekeeping" of database creation that every table be outfitted with such an id field and then be defined as a primary index. Such id fields are of particular importance when several tables are linked by relations.)
First, ProcessIncomingFolder opens a connection to the database dbsurvey.mdb and then a Recordset object to the table surveydata . Instead of an actual SQL command in the Open method, only the name of the table is given, which is shorthand, allowed in ADO, for SELECT * FROM table .
Then a loop is run over all *.xls files. Each file is processed separately in the procedure ProcessSurveyFile (see below). During the process many Excel files are opened, edited, and then saved. To make this happen as quickly as possible, several measures are taken for speed optimization (no screen updating, for example; see Chapter 5.10). To make the waiting time bearable (about one second per file on a Pentium II 400), the status bar displays the number of files that have been processed so far.
' survey\analyzedata.xls, Module1 Sub ProcessIncomingFolder () Dim fil As File, fld As Folder Dim conn As New Connection Dim rec As New Recordset Dim nrOfFiles&, i& On Error GoTo error_processincoming ' optimize speed Application.Calculation = xlCalculationManual Application.ScreenUpdating = False Application.DisplayStatusBar = True ' connection to dbsurvey.mdb Set conn = OpenSurveyDatabase If conn Is Nothing Then Exit Sub ' short form for "SELECT * FROM surveydata" rec.Open "surveydata", conn, adOpenKeyset, adLockOptimistic Set fld = fso.GetFolder(ThisWorkbook.Path + "\incoming") nrOfFiles = fld.Files.Count For Each fil In fld.Files i = i + 1 Application.StatusBar = "process file " & fil.Name & _ " (" & i & " from " & nrOfFiles & ")" If LCase(Right(fil.Name, 4)) = ".xls" Then ProcessSurveyFile fil, rec End If Next rec.Close conn.Close error_processincoming: Application.Calculation = xlCalculationAutomatic Application.DisplayAlerts = True Application.ScreenUpdating = True Application.StatusBar = False If Err <> 0 Then MsgBox Error + vbCrLf + _ "The procedure ProcessIncomingFolder will be stopped" End If End Sub
ProcessSurveyFile is invoked for each file in the incoming folder. The file is opened, and protection is removed for the entire file. Then the result cells in "results" are replaced by their copies. This is necessary so that the source data can be deleted, to make the archived files as small as possible.
The initial plan was simply to delete the sheets "survey" and "listdata." But it turned out that wb.Worksheets("survey").Delete leaves the file in a damaged state. The file can be saved, but at the next attempt to open it, Excel crashes. For this reason this sheet is not deleted in its entirety, but only its contents (Cells.Clear for the cells, Shapes( ).Delete for the controls).
The insertion of a new data record into the surveydata table of the database is simply accomplished with AddNew . Then the result cells of the worksheet results are read and saved in various fields of the data record. Finally, the method Update saves the new record.
Sub ProcessSurveyFile (fil As Scripting.File, rec As Recordset) Dim newfilename$ Dim wb As Workbook, ws As Worksheet Dim shp As Shape ' open file Set wb = Workbooks.Open(fil.Path) ' sheet "results": replace formula by their results ' sheets "survey" and "listdata": delete wb.Unprotect Set ws = wb.Worksheets("results") ws.[a1].CurrentRegion.Copy ws.[a1].CurrentRegion.PasteSpecial xlPasteValues ws.Visible = xlSheetVisible Application.CutCopyMode = False With wb.Worksheets("survey") .Unprotect .Cells.Clear For Each shp In .Shapes shp.Delete Next End With Application.DisplayAlerts = False 'don't show alerts ' wb.Worksheets("survey").Delete 'caution: would cause corrupt file wb.Worksheets("listdata").Delete Application.DisplayAlerts = True ' copy data from survey to database With rec .AddNew !age = ws.[b1] !sex = ws.[b2] !profession = ws.[b3] !pubaw = -CInt(ws.[b4]) 'False--->0, True--->1 !pubapress = -CInt(ws.[b5]) !pubgalileo = -CInt(ws.[b6]) !pubidg = -CInt(ws.[b7]) !pubmut = -CInt(ws.[b8]) !pubmitp = -CInt(ws.[b9]) !puboreilly = -CInt(ws.[b10]) !pubquesams = -CInt(ws.[b11]) !pubsybex = -CInt(ws.[b12]) !internet = ws.[b13] If ws.[b14] <> 0 And ws.[b14] <> "" Then !Comments = [b14] End If .Update End With ' close file and move it into archive directory wb.Save ' Stop wb.Close ' new filename: ' directory incoming instead of archive ' yyyymmdd-hhmmss-oldname.xls instead of oldname.xls newfilename = Replace(fil.Path, _ "incoming", "archive", compare:=vbTextCompare) newfilename = Replace(newfilename, _ fil.Name, Format(Now, "yyyymmdd-hhmmss-") + fil.Name) fso.MoveFile fil, newfilename End Sub
It is worth mentioning the use of the function Cint in the evaluation of the "publishers" check box ( True/False ). The function Cint transforms the Boolean values into 0 ( False ) and “1 ( True ). The minus sign in front of Cint has the effect of saving the truth values in the database as 0 and 1.
The new file name is formed from the previous name in two steps: First, the folder incoming is replaced by archive . Here a case-insensitive text comparison is carried out in Replace ( Compare:=vbTextCompare ). In the second step the former name (that is, fil.Name ) is replaced by a new name, to which the current date and time are prefixed. This serves to resolve conflicts between like-named files.
The first few lines for opening the connection to the database were taken from ProcessIncomingFolder and CreateDummyFilesInIncoming , primarily to avoid redundancy in error testing. Otherwise, these lines present few surprises .
' open connection to database Function OpenSurveyDatabase () As Connection Dim conn As Connection On Error Resume Next Set conn = New Connection conn.Open "provider=microsoft.jet.oledb.4.0;" + _ "data source=" + ThisWorkbook.Path + "\dbsurvey.mdb;" If Err <> 0 Then MsgBox "Could not connect to database: " & _ Error & vbCrLf & "The procedure will be stopped." Exit Function End If Set OpenSurveyDatabase = conn End Function
From the point of view of database programming, the most interesting procedure is certainly AnalyzeDatabase . In it queries are carried out in the database dbsurvey.mdb by means of various SQL commands, and the results then transferred into the cells of the worksheet "surveyresults." The procedure assumes that this worksheet looks like the one depicted in Figure 12-22, thus that the result cells are sensibly formatted (for example, as percentages), the charts refer to the appropriate data, and so on. This can all be accomplished interactively during program development and requires no VBA code.
All the database queries are executed with the same Recordset variable, which is opened with a variety of SQL commands, and after reading the result(s) is again closed. The first two commands are easily understood : SELECT COUNT(id) determines the number of data records, while AVG(age) and STDEV(age) calculate the mean and standard deviation of the age. Both commands return a list of records with only one record in it. For testing commands of this sort it is useful to have access to Access (see Figure 12-26).
Figure 12-26: Testing an SQL query in Access
Please note that STDEV does not conform to the SQL standard, but belongs to an extension of the SQL syntax for Access. This aggregate function, therefore, in contrast to AVG, is not available on all database systems.
Sub AnalyzeDatabase () Dim conn As Connection Dim rec As New Recordset Dim ws As Worksheet Dim publ As Variant Dim p, i& Set ws = ThisWorkbook.Worksheets("surveyresults") ' connection to table surveydata of database dbsurvey.mdb Set conn = OpenSurveyDatabase If conn Is Nothing Then Exit Sub ' nr. of questionnaires rec.Open "SELECT COUNT(id) AS result FROM surveydata", conn ws.[c11] = rec!result rec.Close ' average age, standard deviation rec.Open "SELECT AVG(age) AS result1, STDEV(age) AS result2 " & _ "FROM surveydata", conn ws.[c13] = rec!result1 ws.[c14] = rec!result2 rec.Close
Of greater interest is the evaluation of the column sex in the database (which provides this book with its "R" rating). Here three values are permissible: 0 (no input), 1 (male), and 2 ( female ). The query is to determine how many records belong to each group . To this end the SQL construct GROUP BY is employed. To facilitate understanding of the query it may help first to consider a simpler variant:
SELECT sex, id AS result FROM surveydata sex result - - 2 19 1 20 2 21 0 22
You thus obtain a list (one line for each record) where the first column contains the gender and the second, the sequential ID number. This list can then be organized using GROUP BY sex in such a way that entries with the same gender are collected on a single line. In this case you have to specify how the entries in the second column are to be summarized. This is done with an aggregate function (in this case COUNT ).
SELECT sex, COUNT (id) AS result FROM surveydata GROUP BY sex sex result 0 35 1 29 2 24
The Recordset variable rec probably contains three records as in the table above; probably, because it is theoretically possible for one of the three permissible sex values to have no entries in the database. In this case the corresponding row would be lacking. For this reason the three result cells are first cleared with ClearContents , in order to prevent an old value from remaining behind. ClearContents has the advantage over a simple Clear in that the cell format is kept intact.
The appearance of the contents of rec is now clear. But the evaluation is interesting as well: A loop is run over all the data records of rec . Here sex is used as index for [c16].Cells(1 + n). In this way cells C16, C17, and C18 are addressed. It is not simply a value that is moved into these cells, but a formula, by means of which the result is divided by the total number of records (cell C11).
' sex (0: missing, 1: male, 2: female) ws.[c16:c18].Clear rec.Open "SELECT sex, COUNT(id) AS result " & "FROM surveydata GROUP BY sex" While Not rec.EOF ws.[c16].Cells(1 + rec!sex).Formula = "=" & rec!result & " / $C" rec.MoveNext Wend rec.Close
This same method is used for grouping the professions .
' profession (0: missing, 1-5: various prof.) rec.Open "SELECT profession, COUNT(id) AS result " & _ "FROM surveydata GROUP BY profession" While Not rec.EOF ws.[c20].Cells(1 + rec!profession).Formula = _ "=" & rec!result & " / $C" rec.MoveNext Wend rec.Close
To determine by what percentage of the participants the individual publishers were chosen , a host of similar queries are necessary.
SELECT COUNT(id) AS result FROM surveydata WHERE pubXyz = True
To execute this query with a minimum of programming effort a loop is run over the field names given in an Array . For each field name the SQL query is executed and the result placed in the corresponding cell in the worksheet.
' publishers publ = Array("pubaw", "pubapress", "pubgalileo", "pubidg", _ "pubmut", "pubmitp", "puboreilly", "pubquesams", "pubsybex") For Each p In publ i = i + 1 rec.Open "SELECT COUNT(id) AS result FROM surveydata " & _ "WHERE " & p & " = True" ws.[c27].Cells(i).Formula = "=" & rec!result & " / $C" rec.Close Next
The evaluation of the internet question is done in the same way as the age question: The mean and standard deviation of all responses are computed.
' internet rec.Open "SELECT AVG(internet) AS result1, " & _ "STDEV(internet) AS result2 FROM surveydata", conn ws.[c37] = rec!result1 ws.[c38] = rec!result2 rec.Close ' close connection conn.Close End Sub
AnalyzeDatabase deliberately avoids the speed optimization measures carried out in the other procedures. If the execution of the SQL query takes some time (which is the case only if there are very many questionnaires in the database), then the user sees how, gradually, one result cell after the other is updated.
Naturally, the analysis commands demonstrated here cannot replace proper statistical analysis. For example, if for a medical test you wish to compute crosscorrelations among several parameters, then there is no avoiding a real statistics program (such as SPSS). But even in this case it is convenient to have the data already in electronic form, so that they can be imported into the statistics program with relatively little effort. (Furthermore, Excel, too, offers some sophisticated statistics functions with the add-in "Analysis ToolPak." These functions cannot replace a professional statistics program and in their application in VBA code frequently present problems.)
If you would like to try out the program, you can, of course, fill out some questionnaires yourself and then copy them into the directory incoming . But you can save yourself the effort and instead call upon CreateDummyFilesInIncoming .
The program generates a variable number of files nnnn.xls in the incoming directory and inserts random data into the worksheet "results."
The procedure begins with the same instructions for speed optimization as in ProcessIncomingFolder . Then the file survey_template.xls is opened nrOfFiles times, edited, and saved under a new name in the directory incoming . To avoid the necessity of the file having to be later processed "by hand," all cells in the worksheet "survey" are struck through with a diagonal pattern.
Sub CreateDummyFilesInIncoming () Const nrOfFiles = 50 Dim i&, j& Dim newfilename$ Dim wb As Workbook, ws As Worksheet On Error GoTo error_createdummy Randomize ' optimize speed Application.Calculation = xlCalculationManual Application.ScreenUpdating = False Application.DisplayStatusBar = True Application.DisplayAlerts = False ' open survey_template.xls, insert random data, save newfilename = ThisWorkbook.Path + "\incoming\" For i = 1 To nrOfFiles Application.StatusBar = "Create file " & i & " from " & nrOfFiles Set wb = Workbooks.Open(ThisWorkbook.Path + "\survey_template.xls") ' random data Set ws = wb.Worksheets("results") ws.[b1] = Int(15 + Rnd * 50) ws.[b2] = Int(Rnd * 3) '0: missing, 1: male, 2: female ws.[b3] = Int(Rnd * 6) '0: missing, 1-5: various prof. For j = 1 To 9 'for all publishers If Rnd > 0.7 Then ws.[b4].Cells(j) = True Else ws.[b4].Cells(j) = False End If Next ws.[b13] = Int(Rnd * 11) 'Internet: 0-10 ' mark survey sheet as inactive Set ws = wb.Worksheets("survey") ws.Unprotect ws.Cells.Interior.Pattern = xlLightUp ws.[a1] = "contains random data, do not edit manually" ws.Protect ' overwrite existing files (DisplayAlerts=False) wb.SaveAs newfilename + Format(i, "0000") + ".xls" wb.Close Next error_createdummy: Application.Calculation = xlCalculationAutomatic Application.DisplayAlerts = True Application.ScreenUpdating = True Application.StatusBar = False If Err <> 0 Then MsgBox Error + vbCrLf + _ "the procedure CreateDummyFilesInIncoming will be stopped" End If End Sub
Definitive Guide to Excel VBA
Authors: Kofler M.
Published year: 2002