Research Methodology
Given the limited academic and practitioner knowledge about the data mining process, this study was intentionally exploratory, since the objective was to discover a potential process for performing data mining projects. A single-site case study design was employed, consistent with the exploratory nature of the research and with practical considerations: a company had to be willing to participate, grant full access to the organization during the time frame of the study, and make the major actors available for interview. The basic unit of analysis for this study was a data mining project.
Data collection and analysis in case study research, as in other types of research (e.g., empirical survey research), have to be tested for construct validity, external and internal validity, and reliability. Construct validity means establishing correct operational measures for the concepts being studied (Yin, 1989). In this study, construct validity was enhanced by using multiple sources of evidence. Of the six sources of evidence described by Yin (1989), three were used: documentation, direct observation, and participant-observation. The documentation consisted of memos, minutes of meetings, and internal company documents. Direct observation was used during field visits to the case study site to collect observational evidence. In addition, participant-observation, a special mode of direct observation, allowed for the collection of particularly rich case study data.
One methodological issue affecting the choice of case studies is their potential lack of external validity. It is often difficult to tell how much can be generalized from any particular case study. This study addressed external validity by choosing a data mining project in a major Canadian fast food company (hereafter referred to as Company X). The business issues affecting companies in the fast food industry are similar to those of the retail sector in general; thus the choice of Company X is reasonably representative. Moreover, since this study is exploratory in nature, the generalizability of this case is not as important as in other situations. As for internal validity, according to Yin (1989), it is not a concern for exploratory studies. Finally, reliability, which is the ability to replicate the measurement of constructs and variables, is not addressed because this study is exploratory.
Data Collection Methods
The data collection for this study was carried out in multiple steps using several sources of information. The first step, an important pragmatic requirement, was to find a company that was interested in this study. This criterion may have introduced a selection bias; however, as mentioned previously, any such bias is expected to be minimal, since Company X is at least reasonably representative of the Canadian retail sector.
The second step consisted of selecting a particular department in Company X in which a cross section of employees was willing to participate in an in-depth analysis. At the time this study was initiated, the IT department of Company X was beginning a data mining implementation project. This coincidence supported a participant-observation mode and therefore afforded a unique opportunity to access evidence that would otherwise have been inaccessible. In addition, it made it possible to perceive reality from the viewpoint of someone "inside" the case study.
Data for the study were carefully collected, categorized (e.g., workshop notes, project documentation deliverables), and analyzed after each site visit. A total of 10 site visits took place; data collection began in July 1998 and was completed by November 1998. After qualitative data had been gathered from all site visits, final data analysis was undertaken.