Meta evaluation stands apart from the three types of evaluation discussed in the previous sections. It is a quality control process applied to the processes, products, and results of formative, summative, and confirmative evaluation. Meta evaluation has been around since the 1970s; however, organizational buy-in to implementing it remains spotty at best. Implementation of meta evaluation is hampered by time, cost, and skill requirements. Those who do implement meta evaluation find that it enables them to:
Improve the quality of formative, summative, and confirmative evaluation.
Test whether the formative, summative, and confirmative evaluations delivered what they promised.
Increase the probability that evaluation results will be used effectively. [69]
Basically, meta evaluation is the process of evaluating formative, summative, and confirmative evaluation by zooming in on the evaluation processes, products, and outcomes to take a closer look at what happened and why. There are two types of meta evaluation; for the purposes of this section, they will be referred to as Type One and Type Two meta evaluation. The two types differ in timing and purpose, as shown in Table 7-8.
Type of Meta Evaluation | Timing and Purpose |
---|---|
Type One | Concurrent and proactive; conducted while the formative, summative, and confirmative evaluations are in progress so that the evaluation process can be improved before it is finalized. |
Type Two | Retrospective; places a value on the completed formative, summative, and confirmative evaluation processes, products, and results. |
Type One is very much like formative evaluation because it is concurrent and proactive. Many times an outside evaluator observes and analyzes the formative, summative, and confirmative stages of evaluation as they occur and makes recommendations for improving the evaluation process before the process is finalized. Type Two resembles summative evaluation because it places a value on the completed evaluation processes. Type Two is the most frequently used form of meta evaluation. It requires fewer resources (time, money, personnel, materials) and can be tied directly to the bottom line of the performance intervention. Therefore, this chapter will focus on Type Two meta evaluation.
The concept of Type Two meta evaluation fits in quite well with the concepts of quality control and accountability. "Evaluators will be more likely to see their studies effectively utilized when they demonstrate that their work can stand the test of careful analysis and that they themselves are open to growth through criticism." [70]
The specific purposes for conducting Type Two meta evaluation may vary, but usually involve placing a value on basic issues such as:
Technical adequacy of the formative, summative, and confirmative evaluation processes and products.
Usefulness of the formative, summative, and confirmative evaluation results in guiding decision making.
Ethical significance of policies for dealing with people during formative, summative, and confirmative evaluation.
Practical use of resources during formative, summative, and confirmative evaluation.
Whether the formative, summative, and confirmative evaluations served the information needs of the client.
Whether or not the formative, summative, and confirmative evaluations adequately addressed the goals and values of the performance improvement intervention and the organization.
How well the formative, summative, and confirmative evaluations dealt with situational realities.
Whether or not the formative, summative, and confirmative evaluations met the requirements of honesty and integrity.
Other...as determined by the organization for which the intervention was implemented. [71]
In fact, the PT practitioner (or evaluator) could rephrase the issues stated above as questions and use the questions to focus the meta evaluation. For example: Was the formative evaluation useful for decision making? Did the formative evaluation meet the informational needs of the performance intervention package designers?
There are three basic methods for conducting a Type Two meta evaluation: [72]
Review the Documentation
Evaluators review the evaluation proposal, evaluation plan, interim reports (status reports), and/or final report. The purpose of the review is to determine whether the reviewer agrees with the data collection, data analysis, and conclusions of the original evaluator(s). Documentation review is particularly helpful when the outcomes of an evaluation are qualitative rather than quantitative. (Qualitative outcomes are based on feelings and experience, and the information is often gathered through interviews or self-reporting. Quantitative outcomes are based on more objective measurement, and the information is often analyzed using statistical methods.)
Do It All Over Again
Evaluators reanalyze the quantitative data from the original evaluation to determine the "goodness" (reliability and validity) of the data and the analysis techniques. The evaluator may replicate all or part of the original evaluation or examine the effects of using different statistical procedures or asking different questions. Reanalysis is costly in terms of time and resources and is usually reserved for evaluation projects that involve major policy changes.
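One common reanalysis check on the "goodness" (reliability) of quantitative survey data is an internal-consistency statistic such as Cronbach's alpha. The sketch below is illustrative only; the item scores are hypothetical and are not drawn from any case study in this chapter.

```python
# Sketch: estimating internal-consistency reliability (Cronbach's alpha)
# for a set of survey items during a meta evaluation reanalysis.
# The scores below are hypothetical, not from the case studies.

def cronbach_alpha(items):
    """items: list of per-item score lists, all the same length
    (one score per respondent). Returns Cronbach's alpha."""
    k = len(items)                      # number of items
    n = len(items[0])                   # number of respondents

    def variance(xs):                   # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(variance(item) for item in items)
    totals = [sum(item[i] for item in items) for i in range(n)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Five respondents, three survey items (hypothetical 1-5 ratings)
scores = [
    [4, 5, 3, 4, 4],
    [4, 4, 3, 5, 4],
    [5, 5, 2, 4, 3],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")  # values near 1.0 suggest consistent items
```

Values above roughly 0.7 are often treated as acceptable internal consistency, though the appropriate threshold depends on how the results will be used.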
Find Others Who Did the Same Thing and Combine the Results
If an evaluation has been repeated in different settings, it is possible to gather together and integrate the results from all the evaluations. This is a statistical process that makes it feasible to draw general conclusions about the processes, products, and results of all the evaluations. The current popularity of internal or external benchmarking could be a selling point for implementing this method of meta evaluation (benchmarking compares internal performance to industry standards or practices).
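The statistical process described above is essentially meta-analysis. A minimal sketch of one common approach, a fixed-effect (inverse-variance weighted) combination, is shown below; the effect sizes and standard errors are hypothetical, invented only to illustrate the calculation.

```python
# Sketch: combining results from repeated evaluations with a
# fixed-effect (inverse-variance weighted) meta-analysis.
# Effect sizes and standard errors below are hypothetical.
import math

def combine(effects, std_errors):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Three hypothetical evaluations of the same intervention in
# different settings (e.g., standardized mean differences)
effects = [0.30, 0.45, 0.25]
std_errors = [0.10, 0.15, 0.12]

effect, se = combine(effects, std_errors)
low, high = effect - 1.96 * se, effect + 1.96 * se
print(f"pooled effect = {effect:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

More precise evaluations (smaller standard errors) receive proportionally more weight, which is what makes the pooled conclusion more trustworthy than any single study.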
For those readers who wish to explore the process of meta evaluation in more depth, Stufflebeam provides 34 standards and 68 guidelines for conducting a Type Two meta evaluation. [73] The standards and guidelines are applicable to business and industry as well as education and are useful when planning a meta evaluation.
The PT practitioner or evaluator conducts the meta evaluation with input and support from the performance intervention stakeholders. The stakeholders should be involved in the meta evaluation, particularly if an outside evaluator conducts the evaluation. Stakeholders can help make decisions regarding the purpose for conducting the meta evaluation and can also help select the methods to use when conducting the evaluation. The services of an external evaluator, or of an internal evaluator who has not participated in planning, designing, or implementing the performance improvement package, are generally preferred to gain a fresh perspective.
For this case study, an external evaluator reviewed the case studies in the formative, summative, and confirmative evaluation sections of this chapter, provided initial reactions to the evaluation plans, and outlined plans for conducting a meta evaluation of each evaluation.
The Detroit Medical Center (DMC) Case Study (page 168): Using Formative Evaluation throughout the Life Cycle of a Performance Improvement Package
Initial Reactions to the Case Study
This case is an example of long-term formative evaluation.
It appears that the initial systemwide rollout was a good idea in principle, but it should have included a pilot study. Including every manager and employee in the DMC is costly and time intensive. Using feedback from the participants for future planning of workshops is certainly an excellent idea.
Additional follow-up beyond Level 1 evaluation (personal reaction to the training) is warranted when large numbers of people, money, and time are involved. The new workshop was developed based on Level 1 reports of the difficulties individuals had with the rollout training. A Level 2 pre- and post-test evaluation comparing entering knowledge and skills with exit knowledge and skills would have been more helpful to the designers of the new workshop.
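A Level 2 pre- and post-test comparison of the kind recommended above is typically analyzed as a paired comparison of each participant's gain. The sketch below is illustrative only; the scores are invented, not taken from the DMC case.

```python
# Sketch: paired t-test on hypothetical pre- and post-test scores,
# the kind of Level 2 analysis recommended above.
import math

pre  = [55, 60, 48, 70, 62, 58]   # hypothetical entering-skill scores
post = [72, 74, 65, 80, 75, 70]   # hypothetical exit-skill scores

gains = [b - a for a, b in zip(pre, post)]
n = len(gains)
mean_gain = sum(gains) / n
sd = math.sqrt(sum((g - mean_gain) ** 2 for g in gains) / (n - 1))
t = mean_gain / (sd / math.sqrt(n))   # paired t statistic, df = n - 1
print(f"mean gain = {mean_gain:.1f} points, t({n - 1}) = {t:.2f}")
```

A large t statistic indicates that the average gain is unlikely to be due to chance alone, which is exactly the evidence workshop designers would need.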
The goal of the new workshop was to describe the role and actions of a manager who contributes to a positive environment and to encourage managers to develop and follow a personal action plan for creating a positive environment within their work area. The goal is certainly an important one; however, from an evaluation perspective, it needs to be much more clearly defined and measurable. For example, the role of the personal action plan as an evaluation tool is unclear.
Field-testing the new workshop with a group of managers, followed by a debriefing, was an excellent step to include in the process. However, since the goal was a positive environment for everyone in the workplace, greater input from both managers and employees should have been obtained.
Meta Evaluation Plan for DMC Formative Evaluation Case Study
Purpose
Provide feedback on the design, development, and implementation of the long-term formative evaluation.
Assess the usefulness, validity, and reliability of the results.
Foci
Original Course Materials and New Workshop Materials
What were the specific participant reactions to the content of the original course?
What were the specific participant reactions to the instructional strategies used in the original course?
What content revisions were made based on participant reaction to the original course?
What instructional strategy revisions were made based on participant reaction to the original course?
Design and Development Process for New Workshop
What are the goals and objectives for the new workshop?
How were the goals and objectives established?
Was formative evaluation built in during design and development? If yes, how?
Internal Review by DMC Training Staff and Vendor
Who participated in the review?
How was the review conducted?
What were the results?
What revisions were made to the new workshop as a result of the internal review?
Field-test and Debriefing Session
Were the participants truly representative of the target audience?
How was the field-test conducted?
How was the debriefing session conducted?
What were the results of the Level 1 evaluations from the field-test participants?
What were the results of the debriefing session?
How did the participants react to the revisions made as a result of the Level 1 evaluations from the original course?
How did the participants react to the revisions made during the internal review?
What revisions were made to the new workshop as a result of the field-test?
Level 1 Evaluation of the Workshops
What were the results of the Level 1 evaluations of the new workshop over time?
Were there any trends?
Were revisions made to the workshop based on the Level 1 evaluations? If yes, what were they?
Action Plans
What was the purpose of and process for the personal action plans?
What were the results of the personal action plan?
Were there any trends in the results?
Were revisions made to the new workshop based on the action plan results?
Annual Survey
To what degree of certainty can it be said that the survey results showed that, over three years, the DMC consistently maintained a positive relationship with its employees?
Was a pretest used for the annual survey?
Coaching Sessions
What issues from the workshop were carried over to the coaching sessions?
Were any trends obvious?
What revisions were made to the new workshop as a result of the coaching sessions?
Focus Groups
How were the focus groups conducted?
What issues were explored?
What new issues were raised?
What feelings or emotions surfaced during the sessions?
What revisions were made to the new workshop based on the focus groups?
Methods for Conducting the Meta Evaluation
Review existing documentation and materials
Find others who did the same thing and compare the evaluations.
Replicate at least part of the evaluation (do it all over again).
Dealership Case Study (page 172): Summative Evaluation Plan for a Performance Improvement Program
Initial Reactions to the Case Study
After reading the evaluation plan, it is clear that an organized and thorough evaluation was conducted using Kirkpatrick's four levels of evaluation. [74] The entire evaluation was well thought out, comprehensive, and professionally completed.
Using Kirkpatrick's four levels of evaluation provided a complete picture from the participants' immediate reaction to the training to the impact of the training on the organization's vision.
Level 1 Evaluation (immediate reaction) was obtained through both quantitative and qualitative responses. The results from these reactions were summarized clearly and provided useful information.
Level 2 Evaluation (immediate increase in knowledge or skill) was done by using a pre- and post-test. This was effective in assessing the acquisition of skills presented in training.
Level 3 Evaluation (transfer back on the job) used participant action plans completed during the training to follow up on what the participants learned during the training and how they applied it. In addition, a follow-up in the dealerships was conducted after 30 to 60 days. This was an excellent step in the evaluation process.
Level 4 Evaluation (impact on the organization) was completed by interviewing individuals six months after the training, reviewing before and after sales, and reading customer service documentation. This information was obtained at an appropriate time interval:
It was still fresh to those who were being interviewed.
There was ample time for the impact to be felt throughout the organization.
There was ample time to collect objective information on how the training impacted the dealership and the sales organization.
Incentives were an excellent idea. Immediate feedback on surveys and quizzes was a positive aspect of the evaluation plan.
One suggestion is that plans for a confirmative evaluation be considered; perhaps evaluation at six-month intervals over a three- or four-year period is needed. If another training session is conducted in the near future, it is strongly recommended that it follow a summative evaluation plan similar to the one conducted in this study.
Meta Evaluation Plan
Purpose
Provide feedback on the design, development, and implementation of the summative evaluation plan.
Assess the usefulness, validity, and reliability of the results.
Foci
Level 1 Evaluation
What were the specific participant reactions to the program?
Were any trends evident over time?
Was the instrument valid and reliable?
What revisions were made to the program based on the Level 1 evaluation results?
Level 2 Evaluation
What were the goals and objectives for the program?
How were the goals and objectives established?
Do the pre- and post-test questions align with the program goals and objectives?
Do the pre- and post-test questions align with the content of the program?
What were the specific pre- and post-test results?
Were any trends noted over time?
Level 3 Evaluation
Do the action plans align with the program goals and objectives?
What was participant reaction to the action plans?
What were the results?
What revisions were made to the new workshop as a result of the action plans?
Were any trends noted over time?
Was there a transfer of competency over time?
Level 4 Evaluation
What was the interview protocol? Was there a script? Were the interviewers trained?
What was the result of the interviews?
Were any trends noted?
How was the before-and-after comparison of sales figures and customer satisfaction conducted?
What were the results?
What was the impact on the bottom line?
Is there a final report available? Does it include recommendations for next steps?
Methods
Review existing documentation and materials.
Conduct a statistical analysis of Level 1-4 evaluation results.
Interview the program designer and evaluator.
Interview the participants.
Interview the management.
Case Study (page 179): Confirming the Long-term Effects of a Nationwide Reading Program
Initial Reactions to the Case Study
By its very nature, confirmative evaluation poses particular challenges that are not necessarily apparent in formative or summative evaluation. In this case, the greatest challenge was to assess the effects of a reading program implemented in kindergarten by measuring those effects 12 years later.
The design of the study posed a problem. It does not appear that a pretest and a posttest were conducted. This may have been because a pretest to measure reading skills at the kindergarten level is particularly difficult to develop when many of the students are not yet reading. Nevertheless, the summative evaluation data gathered from a large national sample in which the reading program was implemented in kindergarten classes still provided a strong basis for developing the confirmative evaluation.
The results of the evaluation need careful assessment. The design of the study did not include an experimental or quasi-experimental approach. Measuring the effects of a reading program after 12 years without either of these approaches certainly makes the findings vulnerable to biases. For example, the finding that high school students who went through the reading program had higher grades, better attendance patterns, more positive attitudes toward school, and less need for remediation is a welcome result. However, what about the many confounding factors that potentially affect these findings, such as IQ, motivation, perseverance, internal versus external locus of control, family influence, and personality type?
Ongoing evaluation after the reading program was completed and then throughout the students' K-12 years typifies a common evaluation strategy for educational studies such as this one.
Qualitative and quantitative evaluation involving both the staff who implemented the program and the children's parents who were a part of this program may lead to some additional insights not readily apparent.
Student achievement and the cost-benefit ratio are key factors for any district to consider when implementing a program of this type. Both types of information should be generated when evaluating the effectiveness of a program with such a broad scope.
A well-planned evaluation design and an ongoing evaluation and tracking system are needed so that programs such as these can be assessed for their effectiveness over time.
Meta Evaluation Plan
Purpose
Provide feedback on the design, development, and implementation of the confirmative evaluation.
Assess the usefulness, validity, and reliability of the results.
Foci
SWRL/Ginn Beginning Reading Program
What were the specific goals and objectives of the program?
Was content aligned with the goals and objectives?
Were the instructional strategies prescribed or left to the individual teacher?
How much variation was there in the implementation of the program?
Was the reading program subjected to a thorough product evaluation?
Can any generalizations be made about the confirmative evaluation findings based on the reading program itself?
Design of the Evaluation Program
What was the overall evaluation plan?
What were the goals and objectives for the evaluation?
How were the goals and objectives established?
Do the pre- and post-test questions align with the evaluation goals and objectives?
Do the pre- and post-test questions align with the content of the program?
What were the specific pre- and post-test results?
What was the summative evaluation plan?
How were data on achievement and attendance generated?
Instruments Used for Follow-up Measurement of Reading Competency and Attitude
What instrument was used to test student competency at the beginning and end of kindergarten?
What instrument was used to measure competency from grade 1 to grade 12?
What instruments were used to measure attitude?
How were data on attendance and achievement collected?
Were the instruments valid and reliable?
How frequently was competency measured?
How frequently was attitude measured?
How were the results of the evaluation analyzed?
What trends were noted over time?
Summative Evaluation
How were data gathered from schools that implemented the program in their kindergarten classes?
Was it a random sampling?
What were the goals and objectives of the summative evaluation?
If instruments were used, were they valid and reliable?
What were the results?
What trends were noted?
How were the data quantified and analyzed?
Methods
Review existing documentation and materials.
Find others who did the same thing and compare the evaluations.
Replicate at least part of the evaluation (do it all over again).
These meta evaluations were written by Mary Jane Heaney, R.N., M.S.N., Ph.D., C.H.E.S., of Wayne State University. Used with permission.
One way to determine the purpose of a meta evaluation is to sit down with the stakeholders and respond together to the following questions.
Do we need to know... | Formative | Summative | Confirmative |
---|---|---|---|
Technical adequacy of the evaluation process and products? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Usefulness of the results in guiding decision making? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Ethical significance of policies for dealing with people? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Practical use of resources? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Whether it served the information needs of the client? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Whether it adequately addressed the goals/values of the performance improvement package? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Whether it adequately addressed the goals/values of the organization? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
How well it dealt with situational realities? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Whether it met the requirements of honesty and integrity? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
How well it satisfied the need for truthfulness? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
Other? | ☐ Yes ☐ No | ☐ Yes ☐ No | ☐ Yes ☐ No |
© ISPI 2000. Permission granted for unlimited duplication for noncommercial use.
[69] Posavac and Carey, 1989, p. 284
[70] Posavac and Carey, 1989, p. 282
[71] Posavac and Carey, 1989, p. 282; Madaus, Scriven, and Stufflebeam, 1987, p. 16
[72] Posavac and Carey, 1989, pp. 282-284
[73] Stufflebeam, 1978
[74] Kirkpatrick, 1994