Method

A primary goal for the experimental design was to create realistic working conditions. Thus, (1) tasks were designed to be representative of traditional office work, (2) individuals who regularly engage in these types of tasks were recruited as participants, (3) participants completed their tasks in realistic environments, and (4) motivation was ensured through a quantity/quality-based financial incentive.

Participants

Assigning regular employees work that is outside of their traditional responsibilities increases the perception that the work, and therefore the entire experience, is artificial. Therefore, we recruited temporary agency employees to participate in this research. These individuals regularly engage in the types of tasks investigated in this study, and their normal mode of work is to visit an organization for a short period of time to complete a limited quantity of work. Since participants would be completing realistic tasks under realistic working conditions (see below), we anticipated that some data would be lost to attrition, hardware failures, and outliers, so a large number of participants would be required. Given the volume of data required, the number of appropriate physical spaces available for participants to work, and the number of temporary employees that could be recruited in a single geographic region, multiple data-collection sites were needed. Participants were therefore recruited at two universities located in different geographic regions.

DePaul University (DPU) and Florida International University (FIU) hired a total of 175 temporary employees to participate in this research. The participants were hired under the premise of assisting with a backlog of work that had accumulated in the School of Computer Science at DPU and the Department of Industrial Engineering at FIU. The Institutional Review Boards (i.e., the organizations responsible for oversight of regulations, policies, procedures, and forms related to research involving human participants) of both DPU and FIU reviewed the experimental protocol. Given the requirements of our study, the guaranteed confidentiality of the results, and the nature of the activities in which participants would engage, the Institutional Review Boards of both universities approved this protocol.

All participants were experienced temporary employees having completed a minimum of six weeks of temporary assignments. Screening, using the Qwiz® skills assessment test (Qwiz, 2002), was conducted by the temporary agency to ensure that all participants exhibited a consistent level of experience with the three applications utilized during the study. Participants received their normal hourly pay through the temporary agency. As an incentive, participants could earn a bonus of up to 50% as described below.

Tasks

Twenty-one tasks were developed that involved creating new documents and modifying existing documents. The tasks were developed using input from the temporary agency to ensure that the tasks accurately represented the work our participants engaged in during their normal work activities. Similar tasks were designed for Excel, Word, and PowerPoint, resulting in a total of seven tasks per application. In this chapter, we focus on two of the seven tasks for each application. These tasks required our participants to make simple modifications to existing documents containing text, images, and graphics. Participants were provided an electronic copy of the document as well as a printed version.

The printed document had items marked for deletion and locations marked where new text was to be inserted. Insertion points were numbered and a corresponding list of items to be inserted was provided on a separate page. The low-demand tasks involved insertions and deletions where each modification involved an average of 1.1 items (e.g., words, numbers). The high-demand tasks were identical to the low-demand tasks, with one exception: each modification involved an average of 3.5 items. Again, while the amount of data entry required does differ between the low- and high-demand tasks, the amount of navigation required is the same. Table 1 provides additional details.

Table 1: Example Low- and High-Demand Tasks for Excel, PowerPoint, and Word

Low-demand

  Excel
    Deletions: Participants deleted the contents of one or two adjacent cells.
    Insertions: Participants entered one or two numbers or words into a single cell.

  PowerPoint
    Deletions: Participants deleted one or two adjacent words.
    Insertions: Participants entered one or two adjacent words.

  Word
    Deletions: Participants deleted one or two adjacent words.
    Insertions: Participants entered one or two adjacent words.

High-demand

  Excel
    Deletions: Participants deleted the contents of more than two adjacent cells.
    Insertions: Participants entered two to five numbers or words into a single cell.

  PowerPoint
    Deletions: Participants deleted more than two adjacent words.
    Insertions: Participants entered two to five adjacent words.

  Word
    Deletions: Participants deleted more than two adjacent words.
    Insertions: Participants entered two to five adjacent words.

To ensure that the conditions were as close to normal working conditions as possible, no constraints were placed on how the tasks were completed. Participants were free to use whatever technique they felt was most appropriate for each task. For example, they were free to navigate to each modification by scrolling, using the page-up/page-down keys, using the arrow keys, employing split windows, searching, or even using the search and replace functionality built into the applications. Given this flexibility, an individual's choice of strategy would be based upon their previous experiences with similar tasks. Since different strategies could result in different task completion times or error rates, participants were randomly assigned to use one of the three platforms employed in this study. As a result, we believe it is appropriate to associate significant differences identified in our results with differences between platforms as opposed to the underlying preferences of individual participants.

Work Environment

Creating realistic working conditions required that participants be immersed in traditional work environments. Each participant was placed into a workspace designed to represent the traditional environments in which office employees typically work. Workspaces varied from larger open rooms occupied by one to four individuals not necessarily associated with the current study working on various problems, to open cubicles occupied by only one individual. Ringing telephones, conversations, radios, and other background distractions existed in every workspace.

Motivation

Beyond having realistic working conditions and tasks, our participants also had to be motivated to complete the assigned work. Participants were hired for a temporary work assignment by the universities to complete predefined tasks, just as they would be for any other temporary assignment. Participants were not aware that they were participating in a research experiment. Since poor work quality could result in the hiring organization expressing dissatisfaction to the temporary agency, which could affect participants' future work assignments, all participants had some motivation to complete the assigned work well. We wanted to provide an additional, more tangible, incentive.

Eisenberger & Cameron (1996) explored the relationship between rewards and motivation. They concluded that tangible rewards, such as increased pay, resulted in increased interest in the task if the reward was tied to the quality of the outcome. Tangible rewards increased motivation regardless of the type of task being completed. Terborg & Miller (1978) also demonstrated that motivation is increased if pay is contingent upon performance. Given these observations, we instituted an incentive system that allowed participants to earn a bonus of up to 50% for the day, based upon the quantity and quality of the results they produced.

Participants completed as many of the 21 tasks as they could during their normal, approximately eight-hour workday. The temporary agency we worked with had established standards for both the quantity of work a temporary employee should be able to complete and the accuracy of that work. Using the temporary agency's quality expectations, only those tasks completed with an error rate of less than 2% would be considered when determining the bonus, if any, that a participant received. In practice, error rates were quite low and all tasks were counted toward bonus calculations. This suggests that participants did not experience significant difficulty in achieving this quality standard.

A pilot study was conducted to assess our experimental materials and procedures. No changes to the experiment were implemented as a result of this pilot study, but the results of this study, combined with the temporary agency's expectations for the quantity of work, were used to establish a baseline of sixteen tasks. The intent was to define a baseline such that the vast majority of participants would be able to attain this level of productivity in a single eight-hour workday. By establishing this baseline, the first sixteen tasks completed with an error rate of less than 2% did not result in any bonus pay. Each additional task beyond the baseline, completed with an error rate of less than 2%, resulted in a 10% bonus for the entire day. Therefore, participants completing all 21 tasks with sufficient accuracy would receive a 50% bonus for the day. If all tasks were completed in less than eight hours, participants were still paid for the full eight hours of work. Participants were informed of the details of this incentive plan prior to participation and any questions were answered.
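To make the incentive structure concrete, the sketch below expresses the bonus rule described above in code. The constants mirror the values stated in the text (a 16-task baseline, 21 available tasks, a 10% bonus per additional qualifying task, and a 2% error-rate threshold); the function and variable names are illustrative and were not part of the study materials.

```python
# Illustrative sketch of the bonus rule described above; names are hypothetical.
BASELINE_TASKS = 16       # tasks completed at the expected quality earn no bonus
TOTAL_TASKS = 21          # maximum number of tasks available in a workday
BONUS_PER_TASK = 0.10     # each qualifying task beyond the baseline adds 10%
QUALITY_THRESHOLD = 0.02  # only tasks with an error rate below 2% count

def daily_bonus(task_error_rates):
    """Return the bonus fraction (0.0 to 0.5) earned for one workday.

    task_error_rates: the error rate observed for each task the participant
    completed during the day, in any order.
    """
    qualifying = sum(1 for rate in task_error_rates if rate < QUALITY_THRESHOLD)
    extra = max(0, min(qualifying, TOTAL_TASKS) - BASELINE_TASKS)
    return extra * BONUS_PER_TASK

# Example: 19 of 21 tasks completed below the 2% error threshold -> 30% bonus.
print(f"{daily_bonus([0.0] * 19 + [0.05, 0.03]):.0%}")  # 30%
```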

Experimental Design

Each participant utilized a single computing platform and was unaware that other individuals were completing the same tasks or that other computing platforms were available. The three platforms utilized in this study were based upon Pentium® 133MHz, Celeron 266MHz, and Pentium II 400MHz processors. All three platforms were running Windows NT 4.0. The slowest system was selected because it was representative of systems frequently used in industry to perform tasks such as those included in this study. The fastest system was included because it represented the state-of-the-art in PC processor technology at the time the study began. The Celeron processor-based system represented a mid-point both in terms of processor speed and performance on industry-standard benchmark tests such as SYSmarkNT and Winstone.

All platforms used the same model keyboard, mouse, and 15-inch monitor to ensure that, from the perspective of the participants, all of the components being utilized to complete the tasks were identical both in appearance and operation. Many variables affect the performance of a computer, including the processor speed and amount of memory. However, it is important to note that the critical difference between these platforms is not the processor or memory—instead it is the overall performance of the systems as measured by the SYSmarkNT benchmark (see Table 2). Throughout the remainder of this chapter, we refer to these systems as the 133, 266, and 400 platforms.

Table 2: Platform Details

Platform     133                 266                 400
SYSmarkNT    112                 178                 354
Processor    Pentium 133MHz      Celeron 266MHz      Pentium II 400MHz
Memory       32 MB               32 MB               64 MB

Independent Variables

Independent variables include the computing platform, application, and task. Platform is treated as a between-group variable and has three levels corresponding to the systems described in this chapter. Application is also treated as a between-group variable and also has three levels (i.e., Excel, Word, and PowerPoint). Finally, task is treated as a between-group variable and has two levels (i.e., low- and high-demand).

Dependent Variables

Task completion time, error rate, and user perceptions of the systems were the dependent variables. Given the nature of this study, it was concluded that user perceptions had to be assessed through discussions rather than formal questionnaires. Since two supervisors were involved (i.e., one at DPU and one at FIU), scripted discussions were used to ensure consistency. These discussions took place between the supervisor and participant at the end of the workday. Each discussion began by asking the participant how their day went. Often, this question was sufficient and participants freely expressed their views of the computer they had used. Extreme examples include "Everything was fine—that was a great computer!" and "It's too bad you don't have faster computers—I had to wait for the computer so much I didn't get as much done as I might have." If this initial question was not sufficient, one of several predefined follow-up questions was asked (the specific question depended on the previous content of the discussion). Occasionally, when several questions failed to elicit a clear opinion about the computer that was used, the conversation was focused explicitly on the computer by framing the discussion around the issues involved in purchasing additional computers. The discussion continued until the participant had clearly expressed a positive, neutral, or negative opinion of the system they had used. Negative opinions resulted in a rating of one, neutral opinions in a rating of two, and positive opinions in a rating of three.

Error rates were assessed by comparing the desired outcomes to the actual results produced by the participants. Each extra, forgotten, or incorrectly completed alteration was counted as a single error.
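A minimal sketch of this error-counting rule follows. It assumes that each required alteration, and each alteration a participant actually made, can be represented as a mapping from a document location to its content; this representation, and the use of required alterations as the error-rate denominator, are illustrative assumptions rather than the study's actual scoring procedure.

```python
def count_errors(required: dict, actual: dict) -> int:
    """Count extra, forgotten, and incorrectly completed alterations."""
    errors = 0
    for location, expected in required.items():
        if location not in actual:
            errors += 1                      # forgotten alteration
        elif actual[location] != expected:
            errors += 1                      # incorrectly completed alteration
    errors += sum(1 for location in actual if location not in required)  # extra
    return errors

def error_rate(required: dict, actual: dict) -> float:
    """Errors per required alteration (the denominator is an assumption)."""
    return count_errors(required, actual) / len(required)

# Example: one forgotten and one extra alteration out of ten required -> 0.2
required = {i: f"word{i}" for i in range(10)}
actual = {i: f"word{i}" for i in range(1, 10)}
actual["extra_spot"] = "oops"
print(error_rate(required, actual))  # 0.2
```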

Task completion time measured the total time users spent interacting with the system to complete the task. The collected data allowed task completion time (t) to be divided into three components: data entry (tD), navigation (tN), and miscellaneous other activities (tO). All time spent preparing for and executing data entry activities (e.g., typing or deleting characters) was represented by tD. Time spent preparing for and executing navigation activities (e.g., mouse or keyboard navigation) was represented by tN. Time spent preparing for and executing the few remaining activities that were not data entry, but could not be definitively classified as navigation (e.g., accessing help or other functions), was represented by tO. As a result, some of the time in tO may belong in tN, but none of the time in tO belongs in tD. To summarize (a brief illustrative sketch follows the list):

  • t = tD + tN + tO

  • tD accurately reflects all time spent preparing for and executing data entry activities

  • tN represents the minimum amount of time spent preparing for and executing navigation activities

  • tO represents the time spent on activities that were not data entry, but could not be definitively classified as navigation

  • tN + tO represents the maximum amount of time spent preparing for and executing navigation activities
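The sketch below illustrates this decomposition, assuming each logged interaction has already been labeled as data entry, navigation, or other and has a measured duration. The event representation is a simplifying assumption made for illustration; it does not reflect the actual DeskWatch log format.

```python
# Hypothetical event records: (category, duration_seconds), where category is
# "data_entry", "navigation", or "other".
def decompose_completion_time(events):
    t_d = sum(dur for cat, dur in events if cat == "data_entry")  # tD
    t_n = sum(dur for cat, dur in events if cat == "navigation")  # tN
    t_o = sum(dur for cat, dur in events if cat == "other")       # tO
    return t_d, t_n, t_o, t_d + t_n + t_o                         # ..., t

# tN is a lower bound on navigation time; tN + tO is the corresponding upper
# bound, since some "other" time may belong to navigation but none to data entry.
events = [("data_entry", 4.2), ("navigation", 1.5), ("other", 0.8)]
print(decompose_completion_time(events))  # (4.2, 1.5, 0.8, 6.5)
```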

Experimental Procedure

Participants were selected by the temporary agency from a database of active employees. Participants were scheduled to arrive at a predefined location and time on a specific day. Their supervisor for the day escorted them to their workplace and explained their tasks. Participants were given three to five tasks at a time. When they completed one set of tasks, they were given another set. Each participant was assigned the tasks in a unique random order. The supervisor checked on the participants periodically to make sure no unexpected problems were hindering their progress. Participants were allowed normal breaks in the morning, at lunch, and in the afternoon, but were not constantly monitored. As described above, the supervisor engaged the participant in a discussion at the end of the day to assess the participant's perception of the computer they had used.

All other data collection was automated using Platinum Technology's DeskWatch software (Arcsoft, 2002). This software recorded each keystroke, mouse event, and function invocation as well as numerous additional details about the users' interactions with the system. The resulting log files were processed using analysis software available from Platinum Technology.

After each participant completed their work, all log files and resulting documents were removed from the computer. Microsoft Office 97 was uninstalled, temporary files created during the day were deleted, and Microsoft Office 97 was then reinstalled. This process ensured that every participant's first encounter with each of the applications was identical and that any indication that someone else had previously completed the same tasks would be removed.


