Section 23.2. EXPERIMENTS | Aspect-Oriented Software Development with Use Cases

23.2. EXPERIMENTS

Our main goal in conducting these experiments was to better understand how the separation of concerns provided by aspect-oriented programming affects a programmer's ability to accomplish different kinds of tasks.

23.2.1. General Format

Each experiment was a between-groups study consisting of six trials: in three trials, one pool of participants worked with AspectJ; in the other three, a different pool of participants worked with a control language. We estimated each programmer's relative ability from our previous knowledge of them and a set of informal questions regarding the scope of their previous experience, particularly with regard to object-oriented programming (in Java and Emerald), concurrency, and distribution. The pools were formed to balance abilities; for the debugging experiment, pairs were selected with matching skills (so neither could dominate the trial) and then distributed to balance the pools. Each trial began with training time to allow the participants to familiarize themselves with the environment and the language(s) they were to use. We also gave the participants some refresher material on synchronization and distribution. The participants were then given a ninety-minute session in which to tackle the assigned tasks. Two computers were available for use in each trial. The participants were graduate students and professors of computer science, and an undergraduate in computer engineering.

We videotaped the sessions during which participants worked on tasks; the participants were asked to think aloud during this time. An experimenter was present during the session and was available to answer questions about the programming environment. At 30-minute intervals, or after each task was completed, the experimenter stopped the participants and asked a series of questions:

What have you done up to now?
What are you working on?
What significant problems have you encountered?
What is your plan of attack from here?

The same basic system, a digital library, was used in each experiment. The library had two main actors: readers and libraries. Readers would make requests to libraries for a particular book. Libraries would search within their internal repositories for the book, and also ask remote libraries to do the same. Each reader could query one library, and each library could directly query at least one other.

The library system was initially written in two languages, AspectJ (with JCore and Cool) and Java, the control language for the first experiment. These initial implementations were used in the program debugging experiment. A distributed version of the system was then implemented in AspectJ (using JCore, Cool, and Ridl) and in Emerald [3]. Emerald was chosen as the control language for the second experiment because it is an example of an object-oriented language that integrates explicit, but not separate, support for distributed, synchronized programming.^[3] To compare Java and Emerald with AspectJ more fairly, the participants were given synchronization lock classes in each language that were similar to the synchronization mechanisms of AspectJ.

^[3] This support includes simple constructs for object mobility, and transparent object references across machine boundaries.

Our experimental design was a refinement of a design used to conduct a small pilot study that compared the ease of creating AspectJ programs with Java programs.

23.2.2. Experiment 1: Ease of Debugging

The intent of this experiment was to investigate whether programmers working with aspect-oriented programming were able to more quickly and easily find and fix faults in a multi-threaded program. Our hypothesis was that programmers working with the aspect-oriented programming language, AspectJ (JCore and Cool components), would be able to more quickly and easily identify the cause of errors and correct them than programmers working in Java, the control language.

Three synchronization errors were introduced into the digital library code. Pairs of programmers, knowledgeable in multi-threaded programming techniques and object-oriented programming, then attempted to correct the faults.

23.2.3. Format

In each pair, one participant had control of the computer with the programming problem, and the other had access to a report describing the symptoms of the faults, and on-line documentation. The teams were asked to fix each fault sequentially. All participants were told that the errors were due to incorrect synchronization within the program.

The faults were cascading, meaning that the symptoms of the first hid the symptoms of the second, and the second hid those of the third. In the first fault, only one reader would make requests while the others remained idle. The participants had to remove per-class self-exclusive coordination on the run() method of the Reader class so that more than one reader (each in a separate thread) could run. In the second fault, multiple readers would make requests but the system would eventually deadlock. The participants were required to determine that the deadlock occurred when two libraries each tried to do a remote search on the other at the same time. Removing per-object self-exclusive coordination on the remoteSearch() method of the Library class removed the deadlock condition. The third fault allowed more than one reader to check out the same book from the same library. To correct this, the participants had to add per-object self-exclusive coordination on the checkOut() method of the Library class so that only one reader could check out a book at a particular library at a time.
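The third fault and its correction can be illustrated in plain Java. The following is a minimal sketch under stated assumptions, not the code used in the study: the body of `Library`, the `available` set, and the `raceForOneBook` helper are hypothetical, and Java's built-in `synchronized` keyword stands in both for the lock classes given to the Java participants and for the Cool coordinator used by the AspectJ pairs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the digital library's Library class.
class Library {
    private final Set<String> available = new HashSet<>();

    Library(String... titles) {
        for (String title : titles) {
            available.add(title);
        }
    }

    // Per-object self-exclusion: at most one thread at a time may run
    // checkOut() on a given Library instance. Without `synchronized`, two
    // readers could both pass the contains() check before either removes
    // the book, which is the symptom of the third fault.
    synchronized boolean checkOut(String title) {
        if (available.contains(title)) {
            available.remove(title);
            return true;
        }
        return false;
    }

    // Starts `readers` threads that all request the same book and returns
    // how many of them were granted the check-out.
    static int raceForOneBook(int readers) {
        Library library = new Library("Book A");
        AtomicInteger granted = new AtomicInteger();
        Thread[] threads = new Thread[readers];
        for (int i = 0; i < readers; i++) {
            threads[i] = new Thread(() -> {
                if (library.checkOut("Book A")) {
                    granted.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for readers", e);
            }
        }
        return granted.get();
    }
}
```

With the `synchronized` modifier in place, `Library.raceForOneBook(8)` grants exactly one check-out. The first two faults went the other way: there, the correction was to remove self-exclusion that was too restrictive.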

23.2.4. Results

In both the AspectJ and Java groups, all pairs of participants were able to find and correct all three of the faults. We analyzed videotapes of the ninety-minute sessions to extract both qualitative and quantitative data elements such as the participants' views, the time taken, and the number of builds.^[4] We first discuss each data element in isolation, and then correlate and summarize the results.

^[4] The raw data for the quantitative data elements is available at http://www.cs.ubc.ca/labs/se/projects/aop/. The appendix contains tables of the quantitative data referred to in this paper.

Time

The times required to correct each of the three faults are shown in Figure 23-2A. In this and the following figures, a bar, shaded according to the language being used, is shown for each participant and for each assigned task. From Figure 23-2A, we can see that the largest difference in completion times was with respect to the first fault: the AspectJ teams clearly repaired the fault faster than the Java ones. For the second and third faults, the difference was smaller.

Figure 23-2. Debugging results.

Switching Between Files

We examined the number of times the pairs switched the file they were examining to determine if the AspectJ users were affected by the coordination specification residing in a different file from the rest of the code. Figure 23-2B shows that the AspectJ pairs typically made fewer file switches than the Java group for fault 1, more for fault 2, and slightly fewer for fault 3.

Instances of Semantic Analysis

Figure 23-2C highlights the difference in the number of instances of semantic analysis over the sessions. To determine the number of instances of semantic analysis, we recorded the number of times participants said something to the effect of "let's find out what this does. . . ." The data indicates that the Java pairs more often analyzed the behavior of the code than the AspectJ pairs. In the AspectJ session with the most instances of semantic analysis, the group members openly disagreed as to how much semantic analysis was necessary to solve the second fault:

A: . . . we know it's in the COOL file . . .

B: But we have to know what they do before changing anything.
AspectJ Pair 2

Builds

Overall, the AspectJ and Java pairs spent roughly equal time building and executing their program. The additional time required for weaving AspectJ programs was negligible. The number of builds per fault ranged from one to nine.

Concurrency Granularity

The Java users specified synchronization constraints by inserting statements about lock objects into methods. Working at the statement level meant that the Java users could attempt to synchronize parts of methods. To alter the concurrency granularity, the AspectJ pairs would have had to have changed the structure of the existing methods. To determine the instances when Java users considered finer granularity locking, we noted when the users attempted to move locks around within a method. Only two of the Java pairs investigated locking granularity. Java pair 1 investigated locking granularity twice in the first fault, twice in the second fault, and once in the third fault. Java pair 3 investigated this once in the third fault. None of the AspectJ participants questioned the granularity imposed by Cool.
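The difference between method-level and statement-level exclusion can be sketched in plain Java. This is an illustration only: the `SearchLog` class and its methods are hypothetical, and built-in `synchronized` blocks are assumed here in place of the lock classes that the study supplied to the Java participants.

```java
// Hypothetical class; the names are illustrative and not from the study.
class SearchLog {
    private int requests = 0;

    // Method-level exclusion: the whole body, including the work that
    // touches no shared state, runs under this object's lock.
    synchronized int recordCoarse(String title) {
        int key = title.trim().toLowerCase().hashCode();
        requests++;
        return key;
    }

    // Statement-level exclusion: only the update of the shared counter is
    // locked, so threads can compute keys concurrently.
    int recordFine(String title) {
        int key = title.trim().toLowerCase().hashCode();
        synchronized (this) {
            requests++;
        }
        return key;
    }

    synchronized int requests() {
        return requests;
    }

    // Runs `threads` workers, alternating between the two styles, and
    // returns the number of requests recorded once all have finished.
    static int countAfter(int threads, int callsPerThread) {
        SearchLog log = new SearchLog();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final boolean coarse = (i % 2 == 0);
            workers[i] = new Thread(() -> {
                for (int call = 0; call < callsPerThread; call++) {
                    if (coarse) {
                        log.recordCoarse(" Book A ");
                    } else {
                        log.recordFine(" Book A ");
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return log.requests();
    }
}
```

Both methods lock the same object, so they remain safe when mixed. Moving the `synchronized` block around inside `recordFine` is the kind of adjustment the Java pairs could try; a coordinator that applies to whole methods offers no equivalent without restructuring the method itself.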

Participants' Comments

At the end of the trials, two of the three AspectJ pairs expressed enthusiasm for separating the coordination code, and said that the separation directly contributed to their ability to solve the faults.

It meant that since [the problems] were just synchronization problems we just had to look at the parts that were related to synchronization. We could have spent lots of time looking at the non-synchronization parts, at one point we did look briefly, but it was clear there was nothing about synchronization in that code, and the only way to deal with synchronization was to look in the Cool files.
AspectJ Pair 2

The other pair, however, said that Cool provided a handy way of summarizing coordination of and between methods, but was unhappy with the physical separation of the coordination code.

The only place I can see there could be an advantage is if you know that you have some modules you are working with that are tested and you are sure you can limit the faults to synchronization issues in which case you don't really have to understand the code.
AspectJ Pair 3

This pair would have preferred the Cool code to be inserted in pertinent places throughout the code so that the programmer could see both the coordinator and the method at a glance. Interestingly, although this pair perceived that the separation provided by Cool caused them to look at many files to gain context, this pair made fewer file switches in total than any of the Java pairs.

Analysis of Results

The three debugging tasks can be categorized two ways: according to the addition or deletion of concurrency functionality, and according to whether there were localized or non-localized reasoning requirements. We say a fault required localized reasoning if the code responsible for the fault was modularized (e.g., part of one class). Non-localized reasoning meant participants would have to look across modularity boundaries for the problem.

The first fault's solution required localized reasoning and the deletion of synchronization code. The AspectJ pairs solved this fault faster than the Java pairs, with fewer file switches and fewer instances of semantic analysis. This suggests that the AspectJ pairs were able to isolate and remedy the problem causing the fault more quickly.

Solving the second fault required non-localized reasoning and involved deleting synchronization code. For this fault, the AspectJ pairs were somewhat faster than the Java pairs, and completed the task with slightly fewer file switches and marginally fewer instances of semantic analysis. The AspectJ participants did not benefit as much from the use of Cool as they did in the first debugging task, when the reasoning required for the problem was more localized.

Fixing the third fault required localized reasoning and involved adding synchronization code. In this fault the AspectJ pairs generally finished faster and performed somewhat fewer file switches. Two of the Java pairs performed significant numbers of instances of semantic analysis, while two of the AspectJ pairs performed none. This may have occurred because the Java participants had to perform analysis to understand how to add locking functionality, whereas using the Cool syntax required less analysis.

To summarize, when the solution to a problem required localized reasoning, Cool helped programmers focus their efforts. However, when the solution required non-localized reasoning, Cool did not provide as clear a benefit. This was regardless of whether functionality was being added or deleted.

23.2.5. Experiment 2: Ease of Change

The intent of this experiment was to investigate whether the separation of concerns provided in aspect-oriented programming enhanced a programmer's ability to change the functionality of a multi-threaded, distributed program. Our hypothesis was that the AspectJ combination of JCore for the component programming, Cool for synchronization, and Ridl for specifying data transfers would make it easier to change such programs compared to a similar program written in Emerald.

23.2.5.1 Format

In this experiment, the participants worked alone. They were asked to address each of three change tasks sequentially. In the first task, participants were asked to add the ability for a reader to check books back into the library after checking them out. The solutions to this problem generally involved adding a method to check books back in, synchronizing that method, and calling it from somewhere within the main program loop. The second task was to assign one library to randomly reject a reader's request to check out a book. Adding this functionality required the determination of a library to make the denial decision, and the addition of a check in the main library code to ask that library if the reader's request for a check-out should be granted. The third task involved enhancing the performance of the code. Here the participants could find and fix any performance lag. However, to try to direct their approach, we seeded an inefficiency into the code: readers read the book byte-by-byte, requiring many messages to be sent if the book were located remotely. Enhancing the performance in this case meant ensuring that the appropriate book object was on the same host machine as the reader reading it. Because the third task was intended to be more open-ended and exploratory than the other two, we treat it separately in the discussion of results below.

The solutions to the three tasks were independent: the code written for one task did not affect the others.
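The cost of the inefficiency seeded for the third task can be made concrete with a small simulation. This sketch is a model only and involves no real distribution: `RemoteBook`, `ReadingCost`, and the convention that each call on a remote book counts as one message are assumptions made for illustration.

```java
// Hypothetical model: each call on a RemoteBook stands for one network message.
class RemoteBook {
    private final byte[] text;
    private int messages = 0;

    RemoteBook(byte[] text) {
        this.text = text.clone();
    }

    // Reading a single byte from a remote book costs one message.
    byte readByte(int index) {
        messages++;
        return text[index];
    }

    // Moving the whole book to the reader's host costs one message.
    byte[] moveToReader() {
        messages++;
        return text.clone();
    }

    int messages() {
        return messages;
    }
}

class ReadingCost {
    // Reads every byte remotely and returns the number of messages sent.
    static int byteByByte(byte[] content) {
        RemoteBook book = new RemoteBook(content);
        int checksum = 0;
        for (int i = 0; i < content.length; i++) {
            checksum += book.readByte(i);
        }
        return book.messages();
    }

    // Moves the book once, reads it locally, and returns the messages sent.
    static int afterMove(byte[] content) {
        RemoteBook book = new RemoteBook(content);
        byte[] local = book.moveToReader();
        int checksum = 0;
        for (byte b : local) {
            checksum += b; // local access, no message
        }
        return book.messages();
    }
}
```

Under this model, a 1000-byte book costs 1000 messages when read byte by byte and a single message once it has been co-located with the reader.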

23.2.5.2 Results

We assessed the performance of the participants by examining their approach to solving the problem. In particular, we analyzed the videotape for such data elements as the time spent analyzing the code base versus the time spent writing their solutions. We considered both the absolute time and the proportion of the total time spent on each activity. We also looked at the pattern of activities over time. Finally, we examined the code written by the participants. We describe these data elements individually before synthesizing the results.

Time

Figure 23-3A shows the completion times for the six participants for the first and second tasks. The Emerald participants typically had faster completion times than the AspectJ participants.

Figure 23-3. Change Results.

Portion of Time

To investigate what could account for the time differences between the Emerald group and the AspectJ group, we examined how the participants spent their time. Figures 23-3B and 23-3C show, as a percentage of total time, how much time was spent coding and analyzing for each activity for each participant. The typical percentage of time spent on coding was slightly greater for the AspectJ trials, while that spent on analysis was greater for the Emerald trials. The remaining percentage of time was spent on a combination of compiling and running the program. The AspectJ participants spent slightly more absolute time observing the program run.

Patterns of Activities

Emerald participants typically began their tasks with extended periods of analysis while AspectJ participants typically began extensive coding attempts with little or no prior analysis.

Code Written

The AspectJ participants wrote between 50 and 150 lines of code, of which two to six lines were Ridl code, and two to three lines were Cool code. The Emerald participants wrote between 50 and 80 lines of code, of which two to four lines were synchronization code, and one to three lines pertained to the movement of objects.

Task 3: Performance Enhancement

While each Emerald participant successfully made at least one modification to the program that led to a performance enhancement, only two AspectJ participants had sufficient time to attempt this task, and only one of these successfully improved the performance.

AspectJ does not support the identical pass-by-reference semantics of Java^[5] because object replication is not automatic. Assuming that JCore is effectively Java can lead to surprises when Ridl specifications violate the implied semantics of the JCore code: when a change is made to an object, the change is not propagated to remote copies of that object. AspectJ participant 2, who unsuccessfully attempted this task, encountered and recognized this difficulty.

^[5] Although Java is strictly pass-by-copy, since objects can only be accessed via a reference type, objects are effectively passed-by-reference (primitive types are not objects in Java). In other words, if an object is passed to a method, changes made to the object are visible outside that method.
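The behavior described in this footnote can be seen in a few lines of Java. The `Book` and `PassingDemo` classes below are hypothetical and serve only as an illustration.

```java
// Hypothetical class used only to illustrate the footnote.
class Book {
    String status = "available";
}

class PassingDemo {
    // The parameter is a copy of the caller's reference, so both refer to
    // the same object and this change is visible to the caller.
    static void markCheckedOut(Book book) {
        book.status = "checked out";
    }

    // Reassigning the parameter changes only the local copy of the
    // reference; the caller's object is untouched.
    static void replace(Book book) {
        book = new Book();
        book.status = "lost";
    }
}
```

After `markCheckedOut(b)` the caller sees `b.status` as "checked out", and a subsequent `replace(b)` leaves it unchanged. A participant who assumes that JCore behaves like this would expect mutations to be visible everywhere, which is the expectation that copying to a remote host breaks.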

Since Ridl version 0.1 is implemented on top of Java RMI, copying an object requires that its class implement the Serializable interface; to pass a global reference to an object requires that its class provide a remote interface. This causes a serious catch-22 for standard Java library classes that implement neither. The alternative to using such library classes is to encapsulate their instances within "remoteable" versions; however, the JCore code would then need to specify explicitly that the "remoteable" version be used. Although we provided such "remoteable" versions for a few classes that we suspected would be needed to complete tasks 1 and 2, the open-ended nature of task 3, combined with the expensive nature of reimplementing most of the standard Java library, prevented us from providing a sufficiently complete set of such versions. Both AspectJ participants encountered problems when the remote interface specifications they attempted involving library classes either failed to compile or caused unexpected run-time exceptions.
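The copy semantics behind these difficulties can be sketched with standard Java serialization. This is not Ridl or RMI code: `BookRecord` and `CopyDemo` are hypothetical, and the in-memory round trip through `ObjectOutputStream` and `ObjectInputStream` is assumed here only as a stand-in for transferring an object by copy to another host.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class; it must implement Serializable to be copied this way.
class BookRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    String status = "available";
}

class CopyDemo {
    // Round-trips an object through Java serialization, producing an
    // independent copy, much as a by-copy transfer would.
    static <T extends Serializable> T copyOf(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }
}
```

A change made to the original after `copyOf` returns is not reflected in the copy. The bound `T extends Serializable` also shows, at compile time, why a library class that is not serializable cannot be passed this way without a wrapper.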

Participants' Comments

When the two AspectJ participants who attempted the performance-enhancing task were asked at trial-end what difficulties they had encountered, they both believed that the amount of code analysis required to express a concern in Ridl was a factor. One of these participants noted that the separation between Ridl and JCore was not as clean as desired.

I get the feeling that Cool is pretty close to capturing synchronization but that Ridl has a way to go . . . it's too meshed with Java
AspectJ Participant 3

This participant believed that it may be more difficult to separate object mobility issues from the core functional code than to separate synchronization issues. The participant characterized object mobility issues as the location of an object at a particular execution point and how an object is passed into a method. Both participants claimed that in order to understand how objects were moving around in the system, it was necessary to thoroughly understand the core semantics that supported the Ridl file.

Analysis of Results

The Emerald participants were able to implement more, and to do so faster, than the AspectJ participants.

The patterns of activity for the AspectJ participants showed a heavy emphasis on coding quite early in their tasks, as compared to those for the Emerald participants. This may indicate that adjusting the Ridl code seemed like a quick way to solve object-mobility problems when, in fact, it was not. Interestingly, AspectJ participant 1, who successfully attempted task 3, did spend more time on analysis of the core semantics than on coding, while AspectJ participant 2, who unsuccessfully attempted task 3, showed the opposite distribution. This observation lends some credence to the participants' claims that in-depth analysis of the core semantics is required to correctly express a concern in Ridl.

The fact that AspectJ participants spent slightly more absolute time observing the program run could also suggest that they perceived that tinkering with the Ridl code and watching the program run would keep them from having to deal with the core semantics of the program.