22.3. DATA

We analyzed the data from each case both quantitatively and qualitatively. Here, we present a summary of the data collected. We synthesize results from the study as a whole in the next section.

In our reporting of the data and results, we distinguish the participants based on the tool they applied. For example, the participant applying the AMT tool is indicated by PAMT. As described earlier, each participant was an expert, meaning they had prior knowledge of the code base for one case. PFEAT was the expert for the Jex case, PAMT was the expert for the AMT case, and PAB was the expert for the AspectBrowser case.

22.3.1. Strategies Used

For the purposes of this study, each participant recorded information about how he used his tool and reported the high-level strategies he employed. Table 22-1 summarizes the overall strategies used in each case. In the table, we use the term dependence to describe the following of a dependence, such as a call from a caller to a callee. We use the term reference to describe the reverse, such as determining all callers of a callee.
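
To make the distinction concrete, consider the following minimal sketch; the classes and methods are invented for illustration and are not from any of the subject systems. Following the dependence from Parser.parse leads to its callee, Logger.log, whereas a reference query on Logger.log returns both of its callers, Parser.parse and Printer.print.

    // Hypothetical example; not code from Jex, AMT, or AspectBrowser.
    class Logger {
        void log(String msg) { System.out.println(msg); }   // callee
    }

    class Parser {
        private final Logger logger = new Logger();
        // Following the dependence from parse() leads to Logger.log.
        void parse() { logger.log("parsing"); }
    }

    class Printer {
        private final Logger logger = new Logger();
        // A reference query on Logger.log returns Parser.parse and Printer.print.
        void print() { logger.log("printing"); }
    }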

Table 22-1. Strategies Used by Programmers in Identifying Concern Seeds and in Elaborating from Seeds

| Participant | Step | Jex | AMT | AspectBrowser |
|-------------|------|-----|-----|---------------|
| PFEAT | Seed | Knowledge of the application. | Program entry points, plausible name of file. | Word from task description, string shown in GUI. |
| PFEAT | Search | Mostly followed dependences. Less frequently, followed references. | Almost exclusively followed dependences. Focused on one part of the change at a time. | Followed references to fields. For the methods returned, followed both references and dependences. Focused on one class at a time. |
| PAMT | Seed | Feature programmer's name, word from task description. | Knowledge of the application. | Word from task description. |
| PAMT | Search | Followed call dependences and references to data structures encountered. | Read code around search matches or where he knew changes had to occur. | Breadth-first investigation of the reference/dependence graph. Recorded calls for later investigation. |
| PAB | Seed | Plausible name of file, entry point, code reading. | Program entry point. | Knowledge of the application. |
| PAB | Search | Read code and followed pertinent dependence (call) chains. | Code reading and following of dependences. | Followed dependences (calls). |


Despite the differences in the tools used, the participants generally followed a strategy of finding a plausible seed for the concern and then traversing the reference/dependence graph from that seed to look for related code. Most commonly, the dependences traversed were call dependences.

In general, seeds were chosen based on prior knowledge of the system or through an intelligent guess. When a participant lacked both knowledge of the system and a basis on which to make an intelligent guess, they often resorted to the brute force method of finding the main class (identified by documentation or by the presence of a main method) and then traversing the control flow from there. At the other extreme, to find a seed for the AspectBrowser case, PFEAT launched the subject tool and used some text from the traversal feature's tool tip in a text search to find the code that created the tool tip.
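
As a rough sketch of this elaboration strategy, starting from a seed and repeatedly following call dependences amounts to a worklist traversal of a call graph. The representation and names below are our own for illustration, not those of any of the three tools.

    import java.util.*;

    public class ConcernElaboration {
        // Elaborate a concern from a seed by breadth-first traversal of a
        // precomputed call graph (method name -> methods it calls).
        public static Set<String> elaborate(Map<String, List<String>> callGraph,
                                            String seed) {
            Set<String> concern = new LinkedHashSet<>();
            Deque<String> worklist = new ArrayDeque<>();
            worklist.add(seed);
            while (!worklist.isEmpty()) {
                String method = worklist.remove();
                if (!concern.add(method)) continue;   // already visited
                // Follow dependences: the methods this method calls.
                worklist.addAll(callGraph.getOrDefault(method, List.of()));
            }
            return concern;
        }

        public static void main(String[] args) {
            // Hypothetical miniature call graph, seeded at the program entry point.
            Map<String, List<String>> callGraph = Map.of(
                "Main.main", List.of("Traversal.start"),
                "Traversal.start", List.of("Traversal.moveCursor", "Gui.update"),
                "Traversal.moveCursor", List.of());
            System.out.println(elaborate(callGraph, "Main.main"));
        }
    }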

Table 22-2 summarizes the number and type of queries performed by the participants in each case. Not surprisingly, prior knowledge was beneficial in concern elaboration: each expert performed the fewest queries on his own system, despite the differences in system size.

Table 22-2. Number and Types of Queries Performed by Programmers

| Participant | Jex | AMT | AspectBrowser |
|-------------|-----|-----|---------------|
| PFEAT | 53 queries: 19 fan-out, 15 fan-in, and 19 class. | 81 queries: 38 fan-out, 29 fan-in, and 14 class. | 183 queries: 85 fan-out, 69 fan-in, and 29 class. |
| PAMT | 15 queries: 10 lexical, 3 lexical and type, 1 type only. | 3 queries: 3 lexical. | 15 queries: 4 lexical, 8 lexical and type, 3 type only. |
| PAB | 19 queries. | 19 queries. | 12 queries. |


The searches performed by the AMT and AspectBrowser participants tended to be of three kinds: (a) explicitly naming a specific entity or type, (b) giving a substring intended to indirectly identify a variety of objects, and (c) finding a comment left behind by the participant. Explicit naming was used when the participant knew what he wanted, as in traversing the control flow.[1] This kind of search tended to be performed later in a task. For example, at the end of the AspectBrowser task, PAMT used explicit naming, searching for moveCursor, to ensure his changes were complete. In contrast, at the beginning of the AspectBrowser task, PAMT performed a substring search for traversal because the task had described an evolution of the traversal feature of AspectBrowser. The third kind of search, for a comment left behind by a participant, was used by PAB to do some of the bookkeeping required for the study (e.g., // code mod). Programmers sometimes do similar marking to keep track of buggy or questionable code (e.g., // BUG).

[1] PAB could have chosen to use the "etags" feature of Emacs for many of these traversals but did not.

Overall, the vast majority of queries in AMT and AspectBrowser were strings meant to locate identifiers, whether by lexical or type matching. In particular, there was little attempt to match on the actual syntax of the program above the identifier level. Although we have seen more aggressive syntax matching in other studies, such as the inclusion of parentheses in string patterns to find method calls [10], we note that the common case pattern was rather simple in our three cases.
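
As a minimal sketch of the kind of syntax-aware pattern mentioned above, requiring a parenthesis after an identifier narrows matches toward call sites. The identifier moveCursor comes from the AspectBrowser task; the patterns themselves are illustrative, not those used in [10].

    import java.util.regex.Pattern;

    public class CallSitePatterns {
        public static void main(String[] args) {
            // Matching the bare identifier hits declarations, calls, and comments alike.
            Pattern identifier = Pattern.compile("moveCursor");
            // Requiring a following parenthesis narrows matches toward call sites,
            // though it still matches the declaration itself; purely lexical
            // matching of syntax remains approximate.
            Pattern callLike = Pattern.compile("moveCursor\\s*\\(");

            String[] lines = {
                "public void moveCursor(int line) {",  // declaration
                "viewer.moveCursor(42);",              // call
                "// moveCursor needs updating"         // comment
            };
            for (String line : lines) {
                System.out.printf("%-38s identifier=%-5b callLike=%b%n",
                        line,
                        identifier.matcher(line).find(),
                        callLike.matcher(line).find());
            }
        }
    }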

Given a seed, FEAT directly supports traversals of references and dependences. The strategy of the participant can thus be seen more in the direction of a query than in its content. PFEAT tended to follow dependences more than references. A FEAT query follows only a single link of the dependence/reference graph, whereas an AMT or AspectBrowser query may return hits from across the code base; consequently, more FEAT queries were executed in each case than with the other tools.
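
The difference in granularity can be caricatured as follows; this is a sketch under our own names, not the tools' actual APIs. A FEAT-style query expands a single node of the graph along one kind of link, while a lexical query scans every file.

    import java.util.*;

    public class QueryGranularity {
        // FEAT-style: one query returns the immediate neighbors of one element
        // along one kind of link (here, the callers of a single method).
        static List<String> callersOf(Map<String, List<String>> callersIndex,
                                      String callee) {
            return callersIndex.getOrDefault(callee, List.of());
        }

        // AMT/AspectBrowser-style: one query returns every match in the code base.
        static List<String> lexicalMatches(Map<String, String> files, String needle) {
            List<String> hits = new ArrayList<>();
            files.forEach((file, text) -> {
                if (text.contains(needle)) hits.add(file);
            });
            return hits;
        }
    }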

22.3.2. Concern Code Identified

For each case, each participant listed the lines of code that he considered to belong to the concern of interest. We compared the concern code identified by each. When the code differed, we questioned the participants as to why they had or had not included the code. The answers to these questions were used to determine which code had been falsely identified as being part of the concern (false positives) and which code should have been identified but had not been (false negatives).

Table 22-3 provides a quantitative overview, at the class level, of the false positive and false negative code in each case, giving a sense of the significant divergences among the participants' results. Reading the first row as an example: in the AspectBrowser case, PFEAT identified 11 classes as containing concern code, 2 of which were false positives, and missed 1 class.

Table 22-3. Class-Level Breakdown of Identified Concern Code

Each cell gives: total classes identified / false positives / false negatives.

| Participant | Jex (542 classes) | AMT (727 classes) | AspectBrowser (103 classes) |
|-------------|-------------------|-------------------|-----------------------------|
| PFEAT | 9 / 0 / 1 | 6 / 1 / 1 | 11 / 2 / 1 |
| PAMT | 5 / 0 / 5 | 6 / 0 / 1 | 15 / 5 / 0 |
| PAB | 5 / 0 / 5 | 6 / 0 / 1 | 5 / 0 / 5 |


However, Table 22-3 does not show the false positives and negatives that occur at a finer grain, such as the method or statement level. For example, in the Jex case, PAB's concern code included a number of false positives, such as unnecessary program start-up code and control flow that was not relevant. A qualitative analysis of the false positives and negatives at the line level, discussed next, provides a better understanding of the differences in the concern code identified by each participant.

22.3.2.1 False Positives

The false positives (code included in a concern that the expert deemed inappropriate) can be grouped into three categories:

  a. Including functionality unrelated to the change.

  b. Including test code.

  c. Including functionality that is related, but that is unlikely to change, which we call the edges of the concern.

To make categories (a) and (c) more concrete, consider the following examples. A category (a) false positive occurred in the Jex case, when PAB included code to determine static calls; static calls, however, do not need to be considered for the task of finding the code that determines the targets of virtual calls. A category (c) false positive occurred in the AspectBrowser case, when PAMT chose to include GUI code that triggered the traversal but that did not affect the refactoring task.

When the false positives across all of the cases were discussed, the justifications for categories (a) and (c) were based on a conservative resolution of uncertainties. This reasoning is corroborated by the fact that no expert had false positives: the experts had no uncertainty about the effect of the code that they had identified and decided to include as part of the concern.

The inclusion of test driver code, category (b), actually arose as both false positives and false negatives due to the conflicting opinions of the experts. The Jex expert was sure that test code did not belong in his concern; the AMT expert was sure that it did.

22.3.2.2 False Negatives

The false negatives (code excluded from a concern that the expert deemed should be in it) can be grouped into five categories:

  d. Missing functionality needed to make the change

  e. Missing utility code that is used only by the concern and that would become dead code if the concern were removed

  f. Missing abstract code

  g. Missing related concern code

  h. An alternative but equivalent implementation

False negatives in categories (d) and (g) involve code that is complex, either intrinsically or because of the way it relates to other concern code; the function of this code was thus misunderstood. As an example of a category (d) false negative, PAB did not include the code coordinating the traversal in the concern in the AspectBrowser case. An example of a category (g) false negative arose in the Jex case, in which replacement of the typing mechanism motivated changing the logging feature so that the logs would make sense with respect to the new mechanism. Although the typing mechanism does not depend on the logger for its correct functionality, the typing concern in its broad interpretation cuts into the logger. Due to this subtlety, the logger update was omitted by PFEAT.

On some occasions, PAMT and PAB omitted an interface or field declaration (category (f)), in part because this code was not executable and hence was not judged to be part of the concern. In another case, FEAT did not show a dependence due to a similar judgment by FEAT's designer, PFEAT. These omissions may occur because of the programmers' primary focus on executable code and their complementary process of searching through the program's control flow.

False negatives in category (h) occurred in the AMT case due to the variety of places that code could be inserted into the program to achieve the same outcome. For example, the code-age feature that was to be added as part of the task could be enabled by a button. The creation of this button can occur virtually anywhere before the AMT display is brought up, as long as it happens before the button itself is inserted into the widget hierarchy. Likewise, the field that stores this button can be declared virtually anywhere in the class, as long as it is scope-visible in all the places it needs to be accessed.
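
To illustrate, the only fixed point in the ordering is that the button must exist before it is inserted into the widget hierarchy. The sketch below uses invented names; it is not AMT's actual code.

    import javax.swing.JButton;
    import javax.swing.JToolBar;

    // Hypothetical sketch: the class and member names are invented, not AMT's.
    public class AmtDisplay {
        // The field declaration may appear anywhere in the class body,
        // as long as the button is visible wherever it is accessed.
        private JButton codeAgeButton;

        public AmtDisplay() {
            // Creation may happen at any point before the display is brought up...
            codeAgeButton = new JButton("Code age");
        }

        public void buildToolbar(JToolBar toolbar) {
            // ...as long as it precedes insertion into the widget hierarchy.
            toolbar.add(codeAgeButton);
        }
    }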

The participants sometimes justified their false negatives by noting that the compiler would have caught their omission. In fact, PAB found his false negatives in the AspectBrowser case when carrying out the subsequent refactoring because the program would not compile after refactoring his identified concern.


