Computers process data at a syntactic level only. For example, a computer has little understanding of the semantics behind the string "book" and therefore cannot determine which of its possible interpretations is correct for a particular instance. To arrive at the correct semantic interpretation, the computer needs to understand the context in which the term is presented. An accurate comprehension of complex context is beyond the ability of computers at present and, according to artificial intelligence research, is still many years from realisation. This inability to understand is significant to data mining and the knowledge discovery process, as the objective is to find patterns of interest.
Identifying what is of interest is nontrivial and much research has been done within this field (Hilderman & Hamilton, 1999; Piatetsky-Shapiro & Matthews, 1991; Silberschatz & Tuzhilin, 1996). There are two classes of measures of interestingness: objective and subjective. Objective measures are based upon heuristics, where the interestingness of the pattern is defined objectively based upon a function of the discovered pattern and its associated data. Piatetsky-Shapiro and Matthews (1994) formally describe this function as follows.
The objective interestingness of a rule X → Y is defined as a function of f(X), f(Y) and f(X ∧ Y), where f(k) is the probability that k is true.
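To make the definition concrete, the following minimal sketch estimates f(X), f(Y) and f(X ∧ Y) as relative frequencies over a list of transactions and derives a few commonly used objective measures from them. The function names, the set-based transaction representation and the choice of measures (support, confidence, lift and leverage) are assumptions made for illustration; the text above does not prescribe any particular function.

```python
from typing import Dict, FrozenSet, List


def frequency(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Estimate f(k): the fraction of transactions containing every item of k."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def objective_measures(
    x: FrozenSet[str], y: FrozenSet[str], transactions: List[FrozenSet[str]]
) -> Dict[str, float]:
    """Compute illustrative objective measures for the rule X -> Y.

    Every measure is a function of f(X), f(Y) and f(X and Y) only.
    """
    f_x = frequency(x, transactions)
    f_y = frequency(y, transactions)
    f_xy = frequency(x | y, transactions)  # both X and Y hold
    return {
        "support": f_xy,
        "confidence": f_xy / f_x if f_x else 0.0,
        "lift": f_xy / (f_x * f_y) if f_x and f_y else 0.0,
        # Difference between observed co-occurrence and that expected
        # if X and Y were statistically independent.
        "leverage": f_xy - f_x * f_y,
    }


# Hypothetical example data.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
scores = objective_measures(frozenset({"bread"}), frozenset({"milk"}), transactions)
```

In this example data, "bread" and "milk" co-occur slightly less often than independence would predict, so the leverage is negative. Note that none of these scores says anything about whether the rule is useful to the person running the analysis.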
However, objective measures fail to capture all the characteristics of pattern interestingness as heuristic measures are logically constrained (Silberschatz & Tuzhilin, 1996). An item of interest is one that incorporates characteristics of novelty, complexity, focus and usefulness. From this definition it is apparent that patterns cannot be classed as interesting through an analysis of a pattern's structure alone but must also incorporate subjective measures.
Subjective measures of interestingness depend not only upon the structure of the rule and the underlying data but also upon the user's interpretation of the pattern's representation. For example, one characteristic of an interesting rule is that it must be goal-oriented; satisfaction of this characteristic is based upon an understanding of the mining task goals. If a user is trying to justify additional department funding, a pattern indicating a trend of increasing employee height would not be useful. Subjective interpretation provides semantic understanding of patterns because users have the ability to comprehend data semantics and relate them to the problem domain. This builds upon the concept of knowledge-based architectures for human-computer interaction (HCI), which have explored the possibility of an implicit communication channel that, in an abstract sense, provides the computer with knowledge of the problem domain and objectives, as shown in Figure 1 (Dix, Finlay, Abowd & Beale, 1998).
Figure 1: Knowledge-based HCI (Dix et al., 1998)
Data mining algorithms can generate a large number of patterns, most of which are of no interest to the user. It is therefore essential to incorporate both subjective and objective measures of interestingness into the mining process, constraining the algorithm so that only the most interesting patterns are generated. Including subjective measures requires the user to participate actively in the data mining process. The user's understanding of the data creates a synergy that should result in the discovery of a more concise set of interesting rules and will probably decrease mining time. Participation may also promote a better work ethic due to what is known as the Hawthorne effect, which states that "people tend to work harder when they sense that they are participating in something new or in something in which they have more control" (Mayo, 1945). For the user to participate in the mining process, mechanisms must be in place to provide this functionality. Such mechanisms include:
One or more interfaces between the user and the mining process.
A cause and effect mapping between interaction primitives and mining process manipulation.
Mining algorithm extensions allowing for guidance of the processing through human interaction.
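As a rough illustration of how such mechanisms might fit together, the sketch below lets a user state the attributes relevant to the mining goal and a minimum objective score, then uses both to prune candidate rules. Everything here is hypothetical: the `Rule` and `UserGuidance` classes, the attribute names and the use of a single numeric objective score are assumptions for the example, not part of any system described in the text.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Rule:
    """A candidate rule X -> Y with a precomputed objective score."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    objective_score: float


@dataclass
class UserGuidance:
    """Subjective constraints supplied by the user through some interface."""
    goal_attributes: Set[str]      # attributes relevant to the mining goal
    min_objective_score: float = 0.0

    def is_interesting(self, rule: Rule) -> bool:
        # Objective test: the rule must score highly enough on the data.
        if rule.objective_score < self.min_objective_score:
            return False
        # Subjective test: the rule must touch at least one goal attribute.
        attributes = rule.antecedent | rule.consequent
        return bool(attributes & self.goal_attributes)


def guided_filter(rules: List[Rule], guidance: UserGuidance) -> List[Rule]:
    """Keep only the rules that pass both the objective and subjective tests."""
    return [r for r in rules if guidance.is_interesting(r)]


# Hypothetical scenario: the user wants to justify department funding.
candidates = [
    Rule(frozenset({"year"}), frozenset({"employee_height"}), 0.20),
    Rule(frozenset({"headcount"}), frozenset({"workload"}), 0.15),
    Rule(frozenset({"headcount"}), frozenset({"budget"}), 0.01),
]
guidance = UserGuidance(
    goal_attributes={"headcount", "workload", "budget"},
    min_objective_score=0.05,
)
kept = guided_filter(candidates, guidance)
```

Here the employee-height rule is discarded despite its high objective score because it does not relate to the stated goal, and the budget rule is discarded because its objective score is too low. In a real system the filter would more likely be pushed into the mining algorithm itself, so that uninteresting candidates are never generated.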