ROUGH SET THEORY | Data Mining: Opportunities and Challenges

data mining: opportunities and challenges

Chapter VI - Data Mining Based on Rough Sets
Data Mining: Opportunities and Challenges
by John Wang (ed)
Idea Group Publishing 2003


	Brought to you by Team-Fly

Rough set theory was created as a tool to handle inconsistent data. This section presents the fundamentals of the original rough set theory. The complete description of the theory may be found in Pawlak (1991) (see also Grzymala-Busse, 1995).

Global Coverings

We are assuming that the input data set is in a form of a table. Rows of the table are called cases (or examples). Columns are labeled by attributes and a decision. An example of such a table is presented in Table 1.

Table 1: Consistent data set
	`Attributes`				`Decision`

	`Home`	`Boat`	`Credit_Card`	`Age`	`Loan_Application`

`c1`	`expensive`	`no`	`yes`	`old`	`approved`
`c2`	`middle`	`yes`	`no`	`old`	`rejected`
`c3`	`expensive`	`yes`	`no`	`old`	`approved`
`c4`	`cheap`	`yes`	`yes`	`young`	`rejected`
`c5`	`middle`	`yes`	`yes`	`middle`	`approved`
`c6`	`middle`	`no`	`no`	`old`	`rejected`
`c7`	`middle`	`no`	`yes`	`young`	`approved`
`c8`	`expensive`	`no`	`no`	`young`	`rejected`

Table 1 depicts a simplified data base showing eight homeowners (cases) applying to a bank for a loan to buy a car. Any such table defines an information function ρ that maps the set of all ordered pairs (case, attribute) into the set of attribute values. For example, ρ (c1, Home) is equal to expensive.

One of the fundamental ideas of rough set theory is the relation of the set of all cases implied by a subset P of the set A of all attributes, called an indiscernibility relation, and denoted by IND(P). For any two cases c and c′, the relation indiscernibility is defined as follows:

Obviously, IND(P) is an equivalence relation, so it may be represented by the partition on U induced by IND(P). This partition will be denoted by U/IND(P). Two cases c and c′ belong to the same set of the partition U/IND(P) if and only if (c, c′) ε IND(P). For example,

and

Sets belonging to the partition U/IND(P) are called P-elementary sets or blocks. Elements c1 and c3 belong to the same {Home}-elementary set (or block) of U/ND({Home}) because the value of variable Home is the same for both c1 and c3.

In a special case where set P contains only one variable, the decision, usually denoted by d, the corresponding sets that constitute U/IND(P) are called concepts. In Table 1, we distinguish two concepts: {c1, c3, c5, c7}, and {c2, c4, c6, c8}. The former concept consists of all accepted applicants, the latter concept consists of all rejected.

The basic problem is how to determine a subset P of the set A describes all concepts. In different words, our problem is whether a subset P of A is sufficient to distinguish all concepts. For example, the single attribute {Home} is not sufficient, since Home has the same value expensive for c1 and c8, yet c1 and c2 belong to two different concepts. Moreover, a subset {Home, Boat} is not sufficient either, since both have the same values for c1 and c8 (expensive, no), and c1 and c2, as observed before, belong to two different concepts. On the other hand, the set of all attributes: {Home, Boat, Credit_Card, Age} is sufficient, since values of all four attributes are unique for corresponding concepts. The value vector (expensive, no, yes, old) of attribute vector (Home, Boat, Credit_Card, Age) characterize only one case, c1, that belongs to the concept approved loan applications. We say that {d} depends on the subset P = {Home, Boat, Credit_Card, Age}. The necessary and sufficient condition for {d} to depend on P is

The sign "≤" in the above expression concerns partitions. For partitions π and τ on U, π ≤ τ if and only if for each block B of π there exists a block B′ of τ such that B ⊆ B′. In other words, {d} depends on P if and only if every concept defined by {d} is the union of some P-elementary sets (or blocks of U/IND(P)).

In our example,

or, {Loan_Application depends on {Home, Age}. It is not difficult to check that U/IND({Home, Age} ≤ U/IND({Loan_Application}: every block of U/IND({Home, Age}) is a subset of corresponding block from U/IND({Loan_Application}).

The minimal subset P such that {d} depends on P is called a global covering of {d}. The global covering is also called relative reduct (Pawlak, 1991). In the example in Table 1, there exist precisely two global coverings of Loan_Application: {Home, Age} and {Home, Credit_Card, Boat}. Algorithms for finding the set of all global coverings was published in Grzymala-Busse (1991).

Algorithm for Finding the Set of All Global Coverings

The aim of the algorithm is to find the set C of all global coverings of {d}. The cardinality of the set X is denoted |X|. Let k be a positive integer. The set of all subsets of the same cardinality k of the set A is denoted P_k, i.e., P_k = {{xi1, xi2, , xik} | xi1, xi2, , xik ε A}.

Algorithm Find_the_set_of_all_global_coverings;         begin                   C : =  ;                   for each attribute x in A do                            compute partition U/IND({x});                   compute partition U/IND({d});                   k : = 1;                   while k ≤ |A| do                           begin                                                 for each set P in P_k do                                                          if (P is not a superset of any member of C) and                                                          (  U/IND({x}) ≤ U/IND({d}))                                                          then add P to C;                                                 k : = k+1                   end {while}         end {procedure}.

Time complexity of the algorithm for finding the set of all coverings of R in S is exponential.

Local Coverings

In the definition of global covering, all involved attributes and decision are considered globally, i.e., for all cases. Here we will introduce a local covering, defined by variable-value pairs. Let x be a variable (an attribute or a decision), and let v be a value of x. The block of a variable-value pair (x, v), denoted [(x, v)], is the set of all elements of the universe U that for variable x have value v. Thus, the concept is a block of [(d, w)] for some value w of decision d. For example, in Table 1, the block of (Home, expensive) is {c1, c3, c8}.

Let B be a subset of the universe U. Let T be a non-empty set of attribute-value pairs, where all involved attributes are different. The block of T, denoted [T], is defined as

Let B be the set {c1, c3, c5, c7}, that is, B is the block of (Loan_Application, approved}. In the example from Table 1, let X be the set {(Home, expensive), (Boat, no)}. The block [X] of X is equal to

Thus B = {c1, c3, c5, c7} does not depend on X. On the other hand, for Y equal to {(Home, expensive), (Age, Old)}, the block [Y] of Y is equal to

so B = {c1, c3, c5, c7} depends on {(Home, expensive), (Age, Old)}.

We say that B depends on a set T of attribute-value pairs if and only if [T] ⊆ B. Set T is a minimal complex of B if and only if B depends on T and no proper subset T′ of T exists such that B depends on T′.

The set Y = {(Home, expensive), (Age, Old)} is a minimal complex of B = {c1, c3, c5, c7}, since B does not depend on any subset of Y, because

and

However, there exist more minimal complexes of B. For example, Z = {(Home, Middle), (Credit_Card, yes)} is another minimal complex of B, because

Let T be a non-empty family of non-empty sets of attribute-value pairs. Set T is called a local covering of B if and only if the following three conditions are satisfied

each member T of T is a minimal complex of B,
, and
T is minimal, i.e., no subset of T exists that satisfies conditions (1) and (2).

For the example in Table 1, the set {Y, Z}, where Y = {(Home, expensive), (Age, Old)} and Z = {(Home, Middle), (Credit_Card, yes)} is a local covering of B = {c1, c3, c5, c7}. The local covering is also called value reduct (Pawlak, 1991).

Lower and Upper Approximations of a Set

Real-life data are frequently inconsistent, i.e., data that contain conflicting cases. Two cases are conflicting when they are characterized by the same values of all attributes, but they belong to two different concepts. For example, Table 2 presents inconsistent data. Cases c8 and c9 are conflicting.

Table 2: Inconsistent data set
	`Attributes`				`Decision`
	`Home`	`Boat`	`Credit_Card`	`Age`	`Loan_Application`
`c1`	`expensive`	`no`	`yes`	`old`	`approved`
`c2`	`middle`	`yes`	`no`	`old`	`rejected`
`c3`	`expensive`	`yes`	`no`	`old`	`approved`
`c4`	`cheap`	`yes`	`yes`	`young`	`rejected`
`c5`	`middle`	`yes`	`yes`	`middle`	`approved`
`c6`	`middle`	`no`	`no`	`old`	`rejected`
`c7`	`middle`	`no`	`yes`	`young`	`approved`
`c8`	`expensive`	`no`	`no`	`young`	`rejected`
`c9`	`expensive`	`no`	`no`	`young`	`approved`

Let [x]P denote the p-elementary set of U/IND(P). Any finite union of elementary sets of P is called a definable set by P. Let B be a concept. For inconsistent data, in general, B is not a definable set in P. However, set B may be approximated by two definable sets in P; the first one is called a lower approximation of B in P, denoted by PB and defined as follows

The second set is called an upper approximation of B in P, denoted by and defined as follows

The lower approximation of B in P is the greatest definable set by P, contained in B. The upper approximation of B is the least definable set by P containing B. A rough set of B in P is the family of all subsets of U having the same lower and the same upper approximations of B in P.

For Table 2,

the lower approximation of B = {c1, c3, c5, c7, c9} in A = {Home, Boat, Credit_Card, Age} is the set {{c1, c3, c5, c7}, the upper approximation of B = {c1, c3, c5, c7, c9} in A is the set {c1, c3, c5, c7, c8, c9}, and the rough set of B in A is the set {{B}}.

For inconsistent data, lower and upper approximations of a set may be used for defining two different kinds of set membership: certain and possible. Say that the concept B is not definable by the subset P of the set A of all attributes. An element x ε U is said to be certainly in B if and only if An element x ε U is said to be possibly in B if and only if Both definitions have straightforward interpretation. We have to decide whether x is or is not in B on the basis of accessible information, i.e., attributes from P. This means that we may describe the membership of x in terms of members of U/IND(P). On the other hand, both sets, and PB are constructed from U/IND(P). Moreover,

so if B then x is certainly in B. However, if , then x does not need to be in B. If B is not definable, , and x may be a member of and not a member of B. Thus x is only possibly in B.

In our example, for B = {c1, c3, c5, c7, c9} and A = {Home, Boat, Credit_Card, Age},

However, and c8 ∉ B.

A few numbers characterizing a degree of uncertainty may be associated with a rough set. These numbers, sometimes useful for evaluation, have limited value, because the most important tool of rough set theory are sets lower and upper approximations of a given concept B, a subset of the universe U.

There are two universal measures of uncertainty for rough sets, introduced in Grzymala-Busse (1987) (see also Grzymala-Busse, 1991). We will assume that P is a subset of the set A of all attributes. The first measure is called a quality of lower approximation of B in P, denoted B, and equal to

In Pawlak (1984), B was called simply quality of approximation of B in P. The quality of lower approximation of B in P is the ratio of the number of all certainly classified elements of U by attributes from P as being in B to the number of elements of U. It is a kind of relative frequency. There exists a nice interpretation of Bin terms of evidence theory, also called Dempster-Shafer theory (see Shafer, 1976). With this interpretation, Bbecomes a belief function, as it was observed for the first time in Grzymala-Busse (1987).

The second measure of uncertainty for rough sets, a quality of upper approximation of B in P, denoted B, is equal to

The quality of upper approximation of B in P is the ratio of the number of all possibly classified elements of U by attribute from P as being in B to the number of elements of U. Like the quality of lower approximation, the quality of upper approximation is also a kind of relative frequency. It is a plausibility function of evidence theory (Grzymala-Busse, 1987).


	Brought to you by Team-Fly