Instead of confining
We first define the (1+x)-class classification problem, or biased classification problem, as the learning problem in which there is an unknown number of classes but the user is interested in only one class, i.e., the user is biased toward one class. And the training samples are labeled by the
Much research has addressed this problem simply as a two-class classification problem with symmetric treatment of positive and negative examples, as in FDA. However, the intuition is that "all good marriages are good in a similar way, while all bad marriages are bad in their own ways"; in other words, "all positive examples are good in the same way, while negative examples are bad in their own ways." Therefore, it is desirable to distinguish a real
For a biased classification problem, we ask the following question instead: What is the optimal discriminating subspace in which the positive examples are "pulled" closer to one another while the negative examples are "
Or mathematically: What is the optimal transformation such that the ratio of "the negative scatter with respect to positive centroid" over "the positive within class scatter" is maximized? We call this biased discriminant analysis (BDA) due to the biased treatment of the positive examples. We define the biased criterion function
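$$W_{\mathrm{opt}} = \arg\max_{W} \frac{\left| W^{T} S_{y} W \right|}{\left| W^{T} S_{x} W \right|} \tag{30.8}$$

where

$$S_{y} = \sum_{i=1}^{N_{y}} (y_{i} - m_{x})(y_{i} - m_{x})^{T} \tag{30.9}$$

$$S_{x} = \sum_{i=1}^{N_{x}} (x_{i} - m_{x})(x_{i} - m_{x})^{T} \tag{30.10}$$

Here $\{x_i\}$, $i = 1, \ldots, N_x$, denote the positive examples, $\{y_i\}$, $i = 1, \ldots, N_y$, denote the negative examples, and $m_x$ is the mean vector (centroid) of the positive examples.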
The optimal solution and transformations take the same form as those of FDA or MDA, subject to the differences defined by Equations (30.9) and (30.10).
Note that the discriminating subspace of BDA has an effective dimension of min{N_x, N_y}, the same as MDA and higher than that of FDA.
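The two scatter matrices described above can be illustrated with a minimal NumPy sketch. It assumes the criterion is a ratio of determinants, numerator built from the negative scatter about the positive centroid and denominator from the positive within-class scatter. The function names `bda_scatter` and `bda_criterion` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def bda_scatter(X_pos, X_neg):
    """Return (S_x, S_y) for row-wise samples.

    S_x: scatter of the positive examples about their own centroid.
    S_y: scatter of the negative examples about the *positive* centroid.
    """
    m_x = X_pos.mean(axis=0)          # positive centroid
    dx = X_pos - m_x
    dy = X_neg - m_x                  # negatives measured from m_x, not their own mean
    return dx.T @ dx, dy.T @ dy

def bda_criterion(W, S_x, S_y):
    """Assumed criterion: |W^T S_y W| / |W^T S_x W|."""
    return np.linalg.det(W.T @ S_y @ W) / np.linalg.det(W.T @ S_x @ W)

# Synthetic illustration: a tight positive cluster and widely scattered negatives.
rng = np.random.default_rng(0)
X_pos = rng.normal(0.0, 0.3, size=(20, 3))
X_neg = rng.normal(0.0, 3.0, size=(30, 3))

S_x, S_y = bda_scatter(X_pos, X_neg)
score = bda_criterion(np.eye(3), S_x, S_y)   # criterion value for the identity transform
```

Note that the negatives are never centered on their own mean; this asymmetry is what makes the treatment "biased" toward the positive class.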
It is well known that the sample-based plug-in estimates of the scatter matrices based on Equations (30.2)-(30.5), (30.9), and (30.10) will be severely biased for a small number of training examples, in which case regularization is necessary to avoid singularity in the matrices. This is done by adding small
| (30.11) |
|
The parameter μ controls
The influence of the negative examples can be
| (30.12) |
|
With different combinations of the (μ, γ) values, regularized BDA provides a
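As a rough illustration of why regularization removes the singularity, the sketch below blends a scatter matrix with a scaled identity of equal trace. This particular shrinkage form is an assumption made for the example, not necessarily the form of Equations (30.11) and (30.12); the helper name `shrink` is hypothetical. Under this assumption, one would apply it to the positive scatter with weight μ and to the negative scatter with weight γ.

```python
import numpy as np

def shrink(S, alpha):
    """Assumed regularizer: (1 - alpha) * S + (alpha / n) * tr(S) * I.

    alpha = 0 leaves S unchanged; alpha = 1 replaces it by a scaled identity.
    The trace of S is preserved for any alpha.
    """
    n = S.shape[0]
    return (1.0 - alpha) * S + (alpha / n) * np.trace(S) * np.eye(n)

# Two positive examples in four dimensions: the plug-in scatter has rank 1.
X_pos = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
m_x = X_pos.mean(axis=0)
S_x = (X_pos - m_x).T @ (X_pos - m_x)

S_x_reg = shrink(S_x, 0.1)                       # small weight on the identity term

rank_before = np.linalg.matrix_rank(S_x)         # singular plug-in estimate
rank_after = np.linalg.matrix_rank(S_x_reg)      # full rank after shrinkage
```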
BDA captures the essential nature of the problem with minimal assumptions. In fact, even the Gaussian assumption on the positive examples can be further
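Similar to FDA and MDA, we first solve the generalized eigenanalysis problem with generalized square eigenvector matrix $V$ associated with the eigenvalue matrix $\Lambda$, with $S_x$ and $S_y$ the scatter matrices defined above:

$$S_{y} V = S_{x} V \Lambda \tag{30.13}$$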
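The eigenanalysis step can be sketched with NumPy alone, assuming the problem takes the standard generalized form S_y V = S_x V Λ with S_y symmetric and S_x symmetric positive definite (which regularization is meant to guarantee). The function name `generalized_eig` and the small matrices are hypothetical. The sketch whitens with a Cholesky factor of S_x and then solves an ordinary symmetric eigenproblem.

```python
import numpy as np

def generalized_eig(S_y, S_x):
    """Solve S_y V = S_x V diag(lam), eigenvalues sorted in descending order.

    Requires S_x to be symmetric positive definite.
    """
    L = np.linalg.cholesky(S_x)            # S_x = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ S_y @ L_inv.T              # equivalent symmetric standard problem
    lam, U = np.linalg.eigh(M)             # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, U = lam[order], U[:, order]
    V = L_inv.T @ U                        # map eigenvectors back to original coordinates
    return lam, V

# Small hand-made symmetric matrices for illustration only.
S_x = np.array([[2.0, 0.5],
                [0.5, 1.0]])
S_y = np.array([[5.0, 1.0],
                [1.0, 3.0]])

lam, V = generalized_eig(S_y, S_x)
```

With this construction the columns of `V` are normalized so that V^T S_x V is the identity, which is one common convention; other normalizations differ only by column scaling.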
However, instead of using the traditional discriminating dimensionality reduction matrix in the form of Equation (30.7), we propose the discriminating transform matrix as
| (30.14) |
|
As for the transformation A, the
One generalization of BDA is to take in multiple positive clusters instead of one [45]. For example, the training set may be labeled as red horse, white horse, black horse, and non horse, then we shall
| (30.15) |
|
| (30.16) |
|
where $C$ is the total number of clusters in positive examples, $\{x_i^{(c)}\}$, $i = 1, 2, \ldots, N_c$, are the positive examples in cluster $c$, and $m_c$ is the mean vector of positive cluster $c$.
If the separation of differently colored horses is important to the user, we can add the corresponding scatters to Equation (30.15). Note that this is not MDA, since it is again "biased" toward horses: the scatter of the non-horse examples is not minimized.
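One possible reading of the multi-cluster generalization can be sketched as follows. The sketch assumes that the positive scatter is accumulated within each positive cluster about that cluster's mean, and that the negative scatter is accumulated about every positive cluster mean in turn; this second choice in particular is an assumption of the example. The function name `multi_cluster_scatter` and the toy data are hypothetical.

```python
import numpy as np

def multi_cluster_scatter(pos_clusters, X_neg):
    """Accumulate scatters over a list of positive clusters (rows are samples).

    S_x: sum over clusters of the within-cluster positive scatter.
    S_y: sum over clusters of the negative scatter about that cluster's mean
         (assumed form). The negatives' own scatter is never minimized.
    """
    d = X_neg.shape[1]
    S_x = np.zeros((d, d))
    S_y = np.zeros((d, d))
    for X_c in pos_clusters:
        m_c = X_c.mean(axis=0)             # mean vector of positive cluster c
        dx = X_c - m_c
        dy = X_neg - m_c
        S_x += dx.T @ dx
        S_y += dy.T @ dy
    return S_x, S_y

# Toy 2-D features for three positive clusters and one negative set.
red = np.array([[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]])
white = np.array([[0.0, 1.0], [0.1, 1.1]])
black = np.array([[-1.0, 0.0], [-1.1, 0.2], [-0.9, -0.1]])
non_horse = np.array([[3.0, 3.0], [-3.0, 2.0], [0.0, -4.0], [4.0, -1.0]])

S_x, S_y = multi_cluster_scatter([red, white, black], non_horse)
```

With a single positive cluster, this reduces to the scatter of the positives about their centroid and the scatter of the negatives about that same centroid.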
This generalization is