7. Experiments and Analysis

Using image retrieval as examples, we compare the three proposed discriminating transforms to the optimal two-level whitening transforms [12], and compare the kernel versions with SVM, on both synthetic data and real world image databases. The scenario is "query by example" followed by several rounds of relevance feedback by the user. The machine first learns an optimal transform, linear or nonlinear, and then all training and testing points are transformed into the new space, where the new query is the mean of the transformed positive examples, and its 20 nearest neighbors are returned for further judgment from the user.

7.1 Experiments on Synthetic Data

For the non kernel versions of FDA, MDA, and BDA, all the transform matrices are linear, and the decision boundaries are either linear or quadratic.

To test the performance of BDA versus FDA or MDA, we used some toy problems as depicted in Figure 30.4. Original data are in 2-D feature space, and positive examples are "o"s and negative examples are "x"s in the figure. In all cases, it is assumed that the number of modes for negative examples is unknown. FDA, MDA, and BDA are applied to find the best projection direction by their own criterion functions for each case, and the resulting (generalized) eigenvector corresponding to the maximum eigenvalue is drawn in solid, dash-dotted, and dashed lines, respectively. The would-be projections are also drawn as bell-shaped curves to the side of the corresponding eigenvectors, assuming Gaussian distribution for each mode.

click to expand
Figure 30.4: Comparison of FDA, MDA, and BDA for dimensionality reduction from 2-D to 1-D.

Here, FDA treats positive and negative examples equally; i.e., it tries to decrease the scatter among negative examples as part of the effort. This makes it a bad choice in the cases of (b) or (d). Without any prior knowledge about the number of classes to which the negative examples belong, MDA can only treat each example as a separate class/mode. Since MDA has in its criterion function the tendency of increasing the scatter among all classes/modes, which includes the scatter among negative examples, this makes it a bad choice for cases (a) and (c). Notice the two modes of the negative examples moved apart from each other and toward the positive examples, and BDA is able to adapt to the change and gives better class separation in both cases. MDA fails in (c), and FDA fails in (d).

In all cases, BDA yields good separation of negative examples from positive ones, as well as clustering of positive examples (it finds a balance between these two goals). Note from (c) to (d), the two negative modes move apart from each other and toward the positive ones. FDA and MDA yield unchanged results, for (c) FDA gives better separation and for (d) MDA gives better separation. BDA is able to adapt to the change and gives better separation in both cases.

FDA and MDA are inadequate in biased classification or biased dimensionality reduction problems because of their forceful assumption on the number of modes. BDA avoids making this assumption by directly modeling the asymmetry and hence gives better results.

To test the ability of the KBDA in dealing with nonlinearly distributed positive examples, six sets of synthetic data in two-dimensional space are used (see Figure 30.5). The circles are positive examples and the crosses negative. A simulated query process is used for training sample selection; i.e., the 20 nearest neighbors of a randomly selected positive point are used as training samples. The bar diagram shows the averaged hit rate in the top 20 returns. A clear boost in hit rates is observed when using KBDA.

click to expand
Figure 30.5: Test results on synthetic data.

7.2 Image Database Testing

A fully labeled set of 500 images from the Corel image set is used for testing. It consists of five classes, each with 100 images. Each image is represented by a 37-dimensional vector, which consists of 9 color features, 10 texture features, and 18 edge-based structure features [6], [13].

Each round, 10 positive and 10 negative images are randomly drawn as training samples. The hit rate in the top 100 returns is recorded. Five hundred rounds of testing are performed on all five classes, and the averaged hit rates are shown in Table 30.1. Here, QM, or query movement, a popular technique in information retrieval, is implemented by simply taking the positive centroid as the new query point and returning the nearest neighbors in the order of increasing distance. It is worth pointing out that based on the performance shown in Table 30.1, KBDA outperforms a one-class SVM-based kernel density estimator (cf. [39]).

Table 30.1: Averaged hit rate in top 100 for 500 rounds of testing.
QM	WT (μ=0.1)	BDA (μ=0.1, γ=0)	KBDA (RBF: σ=0.7)
62.7%	70.4%	74.2%	79.1%

More test results on image databases are available in [50].

7.3 Experiment on Videos

We are currently building a large video database with both numerical features and textual annotations. Thus far we could only offer some preliminary testing results, hoping that these may shed some lights on promising future research directions.

We experimented with a small set of 50 news video shots, with keyword annotations manually extracted/assigned based on the transcript. The vocabulary is fixed with 20 keywords ("Clinton," "Middle-East," "Arafat," "Israel," "advertisement," "weather," etc.), resulting in a 20-dimension sub-vector. We also manually assigned a keyword association matrix [28] (e.g., with cross-triggering among "Middle-East," "Arafat," "Israel," and "Clinton"). In a real world scenario, this matrix can be formed through either manual encoding [21], or learning from a large corpus [23], [30], or learning from long-term user interactions [28]. We used the same visual features as the ones used in our image database with 37 dimensions, extracted from the first frame of each shot. Initial testing indicates that relevance feedback algorithms such as BDA can correctly identify the proper subspace within the joint feature space of visual and textual components and thus can facilitate queries across modalities.

One run of the linear BDA algorithm would proceed as follows: the query shot was the anchorperson (similar to the one shown in Figure 30.1) reporting on President Clinton's remarks on Middle East crisis (with keywords "Clinton" and "Middle East"). With one input, the system applies Euclidean distance; and visual features apparently dominated the results: the top ten returns are all anchorperson shots, but fortunately with one of them regarding Israel Defense Force (IDF). Assumed to be interested in Middle East crisis, the user then selected the IDF shot as the second query, and labeled four of the other anchorperson shots as negative examples. By applying the BDA algorithm, the system largely ignored visual features (because of the negative examples) and weighted more on the textual components for "Middle East," "Israel," and "Clinton". The subsequent top ten returns included all the four segments on Middle East crisis, including a shot of Arafat, and a shot on a meeting between Clinton and Arafat.