11.2 Practical examples of image retrieval

In this section, some of the methods described in the previous sections are used to demonstrate practical visual information retrieval from image databases. I have chosen texture-based and shape-based retrieval for demonstration purposes, since colour-based and motion-based methods are not easy to demonstrate in black-and-white pictures and within the limited space of this book.

11.2.1 Texture-based image retrieval

Spatial frequency analysis of textures provides an excellent way of classifying them. The complex Gabor wavelet, which is a Gaussian function modulated by a complex exponential, is ideal for this purpose [3]. A two-dimensional complex Gabor wavelet is defined as:

(11.1)  $g(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}\exp\left[-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2}+\frac{y^2}{\sigma_y^2}\right)+2\pi j f_0 x\right]$

where σx and σy are the horizontal and vertical standard deviations of the Gaussian and f0 is the modulation (centre) frequency of the filter. Its frequency domain representation (Fourier transform) is then given by:

(11.2)  $G(u, v) = \exp\left\{-\frac{1}{2}\left[\frac{(u-f_0)^2}{\sigma_u^2}+\frac{v^2}{\sigma_v^2}\right]\right\}$

where u and v are the horizontal and vertical spatial frequencies and σu and σv are their respective standard deviations.
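As an illustration, the short sketch below samples the complex Gabor wavelet of eqn. 11.1 on a discrete grid; the function name gabor_kernel and its parameter choices are illustrative, with f0 taken as the horizontal modulation frequency in cycles per pixel.

```python
import numpy as np

def gabor_kernel(size, sigma_x, sigma_y, f0):
    """Sample the complex 2D Gabor wavelet of eqn. 11.1 on a (size x size) grid.

    A Gaussian envelope with standard deviations sigma_x, sigma_y is modulated
    by a complex exponential of horizontal frequency f0 (cycles per pixel).
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
    carrier = np.exp(2j * np.pi * f0 * x)
    return envelope * carrier / (2.0 * np.pi * sigma_x * sigma_y)
```

Rotated versions of this kernel give the different orientations of the filter bank described next.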

The g(x, y) of eqn. 11.1 can be used as the mother wavelet to decompose a signal into various levels and orientations. In Chapter 4 we showed how mother wavelets could be used in the design of discrete wavelet transform filters. The same procedure can be applied to the mother Gabor wavelet.

Figure 11.3 shows the spectrum of the Gabor filter at four levels and six orientations. It is derived by setting the lowest and highest horizontal spatial frequencies to ul = 0.05 and uh = 0.4, respectively. The intermediate frequencies are derived by constraining the bands to touch each other.

Figure 11.3: Gabor filter spectrum; the contours indicate the half-peak magnitude of the filter responses in the Gabor filter dictionary. The filter parameters used are uh = 0.4, ul = 0.05, M = 4 and L = 6
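For the quoted parameters, the constraint that neighbouring bands touch leads to a geometric progression of centre frequencies between ul and uh. The sketch below computes these frequencies and the six orientations; it assumes this common geometric-spacing design rather than reproducing the exact derivation in [3].

```python
import numpy as np

# Filter-bank parameters quoted for Figure 11.3.
u_l, u_h = 0.05, 0.4        # lowest and highest horizontal centre frequencies
M, L = 4, 6                 # number of scales and orientations

# Centre frequencies form a geometric progression so that neighbouring
# bands touch; a is the frequency scaling factor between adjacent scales.
a = (u_h / u_l) ** (1.0 / (M - 1))
centre_freqs = u_l * a ** np.arange(M)   # -> [0.05, 0.1, 0.2, 0.4]
orientations = np.arange(L) * np.pi / L  # six orientations, pi/6 apart
```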

The above set of filters can decompose an image into 4 × 6 = 24 subimages. As we saw in Chapter 4, each subimage reflects the characteristics of the image at a specific orientation and spatial resolution, so the textures of an image can be analysed and described at these orientations and resolutions. For the example given above, one can calculate the mean and standard deviation of each subimage and use them as a 48-dimensional vector describing the image. This is called the feature vector; it can be used for indexing the texture of an image and is given by:

(11.3)  $\bar{f} = [\mu_{00}\;\sigma_{00}\;\mu_{01}\;\sigma_{01}\;\cdots\;\mu_{35}\;\sigma_{35}]$, where $\mu_{mn}$ and $\sigma_{mn}$ are the mean and standard deviation of the subimage at the mth scale and nth orientation
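A minimal sketch of assembling this feature vector is given below, assuming the 24 complex filtered subimages are already available; texture_feature_vector is a hypothetical helper name.

```python
import numpy as np

def texture_feature_vector(subimages):
    """Build the 48-dimensional texture index of eqn. 11.3.

    `subimages` holds the 4 x 6 = 24 Gabor-filtered subimages; the mean and
    standard deviation of the magnitude of each one are concatenated.
    """
    features = []
    for sub in subimages:
        mag = np.abs(sub)              # filter outputs are complex
        features.extend([mag.mean(), mag.std()])
    return np.array(features)          # length 2 x 24 = 48
```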

To retrieve a query texture, its feature vector is compared against the feature vectors of the textures in the database. The similarity measure is the Euclidean distance between the feature vectors, and the one that gives the least distance is the most similar texture.
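A corresponding search might look like the sketch below, assuming the database feature vectors are held in a dictionary keyed by the image labels D1 to D112.

```python
import numpy as np

def rank_textures(query_vec, database):
    """Rank database textures by Euclidean distance to the query feature vector.

    `database` maps an image label (e.g. 'D54') to its 48-dimensional feature
    vector; the entry with the smallest distance is the most similar texture.
    """
    distances = {label: float(np.linalg.norm(query_vec - vec))
                 for label, vec in database.items()}
    return sorted(distances.items(), key=lambda item: item[1])
```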

Figure 11.4 demonstrates retrieving a texture from a texture database of 112 images. In this database, images are identified as D1 to D112. The query image, D5, is shown at the top left, along with the 12 most similar retrieved images, in the order of their similarity measure. As we see, the query texture itself, D5, is found as the closest match, followed by a visually similar texture, D54, and so on. In the figure, the similarity distance, sm, of each retrieved candidate texture is also given.

Figure 11.4: An example of texture-based image retrieval; query texture D5

11.2.2 Shape-based retrieval

Shapes are best described by the strength of their curvatures along their contours. A useful way of describing this strength is the curvature function, k(s, θ), defined as the instantaneous rate of change of the tangent angle θ of the curve with respect to its arc length s:

(11.4)  $k(s, \theta) = \frac{d\theta}{ds}$

At sharp edges, where the rate of change of the angle is high, the curvature function, k(s, θ), has large values. Hence a contour can be described by a set of feature points derived from its curvature function, for example the positions of large curvature values or of its zero crossings. However, since contours are normally noisy, deriving the curvature function directly from the contour can produce false feature points.
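For a digitised contour, the curvature function can be estimated with finite differences, as in the sketch below; the differencing scheme and the name curvature_function are illustrative assumptions.

```python
import numpy as np

def curvature_function(points):
    """Estimate the curvature k(s) = d(theta)/ds of eqn. 11.4 along a contour.

    `points` is an (n, 2) array of contour coordinates in order. theta is the
    tangent angle of each segment and s its arc length, so the discrete
    curvature at each interior vertex is the change in tangent angle divided
    by the local arc length.
    """
    d = np.diff(points, axis=0)                      # segment vectors
    theta = np.unwrap(np.arctan2(d[:, 1], d[:, 0]))  # tangent angle per segment
    ds = np.hypot(d[:, 0], d[:, 1])                  # arc length per segment
    return np.diff(theta) / (0.5 * (ds[:-1] + ds[1:]))
```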

To eliminate these unwanted feature points, contours should be denoised with smoothing filters. Care should be taken over the degree of smoothing: heavily filtered contours lose genuine feature points, while lightly filtered ones cannot get rid of the false ones. A large number of feature points also demands more storage and heavier processing for retrieval.

Filtering a contour with a set of Gaussian filters of varying degrees of smoothness, the so-called scale-space representation of curves [4], addresses this trade-off. Figure 11.5 shows four smoothed contours of a shape at scaling (smoothing) factors of 16, 64, 256 and 1024. The positions of the curvature extremes on the contour at each scale are also shown. These points capture the most important structure of each contour very well.

Figure 11.5: A contour and the positions of its curvature extremes at four different scales
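A sketch of this scale-space smoothing is given below. It assumes the quoted scale factors map to the Gaussian width as sigma = sqrt(scale), and uses a simple local-maximum test to pick out curvature extremes from a curvature array (for example one produced by the curvature_function sketch above).

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_contour(points, scale):
    """Gaussian smoothing of a closed contour at one scale (cf. Figure 11.5).

    Each coordinate sequence is filtered with a 1D Gaussian whose standard
    deviation is sqrt(scale); mode='wrap' treats the contour as closed.
    """
    sigma = np.sqrt(scale)
    return gaussian_filter1d(np.asarray(points, float), sigma, axis=0, mode='wrap')

def curvature_extrema(k):
    """Indices where the absolute curvature k has a local maximum."""
    k = np.abs(np.asarray(k))
    return np.where((k[1:-1] > k[:-2]) & (k[1:-1] > k[2:]))[0] + 1
```

For example, curvature_extrema(curvature_function(smooth_contour(points, 256))) picks out the feature points of a contour at scale 256.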

For indexing and retrieval applications, these feature points can be joined together to approximate each smoothed contour with a polygon [5]. Moving round the contour, the angle of every polygon edge with the horizontal is recorded, and this set of angles is called the turning function. The turning function is the index of the shape used for retrieval.
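A minimal sketch of computing the turning function from the polygon vertices is given below; the function name and the (N, 2) vertex array layout are assumptions.

```python
import numpy as np

def turning_function(vertices):
    """Turning function of a closed polygon: the angle of each edge with the horizontal.

    `vertices` is an (N, 2) array of the polygon vertices taken in order round
    the contour; the N returned angles form the shape index used for retrieval.
    """
    closed = np.vstack([vertices, vertices[:1]])   # repeat first vertex to close
    edges = np.diff(closed, axis=0)                # one vector per polygon edge
    return np.arctan2(edges[:, 1], edges[:, 0])    # angle of each edge
```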

The similarity measure is based on the minimisation of the Euclidean distance between the query turning function and the turning functions of the shapes in the database. That is:

(11.5)  $\min_{j}\sqrt{\sum_{i=1}^{N}\left(\theta_i^{q}-\theta_i^{j}\right)^{2}}$

where $\theta_i^{q}$ is the ith angle of the query turning function, $\theta_i^{j}$ is the ith angle of the jth shape's turning function in the database and N is the number of angles in the turning function (e.g. the number of vertices of the polygon or of feature points on the contour). Calculation of the Euclidean distance requires all indices to have an equal number of turning angles in their turning functions. This is achieved by inserting additional feature points on the contours such that all polygons have N vertices.
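A direct sketch of the distance in eqn. 11.5 for one database shape is given below; evaluating it for every shape and keeping the smallest value gives the best match. It assumes both turning functions already contain the same number of angles N.

```python
import numpy as np

def turning_distance(query_angles, shape_angles):
    """Euclidean distance of eqn. 11.5 between two turning functions."""
    q = np.asarray(query_angles)
    t = np.asarray(shape_angles)
    assert q.shape == t.shape, "turning functions must have equal length N"
    return float(np.sqrt(np.sum((q - t) ** 2)))

# The best match is the database shape with the smallest turning_distance.
```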

New feature points should be inserted such that they also represent important curvature extremes. This is done by inserting points on the contour, one at a time, at the position where the contour is farthest from the approximating polygon. Figure 11.6 shows the polygons of Figure 11.5 with the new vertices added. The total number of vertices is chosen such that the distortion of approximating the original contour with a polygon is below an acceptable value.
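One possible sketch of this insertion procedure is given below; refine_polygon and point_segment_distance are hypothetical names, and the contour is assumed to be an ordered array of points with the current polygon vertices given by their indices into that array.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b."""
    ab = b - a
    t = np.clip(np.dot(p - a, ab) / np.dot(ab, ab), 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def refine_polygon(contour, vertex_idx, n_target):
    """Grow the polygon to n_target vertices, one vertex at a time.

    `contour` is an (n, 2) array of ordered contour points and `vertex_idx`
    the indices of the current polygon vertices. At each step the contour
    point lying farthest from the polygon edge spanning it is promoted to a
    new vertex, so the added points also mark significant curvature.
    """
    idx = sorted(vertex_idx)
    n = len(contour)
    while len(idx) < n_target:
        best_d, best_i = -1.0, None
        for k, start in enumerate(idx):
            end = idx[(k + 1) % len(idx)]          # next vertex (with wrap)
            a, b = contour[start], contour[end]
            for step in range(1, (end - start) % n):
                i = (start + step) % n             # contour point on this edge
                d = point_segment_distance(contour[i], a, b)
                if d > best_d:
                    best_d, best_i = d, i
        if best_i is None:                         # every contour point is a vertex
            break
        idx = sorted(idx + [best_i])
    return idx
```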

Figure 11.6: New polygons of Figure 11.5 with some added vertices

To show the retrieval efficiency of this method of shape description, Figure 11.7a shows a query shape to be searched for in a database of almost 1100 marine creatures. The query shape itself is found as the closest match, followed by the three next closest marine shapes in the order of their closeness, as shown in Figure 11.7.

Figure 11.7: (a) the query shape; (b), (c) and (d) the three closest shapes in order

11.2.3 Sketch-based retrieval

In the above two examples of image retrieval, it was assumed that the query image (texture or shape) is available to the user. However, there are occasions when these query images may not be available. Users might have a verbal description of objects of interest, or have in their minds an idea of a visual object.

One way of searching for visual objects without a query image is to sketch the object of interest and submit it to the search engine. Then, based on the set of retrieved objects, the user may iteratively modify his sketch until he finds the desired object. Assume that the fish labelled C1 in Figure 11.8 is the desired shape in the database. The user first draws a rough sketch of this fish, as he imagines it, like the one shown as A0 in Figure 11.8. Based on shape similarity, the three most similar shapes, in the order of their similarity to the drawn shape, are shown as A1, A2 and A3.

Figure 11.8: Sketch-based shape retrieval

By inspecting these outputs, the user then realises that none of the matched fish has any fins. Adding a dorsal and a ventral fin to the sketched fish creates the new query shape B0. With this new query shape, the best matched shapes, in the order of their similarity to the refined sketch, become B1, B2 and B3.

Finally, adding an anal fin and a small adipose fin to the refined sketch creates a further refined query shape, C0. The new set of retrieved shapes, in the order of their similarity to the refined sketch, now becomes C1, C2 and C3. This is the last iteration, since the desired shape, C1, appears as the best matched shape.

This technique can also be extended to the other retrieval methods. For example, one might paint or add some texture to the above drawings, which would certainly improve the reliability of the retrieval system.


