Types of Algorithms Used for Voice Interpretation | Biometrics for Network Security (Prentice Hall Series in Computer Networking and Distributed)


Now that we know what constitutes the voice biometric and how it can be captured, we need to know what types of algorithms are used. The algorithms used to match and enroll the voice biometric fall into the following general categories: ^[1]

^[1] The list of voice algorithms comes from Biometric Security Australia (www.biometricsecurity.com.au/technologies/technologies.htm).

Fixed phrase verification
Fixed vocabulary verification
Flexible vocabulary verification
Text-independent verification

Fixed phrase verification

As the name implies, the user both enrolls and verifies using a fixed phrase. This makes enrollment easy, since the user may need to repeat only one phrase. This type of verification is often viewed as simply comparing two waveforms. If they match within a tolerance, the two samples are assumed to come from the same person and access is granted. The matching of the two waveforms is normally done using dynamic time warping.

Dynamic time warping is used to prepare two signals for comparison. An explanation of it is included here as background information. The algorithm addresses the problem of comparing a reference template to a comparison template when the cadence of a phoneme differs between them, and it does so using relatively simple mathematics. The idea is to find the time alignment that minimizes the distance between the two signals so that the templates can be compared accurately. First, a local distance matrix is built: each cell holds the absolute value of the difference between one sample of the reference signal and one sample of the comparison signal. The matrix thus contains the relative distances between the two signals at every pairing of time positions. Next, an accumulated distance matrix is created. Each of its cells holds the local distance for that cell plus the lowest accumulated value among its already-computed nearest neighbors. Once the accumulated distance matrix is complete, the shortest path through it is traced. This path serves as a warping function for comparing the two signals, which are now relatively time-synchronized. ^[2]

^[2] Ibid.
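
The dynamic time warping steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular voice product: it assumes each signal has already been reduced to a non-empty list of numbers (real systems compare multi-dimensional feature frames), and the names dtw_distance and same_speaker, as well as the tolerance value, are hypothetical.

```python
def dtw_distance(reference, sample):
    """Return (accumulated distance, warping path) for two 1-D signals."""
    if not reference or not sample:
        raise ValueError("both signals must be non-empty")
    n, m = len(reference), len(sample)
    inf = float("inf")

    # Local distance matrix: absolute difference for every pair of samples.
    local = [[abs(r - s) for s in sample] for r in reference]

    # Accumulated distance matrix: local cost plus the cheapest
    # already-computed neighbor (above, left, or diagonal).
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = local[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = local[i][j] + best_prev

    # Trace the shortest path back from the last cell to the first.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path


def same_speaker(reference, sample, tolerance):
    """Hypothetical decision rule: accept if the warped distance is small."""
    distance, _ = dtw_distance(reference, sample)
    return distance <= tolerance
```

In this sketch, a sample that is simply a slower rendition of the reference, such as [1, 1, 2, 2, 3, 3] against [1, 2, 3], warps to a distance of zero, which is the effect the cadence discussion above is after. How the tolerance is chosen is left open here; it would have to be tuned for the deployment.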

Fixed vocabulary verification

Fixed vocabulary verification is based on the user's being enrolled and verified from a known pool of words. This pool is usually made up of the digits 0 through 9, along with a handful of other words. To enroll, the user repeats each word in the vocabulary so that a unique user model is created. When it is time for the user to verify, he/she is prompted with a random subset of the vocabulary. The live template is then compared to the enrolled template word by word, each spoken word being scored against the enrolled model for that word. The match scores of the individual word models are summed to produce a single value, which determines whether the user is accepted.
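
The prompt-and-sum flow described above might look like the following sketch. Everything named here is an assumption made for illustration: the VOCABULARY list, the choose_prompt and verify_prompt functions, and the idea that some other component has already produced a per-word match score (higher meaning a closer match to the enrolled word model).

```python
import random

# Hypothetical enrolled vocabulary: the digits 0 through 9.
VOCABULARY = ["zero", "one", "two", "three", "four",
              "five", "six", "seven", "eight", "nine"]


def choose_prompt(vocabulary, size, rng=random):
    """Pick a random subset of the vocabulary for the user to speak."""
    return rng.sample(vocabulary, size)


def verify_prompt(prompt, word_scores, threshold):
    """Sum the per-word match scores and compare the total to a threshold.

    word_scores maps each prompted word to the score of the spoken word
    against the enrolled model for that word (assumed to be computed
    elsewhere).
    """
    total = sum(word_scores[word] for word in prompt)
    return total >= threshold
```

Because the prompt changes on every attempt, a recording of an earlier session is unlikely to contain the right words in the right order, which is one reason a random subset is used rather than a fixed phrase.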

Flexible vocabulary verification

Flexible vocabulary verification is based on the user's being able to use any word in a given lexicon for authentication. To accomplish this, the user is required to repeat a series of words from the lexicon that covers all the phonemes used in the lexicon. Not only does there have to be coverage of the entire set of phonemes, but the phonemes must also be tested in conjunction with each other. When the user needs to authenticate, he/she speaks any word or words from the lexicon. The words are then broken down into their individual phoneme components and compared.
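
The enrollment requirement can be checked mechanically. The sketch below assumes a hypothetical lexicon that maps each word to its phoneme sequence, and it interprets "tested in conjunction with each other" as coverage of adjacent phoneme pairs; that interpretation, the function names, and the toy phoneme symbols are assumptions, not something the source specifies.

```python
def phonemes_and_pairs(words, lexicon):
    """Collect the phonemes and adjacent phoneme pairs used by the words."""
    singles, pairs = set(), set()
    for word in words:
        sequence = lexicon[word]
        singles.update(sequence)
        pairs.update(zip(sequence, sequence[1:]))
    return singles, pairs


def missing_coverage(enrollment_words, lexicon):
    """Return the phonemes and pairs of the lexicon not yet enrolled."""
    needed_singles, needed_pairs = phonemes_and_pairs(lexicon, lexicon)
    have_singles, have_pairs = phonemes_and_pairs(enrollment_words, lexicon)
    return needed_singles - have_singles, needed_pairs - have_pairs


# Toy lexicon with made-up phoneme symbols, for illustration only.
toy_lexicon = {
    "cat": ["k", "ae", "t"],
    "tack": ["t", "ae", "k"],
    "at": ["ae", "t"],
}
```

With this toy lexicon, enrolling only "cat" covers every individual phoneme but leaves two adjacent pairs uncovered; adding "tack" closes the gap, so the user never has to enroll "at" separately.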

Text-independent verification

Text-independent verification offers the promise of freedom to use any chosen phrase or words for authentication. To enroll, the user is free to say anything. When the user later goes to verify, his/her speech is compared against all the other speaker models that have been created. Since this technique is generally considered weak, it is used in conjunction with continuous speaker monitoring, in which the user is constantly measured against all the other models. Because this method relies on continuous speech, it is not a useful method for biometric network security. It is included here for the sake of completeness.

Which Algorithm Is Best?

The decision of which algorithm would be best for voice biometrics should be based on a balance between convenience and security. Each algorithm has its own inherent tradeoffs. Thus, selecting which one to use is a risk management decision. If a company is more concerned with user convenience, then it should pick an algorithm that is easy to use and enroll. If a company is more concerned with security, then an algorithm that requires more in-depth enrollment and a wider variety of words/phrases for authentication should be considered.

Recommended Voice Algorithm

The algorithm I recommend is fixed vocabulary verification, which offers a good tradeoff between convenience and security. It can use any word in the lexicon by itself or in combination with others. It does require the user to do a one-time enrollment for each word. This enrollment may require the user to repeat each word three to ten times, depending on the underlying algorithm implementation. Though this can be tedious, it is done only once, and then it can be leveraged many times over.
