10.4. Evaluating Biometrics

Testing biometrics is complicated and requires objective comparison.[21] Biometric authentication is not a simple yes/no decision: it involves statistical analysis of the live signal coming from the biometric device. Performance metrics should indicate how well a system performs, but reliable data on particular systems is difficult to obtain, and in general, more independent testing is required.

[21] P. J. Phillips, A. Martin, C. L. Wilson, and M. Przybocki, "An Introduction to Evaluating Biometric Systems," Computer (Feb. 2000), 56–62.

To this end, a standard for biometrics testing should be created. A number of private and public testing laboratories have been set up to promote such a standard. These organizations include the U.S. National Biometric Test Center[22] at San Jose State University, the Biometric Consortium,[23] and programs at the International Computer Security Association,[24] the U.S. National Institute of Standards and Technology[25] (NIST), and the UK National Physical Laboratory.[26] The International Biometric Group[27] is a private initiative offering independent testing that purports to include a usability perspective.

[22] http://www.engr.sjsu.edu/biometrics/.

[23] http://www.biometrics.org/.

[24] http://www.icsa.net/.

[25] http://www.nist.gov/.

[26] A. J. Mansfield and J. L. Wayman, "Best Practices in Testing and Reporting Performance of Biometric Devices," NPL Report CMSC 14/02 (Aug. 2002).

[27] International Biometric Group, "Comparative Biometric Testing"; http://www.ibgweb.com/reports/public/comparative_biometric_testing.html.

10.4.1. Performance Metrics

The performance of a biometrics system must be high if users are to trust and accept it. A system's theoretical performance is often quoted by vendors and looks impressive, but it does not necessarily reflect the system's real-world performance. Furthermore, the methods used to measure performance can greatly affect the results. A system that performs well in the laboratory with trained, cooperative users will generate a completely different set of values with inexperienced or less cooperative users in a real-world environment.

Performance is typically described in terms of two metrics:

  • False accept rate (FAR). The likelihood that the wrong person will be able to access the system

  • False reject rate (FRR). The likelihood that a legitimate user will be denied access

The problem is that these two measures are interdependent: as FAR improves, FRR worsens, and vice versa, because both are governed by the same decision threshold (as the sketch below illustrates). In fact, reported values for FAR and FRR are usually based on theoretical calculations performed with clean, high-quality data, rather than on actual observations of real-world performance. Reported values should indicate how the numbers were calculated and the basis for including or excluding users, images, and other data from the calculations.
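
To make the tradeoff concrete, the following sketch computes FAR and FRR over a range of decision thresholds. The scores and the convention that a higher score means a stronger match are assumptions for illustration only, not data from any real system:

    # Sketch of the FAR/FRR tradeoff. Scores and the "higher = stronger
    # match" convention are invented for illustration.
    def far_frr(genuine_scores, impostor_scores, threshold):
        # False accept: an impostor comparison scoring at/above threshold.
        false_accepts = sum(1 for s in impostor_scores if s >= threshold)
        # False reject: a genuine comparison scoring below threshold.
        false_rejects = sum(1 for s in genuine_scores if s < threshold)
        return (false_accepts / len(impostor_scores),
                false_rejects / len(genuine_scores))

    genuine = [0.91, 0.85, 0.78, 0.66, 0.95, 0.88]   # invented scores
    impostor = [0.30, 0.55, 0.42, 0.71, 0.25, 0.48]  # invented scores

    for t in (0.4, 0.6, 0.8):
        far, frr = far_frr(genuine, impostor, t)
        print(f"threshold={t:.1f}  FAR={far:.2f}  FRR={frr:.2f}")

With these invented numbers, raising the threshold to 0.8 eliminates false accepts but rejects a third of genuine attempts; a real deployment must choose its operating point on this curve.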

Both the methods by which these figures are gathered and the human subjects who take part seriously affect the resulting numbers. Realized performance may fall well short of predicted performance; estimates are often far more impressive than what is achieved in the field.[28] Systems tested in laboratory conditions with a small, homogeneous set of "good" (trained, young, cooperative) users may generate completely different results from systems tested in a live environment with inexperienced and less-cooperative users. Many systems do not live up to expectations because they prove unable to cope with the enormous variations among large populations, or fail to take into account people's needs and behaviors.[29]

[28] Mansfield and Wayman.

[29] S. G. Davies, "How Biometric Technology Will Fuse Flesh and Machine," Information Technology and People 7:4 (1994).

NIST's 2003 Fingerprint Vendor Technology Evaluation[30] found that poor-quality fingerprints reduce matching accuracy: the poorer the image quality, the higher the false reject rate. A test's relevance is therefore limited if the distribution of fingerprint quality in its test sets is not known. A variety of user factors affect fingerprint quality, and only some of them can be controlled by operational procedures. The study also found that the false reject rate increased with subject age, particularly for subjects over the age of 50.

[30] http://fpvte.nist.gov/.
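
A minimal sketch of how false rejects might be tabulated by quality band follows. The per-attempt records are invented; a real evaluation would draw them from logged genuine-match attempts annotated with a quality score from the capture device:

    from collections import defaultdict

    # (quality_band, was_rejected) for genuine users only -- invented data.
    attempts = [
        ("good", False), ("good", False), ("good", True),
        ("fair", False), ("fair", True),  ("fair", True),
        ("poor", True),  ("poor", True),  ("poor", True),
    ]

    totals, rejects = defaultdict(int), defaultdict(int)
    for band, rejected in attempts:
        totals[band] += 1
        rejects[band] += rejected  # True counts as 1

    for band in ("good", "fair", "poor"):
        print(f"{band}: FRR = {rejects[band] / totals[band]:.2f}")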

Two other metrics, described next, must be considered when evaluating the usability of a biometric system.

  • Failure to enroll (FTE). The proportion of people who cannot produce a usable template at all. The FTE rate is especially important because it identifies people who can never use the system.

  • Failure to acquire (FTA). The proportion of attempts in which a user fails to generate a usable image from the device. The FTA rate is often inflated by systems that are difficult to use or that demand a high level of user cooperation.

Every enrollment or acquisition failure necessitates manual or alternative processing for the affected user. FTE and FTA figures will vary with the user base, the application area, and the biometric system. The fallback strategy for these users must be acceptable, usable, and secure; it is essential that it does not open a security loophole.
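
Both rates are simple proportions. The sketch below shows one plausible way to compute them from enrollment and verification logs; all counts are hypothetical placeholders:

    # Failure-to-enroll and failure-to-acquire as proportions.
    # All counts are hypothetical placeholders.
    enrollment_attempts = 500      # people who tried to enroll
    enrollment_failures = 15       # never produced a usable template

    acquisition_attempts = 12_000  # verification captures attempted
    acquisition_failures = 240     # no usable image was captured

    fte = enrollment_failures / enrollment_attempts
    fta = acquisition_failures / acquisition_attempts

    print(f"FTE = {fte:.1%}")  # 3.0%: users who need a fallback path
    print(f"FTA = {fta:.1%}")  # 2.0%: attempts needing retry or fallback

Note that FTE is measured over people while FTA is measured over attempts, which is why the two call for separate fallback planning.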
