Section 11.3. Overview of Previous Research

The first studies of keystroke characteristics as personal identifiers appeared in 1977^[17] and 1980^[18] (for a fuller treatment of work prior to 1990, see Joyce and Gupta^[19]). Since then, many classifiers, ranging from statistical analysis to neural networks, have been evaluated in an effort to improve the recognition capabilities of keystroke biometrics. It is beyond the scope of this chapter to delve into the details of each approach. In general, each classifier measures the similarity between an input keystroke timing pattern and a reference model of the legitimate user's typing pattern. The model is built from training samples previously provided by the user, and what it stores depends on the classifier. The time required to generate each model also varies by classifier, with neural networks generally taking significantly longer than other approaches.

^[17] G. Forsen, M. Nelson, and R. Staron, "Personal Attributes Authentication Techniques," Rome Air Development Center Report RADC-TR-77-1033, Air Force Base Griffis (New York, 1977).

^[18] R. Gaines, W. Lisowski, S. Press, and N. Shapiro, "Authentication by Keystroke Timing: Some Preliminary Results," Technical Report Rand report R-256-NSF, Rand Corporation (1980).

^[19] R. Joyce and G. Gupta, "Identity Authentication Based on Keystroke Latencies," Communications of the ACM 33:2 (1990), 168-176.
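The general pattern described above (build a reference model from training samples, then score an input pattern by its similarity to that model) can be illustrated with a minimal sketch. It assumes a simple minimum-distance classifier; the function names, the city-block distance, the factor k, and the sample timings are all hypothetical choices made for illustration and are not taken from any of the cited papers.

```python
from statistics import mean, stdev

def build_reference(training_samples):
    """Reference model: the per-feature mean of the training timing vectors."""
    return [mean(column) for column in zip(*training_samples)]

def distance(sample, reference):
    """City-block distance between a timing vector and the reference model."""
    return sum(abs(s - r) for s, r in zip(sample, reference))

def build_threshold(training_samples, reference, k=1.5):
    """Per-user threshold: mean training distance plus k standard deviations.

    The value of k is an assumption chosen for illustration.
    """
    dists = [distance(s, reference) for s in training_samples]
    return mean(dists) + k * stdev(dists)

def verify(sample, reference, threshold):
    """Accept the sample if it lies within the user's threshold."""
    return distance(sample, reference) <= threshold

# Hypothetical timing vectors in milliseconds (four features per sample).
training = [
    [110, 95, 130, 120],
    [105, 100, 125, 118],
    [115, 90, 135, 122],
    [108, 97, 128, 121],
]
reference = build_reference(training)
threshold = build_threshold(training, reference)

genuine = [109, 96, 131, 119]
impostor = [160, 140, 90, 180]
print(verify(genuine, reference, threshold))   # True
print(verify(impostor, reference, threshold))  # False
```

Deriving the threshold from the user's own training distances lets a consistent typist get a tight acceptance region while an erratic typist gets a looser one. A neural network would replace `build_reference` with a much more expensive training step, which is the cost difference noted above.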

Table 11-1 compares the various experimental designs and techniques that have been analyzed in key published research. We include as much information as we have available from the relevant papers, though often experimental details are omitted from the primary source.^[20]

^[20] Feature vectors representing keystroke characteristics are derived from key press times, key release times, and information on which keys are being pressed. Times are usually measured in milliseconds, although the granularity can vary according to the experiment setup, and is not generally reported. Duration, or hold time, represents the time between the press of a key and the release of the same key. Digraph latency is the delay between the release of one key and the press of the next key. In early literature, the term interkey delay is also used; that term may refer to either the digraph latency or the time from the press of one key to the press of the next key. We will refer to the latter feature as key press delay to avoid confusion. Timing information between three consecutive keystrokes, known as trigraphs, has also been analyzed. Unless otherwise indicated, samples involving input errors or the use of the backspace key are not analyzed. In addition, most experiments have subjects type on a single machine and keyboard, with the notable exception being the web-based experiments that rely on Java applets to collect keystrokes.
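As a rough illustration of the three timing features defined in this note, the following sketch derives them from a list of key events. The `KeyEvent` record, the function names, and the sample timings are assumptions made for the example; real collection code would depend on the platform and its timer granularity.

```python
from typing import NamedTuple

class KeyEvent(NamedTuple):
    """One keystroke: which key, and its press and release times in ms."""
    key: str
    press_ms: int
    release_ms: int

def durations(events):
    """Duration (hold time): release minus press of the same key."""
    return [e.release_ms - e.press_ms for e in events]

def digraph_latencies(events):
    """Digraph latency: release of one key to press of the next key."""
    return [b.press_ms - a.release_ms for a, b in zip(events, events[1:])]

def key_press_delays(events):
    """Key press delay: press of one key to press of the next key."""
    return [b.press_ms - a.press_ms for a, b in zip(events, events[1:])]

# Hypothetical events for a user typing "the".
events = [
    KeyEvent("t", 0, 80),
    KeyEvent("h", 130, 200),
    KeyEvent("e", 260, 350),
]
print(durations(events))          # [80, 70, 90]
print(digraph_latencies(events))  # [50, 60]
print(key_press_delays(events))   # [130, 130]
```

Note that a digraph latency computed this way is negative whenever the next key is pressed before the previous one is released, while the key press delay stays nonnegative as long as the events are in press order.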

Table 11-1. Comparison of published research, 1980-2004
Authors/Year	Input Data	Design	Features	Preprocessing	Classifiers	Notes
Gaines, Lisowski, Press, and Shapiro^a; 1980	Three 300- to 400-character passages	Seven professional secretaries typed two samples each with a delay of four months between samples	Interkey delays	Used only the 87 digraphs that had at least 10 replications per sample and per user; eliminated outliers; took logarithm of values	Two-sample t-test of whether the means of each value were the same, assuming that the variances were the same	Identified five core digraphs that discriminated perfectly: in, io, no, on, and ul
Umphress and Williams^b; 1985	Fixed 1,400-character reference input, 300-character test input	17 programmers typed samples with a delay of at least one month; errors allowed	Interkey delays	Single low-pass temporal filter to remove outliers	Closeness between test value and corresponding reference value, measured according to a standard deviation threshold and a passing ratio
Leggett and Williams^c; 1988	Two samples of fixed 537-character input	36 individuals typed samples with a delay of at least one month; errors allowed	Interkey delays; mean of delays	Various; resulted in 12 different subsets of feature vectors to analyze	Closeness measure as in Umphress and Williams	Found that means of delays do not further discriminate between users; using all lowercase digraphs yielded best results
Joyce and Gupta^d; 1990	Username, password, first name, last name	33 users typed all samples in a single session	Key press delays	None	Minimum distance from reference model, with verification threshold according to each user's typing variance	Found that more experienced users were more difficult for imposters to replicate
Bleha, Slivinsky, and Hussein^e; 1990	Username and fixed 32-character phrase	32 users typed samples over a period of weeks	Digraph latencies	Combined two samples into one; dimension reduction to reduce size of feature vector	Normalized minimum distance; normalized Bayesian	Applied different fixed thresholds for authentication
Leggett and Williams et al.^f; 1991	Same as 1988	Same as 1988	Interkey delays	N/A	N/A	Introduced dynamic characterization of users by their typing patterns
Bleha, Knopp, and Obaidat^g; 1992	Fixed 32-character phrase	Users typed the sample at least once per day for five weeks	Digraph latencies	None	Linear perceptron
Brown and Rogers^h; 1993	First and last name	25 users typed on a single keyboard	Digraph latencies	Removed outliers	Minimum distance; back-propagation neural network; partially connected back-propagation neural network	Found that partially connected back-propagation network performed the best
Obaidatⁱ; 1995	Username and password	15 users typed on a single keyboard over 8 weeks	Durations; digraph latencies	None	Various pattern recognition (k-means, cosine measure, minimum distance, Bayesian, potential function); various neural networks (BP, SOM, ART-2, RBFN, LVQ, RNN, SOP, HSOP)	Potential function and Bayesian performed the best, while cosine measure performed the worst; using only durations was more successful than using only latencies
Lin^j; 1997	Password	90 valid users and 61 invalid users logged into system	Durations; key press delays	Derived invalid vectors by extending valid vector with random numbers and multiplying by a factor	Three-layer back-propagation neural network
de Ru and Eloff^k; 1997	Password	30 users typed on single keyboard; used assembler code to produce time intervals in clock cycles	Interkey delays; category indicating typing difficulty of password	Related precise delays to four time interval categories (a value can belong to more than one category through probabilistic assignment)	Fuzzy logic with four categories and five rules	Found typing difficulty to be less discriminating than timing interval
Song, Venable, and Perrig^l; 1997	Continuous monitoring of keystrokes	Several hours of keystroke data gathered for each user; coarse timing granularity of 10 ms due to X server implementation	Digraph, trigraph, and wordgraph key events for each incoming keystroke	Measured closeness of incoming key events to the respective digraph, trigraph, and wordgraph models for that user	Final probabilistic prediction based on a weighted sum of the incoming keystroke's closeness measurement and the previous keystroke's closeness measurement	Empirical observations on a single user showed promise, but lack of quantitative results
Robinson et al.^m; 1998	Username	140 students routinely logged into campus network; replaced standard login module with one that collected keystrokes	Digraph latencies	Randomly selected 10 usernames for training and 10 usernames for testing; discarded 24% of samples due to typing errors	Minimum distance; nonlinear measure similar to Umphress and Williams; inductive learning based on nonparametric density estimation	Found that inductive learning classifier using both duration and latencies performed the best; using duration time alone was better than latencies
Monrose, Reiter, and Wetzelⁿ; 1999	Fixed eight-character password	20 users logged into server at least five times over six months; Java applet recorded keystrokes	Durations; digraph latencies	Selected distinguishing features based on mean and standard deviation, and thresholds	Binary classification (slow and fast) for each distinguishing feature	Attempted to demonstrate how passwords can be more securely stored on servers, and did not seek to minimize FAR
Monrose and Rubin^o; 2000	N/A	63 users typed on local Sun workstations at their convenience over 11 months	N/A	Selected most significant features	Minimum distance; weighted and nonweighted probability; Bayesian	Bayesian classifier performed the best
Peacock^p; 2000	Username, password, fixed nine-character word	11 users typed samples from own machines in one session; Java applet recorded keystrokes	Durations; digraph latencies	None	K-nearest neighbor
Cho, Han, Han, and Kim^q; 2000	Seven-character password	25 users typed samples over several days	Durations; digraph latencies	Removed two users; 6%-50% of training data discarded for every user	Minimum distance; autoassociative neural network	Neural network performed the best
Haider, Abbas, and Zaidi^r; 2000	Seven-character password	Users typed samples into DOS-based application	Interkey delays	None	Fuzzy logic with five categories; three-layer neural network; statistical confidence interval; combinations thereof	A combination of approaches performed the best
Changshui and Yanhua^s; 2000	Fixed 1,100-character text	24 users typed sample 18 times	Durations; key press delays	Removed outliers	Autoregressive model with coefficients by the Yule-Walker and Burg methods	Low accuracy relative to previous results
Wong et al.^t; 2001	User-selected password	10 users typed on 2 dedicated machines; 100 unauthorized attempts	Interkey delay	Removed outliers	Single-layer perceptron network; minimum distance	Tradeoff between FRR and FAR for the two classifiers used, with the neural network having a high FAR
Bergadano, Gunetti, and Picardi^u; 2002	Fixed 683-character text	44 users typed sample over one month, with no two samples from a user collected on the same day; errors allowed	Trigraph durations	None	Disorder between arrays of sorted trigraph durations	The method was also tested on digraphs, 4-graphs, and 6-graphs, but trigraphs performed the best
Clarke et al.^v; 2002	Four-digit number, fixed phone number, varying phone numbers	16 users typed on mobile handset	N/A	N/A	Back-propagation neural network
Kacholia and Pandit^w; 2003	Username and password	20 users typed on a single machine	N/A	N/A	Clustering to produce reference models; threshold deviation for classification
Yu and Cho^x; 2003	Seven-character password	25 users typed samples over several days (data from same experiment as Cho, 2000)	Durations; digraph latencies	Various, with the best results after performing feature selection based on a genetic algorithm-SVM (GA-SVM) wrapper	Support Vector Machine (SVM) novelty detector models	SVM approach is about 1,000 times more efficient than multilayer perceptrons but has the same degree of accuracy; large training sample needed to attain most accurate results
^a Gaines et al.
^b D. Umphress and G. Williams, "Identity Verification Through Keyboard Characteristics," International Journal of Man-Machine Studies 23:3 (1985), 263-273.
^c J. Leggett and G. Williams, "Verifying Identity Via Keystroke Characteristics," International Journal of Man-Machine Studies 28:1 (1988), 67-76.
^d Joyce and Gupta.
^e S. Bleha, C. Slivinsky, and B. Hussein, "Computer-Access Security Systems Using Keystroke Dynamics," IEEE Transactions on Pattern Analysis and Machine Intelligence 12:12 (1990), 1217-1222.
^f Leggett and Williams.
^g S. A. Bleha, J. Knopp, and M. S. Obaidat, "Performance of the Perceptron Algorithm for the Classification of Computer Users," Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing (ACM Press, 1992), 863-866.
^h M. Brown and S. J. Rogers, "User Identification Via Keystroke Characteristics of Typed Names Using Neural Networks," International Journal of Man-Machine Studies 39:6 (1993), 999-1014.
ⁱ M. S. Obaidat, "A Verification Methodology for Computer Systems Users," Proceedings of the 1995 ACM Symposium on Applied Computing (ACM Press, 1995), 258-262.
^j D.-T. Lin, "Computer-Access Authentication with Neural Network Based Keystroke Identity Verification," IEEE International Conference on Neural Networks 1 (June 1997), 174-178.
^k W. de Ru and J. Eloff, "Enhanced Password Authentication Through Fuzzy Logic," IEEE Expert 12 (Nov./Dec. 1997), 38-45.
^l Song, Venable, and Perrig.
^m J. A. Robinson, V. W. Liang, J. A. M. Chambers, and C. L. MacKenzie, "Computer User Verification Using Login String Keystroke Dynamics," IEEE Transactions on Systems, Man, and Cybernetics, Part A 28 (March 1998), 236-241.
ⁿ Monrose, Reiter, and Wetzel.
^o Monrose and Rubin.
^p Peacock.
^q Cho, Han, Han, and Kim.
^r S. Haider, A. Abbas, and A. K. Zaidi, "A Multi-Technique Approach for User Identification Through Keystroke Dynamics," IEEE International Conference on Systems, Man, and Cybernetics 2 (Oct. 2000), 1336-1341.
^s Z. Changshui and S. Yanhua, "AR Model for Keystroker Verification," IEEE International Conference on Systems, Man, and Cybernetics 4 (Oct. 2000), 2887-2890.
^t F. W. M. H. Wong, A. S. M. Supian, A. Ismail, L. W. Kin, and O. C. Soon, "Enhanced User Authentication Through Typing Biometrics with Artificial Neural Networks and K-Nearest Neighbor Algorithm," Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers 2 (Nov. 2001), 911-915.
^u F. Bergadano, D. Gunetti, and C. Picardi, "User Authentication Through Keystroke Dynamics," ACM Transactions on Information and System Security 5:4 (2002), 367-397.
^v Clarke et al.
^w V. Kacholia and S. Pandit, "Biometric Authentication Using Random Distributions (BioART)," Proceedings of the 15th Canadian IT Security Symposium (CITSS), Government of Canada (May 2003).
^x E. Yu and S. Cho, "GA-SVM Wrapper Approach for Feature Subset Selection in Keystroke Dynamics Identity Verification," Proceedings of the IEEE International Joint Conference on Neural Networks 3 (July 2003), 2253-2257.
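
Several classifiers in Table 11-1 compare relative rather than absolute timings. As one hedged illustration, the "disorder between arrays of sorted trigraph durations" listed for Bergadano, Gunetti, and Picardi can be sketched as follows. This is a reading of the table's one-line description, not the authors' code: the function name, the dictionary representation, the normalization by the maximum possible disorder, and the sample values are assumptions made for the example.

```python
def disorder(reference, sample):
    """Normalized disorder between two typing samples.

    Each argument maps a trigraph (for example "the") to its duration in ms.
    Only trigraphs present in both samples are compared. The result is 0.0
    when both samples order the trigraphs identically by duration and 1.0
    when the orderings are exactly reversed.
    """
    shared = sorted(set(reference) & set(sample))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared trigraphs")

    # Order the shared trigraphs by duration in each sample.
    ref_order = sorted(shared, key=lambda t: reference[t])
    sample_order = sorted(shared, key=lambda t: sample[t])
    sample_rank = {t: i for i, t in enumerate(sample_order)}

    # Sum how far each trigraph moved between the two orderings.
    raw = sum(abs(i - sample_rank[t]) for i, t in enumerate(ref_order))

    # Largest possible raw value for n items (a fully reversed ordering).
    max_disorder = (n * n) // 2
    return raw / max_disorder

# Hypothetical trigraph durations in milliseconds.
reference = {"the": 250, "ing": 280, "and": 300, "ion": 320}
similar = {"the": 260, "ing": 310, "and": 290, "ion": 330}
reversed_order = {"the": 400, "ing": 350, "and": 300, "ion": 200}

print(disorder(reference, similar))         # 0.25
print(disorder(reference, reversed_order))  # 1.0
```

Because only the ordering of durations matters, a measure of this kind is unaffected if a user types uniformly faster or slower on a given day, which may help explain why it tolerates samples collected on different days.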
