The first studies on the effectiveness of keystroke characteristics as personal identifiers were conducted in 1977[17] and 1980[18] (for a fuller treatment of work prior to 1990, see Joyce and Gupta[19]). Over the years, many different classifiers have been evaluated in an effort to improve the recognition capabilities of keystroke biometrics, ranging from statistical analysis to neural networks. It is beyond the scope of this chapter to delve into the details of each approach. In general, each classifier measures the similarity between an input keystroke timing pattern and a reference model of the legitimate user's typing pattern. The model is built from training samples previously provided by each user, and its contents vary with the classifier. The time required to generate each model also varies according to the classifier, with neural networks generally taking significantly longer than other approaches.
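To make the general scheme concrete, the following is a minimal sketch of a distance-based verifier, not the method of any particular paper. We assume the reference model is a per-feature mean and standard deviation, that the distance is the average deviation scaled by each feature's spread, and that a fixed threshold decides acceptance. The function names, the sample timings, and the threshold of 2.0 are all hypothetical.

```python
from statistics import mean, stdev


def build_reference_model(training_samples):
    """Build a model as (mean, standard deviation) per timing feature.

    training_samples: equal-length lists of timings in milliseconds,
    one list per training attempt (assumed layout).
    """
    columns = list(zip(*training_samples))
    return [(mean(col), stdev(col)) for col in columns]


def distance(sample, model):
    """Average absolute deviation from the model, scaled by feature spread."""
    total = 0.0
    for value, (mu, sigma) in zip(sample, model):
        # Guard against a zero spread so the division is always defined.
        total += abs(value - mu) / (sigma if sigma > 0 else 1.0)
    return total / len(model)


def verify(sample, model, threshold=2.0):
    """Accept the sample if it lies within the (hypothetical) threshold."""
    return distance(sample, model) <= threshold


# Hypothetical training data: three attempts, three timing features each.
training = [
    [100.0, 150.0, 120.0],
    [110.0, 140.0, 130.0],
    [105.0, 145.0, 125.0],
]
model = build_reference_model(training)

print(verify([104.0, 147.0, 124.0], model))  # close to the training data
print(verify([160.0, 90.0, 200.0], model))   # far from the training data
```

Raising the threshold accepts more attempts from the legitimate user at the cost of accepting more impostors, which is the tradeoff the classifiers in the research below tune in different ways.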
Table 11-1 compares the various experimental designs and techniques that have been analyzed in key published research. We include as much information as we have available from the relevant papers, though often experimental details are omitted from the primary source.[20]
Table 11-1. Comparison of published research, 1980–2004
Authors/Year | Input Data | Design | Features | Preprocessing | Classifiers | Notes |
---|---|---|---|---|---|---|
Gaines, Lisowski, Press, and Shapiro[a]; 1980 | Three 300–400-character passages | Seven professional secretaries typed two samples each with a delay of four months between samples | Interkey delays | Used only the 87 digraphs that had at least 10 replications per sample and per user; eliminated outliers; took logarithm of values | Two-sample t-test on whether the means of each value were the same, assuming equal variances | Identified five core digraphs that discriminated perfectly: in, io, no, on, and ul |
Umphress and Williams[b]; 1985 | Fixed 1,400-character reference input, 300-character test input | 17 programmers typed samples with a delay of at least one month; errors allowed | Interkey delays | Single low-pass temporal filter to remove outliers | Closeness between test value and corresponding reference value, measured according to a standard deviation threshold and a passing ratio | |
Leggett and Williams[c]; 1988 | Two samples of fixed 537-character input | 36 individuals typed samples with a delay of at least one month; errors allowed | Interkey delays; mean of delays | Various; resulted in 12 different subsets of feature vectors to analyze | Closeness measure as in Umphress and Williams | Found that means of delays do not further discriminate between users; using all lowercase digraphs yielded best results |
Joyce and Gupta[d]; 1990 | Username, password, first name, last name | 33 users typed all samples in a single session | Key press delays | None | Minimum distance from reference model, with verification threshold according to each user's typing variance | Found that more experienced users were more difficult for imposters to replicate |
Bleha, Slivinsky, and Hussein[e]; 1990 | Username and fixed 32-character phrase | 32 users typed samples over a period of weeks | Digraph latencies | Combined two samples into one; dimension reduction to reduce size of feature vector | Normalized minimum distance; normalized Bayesian | Applied different fixed thresholds for authentication |
Leggett, Williams, et al.[f]; 1991 | Same as 1988 | Same as 1988 | Interkey delays | N/A | N/A | Introduced dynamic characterization of users by their typing patterns |
Bleha, Knopp, and Obaidat[g]; 1992 | Fixed 32-character phrase | Users typed the sample at least once per day for five weeks | Digraph latencies | None | Linear perceptron | |
Brown and Rogers[h]; 1993 | First and last name | 25 users typed on a single keyboard | Digraph latencies | Removed outliers | Minimum distance; back-propagation neural network; partially connected back-propagation neural network | Found that partially connected back-propagation network performed the best |
Obaidat[i]; 1995 | Username and password | 15 users typed on a single keyboard over 8 weeks | Durations; digraph latencies | None | Various pattern recognition (k-means, cosine measure, minimum distance, Bayesian, potential function); various neural networks (BP, SOM, ART-2, RBFN, LVQ, RNN, SOP, HSOP) | Potential function and Bayesian performed the best, while cosine measure performed the worst; using only durations was more successful than using only latencies |
Lin[j]; 1997 | Password | 90 valid users and 61 invalid users logged into system | Durations; key press delays | Derived invalid vectors by extending valid vector with random numbers and multiplying by a factor | Three-layer back-propagation neural network | |
de Ru and Eloff[k]; 1997 | Password | 30 users typed on single keyboard; used assembler code to produce time intervals in clock cycles | Interkey delays; category indicating typing difficulty of password | Related precise delays to four time interval categories (a value can belong to more than one category through probabilistic assignment) | Fuzzy logic with four categories and five rules | Found typing difficulty to be less discriminating than timing interval |
Song, Venable, and Perrig[l]; 1997 | Continuous monitoring of keystrokes | Several hours of keystroke data gathered for each user; coarse timing granularity of 10 ms due to X server implementation | Digraph, trigraph, and wordgraph key events for each incoming keystroke | Measured closeness of incoming key events to the respective digraph, trigraph, and wordgraph models for that user | Final probabilistic prediction based on a weighted sum of the incoming keystroke's closeness measurement and the previous keystroke's closeness measurement | Empirical observations on a single user showed promise, but quantitative results are lacking |
Robinson et al.[m]; 1998 | Username | 140 students routinely logged into campus network; replaced standard login module with one that collected keystrokes | Digraph latencies | Randomly selected 10 usernames for training and 10 usernames for testing; discarded 24% of samples due to typing errors | Minimum distance; nonlinear measure similar to Umphress and Williams; inductive learning based on nonparametric density estimation | Found that inductive learning classifier using both durations and latencies performed the best; using durations alone was better than using latencies alone |
Monrose, Reiter, and Wetzel[n]; 1999 | Fixed eight-character password | 20 users logged into server at least five times over six months; Java applet recorded keystrokes | Durations; digraph latencies | Selected distinguishing features based on mean and standard deviation, and thresholds | Binary classification (slow and fast) for each distinguishing feature | Attempted to demonstrate how passwords can be more securely stored on servers, and did not seek to minimize FAR |
Monrose and Rubin[o]; 2000 | N/A | 63 users typed on local Sun workstations at their convenience over 11 months | N/A | Selected most significant features | Minimum distance; weighted and nonweighted probability; Bayesian | Bayesian classifier performed the best |
Peacock[p]; 2000 | Username, password, fixed nine-character word | 11 users typed samples from own machines in one session; Java applet recorded keystrokes | Durations; digraph latencies | None | K-nearest neighbor | |
Cho, Han, Han, and Kim[q]; 2000 | Seven-character password | 25 users typed samples over several days | Durations; digraph latencies | Removed two users; 6%–50% of training data discarded for every user | Minimum distance; autoassociative neural network | Neural network performed the best |
Haider, Abbas, and Zaidi[r]; 2000 | Seven-character password | Users typed samples into DOS-based application | Interkey delays | None | Fuzzy logic with five categories; three-layer neural network; statistical confidence interval; combinations thereof | A combination of approaches performed the best |
Changshui and Yanhua[s]; 2000 | Fixed 1,100-character text | 24 users typed sample 18 times | Durations; key press delays | Removed outliers | Autoregressive model with coefficients by the Yule-Walker and Burg methods | Low accuracy relative to previous results |
Wong et al.[t]; 2001 | User-selected password | 10 users typed on 2 dedicated machines; 100 unauthorized attempts | Interkey delays | Removed outliers | Single-layer perceptron network; minimum distance | Tradeoff between FRR and FAR for the two classifiers used, with the neural network having a high FAR |
Bergadano, Gunetti, and Picardi[u]; 2002 | Fixed 683-character text | 44 users typed sample over one month, with no two samples from a user collected on the same day; errors allowed | Trigraph durations | None | Disorder between arrays of sorted trigraph durations | The method was also tested on digraphs, 4-graphs, and 6-graphs, but trigraphs performed the best |
Clarke et al.[v]; 2002 | Four-digit number, fixed phone number, varying phone numbers | 16 users typed on mobile handset | N/A | N/A | Back-propagation neural network | |
Kacholia and Pandit[w]; 2003 | Username and password | 20 users typed on a single machine | N/A | N/A | Clustering to produce reference models; threshold deviation for classification | |
Yu and Cho[x]; 2003 | Seven-character password | 25 users typed samples over several days (data from same experiment as Cho, 2000) | Durations; digraph latencies | Various, with the best results after performing feature selection based on a genetic algorithm–SVM-based wrapper | Support Vector Machine (SVM) novelty detector models | SVM approach is about 1,000 times more efficient than multilayer perceptrons but has the same degree of accuracy; large training sample needed to attain most accurate results |
[a] Gaines et al. |
[b] D. Umphress and G. Williams, "Identity Verification Through Keyboard Characteristics," International Journal of Man-Machine Studies 23:3 (1985), 263–273. |
[c] J. Leggett and G. Williams, "Verifying Identity Via Keystroke Characteristics," International Journal of Man-Machine Studies 28:1 (1988), 67–76. |
[d] Joyce and Gupta. |
[e] S. Bleha, C. Slivinsky, and B. Hussein, "Computer-Access Security Systems Using Keystroke Dynamics," IEEE Transactions on Pattern Analysis and Machine Intelligence 12:12 (1990), 1217–1222. |
[f] Leggett and Williams. |
[g] S. A. Bleha, J. Knopp, and M. S. Obaidat, "Performance of the Perceptron Algorithm for the Classification of Computer Users," Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing (ACM Press, 1992), 863–866. |
[h] M. Brown and S. J. Rogers, "User Identification Via Keystroke Characteristics of Typed Names Using Neural Networks," International Journal of Man-Machine Studies 39:6 (1993), 999–1014. |
[i] M. S. Obaidat, "A Verification Methodology for Computer Systems Users," Proceedings of the 1995 ACM Symposium on Applied Computing (ACM Press, 1995), 258–262. |
[j] D.-T. Lin, "Computer-Access Authentication with Neural Network Based Keystroke Identity Verification," IEEE International Conference on Neural Networks 1 (June 1997), 174–178. |
[k] W. de Ru and J. Eloff, "Enhanced Password Authentication Through Fuzzy Logic," IEEE Expert 12 (Nov./Dec. 1997), 38–45. |
[l] Song, Venable, and Perrig. |
[m] J. A. Robinson, V. W. Liang, J. A. M. Chambers, and C. L. MacKenzie, "Computer User Verification Using Login String Keystroke Dynamics," IEEE Transactions on Systems, Man, and Cybernetics, Part A 28 (March 1998), 236–241. |
[n] Monrose, Reiter, and Wetzel. |
[o] Monrose and Rubin. |
[p] Peacock. |
[q] Cho, Han, Han, and Kim. |
[r] S. Haider, A. Abbas, and A. K. Zaidi, "A Multi-Technique Approach for User Identification Through Keystroke Dynamics," IEEE International Conference on Systems, Man, and Cybernetics 2 (Oct. 2000), 1336–1341. |
[s] Z. Changshui and S. Yanhua, "AR Model for Keystroker Verification," IEEE International Conference on Systems, Man, and Cybernetics 4 (Oct. 2000), 2887–2890. |
[t] F. W. M. H. Wong, A. S. M. Supian, A. Ismail, L. W. Kin, and O. C. Soon, "Enhanced User Authentication Through Typing Biometrics with Artificial Neural Networks and K-Nearest Neighbor Algorithm," Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers 2 (Nov. 2001), 911–915. |
[u] F. Bergadano, D. Gunetti, and C. Picardi, "User Authentication Through Keystroke Dynamics," ACM Transactions on Information and System Security 5:4 (2002), 367–397. |
[v] Clarke et al. |
[w] V. Kacholia and S. Pandit, "Biometric Authentication Using Random Distributions (BioART)," Proceedings of the 15th Canadian IT Security Symposium (CITSS), Government of Canada (May 2003). |
[x] E. Yu and S. Cho, "GA-SVM Wrapper Approach for Feature Subset Selection in Keystroke Dynamics Identity Verification," Proceedings of the IEEE International Joint Conference on Neural Networks 3 (July 2003), 2253–2257. |
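Of the classifiers in Table 11-1, the "disorder between arrays of sorted trigraph durations" is perhaps the least self-explanatory, so we sketch one plausible reading of it here. The table gives only the name of the measure; the formulation below (sum of position differences between two orderings, divided by the maximum possible sum) is our assumption about how such a measure works and may differ in detail from the published method. The function names and trigraph timings are hypothetical.

```python
def disorder(order_a, order_b):
    """Sum of position differences between two orderings of the same items."""
    position_in_b = {item: i for i, item in enumerate(order_b)}
    return sum(abs(i - position_in_b[item]) for i, item in enumerate(order_a))


def normalized_disorder(durations_a, durations_b):
    """Disorder between two typing samples, scaled to the range 0..1.

    durations_a, durations_b: dicts mapping a trigraph to its duration in
    milliseconds (assumed layout). Only trigraphs present in both samples
    are compared.
    """
    shared = sorted(set(durations_a) & set(durations_b))
    n = len(shared)
    if n < 2:
        return 0.0  # nothing to reorder with fewer than two shared trigraphs
    # Order the shared trigraphs from fastest to slowest in each sample.
    order_a = sorted(shared, key=lambda t: durations_a[t])
    order_b = sorted(shared, key=lambda t: durations_b[t])
    # A fully reversed ordering gives n^2/2 (n even) or (n^2 - 1)/2 (n odd).
    max_disorder = (n * n) // 2
    return disorder(order_a, order_b) / max_disorder


# Hypothetical samples from two typing sessions.
reference = {"the": 250, "and": 280, "ing": 300}
attempt = {"the": 250, "ing": 270, "and": 280}

print(normalized_disorder(reference, attempt))
```

A measure of this kind depends only on the relative ordering of durations, so a user who types uniformly faster or slower on a given day would still produce a low disorder against their own reference sample.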