Section 11.3. Overview of Previous Research


The first studies on the effectiveness of keystroke characteristics as personal identifiers appeared in 1977[17] and 1980[18] (for a fuller treatment of work prior to 1990, see Joyce and Gupta[19]). Over the years, many different classifiers have been evaluated in an effort to improve the recognition capabilities of keystroke biometrics, ranging from statistical analysis to neural networks. It is beyond the scope of this chapter to delve into the details of each approach. In general, each classifier measures the similarity between an input keystroke timing pattern and a reference model of the legitimate user's typing pattern. The model is built from training samples previously provided by each user, and the characteristics it stores depend on the classifier. The time required to generate each model also varies according to the classifier, with neural networks generally taking significantly longer than other approaches.

[17] G. Forsen, M. Nelson, and R. Staron, "Personal Attributes Authentication Techniques," Rome Air Development Center Report RADC-TR-77-1033, Griffiss Air Force Base (New York, 1977).

[18] R. Gaines, W. Lisowski, S. Press, and N. Shapiro, "Authentication by Keystroke Timing: Some Preliminary Results," Rand Report R-256-NSF, Rand Corporation (1980).

[19] R. Joyce and G. Gupta, "Identity Authentication Based on Keystroke Latencies," Communications of the ACM 33:2 (1990), 168–176.
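The general scheme described above (build a reference model from training samples, then measure how close a new timing pattern is to that model) can be illustrated with a minimal sketch. It is not the method of any particular paper. The per-feature mean as the reference model, the L1 distance, the threshold rule (mean training distance plus k standard deviations), and all function names are assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def build_reference(samples: Sequence[Sequence[float]]) -> List[float]:
    """Reference model: the per-feature mean of the training samples (assumed)."""
    return [mean(column) for column in zip(*samples)]


def distance(sample: Sequence[float], reference: Sequence[float]) -> float:
    """L1 distance between a timing vector and the reference model (assumed metric)."""
    return sum(abs(s - r) for s, r in zip(sample, reference))


def build_threshold(samples: Sequence[Sequence[float]],
                    reference: Sequence[float],
                    k: float = 1.5) -> float:
    """Per-user threshold from the spread of the training distances.

    The rule mean + k * standard deviation and the default k are
    illustrative assumptions, not values taken from the literature.
    """
    dists = [distance(s, reference) for s in samples]
    return mean(dists) + k * pstdev(dists)


def verify(sample: Sequence[float],
           reference: Sequence[float],
           threshold: float) -> bool:
    """Accept the sample if it lies within the user's threshold."""
    return distance(sample, reference) <= threshold


# Hypothetical training data: three timing vectors (milliseconds) from one user.
training = [[100, 120, 90], [110, 115, 95], [105, 125, 85]]
reference = build_reference(training)
threshold = build_threshold(training, reference)
```

A user whose typing is more variable ends up with a larger threshold, which lowers false rejections for that user at the cost of accepting a wider range of inputs.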

Table 11-1 compares the various experimental designs and techniques that have been analyzed in key published research. We include as much information as we have available from the relevant papers, though often experimental details are omitted from the primary source.[20]

[20] Feature vectors representing keystroke characteristics are derived from key press times, key release times, and information on which keys are being pressed. Times are usually measured in milliseconds, although the granularity can vary according to the experiment setup, and is not generally reported. Duration, or hold time, represents the time between the press of a key and the release of the same key. Digraph latency is the delay between the release of one key and the press of the next key. In early literature, the term interkey delay is also used; that term may refer to either the digraph latency or the time from the press of one key to the press of the next key. We will refer to the latter feature as key press delay to avoid confusion. Timing information between three consecutive keystrokes, known as trigraphs, has also been analyzed. Unless otherwise indicated, samples involving input errors or the use of the backspace key are not analyzed. In addition, most experiments have subjects type on a single machine and keyboard, with the notable exception being the web-based experiments that rely on Java applets to collect keystrokes.
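As a rough illustration of the features defined in this note, the following sketch derives durations, digraph latencies, and key press delays from press and release timestamps. The KeyEvent record and the function names are hypothetical. The note does not say which interval is measured for a trigraph, so the press-to-press span from the first key to the third key is used here purely as an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KeyEvent:
    """One keystroke; a hypothetical record with times in milliseconds."""
    key: str
    press_ms: float
    release_ms: float


def durations(events: List[KeyEvent]) -> List[float]:
    """Duration (hold time): press of a key to release of the same key."""
    return [e.release_ms - e.press_ms for e in events]


def digraph_latencies(events: List[KeyEvent]) -> List[float]:
    """Digraph latency: release of one key to press of the next key."""
    return [b.press_ms - a.release_ms for a, b in zip(events, events[1:])]


def key_press_delays(events: List[KeyEvent]) -> List[float]:
    """Key press delay: press of one key to press of the next key."""
    return [b.press_ms - a.press_ms for a, b in zip(events, events[1:])]


def trigraph_spans(events: List[KeyEvent]) -> List[float]:
    """ASSUMPTION: trigraph timing taken as press of key 1 to press of key 3."""
    return [c.press_ms - a.press_ms for a, c in zip(events, events[2:])]


# Hypothetical sample: the word "the" typed with made-up timestamps.
sample = [
    KeyEvent("t", 0, 80),
    KeyEvent("h", 120, 190),
    KeyEvent("e", 260, 340),
]
```

Under these definitions a digraph latency is negative whenever the next key is pressed before the previous key is released, whereas the key press delay is never negative for keys listed in press order.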

Table 11-1. Comparison of published research, 1980–2004

| Authors/Year | Input Data | Design | Features | Preprocessing | Classifiers | Notes |
|---|---|---|---|---|---|---|
| Gaines, Lisowski, Press, and Shapiro [a]; 1980 | Three 300–400-character passages | Seven professional secretaries typed two samples each with a delay of four months between samples | Interkey delays | Used only the 87 digraphs that had at least 10 replications per sample and per user; eliminated outliers; took logarithm of values | Two-sample t-test on whether the means of each value were the same, assuming that variances were the same | Identified five core digraphs that discriminated perfectly: in, io, no, on, and ul |
| Umphress and Williams [b]; 1985 | Fixed 1,400-character reference input, 300-character test input | 17 programmers typed samples with a delay of at least one month; errors allowed | Interkey delays | Single low-pass temporal filter to remove outliers | Closeness between test value and corresponding reference value, measured according to a standard deviation threshold and a passing ratio | |
| Leggett and Williams [c]; 1988 | Two samples of fixed 537-character input | 36 individuals typed samples with a delay of at least one month; errors allowed | Interkey delays; mean of delays | Various; resulted in 12 different subsets of feature vectors to analyze | Closeness measure as in Umphress and Williams | Found that means of delays do not further discriminate between users; using all lowercase digraphs yielded best results |
| Joyce and Gupta [d]; 1990 | Username, password, first name, last name | 33 users typed all samples in a single session | Key press delays | None | Minimum distance from reference model, with verification threshold according to each user's typing variance | Found that more experienced users were more difficult for imposters to replicate |
| Bleha, Slivinsky, and Hussein [e]; 1990 | Username and fixed 32-character phrase | 32 users typed samples over a period of weeks | Digraph latencies | Combined two samples into one; dimension reduction to reduce size of feature vector | Normalized minimum distance; normalized Bayesian | Applied different fixed thresholds for authentication |
| Leggett and Williams et al. [f]; 1991 | Same as 1988 | Same as 1988 | Interkey delays | N/A | N/A | Introduced dynamic characterization of users by their typing patterns |
| Bleha, Knopp, and Obaidat [g]; 1992 | Fixed 32-character phrase | Users typed the sample at least once per day for five weeks | Digraph latencies | None | Linear perceptron | |
| Brown and Rogers [h]; 1993 | First and last name | 25 users typed on a single keyboard | Digraph latencies | Removed outliers | Minimum distance; back-propagation neural network; partially connected back-propagation neural network | Found that partially connected back-propagation network performed the best |
| Obaidat [i]; 1995 | Username and password | 15 users typed on a single keyboard over 8 weeks | Durations; digraph latencies | None | Various pattern recognition (k-means, cosine measure, minimum distance, Bayesian, potential function); various neural networks (BP, SOM, ART-2, RBFN, LVQ, RNN, SOP, HSOP) | Potential function and Bayesian performed the best, while cosine measure performed the worst; using only durations was more successful than using only latencies |
| Lin [j]; 1997 | Password | 90 valid users and 61 invalid users logged into system | Durations; key press delays | Derived invalid vectors by extending valid vector with random numbers and multiplying by a factor | Three-layer back-propagation neural network | |
| de Ru and Eloff [k]; 1997 | Password | 30 users typed on single keyboard; used assembler code to produce time intervals in clock cycles | Interkey delays; category indicating typing difficulty of password | Related precise delays to four time interval categories (a value can belong to more than one category through probabilistic assignment) | Fuzzy logic with four categories and five rules | Found typing difficulty to be less discriminating than timing interval |
| Song, Venable, and Perrig [l]; 1997 | Continuous monitoring of keystrokes | Several hours of keystroke data gathered for each user; coarse timing granularity of 10 ms due to X server implementation | Digraph, trigraph, and wordgraph key events for each incoming keystroke | Measured closeness of incoming key events to the respective digraph, trigraph, and wordgraph models for that user | Final probabilistic prediction based on a weighted sum of the incoming keystroke's closeness measurement and the previous keystroke's closeness measurement | Empirical observations on a single user showed promise, but lack of quantitative results |
| Robinson et al. [m]; 1998 | Username | 140 students routinely logged into campus network; replaced standard login module with one that collected keystrokes | Digraph latencies | Randomly selected 10 usernames for training and 10 usernames for testing; discarded 24% of samples due to typing errors | Minimum distance; nonlinear measure similar to Umphress and Williams; inductive learning based on nonparametric density estimation | Found that inductive learning classifier using both duration and latencies performed the best; using duration time alone was better than latencies |
| Monrose, Reiter, and Wetzel [n]; 1999 | Fixed eight-character password | 20 users logged into server at least five times over six months; Java applet recorded keystrokes | Durations; digraph latencies | Selected distinguishing features based on mean and standard deviation, and thresholds | Binary classification (slow and fast) for each distinguishing feature | Attempted to demonstrate how passwords can be more securely stored on servers, and did not seek to minimize FAR |
| Monrose and Rubin [o]; 2000 | N/A | 63 users typed on local Sun workstations at their convenience over 11 months | N/A | Selected most significant features | Minimum distance; weighted and nonweighted probability; Bayesian | Bayesian classifier performed the best |
| Peacock [p]; 2000 | Username, password, fixed nine-character word | 11 users typed samples from own machines in one session; Java applet recorded keystrokes | Durations; digraph latencies | None | K-nearest neighbor | |
| Cho, Han, Han, and Kim [q]; 2000 | Seven-character password | 25 users typed samples over several days | Durations; digraph latencies | Removed two users; 6%–50% of training data discarded for every user | Minimum distance; autoassociative neural network | Neural network performed the best |
| Haider, Abbas, and Zaidi [r]; 2000 | Seven-character password | Users typed samples into DOS-based application | Interkey delays | None | Fuzzy logic with five categories; three-layer neural network; statistical confidence interval; combinations thereof | A combination of approaches performed the best |
| Changshui and Yanhua [s]; 2000 | Fixed 1,100-character text | 24 users typed sample 18 times | Durations; key press delays | Removed outliers | Autoregressive model with coefficients by the Yule-Walker and Burg methods | Low accuracy relative to previous results |
| Wong et al. [t]; 2001 | User-selected password | 10 users typed on 2 dedicated machines; 100 unauthorized attempts | Interkey delay | Removed outliers | Single-layer perceptron network; minimum distance | Tradeoff between FRR and FAR for the two classifiers used, with the neural network having a high FAR |
| Bergadano, Gunetti, and Picardi [u]; 2002 | Fixed 683-character text | 44 users typed sample over one month, with no two samples from a user collected on the same day; errors allowed | Trigraph durations | None | Disorder between arrays of sorted trigraph durations | The method was also tested on digraphs, 4-graphs, and 6-graphs, but trigraphs performed the best |
| Clarke et al. [v]; 2002 | Four-digit number, fixed phone number, varying phone numbers | 16 users typed on mobile handset | N/A | N/A | Back-propagation neural network | |
| Kacholia and Pandit [w]; 2003 | Username and password | 20 users typed on a single machine | N/A | N/A | Clustering to produce reference models; threshold deviation for classification | |
| Yu and Cho [x]; 2003 | Seven-character password | 25 users typed samples over several days (data from same experiment as Cho, 2000) | Durations; digraph latencies | Various, with the best results after performing feature selection based on a genetic algorithm–SVM-based wrapper | Support Vector Machine (SVM) novelty detector models | SVM approach is about 1,000 times more efficient than multilayer perceptrons but has the same degree of accuracy; large training sample needed to attain most accurate results |

[a] Gaines et al.

[b] D. Umphress and G. Williams, "Identity Verification Through Keyboard Characteristics," International Journal of Man-Machine Studies 23:3 (1985), 263–273.

[c] J. Leggett and G. Williams, "Verifying Identity Via Keystroke Characteristics," International Journal of Man-Machine Studies 28:1 (1988), 67–76.

[d] Joyce and Gupta.

[e] S. Bleha, C. Slivinsky, and B. Hussein, "Computer-Access Security Systems Using Keystroke Dynamics," IEEE Transactions on Pattern Analysis and Machine Intelligence 12:12 (1990), 1217–1222.

[f] Leggett and Williams.

[g] S. A. Bleha, J. Knopp, and M. S. Obaidat, "Performance of the Perceptron Algorithm for the Classification of Computer Users," Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing (ACM Press, 1992), 863–866.

[h] M. Brown and S. J. Rogers, "User Identification Via Keystroke Characteristics of Typed Names Using Neural Networks," International Journal of Man-Machine Studies 39:6 (1993), 999–1014.

[i] M. S. Obaidat, "A Verification Methodology for Computer Systems Users," Proceedings of the 1995 ACM Symposium on Applied Computing (ACM Press, 1995), 258–262.

[j] D.-T. Lin, "Computer-Access Authentication with Neural Network Based Keystroke Identity Verification," IEEE International Conference on Neural Networks 1 (June 1997), 174–178.

[k] W. de Ru and J. Eloff, "Enhanced Password Authentication Through Fuzzy Logic," IEEE Expert 12 (Nov./Dec. 1997), 38–45.

[l] Song, Venable, and Perrig.

[m] J. A. Robinson, V. W. Liang, J. A. M. Chambers, and C. L. MacKenzie, "Computer User Verification Using Login String Keystroke Dynamics," IEEE Transactions on Systems, Man, and Cybernetics, Part A 28 (March 1998), 236–241.

[n] Monrose, Reiter, and Wetzel.

[o] Monrose and Rubin.

[p] Peacock.

[q] Cho, Han, Han, and Kim.

[r] S. Haider, A. Abbas, and A. K. Zaidi, "A Multi-Technique Approach for User Identification Through Keystroke Dynamics," IEEE International Conference on Systems, Man, and Cybernetics 2 (Oct. 2000), 1336–1341.

[s] Z. Changshui and S. Yanhua, "AR Model for Keystroker Verification," IEEE International Conference on Systems, Man, and Cybernetics 4 (Oct. 2000), 2887–2890.

[t] F. W. M. H. Wong, A. S. M. Supian, A. Ismail, L. W. Kin, and O. C. Soon, "Enhanced User Authentication Through Typing Biometrics with Artificial Neural Networks and K-Nearest Neighbor Algorithm," Conference Record of the Thirty-Fifth Asilomar Conference on Signals, Systems and Computers 2 (Nov. 2001), 911–915.

[u] F. Bergadano, D. Gunetti, and C. Picardi, "User Authentication Through Keystroke Dynamics," ACM Transactions on Information and System Security 5:4 (2002), 367–397.

[v] Clarke et al.

[w] V. Kacholia and S. Pandit, "Biometric Authentication Using Random Distributions (BioART)," Proceedings of the 15th Canadian IT Security Symposium (CITSS), Government of Canada (May 2003).

[x] E. Yu and S. Cho, "GA-SVM Wrapper Approach for Feature Subset Selection in Keystroke Dynamics Identity Verification," Proceedings of the IEEE International Joint Conference on Neural Networks 3 (July 2003), 2253–2257.
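To make one of the statistical classifiers in Table 11-1 concrete, the approach listed for Gaines et al. (logarithm of the interkey delays, then a two-sample t-test under an equal-variance assumption) might be sketched as follows for a single digraph. This is an illustrative reconstruction from the table entry only. The function name, the sample values, and the use of the natural logarithm are assumptions, and choosing a critical value for the decision is left to the reader.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(delays_a: Sequence[float],
                       delays_b: Sequence[float]) -> Tuple[float, int]:
    """Equal-variance (pooled) two-sample t statistic on log-transformed delays.

    Both inputs are delays for the same digraph, in milliseconds, and must
    be strictly positive so that the logarithm is defined. Each sample
    needs at least two observations. Returns (t, degrees_of_freedom).
    """
    log_a = [math.log(x) for x in delays_a]
    log_b = [math.log(x) for x in delays_b]
    n_a, n_b = len(log_a), len(log_b)
    df = n_a + n_b - 2

    # Pooled variance: the two sample variances weighted by their
    # degrees of freedom.
    pooled_var = ((n_a - 1) * variance(log_a)
                  + (n_b - 1) * variance(log_b)) / df

    t = (mean(log_a) - mean(log_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical delays (ms) for one digraph across two typing sessions.
session_1 = [100, 110, 120, 105, 115, 95, 108, 112, 102, 118]
session_2 = [104, 109, 117, 101, 113, 99, 111, 107, 106, 116]
t_same, df_same = pooled_t_statistic(session_1, session_2)
```

A value of t close to zero is consistent with both sessions coming from the same typist; a large magnitude suggests that the mean log delays differ.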




Security and Usability: Designing Secure Systems That People Can Use
ISBN: 0596008279
Year: 2004
Pages: 295
