
Bonfring International Journal of Software Engineering and Soft Computing, Vol. 9, No. 2, April 2019 52

Dynamics User Identification Using Pairwise Client Coupling
D. Tejaswi* and Rajasekar Rangasamy

Abstract--- Due to the increasing vulnerabilities on the internet, security alone is not sufficient to prevent a breach; digital forensics and cyber intelligence are also required to prevent future attacks or to identify the potential attacker. The unobtrusive and covert nature of biometric data collection in keystroke dynamics gives it a high potential for use in cyber forensics and cyber intelligence. Keystroke dynamics is a biometric based on the assumption that different people type in a unique way. Access to information on computer systems is normally controlled by client accounts with usernames and passwords. If these credentials fall into the wrong hands, such a scheme offers little security. Fingerprints, for example, can be used to strengthen security; however, they require very expensive additional hardware. Keystroke dynamics can be used with no additional hardware. Keystroke dynamics is for the most part applied to verification, but identification is also possible. In verification, it is known who the client is supposed to be, and the biometric system should verify whether the user is who he claims to be. In identification, the biometric system should identify the client using keystroke dynamics without additional knowledge. This paper examines the usefulness of keystroke dynamics for determining the user's identity. We propose three schemes for user identification when typing on a keyboard. We use different machine learning algorithms in conjunction with the proposed user coupling technique. In particular, we show that combined user coupling in a bottom-up tree structure scheme provides the best performance in terms of both precision and time complexity. The proposed techniques are validated on keystroke data. Lastly, we also examined the performance of the identification system and demonstrated that the performance was not optimal, as expected.

Keywords--- Pairwise Client Coupling, Keystroke Dynamics, User Identification, Behavioural Biometrics, Cyber-forensics or Cyber Intelligence, Biometrics, Pattern Recognition, Computer Security.

I. INTRODUCTION

Keystroke Dynamics (KD) is a well-established behavioral biometric modality due to the unobtrusive nature of biometric data collection, low computational complexity and no special hardware required for data collection [1], [2], [3].

Keystroke dynamics are characterized by the typing dynamics of the user or client, which are considered to be unique to a large extent among different people. Keystroke dynamics are known by a few names: keyboard dynamics, keystroke analysis, typing biometrics and typing rhythm.

The potential of biometric keystroke dynamics systems is very high. Keystroke dynamics are mostly applied to verification, but identification is also possible.

In verification, it is known who the user should be, and the biometric system should verify whether the user is the person he professes to be.

In identification, the biometric system should recognize the client using just keystroke dynamics, without any extra information. Most keystroke dynamics applications are in the field of verification.

Keystroke dynamics is a well-explored research domain in authentication, where the research problem is a two-class problem, i.e. legitimate user or imposter.

An example where identification could be of use is a chat room, where the behavior of an unknown person is compared to known profiles. For instance, if a person is demonstrating pedophile behavior, then his or her typing behavior can be compared to the typing behavior of a set of known pedophiles.

This paper's primary contributions are as follows:

We propose three distinct identification schemes in this paper. These schemes are based on the combination of user coupling: a multi-class pattern identification problem is divided into several two-class problems. These schemes could be useful for user identification.

Extensive analysis was carried out with a keystroke data set based on an online test; this data set was collected from 64 users with three different typing methods. We have used another keystroke data set with our optimal settings to validate our research.

We analyzed both open-set and closed-set settings and showed that our optimal settings outperform state-of-the-art research.

D. Tejaswi*, PG Scholar, Department of Computer Science and Engineering, St. Peter's Engineering College, Hyderabad, Telangana, India.
Rajasekar Rangasamy, Professor, Department of Computer Science and Engineering, St. Peter's Engineering College, Hyderabad, Telangana, India.
DOI:10.9756/BIJSESC.9023

ISSN 2277-5099 | © 2019 Bonfring



Fig. 1: User Authentication Approach Using Keystroke Dynamics

II. KEYSTROKE DYNAMICS AUTHENTICATION TECHNIQUES

Field
The invention concerns keystroke dynamics authentication. Specifically, the invention relates to data manipulations that offer enhanced performance for keystroke dynamics authentication systems.

Background
Computer systems often contain essential and sensitive information, control access to such information, or play an integral role in securing physical sites and assets. Computer security has traditionally depended heavily on secret passwords. Unfortunately, clients frequently choose passwords that are either easy to guess or easy to determine using exhaustive searches or other means. When more complex passwords are allocated, clients may find them difficult to remember, so they may write them down, creating a new, different security vulnerability.
Different approaches have been attempted to enhance the security of computer systems. However, an authorized client can still allow an unauthorized client to use a system by simply giving the unauthorized user the token and secret code. Other authentication methods rely on measuring unique physical characteristics ("biometrics") of users to identify authorized users. For example, fingerprints, voice patterns and retinal images have all been used with some success. However, these strategies typically require special hardware to implement, whereas keystroke dynamics does not.
In KD, clients are identified or authenticated based on the manner in which they type on a keyboard. A KD based authentication or identification system is low-cost and easy to implement because most such systems are software based. In such a system, the keystroke timing information must be captured for pattern analysis [4]. KD is a well-established biometric for static, periodic or continuous authentication [5], [6], [7]. Fig. 2 shows state-of-the-art KD research depending on the different objectives (e.g. authentication or identification) with different functionalities, i.e. static (user authentication is based on the typing rhythm of the password or pass phrase), periodic (user re-authentication after a fixed amount of data, e.g. after 500 keystrokes) and continuous (user re-authentication based on typing behavior continually).

              Authentication                    Identification
Static        [6],[7],[16],[17],[18],[19]       [13],[14]
Periodic      [10],[11],[12]                    [15]
Continuous    [8]                               [20]
Fig. 2: State-of-the-art Keystroke Research

In [8] an artificial neural network technique has been used for real time user identification using keystroke dynamics. This approach was validated experimentally based on data of 6 users, each typing a 15 character phrase 20 times. The achieved identification accuracy was 97.8%. In [9] a Euclidean distance based nearest neighbor classifier has been used for personal identification and an accuracy of 99.3% for 36 users was achieved. Several researchers have participated in this competition and proposed different techniques to improve the baseline performance.

III. CLASSIFIER
Client identification by analyzing the client's keystroke behavior profiles is challenging due to limited data, large intra-class variations and the sparse nature of the information. We observed that statistical analysis (i.e. distance based classifiers) is successful for authentication but fails to achieve good results in identification. Therefore we have used four different classifiers in our research.

1) Artificial Neural Network (ANN)
An ANN is a combination of multiple artificial neurons which can be used for classification and regression analysis. We have used the Scaled Conjugate Gradient algorithm, which optimizes the cost function efficiently and also reduces the ANN training time.

2) Counter-Propagation Artificial Neural Network (CPANN)
CPANN is a hybrid learning mechanism based on ANN to deal with supervised learning problems. In CPANN, the output layer is added to the Kohonen layer, which is very similar to Self Organizing Maps, and this provides the advantages of both supervised and unsupervised learning.

3) Support Vector Machine (SVM)
SVM is a very well-known supervised learning algorithm which can be used for classification problems. The support vectors (SVs) are the data points of the different classes that lie closest to the decision boundary.

4) Decision Tree (DT)
DT is a predictive learning model based on a tree structure that maps the features of an item's observation to the target value, where leaves are class labels and branches are combinations of features leading to class labels.

IV. SYSTEM PIPELINE

To achieve better performance, we applied the combination of User Coupling (CUC) technique using the above mentioned classifiers (described in Section III).
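User coupling, as used in this pipeline, splits the N-class identification problem into two-class problems, one per ordered pair of users. The following is a minimal sketch of that split, not the authors' implementation. The function name `pairwise_training_sets`, the dictionary layout, and the label convention (1 for user i, 0 for user j) are assumptions made for illustration only.

```python
from itertools import permutations


def pairwise_training_sets(samples):
    """Split a multi-class training set into two-class sets.

    samples: dict mapping a user id to a list of feature vectors
             (hypothetical layout, chosen for this sketch).
    Returns a dict keyed by the ordered pair (i, j), i != j. Each value is a
    list of (feature_vector, label) rows, where label 1 marks samples of
    user i and label 0 marks samples of user j (an assumed convention).
    """
    pairs = {}
    for i, j in permutations(samples, 2):
        rows_i = [(fv, 1) for fv in samples[i]]
        rows_j = [(fv, 0) for fv in samples[j]]
        pairs[(i, j)] = rows_i + rows_j
    return pairs
```

For N users this yields N × (N − 1) two-class training sets, each of which can then be handed to any of the classifiers from Section III.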


Fig. 3: Block Diagram of Pipe Line

Pair Wise Client Training Data Preparation
Figure 4 shows a multi-class (i.e. N-class) training dataset for conventional training data preparation. In this example, $FV^{i}_{q}$ represents the feature vector of user i from the q-th sample, where i = 1, 2, ..., N, q = 1, 2, ..., n, and n is the total number of training samples for user i. The last column represents the class label, i.e. a value between 1 and N. We came up with a solution called PCC, where the multi-class classification problem is divided into several two-class classification problems.

Feature Vector      Class Label
$FV^{1}_{1}$        1
...                 ...
$FV^{1}_{n}$        1
$FV^{2}_{1}$        2
...                 ...
$FV^{2}_{n}$        2
...                 ...
$FV^{N-1}_{1}$      N-1
...                 ...
$FV^{N-1}_{n}$      N-1
...                 ...
$FV^{N}_{n}$        N
Fig. 4: Multi-class (i.e. N class) Training Dataset for Conventional Training Data Preparation

\[
\begin{array}{ccccc}
PM^{1}_{2} & PM^{1}_{3} & \cdots & PM^{1}_{N-1} & PM^{1}_{N} \\
PM^{2}_{1} & PM^{2}_{3} & \cdots & PM^{2}_{N-1} & PM^{2}_{N} \\
PM^{3}_{1} & PM^{3}_{2} & \cdots & PM^{3}_{N-1} & PM^{3}_{N} \\
PM^{4}_{1} & PM^{4}_{2} & \cdots & PM^{4}_{N-1} & PM^{4}_{N} \\
\vdots & \vdots & & \vdots & \vdots \\
PM^{N-1}_{1} & PM^{N-1}_{2} & \cdots & PM^{N-1}_{N-2} & PM^{N-1}_{N} \\
PM^{N}_{1} & PM^{N}_{2} & \cdots & PM^{N}_{N-2} & PM^{N}_{N-1}
\end{array}
\]
Fig. 5: Pairwise Training Models for Multi-Class PUC

Figure 5 shows a pictorial representation of the pairwise training models for multi-class PUC. In this example, we can see that multiple training models are created (denoted by $PM^{i}_{j}$) for users i and j, where i = 1, 2, ..., N and $j \in J_i = \{1, 2, \ldots, N\} \setminus \{i\}$.

Three different identification schemes (i.e. S1, S2 and S3) were developed for the comparison and decision module.
In Scheme S1, we randomly arrange the set of clients into pairs and determine for each pair (client i, client j) whether the test data fits better with the profile of client i or client j. In Scheme S2, we randomly select k other clients for each client i and determine the mean score for client i when comparing the test data in k pairwise comparisons with the randomly selected other clients.
Scheme S3 is based on applying Scheme S2 twice. The first application of S2 is used to reduce the number of potential users from the initial N users to only c clients. The remaining c clients are compared in a complete comparison in the second step.

Scheme 1 (S1)
For Scheme 1 (S1), consider the case where the total number of users is 20 and the required rank is 1, i.e. N = 20 and r = 1. The pairs are shown in increasing order, but these pairs have to be selected randomly in each round.
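Scheme S1 can be read as a knockout tournament: in each round the remaining clients are paired at random, the better-scoring client of each pair survives, and the process stops once r clients remain. The sketch below illustrates that reading under stated assumptions and is not the authors' algorithm. The function name `scheme_s1` and the callback `pair_score(i, j)`, which is assumed to return the average scores of the test data for client i and client j under their pairwise model, are hypothetical, and r is assumed to be at least 1.

```python
import random


def scheme_s1(users, pair_score, r=1, seed=0):
    """Knockout-style identification sketch.

    users:      iterable of user ids
    pair_score: hypothetical callback, pair_score(i, j) -> (s_i, s_j),
                the average scores of the test data for users i and j
    r:          stop when r candidates remain (assumed r >= 1)
    Returns (remaining_users, number_of_comparisons).
    """
    rng = random.Random(seed)
    remaining = list(users)
    comparisons = 0
    while len(remaining) > r:
        rng.shuffle(remaining)          # random pairing in each round
        survivors = []
        alive = len(remaining)          # candidates not yet eliminated
        k = 0
        while k < len(remaining):
            if k + 1 < len(remaining) and alive > r:
                i, j = remaining[k], remaining[k + 1]
                s_i, s_j = pair_score(i, j)
                survivors.append(i if s_i > s_j else j)
                comparisons += 1
                alive -= 1              # each comparison removes one user
                k += 2
            else:
                survivors.append(remaining[k])  # unpaired user gets a bye
                k += 1
        remaining = survivors
    return remaining, comparisons
```

Because every comparison eliminates exactly one client, going from N clients to r clients always takes N − r comparisons in this sketch, regardless of how the random pairing falls out.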


The rank depends on average score maximization, i.e. $S_i > S_j$ for m keystrokes. The number of comparisons $T_1$ for this scheme is $T_1(N, r) = N - r$ when it starts with N clients and stops at rank r.

Scheme 2 (S2)
Algorithm 2 gives Scheme 2 (S2) for k = 6 and r = 1. When comparing the test subject with the i-th subject, we first select k pairs from the set $PM^{i}$ randomly and calculate the classification scores for each of these pairs. This gives a total of k × m classifier scores. We then obtain the resulting score for the i-th client as the mean of these scores. The number of comparisons $T_2$ for this scheme is independent of r, but it depends on N and k. Specifically, we have $T_2(N, k) = N \times k$.

Scheme 3 (S3)
Let $\Delta = [\delta_1, \delta_2, \ldots, \delta_c]$ be the set of the rank-c clients obtained after applying S2 with r = c, where $\Delta \subseteq \{1, 2, \ldots, N\}$. We then repeat S2 on the set of Δ users with a fixed value of k, i.e. k = c − 1. This scheme can be considered as using S2 for a re-ranking process after the initial S2 scheme. For S3 the number of comparisons $T_3$ depends on N, k and c. To be precise, we have $T_3(N, k, c) = T_2(N, k) + c \times (c - 1)$.
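The comparison counts for S2 and S3 can be checked with a small sketch. It is an illustration under assumptions, not the authors' algorithm: `scheme_s2`, `scheme_s3` and the callback `score(i, j)` are hypothetical names, and `score(i, j)` is assumed to return the already averaged classifier score of the test data for client i under the pairwise model of clients i and j.

```python
import random


def scheme_s2(users, score, k, r, rng):
    """Rank users by their mean score against k random opponents.

    score: hypothetical callback, score(i, j) -> float for user i vs user j.
    Returns (top_r_users, number_of_comparisons).
    """
    ranked = []
    comparisons = 0
    for i in users:
        others = [u for u in users if u != i]
        opponents = rng.sample(others, k)       # k random pairwise models
        scores = [score(i, j) for j in opponents]
        ranked.append((sum(scores) / k, i))
        comparisons += k
    ranked.sort(reverse=True)                   # highest mean score first
    return [u for _, u in ranked[:r]], comparisons


def scheme_s3(users, score, k, c, rng):
    """Apply S2 twice: shortlist c users, then re-rank them with k = c - 1."""
    shortlist, t2 = scheme_s2(users, score, k, c, rng)
    winner, t_rerank = scheme_s2(shortlist, score, c - 1, 1, rng)
    return winner[0], t2 + t_rerank


def comparisons_t3(n, k, c):
    """T3(N, k, c) = T2(N, k) + c * (c - 1), with T2(N, k) = N * k."""
    return n * k + c * (c - 1)
```

With N = 20, k = 6 and c = 5 the sketch performs 20 × 6 = 120 comparisons in the first pass and 5 × 4 = 20 in the re-ranking pass, 140 in total. The value c = 5 is chosen here only for illustration.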


Fig. 6: Obtained Final Results from S3 for Different k Values

V. FEATURES USED WITH KEYSTROKE DYNAMICS

Keystroke dynamics include various measurements that can be captured when keys are pressed on the keyboard. Measurement possibilities include:
1. Latency between consecutive keystrokes.
2. Keystroke duration (hold-time).
3. Overall typing speed.
4. Frequency of errors (how often backspace is used by the user).
5. The habit of using additional keys on the keyboard, e.g. writing numbers with the number pad.
6. The order in which the user presses and releases keys when writing capital letters, i.e. whether the shift key or the letter key is released first.
Statistics can be global, i.e. combined for all keys, or accumulated independently for each key or keystroke. Most applications only measure latency between successive keystrokes or keystroke durations. Figure 7 shows an example in which the word "password" was written several times and the latencies between keystrokes were measured. Timings have been measured for three different persons. There are clear differences in latencies and their standard deviations.

Fig. 7: Latency between Keystrokes When Three Different People Write the Word "Password". The Word has been Written Repeatedly. The Lines Represent Average Latencies, Standard Deviations are the Error Bars

VI. IMPLEMENTATION
Authentication with keystroke dynamics can be achieved by training a certain algorithm with the typing pattern of a person. In this section, some of the most famous implementations are reviewed and separated into categories depending on their functionality, local or web, and the scope of their development, academic or commercial.

Local Authentication
Keystroke dynamic local authentication refers to a locally installed program or some other mechanism, with all computations and data storage taking place locally, that authenticates user logins, text typing etc.
Authentication through keystroke dynamics has mostly been achieved with the help of pattern recognition systems, with the most common of them listed below.
• Statistical Models [10][11]
• Neural Networks [12]
• Fuzzy Logic [13][14]
• Support-Vector Machines [15]

Web Authentication
Web keystroke dynamic authentication, on the other hand, refers to authentication on websites, web applications etc. The data and the computations might reside either on a remote server, which can only be contacted through some kind of connection such as a network or the Internet, or locally in an installed program.
There has not been as much research in web authentication as in local authentication. Some of it is listed below, grouped by pattern recognition approach.
• Statistical Models [10]
• Neural Networks

VII. OPERATIONAL PHASES
Keystroke dynamics consist of three operational phases: the data gathering, the training and the classification procedure.

Raw Data Collection
Data collection is the first phase of a keystroke dynamic system. It is essentially the phase where the mechanism collects characteristic data and computes the features of one or more individuals. More specifically, in keystroke dynamics, data collection refers to the process of saving keystroke timings, such as the press and release times of each key typed by a user.

Fig. 8: Some of the Most Important Features of a Keystroke Sensitive Password

Types of Errors in Keystroke Dynamics
There are always statistical errors in pattern recognition, and thus in any algorithm used in keystroke dynamic authentication or identification. These error rates define the quality of a keystroke dynamic system, and of a biometric system in general, and its effectiveness in distinguishing people from each other. A set of metrics has been defined in order


to calculate the accuracy and the efficiency of a biometric system. Those types of errors are listed as follows.

FAR (False Acceptance Rate)
It represents the probability of an authentication system providing access to an impostor. That probability can be very low in physiological biometrics, but in the case of behavioral biometrics such as keystroke dynamics, false acceptance is quite common, and its rate depends on the strictness of the system.

FRR (False Rejection Rate)
The false rejection rate represents the probability of an authentication system denying access to the legitimate user. It also depends on the strictness of the system.
An important point is that FAR and FRR have an inverse relationship: a stricter system lowers FAR but raises FRR, and vice versa.

Biometric        FAR     FRR     Subjects   Comments
Face             1%      10%     37437      Varied light, indoor/outdoor
Fingerprint      2%      2%      25000      Rotation and exaggerated skin distortion
Hand geometry    2%      2%      129        With rings and improper placement
Iris             .94%    .99%    1224       Indoor environment
Keystrokes       7%      .1%     15         During 6 months period
Voice            2%      10%     30         Text dependent and multilingual
Fig. 9: Biometric Technique Relation between FAR and FRR

VIII. CONCLUSION
Keystroke dynamic authentication is an underrated authentication mechanism. "As a service" was the basic concept behind the implementation, so that users and websites have the opportunity to access these technologies without any additional infrastructure or costs, in the hope of making them more accessible to the wider public. In this research, we have focused on identifying a person based on the person's typing behavior. A comprehensive analysis was carried out on an online test dataset based on keystroke dynamics and achieved an improvement of approximately 7 percent in identification.
We proposed three identification schemes based on the combination of user coupling and have shown that the bottom-up tree structure based scheme gives the best results. In the future, we plan to perform an experiment on real world cyber-forensics data and investigate how it can be used as forensic evidence in court. We will also investigate how well we can improve the performance for one-handed typing and the scalability of our proposed schemes.

REFERENCES
[1] R.V. Yampolskiy and V. Govindaraju, "Behavioural biometrics: a survey and classification", Int. Journal of Biometrics, Vol. 1, Pp. 81–113, 2008.
[2] S. Bhatt and T. Santhanam, "Keystroke dynamics for biometric authentication - a survey", In Int. Conf. on Pattern Recognition, Informatics and Mobile Engineering (PRIME'13), Pp. 17–23, 2013.
[3] M. Karnan, M. Akila and N. Krishnaraj, "Biometric personal authentication using keystroke dynamics: A review", Applied Soft Computing, Vol. 11, No. 2, Pp. 1565–1573, 2011.
[4] J. Sucupira, L.H.R., M. Lizarraga, L. Ling and J.B.T. Yabu-Uti, "User authentication through typing biometrics features", IEEE Trans. on Signal Processing, Vol. 53, No. 2, Pp. 851–855, 2005.
[5] F. Bergadano, D. Gunetti and C. Picardi, "Identity verification through dynamic keystroke analysis", Intelligent Data Analysis, Vol. 7, No. 5, Pp. 469–496, 2003.
[6] K.S. Killourhy, "A scientific understanding of keystroke dynamics", Ph.D. dissertation, Carnegie Mellon University, 2012.
[7] P. Bours and S. Mondal, "Continuous Authentication with Keystroke Dynamics", Norwegian Information Security Laboratory NISlab, Pp. 41–58, 2015.
[8] M. Obaidat and D. Macchiarolo, "An online neural network system for computer access security", IEEE Trans. on Industrial Electronics, Vol. 40, No. 2, Pp. 235–242, 1993.
[9] C.C. Tappert, M. Villani and S.H. Cha, "Keystroke biometric identification and authentication on long-text input", In Behavioral Biometrics for Human Identification: Intelligent Applications, Pp. 342–367, 2010.
[10] G.E. Forsen, M.R. Nelson and R.J. Staron, "Personal Attributes Authentication Techniques", Pattern Analysis and Recognition Corp., Rome, NY, Defense Technical Information Center, 1977.
[11] R. Joyce and G. Gupta, "Identity authentication based on keystroke latencies", Commun. ACM, Vol. 33, No. 2, Pp. 168–176, 1990.
[12] M.S. Obaidat and B. Sadoun, "Verification of computer users using keystroke dynamics", Trans. Sys. Man Cyber. Part B, Vol. 27, No. 2, Pp. 261–269, 1997.
[13] B. Hussien, R. McLaren and S. Bleha, "An application of fuzzy algorithms in a computer access security system", Pattern Recognition Letters, Vol. 9, No. 1, Pp. 39–43, 1989.
[14] W.G. De Ru and J.H.P. Eloff, "Enhanced password authentication through fuzzy logic", IEEE Expert: Intelligent Systems and Their Applications, Vol. 12, No. 6, Pp. 38–45, 1997.
