
DESIGN OF A HAMMING-DISTANCE CLASSIFIER FOR ECG BIOMETRICS

Siddarth Hari, Foteini Agrafioti, Dimitrios Hatzinakos

The Edward S. Rogers Sr. Department of Electrical and Computer Engineering,


University of Toronto,
10 King’s College Road, Toronto, ON, Canada, M5S 3G4
{shari,foteini,dimitris}@comm.utoronto.ca

ABSTRACT

In existing ECG-based biometric recognition systems, feature extraction and matching are performed in Euclidean spaces. However, there are many scenarios (e.g., biometric template encryption for privacy protection, or low-complexity classification in an identification mode of operation) in which it is useful to binarize the feature vectors. The main contribution of this paper is a Hamming-distance classifier for ECG biometrics based on SPEC-Hashing. The proposed system was evaluated over a database of ECG signals from 52 different subjects, collected at the Biometrics Security Laboratory of the University of Toronto. The equal error rate (EER) of the Hamming-distance classifier was found to be 5.5% for closed-set matching and 14.82% for open-set matching.

Index Terms: Autocorrelation, electrocardiogram, SPEC-Hashing

Fig. 1. (a) Basic components of ECG heart-beats (voltage in mV versus time in samples; the points P, Q, R, S, T of one beat and P', Q', R', S', T' of the next are marked). (b) Autocorrelation (zoomed-in) of an ECG signal (normalized power versus time in samples; the P/QRS duration and the R-R' duration are marked).
1. INTRODUCTION

1.1. Medical Biometrics

Biometric recognition systems are rapidly replacing traditional identification systems based on PINs, tokens or passwords. Fingerprint, face and iris biometric systems are increasingly being used to secure passports and to grant access to high-security environments.

However, as the technology for biometric recognition advances and its use becomes more widespread, so too do the methods and technology for biometric falsification. Commercial fingerprint sensors can be defeated with off-the-shelf components used to create identical copies of a person's fingerprint features. Replay attacks are already a credible threat to voice and fingerprint biometrics. Biometric obfuscation, i.e., the removal of biometric features to avoid establishment of one's true identity, is another prominent challenge (for example, asylum-seekers in Europe have intentionally damaged their fingerprints). With the wide deployment of biometrics, these attacks are becoming more frequent, and concerns are being raised about the kind of security that biometric technologies can offer.

The next generation of biometric systems and modalities is being developed in light of the threats listed above. Medical biometrics, a new but promising category of biometric features, is one such example. Medical biometrics utilize physiological signals of the human body (vital signals) that contain subject-specific characteristics. Since these vital signals are internal to the human body, medical biometrics offer an inherent robustness to circumvention, replay and obfuscation attacks. Furthermore, biometric liveness is inherently guaranteed.

1.2. ECG Biometrics

The focus of this paper is on ECG-based biometric recognition. The ECG is a vital signal of cardiovascular origin that reflects the electrical potential of the heart. It is formed during the depolarization and repolarization of specific parts of the myocardium and can be measured at the surface of the body using electrodes, which can be placed in various configurations. From an engineering point of view, the ECG has a non-periodic but highly repetitive pattern. Every quasi-period of the ECG corresponds to a pulse (or heart-beat) and signifies a full cycle of the cardiac function. There are particular points of interest on a heart-beat, called fiducial points, such as the ones shown in Figure 1. These are primarily related to the main ECG waves, namely the P wave, the QRS complex and the T wave.

This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

978-1-4799-0356-6/13/$31.00 ©2013 IEEE 3009 ICASSP 2013


There are two main approaches to feature extraction from ECG signals: fiducial-point-dependent and fiducial-point-independent. Fiducial-dependent approaches extract features based on the local characteristics of heart-beats [1, 2]. For example, temporal and amplitude distances between consecutive fiducial points on the ECG were proposed in [1, 3]. Non-fiducial approaches, on the other hand, analyze the ECG waveform as a whole, thereby eliminating the need for heart-beat segmentation and fiducial-point detection [4, 5, 6]. In all of the above works, the resulting feature vectors lie in Euclidean spaces. To the best of our knowledge, there has been no significant prior work on binarization (i.e., the transformation of feature vectors from Euclidean into binary Hamming spaces) for ECG signals. In [7], a quantization approach was suggested (using a Max-Lloyd quantizer), together with a Gray-code mapping. However, quantization-based approaches do not necessarily preserve distance properties: two feature vectors separated by a large Euclidean distance are not guaranteed to be mapped to two binary vectors separated by a large Hamming distance. This observation motivates the subsequent analysis of ECG biometric processing in the Hamming space.
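To make the distance-preservation issue concrete, the following minimal Python sketch binarizes scalar values with a single fixed threshold at zero. The threshold, the values, and the helper names `one_bit` and `hamming` are hypothetical illustrations, not the method of [7]. The sketch shows that a fixed quantization boundary can merge a distant pair into the same code. The same boundary can also split a nearby pair into different codes.

```python
def one_bit(x: float, threshold: float = 0.0) -> int:
    """Binarize a scalar with a fixed (hypothetical) threshold: 1 if x > threshold."""
    return 1 if x > threshold else 0


def hamming(a: list[int], b: list[int]) -> int:
    """Number of positions in which two equal-length bit lists differ."""
    return sum(u != v for u, v in zip(a, b))


near_a, near_b = -0.01, 0.01   # Euclidean distance 0.02
far_a, far_b = 0.02, 5.0       # Euclidean distance 4.98

print(hamming([one_bit(near_a)], [one_bit(near_b)]))  # 1: the close pair is split
print(hamming([one_bit(far_a)], [one_bit(far_b)]))    # 0: the distant pair is merged
```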
2. OVERVIEW OF THE PROPOSED SYSTEM

The proposed method for ECG analysis in the Hamming space is based on the Autocorrelation/Linear Discriminant Analysis (AC/LDA) method [6] in conjunction with SPEC-Hashing. We use the following notation. Vectors are denoted by bold lower-case letters, e.g., $\mathbf{x}$. $x_m$ denotes the $m$-th component of the vector $\mathbf{x}$, whereas $\mathbf{x}^{(i)}$ denotes the $i$-th vector in a set. Bold upper-case letters, e.g., $\mathbf{X}$, denote matrices. A bold function symbol $\mathbf{f}(\cdot)$ indicates that the output of the function is a vector.

2.1. Feature extraction

The AC/LDA is a two-step procedure. First, the autocorrelations of 5-s ECG windows are estimated. These are then subjected to dimensionality reduction using LDA. Figure 1 shows an example of the AC of an ECG reading. The autocorrelation is computed as

$$\hat{R}_{ee}[m] = \sum_{i=0}^{N-|m|-1} e[i]\, e[i+m] \tag{1}$$

where $e[i]$, $i = 0, 1, \ldots, N-1$, are the samples of the ECG signal, $m$ is the time lag, and $N$ is the length of the signal. Out of $\hat{R}_{ee}$, only a segment $\phi[m]$, $m = 0, 1, \ldots, M$, is input to the LDA training block. This segment starts at the zero-lag instance and extends to approximately the duration of the P wave and QRS complex (as shown in Figure 1). These waves are the least affected by heart-rate changes. Consequently, using only this segment for discriminant analysis makes the system robust to heart-rate variability.

The output of the LDA training is a transformation matrix $\mathbf{W}$. The AC feature vectors are projected with $\mathbf{W}$ into the Euclidean space: $\mathbf{y} = \mathbf{W}\boldsymbol{\phi}^T$. Details pertaining to the AC/LDA can be found in [6] and [8].
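The AC feature extraction of Eq. (1) can be sketched in Python with NumPy as follows. This is an illustrative sketch under stated assumptions:

- The signal is synthetic.
- The segment length `M = 60` lags is a hypothetical choice.
- Normalizing by the zero-lag value is an assumption suggested by the "Normalized Power" axis of Figure 1(b). Eq. (1) itself is unnormalized.
- The random matrix `W` only stands in for a trained LDA matrix, so that the shape of the projection can be seen.

```python
import numpy as np


def autocorrelation(e: np.ndarray, max_lag: int) -> np.ndarray:
    """Eq. (1) for lags m = 0..max_lag: R_ee[m] = sum_{i=0}^{N-m-1} e[i] e[i+m]."""
    n = len(e)
    return np.array([np.dot(e[: n - m], e[m:]) for m in range(max_lag + 1)])


def ac_segment(e: np.ndarray, M: int) -> np.ndarray:
    """Segment phi[m], m = 0..M; division by the zero-lag value is an assumption."""
    r = autocorrelation(e, M)
    return r / r[0]


# Hypothetical usage: one 5-s window at 200 Hz (1000 samples) of synthetic data.
fs, seconds, M = 200, 5, 60                   # M = 60 lags is a hypothetical choice
rng = np.random.default_rng(0)
t = np.arange(fs * seconds) / fs
ecg_like = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)

phi = ac_segment(ecg_like, M)                 # shape (M + 1,)
W = rng.standard_normal((10, M + 1))          # stand-in for a trained LDA matrix
y = W @ phi                                   # projected feature vector
print(phi.shape, y.shape)                     # (61,) (10,)
```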
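2.2. Binarization

In [7], a quantization approach was suggested, using a Max-Lloyd (ML) quantizer together with a Gray-code mapping. Gray mapping is effective when the number of bits is small (fewer than 5 bits), but this approach does not scale well. In addition, biometric feature vectors have unique properties, such as small intra-class variability and high inter-class variability, which are not taken into account in ML quantization. This motivates the need for class-specific or similarity-preserving binarization techniques, which are the focus of this work.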
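For reference, a one-dimensional Max-Lloyd (also known as Lloyd-Max) quantizer followed by Gray-code mapping can be sketched as below. This is a generic textbook version with assumed details: quantile initialization, a fixed iteration count, and the hypothetical names `max_lloyd`, `gray` and `gray_bits`. It is not necessarily the exact procedure of [7].

```python
import numpy as np


def max_lloyd(samples: np.ndarray, levels: int, iters: int = 50):
    """1-D Max-Lloyd quantizer: alternate the nearest-neighbor and centroid conditions."""
    codebook = np.quantile(samples, (np.arange(levels) + 0.5) / levels)  # initial levels (assumption)
    for _ in range(iters):
        thresholds = (codebook[:-1] + codebook[1:]) / 2      # decision boundaries at midpoints
        cells = np.searchsorted(thresholds, samples)          # quantization cell of each sample
        for k in range(levels):
            if np.any(cells == k):
                codebook[k] = samples[cells == k].mean()      # move level to its cell centroid
    return codebook, (codebook[:-1] + codebook[1:]) / 2


def gray(k: int) -> int:
    """Binary-reflected Gray code of a level index."""
    return k ^ (k >> 1)


def gray_bits(x: float, thresholds: np.ndarray, nbits: int) -> list[int]:
    """Quantize x to a level index, then emit that index's Gray code, MSB first."""
    g = gray(int(np.searchsorted(thresholds, x)))
    return [(g >> (nbits - 1 - b)) & 1 for b in range(nbits)]


rng = np.random.default_rng(1)
codebook, thr = max_lloyd(rng.standard_normal(2000), levels=8)   # 8 levels -> 3 bits
print(gray_bits(-3.0, thr, 3), gray_bits(3.0, thr, 3))           # [0, 0, 0] [1, 0, 0]
```

Adjacent levels always differ in exactly one bit. The printed line, however, shows the limitation in miniature: the two extreme levels are also only one bit apart. A large Euclidean separation can therefore become a small Hamming distance.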
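There has been a significant amount of work on discovering good projections from arbitrary feature spaces (metric spaces) into binary vector spaces. This work has been motivated by the need for fast nearest-neighbor search in face-recognition and image-retrieval applications. Such methods include:

- the traditional KD-tree;
- Locality Sensitive Hashing;
- Parameter Sensitive Hashing;
- RBM-based learning;
- Spectral Hashing;
- SPEC-Hashing;
- random quantization using Fourier features.

All but the KD-tree are recent. A qualitative discussion and relevant references can be found in [9].

In this paper, we evaluate the possibility of adapting the SPEC-Hashing algorithm to the needs of biometric systems. The purpose is to map feature vectors (component-wise) from the space containing the image of the LDA-based transform (i.e., $\mathrm{Im}(\mathbf{W})$) into binary vectors. The mapping should send two feature vectors with a small Euclidean distance between them to two binary vectors separated by a small Hamming distance. In this manner, the intra-class and inter-class variabilities are carried over to the new space. SPEC-Hashing is a similarity-preserving algorithm for entropy-based coding, developed by Lin et al. [9] for fast nearest-neighbor search in high-dimensional feature spaces. Its primary context was image retrieval and celebrity face recognition. The nearest neighbors are defined according to the semantic similarity between objects in the feature space.

In this work, the SPEC-Hashing approach is evaluated for closed-set and open-set biometric matching. In the closed-set case, a fraction of each user's enrollment data is used for training the SPEC-Hashing algorithm. In the open-set case, a generic database of non-enrollees is used for SPEC-Hashing training. The output is then applied to a new set of users, i.e., the enrollees of the system under evaluation.

The input to the SPEC-Hashing block has two parts. The first is the set of LDA feature vectors $\{\mathbf{x}^{(i)}\} \subset \mathbb{R}^M$ corresponding to the training data. The second is a similarity matrix $\mathbf{S}$, whose $(i,j)$-th entry $S_{ij}$ denotes the semantic similarity between the $i$-th and $j$-th feature vectors in the training set. The SPEC-Hashing algorithm is run separately for each of the $M$ components of the LDA feature vectors. The output is a list of $M$ threshold vectors $\{\boldsymbol{\lambda}_1, \boldsymbol{\lambda}_2, \ldots, \boldsymbol{\lambda}_M\}$. Each threshold vector corresponds to one component of the LDA feature vectors. The vectors can have varying lengths, with each length determined automatically by the algorithm.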

Once the list of thresholds is obtained from the SPEC-Hashing algorithm during the training phase, the ECG feature vectors are binarized as follows. Consider first the $m$-th component, and suppose $\boldsymbol{\lambda}_m = [\lambda_{m,1}, \lambda_{m,2}, \ldots, \lambda_{m,n}]$. Then the $m$-th component $x_m$ of a feature vector $\mathbf{x}$ is mapped to an $n$-bit vector $\mathbf{b}_m(x_m) = [b_{m,1}(x_m)\, b_{m,2}(x_m) \cdots b_{m,n}(x_m)]$, where

$$b_{m,i}(x_m) = \begin{cases} 0, & x_m \le \lambda_{m,i} \\ 1, & x_m > \lambda_{m,i} \end{cases} \tag{2}$$

Once we have the binary vectors for each component $x_m$, we form the binary representation of $\mathbf{x}$ by concatenation, i.e., $\mathbf{b}(\mathbf{x}) = [\mathbf{b}_1(x_1)\, \mathbf{b}_2(x_2) \cdots \mathbf{b}_M(x_M)]$.
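Eq. (2) and the concatenation step can be sketched in a few lines of NumPy. The threshold values below are hypothetical; in the system they would come from SPEC-Hashing training. The function names `binarize_component` and `binarize` are likewise illustrative.

```python
import numpy as np


def binarize_component(x_m: float, lam_m) -> np.ndarray:
    """Eq. (2): b_{m,i}(x_m) = 1 if x_m > lambda_{m,i}, else 0."""
    return (x_m > np.asarray(lam_m, dtype=float)).astype(np.uint8)


def binarize(x, thresholds) -> np.ndarray:
    """b(x) = [b_1(x_1) b_2(x_2) ... b_M(x_M)]: concatenate the per-component bits."""
    return np.concatenate([binarize_component(x_m, lam_m)
                           for x_m, lam_m in zip(x, thresholds)])


# Hypothetical thresholds for M = 3 components; the threshold vectors differ in length.
thresholds = [[-0.5, 0.0, 0.5], [0.1], [-1.0, 1.0]]
print(binarize([0.2, -0.3, 2.0], thresholds))   # [1 1 0 0 1 1]
```

Because each component keeps its own threshold vector, the total code length is the sum of the threshold-vector lengths. This is how the bit counts reported in the experiments arise.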
In the enrollment phase, for each feature vector $\mathbf{x}^{(i)}$ we compute the corresponding binary representation $\mathbf{b}(\mathbf{x}^{(i)})$. Henceforth, we denote this as $\mathbf{b}^{(i)}$ for simplicity and refer to these as the binary feature vectors.

During the authentication phase, the feature vector $\mathbf{y}$ extracted from the user's ECG reading is mapped to the binary vector $\mathbf{b} = \mathbf{b}(\mathbf{y})$ using the same binary decision stumps $\mathbf{b}(\cdot)$.
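Matching in the Hamming space reduces to counting differing bits. The sketch below assumes a nearest-neighbor rule over normalized Hamming distances, i.e., the fraction of differing bits, which matches how distances are reported in Section 3. The gallery, the 6-bit vectors, the acceptance threshold of 0.2, and the function names are all hypothetical.

```python
import numpy as np


def normalized_hamming(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of bit positions in which the binary vectors a and b differ."""
    return np.count_nonzero(a != b) / a.size


def identify(probe: np.ndarray, gallery: dict) -> tuple:
    """Identification: (subject id, distance) of the closest enrolled template."""
    return min(((sid, normalized_hamming(probe, t))
                for sid, templates in gallery.items() for t in templates),
               key=lambda pair: pair[1])


def verify(probe: np.ndarray, templates: list, threshold: float) -> bool:
    """Verification: accept if the claimed identity's closest template is near enough."""
    return min(normalized_hamming(probe, t) for t in templates) <= threshold


# Hypothetical 6-bit gallery with one enrolled binary feature vector per subject.
gallery = {"A": [np.array([1, 1, 0, 0, 1, 1], dtype=np.uint8)],
           "B": [np.array([0, 0, 1, 1, 0, 0], dtype=np.uint8)]}
probe = np.array([1, 1, 0, 1, 1, 1], dtype=np.uint8)
print(identify(probe, gallery)[0])          # A
print(verify(probe, gallery["B"], 0.2))     # False
```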
We now comment briefly on the operation of the SPEC-Hashing algorithm (a detailed treatment can be found in [9]). Suppose we have a list of thresholds $L$ and the corresponding binary mappings obtained using Eq. (2). Let $d_H(i,j)$ denote the Hamming distance between the binary vectors $\mathbf{b}^{(i)}$ and $\mathbf{b}^{(j)}$, which correspond to the feature vectors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$. We define the matrix $\mathbf{T}$ (corresponding to the list $L$) as

$$T_{ij} = \frac{1}{Z}\, e^{-d_H(i,j)} \tag{3}$$

where $Z = \sum_{i,j} e^{-d_H(i,j)}$. Let $\tilde{\mathbf{S}}$ denote the normalized similarity matrix, i.e., $\mathbf{S}$ scaled so that its entries sum to one, which makes it a distribution. The SPEC-Hashing algorithm can then be viewed as finding an optimum list of thresholds. This list minimizes the Kullback-Leibler divergence between the distribution $\mathbf{T}$ and the target distribution $\tilde{\mathbf{S}}$. The closeness of the two distributions measures how well the mapping from the Euclidean space to the Hamming feature space preserves the similarities.
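Eq. (3) and the divergence that SPEC-Hashing minimizes can be computed directly for a small example. The sketch makes the following assumptions:

- $S_{ij} = 1$ when two vectors belong to the same subject and 0 otherwise (a hypothetical choice of semantic similarity).
- The sum defining $Z$ runs over all ordered pairs, including $i = j$, as Eq. (3) is written.
- The convention $0 \log 0 = 0$ is used in the divergence.
- The function names are hypothetical.

```python
import numpy as np


def pairwise_hamming(B: np.ndarray) -> np.ndarray:
    """d_H(i, j) for every pair of rows of a 0/1 matrix B (n samples x bits)."""
    B = B.astype(np.int64)
    return B @ (1 - B).T + (1 - B) @ B.T   # count of (1,0) plus (0,1) positions


def t_matrix(B: np.ndarray) -> np.ndarray:
    """Eq. (3): T_ij = exp(-d_H(i, j)) / Z, with Z summed over all (i, j)."""
    w = np.exp(-pairwise_hamming(B).astype(float))
    return w / w.sum()


def kl_divergence(S_tilde: np.ndarray, T: np.ndarray) -> float:
    """KL(S_tilde || T), using the convention 0 * log 0 = 0."""
    mask = S_tilde > 0
    return float(np.sum(S_tilde[mask] * np.log(S_tilde[mask] / T[mask])))


# Hypothetical similarity: S_ij = 1 for vectors of the same subject, else 0.
labels = np.array([0, 0, 1, 1])
S = (labels[:, None] == labels[None, :]).astype(float)
S_tilde = S / S.sum()                       # normalized so that the entries sum to 1

B_good = np.array([[0], [0], [1], [1]])     # one bit that separates the two subjects
B_bad = np.array([[0], [1], [0], [1]])      # one bit that splits each subject
print(kl_divergence(S_tilde, t_matrix(B_good)) < kl_divergence(S_tilde, t_matrix(B_bad)))  # True
```

The bit that respects the subject labels yields the lower divergence. This is exactly the preference that the threshold search exploits.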
The SPEC-Hashing algorithm builds the list $L$ incrementally. Let $L^* = [L, \lambda]$, and let $\mathbf{T}^*$ denote the matrix of Eq. (3) computed for $L^*$. Define

$$\Delta(\lambda) = \mathrm{KL}(\tilde{\mathbf{S}} \,\|\, \mathbf{T}^*) - \mathrm{KL}(\tilde{\mathbf{S}} \,\|\, \mathbf{T}) \tag{4}$$

If $\Delta(\lambda)$ is negative, adding $\lambda$ to the list reduces the divergence from the target distribution, and the list is updated to include $\lambda$. The procedure is summarized in Algorithm 1.

Algorithm 1 SPEC-Hashing
Inputs: LDA feature vectors (for the training set), similarity matrix
Outputs: an ordered list of thresholds
1: Set $L = \emptyset$
2: for each component $m = 1, 2, \ldots, M$ do
3:   Set flag = 0
4:   while flag = 0 do
5:     Find $\lambda_{opt}$ that minimizes $\Delta(\lambda)$
6:     if $\Delta(\lambda_{opt}) < 0$ then
7:       Update $L = [L, \lambda_{opt}]$
8:     else
9:       Set flag = 1
10:    end if
11:  end while
12: end for
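Algorithm 1 can be sketched in Python as a greedy search, with the stopping rule implemented by the "flag" of steps 6 to 9. The sketch makes several assumptions not fixed by the text:

- Candidate thresholds are the midpoints between consecutive distinct training values of each component.
- `S` is any non-negative similarity matrix.
- The names `kl_for_bits` and `greedy_thresholds` are hypothetical.
- The default cap of 25 bits per dimension mirrors the setting reported in Section 3.

The sketch recomputes the full $n \times n$ matrix $\mathbf{T}$ for every candidate. It is intended for small illustrations, not as an efficient implementation of [9].

```python
import numpy as np


def kl_for_bits(S_tilde: np.ndarray, B: np.ndarray) -> float:
    """KL(S_tilde || T), with T built from the bit matrix B via Eq. (3)."""
    B = B.astype(np.int64)
    d = B @ (1 - B).T + (1 - B) @ B.T                 # pairwise Hamming distances
    w = np.exp(-d.astype(float))
    T = w / w.sum()
    mask = S_tilde > 0
    return float(np.sum(S_tilde[mask] * np.log(S_tilde[mask] / T[mask])))


def greedy_thresholds(X: np.ndarray, S: np.ndarray, max_bits_per_dim: int = 25):
    """Greedy threshold selection in the spirit of Algorithm 1 (illustrative sketch)."""
    S_tilde = S / S.sum()
    n, M = X.shape
    B = np.zeros((n, 0), dtype=np.uint8)              # L is empty: no bits yet
    thresholds = []
    for m in range(M):                                # one pass per LDA component
        v = np.unique(X[:, m])
        candidates = (v[:-1] + v[1:]) / 2             # assumption: midpoints as candidates
        lam_m = []
        current = kl_for_bits(S_tilde, B)
        while len(lam_m) < max_bits_per_dim:          # cap on bits per dimension
            best = (np.inf, None, None)               # (delta, lambda, new bit column)
            for lam in candidates:
                if lam in lam_m:
                    continue
                col = (X[:, m] > lam).astype(np.uint8)[:, None]              # Eq. (2)
                delta = kl_for_bits(S_tilde, np.hstack([B, col])) - current  # Eq. (4)
                if delta < best[0]:
                    best = (delta, lam, col)
            delta, lam, col = best
            if lam is None or delta >= 0:             # no improving threshold: flag = 1
                break
            lam_m.append(float(lam))                  # L = [L, lambda_opt]
            B = np.hstack([B, col])
            current = kl_for_bits(S_tilde, B)
        thresholds.append(lam_m)
    return thresholds, B


# Hypothetical data: 3 subjects x 10 vectors, one informative and one noisy component.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 10)
X = np.column_stack([3.0 * labels + 0.1 * rng.standard_normal(30),
                     rng.standard_normal(30)])
S = (labels[:, None] == labels[None, :]).astype(float)
thresholds, B = greedy_thresholds(X, S)
print([len(t) for t in thresholds], B.shape)
```

On data like this, the informative component receives the thresholds that separate the subjects. The search then stops once a further threshold would no longer lower the divergence.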
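3. EXPERIMENTAL RESULTS

The proposed system was evaluated over ECG signals from 52 different volunteers, collected at the Biometrics Security Laboratory of the University of Toronto. Two signal-collection sessions took place, scheduled a few weeks apart, in order to evaluate the stability of the signal over time. Every recording is 3 minutes long, and the lead orientation matches that of Lead II of the standard 12-lead ECG system. Of the 52 volunteers, 16 were recorded in both sessions and 36 in just one session. The sampling frequency is 200 Hz.

Open-set recognition setup. To simulate an open-set biometric system, the dataset was split into a training set and an evaluation set. The training set included the 36 subjects for whom just one ECG reading is available, and the proposed algorithm was trained on signals from this set. The evaluation set included signals from the 16 volunteers for whom two recordings are available. The earlier recording was used for enrollment and the later one for testing.

Closed-set recognition setup. To simulate a closed-set biometric system, the proposed algorithm was trained on ECG signals from all users, i.e., the enrollees of the system. The enrollment data include the first ECG readings of the 16 volunteers and the first half of the ECG readings of the 36 volunteers.

The SPEC-Hashing algorithm learns a list of binary decision stumps, which produces the binary vectors. These vectors were 345 bits long for the open-set setup and 468 bits long for the closed-set setup. The maximum number of bits per dimension was set to 25. The difference in bit length between the two setups is due to the different dimensionality after LDA projection.

Figure 2 shows the histograms of the intra-subject and inter-subject Euclidean and Hamming distances for the closed-set recognition setup. In the Euclidean space, the average inter-subject distance is 0.2656 and the average intra-subject distance is 0.0757. In the Hamming space, the average inter-subject distance is 0.2496 (116 bits) and the average intra-subject distance is 0.0719 (33 bits).

Figure 3 shows the performance of three open-set systems:

- the AC/LDA Euclidean-distance classifier (the baseline system);
- the Max-Lloyd quantization approach (included for completeness);
- the proposed Hamming-distance classifier, whose binary feature vectors are designed using the SPEC-Hashing algorithm.

The Max-Lloyd quantization approach does not perform well because its binarization does not take class information into consideration. The proposed Hamming-distance classifier with SPEC-Hashing-based binarization outperforms both alternatives.

The EERs are summarized in Table 1. In the open-set setup, the EER of the AC/LDA Euclidean-distance classifier was 18.6%, and that of the proposed Hamming-distance classifier was 14.82%. In the closed-set setup, the EER of the proposed system is 5.5%, compared with 11.4% for the AC/LDA Euclidean-distance classifier.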
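An EER is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The sketch below shows one simple way to estimate it from genuine (intra-subject) and impostor (inter-subject) distance scores. It sweeps the decision threshold over the observed scores and takes the point where FAR and FRR are closest. This discrete approximation and the function name `equal_error_rate` are assumptions, and the sketch is not necessarily the evaluation protocol used here. The scores in the example are hypothetical.

```python
import numpy as np


def equal_error_rate(genuine, impostor):
    """Approximate EER by sweeping a distance threshold t (accept if distance <= t)."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor <= t) for t in thresholds])  # impostors accepted
    frr = np.array([np.mean(genuine > t) for t in thresholds])    # genuine users rejected
    k = int(np.argmin(np.abs(far - frr)))                         # closest crossing point
    return float((far[k] + frr[k]) / 2), float(thresholds[k])


# Hypothetical normalized Hamming distances.
genuine = [0.05, 0.07, 0.10, 0.30]     # intra-subject pairs
impostor = [0.20, 0.25, 0.28, 0.40]    # inter-subject pairs
eer, thr = equal_error_rate(genuine, impostor)
print(eer, thr)                        # 0.25 0.2
```

The further apart the intra-subject and inter-subject distance histograms lie (compare Figure 2), the lower the resulting EER.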
Fig. 2. Histogram of intra-class and inter-class Euclidean and Hamming distances. Panel A (AC/LDA feature space) plots the percentage of pairs against Euclidean distance. Panel B (SPEC-Hash feature space) plots the percentage of pairs against Hamming distance.
Table 1. System equal error rate (EER)

| System | Setup | EER |
|---|---|---|
| AC/LDA Euclidean Classifier | Open-set | 18.6% |
| Hamming Classifier (SPEC-Hashing) | Open-set | 14.82% |
| AC/LDA Euclidean Classifier | Closed-set | 11.4% |
| Hamming Classifier (SPEC-Hashing) | Closed-set | 5.5% |

Fig. 3. ROC: FAR versus FRR for the Euclidean and Hamming (SPEC-Hash and ML) spaces. The plot is titled "Open-set Biometric System Performance" and shows false acceptance rate (%) against false rejection rate (%). It includes curves for the AC/LDA Euclidean classifier, SPEC-Hashing and ML quantization, with the EER points marked.

4. CONCLUSION

As ECG biometric systems begin to scale, it is important to study the binarization of ECG feature vectors and the design of Hamming-distance classifiers. Such work enables important applications, such as template protection using biometric encryption and efficient low-complexity matching in the identification mode of operation. A method for binarizing ECG feature vectors based on SPEC-Hashing was proposed and evaluated over ECG signals from 52 subjects. The proposed treatment incurs no loss in performance compared to nearest-neighbor classification in the Euclidean LDA-projection space. In our experiments, the EER was in fact lower: 14.82% versus 18.6% in the open-set setup, and 5.5% versus 11.4% in the closed-set setup (Table 1).

5. REFERENCES

[1] S. A. Israel, J. M. Irvine, A. Cheng, M. D. Wiederhold, and B. K. Wiederhold, "ECG to identify individuals," Pattern Recognition, vol. 38, no. 1, pp. 133–142, 2005.
[2] L. Biel, O. Pettersson, L. Philipson, and P. Wide, "ECG analysis: a new approach in human identification," IEEE Trans. on Instrumentation and Measurement, vol. 50, no. 3, pp. 808–812, 2001.
[3] S. I. Safie, J. J. Soraghan, and L. Petropoulakis, "Electrocardiogram (ECG) biometric authentication using pulse active ratio (PAR)," IEEE Trans. on Information Forensics and Security, vol. 6, no. 4, pp. 1315–1322, Dec. 2011.
[4] G. G. Molina, F. Bruekers, C. Presura, M. Damstra, and M. van der Veen, "Morphological synthesis of ECG signals for person authentication," in Proc. 15th European Signal Processing Conf., Poland, Sept. 2–7, 2007.
[5] I. Odinaka, P.-H. Lai, A. D. Kaplan, J. A. O'Sullivan, E. J. Sirevaag, S. D. Kristjansson, A. K. Sheffield, and J. W. Rohrbaugh, "ECG biometrics: A robust short-time frequency analysis," in Proc. IEEE Int. Workshop on Information Forensics and Security, Dec. 2010, pp. 1–6.
[6] F. Agrafioti and D. Hatzinakos, "Fusion of ECG sources for human identification," in Proc. 3rd Int. Symp. on Communications, Control and Signal Processing, Malta, Mar. 2008, pp. 1542–1547.
[7] F. Agrafioti, F. M. Bui, and D. Hatzinakos, "Medical biometrics in mobile health monitoring," Security and Communication Networks, vol. 4, no. 5, pp. 525–539, July 2010.
[8] F. Agrafioti and D. Hatzinakos, "ECG biometric analysis in cardiac irregularity conditions," Signal, Image and Video Processing, ISSN 1863-1703, 2008.
[9] R.-S. Lin, D. A. Ross, and J. Yagnik, "SPEC hashing: Similarity preserving algorithm for entropy-based coding," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), June 2010, pp. 848–854.