ISIT 2010, Austin, Texas, U.S.A., June 13 - 18, 2010

Identification and Lossy Reconstruction in Noisy Databases

Ertem Tuncel∗, Deniz Gündüz†
∗Department of Electrical Engineering, University of California, Riverside, CA.
†Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), 08860, Castelldefels, Barcelona, Spain.
E-mail: ertem.tuncel@ucr.edu, deniz.gunduz@cttc.es

Abstract—A noisy database system is studied in which noisy versions of the underlying feature vectors are observed in both the enrollment and the query phases. The noisy observations are compressed before being stored in the database, and the user wishes both to identify the correct entry corresponding to the noisy query vector and to reconstruct the original feature vector within a desired distortion requirement. A fundamental capacity/storage/distortion tradeoff is identified for this system in the form of single-letter information theoretic expressions. The relation of this problem to the classical Wyner-Ziv rate-distortion problem is shown, where the noisy query vector acts as the correlated side information in the lossy reconstruction of the feature vector.

I. INTRODUCTION

Biometric data, such as fingerprints, behavioral patterns, and iris scans, are replacing classical identification documents for increased security. However, efficient use of such data for sensitive security applications requires storage of extensive digital biometric data in a large database, as well as fast search algorithms for reliable identification of the queries within the database. On top of the storage constraints and search speed requirements, another difficulty arises from the noisy observation of the underlying biometric feature in both the enrollment and identification stages. This might be due either to random noise in the scanning device, as in the case of fingerprint or iris scanning, or to temporal changes in the expression of the underlying feature, as in the case of behavioral patterns such as keystroke dynamics.

The first attempts to identify the fundamental performance limits of biometric databases were made in [1] and [2], where the maximum exponential rate of entries that can be reliably identified in a database is characterized. The main assumption is that the components of the underlying feature vectors are independent and identically distributed (i.i.d.) with a known distribution, while both the enrollment vectors and the queries are noisy versions of the feature vectors observed through two different discrete memoryless channels, which might model two different measurement devices or measurement conditions. Defining the highest possible identification rate as the capacity of the system, [2] provides a single-letter expression for this capacity, characterized by the mutual information between the noisy enrollment and the noisy query distributions.

However, to improve the efficiency of the identification process in the storage device, it may be desirable to store only a compressed version of the observed feature vectors rather than storing the whole noisy observation. This compression at the enrollment stage introduces a tradeoff between the identification capacity and the compression rate, which is characterized in [3] and [4].

Here we consider another dimension of the biometric database system. In the identification stage, we require not only reliable identification of the database entry corresponding to the noisy query, but also a lossy reconstruction of the underlying feature vector. Note that the noisy query vector serves as side information for the reconstruction in the identification stage. In a sense, this problem combines and generalizes the capacity/storage tradeoff problem in biometric databases studied in [3] and [4] with the classical Wyner-Ziv rate-distortion problem of [7]. We provide a single-letter information theoretic characterization of the set of achievable capacity/storage/distortion tradeoffs for this biometric identification system.

The rest of the paper is organized as follows. We introduce the system model and the necessary definitions in Section II. The main result of the paper is presented in Section III, and Sections IV and V are dedicated to its proof. In Section VI we study a binary symmetric feature vector and identify the capacity/storage/distortion tradeoff assuming noiseless observation in the enrollment phase and an erasure channel in the query phase. Section VII concludes the paper.

II. SYSTEM MODEL

We assume that the feature vectors {X^n(m)}_{m=1}^M are generated independently with the identical distribution

P[X^n(m) = x^n] = ∏_{i=1}^n P_X(x_i)

over the finite feature alphabet X.

The database is formed in an enrollment phase, in which a noisy version of the feature vector of each individual is observed and recorded to the database. We denote the observed noisy feature vectors by Y^n(m), m ∈ M = {1, . . . , M}, which are assumed to be the outputs of a discrete memoryless channel (DMC) characterized by P_{Y|X}, where Y is the finite observation alphabet. We have

P[Y^n(m) = y^n | X^n(m) = x^n] = ∏_{i=1}^n P_{Y|X}(y_i | x_i)

for m ∈ M.
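This observation model is straightforward to simulate. The following Python sketch is a toy illustration only; the binary alphabets, the channel matrices, and the sizes M and n are our assumptions and not part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not from the paper): binary features,
# a binary symmetric enrollment channel, and a ternary-output query channel
# in which symbol 2 plays the role of an erasure.
M, n = 1000, 100                           # database size and block length
P_X = np.array([0.5, 0.5])                 # feature distribution P_X
P_Y_given_X = np.array([[0.9, 0.1],        # enrollment channel P_{Y|X}
                        [0.1, 0.9]])
P_Z_given_X = np.array([[0.8, 0.0, 0.2],   # query channel P_{Z|X};
                        [0.0, 0.8, 0.2]])  # column 2 is the erasure symbol

def dmc(x, P):
    """Pass a symbol vector x through a memoryless channel row-indexed by x."""
    return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

# Enrollment phase: each entry m yields a noisy observation Y^n(m).
X = np.array([rng.choice(2, size=n, p=P_X) for _ in range(M)])
Y = np.array([dmc(X[m], P_Y_given_X) for m in range(M)])

# Identification phase: a uniformly chosen entry W is queried through P_{Z|X}.
W = rng.integers(M)
Z = dmc(X[W], P_Z_given_X)
# By construction, Y^n(W) - X^n(W) - Z^n form a Markov chain.
```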


In the enrollment phase, each entry is compressed before it is recorded to the database, and only the compressed descriptions of the observed feature vectors are stored. We consider a deterministic compression function

f : Y^n → L = {1, . . . , L},

where L denotes the index set for the compressed observation vectors. We denote the index for entry m ∈ M by J(m) = f(Y^n(m)). These indices refer to length-n codewords from the compression codebook of size L.

In the identification phase, an index W is chosen uniformly from M, independently of the database entries. The user of the database observes X^n(W) through a memoryless channel characterized by P_{Z|X} with finite output alphabet Z, i.e.,

P[Z^n = z^n | X^n(W) = x^n] = ∏_{i=1}^n P_{Z|X}(z_i | x_i)   (1)

where Z^n is the output of the channel. Note that, due to the independence of the noisy observation channels in the enrollment and the identification phases, Y^n(W) − X^n(W) − Z^n form a Markov chain.

The user has two goals. The first is to identify W in the database by using the noisy observation vector Z^n and the entries of the database {J(m)}_{m=1}^M. In addition, she also wants to reconstruct an estimate of the original feature vector X^n(W) within a desired average distortion requirement.

We define two separate functions for the identification and the reconstruction processes. The identification function is defined as

g : L^M × Z^n → M

and the corresponding estimate is denoted by

Ŵ = g(J, Z^n)

where J ≜ (J(1), . . . , J(M)). The average error probability in the identification process is defined as

P_e^n ≜ (1/M) ∑_{w ∈ M} Pr[Ŵ ≠ W | W = w].

The lossy reconstruction function is defined as

h : L × Z^n → X̂^n,

where X̂ is the finite reconstruction alphabet. The corresponding reconstruction is denoted by

X̂^n = h(J(Ŵ), Z^n)

and the distortion it incurs is measured by the single-letter measure

d(x^n, x̂^n) = (1/n) ∑_{i=1}^n d(x_i, x̂_i),

where d : X × X̂ → [0, d_max]. Though the reconstruction function outputs a legitimate X̂^n even when Ŵ ≠ W, we are only interested in upper-bounding the distortion conditioned on Ŵ = W.

Definition 1: (R_c, R_i, D) is an achievable compression rate, identification rate, and distortion tuple if, for any ε > 0 and sufficiently large n, there exist a deterministic enrollment function f and deterministic identification and reconstruction functions g and h, respectively, such that

(1/n) log L ≤ R_c   (2)

(1/n) log M ≥ R_i   (3)

while

P_e^n ≤ ε,   (4)

E[d(X^n(W), X̂^n) | Ŵ = W] ≤ D + ε.   (5)

We denote by R the set of achievable (R_c, R_i, D) triplets.

III. CAPACITY/STORAGE/DISTORTION TRADEOFF

The main result of the paper is stated in the following theorem.

Theorem 1: Define R* as the region of triplets (R_c, R_i, D) for which there exist an auxiliary random variable U ∈ U with joint distribution p_{UYXZ} and a function φ : U × Z → X̂ such that U − Y − X − Z forms a Markov chain and

R_i ≤ I(U; Z)
R_c − R_i ≥ I(U; Y | Z)
D ≥ E[d(X, φ(U, Z))].

Then R = R*.

Remark 1: Using standard arguments, it is straightforward to show that R* is convex and that it suffices to consider auxiliary alphabets U with |U| ≤ |Y| + 2.

If there is no reconstruction requirement, we obtain the following capacity/storage tradeoff by letting D = d_max.

Corollary 1: A compression-identification rate pair (R_c, R_i) is achievable if and only if there exists a random variable U ∈ U with joint distribution p_{UYXZ} such that U − Y − X − Z forms a Markov chain and

R_i ≤ I(U; Z)   (6)

R_c − R_i ≥ I(U; Y | Z)   (7)

where |U| ≤ |Y| + 1.

The equivalence of (6) and (7) to

R_i ≤ I(U; Z)   (8)

R_c ≥ I(U; Y)   (9)

which characterize the original region derived in [3] and [4], follows from [8, Theorem 1]. More specifically, rewriting (8) as

−R_i ≥ −I(U; Z)   (10)

one can treat −R_i as the first-stage rate of a fictitious two-stage source coder, for which (6) and (7) characterize the cumulative rate region and (8) and (9) characterize the marginal rate region. Equivalence of the two regions then follows from the fact that

−I(U; Z) = min_{U : I(U;Y) = 0} [I(U; Y) − I(U; Z)]

is trivially achieved by a dummy U.
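To see why the two descriptions are compatible, note that the chain U − Y − X − Z gives I(U; Z | Y) = 0, hence I(U; Y | Z) = I(U; Y, Z) − I(U; Z) = I(U; Y) − I(U; Z), so (6)-(7) and (8)-(9) constrain the same total rate. A quick numerical check of this identity in Python; the factor distributions below are arbitrary illustrative choices:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf given as an array (flattened internally)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(pxy):
    """Mutual information of a 2-D joint pmf."""
    return H(pxy.sum(1)) + H(pxy.sum(0)) - H(pxy.ravel())

# Arbitrary factors satisfying U - Y - X - Z (illustrative values).
p_x = np.array([0.5, 0.5])
p_y_x = np.array([[0.9, 0.1], [0.2, 0.8]])   # P_{Y|X}, rows indexed by x
p_z_x = np.array([[0.7, 0.3], [0.4, 0.6]])   # P_{Z|X}
p_u_y = np.array([[0.8, 0.2], [0.3, 0.7]])   # P_{U|Y}, rows indexed by y

# Joint pmf p(u,y,x,z) = P_X(x) P_{Y|X}(y|x) P_{Z|X}(z|x) P_{U|Y}(u|y).
p = np.einsum('x,xy,xz,yu->uyxz', p_x, p_y_x, p_z_x, p_u_y)

I_UY = I(p.sum(axis=(2, 3)))   # I(U;Y)
I_UZ = I(p.sum(axis=(1, 2)))   # I(U;Z)

# I(U;Y|Z) via the chain rule: I(U;Y,Z) - I(U;Z).
p_uyz = p.sum(axis=2)
I_U_YZ = H(p_uyz.sum(axis=(1, 2))) + H(p_uyz.sum(axis=0).ravel()) - H(p_uyz.ravel())

assert np.isclose(I_U_YZ - I_UZ, I_UY - I_UZ)   # I(U;Y|Z) = I(U;Y) - I(U;Z)
```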


Another special case of this setup is obtained if we ignore the identification requirement of the user, i.e., by letting R_i = 0. It is not hard to see that the model then reduces to the classical Wyner-Ziv problem of lossy source compression in the presence of receiver side information, with the slight difference that the receiver wants to reconstruct the X^n vector rather than the Y^n vector that is available at the encoder. We obtain the following rate-distortion region.

Corollary 2: A compression-distortion pair (R_c, D) is achievable if and only if there exist a random variable U ∈ U with joint distribution p_{UYXZ} and a function φ : U × Z → X̂ such that U − Y − X − Z forms a Markov chain and

R_c ≥ I(U; Y | Z)
D ≥ E[d(X, φ(U, Z))]

with |U| ≤ |Y| + 1.

IV. ACHIEVABILITY

We will first prove the achievability of an (R_c, R_i, D) tuple for which there exist a random variable U ∈ U and a function φ : U × Z → X̂ satisfying U − Y − X − Z and

R_i + R̄ ≤ I(U; Z)   (11)

R_c + R̄ ≥ I(U; Y)   (12)

D ≥ E[d(X, φ(U, Z))]   (13)

for some R̄ ≥ 0. As mentioned above, this auxiliary −R̄ will play the role of the rate transferred from the "second-stage" rate R_c to the "first-stage" rate −R_i of a fictitious source coder.

Fix p_{U|Y} and the function φ satisfying the conditions above. We first generate a codebook of size 2^{n(R_c + R̄)} that consists of i.i.d. codewords U^n. We index the codewords U^n(j, k) for j = 1, . . . , 2^{nR_c} and k = 1, . . . , 2^{nR̄}.

Enrollment: For any y^n ∈ Y^n, we define the enrollment function f(y^n) as the smallest index j for which (y^n, U^n(j, k)) ∈ T^n_{[YU]} for some k = 1, . . . , 2^{nR̄}. (For a probability distribution P_X, we denote by T^n_{[X]} the set of all strongly typical sequences; for more detail on strong typicality, see [5].) We set f(y^n) = 1 if no such index can be found. Thus, one can think of the collection of all codewords U^n(j, k) for k = 1, . . . , 2^{nR̄} as "bins," and of f(y^n) as a source coder which records only the bin index.

Identification: For any noisy observation z^n ∈ Z^n and the given compression indices j(1), . . . , j(2^{nR_i}) of the database entries, the identifier looks for a database entry m ∈ {1, . . . , 2^{nR_i}} such that (z^n, U^n(j(m), k)) ∈ T^n_{[ZU]} for some k = 1, . . . , 2^{nR̄}. We define the identification function ŵ = g(j(1), . . . , j(2^{nR_i}), z^n) as the smallest such m, and set g(j(1), . . . , j(2^{nR_i}), z^n) = 1 if no such m is found.

So far, the only randomness mentioned above is that of the codebook U^n(j, k). We pause here to emphasize that this randomization is for the purpose of creating an ensemble of codebooks over which we compute the average probability of error and the average distortion. The database is filled with random entries as well, but this randomness is inherent to the problem and is independent of the codebook generation.

Now, for m = 1, . . . , 2^{nR_i}, define J(m) = f(Y^n(m)) and K(m) as the smallest k found in the process of enrolling Y^n(m). If no (j, k) was found, also set K(m) = 1. Although K(m) is not recorded, it is useful to define it for analysis purposes. Finally, let Ŵ = g(J, Z^n).

Reconstruction: For any noisy observation z^n ∈ Z^n and a given compression index j ∈ L, the reconstruction function h(j, z^n) is defined as follows. Find the smallest k such that (z^n, U^n(j, k)) ∈ T^n_{[ZU]}, and output φ(U_i(j, k), z_i) as the ith component of h(j, z^n). If no such k is found, then output a random vector from the reconstruction alphabet.

Probability of error: We define the following events:

E_0(m) = {(Y^n(m), Z^n) ∉ T^n_{[YZ]}}
E_1(m) = {(Y^n(m), U^n(J(m), K(m))) ∉ T^n_{[YU]}}
E_2(m, k) = {(Z^n, U^n(J(m), k)) ∉ T^n_{[ZU]}}.

The average probability of error for the identification process can then be bounded as

Pr{Ŵ ≠ W | W = w} ≤ Pr{E_0(w)} + Pr{E_1(w) | E_0(w)^c} + Pr{E_2(w, K(w)) | E_1(w)^c} + ∑_{m ≠ w} ∑_k Pr{E_2(m, k)^c}.   (14)

It is straightforward to show that Pr{E_0(w)} → 0. We can also show using standard arguments that Pr{E_1(w) | E_0(w)^c} vanishes with increasing n if

R_c + R̄ > I(U; Y).

That Pr{E_2(w, K(w)) | E_1(w)^c} also vanishes with increasing n follows from the Markov lemma [6]. In fact, with high probability,

(Z^n, X^n(w), Y^n(w), U^n(J(w), K(w))) ∈ T^n_{[ZXYU]},   (15)

which will be useful in the distortion analysis. Finally,

∑_{m ≠ w} ∑_k Pr{E_2(m, k)^c} ≤ 2^{n(R_i + R̄)} 2^{−n I(U;Z)},

the right-hand side of which vanishes for large enough n if

R_i + R̄ < I(U; Z).
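In algorithmic terms, the enrollment, identification, and reconstruction rules above are greedy searches for jointly typical pairs over a binned codebook. The following Python sketch mirrors that structure; all helper names are ours, and the typicality test is a deliberately crude empirical-type comparison rather than a full strong-typicality check:

```python
from collections import Counter
import random

def is_jointly_typical(a, b, p_joint, eps=0.05):
    """Crude test: the empirical joint type of (a, b) must be within eps
    of the target joint pmf p_joint[(sym_a, sym_b)]."""
    n = len(a)
    counts = Counter(zip(a, b))
    return all(abs(counts.get(s, 0) / n - p) <= eps
               for s, p in p_joint.items())

def enroll(y, codebook, p_yu):
    """f(y^n): smallest bin index j whose bin holds a codeword U^n(j,k)
    jointly typical with y^n; defaults to the first bin."""
    for j, bin_ in enumerate(codebook):
        if any(is_jointly_typical(y, u, p_yu) for u in bin_):
            return j
    return 0

def identify(z, stored_indices, codebook, p_zu):
    """g(j(1),...,j(M), z^n): smallest entry m whose stored bin j(m)
    holds a codeword jointly typical with z^n; defaults to entry 0."""
    for m, j in enumerate(stored_indices):
        if any(is_jointly_typical(z, u, p_zu) for u in codebook[j]):
            return m
    return 0

def reconstruct(z, j, codebook, p_zu, phi, xhat_alphabet):
    """h(j, z^n): componentwise phi(U_i(j,k), z_i) for the smallest k with
    (z^n, U^n(j,k)) jointly typical; otherwise a random vector."""
    for u_word in codebook[j]:
        if is_jointly_typical(z, u_word, p_zu):
            return [phi(u, zi) for u, zi in zip(u_word, z)]
    return [random.choice(xhat_alphabet) for _ in z]
```

Here `codebook` is a list of 2^{nR_c} bins, each holding 2^{nR̄} codewords, matching the indexing U^n(j, k) above.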


Next, we consider the average distortion incurred by the reconstruction. A crucial observation at this point is that, with probability approaching one, when Ŵ = W,

X̂_i = φ(U_i(J(W), K(W)), Z_i),

that is, the index k found in the reconstruction process for Z^n and j = J(W) matches K(W). This follows from (15) and the fact that

∑_{k ≠ K(W)} Pr{E_2(W, k)^c} ≤ 2^{nR̄} 2^{−n I(U;Z)},

which vanishes since R̄ < I(U; Z) is granted. Thus, when Ŵ = W, with high probability

d(X^n(W), X̂^n)
= (1/n) ∑_{i=1}^n d(X_i(W), X̂_i)
= (1/n) ∑_{i=1}^n d(X_i(W), φ(U_i(J(W), K(W)), Z_i))
≤ (1 + ε′) ∑_{z′, x′, y′, u′} P_{ZXYU}(z′, x′, y′, u′) d(x′, φ(u′, z′))
≤ E[d(X, φ(U, Z))] + ε′ d_max
≤ D + ε.

Since the ensemble averages satisfy the desired requirements, there must exist a deterministic codebook and functions f, g, and h for which the same requirements are satisfied. Having shown the sufficiency of the conditions (11)-(13), we apply Fourier-Motzkin elimination to R̄ and obtain the achievability of the rate-distortion tuples as given in the expression of the theorem.
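The elimination step is short enough to spell out. Conditions (11) and (12) bound R̄ from above by I(U; Z) − R_i and from below by I(U; Y) − R_c, together with R̄ ≥ 0. A feasible R̄ therefore exists if and only if

max{0, I(U; Y) − R_c} ≤ I(U; Z) − R_i,

which is equivalent to the pair of conditions

R_i ≤ I(U; Z),
R_c − R_i ≥ I(U; Y) − I(U; Z) = I(U; Y | Z),

where the last equality uses I(U; Z | Y) = 0, a consequence of the Markov chain U − Y − X − Z. Together with (13), these are exactly the constraints of Theorem 1.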
V. CONVERSE

Here we prove the converse part of the theorem, i.e., that R ⊂ R*. We assume the achievability of a tuple (R_c, R_i, D), i.e., for any ε > 0 there exist deterministic functions f, g, and h such that (2)-(5) are satisfied.

We have

log M = H(W)
= H(W | J, Z^n) + I(W; J, Z^n)
≤ H(W | Ŵ) + I(W; J, Z^n)   (16)
≤ 1 + P_e^n log M + I(W; J, Z^n)   (17)

where (16) follows since Ŵ is a deterministic function of J and Z^n, and (17) follows from Fano's inequality. From here we can obtain

(1 − ε) log M − 1 ≤ I(W; J, Z^n)
= I(W; Z^n | J)   (18)
= H(Z^n | J) − H(Z^n | W, J)
≤ H(Z^n) − H(Z^n | W, J)
= H(Z^n) − H(Z^n | J(W))   (19)

where (18) follows since W is independent of the database entries, and hence of J, and (19) follows since Z^n is independent of J(m) for m ≠ W.

Now, we define

U_i ≜ (Z^{i−1}, Z^n_{i+1}, J(W))

and observe that Z_i − X_i(W) − Y_i(W) − U_i forms a Markov chain. Using (3), we can write

(1 − ε) nR_i − 1 ≤ H(Z^n) − H(Z^n | J(W))   (20)
= ∑_{i=1}^n [H(Z_i | Z^{i−1}) − H(Z_i | Z^{i−1}, J(W))]
≤ ∑_{i=1}^n [H(Z_i | Z^{i−1}) − H(Z_i | Z^{i−1}, Z^n_{i+1}, J(W))]
= ∑_{i=1}^n [H(Z_i) − H(Z_i | U_i)]
= ∑_{i=1}^n I(Z_i; U_i),

where the inequality follows since conditioning reduces entropy, and the next equality uses the fact that Z^n is i.i.d. Thus,

(1 − ε) R_i ≤ (1/n) ∑_{i=1}^n I(Z_i; U_i) + 1/n.   (21)

We also have from (2) that

nR_c ≥ log L ≥ H(J(W)) ≥ I(J(W); Y^n(W)).

Combining this with (20), we get

n(R_c − R_i + εR_i + 1/n)
≥ I(J(W); Y^n(W)) − I(J(W); Z^n)
= I(J(W); Y^n(W) | Z^n)   (22)
= ∑_{i=1}^n I(J(W); Y_i(W) | Z^n, Y^{i−1}(W))
= ∑_{i=1}^n [H(Y_i(W) | Z^n, Y^{i−1}(W)) − H(Y_i(W) | J(W), Z^n, Y^{i−1}(W))]
≥ ∑_{i=1}^n [H(Y_i(W) | Z_i) − H(Y_i(W) | J(W), Z^n)]
= ∑_{i=1}^n [H(Y_i(W) | Z_i) − H(Y_i(W) | U_i, Z_i)]
= ∑_{i=1}^n I(Y_i(W); U_i | Z_i),

where the inequality holds because the pairs (Y_i(W), Z_i) are i.i.d. across i (for the first term) and conditioning reduces entropy (for the second term), and (22) follows from the fact that Z^n − Y^n(W) − J(W) forms a Markov chain. Thus,

R_c − R_i + εR_i + 1/n ≥ (1/n) ∑_{i=1}^n I(Y_i(W); U_i | Z_i).   (23)



As for the distortion constraint, first observe that

E[d(X^n(W), h(J(W), Z^n))]
= (1 − P_e^n) E[d(X^n(W), h(J(Ŵ), Z^n)) | Ŵ = W] + P_e^n E[d(X^n(W), h(J(W), Z^n)) | Ŵ ≠ W]
≤ (1 − P_e^n)(D + ε) + P_e^n d_max
≤ D + ε(1 + d_max).

Thus, denoting by h_i the ith component of h, we have

D + ε(1 + d_max)
≥ (1/n) ∑_{i=1}^n E[d(X_i(W), h_i(J(W), Z^n))]
= (1/n) ∑_{i=1}^n E[d(X_i(W), h_i(U_i, Z_i))],   (24)

where the last equality holds with a slight abuse of notation, since the pair (U_i, Z_i) carries exactly (J(W), Z^n).

From (21), (23), (24), and the convexity of R*, R ⊂ R* follows. The single-letterization implicit here is the standard time-sharing argument: with Q uniform on {1, . . . , n} and independent of everything else, setting U = (U_Q, Q), X = X_Q(W), Y = Y_Q(W), and Z = Z_Q yields a chain U − Y − X − Z for which (21), (23), and (24) turn into the constraints of Theorem 1 as ε → 0.

VI. AN EXAMPLE

Consider binary feature vectors with P_X(x) = 1/2 for x ∈ X = {0, 1}. Let the enrollment channel P_{Y|X} be noiseless (thus Y = X), and let P_{Z|X} be a symmetric erasure channel with Z = {0, ?, 1} and erasure probability ε. Also let X̂ = X and d(·, ·) be the Hamming distortion measure.

Due to space limitations, we provide here only a sketch of the full analysis. It is not difficult to show that, although the alphabet size could potentially be as high as |Y| + 2 in general, in this case |U| = |Y| = 2 suffices to characterize the whole region, with P_{U|Y} being a binary symmetric channel with crossover probability α ≤ 1/2. With this choice,

I(Y; U) = 1 − H(α)   (25)

I(Z; U) = (1 − ε)[1 − H(α)]   (26)

E[d(X, φ(U, Z))] = εα   (27)

with

φ(u, z) = { z,  z ≠ ?
          { u,  z = ?.

It then follows that (R_c, R_i, D) ∈ R if and only if

R_c ≥ R_c(R_i, D)

where

R_c(R_i, D) = { R_i + εψ_ε(D),   0 ≤ R_i ≤ (1 − ε)ψ_ε(D)
              { R_i/(1 − ε),     (1 − ε)ψ_ε(D) ≤ R_i ≤ 1 − ε   (28)

for 0 ≤ D ≤ ε/2, with

ψ_ε(D) = 1 − H(D/ε).

Note that D > ε/2 and R_i > 1 − ε need not be considered. In (28), the expression for the range (1 − ε)ψ_ε(D) ≤ R_i ≤ 1 − ε follows from (25)-(27), whereas the expression in the range 0 ≤ R_i ≤ (1 − ε)ψ_ε(D) is obtained through rate transfer. (A short numerical evaluation of (28) is sketched after the references.)

For the maximum distortion D = ε/2, the problem reduces to the one studied in [4], and

R_c(R_i, ε/2) = R_i / (1 − ε).

On the other hand, if the identification rate R_i is set to zero, we obtain

R_c(0, D) = εψ_ε(D),

which coincides with the ordinary Wyner-Ziv rate-distortion function for erasure side information.

VII. CONCLUSIONS

We have studied a noisy database system where both the enrollment and the query vectors are noisy versions of the underlying feature vectors. The noisy enrollment vectors are compressed before being stored in the database to reduce the storage requirement and increase the search speed. The user of the database wishes not only to identify the correct entry corresponding to a noisy query vector, but also to reconstruct the original feature vector of the queried entry within a desired distortion requirement. We have identified a fundamental capacity/storage/distortion tradeoff and characterized the set of achievable compression rate, identification rate, and distortion tuples in a single-letter form. This problem combines and generalizes the previously studied capacity/storage tradeoff in databases and the Wyner-Ziv rate-distortion function for lossy source compression in the presence of decoder side information. As an example, we have studied the case of binary symmetric feature vectors with a noiseless enrollment channel and an erasure query channel, and evaluated the capacity/storage/distortion tradeoff for this special scenario.

REFERENCES

[1] J. A. O'Sullivan and N. A. Schmid, "Large deviations performance analysis for biometrics recognition," Proc. Allerton Conf. on Communication, Control, and Computing, Monticello, IL, Oct. 2002.
[2] F. Willems, T. Kalker, J. Goseling, and J.-P. Linnartz, "On the capacity of a biometrical identification system," Proc. IEEE Int'l Symp. Inform. Theory, Yokohama, Japan, July 2003.
[3] M. B. Westover and J. A. O'Sullivan, "Achievable rates for pattern recognition," IEEE Trans. Inform. Theory, vol. 54, no. 1, pp. 299-320, Jan. 2008.
[4] E. Tuncel, "Capacity/storage tradeoff in high-dimensional identification systems," IEEE Trans. Inform. Theory, vol. 55, no. 5, pp. 2097-2106, May 2009.
[5] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, New York: Academic, 1981.
[6] T. Berger, "Multiterminal source coding," Lectures presented at the CISM Summer School on the Information Theory Approach to Communications, July 1977.
[7] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. 22, no. 1, pp. 1-10, Jan. 1976.
[8] E. Tuncel, "The rate transfer argument in two-stage scenarios: When does it matter?" Proc. IEEE Int'l Symp. Inform. Theory, Seoul, S. Korea, July 2009.


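Numerical sketch for Section VI: the region (28) is easy to evaluate. The following Python snippet is our illustration; the helper names and the value ε = 0.3 are arbitrary assumptions. It computes R_c(R_i, D) and checks the endpoint behavior noted above:

```python
import numpy as np

def Hb(p):
    """Binary entropy in bits, with the convention H(0) = H(1) = 0."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def psi(D, eps):
    """psi_eps(D) = 1 - H(D/eps), defined for 0 <= D <= eps/2."""
    return 1.0 - Hb(D / eps)

def Rc_min(Ri, D, eps):
    """Minimum compression rate Rc(Ri, D) from (28)."""
    if Ri <= (1 - eps) * psi(D, eps):
        return Ri + eps * psi(D, eps)   # rate-transfer branch
    return Ri / (1 - eps)               # branch valid up to Ri = 1 - eps

eps = 0.3
for Ri in np.linspace(0.0, 1 - eps, 4):
    for D in np.linspace(0.0, eps / 2, 3):
        print(f"Ri={Ri:.3f}  D={D:.3f}  Rc={Rc_min(Ri, D, eps):.3f}")

# Endpoint checks from Section VI:
assert np.isclose(Rc_min(0.0, 0.1, eps), eps * psi(0.1, eps))  # Wyner-Ziv corner
assert np.isclose(Rc_min(1 - eps, eps / 2, eps), 1.0)          # D = eps/2 corner
# The two branches of (28) meet continuously at Ri = (1 - eps) * psi(D, eps):
t = (1 - eps) * psi(0.1, eps)
assert np.isclose(t + eps * psi(0.1, eps), t / (1 - eps))
```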