[Figure: feature extraction, from a signal window to a feature vector]
Speech Processing and Biometrics Group (GTPB)
Signal Processing Institute (ITS), LIAP
Gaussian Mixture Model (GMM)
[Figure: acoustic vectors for training. Each of the T vectors $v^{(1)}, v^{(2)}, \ldots, v^{(T)}$ has D components $(v_1, v_2, \ldots, v_D)$; the GMM models the histograms of Feature 1, Feature 2, ..., Feature D.]
score = log-likelihood(speech | model)
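The score above can be sketched in a few lines. This is only an illustration under stated assumptions: the GMM is taken to have diagonal covariances, the score is averaged over the frames, and the names `gmm_log_likelihood`, `weights`, `means` and `variances` are hypothetical, not taken from the slides.

```python
import math


def gmm_log_likelihood(vectors, weights, means, variances):
    """Average per-frame log-likelihood of feature vectors under a GMM.

    Assumption: diagonal covariances, so each component density is a
    product of one-dimensional Gaussians over the D features.
    """
    total = 0.0
    for v in vectors:
        # Log of each weighted component density for this frame.
        component_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_pdf = 0.0
            for x, m, s2 in zip(v, mu, var):
                log_pdf += -0.5 * (math.log(2 * math.pi * s2) + (x - m) ** 2 / s2)
            component_logs.append(math.log(w) + log_pdf)
        # Log-sum-exp over the components, for numerical stability.
        top = max(component_logs)
        total += top + math.log(sum(math.exp(c - top) for c in component_logs))
    return total / len(vectors)


# Hypothetical two-component model with D = 2 features.
weights = [0.6, 0.4]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

near = gmm_log_likelihood([[0.1, -0.2], [2.9, 3.1]], weights, means, variances)
far = gmm_log_likelihood([[10.0, 10.0], [12.0, 9.0]], weights, means, variances)
```

Vectors that lie close to the component means (`near`) receive a higher score than vectors far from them (`far`).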
Speaker Verification
The odds form of Bayes' theorem
$H_0$: the speaker's model and the tested recording (T) have the same source
$H_1$: the speaker's model and the tested recording (T) have different sources
$$\frac{P(H_0)}{P(H_1)} \cdot \frac{P(T \mid H_0)}{P(T \mid H_1)} = \frac{P(H_0 \mid T)}{P(H_1 \mid T)}$$
Likelihood ratio compared with a decision threshold:
$$\frac{P(T \mid H_0)}{P(T \mid H_1)} > \text{decision threshold}$$
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Interpretation of Evidence
Bayesian interpretation (BI)
Principle
The Bayesian model, proposed for forensic speaker recognition by Lewis in 1984, allows a measure of uncertainty about the pair of competing hypotheses to be revised on the basis of new information; the revision is expressed by the likelihood ratio of the evidence (province of the forensic expert).
The Bayesian model shows how new data (the questioned recording) can be combined with prior background knowledge, the prior odds (province of the court), to give posterior odds (province of the court) for judicial outcomes or issues.
prior odds x ? = posterior odds
Strength of Evidence
Bayesian interpretation (BI)
$$\frac{P(H_0)}{P(H_1)} \times \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$
Prior odds: prior background knowledge (province of the court)
Likelihood Ratio (LR): new data (province of the forensic expert)
Posterior odds: posterior knowledge on the issue (province of the court)
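As a worked illustration of the odds form of Bayes' theorem, the update can be written as a one-line multiplication. The function names and the numerical values below are hypothetical and serve only to show the arithmetic.

```python
def posterior_odds(prior_odds, likelihood_ratio):
    """Odds form of Bayes' theorem: posterior odds = prior odds x LR."""
    return prior_odds * likelihood_ratio


def odds_to_probability(odds):
    """Convert odds in favour of H0 into the probability of H0."""
    return odds / (1.0 + odds)


# Hypothetical case: prior odds of 1 to 100 and a likelihood ratio of 75.
prior = 1 / 100
lr = 75
posterior = posterior_odds(prior, lr)
probability_h0 = odds_to_probability(posterior)
```

The prior odds and the conversion to a posterior belong to the court; the forensic expert reports only the likelihood ratio.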
Voice as Evidence
In the case of a questioned recording (trace), the evidence does not consist of the speech itself, but of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect, represented by his/her model.
Voice as Evidence
[Figure: block diagram. Features are extracted from the suspected speaker reference database (R) to build the suspected speaker model, and from the questioned recording (trace). Their similarity (distance) gives a score, the evidence (E), whose significance is assessed by Bayesian interpretation.]
Bayesian Interpretation of Evidence
The odds form of Bayes' theorem
$H_0$: the suspected speaker is the source of the questioned recording (within-source variability)
$H_1$: the speaker at the origin of the questioned recording is not the suspected speaker (between-sources variability)
$$\frac{P(H_0)}{P(H_1)} \cdot \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$
Likelihood ratio (strength of evidence):
$$LR = \frac{P(E \mid H_0)}{P(E \mid H_1)}$$
The numerator expresses similarity, the denominator typicality.
Uni- and Multivariate Methods
Scoring Method: likelihood calculated from distributions of scores modelling within-source and between-sources variability
$H_0$: distribution of scores of within-source variability
$H_1$: distribution of scores of between-sources variability
3 databases: Suspect Reference Database (R), Potential Population Database (P), Suspect Control Database (C)
Direct Method: likelihood directly calculated from the GMM of the suspect and the GMMs of the potential population
$H_0$: GMM of the suspect
$H_1$: GMMs of the potential population
2 databases: Suspect Reference Database (R), Potential Population Database (P)
Databases used:
R = 5 utterances per speaker (2-3 min each)
P = 100 speakers (2-3 min each)
C = 30-40 utterances per speaker (10-20 sec each)
Corpus Based Methodology
3 databases (DBs)
Potential population database (P)
Large-scale database used to model the potential
population of speakers to evaluate the between-sources
variability
Suspected speaker reference database (R)
Database recorded with the suspected speaker to model
her/his speech
Suspected speaker control database (C)
Database recorded with the suspected speaker to
evaluate her/his within-source variability
Scoring Method
[Figure: casework. The trace is set against the relevant population, represented by the potential population database (P); the suspect is represented by the suspected speaker reference database (R) and the suspected speaker control database (C).]
Within-source variability
[Figure: block diagram. Features extracted from the suspected speaker reference database (R) are used to build the suspected speaker model; features extracted from the suspected speaker control database (C) are compared with this model. The resulting similarity (distance) scores give the distribution of the within-source variability.]
Between-sources Variability
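[Figure: block diagram. Features extracted from the potential population database (P) are used to build the speaker models of the potential population; features extracted from the questioned recording (trace) are compared with these models. The resulting similarity (distance) scores give the distribution of the between-sources variability.]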
Evaluation of the within-source variability
[Figure: histogram of occurrences against similarity scores]
Comparison of the suspected speaker's models with the utterances of his/her control database (C)
Evaluation of the between-sources variability
[Figure: histogram of occurrences against similarity scores]
Comparison of the trace with the speaker models of the potential population database (P)
Likelihood ratio
[Figure: estimated probability against similarity scores for the two hypotheses, read at the evidence score E = 6]
$$LR = \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{0.15}{0.002} = 75$$
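The scoring method can be sketched as follows: estimate one density from the within-source scores and one from the between-sources scores, then take the ratio of the two densities at the evidence score E. The slides do not say how the densities are estimated; the Gaussian kernel density estimate, the bandwidth and all score values below are assumptions made for illustration.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels on `samples`."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def scoring_lr(evidence, within_scores, between_scores, bandwidth=1.0):
    """LR = density of E under H0 (within-source) / density under H1 (between-sources)."""
    numerator = gaussian_kde(within_scores, bandwidth)(evidence)
    denominator = gaussian_kde(between_scores, bandwidth)(evidence)
    return numerator / denominator


# Hypothetical similarity scores.
within_scores = [5.0, 6.0, 7.0, 6.5, 5.5]    # control database (C) vs suspect model
between_scores = [0.0, 1.0, 2.0, -1.0, 3.0]  # trace vs potential population models (P)

lr_at_6 = scoring_lr(6.0, within_scores, between_scores)
lr_at_1 = scoring_lr(1.0, within_scores, between_scores)
```

With these made-up scores, an evidence score of 6 supports $H_0$ (LR above 1) and an evidence score of 1 supports $H_1$ (LR below 1).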
Strength of Evidence - Likelihood ratio
A likelihood ratio of 9.16 means that it is 9.16 times more likely to observe the score (E) given the hypothesis $H_0$ (the suspect is the source of the questioned recording) than given the hypothesis $H_1$ (another speaker from the relevant population is the source of the questioned recording).
DET (Detection Error Tradeoff) Curve
A DET curve can be computed from the distributions of scores by varying the decision threshold
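A minimal sketch of how the points of such a curve can be obtained from two sets of scores is given below. The function name, the acceptance rule (accept when the score is at or above the threshold) and the score values are assumptions; a DET plot proper would additionally draw the two error rates on normal-deviate axes.

```python
def det_points(target_scores, nontarget_scores):
    """Return (threshold, false rejection rate, false acceptance rate) triples.

    Every observed score is tried as a threshold; a trial is accepted
    when its score is at or above the threshold.
    """
    points = []
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        points.append((threshold, frr, far))
    return points


# Hypothetical scores: same-speaker trials and different-speaker trials.
target_scores = [2, 3, 4]
nontarget_scores = [0, 1, 2]
points = det_points(target_scores, nontarget_scores)
```

Raising the threshold lowers the false acceptance rate and raises the false rejection rate, which is the trade-off the curve displays.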
Analysis and comparison
[Figure: flow diagram. The trace and the suspected speaker control database (C) undergo feature extraction; the potential population database (P) and the suspected speaker reference database (R) undergo feature extraction and modelling, giving the relevant speakers' models and the suspected speaker model. Three comparative analyses follow: one yields the evidence (E) and the other two yield sets of similarity scores.]
Interpretation of the evidence
[Figure: flow diagram. One set of similarity scores is used for modelling the within-source variability, whose distribution gives the numerator of the likelihood ratio; the other set is used for modelling the between-sources variability, whose distribution gives the denominator of the likelihood ratio. Both are evaluated at the evidence (E) to give the likelihood ratio (LR).]
Individual Case
[Figure: casework for an individual case. The trace is the questioned recording; the suspect is represented by the suspected speaker reference database and a single suspected speaker recording.]
Scoring Method with Limited Suspect Data
The odds form of Bayes' theorem
$H_0$: the two recordings have the same source
$H_1$: the two recordings have different sources
$$\frac{P(H_0)}{P(H_1)} \cdot \frac{P(E \mid H_0)}{P(E \mid H_1)} = \frac{P(H_0 \mid E)}{P(H_1 \mid E)}$$
Likelihood ratio (strength of evidence with respect to the new hypotheses):
$$LR = \frac{P(E \mid H_0)}{P(E \mid H_1)}$$
Direct Method
The odds form of Bayes' theorem
$H_0$: the speaker's model and the questioned recording (T) have the same source
$H_1$: the speaker's model and the questioned recording (T) have different sources
$$\frac{P(H_0)}{P(H_1)} \cdot \frac{P(T \mid H_0)}{P(T \mid H_1)} = \frac{P(H_0 \mid T)}{P(H_1 \mid T)}$$
Likelihood ratio:
$$LR = \frac{P(T \mid H_0)}{P(T \mid H_1)}$$
Strength of evidence?
Multivariate (Direct) Method LR Numerator
[Figure: block diagram. Features extracted from the suspected speaker reference database (R) are used to build the suspected speaker model; features extracted from the questioned recording (trace) are compared with this model. The resulting score is the numerator of the likelihood ratio.]
score = log-likelihood(trace | $H_0$)
Multivariate (Direct) Method LR Denominator
[Figure: block diagram. Features extracted from the potential population database (P) are used to build a model of all speakers, the model of the potential population; features extracted from the questioned recording (trace) are compared with this model. The resulting score is the denominator of the likelihood ratio.]
score = log-likelihood(trace | $H_1$)
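Since both scores of the direct method are log-likelihoods of the same trace, the likelihood ratio follows from their difference. The sketch below assumes natural logarithms and uses made-up score values; the function names are hypothetical.

```python
import math


def log_likelihood_ratio(score_h0, score_h1):
    """log LR = log-likelihood(trace | H0) - log-likelihood(trace | H1)."""
    return score_h0 - score_h1


def likelihood_ratio(score_h0, score_h1):
    """LR recovered from the two log-likelihood scores (natural log assumed)."""
    return math.exp(log_likelihood_ratio(score_h0, score_h1))


# Hypothetical scores of the trace against the two models.
score_h0 = -41.2  # suspected speaker model
score_h1 = -43.5  # model of the potential population
lr = likelihood_ratio(score_h0, score_h1)
```

A trace that fits the suspected speaker model better than the population model gives a positive log LR and therefore an LR above 1.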
Evaluation of the Strength of Evidence
Principle
Estimation and comparison of the likelihood ratios that can be obtained from the evidence E:
when the hypothesis $H_0$ is true: the suspected speaker truly is the source of the questioned recording (trace)
when the hypothesis $H_1$ is true: the suspected speaker is truly not the source of the questioned recording (trace)
Evaluation of the Strength of Evidence
Univariate (Scoring) Method
Cumulative Density Functions
Tippett plots (reliability-survival functions)
Univariate (Scoring) Method
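A Tippett plot shows, for each likelihood ratio value on the horizontal axis, the proportion of cases whose LR exceeds that value, drawn separately for cases where $H_0$ is true and cases where $H_1$ is true. A minimal sketch of the computation, with a hypothetical function name and made-up LR values, might look like this:

```python
def tippett_curve(lrs, thresholds):
    """Proportion of likelihood ratios strictly greater than each threshold."""
    return [sum(lr > t for lr in lrs) / len(lrs) for t in thresholds]


# Hypothetical likelihood ratios from test cases with known ground truth.
lrs_h0_true = [0.5, 2.0, 9.16, 75.0, 300.0]  # suspect really is the source
lrs_h1_true = [0.001, 0.01, 0.2, 0.8, 3.0]   # suspect really is not the source
thresholds = [0.01, 0.1, 1.0, 10.0, 100.0]

curve_h0 = tippett_curve(lrs_h0_true, thresholds)
curve_h1 = tippett_curve(lrs_h1_true, thresholds)
```

The further apart the two curves lie, the better the method separates the two hypotheses; the value of the $H_1$ curve at LR = 1 gives the proportion of cases that wrongly support $H_0$.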
Evaluation of the Strength of Evidence
Multivariate (Direct) Method
Tippett plots (reliability-survival functions)
Multivariate (Direct) Method
Using databases with mismatched recording conditions
FBI NIST 2002 Database: 2 conditions (Microphone - Telephone)
The extent of mismatch can be measured using statistical testing
Compensating for Mismatch
[Figure: score distributions with the evidence E marked: $H_0$ scores (matched conditions), $H_1$ scores (matched conditions) and potential population $H_1$ scores (mismatched conditions)]
Not compensating for mismatch can be the difference between
an LR < 1 and an LR > 1
Aural Speaker Recognition
Experimental Framework
Listeners
90 listeners whose mother tongue is French
Laypersons with no phonetic training
Same computer and headphones
Training
No limitation on the number of listening trials
Testing
Verbal score scale from 1 through 7
Perceptual cues
Perceptual Verbal Scale and Perceptual Cues
Perceptual Verbal Scale
Score 1: I am sure that the two speakers are not the same
Score 2: I am almost sure that the two speakers are not the same
Score 3: It is possible that the two speakers are not the same
Score 4: I cannot decide
Score 5: It is possible that the two speakers are the same
Score 6: I am almost sure that the two speakers are the same
Score 7: I am sure that the two speakers are the same
Strength of Evidence for Aural Recognition
[Figure: histograms of estimated probability (0.0 to 0.6) against perceptual verbal score (1 to 7) for $H_0$ and $H_1$, with the evidence score E marked]
$$LR = \frac{P(E \mid H_0)}{P(E \mid H_1)}$$
Likelihood Ratio (LR) = ratio of the heights of the histograms for the two hypotheses at the point E
Discrete scores: histograms are used to estimate the probabilities of the scores for each hypothesis
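For discrete verbal scores, the ratio of histogram heights can be sketched directly from counts. The function name and the listener scores below are hypothetical; how to treat a score never observed under $H_1$ is not addressed in the slides, so this sketch simply raises an error in that case.

```python
from collections import Counter


def histogram_lr(evidence, h0_scores, h1_scores):
    """LR as the ratio of relative frequencies of the score E under H0 and H1."""
    p_h0 = Counter(h0_scores)[evidence] / len(h0_scores)
    p_h1 = Counter(h1_scores)[evidence] / len(h1_scores)
    if p_h1 == 0:
        # Assumption: no smoothing is applied, so an empty H1 bin is an error.
        raise ValueError("score never observed under H1; LR is undefined")
    return p_h0 / p_h1


# Hypothetical listener scores on the 1 to 7 verbal scale.
h0_scores = [7, 7, 6, 6, 6, 5, 4, 6]  # pairs from the same speaker
h1_scores = [1, 2, 2, 3, 6, 4, 1, 2]  # pairs from different speakers

lr = histogram_lr(6, h0_scores, h1_scores)
```

Here the score 6 occurs in half of the same-speaker pairs and in one eighth of the different-speaker pairs, so the LR at E = 6 is 0.5 / 0.125 = 4.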
Evaluating Strength of Evidence in Matched Conditions
[Figure: two panels, Aural and Automatic. Ref. PSTN vs Traces PSTN]
Similar separations between curves for aural and automatic systems
Evaluating Strength of Evidence in Mismatched Conditions
[Figure: two panels, Aural and Automatic. Ref. PSTN vs Traces Noisy PSTN]
Better curve separation in aural recognition
Better evaluation of LR for aural recognition in mismatched conditions
Evaluating Strength of Evidence in Adapted Conditions
[Figure: estimated probability (0 to 1) against likelihood ratio (LR, logarithmic axis from 10^-2 to 10^1), with curves for H0 Aural, H1 Aural and two Automatic-Adapted curves. Ref. PSTN vs Traces Adapted Noisy PSTN]
Adaptation for noisy conditions improves the performance of automatic recognition
Admissibility of Scientific Evidence (USA)
Daubert criteria:
whether the theory or technique can be, and has been
tested,
whether the technique has been published or subjected
to peer review,
whether actual or potential error rates have been
considered,
whether standards exist and are maintained to control
the operation of the technique,
whether the technique is widely accepted within the
relevant scientific community.
Conclusions
The Bayesian model, the current interpretation framework used in forensic science, is adapted for forensic automatic speaker recognition.
The corpus based methodology provides a coherent way of assessing and presenting the evidence of a questioned recording.
Distributions of likelihood ratios can be used to evaluate the performance of automatic and aural methods in forensic speaker recognition applications.
Conclusions
While there is certainly no perfect solution available in the field of forensic speaker recognition at present, the scientific community is under a moral obligation to contribute whatever it can to aid the course of justice and to establish scientifically founded methodology and techniques.
What is clearly needed are joint research initiatives by forensic scientists and speech engineers to study the problems arising from the actual technology and from the practical work of forensic experts, and to gain a more complete insight into the concept of the individuality of voice.
Considering recent advances in automatic speaker verification technology, especially with regard to robustness of parameters, enlarged sizes of speaker groups and new statistical algorithms, forensic scientists expect a major contribution from the speech engineering side as far as automatic speaker recognition is concerned.