
Forensic Speech Science

Part I: Forensic Phonetics


Anders Eriksson
Department of Linguistics,
Gothenburg University,
Gothenburg, Sweden
Historical background
Man has always had strong
intuitions about the reliability of
voice recognition:
The voice of the speaker is as
easily distinguished by the ear as
the face is by the eye
Quintilian, 35–96 AD
Historical background
An early court case:
In 1660, William Hulet was
accused of having executed King
Charles I. A witness, Richard
Gittens, testified that he knew that
it was Hulet by his speech.
Historical background
On March 1, 1932 the son of the famous aviator Charles Lindbergh was kidnapped and was later found dead. The crime has been called the Crime of the Century because of the enormous publicity it attracted. Its interest for forensic phonetics, however, has to do with voice recognition and memory.
Before it became known that the boy was dead, a ransom was paid to the kidnapper by a negotiator. On that occasion, April 2, 1932, Lindbergh heard the kidnapper's voice, but could not see him.
Historical background
In September 1934, 29 months after
hearing the voice of the kidnapper,
Lindbergh (in disguise) was confronted
with the suspected kidnapper, Bruno
Hauptmann, who was instructed to
repeat the phrase Lindbergh had heard.
Lindbergh then claimed that he
recognized the voice as the one he had
heard 29 months earlier.
Historical background
At the trial in January 1935, Lindbergh testified under oath that the suspect's voice was the one he had heard 29 months earlier.
Historical background
The invention of
the sound
spectrograph meant
a breakthrough in
speech analysis. A
first model was
built at Bell Labs
in the early forties.
Historical background
The original motivation behind the development of the spectrograph was the phonetic study of speech:
"a method of approach to studies of speech production and measurement"
Steinberg, 1934
Historical background
A real time spectrograph called Direct
Translator was also produced, to be
used for pronunciation training for the
deaf and foreign language students.
Historical background
In spite of the general interest of the
spectrograph as a tool and the
suggested applications, no
publications describing the work
appeared from Bell Labs until 1945.
Why?
Because the work was rated as a war
project.
Historical background
The reason could hardly have been military applications of pronunciation training for the deaf. It must have been something else.
We have reasons to believe that
speaker identification by the use of
spectrograms was what gave the
research its war project rating.
Historical background
It has been suggested that one of the intended applications was identifying enemy warships by identifying their radio operators, but very little is known about it.
The term "voiceprint" appears in
some publications but without
explicit reference to speaker
identification.
Historical background
If the people at Bell Labs, sponsored by the military, secretly worked on voiceprints for speaker identification purposes, as we have reason to believe, then the early history of voiceprints follows closely parallel tracks in the USSR, including the fact that we know very little about it.
The only (?) account of the Soviet efforts we
have is the novel The First Circle by
Solzhenitsyn.
Historical background
The plot of the novel takes place within a
time-span of only three days during the
Christmas Holiday of 1949 and the setting is
the Mavrino prison on the outskirts of Moscow, where the Stalinist regime kept unreliable scientists imprisoned.
The prison had its own acoustics laboratory
and the so called Clipped Speech Laboratory
where work on speech coding took place.
Historical background
One day the focus shifted, at least
temporarily, from voice clipping to voice
recognition, when the people working in
the lab were given the task of identifying an anonymous speaker in a tapped telephone
conversation by comparing the recorded call
with sample recordings of five suspects.
They were given only two days to complete
the task.
Historical background
Given that Siberia was a likely alternative
option, it comes as no surprise that they
succeeded with their task.
There is no detailed information in the
novel about the methods they used, but it
is obvious that they were familiar with
similar efforts outside the USSR.
Historical background
Based on the description in the novel, it seems likely that the spectrograph they used was based on the description by Steinberg published in JASA in 1934.
Historical background
This diagram in Steinberg's paper fits the description in the novel very well.
Historical background
Screenshots from the American television series (1991) based on The First Circle.
Historical background
Two quotations from the novel:
"The science of phonoscopy, born today, December 26th 1949, does have a rational core."
"They envisioned the system, like fingerprinting ... Any criminal conversation would be recorded ... and the criminal would be caught straight off, like a thief who had left his fingerprints on the safe door."
Historical background
The term the inmates at Mavrino coined for the use of acoustic analysis as a means of speaker identification was phonoscopy. In Russia and many former Eastern European countries this is still the term used today.
Some fundamental issues
In the following sections we will present a selection of important issues in forensic phonetics, trying to describe problems as well as solutions, and what we do and do not yet know.
Voiceprints
Much of the story of voiceprinting in forensic phonetics revolves around one particular man, Lawrence G. Kersta, who was an engineer at Bell and head of the lab until he resigned in 1966 to start his own company dedicated to forensic phonetics.
Voiceprints
Between 1945, when people at Bell started to publish again, and 1962 there was no mention of voiceprints.
But in 1962 Kersta, still at Bell, published a paper in Nature titled "Voiceprint identification".
Voiceprints
He also gave a paper at the ASA meeting that same year called "Voiceprint-identification infallibility".
In both papers he described how
spectrograms could be used for
speaker identification.
Voiceprints
What made his claims so remarkable
was, however, the accuracy he
claimed for his method.
Based on visual comparison of key
words, his examiners achieved no
less than 99% correct identification
or better.
Voiceprints
In spite of his rather sensational claims
and the fact that his description of the
method was vague, to say the least, the
scientific community was slow to
react.
Up until 1966, when he resigned from Bell to start his own company, he remained largely unchallenged.
Voiceprints
He therefore enjoyed some initial
success and his testimonies were
accepted as evidence by courts in
some, but not all, states.
He later began to meet with resistance,
however, when other researchers
tested the method of visual voice
recognition from spectrograms.
Voiceprints
Subjects in a study by Young and
Campbell (1967), for example, using
the voiceprint technique, obtained
78.4% correct identifications for two
words spoken in isolation but only
38.3% when the same words were
taken from different contexts.
Voiceprints
Many others joined in as more and
more results indicated that the
method was by no means as
reliable as Kersta had claimed.
But there were also those who
supported him, most notably Tosi
who was a qualified phonetician.
Voiceprints
A weak point in addition to the fact
that the results could not be
reproduced was that there was never
a detailed, explicit description of the
method. We may rather safely
assume, however, that it was largely
intuitively based.
Voiceprints
The controversy continued until the
late eighties, and voiceprinting is still
done by private detectives and other
non-academic experts, but nobody
in the speech science community
believes in its usefulness for forensic
purposes any more.
Voiceprints
What we, as forensic phoneticians,
may learn from this experience is not
so much that the methods were not
sufficiently reliable, but that they
were put to use in forensic field work
without having been thoroughly tested
and that professional phoneticians
were far too slow to react to it.
Voice recognition and memory
As we mentioned, the Lindbergh case raised questions about voice recognition accuracy and memory. A researcher who questioned whether it would be possible to accurately remember an unknown voice over a period of two years was a psychologist by the name of Frances McGehee.
Voice recognition and memory
In the first of her experiments the listeners heard a speaker read a 56-word passage. They were then assigned to groups who heard the speaker as one of the speakers in a voice line-up with five foils at intervals of 1, 2, and 3 days, 1, 2, and 3 weeks, and 1, 3, and 5 months respectively.
Voice recognition and memory
Recognition rate varied as a function
of time starting at a little over 80%
correct identifications after a lapse of 1
day or 1 week. After 2 weeks the
recognition rate had fallen to 69%,
after a month to 57%, after 3 months
to 35% and after 5 months it was down
to 13%, which is less than chance.
Voice recognition and memory
Later studies have in general confirmed her
findings although the precise decay rate
may vary from study to study.
[Figure: correct identifications (%) as a function of time lapse (weeks).]
Non-contemporary speech samples
The term refers to speech samples that are obtained at different points in time and later used in an identification process. The relevant question in forensic phonetics is at what separation in time between speech samples change over time becomes a problematic factor.
Non-contemporary speech samples
In forensic cases time spans of a year
or more between a suspect recording
and a later attempt at identifying the
speaker are not unusual. It is therefore
important to know if voice changes
that take place over a period of one or
a few years may affect the accuracy of
speaker recognition.
Non-contemporary speech samples
This question has been addressed in
a series of studies by Hollien and
Schwartz (2000).
They tested latencies between
recordings from 4 weeks up to 20
years.
Non-contemporary speech samples
There was a drop in correct
identification from around 95% for
contemporary samples to 70–85% for
latencies from 4 weeks to 6 years
(with no observable time trend in the
interval). For the 20-year latency,
however, a sharp drop down to 35%
could be observed.
Non-contemporary speech samples
For similar voices,
however, there was a
dramatic effect.
Performance dropped
from around 95% for
contemporary
samples to 40% for
samples recorded
only 4 weeks later.
In the normal case, non-contemporary speech thus
seems to affect identification only marginally.
Other issues involving the sample
Other factors that may influence identification
accuracy are primarily sample duration and
acoustic quality.
If we first consider the influence of sample
duration, we may observe that in real life
investigations samples may be very short, often
just a few words or a phrase or two, which
means that sample duration is on the order of a
few seconds.
Other issues involving the sample
In an early study by Pollack et al. (1954) the
authors observed that identification accuracy
increased with sample size but only up to about
1.2 seconds. For longer samples phonetic
variation took over as the most important factor.
They conclude that "duration per se is relatively unimportant, except insofar as it admits a larger or smaller statistical sampling of the speaker's speech repertoire."
Other issues involving the sample
This somewhat surprising finding has,
however, been confirmed in other
studies. Bricker and Pruzansky (1966)
presented stimuli which varied in
duration as well as phonemic variation.
They found that identification rate
increased with duration only if the
longer stimuli also contained more
phonemic variation.
Other issues involving the sample
It is important to point out, however, that
while an increase in correct identifications
is desirable it is equally desirable to keep
the number of false alarms down.
Yarmey and Matthys (1992) found that "the facilitating effect on identification of longer voice-sample durations was counteracted by the high false alarm rates in both suspect-present and suspect-absent line-ups."
Other issues involving the sample
A large proportion of threats and abuse is committed over the telephone. Telephone quality
speech has therefore received some
attention in forensic phonetics studies.
An important question in the forensic
context is whether the poorer sound quality
of recorded telephone conversations
adversely affects voice identification.
Other issues involving the sample
It is a common belief that because of the
difference in sound quality, speaker
identification of voices heard over the
telephone must necessarily be performed
using voices recorded over the telephone,
the underlying assumption being that the
difference in sound quality would make
identification less reliable if directly
recorded voice samples were used.
Other issues involving the sample
There are surprisingly few studies that address this question, but the results that exist indicate that the problem might not be as serious as one might expect.
Rathborn et al. (1981) did not find any significant differences in identification of a target voice heard over the telephone and tested using a taped lineup over the telephone, in contrast to voice identification tested directly with a taped lineup.
Other issues involving the sample
A question that has received some attention lately is the influence of the band-pass filtering that occurs in telephone transmissions on the acoustic analysis of voice samples.
Künzel (2001) found that the lower cut-off frequency had the effect of shifting F1 in German vowels upwards compared to the corresponding tokens in a simultaneous DAT recording. The average frequency shift was on the order of 6%.
Familiarity with the speaker
Hollien et al. studied speaker identification as a function of familiarity under three speaking conditions: normal, stressed, and disguised.
Listeners who were
familiar with the
speakers performed
significantly better under
all conditions.
Familiarity with the speaker
These results have generally been confirmed
in other studies.
It is important to point out, however, that
although recognition rates are generally high
for familiar speakers, recognition is by no
means always perfect. For individual
speakers and listeners the error rates can be
very high if the utterances are short and
belong to a fairly large open set. (Ladefoged
& Ladefoged, 1980)
Familiarity with the speaker
An influence of utterance length on the
recognition of familiar speakers has also
been found in other studies.
In a series of experiments reported by Rose
and Duncan (1995), recognition of familiar
speakers varied from chance level to nearly
perfect as a function of utterance length.
Familiarity with the speaker
It has been generally assumed that in
voice recognition, discrimination
constitutes the initial step with recog-
nitionoccurring as a later phase.
But Van Lancker et al. have shown
that discrimination and recognition
are not stages in one process, but are
dissociated, unordered abilities.
Familiarity with the speaker
It is therefore entirely possible that
a listener who is good at
recognizing familiar speakers may
perform badly if the task is to
discriminate between unfamiliar
speakers.
Disguise
Voice disguise, to the extent that it is
used, may be a serious problem for
speaker identification. At the extreme
end of the spectrum we find electronic
manipulation or even communicating
via speech synthesis, which would
make speaker identification virtually
impossible.
Disguise
In the world of real forensic work,
however, voice disguise tends to be of a
rather unsophisticated nature.
Künzel, based on experience from the German Federal Police (BKA), notes that falsetto, persistent creaky voice, whispering, faking a foreign accent, and pinching one's nose are the most common types.
Disguise
Even unsophisticated types of disguise
may have a considerable detrimental effect
on speaker identification. In a study by
Reich and Duke all types produced
significantly fewer correct identifications.
Hypernasality produced the greatest effect.
Whisper resulted in markedly fewer
correct identifications in a study by
Orchard and Yarmey.
Disguise
Voice disguise is not as common as
one might think. Künzel reports that:
"Over the last two decades, between 15 and 25 per cent of the annual cases dealt with at the BKA speaker identification section exhibited at least one kind of disguise."
Disguise
Electronically manipulated messages are
still rare, but Künzel notes that there has
been an increase in recent years, mainly
in the form of editing recorded voices.
While at present electronic manipulation
is rare and therefore not a significant
problem, that may soon change, with
increasing availability of such devices.
Foreign Accents
It is generally found that foreign accent makes identification more difficult, but the difference is often small and not always present.
McGehee found no difference at all using speakers with a German accent.
Doty (1998), on the other hand, found substantial differences (88% vs. 13%) using speakers from the US and England speaking English as a native language, speakers from France and Belize speaking English as a foreign language, and native speakers of English as listeners.
Foreign Accents
Results by Goldstein, et al. (1981) fall
somewhere in between: With relatively
long speech samples, accented voices
were no more difficult to recognize than
were unaccented voices; reducing the
speech sample duration decreased
recognition memory for accented and
unaccented voices, but the reduction was
greater for accented voices.
Foreign languages
Thompson (1987) recorded six bilingual
male students reading messages in English,
Spanish, and English with a strong Spanish
accent.
Voices were best identified by monolingual
English speaking listeners when speaking
English and worst when speaking Spanish.
Identification accuracy was intermediate
for the accent condition.
Foreign languages
Schiller and Köster (1996) tested Americans with no knowledge of German, Americans who knew some German, and native German speakers, using recordings of German speakers.
Subjects with no knowledge of German made significantly more errors than the other subjects. Subjects who knew some German performed similarly to native German speakers.
Foreign languages
Köster and Schiller (1997) used Spanish and Chinese listeners.
Spanish and Chinese listeners who were familiar with German showed better recognition rates than listeners with no knowledge of German.
Spanish and Chinese listeners with a knowledge of German performed measurably worse than the German and English listeners with a knowledge of German.
Foreign languages
We may summarize the results by saying that listeners with no knowledge of a language perform worse on voice recognition than listeners with some knowledge or native speakers, while listeners with some knowledge of the language tend to perform on the same level as native speakers or only slightly below.
Earwitnesses
Factors that are relevant for speaker recognition in general, like memory, familiarity, and disguise, are also relevant for earwitnesses, but there are additional factors about which we presently do not know as much as we would like.
Earwitnesses
The first such factor is stress.
"the majority of (the relatively few) studies of earwitnessing bear little resemblance to real-life witnessing circumstances. Most have used nonstressful situations with prepared subjects participating in laboratory situations"
Bull and Clifford (1984)
Earwitnesses
The stress that witnesses may experience
in a real life situation can never be fully
recreated in a laboratory experiment.
Neither can we, or the witness, have much
experience to draw on that will help us
determine just how and how much the
capabilities of a traumatized victim to
recognize a voice or discriminate between
voices may be affected.
Earwitnesses
Another factor is familiarity.
Personal experience of voice recognition is always of familiar voices, yet the voices to be identified in criminal situations are not usually familiar ones (Bull and Clifford).
And as we know from the work by Van
Lancker and Kreiman, recognizing a
familiar voice and discriminating between
unfamiliar ones are independent abilities.
Earwitnesses
A third factor is preparedness.
Whereas subjects in a laboratory experiment are, to a greater or lesser degree, prepared for the situation, real-life witnesses are in most cases not.
Studies have shown that voice
identification accuracy under unprepared
conditions is much lower.
Earwitness line-ups
An earwitness line-up (or voice parade) is meant to be the auditory equivalent of an eyewitness line-up. It is used when a person has heard but not seen the perpetrator.
Recordings of a suspect's voice and a number of foils are presented, and the witness is to compare the voices with the memory of the perpetrator's voice and determine if any of them matches that memory.
Earwitness line-ups
Two important questions in connection with earwitness line-ups are:
1) how many voices should be present in the line-up?
2) how similar to the suspect's voice should the voices of the foils be?
Earwitness line-ups
It has been found that with few voices there may be marked position effects and that the number of correct identifications decreases as lineup size increases. So the question is whether there is an optimal size where the position effect is minimized and the decrease in correct identifications has bottomed out.
Earwitness line-ups
A number of studies have addressed the question of lineup size. They are in reasonable agreement that the decrease in identification accuracy bottoms out with about 6 foils and that position effects only appear if the target voice comes first. Thus, as a rule of thumb at least, 5 or 6 foils should be used.
Earwitness line-ups
How similar to the target should the foils be?
At least the two extremes must be avoided. The target voice must not stand out as different; the speakers must be reasonably matched with respect to characteristics like speaker age, dialect, etc.
On the other hand, they should not be sound-alikes.
Earwitness line-ups
When Rothman (1977) used sound-alikes (brothers, fathers, sons), identification dropped from 94% (ordinary foils) to 58% (sound-alikes).
Similar results were obtained by Hollien
and Schwartz (2000).
Thus foils should be chosen so as to
represent a reasonable degree of
variation but avoiding the extremes.
Lie detection
Attempts have been made recently to use
brain scanning methods in order to study
the possibility of consistent differences
in brain activity patterns which separate
lie or deception from truthful statements.
Although this research is only in its
infancy, some highly interesting results
have been obtained.
Lie detection
Langleben et al. (2002) used Functional Magnetic Resonance Imaging (fMRI) to detect differences in brain activity when their subjects told a lie compared to when they told the truth. Their results indicate that "there is a neurophysiological difference between deception and truth at the brain activation level that can be detected with fMRI." Similar results have been obtained in other studies.
Lie detection
High-resolution thermal imaging, which can detect minor regional changes in the blood flow in the face, for example, has also been used in an attempt to develop methods to detect lies and deception (Pavlidis and Levine, 2002).
Lie detection
We should be aware that these are very preliminary results. Whether, and when, these methods can be put to use in forensic fieldwork will not be known for many years to come. We must also be aware that there may be a very long way to go between research results and reliable field applications.
Lie detection
Unfortunately, this caution is not always exercised. "Unproven technologies are becoming increasingly attractive to US law enforcement and security agencies ... Laboratory tools from infrared sensors to eye trackers are being converted into lie detectors" (Knight 2004).
Overgeneralization, charlatanry, fraud
The best-known lie detector is the so-called Polygraph. Its first appearance can be dated back to 1917. A more refined version was used in a court case in 1923, and Polygraphs have been used, with some refinements, ever since.
Overgeneralization, charlatanry, fraud
The basic idea behind the Polygraph is that lying increases the level of stress, and if you can register the involuntary reactions we know to be correlated with stress (respiration, pulse, blood pressure, and galvanic skin response, e.g. palm sweating), these signs can be used to detect lies and deception.
Overgeneralization, charlatanry, fraud
A typical Polygraph setup.
Overgeneralization, charlatanry, fraud
The problem with the Polygraph as a lie
detector lies in the interpretation.
Correlations between stress levels and
pulse for example are found as group
results. To generalize from group results
to individuals is, of course, not a valid
step. Neither is it a valid step to conclude
that a person who experiences stress
must necessarily be lying.
Overgeneralization, charlatanry, fraud
The basic idea behind lie detectors based on voice analysis is that there are properties in the voice signal that may be reliably correlated with lies or deception. Voice stress analysis (VSA), based on the monitoring of so-called micro tremor, is one such method.
Overgeneralization, charlatanry, fraud
But whereas there are scientifically established correlations between stress and the indicators used by the Polygraph, there is no scientific basis for voice stress analysis whatsoever. The few in-depth studies there are of micro tremor in the larynx indicate that it does not even exist.
Overgeneralization, charlatanry, fraud
But it does make pretty diagrams!
Overgeneralization, charlatanry, fraud
So what the VSA analyzers do is measure the variation in something that isn't even there, in itself an achievement of sorts.
If the people who use these gadgets don't know any better, we may be generous enough to call it charlatanry, the alternative being fraud, of course.
Overgeneralization, charlatanry, fraud
Finally, an example which without the slightest doubt may be classified as fraud. An Israeli-based company markets the most wonderful tools, including both lie detectors and "love detectors". The technique behind the lie detector is said to be something called Layered Voice Analysis (LVA).
Overgeneralization, charlatanry, fraud
Here is how they claim it works:
"Every event that passes through the brain will leave its fingerprints on the speech flow. LVA Technology ignores what your subject is saying, and focuses only on his brain activity. In other words, the 'how it is said' is crucial and not the 'what'."
Overgeneralization, charlatanry, fraud
They are careful not to explicitly call the gadget a lie detector, but there is absolutely no question that that is what they want us to believe it is:
"LVA is capable of detecting the intention behind the lie, and by so doing can lead you in identifying and revealing the lie itself."
Overgeneralization, charlatanry, fraud
There is, of course, not a shred of
evidence for a relationship between
voice and brain activity of the proposed
kind. And a thorough scrutiny of the
description of the method in the
American patent documents confirms
the suspicion that the method is pure
nonsense, perhaps best described as
statistics based on digitization artefacts.
Overgeneralization, charlatanry, fraud
The statistics are based upon what are defined as "thorns" and "plateaus", which have no relevance at all for voice analysis and are moreover dependent on how the signal is sampled.
Overgeneralization, charlatanry, fraud
Gadgets like these do not deserve to be
taken seriously as such, but their use in
forensic investigations must be. If bogus
lie detectors like the ones described here
are used not just by shady private
investigators, but by insurance
companies, police departments and
security agencies, this poses a threat that
we must oppose more actively.
FORENSIC SPEECH SCIENCE
Forensic Automatic Speaker Recognition
Dr. Andrzej Drygajlo
andrzej.drygajlo@epfl.ch
Speech Processing and Biometrics Group
Signal Processing Institute (ITS -LIAP)
Swiss Federal Institute of Technology Lausanne (EPFL)
School of Criminal Sciences
University of Lausanne
Biometric characteristics in forensic applications
Biological traces
DNA (DeoxyriboNucleic Acid), blood, saliva, etc.
Biological (physiological) characteristics
fingerprints, eye irises and retinas, hand palms and
geometry, and facial geometry
Behavioral characteristics
dynamic signature, gait, keystroke dynamics, lip motion
Combined
voice
Popular biometric characteristics (modalities)
Fingerprint
Voice
Face
Retina
Signature
Iris
Forensic Biometric Applications
Forensic Biometrics
Individualisation of human beings
Challenge: to automate forensic biometric methods
Existing systems and databases
Automatic Fingerprint Identification System (AFIS, US-made) and fingerprint databases
DNA sequencers and DNA databases
Challenge: Large scale automatic systems and
databases for: speech, handwriting, face images,
earmarks, etc.
Constraints
Systems developed according to specified
recommendations from:
Tool perspective (recognition and computer
technology)
Forensic expert perspective (methodology)
Criminal policy perspective (investigation)
Legal perspective (impact of the application of the data and privacy protection law on the efficiency of the methods used)
Judicial perspective (the role of the court)
Law enforcement and forensic applications
The law enforcement applications include the use of
biometrics to recognize individuals
Apprehended or incarcerated because of criminal activity
Suspected of criminal activity
Whose movement is restricted as a result of criminal activity
The biometric may be used to identify non-cooperative and
unknown subjects, to ensure that the correct inmates are
released, or to verify that individuals under home arrest are
in compliance
Forensic Speaker Recognition
Aural-perceptual methods
earwitnesses, line-ups
Visual methods and voiceprint?
visual comparison of spectrograms of linguistically identical
utterances ("utterly misleading!")
Aural-instrumental methods
analytical acoustic approach combined with an auditory phonetic
analysis
Automatic methods
Speaker verification not adequate
Speaker identification not adequate
Bayesian framework for the evaluation of identity
Forensic specificity
Short utterances
Questioned recording - uncontrolled environment
Investigations in controlled conditions (longer utterances)
Telephone quality (95%)
Clear understanding of the inferential process
Respective duties of the actors involved in the judicial
process: jurists, forensic experts, judges, etc.
The forensic expert's role is to testify to the worth of the evidence by using, if possible, a quantitative measure of this worth.
It is up to the judge and/or the jury to use this information as an aid
to their deliberations and decision.
Forensic Expert's Role
A forensic expert testifying in court to a conclusion in an
individual case is not an advocate, but a witness who
presents factual information and offers a professional
opinion based upon that factual information.
Expert opinion testimony is, and will remain, one of the
most powerful forms of evidence in the courtroom.
In order for it to be effective, it must be carefully
documented, and expressed with precision, but without
overstatement, in as neutral and objective a way as the
adversary system permits.
Professional concepts must be articulated in a way lay
persons (like the judge and the lawyers) can understand.
Individual Case
[Diagram: casework in an individual case compares the trace (a questioned recording) with the suspect material (a suspected speaker reference database or a single suspected speaker recording).]
Adversary System
The speaker at the origin of the questioned recording is not the suspected speaker
versus
the suspected speaker is the source of the questioned recording
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Automatic Speaker Recognition
Speaker recognition is the general term used to include all
of the many different tasks of discriminating people based on
the sound of their voices.
Speaker identification is the task of deciding, given a
sample of speech, who among many candidate speakers
said it. This is an N-class decision task, where N is the
number of candidate speakers.
Speaker verification is the task of deciding, given a sample
of speech, whether a specified candidate speaker said it.
This is a 2-class decision task and is sometimes referred to
as a speaker detection task.
Principal structure of speaker recognition systems
[Block diagram: the speech wave passes through feature extraction; in training, reference templates/models are built for each speaker; in recognition, the extracted features are compared (similarity/distance) with those reference templates/models to produce the recognition results. Decision / interpretation?]
Principal structure of speaker recognition systems
[Block diagram: speech wave → feature extraction → similarity (distance) against models for each speaker (built in training) → score.]
Text-dependent methods:
- Dynamic Time Warping (DTW)
- Hidden Markov Models (HMMs)
Text-independent methods:
- Vector Quantization (VQ)
- Gaussian Mixture Models (GMMs)
Feature Extraction
[Figure: the speech signal is divided into frames; each frame is multiplied by a window and converted into a feature vector.]
Gaussian Mixture Model (GMM)
[Figure: acoustic feature vectors used for training the GMM; each vector v(1), v(2), ..., v(T) has D components (Feature 1, Feature 2, ..., Feature D), and per-feature histograms illustrate the distributions that the mixture models.]
score = log-likelihood (speech | model)
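As an illustrative sketch of GMM-based speaker modelling (using scikit-learn's GaussianMixture; the synthetic features and the number of mixture components are arbitrary assumptions, not the system described here): one GMM is trained on a speaker's feature vectors, and a recording is scored by its average log-likelihood under that model, i.e. score = log-likelihood(speech | model).

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Stand-ins for MFCC-like feature matrices (n_frames x n_features)
train_feats = rng.normal(size=(2000, 13))   # speaker's training speech
test_feats  = rng.normal(size=(300, 13))    # recording to be scored

# Speaker model: a GMM over the speaker's acoustic feature vectors
speaker_gmm = GaussianMixture(n_components=16, covariance_type="diag",
                              random_state=0).fit(train_feats)

# score = average log-likelihood of the test frames given the speaker model
score = speaker_gmm.score(test_feats)
print(f"log-likelihood(speech | model), per frame: {score:.2f}")
```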
Speaker Verification
The odds form of Bayes' theorem:
H0: the speaker's model and the tested recording (T) have the same source
H1: the speaker's model and the tested recording (T) have different sources

\[ \frac{P(H_0 \mid T)}{P(H_1 \mid T)} = \frac{P(T \mid H_0)}{P(T \mid H_1)} \cdot \frac{P(H_0)}{P(H_1)} \]

The first factor on the right-hand side is the likelihood ratio; verification accepts H0 when

\[ \frac{P(T \mid H_0)}{P(T \mid H_1)} > \theta_0 \]

where θ0 is the decision threshold.
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Interpretation of Evidence
Bayesian interpretation (BI)
Principle
The Bayesian model, proposed for forensic speaker recognition by Lewis in 1984, allows for revision, based on new information, of a measure of uncertainty (the likelihood ratio of the evidence, the province of the forensic expert), which is applied to the pair of competing hypotheses.
The Bayesian model shows how new data (questioned recording)
can be combined with prior background knowledge (prior odds
(province of the court)) to give posterior odds (province of the
court) for judicial outcomes or issues.
prior odds × likelihood ratio = posterior odds
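A purely illustrative numeric example (the numbers are invented, not taken from any case): if the court's prior odds in favour of H0 were 1 to 100 and the expert reported a likelihood ratio of 75, then

\[ \frac{P(H_0 \mid E)}{P(H_1 \mid E)} = LR \times \frac{P(H_0)}{P(H_1)} = 75 \times \frac{1}{100} = 0.75 \]

i.e. a posterior probability for H0 of 0.75 / 1.75 ≈ 0.43: the evidence supports H0 but does not by itself establish it.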
Strength of Evidence
Bayesian interpretation (BI)
\[ \frac{P(H_0 \mid E)}{P(H_1 \mid E)} = \frac{P(E \mid H_0)}{P(E \mid H_1)} \cdot \frac{P(H_0)}{P(H_1)} \]

Prior odds (prior background knowledge; province of the court) × Likelihood Ratio (LR, from the new data; province of the forensic expert) = Posterior odds (posterior knowledge on the issue; province of the court).
Voice as Evidence
In the case of a questioned recording (trace), the evidence does not consist of the speech itself, but of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect, represented by his/her model.
Voice as Evidence
[Diagram: the questioned recording (trace) and the suspected speaker reference database (R) both undergo feature extraction; the suspect's features yield the suspected speaker model; the similarity (distance) between the trace and that model produces a score, the evidence (E). Its signification is given by Bayesian interpretation.]
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Bayesian Interpretation of Evidence
The odds form of Bayes' theorem:
H0: the suspected speaker is the source of the questioned recording (within-source variability)
H1: the speaker at the origin of the questioned recording is not the suspected speaker (between-sources variability)

\[ \frac{P(H_0 \mid E)}{P(H_1 \mid E)} = \frac{P(E \mid H_0)}{P(E \mid H_1)} \cdot \frac{P(H_0)}{P(H_1)} \]

The likelihood ratio P(E | H0) / P(E | H1) expresses the strength of evidence: its numerator measures similarity, its denominator typicality.
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Uni- and Multivariate Methods
Scoring Method: likelihood calculated from distributions of scores modeling within-source and between-sources variability
H0: distribution of scores of within-source variability
H1: distribution of scores of between-sources variability
3 databases: Suspect Reference Database (R), Potential Population Database (P), Suspect Control Database (C)

Direct Method: likelihood directly calculated from the GMM of the suspect and the GMMs of the potential population
H0: GMM of the suspect
H1: GMMs of the potential population
2 databases: Suspect Reference Database (R), Potential Population Database (P)

Databases used:
R = 5 utterances per speaker (2-3 min each)
P = 100 speakers (2-3 min each)
C = 30-40 utterances per speaker (10-20 sec each)
Corpus Based Methodology
3 databases (DBs)
Potential population database (P)
Large-scale database used to model the potential
population of speakers to evaluate the between-sources
variability
Suspected speaker reference database (R)
Database recorded with the suspected speaker to model
her/his speech
Suspected speaker control database (C)
Database recorded with the suspected speaker to
evaluate her/his within-source variability
Scoring Method
[Diagram: in casework, the trace is compared with the suspect (suspected speaker reference database R and control database C) and with the relevant population (potential population database P).]
Within-source variability
[Diagram: features extracted from the suspected speaker reference database (R) are used to build the suspected speaker model; features extracted from the suspected speaker control database (C) are compared (similarity/distance) with that model, and the resulting scores form the distribution of the within-source variability.]
Between-sources Variability
[Diagram: features extracted from the trace (questioned recording) are compared (similarity/distance) with the speaker models of the potential population database (P), and the resulting scores form the distribution of the between-sources variability.]
Evaluation of the within-source variability
[Histogram: occurrences vs. similarity scores, from comparison of the suspected speaker model with the utterances of his control database (C).]
Evaluation of the between-sources variability
[Histogram: occurrences vs. similarity scores, from comparison of the trace with the speaker models of the potential population database (P).]
Likelihood ratio
P(E | H0) / P(E | H1) = 0.15 / 0.002 = 75
[Figure: estimated probability vs. similarity score for the two distributions, evaluated at the evidence score E = 6.]
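A minimal sketch of how such a likelihood ratio could be estimated, assuming Gaussian kernel density estimates of the within-source and between-sources score distributions; all scores below are synthetic and chosen only to mimic the shape of the example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)

# Synthetic similarity scores (illustration only)
within_scores  = rng.normal(loc=7.0, scale=1.5, size=400)    # suspect model vs. control database (C)
between_scores = rng.normal(loc=2.0, scale=1.5, size=2000)   # trace vs. potential population (P)

# Kernel density estimates of the two score distributions
p_within  = gaussian_kde(within_scores)
p_between = gaussian_kde(between_scores)

E = 6.0                                    # similarity score of the evidence
lr = p_within(E)[0] / p_between(E)[0]      # LR = P(E | H0) / P(E | H1)
print(f"Likelihood ratio at E = {E}: {lr:.1f}")
```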
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Strength of Evidence - Likelihood ratio
A likelihood ratio of 9.16 means that it is 9.16 times more likely to observe the score (E) given the hypothesis H0 (the suspect is the source of the questioned recording) than given the hypothesis H1 (another speaker from the relevant population is the source of the questioned recording).
DET (Detection Error Tradeoff) curve
A DET curve can be computed from the distributions of scores by varying the decision threshold.
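A minimal sketch of this computation with plain NumPy and synthetic scores (all values invented for illustration): sweep a decision threshold over same-speaker and different-speaker score distributions and record the miss and false-alarm rates that make up a DET curve.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic scores for same-speaker and different-speaker trials (illustration only)
target_scores    = rng.normal(2.0, 1.0, size=1000)
nontarget_scores = rng.normal(0.0, 1.0, size=1000)

# Sweep the decision threshold and record the two error rates of a DET curve
thresholds = np.linspace(-4.0, 6.0, 200)
miss   = np.array([(target_scores < t).mean() for t in thresholds])       # false rejections
falarm = np.array([(nontarget_scores >= t).mean() for t in thresholds])   # false acceptances

# The equal error rate is roughly where the two curves cross
eer_idx = int(np.argmin(np.abs(miss - falarm)))
print(f"Approximate EER: {miss[eer_idx]:.3f} at threshold {thresholds[eer_idx]:.2f}")
```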
Analysis and comparison
[Diagram: the trace and the suspected speaker control database (C) undergo feature extraction; the potential population database (P) and the suspected speaker reference database (R) undergo feature extraction and modelling, yielding the relevant speakers' models and the suspected speaker model. Comparative analysis of the trace features with the suspected speaker model yields the evidence (E); comparative analysis of the control-database features with the suspected speaker model, and of the trace features with the relevant speakers' models, yields the two sets of similarity scores.]
Interpretation of the evidence
[Diagram: one set of similarity scores is used for modelling the within-source variability, whose distribution, evaluated at the evidence (E), gives the numerator of the likelihood ratio; the other set is used for modelling the between-sources variability, whose distribution, evaluated at E, gives the denominator; their ratio is the likelihood ratio (LR).]
Individual Case
[Diagram: casework in an individual case compares the trace (a questioned recording) with the suspect material (a suspected speaker reference database or a single suspected speaker recording).]
Scoring Method with Limited Suspect Data
The odds form of Bayes' theorem:
H0: the two recordings have the same source
H1: the two recordings have different sources

\[ \frac{P(H_0 \mid E)}{P(H_1 \mid E)} = \frac{P(E \mid H_0)}{P(E \mid H_1)} \cdot \frac{P(H_0)}{P(H_1)} \]

The likelihood ratio P(E | H0) / P(E | H1) gives the strength of evidence with respect to these new hypotheses.
Direct Method
The odds form of Bayes' theorem:
H0: the speaker's model and the questioned recording (T) have the same source
H1: the speaker's model and the questioned recording (T) have different sources

\[ \frac{P(H_0 \mid T)}{P(H_1 \mid T)} = \frac{P(T \mid H_0)}{P(T \mid H_1)} \cdot \frac{P(H_0)}{P(H_1)} \]

Likelihood ratio: P(T | H0) / P(T | H1) (> θ0?). Strength of evidence?
Multivariate (Direct) Method LR Numerator
[Diagram: features extracted from the suspected speaker reference database (R) are used to build the suspected speaker model; the questioned recording (trace) is scored against this model.]
Numerator of the likelihood ratio: score = log-likelihood (trace | H0)
Multivariate (Direct) Method LR Denominator
[Diagram: a model of the potential population, built from all speakers of the potential population database (P), is used; the questioned recording (trace) is scored against this model.]
Denominator of the likelihood ratio: score = log-likelihood (trace | H1)
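Putting the two scores together, here is a minimal sketch of the direct method under the assumption that both the suspect model and the potential-population model are Gaussian mixtures (all data synthetic, component counts arbitrary): the likelihood ratio follows from the difference of the two average log-likelihoods.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Illustrative feature matrices (frames x coefficients)
suspect_feats    = rng.normal(0.3, 1.0, size=(2000, 13))    # reference database (R)
population_feats = rng.normal(0.0, 1.0, size=(20000, 13))   # potential population database (P)
trace_feats      = rng.normal(0.3, 1.0, size=(300, 13))     # questioned recording

gmm_suspect = GaussianMixture(n_components=16, covariance_type="diag",
                              random_state=0).fit(suspect_feats)
gmm_world   = GaussianMixture(n_components=64, covariance_type="diag",
                              random_state=0).fit(population_feats)

# Average per-frame log-likelihoods under the two hypotheses
loglik_h0 = gmm_suspect.score(trace_feats)   # log-likelihood(trace | H0)
loglik_h1 = gmm_world.score(trace_feats)     # log-likelihood(trace | H1)
print(f"log10 likelihood ratio (per frame): {(loglik_h0 - loglik_h1) / np.log(10):.2f}")
```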
Evaluation of the Strength of Evidence
Principle
Estimation and comparison of the likelihood ratios that can be obtained from the evidence E:
when the hypothesis H0 is true: the suspected speaker truly is the source of the questioned recording (trace)
when the hypothesis H1 is true: the suspected speaker is truly not the source of the questioned recording (trace)
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Evaluation of the Strength of Evidence
Univariate (Scoring) Method
Cumulative Distribution Functions
Tippett plots (reliability-survival functions)
Univariate (Scoring) Method
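A Tippett plot shows, for each likelihood-ratio value, the proportion of cases whose LR exceeds that value, computed separately for cases where H0 is true and where H1 is true. A minimal sketch with synthetic log10 LR values (matplotlib assumed available; the distributions are invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)

# Synthetic log10 likelihood ratios from many simulated cases (illustration only)
log_lr_h0_true = rng.normal(1.0, 0.8, size=500)    # the suspect really is the source
log_lr_h1_true = rng.normal(-1.0, 0.8, size=500)   # another speaker is the source

def survival(values, grid):
    """Proportion of values greater than each grid point (1 - empirical CDF)."""
    return np.array([(values > g).mean() for g in grid])

grid = np.linspace(-4, 4, 400)
plt.plot(grid, survival(log_lr_h0_true, grid), label="H0 true")
plt.plot(grid, survival(log_lr_h1_true, grid), label="H1 true")
plt.xlabel("log10(LR) greater than")
plt.ylabel("Proportion of cases")
plt.legend()
plt.show()
```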
Evaluation of the Strength of Evidence
Multivariate (Direct) Method
Tippett plots (reliability-survival functions)
Multivariate (Direct) Method
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Using databases with mismatched recording conditions
FBI NIST 2002 Database: 2 conditions (microphone - telephone)
The extent of mismatch can be measured using statistical testing
Compensating for Mismatch
[Figure: score distributions showing the evidence E, the H1 scores in matched conditions, the potential-population H1 scores in mismatched conditions, and the H0 scores in matched conditions.]
Not compensating for mismatch can be the difference between an LR < 1 and an LR > 1.
Outline
Automatic Speaker Recognition
Voice as Evidence
Bayesian Interpretation of Evidence
Corpus Based Methodology
Univariate Scoring Method
Multivariate Direct Method
Strength of Evidence
Evaluation of the Strength of Evidence
Mismatched Recording Conditions
Aural Speaker Recognition
Experimental Framework
Listeners
90 listeners whose mother tongue is French
Laypersons with no phonetic training
Same computer and headphones
Training
No limitation on the number of listening trials
Testing
Verbal score scale from 1 through 7
Perceptual cues
Aural Speaker Recognition
Perceptual Verbal Scale and Perceptual Cues
Perceptual Verbal Scale
Score 1: I am sure that the two speakers are not the same
Score 2: I am almost sure that the two speakers are not the same
Score 3: It is possible that the two speakers are not the same
Score 4: I cannot decide
Score 5: It is possible that the two speakers are the same
Score 6: I am almost sure that the two speakers are the same
Score 7: I am sure that the two speakers are the same
Strength of Evidence for Aural Recognition
[Figure: histograms of estimated probability vs. perceptual verbal score (1-7) for the two hypotheses H0 and H1, evaluated at the reported score E.]

\[ LR = \frac{P(E \mid H_0)}{P(E \mid H_1)} \]

Likelihood Ratio (LR) = ratio of the heights of the histograms for the two hypotheses at the point E.
Discrete scores: histograms are used to estimate the probabilities of the scores for each hypothesis.
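For discrete verbal scores the estimate reduces to a ratio of relative frequencies. A minimal sketch with synthetic listener responses (the score probabilities below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic perceptual verbal scores (1-7) from many listening trials (illustration only)
scores_h0_true = rng.choice(np.arange(1, 8), size=500,
                            p=[.02, .03, .05, .10, .20, .30, .30])   # same-speaker pairs
scores_h1_true = rng.choice(np.arange(1, 8), size=500,
                            p=[.30, .30, .20, .10, .05, .03, .02])   # different-speaker pairs

def score_probs(scores):
    """Relative frequency of each verbal score 1..7 (histogram-based estimate)."""
    counts = np.bincount(scores, minlength=8)[1:]
    return counts / counts.sum()

E = 6   # verbal score reported by the listener for the questioned pair
lr = score_probs(scores_h0_true)[E - 1] / score_probs(scores_h1_true)[E - 1]
print(f"Likelihood ratio for verbal score {E}: {lr:.1f}")
```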
Evaluating Strength of Evidence in Matched Conditions
[Figure: Tippett plots for aural and automatic recognition. Reference: PSTN; traces: PSTN.]
Similar separations between the curves for the aural and automatic systems.
Evaluating Strength of Evidence in Mismatched Conditions
[Figure: Tippett plots for aural and automatic recognition. Reference: PSTN; traces: noisy PSTN.]
Better curve separation in aural recognition; better evaluation of the LR for aural recognition in mismatched conditions.
Evaluating Strength of Evidence in Adapted Conditions
[Figure: estimated probability vs. likelihood ratio (LR, log scale) for H0 and H1, aural and automatic-adapted. Reference: PSTN; traces: adapted noisy PSTN.]
Adaptation for noisy conditions results in improved performance of automatic recognition.
Admissibility of Scientific Evidence (USA)
Daubert criteria:
whether the theory or technique can be, and has been, tested,
whether the technique has been published or subjected
to peer review,
whether actual or potential error rates have been
considered,
whether standards exist and are maintained to control
the operation of the technique,
whether the technique is widely accepted within the
relevant scientific community.
References
Ph. Rose, Forensic Speaker Identification, Taylor and Francis, London, 2002.
D. Meuwly, A. Drygajlo, "Forensic Speaker Recognition Based on a Bayesian Framework and Gaussian Mixture Modelling (GMM)", The Workshop on Speaker Recognition 2001: A Speaker Odyssey, Crete, Greece, June 2001, pp. 145-150.
A. Drygajlo, D. Meuwly, A. Alexander, "Statistical Methods and Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", EUROSPEECH 2003, Geneva, Switzerland, Sept. 2003, pp. 689-692.
A. Alexander, A. Drygajlo, "Scoring and Direct Methods for the Interpretation of Evidence in Forensic Speaker Recognition", ICSLP 2004, Jeju, Korea, 2004.
References
F. Botti, A. Alexander, and A. Drygajlo, "An interpretation framework for the evaluation of evidence in forensic automatic speaker recognition with limited suspect data", Odyssey 2004, The Speaker and Language Recognition Workshop, Toledo, Spain, 2004, pp. 63-68.
A. Alexander, F. Botti, and A. Drygajlo, "Handling Mismatch in Corpus-Based Forensic Speaker Recognition", Odyssey 2004, The Speaker and Language Recognition Workshop, Toledo, Spain, May 2004, pp. 69-74.
A. Alexander, F. Botti, D. Dessimoz, A. Drygajlo, "The Effect of Mismatched Recording Conditions on Human and Automatic Speaker Recognition in Forensic Applications", Forensic Science International, 146S (2004), pp. S95-S99.
D. Meuwly, A. Drygajlo, "A Bayesian Interpretation of Evidence in Forensic Automatic Speaker Recognition", to be published in Forensic Science International.
J. Gonzalez-Rodriguez, A. Drygajlo, D. Ramos-Castro, M. Garcia-Gomar, J. Ortega-Garcia, "Robust Estimation, Interpretation and Assessment of Likelihood Ratios in Forensic Speaker Recognition", to be published in Computer Speech and Language.
Conclusions
The Bayesian model, the current interpretation framework used in forensic science, is adapted for forensic automatic speaker recognition.
The corpus-based methodology provides a coherent way of assessing and presenting the evidence of a questioned recording.
Distributions of likelihood ratios can be used for the evaluation of the performance of automatic and aural methods in forensic speaker recognition applications.
Conclusions
While there is certainly no perfect solution available in the field of forensic speaker recognition at present, the scientific community is under a moral obligation to contribute whatever it can to aid the course of justice and to establish scientifically founded methodology and techniques.
What is clearly needed are joint research initiatives of forensic scientists and speech engineers, in order to study problems arising from the current technology and from the practical work of forensic experts, and to gain a more complete insight into the concept of the individuality of voice.
Considering recent advances in automatic speaker verification technology, especially with regard to robustness of parameters, enlarged sizes of speaker groups, and new statistical algorithms, forensic scientists expect a major contribution from the speech engineering side as far as automatic speaker recognition is concerned.