Professional Documents
Culture Documents
Presented By,
▪ Nisarga N C
▪ Savitha Acharya N
▪ Vaishnavi M
▪ Vardhini V Hebballi
INTRODUCTION
• Objective of voice recognition system
• Success of this system
a)Simple nature of acquiring voice signal
b) Uniqueness of voice signal
SYSTEM CLASSIFICATION
• CLASSIFICATION 1
a) Speaker Identification
b) Speaker Verification
• CLASSIFICATION 2
a) Text Dependent
b) Text independent
• Classification - Classification consists of models for each speaker and a decision logic
necessary to render a decision
• Verification – In verification, the decision is made between either of the choices i.e.,
acceptance or rejection
2 PHASES OF RECOGNITION
• Speaker identification training
• Silence detection
• Windowing
• FFT
PRE-PROCESSING
• If needed amplify the signal
• Speech signals are non-stationary signals, meaning to say the transfer function of
the vocal tract; which generates it, changes with time, though it changes
gradually. It is safe to assume that it is piecewise stationary
• Thus the speech signal is framed with 30ms to 32ms frames with an overlap of
20ms
• The need for the overlap is that information may be lost at the frame boundaries,
so frame boundaries need to be within another frame
• Mathematically, framing is equivalent to multiplying the signal with a series of
sliding rectangular windows
• The problem with rectangular windows is that the power contained in the side
lobes is significantly high and therefore may give rise to spectral leakage
SILENCE DETECTION
• When the user says out any word the system will never have control
over the instant the word is uttered
• It is for this reason that we need a silence detection step which
basically determines when the user has actually started uttering the
word any thereby translates the frame of reference to that instant.
WINDOWING
• To minimize the signal discontinuities at the beginning and end of the
frame
• The concept here is to minimize the spectral distortion by using the
window to taper the signal to zero at the beginning and end of the
frame.
FAST FOURIER TRANSFORM
• It is used to convert signal from time domain to frequency
domain
TIME DOMAIN ANALYSIS
• Voice signal is a non stationary wave or signal which is examined for
short duration of 20-40ms. So that it’s characteristics is almost
stationary in that range
• To obtain dominant frequency of each window we apply STFT
• We can decompose into phase and magnitude spectrum
For computing STFT , we need to know