
United Arab Emirates University College of Engineering

Design of Automatic Face/Voice Recognition Systems for Personal Identification


Supervisor: Dr. Farhad Kissain
Mariam Al Dhuhoori (970724502)
Fatema Mohammed (199902260)
Laila Al Shehhi (199902258)
Mona Atti Al-Rashdi (199904062)

Overview
Introduction
Main principles of speaker recognition
Selected method
Speaker recognition models
Feature extraction
Feature extraction implementation
Conclusion

Introduction:
Our graduation project spans two phases, Graduation Project I and Graduation Project II, and covers two tasks:

Voice recognition
Face recognition

Objectives of our Project


Design and implement a simple face recognition system.
Design and implement an automatic voice recognition system.
MATLAB is used to implement the project.

Main Principles of Speaker Recognition

Identification

Verification

Speaker Recognition methods


Text Dependent:
The speaker's identity is established based on his/her speaking one or more specific phrases.

Text Independent:
Speaker models capture characteristics of somebody's speech that show up irrespective of what one is saying.

Selected Method
Text Independent:
Identify the person who speaks, regardless of what is being said.

Speaker Recognition Models

Feature Extraction

Feature Matching

Speech Feature Extraction


Feature extraction derives a small amount of data from the voice signal that can later be used to represent each speaker.
MFCC (Mel-Frequency Cepstral Coefficients) is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies and logarithmically at high frequencies.

Input Speech Signals

[Figure: waveform of the recorded word, showing its two syllables]

There is silence at the beginning and at the end of the signals. The word consists of two syllables.

Frame Blocking

Pipeline: continuous speech → frame blocking → windowing → FFT (spectrum) → mel-frequency wrapping → cepstrum

The continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N):

Frame 1 consists of the first N samples.
Frame 2 begins M samples after the first frame and overlaps it by N - M samples.
Frame 3 begins 2M samples after the first frame and overlaps it by N - 2M samples.
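The project itself is implemented in MATLAB; as an illustrative NumPy sketch of the same blocking step (N = 256 and M = 100 are arbitrary example values):

```python
import numpy as np

def frame_blocking(signal, N=256, M=100):
    """Block a 1-D speech signal into overlapping frames.

    Each frame holds N samples; consecutive frames start M samples
    apart (M < N), so adjacent frames overlap by N - M samples.
    """
    num_frames = 1 + (len(signal) - N) // M
    frames = np.empty((num_frames, N))
    for i in range(num_frames):
        frames[i] = signal[i * M : i * M + N]
    return frames

# A ramp signal makes the overlap easy to see.
x = np.arange(1000.0)
frames = frame_blocking(x)
print(frames.shape)   # (8, 256)
print(frames[1][0])   # 100.0 -- frame 2 begins M samples after frame 1
```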


After Frame Blocking

The speech signals were blocked into frames of N samples with overlap.

Windowing

Each individual frame is windowed to minimize signal discontinuities at the frame edges. A Hamming window is used in this project.
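A minimal NumPy sketch of the windowing step (the frame length N = 256 is an assumed example value):

```python
import numpy as np

N = 256
n = np.arange(N)
# Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

frame = np.ones(N)        # stand-in for one speech frame
windowed = frame * w      # tapers the frame edges toward zero

print(round(w[0], 2), round(w[N - 1], 2))   # 0.08 0.08 at the edges
```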

After Windowing

[Figures: one frame before and after windowing]
After Windowing
Continuous speech

Frame Blocking

frame

Windowing

FFT

spectrum

Cepstrum

Mel-frequency wrapping

Convert each frame of N samples form time domain into the frequency domain.
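To illustrate the FFT step, a sketch using a synthetic 1 kHz tone (the 8 kHz sampling rate is an assumption for the example, not taken from the project):

```python
import numpy as np

fs = 8000                                # assumed sampling rate in Hz
N = 256
t = np.arange(N) / fs
frame = np.sin(2 * np.pi * 1000 * t)     # synthetic 1 kHz tone

spectrum = np.fft.rfft(frame)            # time domain -> frequency domain
freqs = np.fft.rfftfreq(N, d=1 / fs)
peak = freqs[np.abs(spectrum).argmax()]
print(peak)   # 1000.0 -- the spectrum peaks at the tone's frequency
```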

Log Scale of S1 Speech Wave

[Figure: log-magnitude spectra of the frames of signal S1]

Mel-Frequency Wrapping

The mel frequency scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz. The wrapping is implemented with a mel-spaced filter bank.

[Figures: mel-spaced filter bank; spectrum before and after mel-frequency wrapping]
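The mel mapping described above is commonly written mel(f) = 2595 · log10(1 + f/700); a sketch showing how equal mel spacing produces the linear-then-logarithmic filter spacing (the filter count and 4 kHz upper edge are assumed example values):

```python
import numpy as np

def hz_to_mel(f):
    # Approximately linear below 1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Edge frequencies of 20 filters spaced evenly on the mel scale up to
# 4 kHz: equal mel spacing yields near-linear spacing at low frequencies
# and logarithmic spacing at high frequencies.
edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 22))
low = np.diff(edges[:5])     # nearly equal gaps at the low end
high = np.diff(edges[-5:])   # much wider gaps at the high end
print(high.mean() > 3 * low.mean())   # True
```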

Cepstrum

The log mel spectrum is converted back to the time domain; the resulting coefficients are the mel-frequency cepstral coefficients (MFCCs).
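This conversion back toward the time (quefrency) domain is typically done with a discrete cosine transform; a sketch using hypothetical log filter-bank energies (the 20-filter and 13-coefficient counts are assumptions, not taken from the project):

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
# Hypothetical log mel-filter-bank energies for a single frame
log_mel = np.log(rng.random(20) + 1.0)

# A DCT takes the log mel spectrum back toward the time domain; the
# resulting coefficients are the MFCCs, of which the first few are kept.
mfcc = dct(log_mel, type=2, norm='ortho')[:13]
print(mfcc.shape)   # (13,)
```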

Feature Matching Methods

Dynamic Time Warping (DTW)
Hidden Markov Modeling (HMM)
Vector Quantization (VQ)

Clustering of Training Vectors

[Figure: data points for all sounds of the first set]
[Figure: data points of speaker 5 and speaker 6 of the first set]
[Figure: data points of all sounds of the first set after the LBG algorithm]
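The LBG algorithm used above grows a codebook by repeatedly splitting each codeword and refining the result; a simplified sketch on toy 2-D data (all sizes and values here are illustrative, not the project's):

```python
import numpy as np

def lbg(data, size, eps=0.01, iters=20):
    """Binary-split LBG: grow a codebook from 1 to `size` codewords."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split each codeword into a slightly perturbed pair ...
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # ... then refine with nearest-neighbour / centroid updates
            d = np.linalg.norm(data[:, None] - codebook[None, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(len(codebook)):
                members = data[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters centred at (0, 0) and (3, 3)
data = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
cb = lbg(data, size=2)
print(np.round(np.sort(cb[:, 0])))   # codewords land near 0 and 3
```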

Vector Quantization (VQ) Source Modeling

VQ groups the training vectors into clusters; each cluster is represented by a codeword, and the collection of codewords forms the speaker's codebook.
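Identification with VQ then reduces to picking the enrolled codebook with the smallest average distortion over the test vectors; a toy sketch (the 2-D features and codebook values are made up for illustration):

```python
import numpy as np

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Toy codebooks for two enrolled speakers (hypothetical 2-D features)
codebooks = {
    'speaker A': np.array([[0.0, 0.0], [1.0, 1.0]]),
    'speaker B': np.array([[5.0, 5.0], [6.0, 6.0]]),
}

# Test utterance whose features fall near speaker A's clusters
test = np.array([[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]])

best = min(codebooks, key=lambda s: avg_distortion(test, codebooks[s]))
print(best)   # speaker A
```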

VQ advantages
The model trains much faster than other methods such as back-propagation.
It reduces large datasets to a smaller number of codebook vectors.
It can handle data with missing values.
The generated model can be updated incrementally.
It is not limited in the number of dimensions of the codebook vectors, unlike nearest-neighbour techniques.
It is easy to implement and more accurate.

Performance Rate

Sound (word: "twenty")   Laila   Mona   Mariam   Fatema
S1                       S1      S1     S1       S1
S2                       S2      S4     S2       S3
S3                       S3      S3     S3       S2
S4                       S4      S2     S1       S4

Success rate in recognition: Laila 100%, Mona 50%, Mariam 75%, Fatema 50%.

Testing Phase of the Second Set

Testing phase of the second set for 2 speakers.

Results for test 1 (when the speaker said "twenty"):

Speaker 1: matches speaker 1
Speaker 2: no match
Speaker 3: matches speaker 3
Speaker 4: matches speaker 4
Speaker 5: no match
Speaker 6: no match

Conclusion
