
United Arab Emirates University College of Engineering

Design of Automatic Face/Voice Recognition Systems for Personal Identification


Supervisor: Dr. Farhad Kissain
Mariam Al Dhuhoori (970724502)
Fatema Mohammed (199902260)
Laila Al Shehhi (199902258)
Mona Atti Al-Rashdi (199904062)

Overview
Introduction
Main principles of speaker recognition
Selected method
Speaker recognition models
Feature extraction
Feature extraction implementation
Conclusion

Introduction:
Our graduation project spans two phases, Graduation Project I and Graduation Project II, and covers two tasks:

Voice recognition
Face recognition

Objectives of our Project


Design and implement a simple face recognition system.
Design and implement an automatic voice recognition system.
MATLAB is used to implement the project.

Main Principles of Speaker Recognition

Identification

Verification

Speaker Recognition methods


Text Dependent:
The speaker's identity is established based on his/her speaking one or more specific phrases.

Text Independent:
Speaker models capture characteristics of somebody's speech that show up irrespective of what one is saying.

Selected Method
Text Independent:
Identify the person who speaks, regardless of what is being said.

Speaker Recognition Models

Feature Extraction

Feature Matching

Speech Feature Extraction


Feature extraction derives a small amount of data from the voice signal that can later be used to represent each speaker.
MFCC (Mel-Frequency Cepstral Coefficients) is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies and logarithmically at high frequencies.

Input Speech Signals

[Figure: waveform of the recorded word, showing its two syllables]

There is silence at the beginning and at the end of the signals. The word consists of two syllables.

Frame Blocking

Pipeline: continuous speech → frame blocking → windowing → FFT (spectrum) → mel-frequency wrapping → cepstrum

The continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N):

Frame 1 consists of the first N samples.
Frame 2 begins M samples after the first frame and overlaps it by N - M samples.
Frame 3 begins 2M samples after the first frame and overlaps it by N - 2M samples.
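The project itself is implemented in MATLAB; as an illustrative NumPy sketch of the same blocking step (N = 256 and M = 100 are arbitrary example values):

```python
import numpy as np

def frame_blocking(signal, N=256, M=100):
    """Block a 1-D speech signal into overlapping frames.

    Each frame holds N samples; consecutive frames start M samples
    apart (M < N), so adjacent frames overlap by N - M samples.
    """
    num_frames = 1 + (len(signal) - N) // M
    frames = np.empty((num_frames, N))
    for i in range(num_frames):
        frames[i] = signal[i * M : i * M + N]
    return frames

# A ramp signal makes the overlap easy to see.
x = np.arange(1000.0)
frames = frame_blocking(x)
print(frames.shape)   # (8, 256)
print(frames[1][0])   # 100.0 -- frame 2 begins M samples after frame 1
```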


After Frame Blocking

The speech signals were blocked into frames of N samples with overlap.

Windowing

Each individual frame is windowed to minimize signal discontinuities at the frame edges. A Hamming window is used in this project.
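A minimal NumPy sketch of the windowing step (the frame length N = 256 is an assumed example value):

```python
import numpy as np

N = 256
n = np.arange(N)
# Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))
w = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))

frame = np.ones(N)        # stand-in for one speech frame
windowed = frame * w      # tapers the frame edges toward zero

print(round(w[0], 2), round(w[N - 1], 2))   # 0.08 0.08 at the edges
```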

After Windowing

[Figures: one frame before and after windowing]
After Windowing
Continuous speech

Frame Blocking

frame

Windowing

FFT

spectrum

Cepstrum

Mel-frequency wrapping

Convert each frame of N samples form time domain into the frequency domain.
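To illustrate the FFT step, a sketch using a synthetic 1 kHz tone (the 8 kHz sampling rate is an assumption for the example, not taken from the project):

```python
import numpy as np

fs = 8000                                # assumed sampling rate in Hz
N = 256
t = np.arange(N) / fs
frame = np.sin(2 * np.pi * 1000 * t)     # synthetic 1 kHz tone

spectrum = np.fft.rfft(frame)            # time domain -> frequency domain
freqs = np.fft.rfftfreq(N, d=1 / fs)
peak = freqs[np.abs(spectrum).argmax()]
print(peak)   # 1000.0 -- the spectrum peaks at the tone's frequency
```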

Log Scale of S1 Speech Wave

[Figure: log-magnitude spectra of the frames of signal S1]

Mel-Frequency Wrapping

The mel frequency scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz. The wrapping is implemented with a mel-spaced filter bank.

[Figures: mel-spaced filter bank; spectrum before and after mel-frequency wrapping]
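The mel mapping described above is commonly written mel(f) = 2595 · log10(1 + f/700); a sketch showing how equal mel spacing produces the linear-then-logarithmic filter spacing (the filter count and 4 kHz upper edge are assumed example values):

```python
import numpy as np

def hz_to_mel(f):
    # Approximately linear below 1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

# Edge frequencies of 20 filters spaced evenly on the mel scale up to
# 4 kHz: equal mel spacing yields near-linear spacing at low frequencies
# and logarithmic spacing at high frequencies.
edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(4000.0), 22))
low = np.diff(edges[:5])     # nearly equal gaps at the low end
high = np.diff(edges[-5:])   # much wider gaps at the high end
print(high.mean() > 3 * low.mean())   # True
```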

Cepstrum

The log mel spectrum is converted back to the time domain; the resulting coefficients are the mel-frequency cepstral coefficients (MFCCs).
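This conversion back toward the time (quefrency) domain is typically done with a discrete cosine transform; a sketch using hypothetical log filter-bank energies (the 20-filter and 13-coefficient counts are assumptions, not taken from the project):

```python
import numpy as np
from scipy.fft import dct

rng = np.random.default_rng(0)
# Hypothetical log mel-filter-bank energies for a single frame
log_mel = np.log(rng.random(20) + 1.0)

# A DCT takes the log mel spectrum back toward the time domain; the
# resulting coefficients are the MFCCs, of which the first few are kept.
mfcc = dct(log_mel, type=2, norm='ortho')[:13]
print(mfcc.shape)   # (13,)
```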

Feature Matching Methods

Dynamic Time Warping (DTW)
Hidden Markov Modeling (HMM)
Vector Quantization (VQ)

Clustering of Training Vectors

[Figure: data points for all sounds of the first set]
[Figure: data points of speaker 5 and speaker 6 of the first set]
[Figure: data points of all sounds of the first set after the LBG algorithm]
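The LBG algorithm used above grows a codebook by repeatedly splitting each codeword and refining the result; a simplified sketch on toy 2-D data (all sizes and values here are illustrative, not the project's):

```python
import numpy as np

def lbg(data, size, eps=0.01, iters=20):
    """Binary-split LBG: grow a codebook from 1 to `size` codewords."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split each codeword into a slightly perturbed pair ...
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # ... then refine with nearest-neighbour / centroid updates
            d = np.linalg.norm(data[:, None] - codebook[None, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(len(codebook)):
                members = data[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
# Two well-separated 2-D clusters centred at (0, 0) and (3, 3)
data = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(3, 0.1, (50, 2))])
cb = lbg(data, size=2)
print(np.round(np.sort(cb[:, 0])))   # codewords land near 0 and 3
```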

Vector Quantization (VQ) Source Modeling

VQ groups the training vectors into clusters; each cluster is represented by a codeword, and the collection of codewords forms the speaker's codebook.
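Identification with VQ then reduces to picking the enrolled codebook with the smallest average distortion over the test vectors; a toy sketch (the 2-D features and codebook values are made up for illustration):

```python
import numpy as np

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Toy codebooks for two enrolled speakers (hypothetical 2-D features)
codebooks = {
    'speaker A': np.array([[0.0, 0.0], [1.0, 1.0]]),
    'speaker B': np.array([[5.0, 5.0], [6.0, 6.0]]),
}

# Test utterance whose features fall near speaker A's clusters
test = np.array([[0.1, 0.0], [0.9, 1.1], [0.0, 0.2]])

best = min(codebooks, key=lambda s: avg_distortion(test, codebooks[s]))
print(best)   # speaker A
```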

VQ advantages
The model trains much faster than other methods such as back-propagation.
It reduces large datasets to a smaller number of codebook vectors.
It can handle data with missing values.
The generated model can be updated incrementally.
It is not limited in the number of dimensions of the codebook vectors, unlike nearest-neighbour techniques.
It is easy to implement and more accurate.

Performance Rate

Sound (word: "twenty")   Laila   Mona   Mariam   Fatema
S1                       S1      S1     S1       S1
S2                       S2      S4     S2       S3
S3                       S3      S3     S3       S2
S4                       S4      S2     S1       S4

Success rate in recognition: Laila 100%, Mona 50%, Mariam 75%, Fatema 50%.

Testing Phase of the Second Set

Testing phase of the second set for 2 speakers.

Results for test 1 (when the speaker said "twenty"):

Speaker 1: matches speaker 1
Speaker 2: no match
Speaker 3: matches speaker 3
Speaker 4: matches speaker 4
Speaker 5: no match
Speaker 6: no match

Conclusion
