Presentation On Speech Recognition

Final Year Project Proposal on
SPEECH RECOGNITION SYSTEM

Under the supervision of: Dr. Y N Singh, Associate Professor, Department of Computer Science and Engineering, Institute of Engineering and Technology, Lucknow
Proposed by: Aditya Sharma, Computer Science and Engineering, Final Year, Roll No: 1005210005
Problem Statement
Given a speech sample uttered by a user, the system detects voice activity, extracts the significant portion of the voice sample, and converts it into a text message according to the language specification and model.
This text message can then be used to send commands to the system or as input to an expert system.
Basic Challenges
Robustness: graceful degradation, not catastrophic failure.
Portability: independence of computing platform.
Adaptability: to changing conditions (different mic, background noise, new speaker, new task domain, even a new language).
Language modelling: is there a role for linguistics in improving the language models?
Confidence measures: better methods to evaluate the absolute correctness of hypotheses.
Out-of-vocabulary (OOV) words: systems must have some method of detecting OOV words and dealing with them in a sensible way.
Spontaneous speech: disfluencies (filled pauses, false starts, hesitations, ungrammatical constructions, etc.) remain a problem.
Prosody: stress, intonation, and rhythm convey important information for word recognition and the user's intentions (e.g., sarcasm, anger).
Accent, dialect and mixed language: non-native speech is a huge problem, especially where code-switching is commonplace.
Speech Recognition Process

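[Figure: block diagram of the recognition process. Processing blocks: Input Speech; Feature Analysis (Spectral Analysis); Pattern Classification (Decode/Search); Utterance Verification; example output "Hello World". Knowledge sources: Acoustic Model (HMM); Word Lexicon; Language Model.]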
Milestones in Speech Recognition Research
[Figure: timeline of milestones. Year axis: 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, 2002. Five stages in chronological order, each listed as recognition task: enabling techniques.]
Isolated Words: filter-bank analysis; time normalization; dynamic programming
Isolated Words; Connected Digits; Continuous Speech: pattern recognition; LPC analysis; clustering algorithms; level building
Connected Words; Continuous Speech: hidden Markov models; stochastic language modeling
Continuous Speech; Speech Understanding: stochastic language understanding; finite-state machines; statistical learning
Spoken Dialog; Multiple Modalities: concatenative synthesis; machine learning; mixed-initiative dialog
Future of Speech Recognition Technologies

[Figure: projected roadmap. Year axis: 2002, 2005, 2008, 2011. Three stages in chronological order, each listed as system type: scope.]
Dialog Systems: very large vocabulary, limited tasks, controlled environment
Robust Systems: very large vocabulary, limited tasks, arbitrary environment
Multilingual Systems; Multimodal Speech-Enabled Devices: unlimited vocabulary, unlimited tasks, many languages
Software Modules
Preprocessing: Voice Activity Detection; Input Noise Cancelling; Pre-emphasis
Frame Blocking and Windowing
Feature Extraction
Post-processing: Weight Function; Normalization
Output: observations for HMM-based classification, O = {O1, O2, O3, ..., On}
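
The pre-emphasis and frame blocking/windowing modules can be sketched as follows. This is only an illustration under stated assumptions, not the project's implementation: the function names, the pre-emphasis coefficient 0.97, the 16 kHz sampling rate, the 25 ms frame length, and the choice of a Hamming window are all assumed here. Only the 10 ms frame step comes from the slides.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample is passed through.

    alpha = 0.97 is an assumed, commonly used value.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def hamming(length):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and taper each with a Hamming window.

    Trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# One second of a 440 Hz test tone at an assumed 16 kHz sampling rate.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Assumed 25 ms frames (400 samples) with the 10 ms step (160 samples) from the slides.
frames = frame_and_window(pre_emphasis(tone), frame_len=400, hop=160)
```

Each windowed frame would then go to feature extraction, which produces one observation vector per frame.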
HMMs in Automatic Speech Recognition

An HMM can be used to classify feature sequences into known classes by building one HMM for each class.
By computing the probability of a sequence under each HMM, we can decide which HMM most probably generated the sequence.
There are several ideas about what to model:
Isolated word recognition (an HMM for each word)
Monophone acoustic model (an HMM for each phone)
Triphone acoustic model (an HMM for each three-phone sequence)
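
The one-HMM-per-class idea can be illustrated with the forward algorithm, which computes the probability of an observation sequence under a model. The sketch below assumes discrete observation symbols and two hypothetical toy word models (`word_a`, `word_b`) whose parameters are invented for illustration. A practical recognizer would more likely use continuous Gaussian-mixture emissions and log-probabilities to avoid numerical underflow.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs | model) for a discrete-observation HMM via the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][k]  : probability of emitting symbol k in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

def classify(obs, models):
    """Score obs against every model and return (best class name, all scores)."""
    scores = {name: forward_likelihood(obs, *params)
              for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Two hypothetical left-to-right word models over symbols {0, 1}.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    # Tends to emit 0 first, then 1.
    "word_a": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    # Tends to emit 1 first, then 0.
    "word_b": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

best, scores = classify([0, 0, 1, 1], models)
print(best)  # word_a
```

For this sequence the likelihood under `word_a` is about 0.2106 against about 0.0016 under `word_b`, so the sequence is assigned to `word_a`.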
Hierarchical System of HMMs
[Figure: hierarchy of HMMs. Levels, from bottom to top: HMM of a triphone (three shown); higher-level HMM of a word; language model.]
HMM Limitations
Data intensive; computationally intensive.

50 phones = 125,000 possible triphones
3 states per triphone; 3 Gaussian mixtures for each state
64k-word vocabulary = 262 trillion possible trigrams
2-20 phonemes per word in the 64k vocabulary
39-dimensional feature vector sampled every 10 ms = 100 frames per second
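
The figures above follow from simple counting, reproduced below as a check. Two things here are not stated on the slides: "64k" is read as 64,000, since that reproduces the 262 trillion figure (65,536 would give roughly 281 trillion), and the Gaussian count and per-second data rate are derived values.

```python
phones = 50
triphones = phones ** 3                 # every ordered triple of phones: 125,000

states_per_triphone = 3
mixtures_per_state = 3
# Derived, not on the slides: Gaussians needed if every triphone is modelled.
gaussians = triphones * states_per_triphone * mixtures_per_state  # 1,125,000

vocabulary = 64_000                     # assumed reading of "64k"
trigrams = vocabulary ** 3              # 262,144,000,000,000, about 262 trillion

frame_period_ms = 10
frames_per_second = 1000 // frame_period_ms      # 100
feature_dim = 39
# Derived, not on the slides: feature values produced per second of speech.
values_per_second = frames_per_second * feature_dim  # 3,900
```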
Reading and References

M. Narasimha Murty and V. Susheela Devi, Pattern Recognition: An Algorithmic Approach, Springer and Universities Press.
Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer.
Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice-Hall International, Inc.
Lawrence Rabiner, A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proceedings of the IEEE.
