SPEECH RECOGNITION
ABSTRACT:
Speech recognition is a technology that allows a computer
to identify and understand words spoken by a person using a microphone
or telephone. The ultimate goal of the technology is to
produce a system that can recognize, with 100% accuracy, all words
spoken by any person, for better understanding between machines and
humans.
It is the process of converting an acoustic signal captured by a
microphone or a telephone into a set of words. These are exciting
technologies that change the way you interact with your computer.
CONTENTS:
1. ABOUT SPEECH RECOGNITION
2. HOW DO HUMANS AND MACHINES DO IT?
3. HISTORY
4. INPUT TECHNOLOGIES
5. PRINCIPLES OF SPEAKER RECOGNITION
6. PERFORMANCE MEASUREMENT
7. PROBLEMS IN SPEECH RECOGNITION
8. APPLICATIONS
9. SPEECH RECOGNITION IN WINDOWS
10. CONCLUSION
The voice produced by the vocal cords travels through the air as sound waves.
The waves enter our ears and are transmitted to the brain, which recognizes
the sounds and responds to them.
A machine takes its input as an acoustic waveform from a commonly used device, a microphone.
It converts the waveform into a signal it can process, analyses it, and then converts it to text or performs
the command it was given.
3. HISTORY
Research on speech recognition has been carried out for over five decades. The first speech
recognizer appeared in 1952 and consisted of a device for the recognition of single
spoken digits. Then came the IBM Shoebox, shown publicly in 1962, which was used to recognize
digits and also to perform arithmetic operations on them. This innovative device
recognized and responded to 16 spoken words, including the ten digits from "0" through
"9." When a number and command words such as "plus," "minus" and "total" were
spoken, the device carried out the corresponding arithmetic.
PARAMETERS               RANGE
Speaking mode            Isolated word to continuous speech
Speaking style           Read speech to spontaneous speech
Enrollment               Speaker-dependent to speaker-independent
Vocabulary               Small (<20 words) to large (>100 words)
Signal-to-noise ratio    High (>30 dB) to low (<10 dB)
Table: Typical parameters used to characterize the capability of speech
recognition systems
First, the input speech is converted to an acoustic waveform, and
measurements taken from that waveform are compared with each stored
reference: reference waveform #1, reference waveform #2, and so on. The
reference that is most similar to the input speech is selected, and its
speaker is reported as the one identified. This is the principle behind
speaker identification.
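The maximum-selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors are hand-made numbers rather than real acoustic measurements, cosine similarity stands in for whatever similarity measure an actual system uses, and the names `cosine_similarity` and `identify_speaker` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(input_features, references):
    """Return (speaker, score) for the reference most similar to the input."""
    if not references:
        raise ValueError("at least one reference is required")
    scores = {
        name: cosine_similarity(input_features, ref)
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)  # maximum selection
    return best, scores[best]


# Hypothetical reference measurements, one vector per enrolled speaker.
references = {
    "speaker_1": [0.9, 0.1, 0.3],
    "speaker_2": [0.2, 0.8, 0.5],
}

# Hypothetical measurements taken from the input speech.
speaker, score = identify_speaker([0.85, 0.15, 0.35], references)
print(speaker)
```

With these made-up numbers the input lies much closer to the first reference, so `speaker_1` is selected.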
6. PERFORMANCE MEASUREMENT
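Performance of speech recognition systems is typically described in terms of word error rate,
E, defined as:
E = ((S + I + D) / N) x 100
where N is the total number of words in the reference transcript, and S, I and D are the
numbers of substitutions, insertions and deletions, respectively.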
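Word error rate is conventionally obtained by aligning the recognizer output against a reference transcript with a minimum edit distance, then counting substitutions (S), deletions (D) and insertions (I) over the N reference words. The sketch below assumes whitespace-separated words and exact string matching; the function name `word_error_rate` and the example sentences are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Align two transcripts and return S, D, I, N and E (in percent)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_errors, S, D, I) for ref[:i] aligned with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # only insertions

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                diagonal = dp[i - 1][j - 1]  # correct word, no cost
            else:
                c, s, d, ins = dp[i - 1][j - 1]
                diagonal = (c + 1, s + 1, d, ins)  # substitution
            c, s, d, ins = dp[i - 1][j]
            deletion = (c + 1, s, d + 1, ins)
            c, s, d, ins = dp[i][j - 1]
            insertion = (c + 1, s, d, ins + 1)
            # Tuples compare by total error count first.
            dp[i][j] = min(diagonal, deletion, insertion)

    total, s, d, ins = dp[-1][-1]
    return {"S": s, "D": d, "I": ins, "N": len(ref),
            "E": 100.0 * total / len(ref)}


result = word_error_rate("nine plus one total", "nine minus one")
print(result)
```

Here "plus" is recognized as "minus" (one substitution) and "total" is missed (one deletion), giving two errors over four reference words. Because insertions are counted, E can exceed 100 when the hypothesis is much longer than the reference.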
Robustness:
In a robust system, performance degrades gracefully as conditions become more different
from those under which it was trained. Differences in channel characteristics and acoustic
environment should receive particular attention.
Portability:
Portability refers to the goal of rapidly designing, developing and deploying systems for new
applications. At present, systems tend to suffer significant degradation when moved to a new
task. In order to return to peak performance, they must be trained on examples specific to the
new task, which is time-consuming and expensive. Portability also covers independence from the
computing platform.
Adaptation:
How can systems continuously adapt to changing conditions (new speakers, microphones,
tasks, etc.) and improve through use? Such adaptation can occur at many levels in a system:
subword models, word pronunciations, language models, etc.
Language Modeling:
Current systems use statistical language models to help reduce the search space and resolve
acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more
habitable systems, it will be increasingly important to get as much constraint as possible from
language models; perhaps incorporating syntactic and semantic constraints that cannot be
captured by purely statistical models.
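To make the idea of a statistical language model concrete, the following sketch trains a bigram model with add-one smoothing on a tiny, made-up corpus and uses it to choose between two hypotheses that sound alike. The corpus, the hypothesis strings and the function names are assumptions for illustration; real systems train on far larger text collections and use more refined smoothing.

```python
import math
from collections import Counter


def train_bigram_model(sentences):
    """Count word pairs and their left contexts in a list of sentences."""
    contexts = Counter()
    bigrams = Counter()
    vocabulary = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocabulary.update(words[1:])
        contexts.update(words[:-1])
        bigrams.update(zip(words[:-1], words[1:]))
    return contexts, bigrams, vocabulary


def log_probability(sentence, model):
    """Add-one smoothed log probability of a sentence under the model."""
    contexts, bigrams, vocabulary = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    size = len(vocabulary)
    total = 0.0
    for prev, word in zip(words[:-1], words[1:]):
        # Unseen pairs still get a small, non-zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + size))
    return total


# Hypothetical training text.
corpus = [
    "recognize speech with a microphone",
    "the system can recognize speech",
    "speech is converted to text",
]
model = train_bigram_model(corpus)

# Two acoustically similar hypotheses; the model prefers the likelier one.
hypotheses = ["recognize speech", "wreck a nice beach"]
best = max(hypotheses, key=lambda h: log_probability(h, model))
print(best)
```

The acoustic evidence alone may not separate the two hypotheses, but the word sequence "recognize speech" occurs in the training text and so receives the higher score.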
Out-of-Vocabulary Words:
Systems are designed for use with a particular set of words, but system users may not know
exactly which words are in the system vocabulary. This leads to a certain percentage of out-
of-vocabulary words in natural conditions. Systems must have some method of detecting
such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto
the unknown word, causing an error.
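One simple way to detect such words is to reject any input whose best match in the vocabulary scores below a threshold, rather than always returning the nearest vocabulary word. The sketch below is only an illustration: it compares spellings with Python's `difflib.SequenceMatcher` as a stand-in for an acoustic match score, and the threshold value, the `<unk>` label and the function name `match_word` are assumptions.

```python
from difflib import SequenceMatcher


def match_word(heard, vocabulary, threshold=0.75):
    """Return (word, score); word is "<unk>" when no match is good enough."""
    best_word, best_score = None, 0.0
    for word in vocabulary:
        # String similarity in [0, 1]; a stand-in for an acoustic score.
        score = SequenceMatcher(None, heard, word).ratio()
        if score > best_score:
            best_word, best_score = word, score
    if best_score < threshold:
        return "<unk>", best_score  # flag as out-of-vocabulary
    return best_word, best_score


vocabulary = ["plus", "minus", "total"]

print(match_word("totle", vocabulary)[0])   # close to a known word
print(match_word("divide", vocabulary)[0])  # no good match in the vocabulary
```

Without the threshold, "divide" would be forced onto whichever vocabulary word happened to score highest, which is exactly the error described above.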
Modeling Dynamics:
Systems assume a sequence of input frames which are treated as if they were independent.
But it is known that perceptual cues for words and phonemes require the integration of
features that reflect the movements of the articulators, which are dynamic in nature. How to
model dynamics and incorporate this information into recognition systems is an unsolved
problem.
Some systems are speaker-dependent: a user must provide samples of
his or her speech before using them. Other systems are said to
be speaker-independent and need no such enrollment.
8. APPLICATIONS
Health care
In the health care domain, even in the wake of improving speech
recognition technologies, medical transcriptionists (MTs) have not yet
become obsolete. In one setup, the provider dictates into a speech-recognition
engine, the recognized words are displayed right after they are spoken,
and the dictator is responsible for editing and signing off on the
document. Searches, queries, and form filling may all be faster to perform
by voice than by using a keyboard.
Military
Speech recognizers have been operated successfully in fighter aircraft,
with applications including setting radio frequencies, commanding an
autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight displays. A notable example is the U.S. program in speech
recognition for the Advanced Fighter Technology Integration (AFTI)/F-16
aircraft (F-16 VISTA).
Press any key on the keyboard    Press keyboard key; Press a; Press Shift plus a
10. CONCLUSION
Speech recognition converts speech into text, making it easier
both to create and to use information. Speech is quick to produce, while
typing text takes time. It is the process of converting an acoustic
signal captured by a microphone or a telephone into a set of words. These
are exciting technologies that change the way you interact with your
computer.