
PRESENTATION ON

SPEECH RECOGNITION
ABSTRACT:
Speech recognition is a technology that allows a computer to identify
and understand words spoken by a person into a microphone or telephone.
The ultimate goal of the technology is a system that can recognize, with
100% accuracy, all words spoken by any person, enabling better
understanding between machines and humans.
It is the process of converting an acoustic signal captured by a
microphone or a telephone into a set of words. These exciting
technologies change the way you interact with your computer.

CONTENTS:
1. ABOUT SPEECH RECOGNITION
2. HOW DO HUMANS AND MACHINES DO IT?
3. HISTORY
4. INPUT TECHNOLOGIES
5. PRINCIPLES OF SPEAKER RECOGNITION
6. PERFORMANCE MEASUREMENT
7. PROBLEMS IN SPEECH RECOGNITION
8. APPLICATIONS
9. SPEECH RECOGNITION IN WINDOWS
10. CONCLUSION

1. ABOUT SPEECH RECOGNITION


Speech recognition is a technology that allows the computer to identify
and understand words spoken by a person using a microphone or
telephone. Even after years of research in this area, the best speech
recognition software applications still cannot recognize speech with 100%
accuracy. Some applications can recognize over 90% of words when the
content of the speech is constrained and the software has previously been
trained on the speaker's speech characteristics.
Computer software that understands your speech enables you to have
conversations with the computer: you speak commands and the computer
responds to them. Spoken language interfaces to computers are a topic
that has lured and fascinated engineers and speech scientists alike for
over five decades, and such interfaces are fast becoming a necessity.

2. HOW DO HUMANS AND MACHINES DO IT?


Humans use their articulatory system: they produce sound waves with their vocal
cords.

The sound travels through the air as waves.

The waves then enter our ears and are transmitted to the brain, which recognizes
the sounds and responds to them.

A machine takes its input as an acoustic waveform from a commonly used device, the microphone.

It converts the waveform into an electrical signal, analyzes it, and then converts it to text or
performs the command that was directed.
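The machine's conversion step starts with digitizing the microphone signal. Below is a minimal Python sketch of that step. It assumes a 16 kHz sampling rate and 16-bit samples, which are common values for speech but are not stated in this document, and it uses a synthetic tone as a hypothetical stand-in for a real microphone. The continuous signal is read at fixed time steps (sampling), and each reading is rounded to one of a fixed set of integer levels (quantization).

```python
import math

def digitize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous-time signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1           # 32767 for 16-bit samples
    n_samples = int(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                # sampling: read the signal at discrete times
        x = max(-1.0, min(1.0, signal(t)))    # clip to the converter's input range
        samples.append(round(x * max_level))  # quantization: nearest integer level
    return samples

def tone_440hz(t):
    """Hypothetical stand-in for a microphone: a 440 Hz tone at half amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)

pcm = digitize(tone_440hz, duration_s=0.01)   # 10 ms of audio
print(len(pcm), min(pcm), max(pcm))
```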

3. HISTORY
Research on speech recognition has been carried out for over five decades. The first speech
recognizer appeared in 1952 and consisted of a device for the recognition of single
spoken digits. Then came the IBM Shoebox in 1964, which recognized digits and could also
perform arithmetic operations on them. This innovative device recognized and responded
to 16 spoken words, including the ten digits from "0" through "9." When a number and
command words such as "plus," "minus" and "total" were spoken, the Shoebox instructed
an adding machine to calculate and print the answer.

"Shoebox", a forerunner of today's voice recognition systems.
It was developed by William C. Dersch at IBM's Advanced Systems
Development Division Laboratory in San Jose. In the image above, Dr. E. A.
Quade demonstrates the machine.

4. INPUT TECHNOLOGIES

A speech interface, in a user's own language, is ideal because it is the
most natural, flexible, efficient, and economical form of human
communication. When one thinks about speaking to computers, the first
image is usually speech recognition, the conversion of an acoustic signal
to a stream of words. Speech recognition involves several component
technologies.
First, the digitized signal must be transformed into a set of
measurements.
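To illustrate what "a set of measurements" can mean, the Python sketch below splits a digitized signal into short overlapping frames and computes two simple measurements for each frame: log energy and zero-crossing rate. The 25 ms frame length and 10 ms step are common choices assumed here. Production systems usually compute spectral features, such as mel-frequency cepstral coefficients, instead of these two values.

```python
import math

def frame_signal(samples, sample_rate_hz=16000, frame_ms=25, hop_ms=10):
    """Split samples into overlapping frames (25 ms long, starting every 10 ms)."""
    frame_len = sample_rate_hz * frame_ms // 1000   # 400 samples at 16 kHz
    hop = sample_rate_hz * hop_ms // 1000           # 160 samples at 16 kHz
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def frame_measurements(frame):
    """Return (log energy, zero-crossing rate) for one frame."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)           # small floor avoids log(0) on silence
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return log_energy, crossings / (len(frame) - 1)

def measure(samples, sample_rate_hz=16000):
    """Turn a digitized signal into one measurement pair per frame."""
    return [frame_measurements(f) for f in frame_signal(samples, sample_rate_hz)]

# Half a second of silence followed by half a second of a 200 Hz tone.
rate = 16000
signal = [0.0] * (rate // 2) + [
    0.3 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 2)]
features = measure(signal, rate)
print(len(features), features[0], features[-1])
```

Each short frame is treated as if the signal were stationary within it. This is what lets one measurement vector summarize 25 ms of audio.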
Next, the various speech sounds must be modeled appropriately. The
most widespread technique for acoustic modeling is called Hidden Markov
Modeling (HMM).
One possible reason why HMMs are used in speech recognition is that a
speech signal could be viewed as a piecewise stationary signal or a short-
time stationary signal.
Another reason why HMMs are popular is that they can be trained
automatically and are simple and computationally feasible to use.
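To make the HMM idea concrete, here is a minimal Python sketch of the forward algorithm. The algorithm computes how likely an observation sequence is under a given HMM. A recognizer can build one HMM per word and pick the word whose model gives the highest likelihood. The two word models, their probabilities, and the discrete observation symbols (0, 1 and 2, standing for hypothetical quantized frame measurements) are invented for illustration. Real acoustic models use continuous densities over feature vectors and compute in log space to avoid numerical underflow.

```python
def forward(obs, start_p, trans_p, emit_p):
    """Return P(obs | HMM) computed with the forward algorithm.

    start_p[i]    probability of starting in state i
    trans_p[i][j] probability of moving from state i to state j
    emit_p[i][o]  probability that state i emits observation symbol o
    """
    states = range(len(start_p))
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start_p[i] * emit_p[i][obs[0]] for i in states]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans_p[i][j] for i in states) * emit_p[j][o]
                 for j in states]
    return sum(alpha)

def recognize(obs, models):
    """Return the name of the word model most likely to have produced obs."""
    return max(models, key=lambda word: forward(obs, *models[word]))

# Two hypothetical left-to-right word models over symbols {0, 1, 2}.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]),
    "no":  ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.2, 0.7], [0.7, 0.2, 0.1]]),
}
print(recognize([0, 0, 2, 2], MODELS))
```

The left-to-right transition matrix never moves back to an earlier state. This mirrors the way a spoken word unfolds in one direction through time.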
An interesting feature of frame-based HMM systems is that speech
segments are identified during the search process, rather than explicitly.
An alternate approach is to first identify speech segments, then classify
the segments and use the segment scores to recognize words. This
approach has produced competitive recognition performance in several
tasks.
Some systems require speaker enrollment: a user must provide samples
of his or her speech before using the system, whereas other systems are said to
be speaker-independent, in that no enrollment is necessary.
Some typical parameters used to characterize the capability of speech
recognition systems are as follows:

PARAMETER                        RANGE
Speaking mode                    Isolated words to continuous speech
Speaking style                   Read speech to spontaneous speech
Enrollment                       Speaker-dependent to speaker-independent
Vocabulary                       Small (<20 words) to large (>20,000 words)
Signal-to-noise ratio (SNR)      High (>30 dB) to low (<10 dB)
Table: Typical parameters used to characterize the capability of speech
recognition systems
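The last row of the table is a signal-to-noise ratio expressed in decibels: SNR = 10 · log10(P_signal / P_noise), where each power is the mean of the squared samples. The Python sketch below computes it. It assumes the speech and the noise are available as separate sample lists; in practice, the noise power is often estimated from non-speech stretches of a recording. The example signals are made up.

```python
import math

def mean_power(samples):
    """Average of the squared sample values."""
    return sum(x * x for x in samples) / len(samples)

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))

# Signal power is 100 times the noise power, which is 20 dB:
# between the table's "high" (>30 dB) and "low" (<10 dB) ends.
speech = [10.0, -10.0] * 50
noise = [1.0, -1.0] * 50
print(snr_db(speech, noise))
```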

5. PRINCIPLES OF SPEAKER RECOGNITION

Speaker recognition is the process of automatically recognizing who is
speaking on the basis of individual information included in speech waves.

First, the input speech is captured as an acoustic waveform, and
measurements taken from it are compared with stored reference
waveforms (reference waveform #1, reference waveform #2, and so on),
one for each known speaker. The speaker whose reference waveform
gives the maximum similarity to the input speech is identified. This is
the principle behind speaker identification.
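One classic way to implement "compare the input with each reference and pick the best match" is dynamic time warping (DTW). This technique is named here as an illustration and is not taken from this document. DTW aligns two sequences of different lengths before measuring how far apart they are. Maximum similarity therefore corresponds to minimum DTW distance. In the Python sketch below, the 1-D per-frame measurements and speaker names are hypothetical. Real systems compare multi-dimensional feature vectors, and many use statistical speaker models rather than stored templates.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # advance in a only
                                     cost[i][j - 1],      # advance in b only
                                     cost[i - 1][j - 1])  # advance in both
    return cost[n][m]

def identify_speaker(input_features, references):
    """Pick the speaker whose reference is closest (maximum similarity)."""
    return min(references,
               key=lambda speaker: dtw_distance(input_features, references[speaker]))

# Hypothetical per-frame measurements for two enrolled speakers.
REFERENCES = {
    "speaker 1": [1, 3, 4, 3, 1],
    "speaker 2": [5, 5, 2, 2, 5],
}
INPUT = [1, 1, 3, 4, 4, 3, 1]   # speaker 1's pattern, spoken more slowly
print(identify_speaker(INPUT, REFERENCES))
```

The time warping lets a slower or faster rendition of the same pattern still match its reference with zero distance.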

6. PERFORMANCE MEASUREMENT

Performance of speech recognition systems is typically described in terms of word error rate,
E, defined as:

$$E = \frac{S + D + I}{N}$$

where

• N is the total number of words in the test set.

• S is the total number of substitutions.

• I is the total number of insertions.

• D is the total number of deletions.
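The word error rate is E = (S + D + I) / N, often multiplied by 100 to give a percentage. The counts S, D and I come from the alignment between the reference transcript and the recognizer's output that needs the fewest edits. The Python sketch below finds that minimum with a word-level edit-distance table. The example sentences are made up, and the function assumes a non-empty reference.

```python
def word_error_rate(reference, hypothesis):
    """WER = (S + D + I) / N using a minimum word-level edit-distance alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # d[i][j] = fewest edits turning ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                                # i deletions
    for j in range(m + 1):
        d[0][j] = j                                # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match or substitution
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[n][m] / n

# One substitution (sat -> sit) and one deletion (the): E = 2 / 6.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, E can exceed 100% when a recognizer outputs many extra words.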

7. PROBLEMS IN SPEECH RECOGNITION

Robustness:
In a robust system, performance degrades gracefully as conditions become more different
from those under which it was trained. Differences in channel characteristics and acoustic
environment should receive particular attention.
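Cepstral mean normalization (CMN) is one standard countermeasure for channel differences. It is named here as an illustration and is not taken from this document. A fixed channel, such as a particular microphone or telephone line, multiplies the speech spectrum by a roughly constant filter. In the log-spectral or cepstral domain, that filter becomes an approximately constant additive offset. Subtracting each feature dimension's average over an utterance therefore removes much of it. The Python sketch below assumes features are already computed as one list of floats per frame, and the example values are made up.

```python
def cepstral_mean_normalize(frames):
    """Subtract each dimension's mean over all frames of an utterance."""
    n_frames = len(frames)
    dims = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(dims)]
    return [[frame[d] - means[d] for d in range(dims)] for frame in frames]

# The same hypothetical utterance heard through a channel that adds a constant offset.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
offset = [0.5, -1.0]
through_channel = [[x + o for x, o in zip(frame, offset)] for frame in clean]
print(cepstral_mean_normalize(clean))
print(cepstral_mean_normalize(through_channel))
```

After normalization, both versions yield the same features, so a recognizer trained on one channel sees similar input from the other.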

Portability:

Portability refers to the goal of rapidly designing, developing and deploying systems for new
applications. At present, systems tend to suffer significant degradation when moved to a new
task. To return to peak performance, they must be trained on examples specific to the
new task, which is time-consuming and expensive. Portability also covers independence from
the computing platform.

Adaptation:
How can systems continuously adapt to changing conditions (new speakers, microphones,
tasks, etc.) and improve through use? Such adaptation can occur at many levels in a system:
subword models, word pronunciations, language models, etc.

Language Modeling:

Current systems use statistical language models to help reduce the search space and resolve
acoustic ambiguity. As vocabulary size grows and other constraints are relaxed to create more
habitable systems, it will be increasingly important to get as much constraint as possible from
language models, perhaps by incorporating syntactic and semantic constraints that cannot be
captured by purely statistical models.
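A statistical language model can be as simple as bigram counts. The Python sketch below trains an add-one (Laplace) smoothed bigram model and uses it to rank two acoustically similar hypotheses. The tiny corpus and the hypotheses are invented for illustration. Real systems train on far larger corpora and use better smoothing methods.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams, adding <s> and </s> sentence markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a sentence."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total

CORPUS = [
    "it is hard to recognize speech",
    "we recognize speech with a computer",
    "the computer can recognize speech",
]
U, B = train_bigram(CORPUS)
for hypothesis in ["recognize speech", "wreck a nice beach"]:
    print(hypothesis, log_prob(hypothesis, U, B))
```

The model prefers the hypothesis whose word pairs it has seen. This is how the language model helps resolve the acoustic ambiguity between the two phrases.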

Out-of-Vocabulary Words:
Systems are designed for use with a particular set of words, but system users may not know
exactly which words are in the system vocabulary. This leads to a certain percentage of out-
of-vocabulary words in natural conditions. Systems must have some method of detecting
such out-of-vocabulary words, or they will end up mapping a word from the vocabulary onto
the unknown word, causing an error.
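The share of out-of-vocabulary words can be measured by checking a transcript against the recognizer's word list, as in the Python sketch below. The vocabulary is a hypothetical example. The sketch works on text that is already transcribed, so it shows how OOV words are counted rather than how a recognizer detects them in audio. Detection in audio is harder and is often approached with confidence scores or dedicated filler ("garbage") models.

```python
def find_oov(transcript, vocabulary):
    """Return (out-of-vocabulary words, OOV rate) for a transcript."""
    words = transcript.lower().split()
    oov = [w for w in words if w not in vocabulary]
    return oov, len(oov) / len(words)

# Hypothetical vocabulary of a small command recognizer.
VOCABULARY = {"open", "close", "file", "paint", "that", "minimize"}
print(find_oov("Open the file", VOCABULARY))
```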
Modeling Dynamics:
Systems assume a sequence of input frames which are treated as if they were independent.
But it is known that perceptual cues for words and phonemes require the integration of
features that reflect the movements of the articulators, which are dynamic in nature. How to
model dynamics and incorporate this information into recognition systems is an unsolved
problem.
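Appending delta (time-derivative) features is a widely used partial remedy. It does not solve the problem, but it gives each frame information about how the features are changing. Each delta is a regression estimate over neighbouring frames: d_t = Σ n(c_{t+n} − c_{t−n}) / (2 Σ n²), for n = 1..N. The Python sketch below handles one feature dimension. It assumes N = 2 and repeats the first and last frames at the edges, both common choices that are not taken from this document.

```python
def delta(features, window=2):
    """Regression-based delta coefficients for one feature dimension.

    d_t = sum_{n=1..window} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..window} n^2),
    with the first and last frames repeated at the edges.
    """
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(T):
        num = 0.0
        for n in range(1, window + 1):
            ahead = features[min(t + n, T - 1)]
            behind = features[max(t - n, 0)]
            num += n * (ahead - behind)
        deltas.append(num / denom)
    return deltas

# A feature rising by 1 per frame has a delta of 1 away from the edges.
print(delta([0, 1, 2, 3, 4, 5, 6]))
```

In practice, deltas are computed for every dimension, often together with second-order "delta-delta" features, and concatenated with the static features.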
8. APPLICATIONS

Health care
In the health care domain, even in the wake of improving speech
recognition technologies, medical transcriptionists (MTs) have not yet
become obsolete. In front-end speech recognition, the provider dictates
into a speech-recognition engine, the recognized words are displayed
right after they are spoken, and the dictator is responsible for editing and
signing off on the document. Searches, queries, and form filling may all
be faster to perform by voice than by using a keyboard.

Military
Speech recognizers have been operated successfully in fighter aircraft,
with applications including setting radio frequencies, commanding an
autopilot system, setting steer-point coordinates and weapons release
parameters, and controlling flight displays. One example is the U.S.
program in speech recognition for the Advanced Fighter Technology
Integration (AFTI)/F-16 aircraft (F-16 VISTA).

F-16 FIGHTER PLANE


Telephony and other domains

Improvements in mobile processor speeds made speech-enabled Symbian
and Windows Mobile smartphones feasible. Current speech-to-text
programs are too large and require too much CPU power to be practical
for the Pocket PC.

People with disabilities

People with disabilities can benefit from speech recognition programs.
Speech recognition is especially useful for people who have
difficulty using their hands, with conditions ranging from mild repetitive
stress injuries to severe disabilities that preclude using conventional
computer input devices. Speech recognition is also used in deaf
telephony, for example to convert voicemail to text.

9. SPEECH RECOGNITION IN WINDOWS

Windows Speech Recognition tool.


Windows Speech Recognition is located at:
Start Menu\Programs\Accessories\Ease of Access\Windows Speech
Recognition
This tool is used to control Windows with spoken commands, using
speaker-independent voice input.
Some of the frequently used commands are as follows:

To perform this action                        Say this
Click any item by its name                    Click File; Start; View
Double-click any item                         Double-click Computer; Double-click file name
Right-click any item                          Right-click Computer; Right-click folder name
Open a program                                Open program name; Open Paint
Close a program                               Close that; Close Paint
Minimize                                      Minimize that; Minimize Paint
Scroll in one direction                       Scroll up; Scroll down; Scroll right; Scroll left
Scroll an exact distance in other units       Scroll up 5; Scroll down 7
Press any key on the keyboard                 Press keyboard key; Press a; Press Shift plus a
Minimize all windows to show your desktop     Show desktop
Click something you don't know the name of    Show numbers (numbers appear on the screen for the active window)
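The command table amounts to a small command grammar. The Python sketch below shows one way recognized text could be mapped onto such commands with regular expressions. The patterns and action names are hypothetical, and this is not how Windows Speech Recognition is implemented. The sketch also assumes the recognizer has already turned the speech into text.

```python
import re

# Hypothetical command grammar: (compiled pattern, action name).
COMMANDS = [
    (re.compile(r"^scroll (up|down|left|right)(?: (\d+))?$"), "scroll"),
    (re.compile(r"^double-click (.+)$"), "double_click"),
    (re.compile(r"^right-click (.+)$"), "right_click"),
    (re.compile(r"^click (.+)$"), "click"),
    (re.compile(r"^close (.+)$"), "close"),
    (re.compile(r"^minimize (.+)$"), "minimize"),
    (re.compile(r"^show desktop$"), "show_desktop"),
    (re.compile(r"^show numbers$"), "show_numbers"),
]

def parse_command(utterance):
    """Map recognized text to (action, arguments), or None if nothing matches."""
    text = utterance.strip().lower()
    for pattern, action in COMMANDS:
        match = pattern.match(text)
        if match:
            # Drop optional groups that were not spoken, e.g. the distance.
            return action, [g for g in match.groups() if g is not None]
    return None

print(parse_command("Scroll down 7"))
print(parse_command("Close Paint"))
```

Restricting input to a fixed grammar like this is one reason command-and-control recognition can be more reliable than open dictation.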

10. CONCLUSION
Speech recognition converts speech into text, making it easier both to
create and to use information. Speech is quick to produce, whereas typing
text is time-consuming. Speech recognition is the process of converting an
acoustic signal captured by a microphone or a telephone into a set of
words. These exciting technologies change the way you interact with your
computer.
