ACKNOWLEDGEMENTS
With the grace of Almighty ALLAH, WHO bestowed on us many blessings, we
were able to complete our project. We are thankful to ALLAH, the Creator of
the whole universe.
We acknowledge the help and support of our parents and teachers, due to
which we have reached our destination. We are thankful to our supervisor,
Sir Wajahat, for encouraging us during our project.
ABSTRACT
Making a machine learn to recognize the human voice is a great need of the
day. Voice recognition is the process of making the computer intelligent
enough to distinguish a voice spoken by a human from all other voices. In
voice recognition the voice is first recorded, sampled, framed and windowed,
and then the features which make it distinguishable are extracted. These
features include short-time energy, zero-crossing rate, spectral roll-off,
spectral flux and spectral centroid. Once these features are extracted,
pattern recognition is applied, whose output moves the robotic hand
according to the voice command of the user. Neural networks are used for
pattern recognition, with five neurons at the input layer. This voice
recognition system is 98 percent accurate when tested with a single user.
It covers all the commands needed to control a robotic arm.
Table of Contents
Dedication ......... I
Acknowledgements ......... II
Abstract ......... III
Table of Contents ......... IV
List of Figures ......... VIII
List of Tables ......... X
Chapter 1: Introduction ......... 1
1.1 Problem statement ......... 1
1.2 Objective ......... 1
1.3 Organization of project report ......... 2
Chapter 2: Literature Review ......... 3
2.1 Aim and Purpose ......... 3
2.2 Mechanical Assembly Review ......... 4
2.3 Speech Recognition Review ......... 5
4.2 First phase ......... 22
4.2.1 Sampling rate ......... 22
4.2.2 Recording ......... 23
4.2.3 Retrieving ......... 23
4.2.4 Normalizing ......... 24
4.2.5 Framing ......... 24
4.2.6 Windowing ......... 24
4.3 Feature extraction ......... 27
4.4 Features ......... 27
4.4.1 Zero crossing rate ......... 28
4.4.2 Short time energy ......... 30
4.4.3 Spectral roll off ......... 31
4.4.4 Spectral flux ......... 32
4.4.5 Spectral centroid ......... 32
4.5 Electrical and electronic components ......... 32
4.5.1 DC power supply ......... 33
4.5.2 Actuators ......... 33
4.5.3 H-Bridge ......... 33
4.5.4 Microcontroller 89c51 ......... 34
4.6 Proteus design ......... 37
4.7 Motor driving circuitry ......... 38
4.8 PCB designing ......... 39
4.9 Problems and troubleshooting ......... 39
Chapter 6: Conclusion ......... 50
Future Suggestions ......... 51
References ......... 52
Appendix A: Feature Extraction, Computing Statistics, Neural network codes ......... 54
Appendix B: Components cost ......... 61
Appendix C: Data Sheets ......... 62
LIST OF FIGURES
Figure 2.1: Project Design ......... 5
Figure 3.1: Style of neural computation ......... 10
Figure 3.2: Network/data manager in nntool ......... 11
Figure 3.3: Create new data in nntool ......... 12
Figure 3.4: Create network in nntool ......... 13
Figure 3.5: Hard limit transfer function ......... 13
Figure 3.6: Pure line transfer function ......... 14
Figure 3.7: Log sigmoid transfer function ......... 14
Figure 3.8: View of the project network in nntool ......... 15
Figure 3.9: Feed forward back propagation ......... 16
Figure 3.10: Result of the training ......... 16
Figure 4.1: Sampling rate ......... 23
LIST OF TABLES
Table 4.1: Pin configuration ......... 36
CHAPTER 1:
Introduction
1.2 Objective
The prime objective of our project is to develop a voice recognition system
that can be used for command and control of any machine. For that purpose we
have chosen a robotic arm, since it has an extensive set of commands which
can be used to train our voice recognition system.
Most voice recognition systems are developed for speech-to-text conversion.
Since these systems operate over a large vocabulary, they are less accurate
and require more computation. Our aim was to develop a voice recognition
system for the command and control of a machine while using less
computational power.
Air University
CHAPTER 2:
Literature Review
Cyborg Hand
Dyslexic people, or others who have problems with character or word use and
manipulation in textual form.
People with physical disabilities that affect either their data entry or
their ability to read (and therefore check) what they have entered.
CHAPTER 3:
Design Procedure
3.2.1 Definition
A computer program is said to learn from experience E with respect to some
class of tasks T and performance measure P, if its performance at tasks in T, as
measured by P, improves with experience E.
3.2.2 Generalization
The core objective of a learner is to generalize from its experience. The
training examples from its experience come from some generally unknown
probability distribution and the learner has to extract from them something
more general, something about that distribution, which allows it to produce
useful answers in new cases.
- Back propagation
- Bayesian statistics
  - Bayesian network
- Case-based reasoning
- Decision trees
- Learning Automata
- Lazy learning
- Instance-based learning
  - Random Forests
- Ensembles of classifiers
  - Boosting
- Ordinal classification
- Regression analysis
- Linear classifiers
  - Logistic regression
- Perceptron
- Quadratic classifiers
- k-nearest neighbor
- Boosting
- Decision trees
  - C4.5
  - Random forests
- Bayesian networks
problem at hand (the testing phase). The Artificial Neural Network is built
with a systematic step-by-step procedure to optimize a performance criterion
or to follow some implicit internal constraint, which is commonly referred to
as the learning rule [9]. The input/output training data are fundamental in
neural network technology, because they convey the necessary information to
"discover" the optimal operating point. There is a style in neural
computation that is worth describing.
Fig 3.2: Network/data manager in nntool
An input is presented to the neural network and a corresponding desired or
target response set at the output. An error is composed from the difference
between the desired response and the system output. This error information is
fed back to the system and adjusts the system parameters in a systematic
fashion (the learning rule). The process is repeated until the performance is
acceptable [8]. It is clear from this description that the performance hinges
heavily on the data. In artificial neural networks, the designer chooses the
network topology, the performance function, the learning rule, and the
criterion to stop the training phase, but the system automatically adjusts the
parameters. At present, artificial neural networks are emerging as the
technology of choice for many applications, such as pattern recognition,
prediction, system identification, and control.
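The error-correction loop just described — present an input, compare the output with the target, and feed the error back through a learning rule — can be sketched as a minimal delta-rule update for one linear neuron. This is an illustrative Python sketch with an invented toy data set and learning rate, not the project's MATLAB/nntool implementation:

```python
def train_neuron(samples, targets, lr=0.1, epochs=50):
    # Error-correction learning: present an input, form the error
    # (target minus output), and adjust the weights and bias by the
    # learning rule until performance is acceptable.
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # system output
            e = t - y                                     # error signal
            w = [wi + lr * e * xi for wi, xi in zip(w, x)]
            b += lr * e
    return w, b

# Toy example: learn y = 2*x from four training pairs
w, b = train_neuron([[0.0], [1.0], [2.0], [3.0]], [0.0, 2.0, 4.0, 6.0])
```

After training, w approaches 2 and b approaches 0, mirroring how the network parameters settle toward the optimal operating point.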
The linear (purelin) transfer function passes its input directly to the
output: positive inputs give positive outputs and negative inputs give
negative outputs.
The sigmoid transfer function shown below takes the input, which may have
any value between plus and minus infinity, and squashes the output into the
range 0 to 1.
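The three transfer functions discussed here (Figures 3.5-3.7) have standard definitions that can be sketched as follows. This is a Python illustration of the textbook formulas; the project itself used MATLAB's hardlim, purelin and logsig:

```python
import math

def hardlim(n):
    # Hard limit: outputs 1 for non-negative input, 0 otherwise
    return 1.0 if n >= 0 else 0.0

def purelin(n):
    # Pure linear: output equals input
    return n

def logsig(n):
    # Log-sigmoid: squashes any input into the range 0 to 1
    return 1.0 / (1.0 + math.exp(-n))
```

logsig is the squashing function described above: logsig(0) is 0.5, while large positive and negative inputs approach 1 and 0 respectively.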
3.6.5 Epoch
An entire pass through all of the input training vectors is called an epoch.
When such an entire pass of the training set has occurred without error,
training is complete [11].
Equivalently, an epoch is one iteration through the process of providing
the network with every input and updating the network's weights. Typically
many epochs are required to train the neural network.
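As an illustration of epochs, the sketch below trains a simple perceptron and stops as soon as an entire pass through the training set occurs without error, exactly as defined above. The AND-gate training set and learning rate are invented for the example (Python, not the project's MATLAB code):

```python
def hardlim(n):
    return 1 if n >= 0 else 0

def train_until_clean_epoch(samples, targets, lr=0.2, max_epochs=100):
    # One epoch = one entire pass through all input training vectors.
    w, b = [0.0] * len(samples[0]), 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, t in zip(samples, targets):
            e = t - hardlim(sum(wi * xi for wi, xi in zip(w, x)) + b)
            if e != 0:
                errors += 1
                w = [wi + lr * e * xi for wi, xi in zip(w, x)]
                b += lr * e
        if errors == 0:          # a full pass with no error: training done
            return epoch
    return max_epochs

# Learning the AND function takes several epochs, not just one
epochs = train_until_clean_epoch([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 0, 0, 1])
```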
Cartesian robot / Gantry robot: Used for pick and place work,
application of sealant, assembly operations, handling machine tools and
arc welding. It is a robot whose arm has three prismatic joints, whose
axes are coincident with a Cartesian coordinate system.
SCARA robot: Used for pick and place work, application of sealant,
assembly operations and handling machine tools. It's a robot which has
two parallel rotary joints to provide compliance in a plane [15].
Parallel robot: One use is mobile-platform handling in cockpit flight
simulators. It is a robot whose arms have concurrent prismatic or rotary
joints.
Actuator
Actuators are like the "muscles" of a robot, the parts which convert stored
energy into movement. In our project the actuators are three DC motors
that spin gears.
Platform
Our robotic arm is supported by an iron stand that is fixed on the surface
of a steel base. The robotic arm is made of a lightweight assembly.
End Effectors
In robotics, an end effector is the device at the end of a robotic arm,
designed to interact with the work environment.
Spring
A spring is connected on the back side of each finger, from its upper end
to the supporting iron sheet.
Our robotic arm has two joints, a rotary joint and a linear joint.
Rotary joint: It can rotate the robotic arm's end effector through 270
degrees. This joint connects the robotic arm with the iron stand through a
gear.
Linear joint: It can rotate the robotic arm's end effector along a radius
of 270 degrees. This joint connects the end effector with the robotic arm
through a gear [15].
(2) Payload
The carrying capacity of a robotic arm is called its payload. Our robotic
arm can pick up an object of up to 300 grams.
(3) Reach
The end effector (gripper) of our robotic arm can move up to 12 inches on
a lead screw.
(4) Repeatability
A measurement may be said to be repeatable when this variation is smaller
than some agreed limit. The repeatability of this robotic arm is 1.5.
Each finger has two axes of rotation, one at each of its two joints. These
axes are at the corners of the finger in order to provide better rotation.
A metallic (iron) string is connected from the lever to the finger's upper
end. When the motor rotates clockwise, the finger rotates anti-clockwise,
around both its first and second axes (at the joints) in two steps. When
the motor rotates anti-clockwise, the finger rotates clockwise: the lever
loosens its grip on the finger, and the spring connected at the back of the
finger stretches the finger back by moving both joints around their axes in
two steps. The attachment of the spring reduces the complexity of the
circuit, making it more economical and efficient.
CHAPTER 4:
Design Details
4.1 Introduction
Front end and back end are generalized terms that refer to the initial and
final stages of a process. The front end is responsible for collecting
input in various forms from the user and processing it to conform to a
specification the back end can use; it is thus an interface between the
user and the back end. In our project, front-end processing has two phases.
In the first phase we perform sampling and recording, retrieving and
normalizing, and finally framing and windowing. The second phase involves
feature extraction [4].
4.2.2 Recording
We capture the voice with a microphone and save it to the hard drive at a
sampling rate of 44100 samples per second.
4.2.3 Retrieving
Retrieving is done at a sampling rate of 44100 samples per second through
the wavread function of MATLAB.
After retrieving, copies of the voice command signal are sent to each
feature extraction block.
4.2.4 Normalizing
The process of normalizing is also called pre-emphasis. In this process,
all samples in the voice signal are divided by the maximum sample value.
This is done to reduce the dynamic range of the signal and to make it
spectrally flat.
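The step above can be sketched in a few lines (a Python illustration; the project performed this in MATLAB):

```python
def normalize(samples):
    # Divide every sample by the maximum absolute sample value,
    # reducing the dynamic range of the signal to -1..1.
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # all-zero signal: nothing to scale
    return [s / peak for s in samples]

print(normalize([0.2, -0.5, 0.25]))  # [0.4, -1.0, 0.5]
```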
4.2.5 Framing
Framing is also called frame blocking. In this step the voice command
signal is divided into a number of blocks (frames), each containing an
equal number of voice samples.
4.2.6 Windowing
Windowing is done to remove the discontinuities at the start and at the end of
the frame. We have employed the Hamming window for this purpose.
Hamming window
It is also known as a raised-cosine window. The Hamming window for N
points is defined as:
W(i) = 0.54 + 0.46*cos(2*pi*i/N)
where -N/2 <= i < N/2
These are specific examples from a general family of curves of the form
W(i) = a + (1 - a)*cos(2*pi*i/N)
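Framing and windowing can be sketched together using the Hamming definition above (a Python illustration with the centered index -N/2 <= i < N/2; the frame length and signal are placeholders):

```python
import math

def hamming(N):
    # Raised-cosine (Hamming) window: the a = 0.54 member of the family
    # W(i) = a + (1 - a)*cos(2*pi*i/N), with -N/2 <= i < N/2
    return [0.54 + 0.46 * math.cos(2 * math.pi * i / N)
            for i in range(-N // 2, N - N // 2)]

def frame_and_window(signal, frame_len):
    # Frame blocking: split the signal into equal-length blocks, then
    # taper each block to remove discontinuities at its start and end
    w = hamming(frame_len)
    frames = [signal[p:p + frame_len]
              for p in range(0, len(signal) - frame_len + 1, frame_len)]
    return [[s * wi for s, wi in zip(f, w)] for f in frames]
```

The window equals 1.0 at its centre (i = 0) and falls toward 0.08 at the edges, so each frame's ends are attenuated rather than cut off abruptly.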
4.4 Features
We have extracted the following features:
rate is an important parameter for voiced/unvoiced classification.
crossings occur is a simple measure of the frequency content of a signal.
Zero-crossing rate is a measure of the number of times, in a given time
interval/frame, that the amplitude of the speech signal passes through a
value of zero. Speech signals are broadband signals, so the interpretation
of the average zero-crossing rate is much less precise. However, rough
estimates of spectral properties can be obtained using a representation
based on the short-time average zero-crossing rate.
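A minimal sketch of the measure described above (a Python illustration; the project's MATLAB version operated on windowed frames):

```python
def zero_crossing_rate(frame):
    # Count sign changes between successive samples and divide by the
    # frame length: a simple measure of the signal's frequency content.
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / len(frame)

# A rapidly alternating signal crosses zero often; a one-sided one never does
print(zero_crossing_rate([1, -1, 1, -1, 1, -1, 1, -1]))  # 0.875
print(zero_crossing_rate([1, 2, 3, 4, 3, 2, 1, 2]))      # 0.0
```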
The choice of the window determines the nature of the short-time energy
representation. In our model we used the Hamming window, which gives much
greater attenuation outside the passband than the comparable rectangular
window.
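Short-time energy with a Hamming window can be sketched as below (a Python illustration using the common 0..N-1 form of the window; the frame length and test signal are invented):

```python
import math

def short_time_energy(signal, frame_len):
    # For each frame, sum the squares of the Hamming-weighted samples.
    w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
         for n in range(frame_len)]
    return [sum((s * wn) ** 2 for s, wn in zip(signal[p:p + frame_len], w))
            for p in range(0, len(signal) - frame_len + 1, frame_len)]

# A loud frame has high energy; a quiet frame has energy near zero
energies = short_time_energy([0.9, -0.9] * 4 + [0.01, -0.01] * 4, 8)
```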
h(n) = 0 , otherwise
4.5.2 Actuators
Solenoid-actuated valves and a DC motor are used as actuators. The output
voltage of the first relay is 12 V, with 5 V applied to its coil. The
relays connected on the relay board have an output voltage of 220 V and are
energized by the first group of relays, which have an output voltage of
12 V. The output of the second group of relays is used to actuate the
solenoid-actuated valves.
Table 4.1 lists the pin configuration: the name and function of pins
P1.0-P1.7, Reset, P3.0 and P3.1.
The DC motor drive is used. The drive can rotate the motor in the clockwise
direction in order to close the fingers by moving the joints. The three
motors can move simultaneously in order to grip an object.
We have used the PNP and NPN transistors D313 and B1367. The rating of the
motor is about 12 V / 1 A. Both transistors are used as switches in
saturation mode. When 0 V is given to the PNP transistor, the motor rotates
in the clockwise direction; when 0 is given to the NPN transistor, the
motor rotates in the anti-clockwise direction. We have used two freewheeling
diodes in parallel with both transistors, in reverse bias: when current
passes through the motor, its windings charge and produce a back EMF that
can damage the circuit, so these diodes act as a protection circuit. We
have also connected two switches with two back-to-back diodes in parallel;
these act as limit switches. When the lever touches one switch, the circuit
is cut off at the clockwise extreme position, and when the motor rotates in
the anti-clockwise direction the lever touches the other switch, cutting
off the circuit at the other extreme position. Four resistances of 1 kohm
are used to limit the current and to prevent short-circuiting.
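The direction logic described above can be condensed into a small model. This is a hypothetical Python sketch of the behaviour, not the project's circuit or any firmware; the function name and the short-circuit guard are our own framing:

```python
def motor_direction(pnp_in, npn_in):
    # 0 V on the PNP transistor's input drives clockwise rotation
    # (fingers close); 0 on the NPN side drives anti-clockwise rotation
    # (fingers open). Turning both on at once is treated as a fault.
    if pnp_in == 0 and npn_in == 0:
        raise ValueError("both transistors on: short circuit")
    if pnp_in == 0:
        return "clockwise"
    if npn_in == 0:
        return "anti-clockwise"
    return "stopped"
```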
Troubleshooting
CHAPTER 5:
Graphical User Interface
When your GUI is being used, you have no control over the sequence of
events that trigger particular callbacks, or over what other callbacks
might still be running at those times. This distinguishes event-driven
programming from other types of control flow, for example, processing
sequential data files.
When you right-click a FIG-file in this way, the figure opens in the GUIDE
Layout Editor, where you can work on it.
Create toolbars
5.4.1 Start
Pressing the Start button opens a dialogue box containing a Speak push
button, which allows the user to speak within one second.
5.4.2 Troubleshoot
Pressing the Troubleshoot button opens a dialogue box consisting of the
following buttons:
1. Rotary motor
2. Gripper open
3. Gripper close
4. Back to main menu
5.4.3 Training
Chapter 6:
Conclusion
This report explains the implementation of the voice-controlled Cyborg
Hand. The three phases of voice recognition, along with the basics of
machine learning relevant to the project, were described. The GUI designed
for our project was also explained, and the robotic hand designed and
developed for the project was described in detail.
Voice recognition consists of three phases: front-end processing, feature
extraction and pattern recognition.
Front-end processing consists of sampling, recording, retrieving, framing
and windowing. The second phase of voice recognition is feature extraction.
The third phase is pattern recognition, which is a major concern of machine
learning. The approach to pattern recognition implemented in our project is
an Artificial Neural Network, specifically feed-forward back propagation.
After voice recognition, the report explains the design and development of
the Cyborg Arm. The robotic arm designed for our project is a
fixed-sequence robotic arm with a gripper as its end effector.
To establish coordination between the robotic arm and the computer, and
between the human and the computer, a GUI was designed in our project.
Our implementation of voice recognition is 80% accurate when tested 100
times in a given environment. Accuracy degrades with a change in
environment due to noise, and also with a change of microphone or training
samples.
Future Suggestions
We have implemented the voice recognition on a PC. It could also be
implemented on a suitable microcontroller. The most suitable
microcontroller for this purpose is the dsPIC33, since it has a built-in
ADC and a three-pin port for voice recording and playback, and it has
enough processing power for front-end processing and pattern recognition.
Our voice recognition system is somewhat dependent on the environment
because of noise (voices other than voice commands). Though we perform
noise removal during feature extraction, where features like ZCR detect the
major voice-activity region in a given sample, the system could be made
less dependent on noise by implementing noise-removal algorithms in the
front-end processing.
References
[1] C. Bishop, Neural Networks for Pattern Recognition, 3rd Edition, 1996.
[2] Prasad D. Polur, Ruobing Zhou, Jun Yang, Fedra Adnani, Rosalyn S.
Hobson, "Speech Recognition Using Artificial Neural Networks," Proceedings
of the 23rd Annual EMBS International Conference, Istanbul, Turkey, October
25-28, 2001.
[3] Roziati Zainuddin, Othman O. Khalifa, "Neural Networks Used for Speech
Recognition," Nineteenth National Radio Science Conference, Alexandria,
March 2002.
[4] Lawrence Rabiner, Biing-Hwang Juang, Fundamentals of Speech
Recognition.
[5] S. McAdams, "Perspectives on the contribution of timbre to musical
structure," Computer Music Journal, 23(3):85-102, 1999.
[6] Eric Davalo, Patrick Naim, Neural Networks, 3rd Edition, 1989.
[7] Feed-forward neural network.
http://en.wikipedia.org/wiki/Feedforward_neural_network
[8] I. Aleksander, H. Morton, An Introduction to Neural Computing, 2nd
Edition, 1992.
[9] Neural network.
http://www.emsl.pnl.gov:2080/docs/cie/neural/neural.homepage.html
[10] C. Bishop, Neural Networks for Pattern Recognition, 3rd Edition, 1996.
[11] Howard Demuth, Mark Beale, Neural Network Toolbox, 4th Edition, July
2002.
[12] M.T. Hagan, H.B. Demuth, "Neural Networks for Control," Proceedings of
the 1999 American Control Conference, San Diego, CA, 1999, pp. 1642-1656.
[13] Graphical user interface.
http://en.wikipedia.org/wiki/Graphical_user_interface
[14] Saeed B. Niku, Introduction to Robotics, 2nd Edition, 2001.
[15] Robotic arm. http://en.wikipedia.org/wiki/Robotic_arm
[16] B.L. Theraja, Electrical Machines, 8th Edition.
[17] H-bridge. http://en.wikipedia.org/wiki/H_bridge
[18] DB25 connector. http://www.nullmodem.com/DB-25.htm
Appendix A
MATLAB CODE FOR FEATURE EXTRACTION
% Spectral centroid of each frame (assumes signal, fs, H, windowLength,
% numOfFrames, curPos and step are set up by the surrounding script)
m = ((fs/(2*windowLength))*[1:windowLength])';
C = zeros(numOfFrames,1);
for i = 1:numOfFrames
    window = H.*(signal(curPos:curPos+windowLength-1));
    FFT = abs(fft(window,2*windowLength));
    FFT = FFT(1:windowLength);
    FFT = FFT / max(FFT);
    C(i) = sum(m.*FFT)/sum(FFT);
    if (sum(window.^2) < 0.010)
        C(i) = 0.0;          % treat near-silent frames as zero
    end
    curPos = curPos + step;
end
C = C / (fs/2);              % normalize the centroid to [0, 1]
% Spectral flux of each frame: squared difference between the
% normalized magnitude spectra of successive frames
m = [0:windowLength-1]';
F = zeros(numOfFrames,1);
for i = 1:numOfFrames
    window = H.*(signal(curPos:curPos+windowLength-1));
    FFT = abs(fft(window,2*windowLength));
    FFT = FFT(1:windowLength);
    FFT = FFT / max(FFT);
    if (i > 1)
        F(i) = sum((FFT - FFTprev).^2);
    else
        F(i) = 0;
    end
    curPos = curPos + step;
    FFTprev = FFT;
end
% Spectral entropy of each frame: the FFT magnitudes are grouped into
% numOfBins sub-bands and the entropy of the sub-band energies is taken
for i = 1:numOfFrames
    window = H.*(signal(curPos:curPos+windowLength-1));
    fftTemp = abs(fft(window,2*fftLength));
    fftTemp = fftTemp(1:fftLength);
    S = sum(fftTemp);
    for j = 1:numOfBins
        x(j) = sum(fftTemp((j-1)*h_step + 1 : j*h_step)) / S;
    end
    En(i) = -sum(x.*log2(x));
    curPos = curPos + windowStep;
end
% Record a one-second voice command at 44100 samples per second
% and save it as c1.wav
for i = 1:1
    file = sprintf('%s%d.wav','c',i);
    input('You have 1 second to speak. Press enter when ready to record--> ');
    y = wavrecord(44100,44100);     % one second at 44100 Hz
    sound(y,44100);                 % play the recording back
    wavwrite(y,44100,file);
end
% Load the trained network, extract features from the recorded
% command, and run the appropriate motor-control script
load net1
file = sprintf('%s%u.wav','c',1);
ff1 = computeAllStatistics(file, win, step);   % the five features
ff1 = ff1';
a = sim(net1,ff1);
b = a(1,1);
b                      % display the network output
if (b > 1)
    rotaryfornechay    % script that rotates the arm downwards
end
if (b < 1)
    rotaryforuper      % script that rotates the arm upwards
end
Appendix C
Data Sheets