International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org  Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 1, Issue 4, November-December 2012  ISSN 2278-6856

Enhanced Speech Recognition Using ADAG SVM Approach

Mr. Rajkumar S. Bhosale (1), Ms. Archana R. Panhalkar (2), Mr. V. S. Phad (3), Prof. N. S. Chaudhari (4)

(1,2,3) Parikrama GOI, Department of Computer Engineering, Parikrama College of Engineering, Kashti-41470, M.S., India
(4) Indian Institute of Technology, Indore (M.P.) 452 017

Abstract: Speech is a uniquely human characteristic used to communicate and express ideas. Automatic speech recognition (ASR) finds application in electronic devices that are too small to allow data entry via commonly used input devices such as keyboards; personal digital assistants (PDAs) and cellular phones are examples in which ASR plays an important role. The main objective of this work is to recognize words spoken by the same speaker using a multi-class support vector machine. The multi-class classifier is used because there are a number of classes. Linear predictive coefficient (LPC) features extracted from the input data are used for training, and the support vector machine (SVM) takes little time to find the support vectors. For testing on similar input data, the Adaptive Directed Acyclic Graph SVM (ADAGSVM) classifies the input. Online results are obtained by interfacing with the machine commands of an operating system, with good results. For training and testing we constructed sample data sets of the Marathi digits zero to nine (Shunya to Nau), the English letters A to Z, and machine commands such as login and shutdown.
Keywords: Dynamic Time Alignment SVM, Linear Predictive Coefficient (LPC), Support Vectors (SVs), Adaptive Directed Acyclic Graph SVM.
1. INTRODUCTION
Speech recognition is the conversion of an acoustic waveform into a written equivalent of the message information. The nature of the speech recognition problem depends heavily on the constraints placed on the speaker, the speaking situation, and the message context. The applications of speech recognition systems are many and varied, e.g. a voice-operated typewriter, voice communication with computers, and a command-line interface with a machine. In listening tests conducted on a large-vocabulary task, recognition accuracy by humans was found to be an order of magnitude higher than that of machines; although these tests included data of varied signal quality, human recognition performance was consistent over a diverse set of conditions.
The task here is to create learning machines that can classify a given spoken digit into 10 classes (zero //Shunya// to nine //nau//). Ten digits in the Marathi language and 10 machine commands are used for offline training and testing; online, a few machine commands are used for training as well as for testing.

Figure 1 Structure of the speech recognition system.
We tried to solve the problem of time alignment of speech data when using an SVM as a classifier, and to reduce the training time of the SVM while keeping good recognition performance. A time alignment algorithm is used to equate utterances of the same word spoken over different durations online [10]-[14]; the overall system is shown in Fig. 1. This paper is organized into three parts: (i) preprocessing, (ii) training, and (iii) testing.
i) Preprocessing: The preprocessing part consists of two operations, namely end point detection and linear predictive coding. End point detection is used to separate the speech signal from the non-speech signal; linear predictive coding (LPC) is then applied for feature extraction.
ii) Training: The training part uses an SVM over a number of classes, and the resulting multiclass problem is solved by the one-against-one approach. A radial basis kernel is used to map the data from the input space to a higher-dimensional feature space.
Dynamic time alignment is used for variable-length vectors.
iii) Testing: The Adaptive Directed Acyclic Graph SVM (ADAGSVM) algorithm is used for testing each word. The Max Wins algorithm is used to identify the test word.

2. PRE-PROCESSING TECHNIQUES
2.1 Speech Separation by End Point Detection
Sound Forge software is used for end point detection. The speech samples are recorded with a bit depth of 16, a mono channel, and a sample rate of 44100 Hz. Sound Forge is used to remove the non-speech part of the wave. The end points can also be found from the average noise and the average amplitude of the recorded wave, as shown in Fig. 2. After end point detection, the signal is applied to the LPC front-end processor for analysis of the speech signal.
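For illustration, a minimal sketch of this kind of energy-based end point detection (not the Sound Forge implementation; the frame size, the number of leading noise frames, and the threshold factor are illustrative assumptions) could look as follows:

```python
import numpy as np

def detect_endpoints(signal, frame_len=256, noise_frames=10, factor=3.0):
    """Energy-based end point detection sketch: frames whose average
    amplitude exceeds a multiple of the estimated noise floor count as
    speech. frame_len, noise_frames and factor are illustrative choices."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    amps = np.array([np.mean(np.abs(f)) for f in frames])
    noise_floor = amps[:noise_frames].mean()   # leading frames assumed silent
    speech = np.flatnonzero(amps > factor * noise_floor)
    if speech.size == 0:                       # nothing rose above the floor
        return 0, len(signal)
    return speech[0] * frame_len, (speech[-1] + 1) * frame_len
```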

2.2 LPC Model for Feature Extraction
An LPC order of 14 and a window size of 256 samples are used for finding the LPC coefficients [5]. The basic idea behind the LPC model is that a given speech sample at time n, s(n), can be approximated as a linear combination of the past p speech samples:

$s(n) \approx a_1 s(n-1) + a_2 s(n-2) + \cdots + a_p s(n-p)$

Figure 2 End point detection by Sound Forge.

The LPC front end comprises six operations: pre-emphasis, frame blocking, windowing, autocorrelation analysis, LPC analysis, and conversion of the LPC parameters to cepstral coefficients.
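The LPC analysis step can be sketched as below. This is the standard autocorrelation method (Levinson-Durbin recursion), shown for illustration with order 14 as in the paper; it assumes pre-emphasis, frame blocking, and windowing have already been applied, and that the frame is not silent:

```python
import numpy as np

def lpc_coefficients(frame, order=14):
    """LPC by the autocorrelation method (Levinson-Durbin recursion).
    Order 14 follows the paper; the frame (e.g. 256 samples) is assumed
    to be pre-emphasized and windowed already, and non-silent."""
    n = len(frame)
    r = np.correlate(frame, frame, mode='full')[n - 1:n + order]  # lags 0..p
    a = np.zeros(order + 1)
    a[0], err = 1.0, r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coeff.
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]                # update a_1..a_{i-1}
        a[i] = k
        err *= 1.0 - k * k                                 # prediction error
    # This recursion uses the convention s(n) + sum_k a_k s(n-k) = e(n),
    # so negate to obtain the predictor coefficients a_k of the equation above.
    return -a[1:]
```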

3. DYNAMIC TIME ALIGNMENT SVM
SVMs assume that each sample is a vector of fixed length, so for variable-length vectors a time alignment tool is used. Embedding the dynamic time alignment into the kernel function gives the so-called dynamic time alignment SVM (DTAKSVM). The alignment proceeds as follows.
i. Given two sequences of vectors X and Y, first choose the shorter of the two. Without loss of generality, denote this sequence by X.

Figure 3 Nonlinear time alignment algorithm.
ii. Consider the first feature vector x1 of the chosen sequence X, and compute the best local match of this vector in sequence Y over a window of predetermined size, denoted by p units. Initially the window starts from y1. Let the best match be denoted by yk, where $1 \leq k \leq p$.
iii. Select the next feature vector x2 and repeat the above procedure, except that the match window now starts from index k instead of 1.
Repeat this procedure until two sequences X and Y_new of the same length are obtained.
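A minimal sketch of this greedy alignment procedure, under the assumptions that the local match is scored by Euclidean distance and that the window size p is user-chosen (neither is fixed above), might look like this:

```python
import numpy as np

def align_sequences(X, Y, p=3):
    """Greedy time alignment following steps i-iii above. X and Y are
    arrays of feature vectors (frames x features); Euclidean distance
    and the window size p are illustrative assumptions."""
    X, Y = np.asarray(X), np.asarray(Y)
    if len(X) > len(Y):          # step i: make X the shorter sequence
        X, Y = Y, X
    matched, k = [], 0
    for x in X:                  # steps ii-iii: best local match per frame
        window = Y[k:k + p]      # window starts where the last match ended
        best = int(np.argmin(np.linalg.norm(window - x, axis=1)))
        matched.append(window[best])
        k += best
    return X, np.asarray(matched)   # two sequences of equal length
```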
4. TRAINING USING SVM
4.1 Support Vector Classification
Here the classification mechanism of SVMs [14] is explained in detail for the three cases of linearly separable, linearly non-separable, and non-linear data, through a two-class pattern recognition problem. Support vector machines were originally designed for the binary classification problem.
4.2 Linearly Separable Case
The general two-class classification problem can be stated as follows [2]; it is illustrated in Figure 4.
i. Given a data set $D$ of $N$ samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$, each sample is composed of a training example $x_i$ of length $M$, with elements $x_i = (x_1, x_2, \ldots, x_M)$, and a target value $y_i \in \{-1, +1\}$.
ii. The goal is to find a classifier with decision function $f(x)$ such that $f(x_i) = y_i \;\; \forall (x_i, y_i) \in D$.
Figure 4 shows the hyperplane that separates the positive points from the negative points. This can be formulated as follows: suppose that all the training data satisfy the constraints

$x_i \cdot w + b \geq +1$ for $y_i = +1$   (4.1)

$x_i \cdot w + b \leq -1$ for $y_i = -1$   (4.2)

Figure 4 Linear separating hyperplanes for the separable
case.
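To make constraints (4.1) and (4.2) concrete, the sketch below trains a linear SVM on toy two-dimensional data and checks that every training point satisfies $y_i (x_i \cdot w + b) \geq 1$; scikit-learn is an illustrative tool choice here, not the paper's implementation:

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative, not the paper's LPC features)
X = np.array([[1.0, 1.0], [2.0, 1.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel='linear', C=1e6)   # a very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]          # normal vector of the separating hyperplane
b = clf.intercept_[0]     # bias term

# Constraints (4.1)/(4.2) combine into y_i (x_i . w + b) >= 1
margins = y * (X @ w + b)
print(margins)            # support vectors achieve ~1, other points exceed it
```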

4.3 Linearly Non-Separable Case
The solution to this problem is identical to the separable case except for a modification of the bounds of the Lagrange multipliers. The parameter C introduces additional capacity control within the classifier; in some circumstances C can be directly related to a regularization parameter.
4.4 Non-Linear Case
In the non-linear case the input data are mapped to some higher-dimensional feature space where the data are linearly separable. Different kernel functions are used for mapping the data. A mapping from input space to feature space can thus be achieved by substituting the inner product with

$\Phi(x_i) \cdot \Phi(x_j)$   (4.3)

Calculating each $\Phi$ explicitly is not needed. Instead, a functional representation $K(x_i, x_j)$ that computes the inner product in feature space can be used:

$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$   (4.4)

This functional representation is called a kernel.
The role of the kernel is shown in Figure 5. The optimization problem above then becomes

$\max_{\alpha} W(\alpha) = \max_{\alpha} \left( -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j K(x_i, x_j) + \sum_{i=1}^{l} \alpha_i \right)$   (4.5)
Not all functions can be used as kernels; a feasible kernel must satisfy certain conditions. The exponential radial basis function

$K(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}}$   (4.6)

is used in the implementation. The term $\sigma$ needs to be defined by the user; its value is chosen by trial and error.
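Equation (4.6) translates directly into code; the sketch below is a direct transcription, with sigma as the user-defined width:

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """Exponential radial basis kernel of Eq. (4.6); sigma is the
    user-defined width, tuned by trial and error as noted above."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))
```

Note that libraries such as scikit-learn parameterize the same kernel as $e^{-\gamma \|x_i - x_j\|^2}$, so $\gamma$ corresponds to $1/(2\sigma^2)$.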

5. TESTING USING SVM
The testing part involves a number of data sets, i.e. multiple classes, but SVMs are binary classifiers, so a technique is needed to extend the method to handle multiple classes. The technique used to handle the multi-class problem is the one-against-one classifier [2][11]. Methods for solving the multi-class problem of SVMs typically treat the problem as a combination of two-class decision functions, e.g. one-against-one and one-against-rest.

Figure 5 The role of the kernel
5.1 Adaptive Directed Acyclic Graph SVM (ADAGSVM)
ADAGSVM stands for Adaptive Directed Acyclic Graph SVM. It is a multi-class method used to alleviate the problems of the DDAG structure [2][4][18]. An Adaptive DAG (ADAG) is a DAG with a reversed triangular structure. This approach provides accuracy comparable
to that of Max Wins, which is probably the most accurate method currently available for multi-class SVMs, while requiring fewer computations. It employs exactly the same training phase as the one-vs-one classifier, i.e. it creates $K(K-1)/2$ binary SVMs, one for each pair of classes. However, it distinguishes itself in the testing phase: the nodes are arranged in a reversed triangle with $\lceil K/2 \rceil$ nodes at the top, $\lceil K/4 \rceil$ nodes in the second layer, and so on down to a lowest layer containing a single final node; in total it has $K-1$ internal nodes.
Given a test example x, the binary decision function is evaluated starting at the top level. Each node is then exited via the outgoing edge that carries the message of the preferred class. Each round halves the number of candidate classes, and the binary function of the next-level node is chosen based on the preferred classes from its parent nodes.
The reduction process continues until the final node at the lowest level is reached. For the testing of class char(4), the ADAG structure is shown in Figure 6.


Figure 6 Structure of an Adaptive DAG classifier for a 10-class problem.
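The testing phase described above can be sketched as follows. Here `decide(a, b)` is a hypothetical callback standing for the binary SVM trained on the class pair (a, b), since the paper does not fix a programming interface; each evaluation eliminates one class, so $K-1$ evaluations suffice:

```python
def adag_predict(candidates, decide):
    """ADAG-style testing sketch. `candidates` lists the K class labels;
    `decide(a, b)` returns the class preferred by the binary SVM trained
    on the pair (a, b). Survivors are paired each round and the winners
    kept, so exactly K - 1 binary evaluations are performed."""
    while len(candidates) > 1:
        winners = []
        for i in range(0, len(candidates) - 1, 2):
            winners.append(decide(candidates[i], candidates[i + 1]))
        if len(candidates) % 2 == 1:          # an odd class passes through
            winners.append(candidates[-1])
        candidates = winners
    return candidates[0]

# Example with 10 classes (digits 0-9) and a dummy decision rule:
# print(adag_predict(list(range(10)), decide=lambda a, b: min(a, b)))
```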
6. RESULTS
The SVM classifier is used for recognition of 10 Marathi digits and 10 machine commands in training and testing. There are 200 data points, obtained from 20 replications of each digit and command by the same speaker. Results are obtained for different values of C and a constant value of d equal to 10. The following results are given in percent for the recognition of 100 data points. The same speaker is used for the training and testing data sets.
For training, the 10 Marathi digits Shunya to Nau and 10 machine commands (on, off, login, ...) with 20 replications of each are used. Table 6.1 shows the percentage efficiency for 100 data points in offline testing, calculated as follows.
$\text{Efficiency of each word } (e_i) = \frac{\text{Number of samples recognized}}{\text{Total number of samples}} \times 100$

$\text{Total efficiency} = \frac{\sum_{i=1}^{n} e_i}{n}$
where n is the total number of words. The speaker in the training and testing data sets is the same. LPC features are used, with 100 input words for training and the same 100 words for offline testing. Online, as shown in Table 6.2, a few machine commands are used for training and one machine command for testing.
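As a worked example of these formulas (with illustrative counts, not the paper's raw data):

```python
# e_i per word, each word tested 20 times in this illustrative example
per_word = [100.0 * recognized / 20 for recognized in (19, 20, 18)]
total_efficiency = sum(per_word) / len(per_word)   # mean over the n words
print(per_word, total_efficiency)                  # [95.0, 100.0, 90.0] 95.0
```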
Table 6.1: Performance of the SVM classifier using LPC features for 100 points (overall efficiency in %)

Input Word                     C=0.5   C=0.7   C=0.9
Marathi data set (offline)      94      95      97
Command data set (offline)      93      94      96.5

Results are shown in percent for the machine commands, with each command applied 10 times in testing.

Table 6.2: Performance of the SVM classifier using LPC features for a few machine commands

Input Word                     C=0.9 (overall efficiency in %)
Command data set (online)       96.00

We have tried to reduce the training time of the SVM while keeping good recognition performance. The proposed method (DTAKSVM) requires less training time for the same recognition performance.

7. CONCLUSION
A Support Vector Machine classifier using the Adaptive Directed Acyclic Graph SVM algorithm is used for offline speech recognition of 10 Marathi digits and 10 machine commands. A few machine commands are also used for online training, with one command for testing. The classifier is found to be promising for speech recognition, and because of its short training and testing times it can be used for real-time applications. LPC features are used for the speech recognition task, together with the DTAK algorithm and the RBF kernel function. The system is speaker dependent. The testing part of speech recognition uses the ADAG SVM, which gives better results in fewer iterations for the given number of classes. Since the system is speaker dependent, it may also be used for authentication. The running time of this scheme is directly proportional to the number of classes in the input data set. The scheme can be extended to continuous speech recognition.
REFERENCES
[1] Boonserm Kijsirikul and Nitiwut Ussivakul, "Multiclass Support Vector Machines Using Adaptive Directed Acyclic Graph," IEEE, 2002, pp. 980-985.
[2] Xin Dong, Wu Zhaohui and Pan Yunhe, "A New Multi-Class Support Vector Machines," IEEE, 2001, pp. 1673-1676.
[3] Chen Junli and Jiao Licheng, "Classification Mechanism of Support Vector Machines," IEEE, 2000.
[4] K.P. Bennett and J.A. Blue, "A Support Vector Machine Approach to Decision Trees," IEEE, 1998, pp. 2396-2401.
[5] L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, 1993.
[6] C.J.C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Knowledge Discovery and Data Mining, 2(2), 1998.
[7] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, New York, 1995.
[8] V.N. Vapnik, Statistical Learning Theory, Springer Verlag, New York, 1998.
[9] Hiroshi Shimodaira et al., "Support Vector Machine with Dynamic Time-Alignment Kernel for Speech Recognition," Eurospeech 2001, Scandinavia.
[10] Shantanu Chakrabartty and Yunbin Deng, "Dynamic Time Alignment in Support Vector Machines for Recognition Systems."
[11] B. Kijsirikul and N. Ussivakul, "Multiclass Support Vector Machines Using Adaptive Directed Acyclic Graph," in Proceedings of the International Joint Conference on Neural Networks (IJCNN 2002), pp. 980-985, 2002.
[12] Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 2000.
[13] T. Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," in Proc. 10th European Conference on Machine Learning (ECML-98), pp. 137-142, 1998.
[14] S. Gunn, "Support Vector Machines for Classification and Regression," Technical Report, University of Southampton, Image Speech and Intelligent Systems Group, 1997.

AUTHORS

(1) Bhosale R.S. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded, in 2001 and 2007, respectively. During 2001-2007 he stayed in the Signal Processing and Computer Networking Research Laboratory.

(2) Panhalkar A.R. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded, in 2003 and 2008, respectively. During 2003-2008 she stayed in the Signal Processing and Image Processing Research Laboratory.

(3) Phad V.S. received the B.E. and M.E. degrees in Computer Science and Engineering from SRTMU, Nanded, in 2003 and 2008, respectively. During 2003-2008 he stayed in the Signal Processing and Image Processing Research Laboratory.
