
Malay Grapheme to Phoneme Tool for Automatic Speech Recognition

Tien-Ping Tan (1), Bali Ranaivo-Malançon (2)

(1) School of Computer Science, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia
(2) Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor, Malaysia
tienping@cs.usm.my, ranaivo@mmu.edu.my

Abstract
This paper presents the design and performance of a Malay
grapheme to phoneme (G2P) tool for generating the pronunciation
dictionary of a Malay automatic speech recognition (ASR) system.
The G2P tool is a rule-based system. Rules can easily be added
and removed, and English words are handled. The G2P tool also
contains a morphological tool and a syllable tool, which it uses
to determine the pronunciation of a word. Our evaluation shows
that, with the pronunciation dictionary generated automatically
by our G2P tool, our Malay ASR system achieves a WER of 16.5%,
which is only 1.9% (absolute) higher than with a manually
verified pronunciation dictionary.

1. Introduction
A grapheme-to-phoneme (G2P) tool generates the pronunciation of
a given word. A grapheme is the fundamental unit of written
language, and a phoneme is the smallest linguistically
distinctive unit of sound [1]. G2P is an important component of
many speech processing systems. For example, in speech synthesis
systems, the pronunciation of unknown words, that is, words that
are not in the pronunciation dictionary, can be predicted by
applying G2P rules. In speech recognition systems, a G2P tool can
be used to generate the pronunciation dictionary.
Malay, in its variety of forms, is widely used in Malaysia,
Indonesia, Singapore, and southern Thailand. In this paper, we
focus only on Malay as it is used in Malaysia. Malay is written
using either the Latin alphabet (Rumi) or an adapted Arabic
alphabet (Jawi). The G2P tool described in this paper handles
Rumi Malay only.
This paper reports our effort to develop a Malay G2P system for
an ASR system. Section 2 provides an overview of the difficulties
in Malay pronunciation. Section 3 describes Malay phonology,
while Sections 4 and 5 discuss Malay morphology and syllable
structures respectively. Section 6 presents the grapheme to
phoneme rules. We then evaluate our pronunciation dictionary by
using an ASR system in Section 7. Conclusions are drawn in
Section 8.

2. Challenges for Malay pronunciation


The official and national language of Malaysia is Malay, which
has a variety of dialects depending on the region. These dialects
differ in vocabulary, pronunciation and/or tone.
Malaysia is a multiracial and multilingual country. It is common
to hear members of an ethnic group use their first language among
themselves, leaving Malay as the intergroup language. Many
researchers have shown that the first language (L1) of a speaker
influences second language (L2) acquisition [2]. Speakers with a
different L1 may not speak like native speakers in terms of
pronunciation or style.
Malay, like other languages, is also strongly influenced by
English. Many English words have been absorbed into Malay,
especially in science and technology. Although there is a
standard method for converting words borrowed from English into
Malay, in practice the original English words are often used
instead, both in writing and in conversation. Thus, code
switching is a very common and interesting phenomenon in Malay.
Automatic speech recognition (ASR) for Malay is therefore
challenging because of the different variants of Malay and code
switching.
The first published paper on a Malay ASR system appeared in
1997 [3]. Subsequent research concerns the recognition of
isolated Malay digits [4] and the segmentation and labeling of
speech utterances for a Malay ASR system [5]. The pronunciation
dictionary is one of the components of an ASR system. A G2P tool
is normally used to generate the pronunciation of a word. This
pronunciation is predicted from the graphemes of the word, since
most graphemes in a particular context are pronounced in a
specific way.
There is still no real consensus on the list of Malay speech
sounds, even for standard Malay. El-Imam and Don [6] proposed 27
consonants (19 native consonants and 8 consonants that appear
only in borrowed words) and 6 vowels. Ting et al. [7] proposed 33
phonemes, with 18 pure Malay consonants and 6 vowels. In the same
year, the same authors reported 8 vowels [8].
El-Imam and Don [6] divided the rules for transforming letters
into sounds into two sets: a set of 30 grapheme-to-phoneme rules
and a set of 46 phoneme-to-phonetic rules. At the word level, the
system made 43% errors due to the pre-processing of
abbreviations, numbers, and unknown words. Ting et al. [7] used
two methods to obtain grapheme-to-phoneme rules. First, they
manually wrote 94 rules; the matching was only 71% at the word
level. Second, they applied a CART model to acquire the
grapheme-to-phoneme rules automatically. The result was slightly
better: 73.93% at the word level.
In Malay, the pronunciation of most words can be determined from
the graphemes, although there are some exceptions. Besides the
graphemes, the morphology and the syllable structure of a Malay
word are also required to determine its pronunciation. As
mentioned above, English words also often appear in Malay texts.
For these words, a different strategy has to be applied to
generate their pronunciations.
To our knowledge, there is only one Malay pronunciation
dictionary besides our own. This lexicon contains 13,550 entries;
each entry is associated with its pronunciation, the syllable
grouping of its phonemes and the stress level of each
syllable [9].

3. Malay phonology
There are 36 phonemes in Malay [10]: six vowels, three diphthongs
and 27 consonants. Table 1 and Table 2 show the IPA tables for
Malay consonants and vowels respectively. The three Malay
diphthongs are /aj/, /aw/ and /oj/. Figure 1 shows the Malay
phoneme distribution in the text.

Table 1. Malay consonants

Mode of        Place of articulation
articulation   Bi-lab.  Lab.-dent.  Dent.  Alveo.  Alveopalat.  Palat.  Vel.  Glot.
Plosive        p b                         t d                          k g   ?
Fricative               f v                s z     ʃ                    x     h
Affricate                                          tʃ dʒ
Vibrant                                    r
Lateral                                    l
Nasal          m                           n                    ɲ       ŋ
Glide          w                                                j

Table 2. Malay vowels

            Front   Central   Back
Close       i                 u
Close-mid   e       ə         o
Open                a

[Figure 1: bar chart. Vertical axis: Percent (0 to 18). Horizontal
axis: Phoneme (a, b, d, dZ, e, f, g, h, i, j, J, k, l, m, n, N, o,
oj/aj/aw, p, r, s, S, t, tS, u, w, z, ?, @).]
Figure 1. Phoneme distribution of Malay

4. Malay morphology
Malay is an agglutinative language: new words can be created by
adding affixes to a root word. In addition, further bound
morphemes can be attached to the affixed word, as shown in
Figure 2 [11].

[Figure 2: diagram of an affixed word. Labels: Circumfix, Infix,
Prefix, Proclitic, Root, Suffix, Affixed word, Enclitic, Particle.]
Figure 2. Affixed word with clitic and particle [11]

Infixation is no longer used in Malay. The native affixes
comprise nine prefixes, three suffixes, and seven circumfixes.
There are two proclitics, four enclitics, and three particles.
Most of these bound morphemes are monosyllabic.

5. Malay syllable structures
Malay syllable structures are shown in Table 3. Most words
containing a syllable with a cluster of two or more consonants
are borrowed from English. For example, the Malay word struktur
comes from the English word structure.

Table 3. Malay syllable structures

Syllable   Word        Description
V          i.kan       V.CVC
CV         sa.tu       CV.CV
CVC        ban.tu      CVC.CV
CCV        dwibahasa   CCV.CV.CV.CV
CCVC       prak.tik    CCVC.CVC
CCCV       stra.tegi   CCCV.CVCV
CCCVC      struk.tur   CCCVC.CVC

Figure 3 shows the distribution of Malay words in the texts in
terms of number of syllables. Most Malay words are disyllabic:
they make up nearly half of all words in the text, followed by
words with three syllables.
The procedure for segmenting a word into syllables is simple
(Figure 4). First, the possible syllables that can be formed are
determined. During syllable segmentation, the graphemes of the
word are converted to sound classes such as vowel, diphthong,
fricative, affricate, plosive, nasal and glide. The word is then
segmented into syllables by determining, from right to left, the
largest syllable that can be formed.

diberikannya
diberikan.nya
diberi.kan.nya
dibe.ri.kan.nya
di.be.ri.kan.nya
CV.CV.CV.CVC.CV

Figure 4. Segmenting the word diberikannya into syllables
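The right-to-left procedure of Figure 4 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: it reduces the sound classes to consonant (C) and vowel (V), uses only the syllable shapes of Table 3, ignores diphthongs, and assumes a small hypothetical onset constraint (a two-consonant onset must end in r, l or w; a three-consonant onset must be s + plosive + r) so that a sequence such as n + ny is not grouped into one onset. The names tokenize, onset_ok and syllabify are illustrative.

```python
# Digraphs treated as single consonants (graphemes taken from Table 4).
DIGRAPHS = ("ng", "ny", "sy", "sh", "kh", "gh")
VOWELS = set("aeiou")
# Syllable shapes of Table 3, longest first.
PATTERNS = ("CCCVC", "CCVC", "CCCV", "CVC", "CCV", "CV", "V")
PLOSIVES = set("pbtdkg")


def tokenize(word):
    """Split a word into grapheme units, reading digraphs as one unit."""
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def onset_ok(onset):
    """Hypothetical constraint on consonant clusters (an assumption)."""
    if len(onset) <= 1:
        return True
    if len(onset) == 2:
        return onset[1] in ("r", "l", "w")
    return onset[0] == "s" and onset[1] in PLOSIVES and onset[2] == "r"


def syllabify(word):
    """Take the largest possible syllable repeatedly, from right to left."""
    units = tokenize(word.lower())
    classes = "".join("V" if u in VOWELS else "C" for u in units)
    syllables, end = [], len(units)
    while end > 0:
        for pattern in PATTERNS:
            start = end - len(pattern)
            if start < 0 or classes[start:end] != pattern:
                continue
            onset = units[start:start + pattern.index("V")]
            if onset_ok(onset):
                syllables.append("".join(units[start:end]))
                end = start
                break
        else:
            raise ValueError("no Table 3 syllable shape fits: " + word)
    return syllables[::-1]


print(".".join(syllabify("diberikannya")))
```

Words whose syllables fall outside the shapes of Table 3 raise ValueError in this sketch.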

6. Grapheme to phoneme conversion rules


An effective Malay G2P tool has to make it easy to add and
remove rules, because speakers with different backgrounds may use
different pronunciation rules. In addition, it needs to handle
English words that appear in Malay texts. Our Malay G2P is a
rule-based tool. We use eight rules to automatically generate the
standard Malay pronunciations. Pronunciation variants can be
generated by adding or removing some of the standard Malay rules.
For English words, we produce the pronunciation using Malay
phonemes.

6.1. Standard Malay rules


6.1.1. General replacement rule. Every grapheme is by default
mapped to a Malay phoneme (Table 4). For example, the word
diberikan is converted to /d i b ə r i k a n/. One phoneme, /e/,
is not the default mapping of any grapheme. This phoneme is the
pronunciation of the grapheme e only in certain words.

[Figure 3: bar chart. Vertical axis: Percent (0 to 0.5). Horizontal
axis: Number of syllables (1 to >5).]
Figure 3. Distribution of words by number of syllables

Table 4. Grapheme to phoneme mapping rules

Grapheme  Phoneme    Grapheme  Phoneme
p         p          m         m
b         b          n         n
t         t          ng        ŋ
d         d          ny        ɲ
k         k          w         w
q         k          y         j
g         g          j         dʒ
s         s          l         l
x         s          r         r
h         h          a         a
f         f          e         ə
v         v          i         i
z         z          o         o
sy        ʃ          u         u
sh        ʃ          ai        aj
kh        x          au        aw
gh                   oi        oj
c         tʃ
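The default mapping of Table 4 amounts to a longest-match lookup over graphemes. The sketch below is illustrative only: the function name default_g2p is hypothetical, the IPA symbols used for j, c, sy, sh, kh, ng, ny and e are assumptions based on the phoneme labels of Figure 1 and on Table 1, and the digraphs ai, au and oi are always read as diphthongs, which a full system would need to check against the syllable structure.

```python
# Default grapheme-to-phoneme table (after Table 4; several symbols assumed).
G2P_DEFAULT = {
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "q": "k", "g": "g",
    "s": "s", "x": "s", "h": "h", "f": "f", "v": "v", "z": "z",
    "j": "dʒ", "c": "tʃ", "sy": "ʃ", "sh": "ʃ", "kh": "x",
    "l": "l", "r": "r", "m": "m", "n": "n", "ng": "ŋ", "ny": "ɲ",
    "w": "w", "y": "j",
    "a": "a", "e": "ə", "i": "i", "o": "o", "u": "u",
    "ai": "aj", "au": "aw", "oi": "oj",
}


def default_g2p(word):
    """Map graphemes to phonemes, preferring two-letter graphemes."""
    word = word.lower()
    phonemes, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in G2P_DEFAULT:
            phonemes.append(G2P_DEFAULT[pair])
            i += 2
        elif word[i] in G2P_DEFAULT:
            phonemes.append(G2P_DEFAULT[word[i]])
            i += 1
        else:
            raise ValueError("no default mapping for " + repr(word[i]))
    return phonemes


print(" ".join(default_g2p("diberikan")))
```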

6.1.2. Schwa rule. The grapheme a at the end of a word is
pronounced as /ə/. This rule applies to old Malay words. For
example, the word suka is pronounced as /s u k ə/. If the root
word is combined with a suffix, the a at the end of the root word
is still pronounced as /ə/. For example, the word sukakan, which
is formed by adding the suffix -kan to the root word suka, is
pronounced as /s u k ə k a n/. The rule does not apply to words
borrowed from English or to proper nouns. The current schwa rule,
however, does not distinguish borrowed words and proper nouns, so
it is applied to all words. A workaround is possible, since the
number of words to which the rule applies is finite and these
words can be identified manually. There are also many speakers
who do not use this rule.
6.1.3. Glottal stop insertion rule. A glottal stop /?/ is
inserted between the two vowels of particular vowel grapheme
sequences in a word (Table 5).

Table 5. Glottal stop insertion rules

Grapheme sequence   Word   Pronunciation
aa                  taat   /t a ? a t/
oa                  doa    /d o ? a/

6.1.4. General glottal stop rule. The grapheme k at the end of
a syllable is converted to a glottal stop /?/. For example, tidak
is pronounced as /t i d a ?/.
6.1.5. Final r deletion rule. The grapheme r at the end of a
word is not pronounced. For example, sukar is pronounced as
/s u k a/. However, some speakers do pronounce this final r.
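Rules 6.1.4 and 6.1.5 can be sketched together on a syllabified phoneme sequence. This is a minimal sketch under stated assumptions: the input is a hypothetical list of syllables, each a list of phoneme symbols produced by earlier steps, and the function name is illustrative.

```python
def apply_final_k_and_r(syllables):
    """Apply the general glottal stop rule and the final r deletion rule.

    syllables: list of syllables, each a list of phoneme symbols.
    Returns the flat phoneme list of the word.
    """
    out = [list(syllable) for syllable in syllables]
    # 6.1.4: k at the end of any syllable becomes a glottal stop.
    for syllable in out:
        if syllable and syllable[-1] == "k":
            syllable[-1] = "?"
    # 6.1.5: r at the end of the word is dropped.
    if out and out[-1] and out[-1][-1] == "r":
        out[-1].pop()
    return [phoneme for syllable in out for phoneme in syllable]


print(apply_final_k_and_r([["t", "i"], ["d", "a", "k"]]))
print(apply_final_k_and_r([["s", "u"], ["k", "a", "r"]]))
```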

6.1.6. Glide insertion rule. A glide is inserted between the
two vowels of particular vowel grapheme sequences in a word. For
example, buah is pronounced as /b u w a h/, and siap is
pronounced as /s i j a p/.
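Glottal stop insertion (6.1.3) and glide insertion (6.1.6) both place a phoneme between two adjacent vowels, so they can be sketched with a single lookup table. The table below contains only the four sequences attested in the examples of this section (aa, oa, ua, ia); the complete list of sequences is not given here, and the names are illustrative.

```python
# (first vowel, second vowel) -> phoneme inserted between them.
# Only sequences that appear in the examples of Sections 6.1.3 and 6.1.6.
INSERTIONS = {
    ("a", "a"): "?",  # taat
    ("o", "a"): "?",  # doa
    ("u", "a"): "w",  # buah
    ("i", "a"): "j",  # siap
}


def insert_between_vowels(phonemes):
    """Insert a glottal stop or glide inside the listed vowel sequences."""
    out = []
    for index, phoneme in enumerate(phonemes):
        out.append(phoneme)
        following = phonemes[index + 1] if index + 1 < len(phonemes) else None
        inserted = INSERTIONS.get((phoneme, following))
        if inserted is not None:
            out.append(inserted)
    return out


print(insert_between_vowels(list("buah")))
```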
6.1.7. Last syllable rule. For words with more than one
syllable, in certain contexts the graphemes u and i in the last
syllable are converted to the phonemes /o/ and /e/ respectively
(Table 6). As with the schwa rule, if a suffix is appended to the
root word, the rule still applies to the last syllable of the
root word.

Table 6. Last syllable rules

Target grapheme   Following grapheme     Word    Pronunciation
u                 k, h, p, m, ng, r      hidup   /h i d o p/
i                 k, l, t, h, r, t, k    bilik   /b i l e ?/
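The contexts of Table 6 can be read as a lookup on the phoneme that follows the target vowel in the last syllable. The sketch below is a simplification under stated assumptions: it uses only the following-grapheme condition (the text says the rule applies "in a certain context" without giving further detail), it takes a hypothetical list of syllables as phoneme lists, it does not handle suffixed words, and the names are illustrative.

```python
# target vowel -> (replacement, phonemes that may follow it), after Table 6.
LOWERING = {
    "u": ("o", {"k", "h", "p", "m", "ŋ", "r"}),
    "i": ("e", {"k", "l", "t", "h", "r"}),
}


def last_syllable_rule(syllables):
    """Lower u to o and i to e in the last syllable of a polysyllabic word."""
    out = [list(syllable) for syllable in syllables]
    if len(out) < 2:
        return out  # the rule needs more than one syllable
    last = out[-1]
    for index in range(len(last) - 1):
        rule = LOWERING.get(last[index])
        if rule is not None and last[index + 1] in rule[1]:
            last[index] = rule[0]
    return out


print(last_syllable_rule([["h", "i"], ["d", "u", "p"]]))
```

For bilik, this step yields l e k in the last syllable; the final /?/ shown in Table 6 comes from the general glottal stop rule of 6.1.4.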

6.1.8. Duplicate grapheme rule. Two identical adjacent
graphemes are converted to a single phoneme. This rule is used
mostly for proper nouns. For example, Azzam is pronounced as
/a z a m/.

6.2. Variant Malay rules


We create a set of pronunciation variants by simply removing
the schwa rule and the final r deletion rule from the standard
Malay rules, because there are speakers who do not use these
rules in their speech.
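Generating variants by switching rules off can be sketched as a list of rule functions applied in sequence. This is a minimal sketch, not the tool itself: the function names are hypothetical, each rule receives the written word together with a phoneme list assumed to come from the default mapping, and only the two optional rules named above are modeled.

```python
def schwa_rule(word, phonemes):
    """Word-final grapheme a is pronounced as schwa."""
    if word.endswith("a") and phonemes and phonemes[-1] == "a":
        return phonemes[:-1] + ["ə"]
    return phonemes


def final_r_deletion(word, phonemes):
    """Word-final grapheme r is not pronounced."""
    if word.endswith("r") and phonemes and phonemes[-1] == "r":
        return phonemes[:-1]
    return phonemes


OPTIONAL_RULES = (schwa_rule, final_r_deletion)


def apply_rules(word, phonemes, rules):
    """Apply each rule in order to the phoneme list."""
    for rule in rules:
        phonemes = rule(word, phonemes)
    return phonemes


def pronunciations(word, base_phonemes):
    """Return the standard pronunciation, plus a variant if it differs."""
    standard = apply_rules(word, list(base_phonemes), OPTIONAL_RULES)
    variant = apply_rules(word, list(base_phonemes), ())
    return [standard] if variant == standard else [standard, variant]


for entry in pronunciations("sukar", list("sukar")):
    print(" ".join(entry))
```

Passing the written word to each rule keeps the schwa rule from firing on sukar after its final r has been dropped.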

6.3. English words


Since English words can appear in Malay text, a different
approach is used to generate the pronunciation of these words.
First, the unknown word is looked up in an English vocabulary
(taken from an English pronunciation dictionary). If the word is
found, its English pronunciation is converted to a Malay
equivalent. This is done by mapping each English phoneme to the
nearest Malay phoneme based on perception (Table 7); some
diphthongs are mapped to two phonemes. Studies have shown that
non-native speakers tend to replace a target language phoneme
with a phoneme of their native language. The English
pronunciation dictionary Hub4 from CMU was used. A word can exist
in both English and Malay. For words that are found in the
English pronunciation dictionary, the English-derived
pronunciation is added as a variant.

Table 7. Correspondence of English and Malay phonemes

English phonemes (column order): a, a, b, e, e, f, g, h, i,
I (long), k, l, m, n, p, r, s, u, v, w, j, z
Malay phonemes (column order): a, o, e, o, aw, aj, b, d, d, e, ej,
f, g, h, i, i, k, l, m, n, ow, oj, p, r, s, t, t, u, u, v, w, j, z
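The conversion of an English dictionary entry into Malay phonemes can be sketched as a table lookup. Everything specific below is an assumption made for illustration: the input is taken to be an ARPAbet-style phoneme string with stress digits, as used in the CMU Pronouncing Dictionary, and the mapping table is a small hypothetical subset chosen to agree with the Malay targets that appear in Table 7 (diphthongs split into two phonemes); it is not the authors' complete table.

```python
import re

# Hypothetical partial mapping from ARPAbet symbols to Malay phonemes.
ARPABET_TO_MALAY = {
    "B": ["b"], "D": ["d"], "DH": ["d"], "F": ["f"], "G": ["g"],
    "HH": ["h"], "K": ["k"], "L": ["l"], "M": ["m"], "N": ["n"],
    "P": ["p"], "R": ["r"], "S": ["s"], "T": ["t"], "TH": ["t"],
    "V": ["v"], "W": ["w"], "Y": ["j"], "Z": ["z"],
    "IH": ["i"], "IY": ["i"], "UH": ["u"], "UW": ["u"],
    # Diphthongs are mapped to two Malay phonemes.
    "AW": ["a", "w"], "AY": ["a", "j"], "EY": ["e", "j"],
    "OW": ["o", "w"], "OY": ["o", "j"],
}


def english_to_malay(arpabet):
    """Convert a space-separated ARPAbet string to Malay phonemes."""
    phonemes = []
    for symbol in arpabet.split():
        base = re.sub(r"\d", "", symbol)  # drop stress digits such as OY1
        # Symbols missing from this partial table raise KeyError.
        phonemes.extend(ARPABET_TO_MALAY[base])
    return phonemes


print(" ".join(english_to_malay("B OY1")))
```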

7. Evaluation
For the evaluation on read speech, the MASS Malay speech
corpus [12] was used. The corpus consists of about 70 hours of
read speech from 90 speakers. The audio files were divided into a
training part and a testing part. The Sphinx 3 automatic speech
recognition system from CMU was used for the evaluation. The
language model was a trigram model created from a text corpus of
500 MB. The acoustic model was a continuous HMM acoustic model
with 3000 states and 16 Gaussian mixtures per state. A
pronunciation dictionary of about thirty thousand words was
created using our G2P tool. We tested the pronunciation
dictionary created automatically with the G2P tool against a
pronunciation dictionary created semi-automatically. For the
semi-automatic dictionary, an expert was assigned to correct the
pronunciations that were not generated correctly, especially the
conversion of grapheme e to /e/ and incorrect applications of the
schwa rule. With the automatically generated dictionary, we
achieved a WER of 16.5%, while with the semi-automatically
generated dictionary our current system achieved a WER of 14.6%,
a relative improvement of 11.5% ((16.5 - 14.6) / 16.5).

8. Conclusion
The results show that the automatically generated pronunciation
dictionary performed only slightly worse than the pronunciation
dictionary that was created semi-automatically. However, they
also show that there is still room for improvement. For the
mapping of grapheme e to phoneme /e/, one possible way to reduce
the mismatch is to force-align the grapheme e to either /ə/ or
/e/. This approach, however, solves only part of the problem. The
second improvement is to identify the words to which the schwa
rule should be applied and those to which it should not. As
discussed earlier, one way is to determine manually the words to
which this rule applies. This will eliminate some unnecessary
variants from the dictionary. Thirdly, we should verify the
English to Malay phoneme mapping to make sure that it is applied
correctly. We may even improve the mapping by taking into
consideration the context in which the English phoneme occurs. A
fourth possible improvement is to determine from the original
text whether a word found in the English pronunciation dictionary
is really an English word or a Malay word.

9. References
[1] Wikipedia, http://en.wikipedia.org/wiki/.
[2] J. Flege, "Second Language Speech Learning Theory, Findings,
and Problems," in Speech Perception and Linguistic Experience:
Issues in Cross-Language Research, W. Strange, Ed. Baltimore:
York Press, 1995, pp. 233-277.
[3] A. Hussain, M. Othman and Z. A. Md. Shariff, Recurrent
backpropagation neural subnetworks for phoneme based Malay speech
recognition, ISPACS 97, Malaysia, 1997.
[4] S. A. R. Al-Haddad, S. A. Samad, A. Hussain, K. A. Ishak,
Isolated Malay Digit Recognition Using Pattern Recognition Fusion
of Dynamic Time Warping and Hidden Markov Models, American
Journal of Applied Sciences, 5(6), 2008, pp. 714-720.
[5] S. A. R. Al-Haddad, S. A. Samad, A. Hussain, Automatic
Segmentation and Labeling for Malay Speech Recognition, WSEAS
Transactions on Signal Processing, 9(2), 2006, pp. 1337-1341.
[6] Y. A. El-Imam, Z. M. Don, Text-to-speech conversion of
standard Malay, International Journal of Speech Technology 3,
Kluwer Academic Publishers, 2000, pp. 129-146.
[7] H. N. Ting, J. Yunus, S. H. S. Salleh, Classification of
Malay speech sounds based on place of articulation and voicing
using neural networks, Proceedings of IEEE International
Conference on Electrical Technology, vol. 1, 2001, pp. 170-173.
[8] H. N. Ting, J. Yunus and S. H. S. Salleh, Speaker-independent
Malay syllable recognition using singular and modular neural
network, Jurnal Teknologi, 35(D), 2001, pp. 65-76.
[9] K. L. Wai, H. O. Siew and R. Zainuddin, Building a unit
selection speech synthesiser for Malay language using FESTVOX and
hidden Markov model toolkit (HTK), Chiang Mai University Journal
of Natural Sciences, 6(1), 2007, pp. 149-158.
[10] Y. M. Maris, The Malay Sound System. Malaysia: Siri Teks
Fajar Bakti, 1979.
[11] B. Ranaivo-Malançon, "Computational Analysis of Affixed
Words in Malay Language," Internal Publication, USM, 2004.
[12] T-P. Tan, H. Lee, E. K. Tang, X. Xiong, E. S. Chng, "MASS: A
Malay Language LVCSR Corpus Resource", Cocosda09, Urumqi, 2009
(submitted).
