Conference Paper, March 2013
DOI: 10.1007/978-3-642-37247-6_13

Orthographic Transcription for Spoken Tunisian Arabic

Inès Zribi, Marwa Graja, Mariem Ellouze Khmekhem,
Maher Jaoua, and Lamia Hadrich Belguith

ANLP Research Group, MIRACL Lab., University of Sfax, Tunisia


ineszribi@gmail.com,
{marwa.graja,maher.jaoua,l.belguith}@fsegs.rnu.tn,
mariem.ellouze@planet.tn

Abstract. Transcribing spoken Arabic dialects is an important task for building
speech corpora. It is therefore necessary to follow a well-defined orthography and
a well-defined annotation scheme when transcribing speech data. In this paper, we
present OTTA, an Orthographic Transcription for Tunisian Arabic. This convention
adopts some rules from the standard Arabic transcription conventions and defines
additional conventions that preserve the particularities of the Tunisian dialect.

Keywords: Tunisian dialect, orthographic transcription.

1 Introduction
Arabic is a Semitic language and one of the oldest in the world. It has three main
variants: Classical Arabic (CA), Modern Standard Arabic (MSA) and colloquial or
dialectal Arabic (DA) [1]. Arabic dialects can be classified by geographical area
as well as by sociological and regional differences. In every Arab country, we
usually find dialects used by urban residents, peasants/farmers and Bedouins [2].
Arabic dialects can also be divided into two groups: the Eastern dialects
(Levantine Arabic, Gulf Arabic and Egyptian Arabic) and the Western dialects of the
Arab world (the Arab Maghreb) [3]. The major differences between these dialects lie
at the phonological level, mainly in the vowels and the interdental consonants. For
example, the Western dialects tend to drop some short vowels and to shorten long
vowels, whereas the Eastern dialects retain the vowels of Classical Arabic [3].
Dialectal Arabic is essentially spoken and is used in everyday communication [1],
[4]: in chat, public services, radio, telephone conversations, etc. It is therefore
important to take Dialectal Arabic into account in new technologies such as
dialogue systems and telephone applications, which makes it necessary to transcribe
dialectal Arabic. However, the transcription task remains difficult, and its output
should follow an orthographic convention in order to yield coherent data and
consistent corpora. Few studies ([15], [16], [17]) have addressed orthographic
transcription for dialects, owing to the lack of the resources needed to process
them. Indeed, resources and data are still needed to build the main tools for
natural language processing, to extract the features of dialects and to test
methods and approaches. In this context, and in order to build a Tunisian Arabic
corpus, we propose OTTA (Orthographic Transcription for Tunisian Arabic), a set of
guidelines for the orthographic transcription of the spoken Tunisian dialect.

A. Gelbukh (Ed.): CICLing 2013, Part I, LNCS 7816, pp. 153–163, 2013.
© Springer-Verlag Berlin Heidelberg 2013

This paper is organized as follows. In Section 2, we compare Tunisian Arabic with
MSA. In Section 3, we review existing orthographic conventions for other dialects
and their limits. In Section 4, we present our OTTA guidelines for the orthographic
transcription of Tunisian Arabic.

2 Tunisian Arabic and Modern Standard Arabic: A Comparative Study
Tunisian Arabic (TA) belongs to the Arabic dialects of the Maghreb (the west of the
Arab world). Like all Arabic dialects, its morphology, syntax, phonology and
lexicon show both differences from and similarities to MSA and even to other Arabic
dialects. Like the other North African dialects, TA is strongly influenced by
Berber, but also by other languages such as Turkish, Italian, Spanish and French.
It has several main regional varieties, but the Tunis variety (used in the capital
city of Tunisia) is the one most widely understood by Tunisians [5].
To identify the characteristics of TA, we studied the TuDiCoI corpus (Tunisian
Dialect Corpus Interlocutor) [6], a pilot corpus of spoken dialogue in the Tunisian
dialect. It is a collection of conversations recorded in a railway station in which
speakers request information such as train schedules, ticket fares and bookings. It
consists of 434 dialogues, which represent 3080 utterances.

2.1 Phonology
The phonological system of TA differs from that of MSA in several respects, in both
the vowel and the consonant systems [7]. The vowel system of the Tunisian dialect
is distinguished, first, by a short vowel [8]. MSA has only three short vowels
[i, a, u], each of which can be lengthened to its long counterpart. The Tunisian
dialect drops short vowels, especially when they are located at the end of a
syllable [7].
Take the example of the MSA verb [ʃariba] ("he drank"), which ends with the short
vowel [a]. In TA, this verb becomes [ʃrib] ("he drank"): the first and the last
vowel [a] are deleted. Generally, deleting the first vowel changes the syllabic
structure of lexical units, and certain words tend to become monosyllabic.
Among the Arabic dialects, Tunisian Arabic is considered the closest to MSA,
especially in the pronunciation of consonants. The MSA consonant "ق" is generally
pronounced [q] in the urban dialects ([qa:l], "he said") and [g] in the rural
dialects ([ga:l], "he said"). Sometimes the meaning of a TA word changes with the
pronunciation of this letter. For example, the word [qru:n] means "centuries" but
the word [gru:n] means "horns". TA is also characterized by phonetic assimilation
(the sound [ʒ] in the word [ʒazza:r], "a butcher", often becomes [z]: [zazza:r])
and by metathesis (the word [ʃams], "sun", is pronounced [semʃ]) [7]. Moreover,
words borrowed from other languages have introduced new phonemes that are not part
of the consonant system of MSA, namely [v], [g] and [p] [9]. Examples are [vista]
("jacket"), [bagra] ("cow") and [pist] ("track") [10, 11].

2.2 Morphology

There are many differences between TA and MSA at the morphological level. First,
the suffixes of the dual form are generally absent. The dual noun suffixes are
usually replaced by the numeral word meaning "two", placed before or after the
plural form of the noun. We also note the loss of the feminine gender in the plural
when a verb is conjugated.
MSA distinguishes singular, dual and plural, for both masculine and feminine, in
verb conjugation. In TA, however, the dual (masculine and feminine) and the
feminine plural have disappeared completely. Similarly, TA often drops the feminine
gender in the singular¹. In addition, some markers and conjugation suffixes have
been lost, and the affixation system of Tunisian verb conjugation is simplified
[12].
The linguistic study of the corpus shows that Tunisian Arabic has new affixes and
clitics compared with MSA.
Negation in MSA can be marked by a single negation particle such as [ma:], [la:],
[lam] or [lan], placed before the verb. In TA, negation is generally marked by two
particles: the first, [ma:], is placed before the verb, and the second, [ʃ], is
agglutinated to the end of the verb [7]. Negation thus takes the structure
[ma:] + verb + [ʃ].

(a) Example: [ma:kli:tiʃ], "I have not eaten."
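
The two-particle negation pattern can be illustrated with a minimal sketch. It assumes a phonemic notation in which the preverbal particle is written [ma:] and the final particle [ʃ]; the function name `negate`, its parameters and the space inserted after the first particle are illustrative choices, not part of the description above. The sketch ignores any epenthetic vowel that speakers may insert before the final particle.

```python
def negate(verb: str, pre: str = "ma:", post: str = "ʃ") -> str:
    """Wrap a TA verb form in the two negation particles (phonemic notation)."""
    # The first particle precedes the verb; the second particle is attached
    # directly to the end of the verb, following the pattern pre + verb + post.
    return f"{pre} {verb}{post}"


# Hypothetical usage with the verb form [kli:t].
print(negate("kli:t"))
```

Keeping the particles as parameters makes it easy to switch to another romanization of the same pattern.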

Among these new enclitics, we also note the introduction of [w] ("his") as a new
pronominal clitic. Likewise, the dual enclitic pronoun is replaced by the masculine
plural pronoun. We also note the deletion of the verbal interrogation prefix, which
is transformed into a suffix.
To put triliteral MSA verbs into the passive form, we add infixes to the verbal
root, namely the short vowels [u, i, a]. For example, the root [ktb] ("write") must
follow the verbal pattern [fu3ila] to form the passive, which gives the verb
[kutiba] ("was written"). In TA, by contrast, we add the prefix [t] to the verb:
the passive form of the verb [kteb] ("write") is [tkteb] ("was written") [13].

¹ We note that some Tunisian dialects distinguish between the masculine and the
feminine in the singular form.

2.3 Orthography
Like other Arabic dialects, TA shows variation in its orthographic transcription.
This variation is caused by the absence of dialectal orthographic rules and by the
phonological differences between MSA and TA.
Indeed, even the part of the TA lexicon that is shared with MSA shows orthographic
variation. For example, the same noun can be transcribed with the suffix "ة"
(ta marbuta) or with "ه" (ha).
Likewise, the same word appears in some cases as a proclitic and in other cases as
a separate word.
Sometimes some speakers do not pronounce certain letters while others pronounce
them. For example, the word meaning "I said to you" is often pronounced with one
letter deleted.

2.4 Lexicon and Code Switching

The current linguistic situation in Tunisia is both diglossic² and bilingual³ [14].
Tunisians code-switch easily and frequently between MSA, the Tunisian dialect and
French within the same conversation. This is due to a lack of knowledge of, or the
ease of using, either Arabic or French for a given subject [14].
Code switching between Arabic and French affects the lexical level of the Tunisian
dialect: it introduces new dialectal words that are derived from foreign languages.
Indeed, the verbal system of the Tunisian dialect contains several French verbs
[12]. These verbs are integrated into the system by applying Arabic derivation
schemes and by substituting phonemes for their phonetic integration. Generally, the
integrated verbs keep their original meaning, and the Arabic schemes allow them to
be conjugated [12]. Borrowed verbs are often transformed via the scheme [faʕlil].
For example, the French verb jongler (to juggle) becomes [ʒang/i/l], and the French
verbal phrase avoir sa maîtrise (to have one's Master's degree) becomes
[matr/i/z] [12]. In some cases, we just add affixes to the borrowed verb to
conjugate it. For example, to conjugate the French verb installer (to install), the
prefix [j] is added, which gives the Tunisian Arabic verb [jansta:li]
("he installs") [12].
The same rule is used to generate nouns and adjectives, which are also derived from
borrowed verbs. For example, a noun⁴ is derived from the French verb réviser
(to revise) by adding the simple prefix [m].
When analyzing the TuDiCoI corpus [6], we note that it contains 2265 French words,
which represent 11.81% of the corpus, and 328 words derived from French, which
correspond to about 2% of the corpus.
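
As a rough consistency check on these figures, the corpus size implied by the French-word count and its percentage can be recovered, and the share of French-derived words recomputed from it. The sketch below uses only the three numbers quoted above; the variable names are ours, and the implied total is an estimate because the published percentage is rounded.

```python
# Figures quoted from the corpus analysis.
french_words = 2265      # French words in the corpus
french_share = 0.1181    # 11.81% of the corpus
derived_words = 328      # words derived from French

# Corpus size implied by the French-word count and its share (an estimate).
total_words = french_words / french_share

# Share of French-derived words, as a percentage of the implied corpus size.
derived_share_pct = 100 * derived_words / total_words

print(f"implied corpus size: about {total_words:.0f} words")
print(f"French-derived share: about {derived_share_pct:.1f}%")
```

The implied total is a little over 19,000 words, and the derived share comes out somewhat below the rounded "about 2%".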

² Diglossic refers to the use of both MSA and the dialectal form [14].
³ Bilingual refers to the use of both Arabic and French [14].
⁴ A person who has revised some things.

These words are part of the vocabulary used in conversations and must be added to
any dictionary of the Tunisian dialect lexicon. Table 1 gives some examples of
loanwords that appear in the Tunisian dialect lexicon.

Table 1. Examples of foreign words


Dialect    French origin    Translation
           merci            thank you
           oui              yes
           train            train
           je règle
           je valide        I validate

3 Background about Orthographic Conventions of Arabic Dialects

Few studies in the literature have dealt with the transcription of dialectal
Arabic. The transcription of Levantine dialects was the subject of Zawaydeh's work
[15] and of Maamouri et al. [16], who developed sets of rules for transcribing
Levantine dialects in order to create a Levantine Arabic corpus. Their work is
based on the MSA transcription conventions. The principle of their method is to
transcribe dialectal Arabic in Arabic script without short vowels⁵ (except for
nunation⁶), respecting the spelling and word segmentation conventions of MSA.
Take the example of the Levantine word [ʔultilak] ("I said to you"). This word is
transcribed as two separate words. This segmentation results from applying the MSA
rule that requires the verb to be separated from the prepositional object. Their
convention also converts the glottal stop [ʔ]⁷ back to its MSA origin "ق", so the
Levantine word for "I said to you" is written with "ق" although it is pronounced
with [ʔ].
Maamouri et al. [16] justified their choice of an MSA-based orthography by the
reduced cost (speed and ease) of resource creation for Levantine Arabic. Indeed,
annotators can use their knowledge of MSA instead of learning and using new
phonetic symbols. Moreover, Levantine Arabic transcribed in an MSA-based
orthography can be processed with existing MSA tools.
Habash et al. [17] also present a Conventional Orthography for Dialectal Arabic
(CODA), designed mainly for developing computational models of Arabic dialects.
CODA is an extension of the LDC guidelines. The aim is to develop a CODA for all
Arabic dialects; so far, Habash et al. [17] have developed a CODA map that covers
only Egyptian Arabic.

⁵ The work of [15] requires the use of the double-consonant mark "shadda" when it
resolves an ambiguity of pronunciation.
⁶ Nunation is a short-vowel marking used in the Arabic language.
⁷ The letter "ق" is pronounced as a glottal stop [ʔ] in the Levantine dialect.

MSA-based orthographic transcription has several advantages. First, the
transcription task is easy for annotators. Second, existing MSA tools can be used
to process Arabic dialects. Third, the corpus is readable by all Arabs. However,
each Arabic dialect has its own features, which distinguish it from the other
dialects and even from MSA. Given these differences, existing work cannot be
applied in its entirety to all Arabic dialects. That is why we propose, in this
paper, a set of conventions for the orthographic transcription of the Tunisian
dialect.

4 Orthographic Transcription of Tunisian Arabic

To standardize the orthographic transcription of Tunisian Arabic, we use some
orthographic rules of MSA transcription and define new rules for the specificities
of Tunisian Arabic.
We used this convention to transcribe the TuDiCoI corpus. We also defined a set of
annotations that reflect the pronunciation of the Tunisian dialect in order to
improve transcription quality. The resulting corpus will be useful for creating
processing tools for Tunisian Arabic, such as a Tunisian dialect stemmer and a
morpho-syntactic tagger, as well as automatic speech processing systems such as
speech synthesis and automatic transcription systems.

4.1 Transcription Rules

The Tunisian dialect lexicon consists of MSA words (with or without modification),
dialectal Tunisian words and loanwords. Words that are pronounced as in MSA must
follow the orthographic standards of MSA. We therefore first present the main
MSA-based rules to be respected in our transcripts, and then define specific rules
based on the specificities of Tunisian Arabic.
Tunisian Arabic transcription is based on the orthography of MSA when the word is
pronounced as in MSA or with some of its short vowels reduced. Our transcription
method keeps the main MSA orthographic rules in the following cases:

- The use of Arabic characters with short vowels.
- The conventions of word segmentation.

Thus the Tunisian Arabic word [kte:b] ("a book") is written as in MSA, even though
its vocalism is reduced compared with MSA [kitabon]. Similarly, we use the word
segmentation rules of MSA, and we transcribe some affixes as separate words. Take
the example of the interrogative [esh] ("what"), which is sometimes reduced to a
single letter [sh] attached to the next word. It replaces the MSA interrogative
[ma:ða:] ("what"). To stay close to the sentence structure of MSA, we have chosen
to transcribe it as a separate word in its full form, not in its reduced form.
Also, in Tunisian Arabic, the preposition [ʕla] ("on") is reduced to a single
letter [ʕ]. In MSA, a single-letter word must always be attached to the word that
follows it, so this preposition is transcribed as a proclitic (see example (b)
below). The same holds for the conjunction [w] ("and"). In addition, we segment
words in cases of negation and of prepositional objects. For example, the word
[qoltlou] ("I said to him") must be transcribed as two words separated by a space
(see example (c)), because the prepositional object [lou] ("to him") should not be
attached to the verb. The same applies to the negation particle [ma:] (see example
(d)).

(b) [ʕattawla], "on the table."
(c) [qolt / lou], "I said to him", written as two words and not as one.
(d) [ma: qoltech], "I did not say", written as two words and not as one.

The pronunciation of the hamza⁸ differs from MSA. The hamza is sometimes replaced
by one of the long-vowel letters. For example, we replace the phoneme [i] of the
word [faida] ("profit") by the phoneme [j], which gives [fajda] ("profit").
Moreover, speakers of Tunisian Arabic pronounce the hamza only if it is located at
the beginning of a word. For example, in the noun [osteth] ("professor"), the hamza
is pronounced.
Moreover, Tunisian Arabic has new phonemes such as [g]. Using Arabic letters to
transcribe these phonemes produces ambiguities of meaning. Therefore, we propose to
use non-Arabic letters for [v], [g] and [p] to transcribe these new phonemes.
Naturally, the lexicon of Tunisian Arabic contains words that have no roots in
Standard Arabic. These words are specific to the Tunisian dialect, and their
transcription varies from one transcriber to another. In order to obtain a
homogeneous corpus in which each word is transcribed in a unique way, we define
rules that combine MSA rules and phonemic composition. Table 2 summarizes the
proposed transcription rules.

Table 2. Transcription rules of Tunisian Arabic

N  Transcription rule                                     Example

1  If the last phoneme of a word is the short vowel       [karhba] (car)
   [a], the word is spelled with a silent final letter.   [pascla] (bicycle)
   Apply this rule only to nouns.
2  If the last phoneme of a word is a lengthened vowel    [ja:xi:], [ra:hu:]
   ([a:], [u:] or [i:]), the word is spelled with the     We do not add a simple alif
   corresponding long-vowel letter.                       after the final vowel letter.
3  Use the non-Arabic letters for [v], [g] and [p] to     Transcribe [bagra] and not
   transcribe the new phonemes used in Tunisian           [baqara] (cow).
   Arabic. We do not use Arabic letters for them.
4  The hamza is transcribed only when it is
   pronounced.
5  Some words are transcribed as affixes: the
   preposition "on" is reduced to a single letter
   transcribed as a prefix.
6  Some affixes are transcribed as words: the
   interrogative "what" is sometimes reduced to a
   single letter, but we must write it in its full form.

⁸ Hamza is a letter of the Arabic alphabet representing the glottal stop [ʔ].
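
The phoneme-based conditions of rules 1 to 3 can be sketched as a small checker that reports which rules are triggered by a word. This is only an illustration under our own assumptions: the word is given as a list of phoneme strings in the bracket notation used above, the part of speech is supplied by the caller, and the function name `applicable_rules` is hypothetical. Rules 4 to 6 depend on pronunciation and segmentation decisions and are not modelled.

```python
LONG_VOWELS = {"a:", "u:", "i:"}     # lengthened vowels named in rule 2
LOAN_PHONEMES = {"v", "g", "p"}      # new phonemes named in rule 3


def applicable_rules(phonemes: list[str], is_noun: bool) -> list[int]:
    """Return the numbers of the Table 2 rules (1 to 3) triggered by a word."""
    rules = []
    last = phonemes[-1]
    # Rule 1: a noun ending in the short vowel [a] takes a silent final letter.
    if is_noun and last == "a":
        rules.append(1)
    # Rule 2: a final lengthened vowel is written with a long-vowel letter.
    if last in LONG_VOWELS:
        rules.append(2)
    # Rule 3: [v], [g] and [p] are written with non-Arabic letters.
    if any(p in LOAN_PHONEMES for p in phonemes):
        rules.append(3)
    return rules


# [bagra] (cow): a noun ending in [a] that contains the phoneme [g].
print(applicable_rules(["b", "a", "g", "r", "a"], is_noun=True))
```

Returning rule numbers rather than spellings keeps the sketch independent of the particular letters chosen for each phoneme.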

4.2 Annotation

We use and adapt the TOE transcription conventions [18], which were developed by
the LPL laboratory for the transcription of a conversational French corpus. We add
some details and modify some rules for the transcription of Tunisian Arabic.
Our transcription method consists of an orthographic transcription that reflects
the phonetic pronunciation; therefore, we do not use acronyms in the transcripts
and transcribe each word as it is spelled out. Our transcription does not use any
abbreviations either.
To annotate named entities, we add an opening and a closing tag before and after
the proper name, indicating its type (see example (e)). We use a one-letter code
for the type (one for patronyms and one for toponyms). The form is:
<code, Ortho⁹>.

(e) < , >
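
A minimal sketch of how such a tag could be produced is given below. The type codes "P" and "T" are placeholders that we chose for illustration, as are the function name `tag_entity` and the exact spacing inside the angle brackets; only the overall shape, a type code followed by the orthographic transcription, comes from the description above.

```python
# Placeholder type codes, chosen for this illustration only.
ENTITY_CODES = {"patronym": "P", "toponym": "T"}


def tag_entity(kind: str, ortho: str) -> str:
    """Wrap the orthographic form of a proper name in a typed tag."""
    code = ENTITY_CODES[kind]  # raises KeyError for an unknown entity type
    return f"<{code}, {ortho}>"


# Hypothetical usage with a toponym written in Latin script.
print(tag_entity("toponym", "Tunis"))
```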

Sometimes, dialect speakers do not pronounce some letters. For example, the word
[mouʃ] ("not at all") is often pronounced [meʃ] ("not at all") by shortening the
long vowel. Therefore, we propose to transcribe words in their full forms and to
put unpronounced characters between parentheses. The same applies to the word
[ʕmetlek] ("I've done for you"), which is transcribed in its full form with the
unpronounced letter between parentheses.
Given the existence of loanwords in Tunisian Arabic, we annotate these words
etymologically, which enables their automatic processing. We mark loanwords with
the annotation [lan: language, orthography, pronunciation]: in square brackets and
separated by commas, we give the language, the standard spelling in the foreign
language, and the speaker's pronunciation (in the SAMPA alphabet).

⁹ Ortho is the orthographic transcription of the named entity.

(f) [ lan:Fr, merci, d2] (Thank you very much)


(g) [lan:ASM ( ] he rides)
(h) ( Mohamed Ali went to Tunis) is transcribed:
< , < >, >
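
The loanword annotation lends itself to automatic processing. The following sketch, written under our own assumptions, builds an annotation in the [lan: language, orthography, pronunciation] form and extracts such annotations from a transcript with a regular expression. The function names and the SAMPA string used in the usage line are illustrative, and the pattern assumes that none of the three fields contains a comma or a closing bracket.

```python
import re

# Matches [lan:LANG, ORTHOGRAPHY, PRONUNCIATION]; whitespace after the
# colon and the commas is optional.
LOAN_RE = re.compile(r"\[lan:\s*([^,\]]+),\s*([^,\]]+),\s*([^\]]+)\]")


def annotate_loanword(lang: str, orthography: str, sampa: str) -> str:
    """Build a loanword annotation from its three fields."""
    return f"[lan:{lang}, {orthography}, {sampa}]"


def parse_loanwords(text: str) -> list[tuple[str, str, str]]:
    """Return (language, orthography, pronunciation) for each annotation."""
    return [
        tuple(field.strip() for field in match.groups())
        for match in LOAN_RE.finditer(text)
    ]


# Hypothetical usage; "mERsi" is an illustrative SAMPA rendering.
tag = annotate_loanword("Fr", "merci", "mERsi")
print(tag, parse_loanwords(tag))
```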

Besides, we aim to transcribe and annotate spontaneous spoken language. We are thus
confronted with the problem of disfluencies, which occur frequently in spontaneous
speech and interrupt the normal course of speech [19]. We therefore propose to
improve the annotations used to mark these phenomena. Incomplete words are marked
with the sign "-": a final dash is placed just after the last sound of the
truncated word and is followed by a blank.
We also identified a set of hesitations frequently used by speakers of the Tunisian
dialect, which are transcribed by adding a symbol to the left of the word. Some of
these hesitations come from other languages. We used a standard lexical list of
hesitations.
The transcription is principally orthographic. We use punctuation marks to delimit
phrase boundaries in the transcription, and we add details about particular
pronunciations and other phenomena. The most common cases are described in
Table 3.

Table 3. Annotation rules of Tunisian Arabic

Phenomenon          Annotation rule                               Examples

Numbers             Numbers have to be written in letters.        (one), (ten thousand)
Titles              Titles of movies, books and newspapers
                    are written between quotation marks.
Undeterminable      Graphic variants are noted between            {they killed, he killed him}
morphologic         braces, separated by commas.
variants
Atypical accords    We do not correct the transcription. We
                    write the form as it is pronounced.
Liaisons            Liaisons that are specific to the Tunisian    == (Tunisian dialect liaison)
                    dialect are annotated. We also annotate       #  (liaison not made)
                    missing liaisons between words.
Reported speech     Direct reported speech sequences are          \ sequence \
                    noted between \ characters, each
                    preceded and followed by a blank.
Incomprehensible    Long and short incomprehensible               *
sequences           sequences are always noted by one
                    star (*).
Laughter            Laughter is transcribed with the symbol       words &: the speaker said the
                    &. A speech sequence produced while           words, and then laughed.
                    laughing is coded between && and &&.          && words &&: the speaker
                                                                  laughed while saying the words.
Pauses              Short pauses (less than 200 ms) are           +
                    noted with +.
Non-linguistic      They are incomprehensible sequences.          Breathing, puffing, noise made
events              We suggest specifying what kind of            by the mouth, cough, sneeze,
                    event they are.                               whistle
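
Several of these marks can be recognized token by token. The sketch below is a hypothetical helper, not part of the convention itself: it assumes a transcript split on blanks and labels each token as a short pause, an incomprehensible sequence, laughter, a laughing-speech or reported-speech delimiter, a truncated word, a violation of the rule that numbers are written in letters, or an ordinary word. The label names are ours.

```python
def classify_token(token: str) -> str:
    """Label one blank-separated token of an annotated transcript."""
    if token == "+":
        return "short_pause"
    if token == "*":
        return "incomprehensible"
    if token == "&":
        return "laughter"
    if token == "&&":
        return "laughing_speech_delimiter"
    if token == "\\":
        return "reported_speech_delimiter"
    # Numbers must be written in letters, so any digit is flagged.
    if any(ch.isdigit() for ch in token):
        return "error_digits"
    # A truncated word ends with a dash placed right after its last sound.
    if len(token) > 1 and token.endswith("-"):
        return "truncated_word"
    return "word"


# Hypothetical transcript fragment in Latin transliteration.
print([classify_token(t) for t in "qolt + tun- * 3".split()])
```

Checking each token independently keeps the sketch simple; paired marks such as && would need a second pass to verify that they are balanced.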

5 Conclusion

Transcribing a spoken language that is constantly evolving and that borrows heavily
from other languages makes standardization a challenge. In this context, we
proposed a set of rules that allow Tunisian Arabic to be transcribed in accordance
with the MSA rules without neglecting the characteristics of the dialect. We plan
to continue enhancing our conventions for the transcription of the Tunisian
dialect.

References
1. Al-Saidat, E., Al-Momani, I.: Future Markers in Modern Standard Arabic and Jordanian
Arabic: A Contrastive Study. European Journal of Social Sciences 12(3) (2010)
2. Diab, M., Habash, N.: Arabic Dialect Processing Tutorial. In: Proceedings of the Human
Language Technology Conference of the North American Chapter of the ACL, Rochester,
pp. 5–6. Association for Computational Linguistics (April 2007)
3. Alorifi, F.S.: Automatic identification of Arabic dialects using Hidden Markov Models. In
mémoire de thèse, Université de Pittsburgh (2008)
4. Almeman, K., Lee, M.: Towards developing a Multi-dialect Morphological analyzer for
Arabic. In: 4th International Conference on Arabic Language Processing, Rabat, Morocco,
May 2-3 (2012)
5. Khalfaoui, A.: A cognitive approach to analyzing demonstratives in Tunisian Arabic. In
PhD thesis of university of Minnesota (November 2009)
6. Graja, M., Jaoua, M., Hadrich Belguith, L.: Lexical Study of A Spoken Dialogue Corpus in
Tunisian Dialect. In: ACIT 2010: The International Arab Conference on Information
Technology, Benghazi - Libya, December 14-16 (2010)
7. Mejri, S., Said, M., Sfar, I.: Plurilinguisme et diglossie en Tunisie. In: Synergies Tunisie,
vol. (1), pp. 53–74 (2009)
8. Tilmatine, M.: Substrat Et Convergences: Le Berbère Et L'arabe Nord-Africain. Estudios
de Dialectología Norteafricana y Andalusí 4, 99–119 (1999)
9. Kirchhoff, K., Bilmes, J., Das, S., Duta, N., Egan, M., Ji, G., He, F., Henderson, J.,
Liu, D., Noamany, M., Schone, P., Schwartz, R., Vergyri, D.: Novel approaches to Arabic
speech recognition: report from the 2002 Johns-Hopkins Summer Workshop. In: Proceed-
ings of IEEE International Conference on Acoustics, Speech, and Signal Processing
(ICASSP 2003), Missouri, USA, vol. 1, pp. 344–347 (April 2003)
10. Mejri, S., Baccouche, T.: L'atlas linguistique de Tunisie: repères méthodologiques pour la
description du système dialectal. In: Lentin, J., Lonnet, A. (eds.) Mélanges David Cohen,
pp. 47–54. Maisonneuve & Larose, Paris (2003)
11. Quitout, M.: Parlons l'arabe tunisien. In book edited by L'Harmattan (2006)
12. Ouerhani, B.: Interférence entre le dialectal et le littéral en Tunisie: Le cas de la
morphologie verbale. In: Synergies Tunisie, vol. (1), pp. 75–84 (2009)
13. Maalej, Z.: Passives in modern standard and Tunisian Arabic. Matériaux Arabes et
Sudarabiques-Gellas 9, 51–76 (1999)
14. Bouzemni, A.: Linguistic situation in Tunisia: French and Arabic code switching. In:
INTERLINGÜÍSTICA, vol. 16(1), pp. 217–223 (2005) ISSN 1134-8941
15. Zawaydeh, B., Stallard, D., Makhoul, J. (2003),
http://ldc.upenn.edu/Catalog/docs/LDC2005S08/BBN-Babylon-
transcription-guidelines.pdf
16. Maamouri, M., Buckwalter, T., Cieri, C.: Dialectal Arabic Telephone Speech Corpus:
Principles, Tool Design, and Transcription Conventions. In: NEMLAR International Con-
ference on Arabic Language Resources and Tools, Cairo, September 22-23 (2004)
17. Habash, N., Diab, M., Rambow, O.: Conventional Orthography for Dialectal Arabic. In:
Proceedings of the Language Resources and Evaluation Conference (LREC), Istanbul
(2012)
18. Bertrand, R., Blache, P., Espesser, R., Ferré, G., Meunier, C., Priego-Valverde, B.,
Rauzy, S.: Le CID - Corpus of Interactional Data - Annotation et Exploitation Multimo-
dale de Parole Conversationnelle. Traitement Automatique des Langues 49(3), 105–134
(2008)
19. Heeman, P., Allen, J.: Detecting and correcting speech repairs. In: Proceedings of the 32nd
Annual Meeting on Association for Computational Linguistics, Las Cruces, New Mexico,
pp. 295–302 (1994)
