Ilaria Cicola
EPHE
Paris, France
ilaria.cicola@etu.ephe.fr
Department Italian Institute of Oriental Studies, Sapienza
University of Rome
Rome, Italy
I. INTRODUCTION*
All authors have contributed equally to this work, but since it refers to a
modular project, Lancioni should be mainly credited for secs. 3 and 5, Pepe
for sec. 2B, Silighini for sec. 4, Pettinari for sec. 2C, Cicola for secs. 1 and
2A, Benassi for sec. 6, Campanelli for sec. 2D.
ISBN: 978-1-61208-311-7
TOOLS
IMMM 2013 : The Third International Conference on Advances in Information Mining and Management
17 occurrences
SOLUTION:
Al+vaworiy~+
vaworiy~_1
[]
ENGLISH GLOSS:
the+revolutionary+
POS ANALYSIS:
Al/DET+vaworiy~/ADJ+
Figure 1. Excerpt of an AraMorph analysis of Meedan Memory
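The output format shown in Figure 1 can be processed mechanically. The following is a minimal sketch, assuming that morphemes are separated by "+" and that each morpheme in the POS ANALYSIS line is followed by "/" and its tag, as in the excerpt. The function names are hypothetical and are not part of AraMorph.

```python
from typing import List, Tuple


def parse_pos_analysis(analysis: str) -> List[Tuple[str, str]]:
    """Split a POS ANALYSIS string into (morpheme, tag) pairs.

    Assumes the layout of Figure 1: segments joined by '+',
    each segment written as morpheme/TAG.
    """
    pairs = []
    for segment in analysis.split("+"):
        if not segment:
            continue  # a trailing '+' leaves an empty segment
        morpheme, _, tag = segment.partition("/")
        pairs.append((morpheme, tag))
    return pairs


def align_gloss(analysis: str, gloss: str) -> List[Tuple[str, str, str]]:
    """Pair each (morpheme, tag) with the gloss segment in the same position."""
    glosses = [g for g in gloss.split("+") if g]
    return [(m, t, g) for (m, t), g in zip(parse_pos_analysis(analysis), glosses)]


print(align_gloss("Al/DET+vaworiy~/ADJ+", "the+revolutionary+"))
# [('Al', 'DET', 'the'), ('vaworiy~', 'ADJ', 'revolutionary')]
```

The positional alignment only holds when the gloss line has one segment per morpheme, as it does in the excerpt.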
portrayal;
D. Arabic WordNet
Arabic WordNet (AWN) is a lexical resource for Modern
Standard Arabic based on the widely used Princeton
WordNet (PWN) for English [5]. Word senses in Arabic map
directly onto those in PWN, which enables translation to
English at the lexical level. Each concept also has a deeper
semantic underpinning: besides the standard WordNet
representation of senses, word meanings are defined in terms
of SUMO (the Suggested Upper Merged Ontology).
However, AWN covers only a core lexicon of Arabic, since
it was built starting from a set of base concepts. Because
its mapping to PWN is therefore relatively limited in
coverage, this project uses an augmented AWN model (AAWN),
which extends the core WordNet downward to more specific
concepts using lexical resources such as Arabic Wikipedia
(AWp) and Arabic Wiktionary (AWk).
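The idea of extending a core wordnet downward can be illustrated with a toy data structure. This is only a sketch under stated assumptions, not the AAWN implementation: the class, its methods, and the English placeholder concepts are all invented for illustration, and each concept simply records which resource it came from.

```python
from collections import defaultdict


class AugmentedWordNet:
    """Toy hypernym graph: core concepts plus more specific ones added below them."""

    def __init__(self):
        self.hyponyms = defaultdict(set)  # concept -> direct, more specific concepts
        self.source = {}                  # concept -> resource it was taken from

    def add_concept(self, concept, hypernym=None, source="AWN"):
        """Add a concept; if a hypernym is given, it must already be known."""
        if hypernym is not None and hypernym not in self.source:
            raise KeyError(f"unknown hypernym: {hypernym}")
        self.source.setdefault(concept, source)
        if hypernym is not None:
            self.hyponyms[hypernym].add(concept)

    def descendants(self, concept):
        """Return every concept reachable below the given one."""
        found, stack = set(), [concept]
        while stack:
            for child in self.hyponyms.get(stack.pop(), ()):
                if child not in found:
                    found.add(child)
                    stack.append(child)
        return found


# Placeholder concepts (hypothetical, in English for readability).
wn = AugmentedWordNet()
wn.add_concept("vehicle")                                # core concept
wn.add_concept("car", hypernym="vehicle")                # core concept
wn.add_concept("taxi", hypernym="car", source="AWp")     # added from an encyclopedia
wn.add_concept("minicab", hypernym="taxi", source="AWk") # added from a dictionary
print(sorted(wn.descendants("vehicle")))  # ['car', 'minicab', 'taxi']
```

Recording the source of each concept keeps the core lexicon distinguishable from the material attached beneath it.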
Wikipedia is by far the largest encyclopedia in existence
with more than 4 million articles in its English version
WORKFLOW
RESULTS
RESULT SUMMARY
error rate    precision    recall    F measure
60.54%        64.90%       74.56%    1.39
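For reference, the balanced F measure is conventionally defined as the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below applies that standard definition to the precision and recall figures listed above; it is an assumption that this is the intended definition, and the source may have computed its F measure differently.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta = 1 gives the harmonic mean of P and R."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall taken from the result summary, as fractions.
f1 = f_measure(0.6490, 0.7456)
print(round(f1, 3))  # 0.694
```

Under this definition the score always lies between 0 and 1 and never exceeds the larger of the two inputs.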
http://www.ldc.upenn.edu/Catalog/docs/LDC2010T13/atb1-v4.1taglist-conversion-to-PennPOS-forrelease.lisp
[2] T. Buckwalter, Buckwalter Arabic Morphological Analyzer Version
1.0. Philadelphia: Linguistic Data Consortium, 2002.
[3] T. Buckwalter and D. Parkinson, A Frequency Dictionary of Arabic.
London and New York: Routledge, 2011.
[4] S. Green and J. DeNero, A class-based agreement model for
generating accurately inflected translations, in ACL '12,
Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics: Long Papers, vol. 1, pp. 146-155.
[5] C. Fellbaum, WordNet and wordnets, in: K. Brown et al. (eds.),
Encyclopedia of Language and Linguistics, Second Edition, Oxford:
Elsevier, 2005, pp. 665-670.
[6] S. Green and Ch. D. Manning, Better Arabic parsing: baselines,
evaluations, and analysis, in COLING '10, Proceedings of the 23rd
International Conference on Computational Linguistics, pp. 394-402.
[7] M. Boella, F. R. Romani, A. Al-Raies, C. Solimando, and G.
Lancioni, The SALAH Project: segmentation and linguistic analysis
of ḥadīṯ Arabic texts, in Information Retrieval Technology, Lecture
Notes in Computer Science, vol. 7097, Springer, Heidelberg, 2011,
pp. 538-549.
[8] Ch. M. Meyer and I. Gurevych, What psycholinguists know about
chemistry: aligning Wiktionary and WordNet for increased domain
coverage, in Proceedings of the 5th International Joint Conference
on Natural Language Processing, Chiang Mai, Thailand, 2011, pp.
883-892.
[9] E. Niemann and I. Gurevych, The people's web meets linguistic
knowledge: automatic sense alignment of Wikipedia and WordNet, in
Proceedings of the International Conference on Computational
Semantics (IWCS), Oxford, United Kingdom, 2011, pp. 205-214.
[10] E. Wolf and I. Gurevych, Aligning sense inventories in Wikipedia
and WordNet, in Proceedings of the First Workshop on Automated
Knowledge Base Construction (AKBC), Grenoble, France, 2010, pp.
24-28.
[11] H. A. Salmon, An Advanced Learner's Arabic-English Dictionary.
Beirut: Librairie du Liban, 1889.
[12] N. Habash, O. Rambow, and R. Roth, MADA+TOKAN: A toolkit
for Arabic tokenization, diacritization, morphological disambiguation,
POS tagging, stemming and lemmatization, in Proceedings of the
2nd International Conference on Arabic Language Resources and
Tools (MEDAR), Cairo, Egypt, 2009, pp. 102-109.
[13] I. Turki Khemakhem, S. Jamoussi, and A. Ben Hamadou, Integrating
morpho-syntactic features in English-Arabic statistical machine
translation, in Proceedings of the Second Workshop on Hybrid
Approaches to Translation, Sofia, Bulgaria, 2013, pp. 74-81.
[14] M. Daoud, D. Daoud, and Ch. Boitet, Collaborative construction of
Arabic lexical resources, in Proceedings of the 2nd International
Conference on Arabic Language Resources and Tools (MEDAR),
Cairo, Egypt, 2009, pp. 119-124.
[15] S. Izwaini, Problems of Arabic machine translation: evaluation of
three systems, in Proceedings of the International Conference on the
Challenge of Arabic for NLP/MT, The British Computer Society
(BCS), London, 2006, pp. 118-148.
[16] Arabic Treebank Guidelines, http://www.ircs.upenn.edu/arabic/guidelines.html.
Accessed October 2013.
[17] https://github.com/anastaw/Meedan-Memory. Accessed October
2013.