
AN HPSG ANALYSIS OF ARABIC VERB

Md. Shariful Islam Bhuyan* and Reaz Ahmed*

*Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology,
Dhaka-1000, Bangladesh
sharifulislam@cse.buet.ac.bd, reaz@cse.buet.ac.bd

ABSTRACT
In spite of being a successful syntactic theory in many respects, Head-driven Phrase
Structure Grammar (HPSG) has inadequate coverage of morphological constructions,
especially of nonconcatenative morphology, which is prominent in Semitic languages such
as Arabic and Hebrew. In this paper, we extend the HPSG framework to support the rich
nonconcatenative morphology of the verbal system of Arabic, the best instance of
nonconcatenative morphology among the living languages. We also introduce the features
necessary for the syntactic and semantic aspects of an Arabic verb.

Keywords: Nonconcatenative Morphology, Head-driven Phrase Structure Grammar, Arabic
Verbal Morphology, Constraint-based Grammar

1. INTRODUCTION
Broad-coverage precision grammar [1]-[3] and computational lexicon development for deep
linguistic processing is a research-intensive area with several potential applications
[4]. Amidst the vast literature on formal linguistic theory [5], Head-driven Phrase
Structure Grammar (HPSG) [6] has a unique position, since it combines the best features
of the contemporary approaches and establishes an integrated framework for cross-layer
representation comprising phonology, morphology, syntax, semantics, pragmatics and
discourse. Although HPSG successfully describes numerous syntactic and semantic
phenomena, it lacks rigorous analyses of morphological phenomena, especially of
nonconcatenative morphology [7]. Nonconcatenative morphology illustrates an interesting
paradigm of morphological operations, which is prominent in Semitic languages such as
Arabic and Hebrew [10]-[11].
Among the living languages, Arabic demonstrates the best instance of nonconcatenative
morphology. The Arabic verb system exhibits both concatenative and nonconcatenative
morphology and is capable of lexically expressing diverse syntactic and semantic
phenomena. The formalisms of existing morphological analyzers for Arabic are not
powerful enough to capture this higher-layer diversity. In this paper, we extend the
HPSG framework to support rich nonconcatenative morphology, for the first comprehensive
HPSG construction of the Arabic verbal system.

2. ARABIC VERBAL SYSTEM
The Arabic language exhibits an extremely rich morphology [8]-[9]. Both concatenative
and nonconcatenative operations take place in the formation of an Arabic word.
Inflection is made by concatenative operations, whereas derivation is made by
nonconcatenative operations.
Morpho-syntactic operations performed over morphemes come in two flavors: concatenative
and nonconcatenative. Concatenative operations are those where morphemes are linearly
concatenated. For example:
i. Prefixation: clear | unclear
ii. Suffixation: walk | walked
iii. Circumfixation: mind | unmindful

Nonconcatenative operations are those where morphemes are nonlinearly embedded. For
example:
i. Infixation: kataba | kattaba
ii. Simulfixation: eat | ate
iii. Modification: man | men
iv. Suppletion: go | went

There are many other morpho-syntactic operations as well. In this paper, we mainly
focus on nonconcatenative operations and give a mathematical formalism to capture their
rich diversity.
Arabic word formation is an excellent example of nonconcatenative root-pattern
morphology. A combination of root letters is plugged into a variety of morphological
patterns, each with a priori fixed letters and a particular vowel melody, which gives
rise to the corresponding syntactic and semantic phenomena. To convey the richness of
Arabic morphological patterns, which we call “measures” in this paper, the following
example is given. Here, the root letters ‘k’, ‘t’, ‘b’, bearing the concept of writing,
are plugged into various measures to get a myriad of syntactic and semantic phenomena.
The measures that share a particular semantic paradigm are called a “Form”. Arabic has
many forms. Among them, ten forms are used regularly. The root letters ‘k’, ‘t’, ‘b’
can be plugged into nine of them.

i. Form I (Transitive): kataba – He wrote
ii. Form II (Causative): kattaba – He caused to write
iii. Form III (Ditransitive): kaataba – He corresponded
iv. Form IV (Factitive): aktaba – He dictated
v. Form V (Reflexive): takattaba – It was written on its own
vi. Form VI (Reciprocity): takaataba – They wrote to each other
vii. Form VII (Submissive): inkataba – He was subscribed
viii. Form VIII (Reciprocity): iktataba – They wrote to each other
ix. Form X (Control): istaktaba – He asked to write

Table 1 Derivational Paradigm of root “ktb”
                          FORM I         FORM II        FORM III        FORM …
Active perfect            katab-a        kattab-a       kaatab-a        …
Passive perfect           kutib-a        kuttib-a       kuutib-a        …
Active imperfect          ya-ktub-u      yu-kattib-u    yu-kaatib-u     …
Passive imperfect         yu-ktab-u      yu-kattab-u    yu-kaatab-u     …
Active imperative         u-ktub         kattib         kaatib          …
Passive imperative        litu-ktab      litu-kattab    litu-kaatab     …
Verbal noun               kitaab-atun    ta-ktiib-un    kitaab-un       …
Active participle         kaatib-un      mu-kattib-un   mu-kaatib-un    …
Passive participle        ma-ktuub-un    mu-kattab-un   mu-kaatab-un    …
Locative participle       ma-ktab-un     …              …               …
Instrumental participle   mi-ktab-un     …              …               …
…                         …              …              …               …

The above example illustrates the derivational paradigm of an Arabic word. However,
there is also an inflectional paradigm, which is governed by agreement information.
Every entry of Table 1 can take fourteen inflectional forms according to its number,
gender and person. For the imperfect forms, there are three such inflectional
paradigms. Tables 2 and 3 show the inflectional paradigms for the active perfect and
passive perfect entries of Form I.
An Arabic word can encode a complete sentence. For example, sayaktubuhu – He will write
it. We can break the word into the following components.

[Diagram: sa-yaktubu-hu, with sa labeled prefix (future particle), the root labeled as
the writing concept, the measure labeled 3rd/sg/masc/ind/perf/act/form-I, and hu labeled
suffix (attached pronoun "it")]

From the diagram, we can conclude that an Arabic word has four components:
1) Prefix: sa – the particle indicating future
2) Suffix: hu – the object pronoun attached as a clitic
3) Root: k, t, b – the root letters bearing the concept of writing
4) Measure: ya_ _u_u – bearing the syntactic and semantic information of the event

It may be possible to concatenate multiple prefixes and suffixes. However, there must
be a single measure and a single set of root letters, where the measure packages the
syntactic and semantic features and the root supplies the core concept. Here the
measure indicates that the actor is in the 3rd person, singular number and masculine
gender; that the verb is in the indicative mood, active voice and derived Form I; and
that the event has not yet been completed. If we plug in another set of root letters,
for example n, S, r, which bears the concept of helping, we get sayanSuruhu – He will
help him.
In our analysis, a measure may contain two parts: a stem-measure and an affix-measure.
From the inflectional tables, we note a crucial point: for a particular inflectional
paradigm, a certain portion of the word, containing all the root letters, is always
constant. For Tables 2 and 3, these portions are “katab” and “kutib” respectively. We
call this fixed portion the “stem-measure” and the remaining part, containing the
prefix and/or suffix, the “affix-measure”.

Table 2 Inflectional Paradigm of Form I-Active-Perfect
Ind/Sub/Juss   Singular    Dual          Plural
3rd/Masc.      katab-a     katab-aa      katab-uua
3rd/Fem.       katab-at    katab-ataa    katab-na
2nd/Masc.      katab-ta    katab-tuma    katab-tum
2nd/Fem.       katab-ti    katab-tuma    katab-tunna
1st            katab-tu    katab-na      katab-na

Based on this analysis, we can give the following model of an Arabic word.

A root-derived Arabic word =
Prefix + affix-measure (stem-measure (Root)) + Suffix
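The word model above can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the formalism of the paper: the template notation (digits standing for root-letter positions), the representation of an affix-measure as a (prefix, suffix) pair, and the function names `apply_stem_measure`, `apply_affix_measure` and `build_word` are all hypothetical choices made for this example.

```python
def apply_stem_measure(root, template):
    """Fill the numbered slots of a stem-measure template with root letters.

    Assumed notation: a digit n in the template stands for the n-th root
    letter; every other character is a fixed letter of the measure.
    """
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in template)


def apply_affix_measure(stem, affix):
    """Wrap a stem with the (prefix, suffix) pair of an affix-measure."""
    affix_prefix, affix_suffix = affix
    return affix_prefix + stem + affix_suffix


def build_word(root, stem_template, affix, prefix="", suffix=""):
    """Compose Prefix + affix-measure(stem-measure(Root)) + Suffix."""
    stem = apply_stem_measure(root, stem_template)
    return prefix + apply_affix_measure(stem, affix) + suffix


KTB = ("k", "t", "b")  # root bearing the concept of writing
NSR = ("n", "S", "r")  # root bearing the concept of helping

# Form I perfect stems from Tables 2 and 3, with two of their suffixes.
print(build_word(KTB, "1a2a3", ("", "a")))   # kataba
print(build_word(KTB, "1u2i3", ("", "tu")))  # kutibtu

# Form II and Form III active perfect from Table 1.
print(build_word(KTB, "1a22a3", ("", "a")))  # kattaba
print(build_word(KTB, "1aa2a3", ("", "a")))  # kaataba

# The measure ya_ _u_u with the future prefix sa and the object clitic hu.
print(build_word(KTB, "12u3", ("ya", "u"), prefix="sa", suffix="hu"))  # sayaktubuhu
print(build_word(NSR, "12u3", ("ya", "u"), prefix="sa", suffix="hu"))  # sayanSuruhu
```

Keeping the stem-measure and the affix-measure as separate steps mirrors the observation that the stem stays constant across an inflectional paradigm while only the affix part varies.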
Table 3 Inflectional Paradigm of Form I-Passive-Perfect
Ind/Sub/Juss   Singular    Dual          Plural
3rd/Masc.      kutib-a     kutib-aa      kutib-uua
3rd/Fem.       kutib-at    kutib-ataa    kutib-na
2nd/Masc.      kutib-ta    kutib-tuma    kutib-tum
2nd/Fem.       kutib-ti    kutib-tuma    kutib-tunna
1st            kutib-tu    kutib-na      kutib-na

There are syntactic and semantic features that govern the derivational and inflectional
paradigms of Arabic roots. Through a linguistic investigation, we have listed the
features that will be used in this paper. The attributes in Tables 4 and 5 govern the
derivational and inflectional paradigms of an Arabic root, respectively.

Table 4 Attributes Governing Derivational Paradigm
Attribute      Values
POS            noun, verb, particle
FORM           I, II, III, IV, …
VOICE          active, passive
VFORM          perfect, imperfect, imperative

Table 5 Attributes Governing Inflectional Paradigm
Attribute      Values
MODALITY       emphatic, uncertainty
MOOD           indicative, subjunctive, jussive
PERSON         1st, 2nd, 3rd
NUMBER         singular, dual, plural
GENDER         masculine, feminine
CASE           nominative, accusative, genitive
DEFINITENESS   definite, indefinite
POLARITY       affirmative, negative

This is not the whole story of Arabic morphology. Another facet of Arabic morphology is
the concept of a root class. We call a set of roots that share a common derivational
and inflectional paradigm a root class. The class is determined by the characteristics
of the root letters. The roots ‘k’, ‘t’, ‘b’ and ‘n’, ‘S’, ‘r’ are both members of the
same root class, the sound root class.

3. AN HPSG PRIMER
Natural languages generally consist of two components: first, the utterances that can
be used by humans; second, the linguistic rules that license those utterances. For
example, in English, “He writes books”, “writes books” and “writes” are all valid
utterances. However, “Writes he books”, “writes he” and “rwite” are not valid, since
the rules do not license them. HPSG is a mathematical theory of natural languages that
formally captures these two core linguistic components. Utterances are modeled using a
mathematical object called a Sign (a formal representation of linguistic objects such
as phrases and words), and rules are captured using another mathematical object called
a Construct (a formal representation of the grammar rules or schemata that are used to
license signs). Both signs and constructs are described using feature structures:
collections of features of the corresponding linguistic objects along with their
values. These features and their values constitute a very detailed type hierarchy (see
Figure 1). We use constructional HPSG [6] in this paper.

Figure 1: An HPSG Type Hierarchy

An utterance can have linguistic features spanning multiple layers, e.g. phonological,
morphological, syntactic, semantic, pragmatic and others. To capture these features,
the description of a typical HPSG sign looks like Figure 2. To capture grammatical
rules, the feature structure of a construct has a mother (MTR) feature and a daughters
(DTRS) feature. The value of MTR is a sign and the value of DTRS is a nonempty list of
signs. The description of a typical HPSG construct looks like Figure 3. The licensing
of signs follows the Sign Principle, which states that “Every sign must be lexically or
constructionally licensed, where, lexically licensed only if it satisfies some lexical
entry, and constructionally licensed only if it is the mother of some construct” [6].

Figure 2: An HPSG Sign

From the type hierarchy of Figure 1, we can see that there are two types of feature
structure. Functions are the feature structures that are described using an attribute
value matrix (AVM); they map features to feature structures. Atoms are atomic types
that can be used as the values of features. Notable functions are sign, cxt
(construction), lexeme, phrase and others. To model a linguistic phenomenon, we first
need to identify the signs involved, together with their hierarchy. Next, we need to
design functional feature structures for them with linguistically motivated features.
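The notions of sign, construct and licensing introduced in this section can be sketched in code. The following is a deliberately simplified illustration, not the formal definition of [6]: signs are reduced to flat feature-to-value mappings, the feature names PHON and POS in the usage lines are assumed for the example, and the Sign Principle, which states necessary conditions, is treated here as a simple test. The names `Sign`, `Construct`, `satisfies` and `is_licensed` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sign:
    """A sign, simplified to a flat mapping from features to values."""
    features: Dict[str, object] = field(default_factory=dict)


@dataclass
class Construct:
    """A construct: a mother sign (MTR) and a nonempty list of daughters (DTRS)."""
    mtr: Sign
    dtrs: List[Sign]

    def __post_init__(self):
        if not self.dtrs:
            raise ValueError("DTRS must be a nonempty list of signs")


def satisfies(sign, entry):
    """True if every feature the lexical entry specifies has that value in the sign."""
    return all(sign.features.get(name) == value for name, value in entry.items())


def is_licensed(sign, lexicon, constructs):
    """Simplified Sign Principle: the sign satisfies a lexical entry
    or is the mother of some construct."""
    lexically = any(satisfies(sign, entry) for entry in lexicon)
    constructionally = any(c.mtr is sign for c in constructs)
    return lexically or constructionally


# Usage with the English examples from the text (feature names are assumed).
writes = Sign({"PHON": "writes", "POS": "verb"})
books = Sign({"PHON": "books", "POS": "noun"})
writes_books = Sign({"PHON": "writes books", "POS": "verb"})
rwite = Sign({"PHON": "rwite"})

lexicon = [{"PHON": "writes", "POS": "verb"}, {"PHON": "books", "POS": "noun"}]
constructs = [Construct(mtr=writes_books, dtrs=[writes, books])]

print(is_licensed(writes, lexicon, constructs))        # True (lexically)
print(is_licensed(writes_books, lexicon, constructs))  # True (constructionally)
print(is_licensed(rwite, lexicon, constructs))         # False
```

A full implementation would use typed, nested feature structures with unification rather than equality on flat dictionaries; the flat version is kept only to make the two licensing routes visible.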
Then we also need to define the necessary constructs as well as the atomic type
hierarchy. In the next section, we build these ingredients for Arabic verbs.

Figure 3: An HPSG Construction

4. ARABIC IN HPSG
Here we give the attribute value matrix (AVM) for the Arabic verb kataba – “He wrote”,
in the active form, and for its corresponding passive form kutiba – “It was written”,
using our analysis. In Figure 4, we have three features associated with morphology.
First, the feature TYPE denotes the associated root class. Arabic roots are classified
into several root classes according to their derivational and inflectional paradigms.
This feature affects both the root and the measure; therefore, it has been lifted to a
first-level morphological feature. In this case, its value is sound. Next, the feature
ROOT has a list of root letters as well as the CONTENT feature, which gives the
semantic contribution made by the root letters. In this case, its value is
structure-shared with the write-fr in the FRAMES feature. Next, the feature MEASURE
contains the morphological, syntactic and semantic information contributed by the
measure. Within it, first, the feature FORM denotes the semantic paradigm; kataba is a
Form I derivative. Next, the feature PATTERN captures the stem along with the root
letters using structure sharing. Then, the feature CAT contains the syntactic category
for this measure. Its value is structure-shared with the syntactic feature CAT.
Finally, the feature PNG captures the PERSON, NUMBER and GENDER information of our
semantic actor in the case where it is not syntactically realized.

Figure 4: An HPSG Sign for kataba

Figure 5: An HPSG Sign for kutiba

We present the syntactic and semantic information for the word kataba using the SYN and
SEM features. First, the CAT feature identifies the syntactic category of kataba. It
contains the VFORM and VOICE features of Arabic, which govern the derivational paradigm
of the verb lexeme. In this case, their values are perfect and active respectively.
Next, the VAL feature captures the subcategorization of verbs. VAL is a list of the
signs that are required by the syntactic head. In this case, the verb kataba requires
an object: the verb write is a transitive verb that takes an object. We should note
that the hidden pronoun he is encoded by the inflectional morphology when no explicit
subject is used. The semantic actor is not realized syntactically, so the verb only
subcategorizes for a syntactic object. We can also see the constraints imposed on the
object. In this version, its syntactic head should be a noun
phrase with the value of its CASE feature set to accusative. The negative value of the
OPT feature indicates that this object is not optional, but rather required for
syntactic correctness.

Next, we need to consider some semantic features. Here, we use a typed feature version
of predicate logic to capture the semantics of natural language. First, we consider the
INDEX feature, which is a reference to a discourse entity. Then, the PNG feature
captures the semantics of PERSON, NUMBER and GENDER. Next, the FRAMES feature serves as
a bag of elementary predicates that describe the situation at hand. For example, in the
case of kataba, the event of writing is expressed. The event is completed in the past
and there is a discourse referent to the actor. To capture the core event, the
write-predicate is introduced. To capture the temporal constraint, we use the
perfect-predicate. Finally, to express the actor of the event, the hidden pronoun, we
introduce a discourse referent with the corresponding PNG feature. Predicates have
their respective arguments. The write-predicate has a situation hook, expressed by the
feature SIT. There are two semantic roles associated with this predicate. First, we
consider the role of the writer, who plays a doer role, expressed by the feature ACTOR.
Second, we consider the role of the written, which plays an undergoer role, expressed
by the feature UNDGR. The perfect-predicate takes a situation hook as an argument,
which is expressed as the feature ARG. We use the technique of co-indexing for sharing
semantic objects. The discourse referent predicate is actually the actor of the
write-predicate. To denote this constraint, the INDEX value of the hidden pronoun and
the ACTOR value of the write-predicate are co-indexed: both are given the value i. This
is an example of reference co-indexing. We also use event co-indexing. The event hook
SIT of the write-predicate, the situation hook of the entire scenario and the argument
ARG of the perfect-predicate are all co-indexed and expressed using the value s.
Another important issue of HPSG representation is the syntax-semantics interface. In
this example, this is done by co-indexing the INDEX value of the syntactic object and
the UNDGR value of the write-predicate with the value j. This indicates that the
syntactic object is our semantic undergoer, whereas from our previous discussion we can
note that the semantic actor is not syntactically realized.

Figure 6: An HPSG Sign for yaktubu

Figure 7: An HPSG Sign for yuktabu

In Figure 5, we show the HPSG representation of the passive kutiba. We identify the
changes associated with this conversion. The PATTERN feature is changed to capture the
derivational morphological operation. The next change is found in the feature VOICE,
whose value changes to passive. Unlike English, which can have a prepositional
complement in passives, Arabic passives do not subcategorize for a subject or any other
argument. For this reason, the VAL list is empty. Moreover, the discourse referent in
the feature FRAMES is now co-indexed with the UNDGR feature of the write-predicate,
expressed by the value j. The semantic actor is now completely unknown, not having
any syntactic or semantic reference, which is a distinctive property of Arabic
passives. In Figures 6 and 7, we also give the HPSG signs for yaktubu and yuktabu, the
active imperfect and passive imperfect forms of kataba. A newly introduced feature,
MOOD, takes the value indicative in this case, and VFORM changes to imperfect. An
imperfect-predicate denotes the non-completion aspect of the event.

5. CONCLUSION
In this paper, we propose a way to capture nonconcatenative morphology, especially
Arabic verb morphology, within the framework of HPSG. Much work remains for the future.
To construct the matrix from Table 1, we need to cope with the wide range of forms that
an Arabic verb can take. The results will be immensely helpful for the construction of
resource grammars for languages with rich nonconcatenative morphology.

REFERENCES
[1] Copestake A., and Flickinger D., “An open-source grammar development environment
and broad-coverage English grammar using HPSG,” Second conference on Language
Resources and Evaluation, 2000.
[2] Marimon M., Bel N., Espeja S., and Seghezzi N., “The Spanish Resource Grammar:
pre-processing strategy and lexical acquisition,” ACL Workshop on Deep Linguistic
Processing, 2007.
[3] Comrie B., Fabri R., Hume B., Mifsud M., Stolz T., and Vanhove M., (Eds), “Towards
an HPSG Analysis of Maltese,” 1st International Conference on Maltese Linguistics,
2007.
[4] Bond F., Oepen S., Siegel M., Copestake A., and Flickinger D., “Open source machine
translation with DELPH-IN,” Open-Source Machine Translation Workshop at the 10th
Machine Translation Summit, pp. 15-22, 2005.
[5] Sells P., Lectures on Contemporary Syntactic Theories, Stanford: CSLI Publications,
1985.
[6] Sag I., and Wasow T., Syntactic Theory: A Formal Introduction, Stanford: CSLI
Publications, 1999.
[7] Bird S., and Klein E., “Phonological Analysis in Typed Feature Systems,”
Computational Linguistics, vol. 20, pp. 55-90, 1994.
[8] Beesley K., “Finite-State Morphological Analysis and Generation of Arabic at Xerox
Research: Status and Plans in 2001,” ACL Workshop on Arabic Language Processing: Status
and Prospects, pp. 1-8, 2001.
[9] Smrž O., Functional Arabic Morphology. Formal System and Implementation, PhD
Dissertation, Charles University in Prague, 2007.
[10] Riehemann S., A Constructional Approach to Idioms and Word Formation, PhD
Dissertation, Stanford University, 2001.
[11] Riehemann S., “Type-Based Derivational Morphology,” Journal of Comparative
Germanic Linguistics, vol. 2, pp. 49-77, 1998.
