
JIMMA UNIVERSITY

INSTITUTE OF TECHNOLOGY
DEPARTMENT OF INFORMATION TECHNOLOGY

POST GRADUATE PROGRAM


GRADUATE SEMINAR (CMIT 6127)

WORD SENSE DISAMBIGUATION

By: WORKINEH TESEMA


Email: workineh.tesema@ju.edu.et / workinatesema@yahoo.com
ID: Msc00345/06
Advisor: Mr. DEBELA T.

Jimma University
Jimma
May 22, 2014


MASTER OF SCIENCE

IN
INFORMATION TECHNOLOGY

2014 A.Y

POST GRADUATE PROGRAM

GRADUATE SEMINAR (CMIT 6127)

Acknowledgment
I would like to thank and gratefully acknowledge my advisor, Mr. Debela Tesfaye, for all his guidance at every step of the way, for patiently listening to me for countless hours, for teaching me how to conduct a seminar, for imparting so much valuable knowledge, and for all his encouragement and words of kindness. Any noteworthy aspects of this seminar are entirely because of him; all faults are in spite of him.

Table of Contents

Page

Acknowledgment.............................................................................................................................ii
Acronyms........................................................................................................................................iv


Operational Definitions...................................................................................................................v
Abstract...........................................................................................................................................vi
1. Introduction..................................................................................................................................1
2. Literature Review........................................................................................................................4
2.1. Introduction...........................................................................................................................4
2.2. Related Works.......................................................................................................................6
3. Methodology...............................................................................................................................9
3.1. Algorithm..............................................................................................................................9
3.2. Evaluation...........................................................................................................................10
4. Research Gap Identification ......................................................................................................11
5. Conclusion ................................................................................................................................11
6. Future Work...............................................................................................................................12
References...........................................................................................................................13
Appendix A........................................................................................................................14

Acronyms
CBC: Clustering By Committee
IDLE: Interactive DeveLopment Environment
MT: Machine Translation


NLP: Natural Language Processing


WSD: Word Sense Disambiguation

Operational Definitions
Word Sense Disambiguation: the process of automatically determining the intended meaning of an ambiguous word when it is used in a sentence.

Natural Language Processing: the branch of artificial intelligence (or computer science) focused on developing systems that allow computers to communicate with people using everyday language.


Abstract
All human languages have words that can mean different things in different contexts. Word sense
disambiguation is the process of automatically figuring out the intended meaning of such a word
when used in a sentence. This seminar paper deals with word sense disambiguation (WSD).
WSD has obvious relationships to other fields such as lexical semantics, whose main endeavour
is to define, analyze, and ultimately understand the relationships between word, meaning,
and context.
After an intensive review of the related work, we selected a paper that follows a corpus-based statistical approach. More specifically, we extracted lexical features of the target word to learn whether it is ambiguous, and if the word is ambiguous we applied a clustering algorithm to disambiguate it. The method relies on text data extracted from Wikipedia. We implemented and evaluated the algorithm and obtained a promising result, comparable to the result reported in the original work. We assume that the difference observed in the evaluation results may stem from the use of different training and test sets.

1. Introduction
The goal of natural language processing is to teach human language to the computer or machine. Natural language processing here refers to the use and ability of systems to process sentences in a natural language such as English, rather than in a specialized artificial computer language such as C++. Occasionally the phrase natural language is used not for actual languages as we use them ordinarily, such as our actual use of English to communicate in everyday life, but for a more restricted subset of such a human language, one purged of the constructions and ambiguities that computers cannot sort out. Hence one writer states that human languages allow anomalies that natural languages cannot allow. There may be a need for such a language, but a natural language restricted in this way is artificial, not natural. In today's computation, the issue of word sense poses a great problem for natural language processing.
Ambiguity is a crucial problem in natural language processing. Natural language is highly ambiguous and must be disambiguated. Ambiguities compound to generate an enormous number of possible interpretations. Ambiguity is the primary difference between natural and computer languages. Formal programming languages are designed to be unambiguous, i.e., they can be defined by a grammar that produces a unique parse for each sentence in the language. In a natural language, by contrast, having a unique linguistic expression for every possible conceptualization that could be conveyed would make the language overly complex and linguistic expressions unnecessarily long.
At any level of natural language, be it phonology, morphology, syntax, semantics, or discourse, the problem of ambiguity will arise. Any word, according to its context, may have different or multiple senses. Some words have multiple meanings. Sometimes two completely different words are spelled the same but carry different senses in a sentence. According to Jurafsky and Martin (1999), all natural languages contain words that can mean different things in different
contexts. In English, for example, the word bank can refer to a financial institution that accepts
deposits and channels the money into lending activities or sloping land (especially the slope
beside a body of water). Such words with multiple meanings are potentially ambiguous, and the
process of deciding which of their several meanings is intended in a given context is known as
Word Sense Disambiguation (WSD).


Word sense disambiguation (WSD) can also be defined as the process of identifying the correct sense or meaning of a word in a particular context. Human beings are especially good at this. For example, given a context such as a financial institution that accepts deposits and channels the money into lending activities, it is immediately apparent to us that bank refers to the financial institution, whereas given sloping land (especially the slope beside a body of water), we know that bank means the sloping land beside a river. Making computers understand natural languages is challenged by many problems, word sense ambiguity being one of them, and it is very difficult for computers to achieve this same feat. Although computers are very good at following fixed rules, it is impossible to create a set of simple rules that would accurately disambiguate any word in any context. This difficulty stems from the fact that natural languages themselves seldom follow hard and fast rules.
One of the first problems encountered by any natural language processing system is that of
lexical ambiguity, be it syntactic or semantic. The problem of resolving semantic ambiguity is
generally known as word sense disambiguation and has proved to be more difficult than syntactic
disambiguation. The problem is that words often have more than one meaning, sometimes fairly
similar and sometimes completely different. The meaning of a word in a particular usage can
only be determined by examining its context. This is, in general, a trivial task for the human
language processing system. However, the task has proved to be difficult for computers, and some
have believed that it would never be solved. Nevertheless, there have been several advances in word
sense disambiguation and we are now at a stage where lexical ambiguity in text can be resolved
with a reasonable degree of accuracy.
There are no decisive ways of identifying where one sense of a word ends and the next begins,
and this is at the core of what makes word sense disambiguation hard. Another problem is the
challenge in deciding what the senses are.
Since WSD involves the association of a given word in a text or discourse with a definition or
meaning which is distinguishable from other meanings potentially attributable to that word, it
usually involves two steps (Ide and Veronis, 1998). The first step is to determine all the different
senses for every word relevant to the text or discourse under consideration, i.e., to choose a sense
inventory, e.g., from the lists of senses in everyday dictionaries, from the synonyms in a
thesaurus, or from the translations in a translation dictionary. The second step involves a means


to assign the appropriate sense to each occurrence of a word in context. All disambiguation
work involves matching the context of an instance of the word to be disambiguated either with
information from external knowledge sources or with contexts of previously disambiguated
instances of the word. For both of these sources we need preprocessing or knowledge extraction
procedures representing the information as context features. For some disambiguation tasks,
there are already well-known procedures such as morphosyntactic disambiguation and therefore
WSD has largely focused on distinguishing senses among homographs belonging to the same
syntactic category.
Natural language processing (NLP) involves resolution of various types of ambiguity. Lexical
ambiguity is one of these ambiguity types, and occurs when a single word (lexical form) is
associated with multiple senses or meanings. For applications which are sensitive to semantic
denotation, or more precisely lexical semantics, this ambiguity type can pose a major obstacle.
WSD has obvious relationships to other fields such as lexical semantics, whose main endeavor is
to define, analyze, and ultimately understand the relationships between word, meaning, and
context. But even though word meaning is at the heart of the problem, WSD has never really
found a home in lexical semantics. It could be that lexical semantics has always been more
concerned with representational issues (Lyons 1995) and models of word meaning and polysemy
so far too complex for WSD (Cruse 1986; Ravin and Leacock 2000). And so, the obvious
procedural or computational nature of WSD paired with its early invocation in the context of
machine translation (Weaver 1949) has allied it more closely with language technology and thus
computational linguistics. In fact, WSD has more in common with modern lexicography, with its
intuitive premise that word uses group into coherent semantic units and its empirical corpus-based approaches, than with lexical semantics (Wilks et al. 1993).
Automatic word sense disambiguation can play an important role in the field of machine translation. Accurate word sense disambiguation can also lead to better results in information retrieval. For example, given a query that consists of the words financial bank, we would like to return documents that contain information about financial institutions and not about the slope beside a body of water.
Generally, the goal of natural language processing is to help computers learn human languages.
Natural Language Processing can bring powerful enhancements enabling computers to


communicate with humans using natural languages. As one such effort, the main focus of this seminar is to resolve ambiguity in natural language; the system is trained to perform the task of word sense disambiguation.

2. Literature Review
2.1. Introduction
Natural Language Processing is the branch of Artificial Intelligence focused on developing systems that allow computers to communicate with people using everyday language. It is also called Computational Linguistics, which concerns how computational methods can aid the understanding of human language. The primary aim of natural language processing is to enable communication between humans and computers. However, during such communication there is the problem of ambiguity; among the ambiguity types is word sense ambiguity. As explained in the introductory part, the resolution of such ambiguity is commonly termed word sense disambiguation (WSD) (Jurafsky & Martin, 1999).
The trouble with word sense disambiguation is word senses. There are no decisive ways of
identifying where one sense of a word ends and the next begins. This makes the definition of the
WSD task problematic. Word meaning is in principle infinitely variable and context sensitive. It
does not divide up easily into distinct sub-meanings or senses. Lexicographers frequently
discover in corpus data loose and overlapping word meanings, and standard or conventional
meanings extended, modulated, and exploited in a bewildering variety of ways (Kilgarriff 1997;
Hanks 2000).
WSD approaches can be categorized into several classes based on various criteria. In this paper we organize them as knowledge-intensive or shallow approaches (the ones relying on free text or corpora) based on the level and type of information employed. A further classification can also be made as supervised, unsupervised, or hybrid based on the level of human involvement.
The knowledge-intensive approaches usually employ information obtained from WordNet, dictionaries, thesauri, or concept hierarchies and other corpora tagged with the senses of the constituent words (Wanton and Llavori, 2012; Banerjee, 2002; Ide & Veronis, 1998; Ellen, 1993; Ying Liu, 2002). Ying Liu (2002), in his work titled Using WordNet to Disambiguate Word Senses, argued that WordNet can be used to disambiguate the target noun by visiting its synonym sets in WordNet. To this end, Ying Liu (2002) counted how often each synonym set is visited and selected the most visited one as the sense of the target word in the given context.
The main drawback of such approaches is that the resources required to annotate the words are
hand-built (by humans) and are therefore expensive to acquire and maintain. This inevitably
leads to knowledge acquisition bottlenecks when attempting to handle larger amounts of text,
new domains or even new languages.
In corpus-based approaches, the information required for the disambiguation is gained from
training on a given corpus. As mentioned above, this approach can be further classified into two
subclasses based on the level of human involvement in training the algorithm on the corpus as:
supervised and unsupervised learning approaches.
In supervised learning approaches, a sense disambiguation algorithm is learned from a representative set of labeled instances drawn from the same distribution as the test set to be used. This is a straightforward application of the supervised learning approach to creating a classifier. In such approaches, a learning system is presented with a training set consisting of feature-encoded inputs along with their appropriate label, or category. The output of the system is a classifier capable of assigning labels to new feature-encoded inputs (Jurafsky & Martin, 1999).

In contrast to the supervised approaches, the unsupervised approaches to sense disambiguation eschew the use of sense-tagged data of any kind during training. In these approaches, feature-vector representations of unlabeled instances are taken as input and are then grouped into clusters according to a similarity metric. These clusters can then be represented as the average of their constituent feature-vectors, and labeled by hand with known word senses. Unseen feature-encoded instances can be classified by assigning the word sense from the cluster to which they are closest according to the similarity metric (Jurafsky & Martin, 1999).
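To make this unsupervised scheme concrete, the following is a minimal sketch (not the system of any work cited here): occurrences of a target word are represented as bag-of-words vectors, grouped around two seed contexts by cosine similarity, the groups are hand-labeled, and an unseen occurrence receives the label of its closest group. The toy sentences, the seeding strategy, and the sense labels are all assumptions made for illustration.

# Minimal sketch of the unsupervised scheme described above. The toy
# contexts, the seeding of two groups, and the hand-given sense labels
# are illustrative assumptions, not the method of any cited work.
from collections import Counter
import math

contexts = [                      # unlabeled occurrences of "bank"
    "deposit money in the bank account with interest",
    "the bank approved the loan and the deposit",
    "we walked along the river bank in the sand",
    "the bank of the river was covered with water and sand",
]

def vectorize(text):
    """Bag-of-words vector as a word -> count dictionary."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Group each occurrence with the more similar of two seed contexts.
vectors = [vectorize(c) for c in contexts]
seeds = [vectors[0], vectors[2]]
groups = [max(range(len(seeds)), key=lambda i: cosine(v, seeds[i])) for v in vectors]
print(groups)                     # [0, 0, 1, 1]

# A human labels the discovered clusters with known senses, and an unseen
# occurrence is assigned the label of the cluster it is closest to.
labels = ["FINANCIAL_INSTITUTION", "RIVER_SIDE"]
unseen = vectorize("she opened a savings account at the bank")
closest = max(range(len(seeds)), key=lambda i: cosine(unseen, seeds[i]))
print(labels[closest])            # FINANCIAL_INSTITUTION

A real system would of course replace the two hand-picked seeds with a proper clustering algorithm; the sketch only illustrates the train-cluster-label-assign cycle described above.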


2.2. Related Works


A great deal of work is concerned with the disambiguation of word senses in several languages, relying on several approaches. As mentioned in section 2.1, WSD approaches can be categorized based on the level of human involvement as supervised, unsupervised, or hybrid. Another similar classification can also be made based on the nature of the resources or corpus utilized: a) knowledge-intensive approaches utilizing information available in structured resources like WordNet or a thesaurus; b) approaches relying on the patterns learned from free texts like Wikipedia; c) hybrid approaches employing a combination of both.
Most common works in WSD use supervised approaches, which require texts annotated with some linguistic features derived from structured or unstructured resources. The approaches relying on unstructured text for the disambiguation of word senses mostly use information obtained from Wikipedia. The majority of the approaches in this category rely on the annotation attributes derived from Wikipedia. Wikipedia is a free online encyclopedia, representing the outcome of a continuous collaborative effort of a large number of volunteer contributors. The resource is very appropriate for the task of sense disambiguation because of its large size, multilingualism, and ease of accessibility. Virtually any Internet user can create or edit a Wikipedia web page, and this freedom of contribution has a positive impact on both the quantity (fast-growing number of articles) and the quality (potential mistakes are quickly corrected within the collaborative environment) of this online resource. Wikipedia editions are available for more than 200 languages, with a number of entries varying from a few pages to more than one million articles per language (Mihalcea, 2002). Mihalcea (2002), for instance, pointed out that there are cases when Wikipedia makes different or finer sense distinctions than even more structured and expensive resources like WordNet.
Some promising works like Mihalcea (2002) extracted information from Wikipedia to disambiguate the sense of a given word employing a Naive Bayes classifier. He extracted from Wikipedia text the word features used for the classification. The features used as the basis for the classification were the words appearing both before and after the term to be disambiguated. Such co-occurring words were extracted by using a sliding window of 6 (three before and three after the current word). The study shows that Wikipedia sense annotations can be used to build a word sense disambiguation system leading to a relative error rate reduction of 30-44% as compared to simpler baselines. However, since Wikipedia is a free online encyclopedia, anybody who has Internet access can edit or delete the data, which may reduce the quality of the corpus and affect the performance of the word sense disambiguation system.
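To make this feature setup concrete, the short sketch below collects the words in a window of six (three before and three after the target word); the example sentence and the helper name are assumptions made for illustration, not taken from Mihalcea's implementation.

# Hedged sketch: collecting the words in a window of 6 (three before and
# three after the target word), the kind of features described above.
def window_features(tokens, target, size=3):
    """Return, for every occurrence of `target`, the words appearing
    up to `size` positions before and after it."""
    features = []
    for i, tok in enumerate(tokens):
        if tok == target:
            before = tokens[max(0, i - size):i]
            after = tokens[i + 1:i + 1 + size]
            features.append(before + after)
    return features

sentence = "he walked to the bank to deposit his money".split()
print(window_features(sentence, "bank"))
# [['walked', 'to', 'the', 'to', 'deposit', 'his']]

Feature lists of this kind, paired with the sense labels of the annotated occurrences, are what a classifier such as Naive Bayes would then be trained on.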
Some works demonstrated that word senses can be disambiguated using the information obtained from an electronic dictionary (Ide & Veronis, 1998; Ellen, 1993). These works showed that the correct sense of a given word can be learned using word definitions extracted from an electronic dictionary. The work of Ide & Veronis (1998) combines the characteristics of both a dictionary (WordNet) and a structured semantic network, which provide definitions for the different senses of words, to learn the senses. The basic assumption behind the approach is that words having multiple senses will have distinct features and multiple definitions which can be learned from an electronic dictionary. Ellen (1993) also relied on the information obtained from WordNet for WSD and pointed out that the more common words tend to appear in the larger synonym sets in WordNet. It is precisely those nouns that actually get used in documents and are most likely to have many senses and synonyms. The basic problem addressed in Ellen's (1993) study is to detect polysemes and synonyms. According to the study, polysemes and synonyms are dealt with by assigning different senses of a word to different concept identifiers and assigning the same concept identifier to synonyms. Ellen concluded that the algorithm used is not sufficient to reliably select the correct sense of a noun from its set of senses in WordNet. Despite the achievement, the algorithm is limited to disambiguating nouns.
Banerjee (2002) also used dictionary definitions and the information extracted from the well-known electronic database, WordNet. Accordingly, given a target word to be disambiguated and a few surrounding words that provide the context for the word, they used the definitional glosses of these words and extracted words related to them through various relations defined in WordNet. Since the algorithm is dependent on the glosses of the words in the context window, words that do not occur in any synonym set in WordNet are ignored. This rules out function words like the, of, an, etc., and also most proper nouns. If there are not enough words to the left or right of the target word, additional words are added from the other direction. This is an attempt to provide roughly the same amount of data for every target word.
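A much-simplified gloss-overlap sketch in the spirit of this family of methods is shown below. It uses NLTK's WordNet interface and scores each sense of the target word by the raw word overlap between the sense's gloss and the context window; the extended relations (hypernym glosses, etc.) that Banerjee actually exploits are omitted, so this is only an illustrative stand-in, not the original algorithm.

# Simplified gloss-overlap sketch in the spirit of the approach described
# above: score each WordNet sense of the target word by the overlap between
# its gloss and the surrounding context words. The extended relations used
# by the original method are omitted; this is only an illustration.
# Requires NLTK with the WordNet corpus downloaded (nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

def best_sense(target, context_words):
    context = set(w.lower() for w in context_words)
    best, best_score = None, -1
    for sense in wn.synsets(target):
        gloss_words = set(sense.definition().lower().split())
        score = len(gloss_words & context)
        if score > best_score:
            best, best_score = sense, score
    return best

context = "he sat on the bank of the river and watched the water".split()
sense = best_sense("bank", context)
print(sense, "->", sense.definition())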
Some promising works like Wanton and Llavori (2012) used a clustering algorithm to disambiguate word senses. Their approach is not only able to disambiguate all the words in a sentence but also reveals the implicit relationships (not asserted in WordNet) existing among these word senses; the clustering algorithm is used to contextually group word senses according to their representations.


3. Methodology
The approach we followed in this work is an unsupervised, corpus-based method of word sense discrimination that does not rely on external knowledge sources such as machine-readable dictionaries, concept hierarchies, or sense-tagged text. Our justification for selecting this approach is that unsupervised approaches are extremely cheap to implement in terms of resource requirements. The counterpart of the unsupervised approach, namely the supervised approach, relies on huge and expensive resources like WordNet which are very hard to acquire, modify, and develop for new languages. It is also quite obvious that such resources have been developed for only a few languages in the world. The method followed in this seminar work is therefore a corpus-based method which does not require such expensive resources. The algorithm first gathers all the contexts of a given word and then clusters them using the Clustering By Committee (CBC) algorithm. The clusters are labeled using frequency information, and the number of clusters produced is taken as the number of senses of the target word.
CBC takes a word type as input and finds clusters of words that represent each sense of the word (Lin and Pantel, 2002). The clusters are made up of synonyms or words that are related to the discovered senses. For example, if chair is the input, CBC produces two sets: the first includes words like president (the chairperson sense), while the second includes words related to the kind of chair used in a house (the furniture sense).

3.1. The algorithm


Given a target word:
1. Extract all the contexts of the target word
2. Cluster the contexts
3. Label/name the clusters
4. Produce the clusters together with their labels as the senses of the target word
Table 1: The algorithm of the approach
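The listing below is a minimal runnable sketch of these four steps. It substitutes a simple overlap-based grouping rule for the full CBC algorithm, and the toy corpus, the window size of three, and the stop-word list are assumptions made purely for illustration, not the exact setup of this work.

# Minimal sketch of the four steps in Table 1. The toy corpus, the window
# size, and the simple overlap-based grouping (a stand-in for the full CBC
# algorithm) are illustrative assumptions, not the configuration of this work.
from collections import Counter

CORPUS = [
    "deposit money in the bank to earn interest on savings",
    "the bank paid interest on the money deposit",
    "they sat on the river bank near the water",
    "the river bank was under water today",
]
WINDOW = 3                         # words taken on each side of the target
STOP = {"the", "a", "an", "and", "of", "on", "in", "to", "was", "with"}

def contexts_of(target):
    """Step 1: extract the window of words around every occurrence."""
    contexts = []
    for sentence in CORPUS:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                contexts.append(tokens[max(0, i - WINDOW):i] + tokens[i + 1:i + 1 + WINDOW])
    return contexts

def overlap(a, b):
    """Word overlap between two contexts (Jaccard similarity, 0..1)."""
    a, b = set(a), set(b)
    return len(a & b) / float(len(a | b)) if a | b else 0.0

def cluster(contexts, threshold=0.15):
    """Step 2: greedily group contexts whose overlap exceeds a threshold."""
    clusters = []
    for ctx in contexts:
        for cl in clusters:
            if overlap(ctx, cl[0]) >= threshold:
                cl.append(ctx)
                break
        else:
            clusters.append([ctx])
    return clusters

def label(cluster_):
    """Step 3: label a cluster with its most frequent content words."""
    counts = Counter(w for ctx in cluster_ for w in ctx if w not in STOP)
    return "/".join(w for w, _ in counts.most_common(2))

# Step 4: the labelled clusters are reported as the senses of the word.
senses = [label(cl) for cl in cluster(contexts_of("bank"))]
print("Discovered senses of 'bank':", senses)
# Discovered senses of 'bank': ['interest/money', 'river/water']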


The premise behind the approach is that words having multiple senses, i.e., ambiguous words, have different, distinct groups of contexts. Usually such contexts are expressed in terms of words. Therefore, one can identify the senses of a given word by retrieving all the words occurring in all the contexts of that word and grouping them depending on the contexts. Let's consider our bank example again to illustrate the idea. The term bank has at least two contexts: financial institution vs. portion of land. The first sense is expressed with a certain set of words like finance, money, deposit, interest, loan, etc., whereas the second sense is expressed by a completely different set of words which includes land, river, water, sand, etc. One possible way of disambiguating the senses of a given word is therefore to identify such distinct sets of words, which can be obtained by extracting the words co-occurring with the term and then clustering them. As depicted in the above table (Table 1), the approach extracts all the contexts of a given word. To this end we relied on co-occurrence information to retrieve the contexts. Accordingly, all the words appearing with the target word in a fixed-sized window are considered as its context. The extracted words are grouped and labeled. Each group is considered as a sense of the target word.

3.2. Evaluation

We used a portion of Wikipedia text to train and evaluate the algorithm. The system is therefore made to identify the senses of certain sets of words from Wikipedia. To this end, we deliberately selected 10 words, 6 ambiguous and the remaining 4 unambiguous, and provided them to the system. We then manually evaluated the result produced by the system. The task is basically classifying the 10 words into the correct classes: ambiguous or unambiguous. Our strategy for evaluating the performance is therefore counting the number of words assigned to the correct class, ambiguous or unambiguous. Accordingly, out of the 6 ambiguous words, 3 of them were correctly classified as ambiguous, giving a performance level of 51%. Similarly, out of the 4 unambiguous words, 3 of them were correctly classified as unambiguous, giving a performance level of 49%. The average performance level is therefore 50%.
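The counting strategy described above can be scripted in a few lines, as sketched below; the word lists and the predicted labels are made-up placeholders for illustration and are not the ten test words or the actual output of the system.

# Sketch of the evaluation strategy: count how many test words the system
# assigns to the correct class (ambiguous vs. unambiguous). The gold and
# predicted labels below are made-up placeholders, not the actual test data.
gold = {                           # True = ambiguous, False = unambiguous
    "bank": True, "bark": True, "chair": True,
    "oxygen": False, "molecule": False,
}
predicted = {                      # would normally come from the system
    "bank": True, "bark": False, "chair": True,
    "oxygen": False, "molecule": False,
}

def per_class_counts(gold, predicted, cls):
    words = [w for w, g in gold.items() if g is cls]
    correct = sum(1 for w in words if predicted[w] is cls)
    return correct, len(words)

for cls, name in [(True, "ambiguous"), (False, "unambiguous")]:
    correct, total = per_class_counts(gold, predicted, cls)
    print("%s: %d out of %d correctly classified" % (name, correct, total))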


One can also evaluate the performance of the algorithm by counting the number of senses produced for a given word and checking the correctness of the labels produced by the system; this, however, is beyond the scope of our current work.

4. Research Gap Identification


Given the difficulty level of the task, one cannot expect a full-fledged WSD system from such a small project. We have therefore identified the following major gaps to be filled in future work:
1. The best-performing approaches utilize information obtained from expensive resources like WordNet. Only a few languages have a WordNet, which is largely built by human effort and is very expensive to construct for a new language. Our basic future research question is therefore: how can the performance of WSD be enhanced without relying on WordNet, for languages that do not have the resource?
2. What factors other than frequency are helpful to accurately extract the words expressing all the contexts of a given word?

5. Conclusion
In computational linguistics, word-sense disambiguation (WSD) is an open problem of natural
language processing, which governs the process of identifying which sense of a word (i.e.
meaning) is used in a sentence, when the word has multiple meanings. The solution to this
problem impacts other natural language processing tasks, such as discourse analysis, improving the relevance of search engines, anaphora resolution, coherence, and inference.
The goal of this seminar is also to address word sense disambiguation (WSD) in natural language. The algorithm used in this work is known as Clustering By Committee (CBC). We relied on a Wikipedia corpus to train the algorithm. The system takes input from the user and returns the senses of the word as output. We have evaluated the system and obtained a promising result.


6. Future Work
Word sense disambiguation is a young field, especially for local languages such as the languages of Ethiopia, and a lot of work is yet to be done. In our future work, we plan to develop WSD for local languages.

References


Andres Montoyo et al. (2005). Combining Knowledge- and Corpus-based Word-Sense Disambiguation Methods. University of Alicante, Spain.
Banerjee, Satanjeev & Ted Pedersen (2003). Extended gloss overlaps as a measure of semantic relatedness. Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico.
Cruse, D. Alan (1986). Lexical Semantics. Cambridge, UK: Cambridge University Press.
Daniel Jurafsky and James H. Martin (1999). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition. Prentice Hall, Englewood Cliffs, New Jersey.
Ellen M. Voorhees (1993). Using WordNet to Disambiguate Word Senses for Text Retrieval. SIGIR.
Ide, Nancy & Jean Véronis (1998). Word sense disambiguation: The state of the art. Computational Linguistics.
Kilgarriff, Adam & Martha Palmer (2000). Introduction to the special issue on Senseval. Computers and the Humanities.
Lin, Dekang & Patrick Pantel (2002). Concept discovery from text. Proceedings of the 19th International Conference on Computational Linguistics (COLING), Taipei, Taiwan.
Lyons, John (1995). Linguistic Semantics: An Introduction. Cambridge, UK: Cambridge University Press.
Mihalcea, Rada (2002). Bootstrapping large sense tagged corpora. In Proceedings of LREC 2002, Canary Islands, Spain.
Weaver, Warren (1949). Translation. Mimeographed, 12 pp. Reprinted in William N. Locke & A. Donald Booth, eds. (1955). Machine Translation of Languages. New York: John Wiley & Sons.
Wilks, Y. and M. Stevenson (1997). The Grammar of Sense: Using part-of-speech tags as a first step in semantic disambiguation. Journal of Natural Language Engineering.
Ying Liu (2002). Using WordNet to Disambiguate Word Senses. Electrical and Computer Engineering.


APPENDIX A

The tools are:

- Python (versions 2.5 to 2.7)

