
WORDNET

Approach on word sense techniques


- AKILAN VELMURUGAN
What is WORDNET
Machine-readable semantic dictionary whose entries are
interlinked by semantic relations
Developed at Princeton University
Large lexical database of the English language
Language forms a scale-free network with a small
average shortest path, with words as nodes and
concepts as links


source: http://wordnet.princeton.edu/
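To make the "small average shortest path" claim concrete, here is a minimal sketch that measures it on a toy word graph. The graph contents and the names GRAPH, distances_from and average_shortest_path are illustrative assumptions, not WordNet data or an API from the source.

```python
from collections import deque
from itertools import combinations

# Toy undirected graph: words are nodes, and an edge means two words
# share a concept. The entries are invented for illustration only.
GRAPH = {
    "fly": {"lure", "insect", "wing"},
    "lure": {"fly", "bait", "decoy"},
    "bait": {"lure", "decoy"},
    "decoy": {"lure", "bait"},
    "insect": {"fly"},
    "wing": {"fly"},
}

def distances_from(start):
    """Breadth-first search: number of links from start to every reachable word."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in GRAPH[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def average_shortest_path():
    """Mean shortest-path length over all unordered pairs of words."""
    pairs = list(combinations(GRAPH, 2))
    total = sum(distances_from(a)[b] for a, b in pairs)
    return total / len(pairs)

print(round(average_shortest_path(), 2))  # 1.87
```

On a real lexical network the same measurement would be run over far more nodes; the toy graph only shows what is being averaged.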
Use of wordnet
Easily navigable
Used as an online dictionary for English
Freely available to the public
Structured to show relations in the form of
- part of speech: noun, verb, adjective, adverb
- synonym (same meaning)
- hypernym (the more general term: X is a kind of it)
- hyponym (the more specific term: it is a kind of X)
- troponym (a particular way to do something)
- meronym (a part of something)
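Hypernym and hyponym are the same link read in opposite directions. The sketch below illustrates that with a hand-built toy hierarchy; the dictionary HYPERNYM and both helper functions are assumptions for illustration, not WordNet's own data files or interface.

```python
# Toy hypernym links (child -> parent). The entries are illustrative
# and hand-written, not loaded from WordNet.
HYPERNYM = {
    "fly": "fish lure",
    "spinner": "fish lure",
    "troll": "fish lure",
    "fish lure": "lure",
    "ground bait": "lure",
    "stool pigeon": "lure",
}

def hypernym_chain(term):
    """Follow 'is a kind of' links upward from term until no parent is left."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain

def hyponyms(term):
    """Direct hyponyms of term: the inverse of the hypernym relation."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == term)

print(hypernym_chain("fly"))  # ['fish lure', 'lure']
print(hyponyms("fish lure"))  # ['fly', 'spinner', 'troll']
```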
WORDNET Application
source: http://wordnet.princeton.edu/
A few representations of WORDNET

Schema representation
Graph Theory
Tree structure
Force graph structure
wordnet explorer

Visual Interface for wordnet


Using RDF Schema and OWL ontology

Wordnet classes and properties are
represented as wn:word and wn:wordsense




Source: www.w3.org/.../WNET/wordnet-sw-20040713.html
Represented using Graph theory

Can be a directed or an undirected graph


Source: www.nodebox.net/code/index.php/Graph
Represented using Tree structure

uses tokens and lexical relations


Source: www.docs.huihoo.com/nltk/0.9.5/en/ch02.html
Represented using Force Graph Structure

Presentation of words and meanings as graph
nodes, and relations as edges between them

Source: www.code.google.com/p/synonym/
Represented for WORDNET Explorer

For applying visual principles to Lexical
semantics

Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm
Flow of study
Background study on wordsense
word ontology
Word Sense Disambiguation
Variable lexical notation for a concept
i-level generic notation
i-level specific notation
Semantic relatedness in WSD
Experiment Results
Thesaurus as a complex network

Visual Interface for wordnet




Diagram labels:
- WORDNET synsets
- word ontology set
- algebra rules for representing lexical notations
- semantic relatedness between concepts
- concept distribution statistics
- Degree of semantic relatedness :: WSD
- Word Sense Disambiguation
- semcor
- Test cases
- WSD on a complex network
- WSD in English Thesaurus
- Future work
Source: http://kylescholz.com/projects/wordnet
Wordnet common sense ontology
Symbols are words
Concept meanings are synsets
Each synset is represented by one or more words
Words used for the representation: synonyms
Synonyms and polysemous words
A synset comprises a list of words and a list of
semantic relations to other synsets.
Part I: list of words, each with the list of synsets that
the word represents
Part II: set of semantic relations between synsets (is-a,
part-of, substance-of, member-of)
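The two-part layout described above can be sketched as a pair of small data structures. This is a minimal illustration under stated assumptions: the Synset class, the synset ids such as "fly.n", and build_word_index are hypothetical names, and the entries are toy data rather than WordNet's actual database format.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    words: list                                    # synonyms expressing the concept
    relations: dict = field(default_factory=dict)  # relation name -> list of synset ids

# Part II: synsets and the semantic relations between them (toy entries).
synsets = {
    "lure.n": Synset(["bait", "decoy", "lure"]),
    "fish_lure.n": Synset(["fisherman's lure", "fish lure"], {"is-a": ["lure.n"]}),
    "fly.n": Synset(["fly"], {"is-a": ["fish_lure.n"]}),
    "fly.v": Synset(["fly", "wing"]),
}

def build_word_index(synsets):
    """Part I: map each word to the list of synsets it can represent."""
    index = {}
    for synset_id, synset in synsets.items():
        for word in synset.words:
            index.setdefault(word, []).append(synset_id)
    return index

word_index = build_word_index(synsets)
print(word_index["fly"])   # ['fly.n', 'fly.v']  (a polysemous word)
print(word_index["bait"])  # ['lure.n']
```

A word that maps to more than one synset is polysemous; several words inside one synset are synonyms.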

WSD: variable lexical notations for a concept
Generic concept notation:
D = I ∪ J ∪ K
J = D − (I ∪ K)
  = (D − I) ∩ (D − K)
  = D ∩ (¬I ∩ ¬K)
J = D ∩ (¬I ∩ ¬K)

since B = D ∪ E ∪ F,
D = B − (E ∪ F)
  = (B − E) ∩ (B − F)
  = B ∩ (¬E ∩ ¬F)
D = B ∩ (¬E ∩ ¬F)

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications




WSD: variable lexical notations for a concept
J = D ∩ (¬I ∩ ¬K)
  = (B ∩ (¬E ∩ ¬F)) ∩ (¬I ∩ ¬K)
J = B ∩ ((¬E ∩ ¬F) ∩ (¬I ∩ ¬K))
when J = fly,
D = fish lure
I = spinner
K = troll
And introducing Boolean operators,
AND for ∩
OR for ∪
NOT for ¬





Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
WSD: variable lexical notations for a concept

(fly) becomes:
(fisherman's lure OR fish lure)
AND ((NOT spinner) AND (NOT troll))
then B = lure,
E = ground bait,
F = stool pigeon

(fly) becomes:
(bait OR decoy OR lure)
AND (((NOT ground bait) AND (NOT stool pigeon))
AND ((NOT spinner) AND (NOT troll)))

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
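The Boolean expansion of (fly) can be generated mechanically by walking up the hypernym links and excluding the siblings met at each level. The sketch below is one possible reading of that procedure, not the paper's implementation; PARENT, WORDS, siblings and generic_notation are hypothetical names, and the exclusions are emitted as one flat AND chain, which is logically equivalent to the nested grouping.

```python
# Toy hierarchy built from the slide's example (child -> hypernym).
PARENT = {
    "fly": "fish lure",
    "spinner": "fish lure",
    "troll": "fish lure",
    "fish lure": "lure",
    "ground bait": "lure",
    "stool pigeon": "lure",
}

# Synonyms of each hypernym synset, as given in the slide's expansion.
WORDS = {
    "fish lure": ["fisherman's lure", "fish lure"],
    "lure": ["bait", "decoy", "lure"],
}

def siblings(term):
    """Other hyponyms of term's hypernym."""
    parent = PARENT[term]
    return [c for c in PARENT if PARENT[c] == parent and c != term]

def generic_notation(term, level):
    """Boolean query: the ancestor `level` links up, minus every sibling on the way."""
    excluded = []
    node = term
    for _ in range(level):
        if node not in PARENT:  # reached the top of the toy hierarchy
            break
        excluded = siblings(node) + excluded
        node = PARENT[node]
    positive = " OR ".join(WORDS.get(node, [node]))
    if not excluded:
        return f"({positive})"
    negative = " AND ".join(f"(NOT {s})" for s in excluded)
    return f"({positive}) AND ({negative})"

print(generic_notation("fly", 1))
print(generic_notation("fly", 2))
```

With level 1 the query is built from fish lure, and with level 2 from lure.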
Notation for synset
i-level generic notation for a synset
If S_k is a synset and F_i is the synset located i links away following the
hypernym links from S_k, then the i-level generic notation for S_k is:

Note: F_i is the parent node of F_(i-1), F_(i-1) is the parent node of F_(i-2)


i-level specific notation for a synset
J = P ∪ Q ∪ R
when P = T,
Q = U,
R = V ∪ W
J = T ∪ U ∪ (V ∪ W)
If S is a synset and L_i is the set of synsets C_ik located i
links away following the hyponym links from S, then the
i-level specific regular notation for S is:

Note: if C_ik is null, then C_(i-1)k is used (C_(i-1)k is a leaf node in that case)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
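The specific notation can be read as "replace a synset by the hyponyms found i links below it, keeping a leaf when a branch runs out". A minimal sketch of that reading follows, using the letters from the slide's example; CHILDREN and specific_notation are hypothetical names, and joining the result with OR is an assumption carried over from the generic notation.

```python
# Toy hyponym links from the slide's example: J has hyponyms P, Q, R, and so on.
CHILDREN = {
    "J": ["P", "Q", "R"],
    "P": ["T"],
    "Q": ["U"],
    "R": ["V", "W"],
}

def specific_notation(synset, level):
    """Synsets `level` hyponym links below synset; leaves stand in for missing levels."""
    frontier = [synset]
    for _ in range(level):
        next_frontier = []
        for node in frontier:
            # A node without hyponyms is a leaf and is carried down unchanged.
            next_frontier.extend(CHILDREN.get(node) or [node])
        frontier = next_frontier
    return frontier

print(" OR ".join(specific_notation("J", 1)))  # P OR Q OR R
print(" OR ".join(specific_notation("J", 2)))  # T OR U OR V OR W
```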
WSD: Semantic relatedness and word sense disambiguation
Procedure for determining the semantic relatedness
of two given wordnet synsets
Conception 1: Concepts that appear together more frequently
and closer to each other are "more related" to
each other than concepts that appear less
frequently and farther apart.

Conception 1              Synset relatedness measurement
concepts                  Synset lexical notation
closeness of appearance   Exists in a web page or not
co-occurrence frequency   Number of web pages containing synsets
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
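The mapping above, where co-occurrence is measured as the number of web pages containing both notations, can be sketched on a toy page collection. The slides do not give the scoring formula, so the Jaccard-style ratio below is an assumption chosen for illustration, and PAGES, page_count and relatedness are hypothetical names.

```python
# Toy "web": each page is reduced to the set of terms it contains.
PAGES = [
    {"fly", "fish lure", "river"},
    {"fly", "insect", "wing"},
    {"fish lure", "spinner", "river"},
    {"fly", "fish lure", "trout"},
]

def page_count(*terms):
    """Number of pages that contain every one of the given terms."""
    return sum(1 for page in PAGES if all(t in page for t in terms))

def relatedness(a, b):
    """Assumed score: pages containing both terms / pages containing either."""
    both = page_count(a, b)
    either = page_count(a) + page_count(b) - both
    return both / either if either else 0.0

print(page_count("fly", "fish lure"))   # 2
print(relatedness("fly", "fish lure"))  # 0.5
```

In practice the page counts would come from search queries built from the synset lexical notations rather than from an in-memory list.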
WSD: Semantic relatedness and word sense disambiguation
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
WSD: Tested on four random texts
i-level generic notation (i = 1, 2, 3)
Size of the context window, target word vs. context words (3, 5, 7)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
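A context window of size 3, 5 or 7 can be sketched as below. The slides do not say how the window is positioned, so this assumes it is centred on the target word and that the size counts the target itself; context_window and the sample sentence are illustrative.

```python
def context_window(tokens, target_index, size):
    """Context words inside a window of `size` tokens centred on the target."""
    half = size // 2
    start = max(0, target_index - half)
    end = min(len(tokens), target_index + half + 1)
    # Keep every token in the window except the target word itself.
    return [tok for i, tok in enumerate(tokens[start:end], start) if i != target_index]

tokens = "he tied a fly to the line before casting".split()
print(context_window(tokens, 3, 3))  # ['a', 'to']
print(context_window(tokens, 3, 5))  # ['tied', 'a', 'to', 'the']
```

Each candidate sense of the target could then be scored against these context words with a relatedness measure.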
Thesaurus as a complex network
As a directed graph:
sink: the 73,046 terms with k_out = 0
source: the 30,260 terms with at least one outgoing link
(k_out > 0), the root words
absolute source: without incoming links (k_in = 0)
normal source: k_out > 0 and k_in > 0
bridge source: without outgoing links to root words (k_out(source) = 0)
Figure legend:
1 Normal source
2 Bridge source
3 Absolute source
4 Sink
Source: arXiv:cond-mat/0312586 v1 2003
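The four node types can be assigned from in- and out-degrees alone. The sketch below is one reading of the definitions above; where they overlap it checks sink, then absolute source, then bridge source, then normal source, and that priority order is an assumption. The function classify and the sample edge list are illustrative.

```python
def classify(edges):
    """Label each node of a directed graph given as (source, target) pairs."""
    nodes = set()
    out_links, in_degree = {}, {}
    for src, dst in edges:
        nodes.update((src, dst))
        out_links.setdefault(src, set()).add(dst)
        in_degree[dst] = in_degree.get(dst, 0) + 1
    roots = set(out_links)  # root words: k_out > 0
    labels = {}
    for node in nodes:
        if node not in roots:
            labels[node] = "sink"             # k_out = 0
        elif in_degree.get(node, 0) == 0:
            labels[node] = "absolute source"  # k_in = 0
        elif not (out_links[node] & roots):
            labels[node] = "bridge source"    # links only to sinks
        else:
            labels[node] = "normal source"    # k_in > 0, links to a root word
    return labels

edges = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")]
labels = classify(edges)
for node in sorted(labels):
    print(node, labels[node])
```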
Thesaurus as a complex network
[Figure panels: frequency of outgoing links; frequency of incoming links]
Source: arXiv:cond-mat/0312586 v1 2003
Thesaurus as a complex network
[Figure: incoming vs. outgoing frequency; frequency distribution of K_out for root words and K_in for all words]
Legend:
- Root words in K_out
- All words in K_in
- Root words in K_in
- Non-root words in K_in

Extension of wordnet
Transforming a tree structure into a matrix structure
WordNet in other languages (Japanese, Korean, Thai)
ImageNet interlinked with WordNet
REBUILDER, a repository of software designs
Retrieval using a Bayesian network and WordNet
