Pawan Goyal
CSE, IIT Kharagpur
Distributional Semantics
Introduction
1, 3, 4, . . .
I, III, IV, . . .
What is Semantics?
The study of meaning: Relation between symbols and their denotata.
John told Mary that the train moved out of the station at 3 o'clock.
Finding the underlying functional relations among various entities and events
Computational Semantics
The study of how to automate the process of constructing and reasoning with
meaning representations of natural language expressions.
Distributional Hypothesis
Contextual representation
A word's contextual representation is an abstract cognitive structure that
accumulates from encounters with the word in various linguistic contexts.
Alternative names
corpus-based semantics
statistical semantics
geometrical models of meaning
vector semantics
word space models
Vector Space
Word Space
Small Dataset
An automobile is a wheeled motor vehicle used for transporting passengers .
A car is a form of transport , usually with four wheels and the capacity to carry around
five passengers .
Transport for the London games is limited , with spectators strongly advised to avoid
the use of cars .
The London 2012 soccer tournament began yesterday , with plenty of goals in the
opening matches .
Giggs scored the first goal of the football tournament at Wembley , North London .
Bellamy was largely a passenger in the football match , playing no part in either goal .
Count the number of times the target word co-occurs with the context words:
co-occurrence matrix
Build vectors out of (a function of) these co-occurrence counts
            automobile   car   soccer   football
wheel            1        1      0         0
transport        1        2      0         0
passenger        1        1      0         1
tournament       0        0      1         1
London           0        1      1         1
goal             0        0      1         2
match            0        0      1         1
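The counting step can be sketched in Python as follows. This is a minimal illustration, not the slides' own procedure: it assumes that the context of a target word is the whole sentence it appears in, and it uses a small hand-written lemma map (`LEMMAS`, a hypothetical stand-in for a real lemmatizer) so that forms such as "wheeled" and "wheels" are both counted as "wheel". Under these assumptions the counts agree with the matrix above.

```python
from collections import Counter, defaultdict

# The six sentences of the small dataset, already tokenized.
SENTENCES = [
    "An automobile is a wheeled motor vehicle used for transporting passengers .",
    "A car is a form of transport , usually with four wheels and the capacity "
    "to carry around five passengers .",
    "Transport for the London games is limited , with spectators strongly "
    "advised to avoid the use of cars .",
    "The London 2012 soccer tournament began yesterday , with plenty of goals "
    "in the opening matches .",
    "Giggs scored the first goal of the football tournament at Wembley , "
    "North London .",
    "Bellamy was largely a passenger in the football match , playing no part "
    "in either goal .",
]

# Assumption: a hand-written lemma map instead of a real lemmatizer.
LEMMAS = {
    "wheeled": "wheel", "wheels": "wheel", "transporting": "transport",
    "passengers": "passenger", "cars": "car", "goals": "goal",
    "matches": "match", "london": "London",
}

TARGETS = ["automobile", "car", "soccer", "football"]
CONTEXTS = ["wheel", "transport", "passenger", "tournament",
            "London", "goal", "match"]


def lemmatize(token):
    """Lowercase a token and map it to its lemma if one is listed."""
    token = token.lower()
    return LEMMAS.get(token, token)


def cooccurrence(sentences, targets, contexts):
    """Count sentence-level co-occurrences of each target with each context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        lemmas = [lemmatize(t) for t in sentence.split()]
        for target in targets:
            n_target = lemmas.count(target)
            if n_target == 0:
                continue
            for context in contexts:
                counts[target][context] += n_target * lemmas.count(context)
    return counts


matrix = cooccurrence(SENTENCES, TARGETS, CONTEXTS)
for context in CONTEXTS:
    print(context, [matrix[target][context] for target in TARGETS])
```

Each printed row lists the counts for automobile, car, soccer and football, in that order.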
[Figure: the target words plotted in the two-dimensional space spanned by the context words transport (x-axis) and goal (y-axis): automobile (1,0), car (2,0), soccer (0,1), football (0,2)]
Computing similarity
car . soccer = 1
car . football = 2
soccer . football = 5
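The three dot products above can be checked with a few lines of Python. The vectors are copied from the co-occurrence matrix; the helper name `dot` is only illustrative.

```python
# Context order: wheel, transport, passenger, tournament, London, goal, match
VECTORS = {
    "automobile": [1, 1, 1, 0, 0, 0, 0],
    "car":        [1, 2, 1, 0, 1, 0, 0],
    "soccer":     [0, 0, 0, 1, 1, 1, 1],
    "football":   [0, 0, 1, 1, 1, 2, 1],
}


def dot(u, v):
    """Dot product of two equal-length count vectors."""
    return sum(a * b for a, b in zip(u, v))


print(dot(VECTORS["car"], VECTORS["soccer"]))       # 1
print(dot(VECTORS["car"], VECTORS["football"]))     # 2
print(dot(VECTORS["soccer"], VECTORS["football"]))  # 5
```

The raw dot product grows with the counts themselves, which is one reason normalized measures such as cosine similarity are commonly preferred.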
General Questions
How do the rows (words, ...) relate to each other?
How do the columns (contexts, documents, ...) relate to each other?
Words as contexts
Parameters
Window size
Window shape - rectangular/triangular/other
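The two parameters can be illustrated with a short Python sketch. The slides do not define the window shapes, so the definitions here are assumptions: a rectangular window gives every word within `size` positions of the target a weight of 1, and a triangular window lets the weight fall off linearly with distance. The function names are hypothetical.

```python
from collections import Counter


def window_weights(size, shape="rectangular"):
    """Return the weight assigned to each distance 1..size from the target."""
    if shape == "rectangular":
        return {d: 1.0 for d in range(1, size + 1)}
    if shape == "triangular":
        # Assumed linear decay: distance 1 gets weight 1, distance `size` gets 1/size.
        return {d: (size - d + 1) / size for d in range(1, size + 1)}
    raise ValueError(f"unknown window shape: {shape}")


def window_counts(tokens, target, size=2, shape="rectangular"):
    """Weighted co-occurrence counts of `target` with words inside its window."""
    weights = window_weights(size, shape)
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        for distance, weight in weights.items():
            for j in (i - distance, i + distance):
                if 0 <= j < len(tokens):
                    counts[tokens[j]] += weight
    return counts


tokens = "the football match began at Wembley".split()
print(window_counts(tokens, "match", size=2, shape="rectangular"))
print(window_counts(tokens, "match", size=2, shape="triangular"))
```

With the triangular shape, the adjacent words "football" and "began" keep weight 1, while "the" and "at" at distance 2 receive weight 0.5.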
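\[ \frac{1}{|D_i|} \]
\[ \frac{N - N_j + 0.5}{N_j + 0.5} \]
\[ k_1\left((1 - b) + b\,\frac{|D_i|}{avgDl}\right) + f_{ij} \]
\[ \frac{1 + \log(1 + \log(f_{ij}))}{(1 - s) + s\,\frac{|D_i|}{avgDl}} \]
\[ \log\frac{N + 1}{N_j} \]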
basic intuition
word1   word2          freq(1,2)   freq(1)   freq(2)
dog     small          855         33,338    490,580
dog     domesticated   29          33,338    918
Association measures are used to give more weight to contexts that are
more significantly associated with a target word.
The less frequent the target and context element are, the higher the
weight given to their co-occurrence count should be.
Co-occurrence with frequent context element small is less informative
than co-occurrence with rarer domesticated.
Different measures: e.g., mutual information, log-likelihood ratio
\[ \mathrm{PMI}(w_1, w_2) = \log_2 \frac{P_{corpus}(w_1, w_2)}{P_{ind}(w_1, w_2)} \]
\[ \mathrm{PMI}(w_1, w_2) = \log_2 \frac{P_{corpus}(w_1, w_2)}{P_{corpus}(w_1)\,P_{corpus}(w_2)} \]
\[ P_{corpus}(w_1, w_2) = \frac{freq(w_1, w_2)}{N} \]
\[ P_{corpus}(w) = \frac{freq(w)}{N} \]
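As a worked illustration, the PMI definition can be applied to the dog/small and dog/domesticated counts from the basic-intuition table. The slides do not give the corpus size N, so the value used below is a hypothetical placeholder; the difference between the two PMI scores does not depend on N, because N contributes the same additive term to both.

```python
import math


def pmi(freq_12, freq_1, freq_2, n):
    """Pointwise mutual information (base 2) from raw frequencies."""
    p_joint = freq_12 / n                   # P_corpus(w1, w2)
    p_independent = (freq_1 / n) * (freq_2 / n)  # P_corpus(w1) * P_corpus(w2)
    return math.log2(p_joint / p_independent)


# Assumption: N is not stated in the source; this value is only a placeholder.
N = 100_000_000

pmi_small = pmi(855, 33_338, 490_580, N)
pmi_domesticated = pmi(29, 33_338, 918, N)

print(f"PMI(dog, small)        = {pmi_small:.2f}")
print(f"PMI(dog, domesticated) = {pmi_domesticated:.2f}")
print(f"difference             = {pmi_domesticated - pmi_small:.2f}")
```

Although "small" co-occurs with "dog" far more often in absolute terms, "domesticated" receives the higher PMI, which matches the intuition that co-occurrence with a rarer context element is more informative.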
\[ \delta_{ij} = \frac{f_{ij}}{f_{ij} + 1} \times \frac{\min(f_i, f_j)}{\min(f_i, f_j) + 1} \]
Dimensionality Reduction
efficiency - sometimes the matrix is so large that you don't want to construct
it explicitly.
smoothing - capture latent dimensions that generalize over sparser
surface dimensions; synonym vectors may not be orthogonal.
\[ A = U \Sigma V^T \]
\[ A_k = U_k \Sigma_k V_k^T \]
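The rank-k truncation can be sketched with NumPy's `numpy.linalg.svd`. The matrix below reuses the co-occurrence counts from the small dataset as an illustration, and the choice k = 2 is an assumption made only for this example.

```python
import numpy as np

# Rows: context words; columns: automobile, car, soccer, football.
A = np.array([
    [1, 1, 0, 0],  # wheel
    [1, 2, 0, 0],  # transport
    [1, 1, 0, 1],  # passenger
    [0, 0, 1, 1],  # tournament
    [0, 1, 1, 1],  # London
    [0, 0, 1, 2],  # goal
    [0, 0, 1, 1],  # match
], dtype=float)


def truncated_svd(matrix, k):
    """Return U_k, Sigma_k, V_k^T for the k largest singular values."""
    U, singular_values, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], np.diag(singular_values[:k]), Vt[:k, :]


U_k, Sigma_k, Vt_k = truncated_svd(A, k=2)
A_k = U_k @ Sigma_k @ Vt_k  # rank-2 approximation of A

# Each column of Sigma_k @ Vt_k is a 2-dimensional vector for one target word.
word_vectors = (Sigma_k @ Vt_k).T
print(np.round(A_k, 2))
print(np.round(word_vectors, 2))
```

The approximation error in the Frobenius norm equals the square root of the sum of the squared singular values that were discarded.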
SVD: An Example
Original space: Sim(human, user) = 0.0, Sim(human, minors) = 0.0
After dimensionality reduction: Sim(human, user) = 0.94, Sim(human, minors) = 0.83
Similarity Measures
Let X and Y denote the binary distributional vectors for words X and Y.
Dice coefficient: \( \frac{2|X \cap Y|}{|X| + |Y|} \)
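A small Python sketch of a set-overlap measure on binary vectors follows. It assumes that the numerator 2|X ∩ Y| belongs to the Dice coefficient, 2|X ∩ Y| / (|X| + |Y|), where |X| is the number of non-zero entries of X. The example vectors are binarized versions of the rows of the small co-occurrence matrix, which is a choice made here for illustration.

```python
def dice(x, y):
    """Dice coefficient between two binary vectors given as 0/1 sequences."""
    intersection = sum(1 for a, b in zip(x, y) if a and b)
    size_x = sum(1 for a in x if a)
    size_y = sum(1 for b in y if b)
    if size_x + size_y == 0:
        return 0.0  # convention chosen here for two all-zero vectors
    return 2 * intersection / (size_x + size_y)


# Context order: wheel, transport, passenger, tournament, London, goal, match
car      = [1, 1, 1, 0, 1, 0, 0]
soccer   = [0, 0, 0, 1, 1, 1, 1]
football = [0, 0, 1, 1, 1, 1, 1]

print(dice(soccer, football))  # 8/9
print(dice(car, football))     # 4/9
```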
Similarity Measures
Cosine similarity: \( \cos(\vec{X}, \vec{Y}) = \frac{\vec{X} \cdot \vec{Y}}{|\vec{X}|\,|\vec{Y}|} \)
Small exercise: Show that, for length-normalized vectors, Euclidean distance
gives the same ranking as cosine similarity.
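The exercise can be checked numerically with the following sketch, which uses the count vectors from the small co-occurrence matrix. It relies on the identity that holds for unit-length vectors, |x − y|² = 2 − 2 cos(x, y), so a larger cosine always means a smaller Euclidean distance. Without normalization the two rankings can differ. The helper names are illustrative, and zero vectors are not handled.

```python
import math

# Context order: wheel, transport, passenger, tournament, London, goal, match
VECTORS = {
    "automobile": [1, 1, 1, 0, 0, 0, 0],
    "car":        [1, 2, 1, 0, 1, 0, 0],
    "soccer":     [0, 0, 0, 1, 1, 1, 1],
    "football":   [0, 0, 1, 1, 1, 2, 1],
}


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def normalize(u):
    """Scale a non-zero vector to unit length."""
    length = norm(u)
    return [a / length for a in u]


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


query = "car"
others = [w for w in VECTORS if w != query]

# Most similar first: descending cosine, ascending distance on unit vectors.
by_cosine = sorted(others, key=lambda w: -cosine(VECTORS[query], VECTORS[w]))
by_distance = sorted(
    others,
    key=lambda w: euclidean(normalize(VECTORS[query]), normalize(VECTORS[w])),
)

print(by_cosine)
print(by_distance)
```

Both lists order the neighbors of "car" as automobile, football, soccer.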
Synonyms
Two words are absolute synonyms if they can be inter-substituted in all
possible contexts without changing the meaning.
Distributional Similarity
The distributional similarity of two words is the extent to which they can be
inter-substituted without changing the plausibility of the sentence.
Attributional Similarity
The attributional similarity between two words a and b depends on the degree
of correspondence between the properties of a and b.
Ex: dog and wolf
Relational Similarity
Two pairs (a, b) and (c, d) are relationally similar if they have many similar
relations.
Ex: dog: bark and cat: meow