Stefan Evert
Universitat Politècnica de Catalunya, Barcelona, Spain
University of Osnabrück, Germany
gboleda@lsi.upc.edu
stefan.evert@uos.de
[Figure: tree diagram for "kick the proverbial bucket", with nodes labelled VP [die], NP [the bucket], NP, Det and [kick] over the words kick, the, proverbial, bucket. Image from http://www.museoffire.com/tutorials.html]
A note on terminology
empirical collocations: significant cooccurrence (Firth, Sinclair, …)
lexical collocations: semi-compositional pairs (phraseology & lexicography, e.g. Hausmann)
collocations or multiword expressions: lexicalised expressions, non-compositional or otherwise idiosyncratic (NLP, e.g. Choueka)
"Collocation" is a confusing notion at the heart of the MWE debate.
[Figure: types of multiword expressions: figurative expressions; lexical collocations; light verbs (SVC, FVG); complex lexical items (MWU); English noun compounds; named entities; particle verbs (VPC); institutionalised phrases & clichés; (multiword) terminology]
Scales of MWE-ness
[Figure: MWE-ness along three dimensions.
semantic dimension (compositionality): semi-compositional, opaque
syntactic dimension (flexibility): limited variability, rigid
lexical dimension (substitutability): selectional restrictions, partly determined, completely determined (no substitution)
Labelled points on the scales: compositional syntax, idiom, decomposable, metaphor, pragmatic components, MWU, LWC, morphosyntactic preferences, semi-fixed construction, n-gram, productive MWE pattern]
Collocations of bucket
(f = cooccurrence frequency)

noun             f  local MI
water          183   1023.77
spade           31    288.11
plastic         36    225.83
size            41    195.89
record          38    163.95
slop            14    162.62
mop             16    155.47
ice             22    125.76
bucket          18    125.49
seat            21     89.21
coal            16     77.25
density         11     63.64
brigade         10     62.31
sand            12     61.32
algorithm        9     60.77
shop            17     59.49
container       10     59.10
champagne       10     56.79
shovel           7     56.50
oats             7     54.93

verb             f  local MI
throw           36    168.87
fill            30    139.45
empty           14     96.73
randomize        9     96.11
hold            31     78.93
put             37     77.96
carry           26     71.95
tip             10     59.30
kick            12     59.28
chuck            7     44.85
use             31     42.31
weep             7     41.73
pour             9     40.73
take            42     37.57
fetch            7     35.13
get             46     34.73
douse            4     33.03
store            7     31.82
drop            10     31.49
pick            11     28.89

Collocate types highlighted on the slide: idiom, compound, technical, lexical collocation
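The rankings above are by local MI. A minimal Python sketch of how such scores can be computed, assuming the common definition local MI = O · log2(O/E), where O is the observed cooccurrence frequency and E = f1 · f2 / N is the frequency expected if the two words occurred independently (f1, f2 are marginal frequencies, N is the sample size). The function names and all counts are hypothetical illustrations; they are not the corpus data behind the table.

```python
import math


def expected_frequency(f1: int, f2: int, n: int) -> float:
    """Expected cooccurrence frequency E = f1 * f2 / N if the two words were independent."""
    return f1 * f2 / n


def local_mi(o: int, f1: int, f2: int, n: int) -> float:
    """Assumed definition: local MI = O * log2(O / E), i.e. pointwise MI weighted by O."""
    if o == 0:
        return 0.0  # convention: 0 * log(0) = 0
    return o * math.log2(o / expected_frequency(f1, f2, n))


# Hypothetical counts for illustration only (not the data behind the table).
N = 1_000_000      # sample size
F_BUCKET = 500     # marginal frequency of "bucket"
candidates = {     # collocate -> (O, marginal frequency of the collocate)
    "water": (40, 2_000),
    "get": (20, 20_000),
    "leaky": (5, 50),
}

scores = {w: local_mi(o, F_BUCKET, f_coll, N) for w, (o, f_coll) in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for w in ranking:
    print(f"{w:8s} local MI = {scores[w]:7.2f}")
```

Because O multiplies the log ratio, a frequent pair can outrank a rarer one with a stronger O/E ratio: in these toy counts leaky has the highest ratio (200, against 40 for water), yet water ranks first.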
Collocations of bucket
(f = cooccurrence frequency)

adjective        f  local MI
large           37    114.79
single-record    5     64.53
full            21     63.23
cold            13     55.52
small           21     45.61
galvanized       4     43.47
ten-record       3     40.17
empty            9     38.41
old             20     35.67
steaming         4     31.89
clean            7     27.47
leaky            3     25.91
wooden           6     25.50
bottomless       3     25.17
galvanised       3     24.70
big             12     23.86
iced             3     22.62
warm             6     19.55
hot              6     17.05
pink             3     11.15

Collocate types highlighted on the slide: semantic effects, facts of life
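Assuming local MI is defined as O · log2(O/E), dividing a local-MI score by f recovers plain pointwise MI, log2(O/E). The sketch below applies this to a subset of the adjective collocates of bucket (f and local MI copied from the slide) to show how differently the two measures rank the same words; the helper name pointwise_mi is hypothetical.

```python
# (f, local MI) for a subset of the adjective collocates of "bucket", copied from the slide
adjectives = {
    "large": (37, 114.79),
    "single-record": (5, 64.53),
    "full": (21, 63.23),
    "ten-record": (3, 40.17),
    "old": (20, 35.67),
    "leaky": (3, 25.91),
    "big": (12, 23.86),
}


def pointwise_mi(f: int, local: float) -> float:
    """Assuming local MI = O * log2(O/E), plain MI = log2(O/E) = local MI / O."""
    return local / f


by_local_mi = sorted(adjectives, key=lambda w: adjectives[w][1], reverse=True)
by_mi = sorted(adjectives, key=lambda w: pointwise_mi(*adjectives[w]), reverse=True)

for w in by_mi:
    f, local = adjectives[w]
    print(f"{w:14s} f={f:3d}  local MI={local:7.2f}  MI={pointwise_mi(f, local):5.2f}")
```

Plain MI moves the rare ten-record and single-record (f = 3 and 5) to the top and drops large from first to fourth place, which is the low-frequency bias that weighting by O counteracts.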
Multiword extraction
Online bibliographies
MWE project, Stanford (ca. 2001)
Idioms & Collocations in German, Berlin (ca. 2006)
Help us build new resources at http://multiword.sf.net/
[Figure: MWE-related tasks: multiword extraction; MWE detection (token recognition); compositionality; semantic interpretation; morphosyntactic preferences; variability & modifiability]
Approaches: compositionality
Related to token recognition and WSD
Machine learning approaches are promising
Semantic interpretation
INTENSIFIER(smoker) = heavy
Formalisation of non-compositional meaning aspects is still unclear
No direct comparison of current approaches is possible
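The notation INTENSIFIER(smoker) = heavy treats the collocate as the value of a function applied to its base, in the spirit of the lexical functions of Meaning-Text Theory (where the intensifier function is called Magn). A minimal sketch of how such entries could be stored and used in both directions; the lexicon, the function names realise and interpret, and the extra entry for rain are illustrative assumptions, not part of the original material.

```python
from typing import Optional

# Hypothetical lexicon: (lexical function, base) -> collocate that expresses it.
LEXICON: dict[tuple[str, str], str] = {
    ("INTENSIFIER", "smoker"): "heavy",  # from the slide: INTENSIFIER(smoker) = heavy
    ("INTENSIFIER", "rain"): "heavy",    # illustrative extra entry
}


def realise(function: str, base: str) -> Optional[str]:
    """Generation direction: which collocate expresses FUNCTION(base)?"""
    return LEXICON.get((function, base))


def interpret(collocate: str, base: str) -> Optional[str]:
    """Interpretation direction: map a collocate + base pair to FUNCTION(base), if listed."""
    for (function, b), value in LEXICON.items():
        if b == base and value == collocate:
            return f"{function}({base})"
    return None  # not listed: the caller may fall back to a compositional reading


print(interpret("heavy", "smoker"))
print(realise("INTENSIFIER", "rain"))
```

Storing the value per base reflects that the intensifier is selected by the base noun rather than composed from the literal meaning of heavy.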
Questions?