Corpus Linguistics
Application #2: Collocations

Lexicography
Corpora for lexicography
- Can extract authentic & typical examples, with

- two (or more) lexical items
- The meaning tends to be more than the sum of its parts
- These are extremely hard to define by intuition:
  - Pro: Corpora have been able to reveal connections previously unseen
  - Con: It may not be clear what the theoretical basis of collocations is
  - Pro & Con: How do they fit into grammar?

Collocations & colligations
- A colligation is a slightly different concept:
- words (e.g., determiners)
- Colligations often create noise in a list of collocations
  - e.g., "this house", because "this" is so common on its own, and determiners appear before nouns
  - Thus, people sometimes use stop words to filter out non-collocations
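
The stop-word filtering mentioned for colligations can be sketched in a few lines of Python. This is a minimal illustration, assuming lowercased, whitespace-tokenized text and a deliberately tiny, hypothetical stop list (`STOP_WORDS`); `bigram_counts` and `filter_stop_bigrams` are illustrative names, not part of any library. A determiner-plus-noun pair such as "this house" is dropped, while a content-word pair such as "strong tea" survives.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop list for illustration only.
STOP_WORDS = {"the", "a", "an", "this", "that", "of", "in", "and", "to"}


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a token list."""
    return Counter(zip(tokens, tokens[1:]))


def filter_stop_bigrams(counts, stop_words=STOP_WORDS):
    """Drop every bigram that contains a stop word (e.g. a determiner)."""
    return Counter({
        bigram: n
        for bigram, n in counts.items()
        if not any(word in stop_words for word in bigram)
    })


tokens = "this house has strong tea and this house has a garden".split()
counts = bigram_counts(tokens)
print(counts[("this", "house")])  # 2
print(filter_stop_bigrams(counts).most_common())
# [(('house', 'has'), 2), (('has', 'strong'), 1), (('strong', 'tea'), 1)]
```

Note the trade-off: a stop list removes colligational noise, but it also removes any genuine collocation that happens to contain a listed word.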

- Collocations are expressions of two or more words that are

What a collocation is

Prototypical collocations
- Prototypically, collocations meet the following criteria:

Compositionality tests
with corpus data

Semantic prosody & preference

Krishnamurthy 2000
- Firth 1957: "You shall know a word by the company it keeps"
- Collocational meaning is a syntagmatic type of meaning, not a conceptual one
  - e.g., in this framework, one of the meanings of "night" is the fact that it co-occurs with "dark"
- Example: "ass" is associated with a particular set of adjectives (think of "goose" if you prefer)
  - silly, obstinate, stupid, awful
  - We can see a lexical set associated with this word
- Lexical sets & collocations vary across genres, subcorpora, etc.

Notes on a collocations definition (Krishnamurthy 2000)
- We often look for words which are adjacent to make up a collocation, but this is not always true
  - e.g., computers run, but these two words may only appear near each other rather than adjacent
- We can also speak of upward/downward collocations:
  - downward: involves a more frequent node word A with a less frequent collocate B
  - upward: weaker relationship, tending to be more of a grammatical property
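
Since collocates are not always adjacent (computers ... run), one way to capture them is to count every word within a fixed window around the node word. The sketch below assumes a span of `window` tokens on each side; `window_collocates` and `direction` are hypothetical helpers, and the frequencies are made up. `direction` labels a pair downward when the node is more frequent than the collocate, as defined above; reading upward as the mirror case (the collocate is the more frequent word) is an assumption, not something stated here.

```python
from collections import Counter


def window_collocates(tokens, node, window=3):
    """Count words within `window` positions either side of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node itself
                counts[tokens[j]] += 1
    return counts


def direction(node, collocate, freq):
    """Label a pair by relative corpus frequency (hypothetical helper)."""
    if freq[node] > freq[collocate]:
        return "downward"  # more frequent node, less frequent collocate
    if freq[node] < freq[collocate]:
        return "upward"    # assumption: the collocate is the more frequent word
    return "equal frequency"


tokens = "the computer we bought still runs well".split()
print(window_collocates(tokens, "computer", window=3)["runs"])  # 0 ('runs' is 4 tokens away)
print(window_collocates(tokens, "computer", window=4)["runs"])  # 1

freq = Counter({"ass": 10, "silly": 4, "the": 500})  # made-up counts
print(direction("ass", "silly", freq))  # downward
print(direction("ass", "the", freq))    # upward
```

The window size is a design choice: a wider window catches more non-adjacent collocates but also more unrelated co-occurrences.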

Corpus linguistics (Krishnamurthy 2000)
- Two words appearing together a lot are a collocation

Calculating collocations
(Slides 14–30 are based on Manning & Schütze (M&S) 1999)

- Use a POS filter (Justeson and Katz 1995)
  - Only examine word sequences which fit a particular tag pattern

POS filtering (2)
C(w1, w2) | w1 | w2 | Tag Pattern
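
A POS filter of this kind can be sketched as a lookup of tag sequences. This is an illustration only: it assumes the input is already tagged, and it writes the seven patterns usually attributed to Justeson and Katz (1995) (A N, N N, A A N, A N N, N A N, N N N, N P N) in a simplified, hypothetical tagset (A = adjective, N = noun, P = preposition, plus D and V for other words). Output from a real tagger, such as Penn Treebank tags, would first need mapping onto these symbols.

```python
# Simplified tagset (assumption): A = adjective, N = noun, P = preposition.
PATTERNS = {
    ("A", "N"), ("N", "N"),
    ("A", "A", "N"), ("A", "N", "N"), ("N", "A", "N"),
    ("N", "N", "N"), ("N", "P", "N"),
}


def pos_filter(tagged, patterns=PATTERNS):
    """Return 2- and 3-word sequences whose tag sequence matches a pattern.

    `tagged` is a list of (word, tag) pairs produced by some POS tagger.
    """
    matches = []
    for n in (2, 3):
        for i in range(len(tagged) - n + 1):
            span = tagged[i:i + n]
            tags = tuple(tag for _, tag in span)
            if tags in patterns:
                matches.append(tuple(word for word, _ in span))
    return matches


sentence = [("the", "D"), ("linear", "A"), ("function", "N"),
            ("has", "V"), ("degrees", "N"), ("of", "P"), ("freedom", "N")]
print(pos_filter(sentence))
# [('linear', 'function'), ('degrees', 'of', 'freedom')]
```

Counting how often each surviving sequence occurs would give the C(w1, w2) column of a table like the one above.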

- We want to compare the likelihood of two words being next to each other as a chance event vs. as a surprise

(Pointwise) Mutual Information
- One way to see if two words are strongly connected is to compare

Pointwise Mutual Information Equation
- Our probabilities (p(w1 w2), p(w1), p(w2)) are all basically
  - N is the number of words in the corpus
  - The number of bigrams ≈ the number of unigrams

$$
I(w_1, w_2) = \log \frac{p(w_1 w_2)}{p(w_1)\,p(w_2)}
= \log \frac{\frac{C(w_1 w_2)}{N}}{\frac{C(w_1)}{N}\,\frac{C(w_2)}{N}}
= \log \left[ N \, \frac{C(w_1 w_2)}{C(w_1)\,C(w_2)} \right]
\tag{3}
$$

Mutual Information example
- C(Ayatollah) = 42
- C(Ruhollah) = 20
- C(Ayatollah Ruhollah) = 20
- N = 14,307,668

$$
I(\text{Ayatollah}, \text{Ruhollah}) = \log_2 \frac{\frac{20}{N}}{\frac{42}{N}\,\frac{20}{N}}
= \log_2 \frac{N \cdot 20}{42 \cdot 20} \approx 18.38
\tag{4}
$$

- To see how good a collocation this is, we need to compare it to others
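
Equation (3) translates directly into Python. This sketch assumes base-2 logarithms (as in equation (4)) and, as above, approximates both the bigram and the unigram totals by N; `pmi` is a hypothetical helper name. It reproduces the Ayatollah Ruhollah figure and shows that a pair occurring exactly as often as chance predicts scores 0.

```python
import math


def pmi(c_w1w2, c_w1, c_w2, n):
    """Pointwise mutual information in bits, from raw counts.

    Computes log2[N * C(w1 w2) / (C(w1) * C(w2))], i.e. equation (3)
    with base-2 logs, using one N for both bigram and unigram totals.
    """
    return math.log2(n * c_w1w2 / (c_w1 * c_w2))


# Counts from the Ayatollah Ruhollah example.
print(round(pmi(20, 42, 20, 14_307_668), 2))  # 18.38

# Observed count equals the count expected by chance (100 * 100 / 10,000 = 1).
print(pmi(1, 100, 100, 10_000))  # 0.0
```

Positive scores mean the pair occurs more often than independence predicts; comparing scores across many candidate bigrams is what tells us how good a collocation each one is.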

- The formula we have also has the following equivalencies:

Motivating Contingency Tables
- What we can instead get at is: which bigrams are likely, out
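
The list of equivalent forms did not survive here. For reference, the standard rewritings of PMI follow from the product rule, p(w1 w2) = p(w1 | w2) p(w2) = p(w2 | w1) p(w1). These may not match the original slide exactly:

```latex
\[
I(w_1, w_2) = \log \frac{p(w_1 w_2)}{p(w_1)\,p(w_2)}
            = \log \frac{p(w_1 \mid w_2)}{p(w_1)}
            = \log \frac{p(w_2 \mid w_1)}{p(w_2)}
\]
```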

Observed bigram probabilities
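
The observed table itself is not recoverable here, so the sketch below uses a made-up token list. It builds a 2×2 table of bigram counts for a word pair (rows: first word is / is not w1; columns: second word is / is not w2) and divides each cell by the total number of bigrams to get observed probabilities. `observed_table` and `to_probabilities` are hypothetical names.

```python
def observed_table(tokens, w1, w2):
    """2x2 table of observed bigram counts.

    Rows: first word is w1 / is not w1; columns: second word is w2 / is not w2.
    """
    table = [[0, 0], [0, 0]]
    for first, second in zip(tokens, tokens[1:]):
        row = 0 if first == w1 else 1
        col = 0 if second == w2 else 1
        table[row][col] += 1
    return table


def to_probabilities(table):
    """Divide every cell by the total number of bigrams."""
    total = sum(sum(row) for row in table)
    return [[count / total for count in row] for row in table]


# Toy token list, made up for illustration (7 bigrams in total).
tokens = "sherlock holmes said sherlock holmes and sherlock smiled".split()
observed = observed_table(tokens, "sherlock", "holmes")
print(observed)  # [[2, 1], [0, 4]]
print(to_probabilities(observed)[0][0])  # 2 of the 7 bigrams are "sherlock holmes"
```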

Expected bigram probabilities
- If we assumed that sherlock and holmes are independent (i.e., the probability of one is unaffected by the probability of the other), we would get the following table:

Expected bigram frequencies
- Multiplying by 7105 (the total number of bigrams) gives us the expected number of times we should see each bigram:
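
The two steps above (multiply the marginal probabilities under independence, then scale by the total number of bigrams) can be sketched as follows. The counts here are made up rather than the sherlock/holmes figures, and `expected_under_independence` is a hypothetical helper that takes a 2×2 table of observed counts.

```python
def expected_under_independence(observed):
    """Expected probabilities and frequencies for a 2x2 bigram table.

    Assumes the two word slots are independent, so each cell's probability
    is the product of its row and column marginal probabilities.
    """
    total = sum(sum(row) for row in observed)             # total number of bigrams
    row_p = [sum(row) / total for row in observed]        # P(w1), P(not w1)
    col_p = [sum(col) / total for col in zip(*observed)]  # P(w2), P(not w2)
    probs = [[r * c for c in col_p] for r in row_p]
    freqs = [[p * total for p in row] for row in probs]   # probability x total
    return probs, freqs


# Made-up counts; with the slides' data the total would be 7105 bigrams.
observed = [[10, 20], [30, 40]]
probs, freqs = expected_under_independence(observed)
print([[round(p, 2) for p in row] for row in probs])  # [[0.12, 0.18], [0.28, 0.42]]
print([[round(f, 1) for f in row] for row in freqs])  # [[12.0, 18.0], [28.0, 42.0]]
```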

$$
\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}
\tag{6}
$$

Working with collocations
- The question is:
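
Equation (6) can be computed directly from a table of observed counts, deriving each expected count from the row and column totals. This is a minimal sketch; `chi_square` is a hypothetical helper, the counts are made up, and no continuity correction is applied.

```python
def chi_square(observed):
    """Pearson's chi-square statistic, equation (6), for a table of counts."""
    total = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / total  # expected count under independence
            stat += (f_o - f_e) ** 2 / f_e
    return stat


print(round(chi_square([[10, 20], [30, 40]]), 4))  # 0.7937
print(chi_square([[10, 20], [20, 40]]))            # 0.0 (proportional rows: no association)
```

A 2×2 table has one degree of freedom, for which 3.841 is the usual critical value at the 0.05 level, so 0.7937 gives no evidence of association. Some library implementations apply a continuity correction to 2×2 tables by default, so their values can differ from this plain formula.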