10/1/2017 probability - Jaccard similarity coefficient vs. Point-wise mutual information coefficient - Cross Validated

Jaccard similarity coefficient vs. Point-wise mutual information coefficient

Can you explain the difference between the Jaccard similarity coefficient and the pointwise mutual information (PMI) measure? It would
be great if you could add a few examples.

probability distance-functions mutual-information association-measure jaccard-similarity

edited Jan 17 at 14:00 by ttnphns; asked Jan 17 at 12:11 by Moeen MH

1 Answer

These two are quite different. Still, let us try to bring them "to a common denominator" to see
the difference. Both Jaccard and PMI can be extended to continuous data, but we'll
consider only the basic binary-data case.

Using the a, b, c, d convention of the 4-fold table,

          Y
        1   0
      ---------
    1 | a | b |
X     ---------
    0 | c | d |
      ---------

a = number of cases where both X and Y are 1
b = number of cases where X is 1 and Y is 0
c = number of cases where X is 0 and Y is 1
d = number of cases where both X and Y are 0
a+b+c+d = n, the number of cases.
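
As a minimal sketch (not part of the original answer), the four counts can be tallied from two binary vectors in Python. The function name `fourfold_counts` and the sample vectors are made up for illustration.

```python
def fourfold_counts(x, y):
    """Return (a, b, c, d) for two equal-length sequences of 0/1 values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)  # both 1
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)  # X only
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)  # Y only
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)  # both 0
    return a, b, c, d


# Illustrative data only (an assumption, not taken from the question).
x = [1, 1, 1, 0, 0, 0, 1, 0]
y = [1, 1, 0, 1, 0, 0, 1, 0]
a, b, c, d = fourfold_counts(x, y)
print(a, b, c, d)  # 3 1 1 3
```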

we know that

$$\mathrm{Jaccard}[X,Y] = \frac{a}{a+b+c}.$$

PMI, by the Wikipedia definition, is

$$\mathrm{PMI}[X,Y] = \log\frac{P(X,Y)}{P(X)P(Y)}.$$
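
For a small numeric example of the two definitions, here is a hedged Python sketch. It assumes the usual Jaccard definition a/(a+b+c) and takes PMI for the events X=1 and Y=1 with the natural logarithm; the function names and the counts are illustrative assumptions.

```python
import math


def jaccard(a, b, c):
    """Jaccard similarity a / (a + b + c); joint absences (d) are ignored."""
    return a / (a + b + c)


def pmi(a, b, c, d):
    """PMI of the events X=1 and Y=1 (natural log). Undefined when a == 0."""
    n = a + b + c + d
    p_xy = a / n        # P(X, Y)
    p_x = (a + b) / n   # P(X)
    p_y = (a + c) / n   # P(Y)
    return math.log(p_xy / (p_x * p_y))


# Hypothetical counts, chosen only for illustration.
print(jaccard(3, 1, 1))      # 0.6
print(pmi(3, 1, 1, 3))       # log(1.5), about 0.405

# Same a, b, c but many more joint absences (larger d).
print(jaccard(3, 1, 1))      # unchanged, since d is not used
print(pmi(3, 1, 1, 30))      # larger than before, since n grew
```

Jaccard never looks at d, whereas PMI depends on d through n, so adding joint absences changes PMI but leaves Jaccard as it was.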

Let us first drop the "log", because Jaccard involves no logarithm. Then plug the a, b, c, d
notation into the PMI formula to obtain:

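$$\frac{P(X,Y)}{P(X)P(Y)} = \frac{a/n}{\frac{a+b}{n}\cdot\frac{a+c}{n}} = \frac{an}{(a+b)(a+c)} = \frac{a/\sqrt{(a+b)(a+c)}}{\sqrt{\frac{a+b}{n}\cdot\frac{a+c}{n}}} = \frac{\mathrm{Ochiai}[X,Y]}{\mathrm{gm}[P(X),P(Y)]}$$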

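where "gm" is the geometric mean of the two probabilities, and the Ochiai similarity between the X and Y
vectors is just another name for cosine similarity in the case of binary data:

$$\mathrm{Ochiai}[X,Y] = \sqrt{\frac{a}{a+b}\cdot\frac{a}{a+c}} = \frac{a}{\sqrt{(a+b)(a+c)}}.$$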
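
The chain of equalities above can be checked numerically. The Python sketch below compares the PMI ratio (without the logarithm) against Ochiai divided by the geometric mean of P(X) and P(Y); the helper names and the counts are hypothetical and serve only as a sanity check.

```python
import math


def ochiai(a, b, c):
    """Ochiai (binary cosine) similarity: a / sqrt((a+b)(a+c))."""
    return a / math.sqrt((a + b) * (a + c))


def pmi_ratio(a, b, c, d):
    """P(X,Y) / (P(X) P(Y)), i.e. PMI before taking the logarithm."""
    n = a + b + c + d
    return (a / n) / (((a + b) / n) * ((a + c) / n))


def gm_marginals(a, b, c, d):
    """Geometric mean of the marginal probabilities P(X) and P(Y)."""
    n = a + b + c + d
    return math.sqrt(((a + b) / n) * ((a + c) / n))


# Hypothetical counts for illustration.
a, b, c, d = 3, 1, 1, 3
lhs = pmi_ratio(a, b, c, d)
rhs = ochiai(a, b, c) / gm_marginals(a, b, c, d)
print(lhs, rhs)  # both are 1.5 up to floating-point rounding
```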


So you can see that PMI (without the logarithm) is the Ochiai coefficient further "normalized" (or, I'd
say, de-normalized) by the geometric mean of the marginal probabilities of positive (eventful) data in X and Y.

But Jaccard and Ochiai are comparable. Both are association measures ranging from 0 to 1.
They differ in the emphasis they put on a potential discrepancy between the frequencies b and c.
I've described this in the answer that "Ochiai" above links to. To cite:

Because product (seen in Ochiai) increases weaker than sum (seen in Jaccard) when only
one of the terms grows, Ochiai will be really high only if both of the two proportions
(probabilities) are high, which implies that to be considered similar by Ochiai the two
vectors must share the great shares of their attributes/elements. In short, Ochiai curbs
similarity if b and c are unequal. Jaccard does not.

edited Apr 13 at 12:44 by Community; answered Jan 17 at 13:56 by ttnphns