
Uploaded by AngelRibeiro10

Similarities measures



Can you explain the difference between the Jaccard similarity coefficient and the pointwise mutual information (PMI) measure? It would be great if you could add a few examples.

ttnphns Moeen MH


1 Answer

These two are quite different. Still, let us try to "bring them to a common denominator" to see the difference. Both Jaccard and PMI can be extended to continuous data, but we will look at the basic binary data case.

                Y
              1   0
            ---------
        1   | a | b |
    X       ---------
        0   | c | d |
            ---------

a = number of cases where both X and Y are 1
b = number of cases where X is 1 and Y is 0
c = number of cases where X is 0 and Y is 1
d = number of cases where both X and Y are 0
a+b+c+d = n, the number of cases.

We know that $\text{Jaccard}[X,Y] = \frac{a}{a+b+c}$.
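As a small illustration of the notation above, the following sketch counts a, b, c, d from two binary vectors and computes the Jaccard coefficient. The function names and the sample vectors are made up for this example and are not part of the original answer.

```python
def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length sequences of 0/1 values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    pairs = list(zip(x, y))
    a = sum(1 for xi, yi in pairs if xi == 1 and yi == 1)  # both 1
    b = sum(1 for xi, yi in pairs if xi == 1 and yi == 0)  # X only
    c = sum(1 for xi, yi in pairs if xi == 0 and yi == 1)  # Y only
    d = sum(1 for xi, yi in pairs if xi == 0 and yi == 0)  # both 0
    return a, b, c, d


def jaccard(a, b, c):
    """Jaccard similarity a / (a + b + c); the count d plays no role."""
    return a / (a + b + c)


# Hypothetical sample data, chosen only for illustration.
x = [1, 1, 1, 0, 0, 1, 0, 0]
y = [1, 1, 0, 1, 0, 1, 0, 0]

a, b, c, d = contingency_counts(x, y)
jaccard_xy = jaccard(a, b, c)
print(a, b, c, d, jaccard_xy)
```

Note that d, the number of joint absences, does not enter the Jaccard formula at all.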

PMI, by the Wikipedia definition, is $\text{PMI}[X,Y] = \log \frac{P(X,Y)}{P(X)P(Y)}$.
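A minimal sketch of PMI computed from the same four counts, estimating P(X,Y) = a/n, P(X) = (a+b)/n and P(Y) = (a+c)/n. The function name is hypothetical, and the natural logarithm is an assumption of this sketch; other bases (for example base 2) only rescale the result.

```python
import math


def pmi_from_counts(a, b, c, d):
    """PMI of the event (X=1, Y=1) from 2x2 counts, using the natural log.

    Assumes a > 0; with a == 0 the ratio is 0 and the logarithm is undefined.
    """
    n = a + b + c + d
    p_xy = a / n          # joint probability of X=1 and Y=1
    p_x = (a + b) / n     # marginal probability of X=1
    p_y = (a + c) / n     # marginal probability of Y=1
    return math.log(p_xy / (p_x * p_y))


# Hypothetical counts: positive association, then exact independence.
pmi_assoc = pmi_from_counts(3, 1, 1, 3)
pmi_indep = pmi_from_counts(2, 2, 2, 2)
print(pmi_assoc, pmi_indep)
```

Unlike Jaccard, PMI depends on d through n, and it is zero when X and Y are independent rather than when they share no positive cases.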

Let us first set aside the "log", because Jaccard involves no logarithm. Then plug the a, b, c, d notation into the PMI formula to obtain:

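$$\frac{P(X,Y)}{P(X)P(Y)} = \frac{a/n}{\frac{a+b}{n}\cdot\frac{a+c}{n}} = \frac{an}{(a+b)(a+c)} = \frac{a/\sqrt{(a+b)(a+c)}}{\sqrt{\frac{a+b}{n}\cdot\frac{a+c}{n}}} = \frac{\text{Ochiai}[X,Y]}{\text{gm}[P(X),P(Y)]}$$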

where "gm" is the geometric mean of the two probabilities, and the Ochiai similarity between the X and Y vectors is just another name for cosine similarity in the case of binary data: $\sqrt{\frac{a}{a+b}\cdot\frac{a}{a+c}}$.
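The relation between the PMI ratio and Ochiai can be checked numerically: P(X,Y)/(P(X)P(Y)) should equal the Ochiai coefficient divided by the geometric mean of P(X) and P(Y). The sketch below does this for a few made-up tables; all function names and counts are illustrative assumptions.

```python
import math


def pmi_ratio(a, b, c, d):
    """P(X,Y) / (P(X) P(Y)), i.e. PMI before taking the logarithm."""
    n = a + b + c + d
    return (a / n) / (((a + b) / n) * ((a + c) / n))


def ochiai(a, b, c):
    """Ochiai (binary cosine) similarity a / sqrt((a+b)(a+c))."""
    return a / math.sqrt((a + b) * (a + c))


def gm_of_marginals(a, b, c, d):
    """Geometric mean of P(X) = (a+b)/n and P(Y) = (a+c)/n."""
    n = a + b + c + d
    return math.sqrt(((a + b) / n) * ((a + c) / n))


# Hypothetical 2x2 tables given as (a, b, c, d).
tables = [(3, 1, 1, 3), (5, 20, 1, 74), (10, 0, 0, 90)]

checks = []
for a, b, c, d in tables:
    lhs = pmi_ratio(a, b, c, d)
    rhs = ochiai(a, b, c) / gm_of_marginals(a, b, c, d)
    checks.append(math.isclose(lhs, rhs))
    print((a, b, c, d), lhs, rhs)
```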

So you can see that PMI (without the logarithm) is the Ochiai coefficient further "normalized" (or, I'd say, de-normalized) by the overall probability of the positive (eventful) data, namely the geometric mean of P(X) and P(Y).

But Jaccard and Ochiai are comparable. Both are association measures ranging from 0 to 1. They differ in the emphasis they place on the potential discrepancy between the frequencies b and c. I described this in the answer that the "Ochiai" link above points to. To quote:

"Because a product (seen in Ochiai) grows more slowly than a sum (seen in Jaccard) when only one of the terms grows, Ochiai will be really high only if both of the two proportions (probabilities) are high, which implies that to be considered similar by Ochiai the two vectors must share large portions of their attributes/elements. In short, Ochiai curbs similarity if b and c are unequal. Jaccard does not."
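To probe how the two coefficients react to a discrepancy between b and c, the sketch below holds a and the sum b + c fixed and varies only how that sum is split. The counts are made up for illustration. Jaccard depends on b and c only through their sum, so it stays constant here, whereas Ochiai changes with the split.

```python
import math


def jaccard(a, b, c):
    """Jaccard similarity a / (a + b + c)."""
    return a / (a + b + c)


def ochiai(a, b, c):
    """Ochiai similarity a / sqrt((a+b)(a+c))."""
    return a / math.sqrt((a + b) * (a + c))


# Hypothetical tables: a = 10 and b + c = 10 throughout, only the split varies.
a = 10
splits = [(5, 5), (8, 2), (10, 0)]

results = [(b, c, jaccard(a, b, c), ochiai(a, b, c)) for b, c in splits]
for b, c, j, o in results:
    print(f"b={b:2d} c={c:2d}  Jaccard={j:.3f}  Ochiai={o:.3f}")
```

Running it on tables of your own is a quick way to see which measure suits the kind of imbalance present in your data.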

Community ttnphns


https://stats.stackexchange.com/questions/256684/jaccard-similarity-coecient-vs-point-wise-mutual-information-coecient/25 1/2

10/1/2017 probability - Jaccard similarity coefficient vs. Point-wise mutual information coefficient - Cross Validated

