
Memory-Based Reasoning

이재현
PASTA Lab.
POSTECH

PASTA IE POSTECH
1. Introduction
Memory-Based Reasoning (MBR) is
 identifying similar cases from past experience, and
 applying the information from those cases to the problem at hand.
 MBR finds the neighbors most similar to a new record and uses those neighbors for classification and prediction.

It relies on two key operations:

 Distance function: assigns a distance between any two records
 Combination function: combines the results from the neighbors to arrive at an answer
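The two operations can be sketched as plain functions. The numeric record format, Euclidean distance, and majority vote below are illustrative choices, not prescribed by the text:

```python
# Minimal MBR sketch: a distance function plus a combination function.
# The record format and the specific functions are illustrative.

def distance(a, b):
    """Distance between two numeric records (Euclidean, as an example)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def combine(labels):
    """Combination function: majority vote over the neighbors' labels."""
    return max(set(labels), key=labels.count)

def classify(new_record, training_set, k=3):
    """Find the k nearest neighbors and combine their answers."""
    nearest = sorted(training_set, key=lambda rec: distance(new_record, rec[0]))[:k]
    return combine([label for _, label in nearest])

training = [((1.0, 1.0), "yes"), ((1.2, 0.9), "yes"), ((5.0, 5.0), "no")]
print(classify((1.1, 1.0), training, k=3))  # two of the three neighbors vote "yes"
```

Everything that follows in these slides refines the two pieces above: how to measure distance between records, and how to combine the neighbors' answers.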

Applications of MBR span many areas:


 Fraud detection
 Customer response prediction
 Medical treatments
 Classifying responses

2. How does MBR work?
What is the most likely movie last seen by a respondent, based on the source of the record and the age of the individual?

MBR has two distinct phases:

 The learning phase generates the historical database.
 The prediction phase applies MBR to new cases.

2.1. The three main issues in solving a problem with MBR
Choosing the appropriate set of historical records
 The historical records, also known as the training set, are a subset of the available records.
 The training set needs to provide good coverage of the records so that the nearest neighbors to an unknown record are useful for predictive purposes.

Representing the historical records


 The performance of MBR in making predictions depends on how the training set is
represented in the computer.

Determining the distance function, combination function, and number of neighbors


 The distance function, combination function, and number of neighbors are the key
ingredients in determining how good MBR is at producing results.

3. Case study: Classifying News Stories
What are the codes?
 The news provider assigns codes to news stories to describe their content. These codes help users search for stories of interest.

Applying MBR
 Choosing the training set
The training set consisted of 49,652 news stories
 Choosing the distance function
In this case, a distance function already existed, based on a notion called relevance
feedback that measures the similarity of two documents based on the words they contain.

Relevance Feedback function

dclassification(A, B) = 1 - score(A, B) / score(A, A)
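The formula can be sketched directly. The actual relevance-feedback score used by the news provider weights words in ways the slides do not detail; a plain count of shared words is an illustrative stand-in here:

```python
# Hedged sketch of the relevance-feedback distance. score() is a
# stand-in: a simple count of words shared by the two documents.

def score(a, b):
    """Similarity of documents a and b based on the words they contain."""
    return len(set(a.split()) & set(b.split()))

def d_classification(a, b):
    """Distance = 1 - score(A, B) / score(A, A)."""
    return 1 - score(a, b) / score(a, a)

doc_a = "rates rise as markets fall"
doc_b = "markets fall on rate fears"
print(d_classification(doc_a, doc_a))  # identical documents: distance 0.0
print(d_classification(doc_a, doc_b))  # partial overlap: distance between 0 and 1
```

Note the asymmetric normalization by score(A, A): a document is at distance 0 from itself, and unrelated documents approach distance 1.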
 Choosing the combination function
The combination function used a weighted summation technique.

 Choosing the number of neighbors


The investigation varied the number of nearest neighbors between 1 and 11 inclusive.

The result
 Recall and precision are two measurements that are useful for assessing how well a set of codes is assigned.
Recall: “How many of the correct codes did MBR assign to the story?”
Precision: “How many of the codes assigned by MBR were correct?”

Codes by MBR       Correct codes   Recall   Precision
A,B                A,B,C,D           50%      100%
A,B,C,D,E,F,G,H    A,B,C,D          100%       50%
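The two measures can be computed from the assigned and correct code sets; the table's two rows serve as a check:

```python
# Recall and precision over sets of assigned vs. correct codes.

def recall(assigned, correct):
    """How many of the correct codes were assigned?"""
    return len(assigned & correct) / len(correct)

def precision(assigned, correct):
    """How many of the assigned codes were correct?"""
    return len(assigned & correct) / len(assigned)

correct = {"A", "B", "C", "D"}
print(recall({"A", "B"}, correct), precision({"A", "B"}, correct))            # 0.5 1.0
print(recall(set("ABCDEFGH"), correct), precision(set("ABCDEFGH"), correct))  # 1.0 0.5
```

The two rows illustrate the trade-off: assigning fewer codes raises precision at the cost of recall, and assigning more codes does the reverse.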

Category Recall Precision


Government 85% 87%
Industry 91% 85%
Market Sector 93% 91%
Product 69% 89%
Region 86% 64%
Subject 72% 53%
4. Measuring Distance
Three most common distance functions
 Absolute value of the difference: |A-B|
 Square of the difference: (A-B)²
 Normalized absolute value: |A-B| / (maximum difference)
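Applied to the ages 27 and 51 from the example records below (where the maximum age difference is 52 - 27 = 25), the three functions give:

```python
# The three per-field distance functions on the ages 27 and 51.
a, b = 27, 51
max_diff = 52 - 27   # maximum age difference among the example records

d_abs = abs(a - b)               # absolute value of the difference
d_sq = (a - b) ** 2              # square of the difference
d_norm = abs(a - b) / max_diff   # normalized absolute value

print(d_abs, d_sq, d_norm)  # 24 576 0.96
```

The normalized value 0.96 matches the age-distance matrix on the next slide; normalization keeps fields with large ranges (like salary) from swamping fields with small ones.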
Example
Recnum Gender Age Salary
1 Female 27 $ 19,000
2 Male 51 $ 64,000
3 Male 52 $105,000
4 Female 33 $ 55,000
5 Male 45 $ 48,000

 Gender
dgender(female, female) = 0, dgender(male, female) = 1
dgender(female, male) = 1, dgender(male, male) = 0

 Age
27 51 52 33 45
27 0.00 0.96 1.00 0.24 0.72
51 0.96 0.00 0.04 0.72 0.24
52 1.00 0.04 0.00 0.76 0.28
33 0.24 0.72 0.76 0.00 0.48
45 0.72 0.24 0.28 0.48 0.00

 Merging the field distances into a single record distance function:

Summation: dsum(A,B) = dgender(A,B) + dage(A,B) + dsalary(A,B)
Normalized summation: dnorm(A,B) = dsum(A,B) / max(dsum)
Euclidean distance: deuclid(A,B) = sqrt(dgender(A,B)² + dage(A,B)² + dsalary(A,B)²)
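A sketch of these combinations on the example records, with each field distance normalized first (gender: 0 or 1; age and salary: |A-B| divided by the maximum difference):

```python
# Record-level distance built from normalized per-field distances.
records = {
    1: ("Female", 27, 19000),
    2: ("Male", 51, 64000),
    3: ("Male", 52, 105000),
    4: ("Female", 33, 55000),
    5: ("Male", 45, 48000),
}
AGE_RANGE = 52 - 27
SALARY_RANGE = 105000 - 19000

def field_distances(a, b):
    d_gender = 0 if a[0] == b[0] else 1
    d_age = abs(a[1] - b[1]) / AGE_RANGE
    d_salary = abs(a[2] - b[2]) / SALARY_RANGE
    return d_gender, d_age, d_salary

def d_sum(a, b):
    return sum(field_distances(a, b))

def d_euclid(a, b):
    return sum(d ** 2 for d in field_distances(a, b)) ** 0.5

# Nearest neighbors of record 1 under d_sum, as in the table that follows.
order = sorted(records, key=lambda r: d_sum(records[1], records[r]))
print(order)  # [1, 4, 5, 2, 3]
```

Since dnorm only rescales dsum by a constant, it always produces the same neighbor ordering as dsum, which is why those two columns agree in the table.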

Set of nearest neighbors for three distance functions
dsum dnorm deuclid
1 1,4,5,2,3 1,4,5,2,3 1,4,5,2,3
2 2,5,3,4,1 2,5,3,4,1 2,5,3,4,1
3 3,2,5,4,1 3,2,5,4,1 3,2,5,4,1
4 4,1,5,2,3 4,1,5,2,3 4,1,5,2,3
5 5,2,3,4,1 5,2,3,4,1 5,2,3,4,1

Inserting a new customer

 Gender: Female, Age: 45, Salary: $100,000

Set of nearest neighbors for the new customer
1 2 3 4 5 neighbors
dsum 1.662 1.659 1.338 1.003 1.640 4,3,5,2,1
dnorm 0.554 0.553 0.446 0.334 0.547 4,3,5,2,1
deuclid 0.781 1.052 1.251 0.494 1.000 4,1,5,2,3
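Scoring the new customer with dsum reproduces the table's first row. Ranges are taken from the five example records (ages 27-52, salaries $19,000-$105,000); small differences from the slide's figures may come from rounding:

```python
# d_sum distances from the new customer (Female, 45, $100,000)
# to the five example records.
records = {
    1: ("Female", 27, 19000),
    2: ("Male", 51, 64000),
    3: ("Male", 52, 105000),
    4: ("Female", 33, 55000),
    5: ("Male", 45, 48000),
}
new = ("Female", 45, 100000)
AGE_RANGE = 52 - 27
SALARY_RANGE = 105000 - 19000

def d_sum(a, b):
    return ((0 if a[0] == b[0] else 1)
            + abs(a[1] - b[1]) / AGE_RANGE
            + abs(a[2] - b[2]) / SALARY_RANGE)

dists = {r: d_sum(new, rec) for r, rec in records.items()}
neighbors = sorted(dists, key=dists.get)
print(round(dists[4], 3), neighbors)  # 1.003 [4, 3, 5, 2, 1]
```

Note that dsum and deuclid disagree on the neighbor order (4,3,5,2,1 versus 4,1,5,2,3): the choice of distance function changes which neighbors get to vote.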

5. The combination function: Asking the neighbors for the answer

The basic approach: Democracy

 The basic combination function used for MBR is to have the K nearest neighbors vote on the answer: “democracy” in data mining.

 Customers with Attrition History


Recnum Gender Age Salary Attriter
1 Female 27 $ 19,000 No
2 Male 51 $ 64,000 Yes
3 Male 52 $105,000 Yes
4 Female 33 $ 55,000 Yes
5 Male 45 $ 48,000 No
new Female 45 $100,000 ?

 Using MBR to determine whether the new customer will attrite
          Neighbors    Neighbor attrition   K=1   K=2   K=3   K=4   K=5
dsum      4,3,5,2,1    Y,Y,N,Y,N            yes   yes   yes   yes   yes
deuclid   4,1,5,2,3    Y,N,N,Y,Y            yes    ?    no     ?    yes

 Attrition prediction with confidence


K=1 K=2 K=3 K=4 K=5
dsum Yes, 100% Yes, 100% Yes, 67% Yes, 75% Yes, 60%
deuclid Yes, 100% Yes, 50% Yes, 67% Yes, 50% Yes, 60%
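The voting and confidence figures for the dsum neighbor list can be reproduced directly, taking confidence as the winning side's share of the K votes:

```python
# Majority voting over the d_sum neighbors 4, 3, 5, 2, 1,
# whose attrition labels are Y, Y, N, Y, N (nearest first).
labels = ["Y", "Y", "N", "Y", "N"]

results = []
for k in range(1, 6):
    yes = labels[:k].count("Y")
    no = k - yes
    answer = "yes" if yes > no else "no" if no > yes else "?"
    confidence = round(100 * max(yes, no) / k)  # winning votes / K, as a percent
    results.append((answer, confidence))

print(results)  # [('yes', 100), ('yes', 100), ('yes', 67), ('yes', 75), ('yes', 60)]
```

The deuclid row works the same way but hits ties at K=2 and K=4, which is why unweighted voting can leave the answer undecided.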


Weighted voting
 Weighted voting is similar to voting except that the neighbors are not all created equal.
 Closer neighbors have stronger votes than neighbors farther away.
 The size of a vote is inversely proportional to its distance from the new record.
 To prevent problems when the distance is 0, it is common to add 1 to the distance before taking the inverse.
 Attrition prediction with weighted voting (total yes weight to total no weight)
K=1 K=2 K=3 K=4 K=5
dnorm 0.749 to 0 1.441 to 0 1.441 to 0.647 2.085 to 0.647 2.085 to 1.290
deuclid 0.669 to 0 0.669 to 0.562 0.669 to 1.062 1.157 to 1.062 1.601 to 1.062

 Confidence with weighted voting


K=1 K=2 K=3 K=4 K=5
dnorm Yes, 100% Yes, 100% Yes, 69% Yes, 76% Yes, 62%
deuclid Yes, 100% Yes, 54% Yes, 61% Yes, 52% Yes, 60%
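The dnorm row of the weighted-vote table can be reproduced with the 1/(1+d) weighting; the slide's figures appear truncated rather than rounded, so totals may differ in the last digit:

```python
# Weighted voting: each neighbor votes with weight 1 / (1 + distance).
# Distances are the d_norm values for neighbors 4, 3, 5, 2, 1 from the
# earlier slide; labels are their attrition outcomes Y, Y, N, Y, N.
neighbors = [(0.334, "Y"), (0.446, "Y"), (0.547, "N"), (0.553, "Y"), (0.554, "N")]

results = []
for k in range(1, 6):
    yes = sum(1 / (1 + d) for d, label in neighbors[:k] if label == "Y")
    no = sum(1 / (1 + d) for d, label in neighbors[:k] if label == "N")
    results.append((round(yes, 3), round(no, 3)))

print(results)  # yes-weight vs. no-weight for K = 1 .. 5
```

Because every weight is positive, weighted voting also breaks the ties that plain voting left at even K, as the deuclid confidence row shows.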

6. Conclusion
Strengths of Memory-Based Reasoning
 It produces results that are readily understandable.
 It is applicable to arbitrary data types, even non-relational data.
 It works efficiently on almost any number of fields.
 Maintaining the training set requires a minimal amount of effort.

Weaknesses of Memory-Based Reasoning


 It is computationally expensive when doing classification and prediction.
 It requires a large amount of storage for the training set.
 Results can depend on the choice of distance function, combination function, and number of neighbors.

