이재현
PASTA Lab.
POSTECH
PASTA IE POSTECH
1. Introduction
Memory-Based Reasoning (MBR) is the process of identifying similar cases from experience
and applying the information from these cases to the problem at hand.
MBR finds neighbors similar to a new record and uses those neighbors for classification
and prediction.
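The neighbor lookup described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `nearest_neighbors`, its parameters, and the use of absolute age difference as the distance are assumptions made for the example.

```python
from typing import Callable, List, Sequence, TypeVar

Record = TypeVar("Record")


def nearest_neighbors(
    new_record: Record,
    training_set: Sequence[Record],
    distance: Callable[[Record, Record], float],
    k: int,
) -> List[Record]:
    """Return the k training records closest to new_record (nearest first)."""
    ranked = sorted(training_set, key=lambda rec: distance(new_record, rec))
    return ranked[:k]


# Hypothetical usage: records are plain ages, distance is the absolute difference.
ages = [27, 51, 52, 33, 45]
closest = nearest_neighbors(31, ages, lambda a, b: abs(a - b), k=2)
print(closest)  # [33, 27]
```

The neighbors returned here are then passed to a combination function, such as voting, to produce the classification or prediction.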
2. How does MBR work?
What is the most likely movie last seen by a respondent, based on the source of the record
and the age of the individual?
2.1. The three main issues in solving a problem with MBR
Choosing the appropriate set of historical records
The historical records, also known as the training set, are a subset of the available records.
The training set needs to provide good coverage of the records so that the nearest
neighbors to an unknown record are useful for predictive purposes.
3. Case study: Classifying News Stories
What are the codes?
A news provider assigns codes to news stories in order to describe the content of the
stories. These codes help users search for stories of interest.
Applying MBR
Choosing the training set
The training set consisted of 49,652 news stories
Choosing the distance function
In this case, a distance function already existed, based on a notion called relevance
feedback that measures the similarity of two documents based on the words they contain.
Relevance feedback function
$$ d_{\mathrm{classification}}(A, B) = 1 - \frac{\mathrm{score}(A, B)}{\mathrm{score}(A, A)} $$
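The distance above can be sketched in code as follows. The slides do not say how the relevance feedback score itself is computed, so the `score` function below is a hypothetical stand-in (a count of shared distinct words); only the `1 - score(A, B) / score(A, A)` step comes from the formula.

```python
def score(a: str, b: str) -> int:
    """HYPOTHETICAL stand-in for the relevance feedback score:
    the number of distinct words that documents a and b share."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def classification_distance(a: str, b: str) -> float:
    """Distance from document a to document b: 1 - score(a, b) / score(a, a)."""
    self_score = score(a, a)
    if self_score == 0:
        raise ValueError("document a has no words, so score(a, a) is 0")
    return 1 - score(a, b) / self_score


story_a = "oil prices rise"
story_b = "oil prices fall"
print(classification_distance(story_a, story_a))  # 0.0
print(round(classification_distance(story_a, story_b), 3))  # 0.333
```

Because the score is normalized by score(A, A), the distance from A to B need not equal the distance from B to A.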
Choosing the combination function
The combination function used a weighted summation technique.
The result
Recall and precision are two measurements that are useful when measuring how well a
set of codes gets assigned.
Recall: “How many of the correct codes did MBR assign to the story?”
Precision: “How many of the codes assigned by MBR were correct?”
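As a small illustration of the two measurements for a single story, the sketch below compares the set of assigned codes with the set of correct codes. The function name and the code labels are hypothetical and do not come from the case study.

```python
from typing import Set, Tuple


def recall_precision(assigned: Set[str], correct: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) for one story's assigned codes."""
    hits = len(assigned & correct)
    # Recall: share of the correct codes that were assigned.
    recall = hits / len(correct) if correct else 0.0
    # Precision: share of the assigned codes that are correct.
    precision = hits / len(assigned) if assigned else 0.0
    return recall, precision


# Hypothetical codes: two of the three correct codes were found,
# and two of the four assigned codes are correct.
recall, precision = recall_precision({"A", "B", "D", "E"}, {"A", "B", "C"})
print(round(recall, 3), precision)  # 0.667 0.5
```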
4. Measuring Distance
Gender
Dgender(female,female) = 0, Dgender(male,female) = 1
Dgender(female,male) = 1, Dgender(male,male) = 0
Age
Age     27     51     52     33     45
27    0.00   0.96   1.00   0.24   0.72
51    0.96   0.00   0.04   0.72   0.24
52    1.00   0.04   0.00   0.76   0.28
33    0.24   0.72   0.76   0.00   0.48
45    0.72   0.24   0.28   0.48   0.00
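The gender and age distances above can be reproduced with the sketch below. The age normalization (absolute difference divided by the range of ages, here 52 - 27 = 25) is inferred from the values in the table rather than stated in the slides, and the function names are chosen for this example.

```python
AGES = [27, 51, 52, 33, 45]


def gender_distance(a: str, b: str) -> int:
    """0 when the genders match, 1 when they differ."""
    return 0 if a == b else 1


def age_distance(a: int, b: int, ages=AGES) -> float:
    """Absolute age difference scaled by the age range (inferred from the table)."""
    return abs(a - b) / (max(ages) - min(ages))


# Rebuild the age distance table row by row.
for row_age in AGES:
    print(row_age, [f"{age_distance(row_age, col_age):.2f}" for col_age in AGES])
```

Scaling by the range keeps every age distance between 0 and 1, the same scale as the gender distance.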
Set of nearest neighbors for three distance functions (ordered nearest to farthest)
Record   dsum        dnorm       deuclid
1        1,4,5,2,3   1,4,5,2,3   1,4,5,2,3
2        2,5,3,4,1   2,5,3,4,1   2,5,3,4,1
3        3,2,5,4,1   3,2,5,4,1   3,2,5,4,1
4        4,1,5,2,3   4,1,5,2,3   4,1,5,2,3
5        5,2,3,4,1   5,2,3,4,1   5,2,3,4,1
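The slides name dsum, dnorm and deuclid without defining them. The sketch below assumes the usual reading: dsum adds the per-field distances, dnorm divides that sum by the largest possible sum, and deuclid takes the square root of the sum of squares. These definitions and the example inputs are assumptions, not taken from the slides.

```python
import math
from typing import Sequence


def d_sum(field_distances: Sequence[float]) -> float:
    """Assumed definition: sum of the per-field distances."""
    return sum(field_distances)


def d_norm(field_distances: Sequence[float], max_sum: float) -> float:
    """Assumed definition: the summed distance divided by the maximum sum."""
    return sum(field_distances) / max_sum


def d_euclid(field_distances: Sequence[float]) -> float:
    """Assumed definition: Euclidean combination of the per-field distances."""
    return math.sqrt(sum(d * d for d in field_distances))


# Hypothetical pair of records: gender distance 1, age distance 0.72.
fields = [1, 0.72]
print(round(d_sum(fields), 2))               # 1.72
print(round(d_norm(fields, max_sum=2), 2))   # 0.86
print(round(d_euclid(fields), 4))            # 1.2322
```

Under these definitions dnorm is only a rescaling of dsum, so the two always rank neighbors in the same order.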
5. The combination function: Asking the neighbors for the answer
Using MBR to determine whether the new customer will attrite
          Neighbors   Neighbor attrition   K=1   K=2   K=3   K=4   K=5
dsum      4,3,5,2,1   Y,Y,N,Y,N            yes   yes   yes   yes   yes
deuclid   4,1,5,2,3   Y,N,N,Y,Y            yes   ?     no    ?     yes
A "?" marks a tie between the yes and no votes.
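The voting results in this table can be reproduced with a simple majority vote over the K nearest neighbors. The sketch below is one possible way to write it; the function name `vote` and the use of "?" as the tie marker follow the table, and the rest is an assumption for illustration.

```python
from collections import Counter
from typing import List


def vote(neighbor_labels: List[str], k: int) -> str:
    """Majority vote over the k nearest neighbors; '?' when the top two tie."""
    counts = Counter(neighbor_labels[:k]).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return "?"
    return counts[0][0]


# Neighbor attrition, nearest first, as listed in the table.
dsum_labels = ["yes", "yes", "no", "yes", "no"]
deuclid_labels = ["yes", "no", "no", "yes", "yes"]

print([vote(dsum_labels, k) for k in range(1, 6)])
# ['yes', 'yes', 'yes', 'yes', 'yes']
print([vote(deuclid_labels, k) for k in range(1, 6)])
# ['yes', '?', 'no', '?', 'yes']
```

Using an odd K avoids ties when there are only two classes.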
Weighted voting
Weighted voting is similar to voting, except that the neighbors are not all created equal.
Closer neighbors have stronger votes than neighbors farther away do.
The size of the vote is inversely proportional to the distance from the new record.
To prevent problems when the distance might be 0, it is common to add 1 to the
distance before taking the inverse.
Attrition prediction with weighted voting (total weight for yes to total weight for no)
          K=1          K=2              K=3              K=4              K=5
dnorm     0.749 to 0   1.441 to 0       1.441 to 0.647   2.085 to 0.647   2.085 to 1.290
deuclid   0.669 to 0   0.669 to 0.562   0.669 to 1.062   1.157 to 1.062   1.601 to 1.062
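A minimal sketch of weighted voting follows, using the weight 1 / (1 + distance) described above. The slides do not list the neighbor distances behind the table, so the distances in the usage example are hypothetical and the output does not reproduce the table.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(neighbors: List[Tuple[float, str]], k: int) -> Dict[str, float]:
    """Total vote weight per label over the k nearest neighbors.

    neighbors holds (distance, label) pairs sorted nearest first.
    Each vote weighs 1 / (1 + distance), so a distance of 0 is safe.
    """
    totals: Dict[str, float] = defaultdict(float)
    for distance, label in neighbors[:k]:
        totals[label] += 1.0 / (1.0 + distance)
    return dict(totals)


# Hypothetical neighbors: one very close "yes" and two more distant "no" records.
neighbors = [(0.0, "yes"), (1.0, "no"), (3.0, "no")]
print(weighted_vote(neighbors, k=3))  # {'yes': 1.0, 'no': 0.75}
```

In this example a plain majority vote would answer no, while weighted voting answers yes because the single yes neighbor is much closer.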
6. Conclusion
Strengths of Memory-Based Reasoning
It produces results that are readily understandable.
It is applicable to arbitrary data types, even non-relational data.
It works efficiently on almost any number of fields.
Maintaining the training set requires a minimal amount of effort.