Overview
The average way
The probabilistic way
By leveraging the relational network structure
Conclusions
CIS 764-Gaurav Chauhan
Problems Caused
Missing values in the data cause problems in the following data analysis tasks:
Summarizing variables
Computing new variables
Comparing variables
Combining variables
Time series analysis
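As a small illustration of the first item, summarizing a variable, the sketch below shows how a single missing value spoils a naive summary. The readings are hypothetical, and encoding a missing value as NaN is an assumption made only for this example.

```python
import math

# Hypothetical readings; NaN stands in for a missing value (an assumption).
temps = [60.0, 66.0, float("nan"), 64.0]

# A naive summary propagates the missing value: the result is NaN.
naive_mean = sum(temps) / len(temps)

# Summarizing only the observed values gives a usable number.
observed = [t for t in temps if not math.isnan(t)]
observed_mean = sum(observed) / len(observed)

print(naive_mean, observed_mean)
```

The same propagation affects any new variable computed from a column that contains a missing value.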
Considering the average of the available values for prediction
Using a probabilistic approach for value prediction
Leveraging the relational network structure of the data to predict values
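[Table: average temperature, Temperature (avg), for the years 1942, 1943 and 1944; cell values 60F, 66F, 62F, 64F, 69F, 59F, 60F, 59F, 65F; the original row and column layout is not recoverable]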
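The average way can be sketched as follows: each missing value is replaced by the mean of the values that are available. This is a minimal sketch, not the presenter's implementation; the function name `impute_with_mean`, the use of `None` for a missing reading, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean

def impute_with_mean(values):
    """Replace each None with the mean of the observed values (hypothetical helper)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical average temperatures in F; None marks the missing reading.
readings = [60, 66, None, 64, 69, 59]
filled = impute_with_mean(readings)
print(filled)
```

Every missing entry receives the same fill value, so this approach keeps the mean of the observed data but reduces its spread.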
Assume that we have n values and are required to predict the (n+1)th value.
For every i from 1 to n, the probability that a data instance has the value vi is p(vi).
Each of these probabilities is calculated on the basis of the frequency with which vi occurs in the data.
The value vn+1 is then picked at random such that each vi is chosen with probability p(vi).
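The probabilistic way can be sketched as sampling from the empirical frequency distribution of the known values. This is a minimal sketch under that reading of the slide; the function names `empirical_probabilities` and `predict_next` and the sample data are hypothetical.

```python
import random
from collections import Counter

def empirical_probabilities(values):
    """Map each distinct value v to p(v), its relative frequency in the data."""
    if not values:
        raise ValueError("need at least one known value")
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def predict_next(values, rng=None):
    """Pick the (n+1)th value at random, choosing v with probability p(v)."""
    rng = rng or random.Random()
    probs = empirical_probabilities(values)
    candidates = list(probs)
    weights = [probs[v] for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

# Hypothetical known values v1..vn.
known = ["A", "A", "B", "C"]
probs = empirical_probabilities(known)
prediction = predict_next(known, random.Random(0))
print(probs, prediction)
```

Because the prediction is drawn at random, repeated runs can give different values; passing a seeded `random.Random` makes a run repeatable.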
This technique applies only to relational data.
The value of an instance with missing data is predicted as the mode of its peers that fit the relational network and have no missing values.
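The relational approach can be sketched as taking the most frequent value among an instance's peers. This is a minimal sketch that assumes "peers" means the instances directly linked to the one in question; the function name `predict_from_peers`, the link structure, and the book data are hypothetical.

```python
from collections import Counter

def predict_from_peers(node, neighbours, values):
    """Return the mode of the known values of node's peers, or None if there are none.

    neighbours: dict mapping a node to the nodes it is linked to (assumed structure).
    values: dict mapping a node to its value, with None for a missing value.
    """
    peer_values = [
        values.get(p) for p in neighbours.get(node, ())
        if values.get(p) is not None  # skip peers that are themselves missing
    ]
    if not peer_values:
        return None
    # On a tie, Counter.most_common keeps the value seen first.
    return Counter(peer_values).most_common(1)[0][0]

# Hypothetical network: Book D is linked to three books with known categories.
neighbours = {"Book D": ["Book A", "Book B", "Book C"]}
values = {
    "Book A": "Category A",
    "Book B": "Category A",
    "Book C": "Category B",
    "Book D": None,
}
predicted = predict_from_peers("Book D", neighbours, values)
print(predicted)
```

Using the mode rather than the mean lets the same sketch work for categorical values such as book categories.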
Example 1
[Diagram: a relational network linking Book A, Book B and Book C with Category A, Category B and Category C; the node with the missing value, marked "?", is shown as "Predicted = A"]
Example 2 Teacher
Conclusion
Missing values are harmful when the data is used for analysis, learning, or mining purposes.
Various techniques aim at predicting the missing values, but none has reached 100% accuracy.
An average prediction accuracy of 90% is still acceptable.
Questions, anyone?
I am shivering not because of nervousness but because of cold room temperature -one nervous student