Data Mining
Evolution of knowledge-based systems
Key partners in Data Mining
Data analyst / statistician
Knowledge Engineer
Domain Expert
DM phases
(a) Problem definition
(b) Creating target data set
(c) Data pre-processing and transformation
(d) Feature and algorithm selection
(e) Data Mining
(f) Evaluation of learned knowledge
(g) Fielding the knowledge base
University of Patras, HCI Group - SETN02
Removing noise
Missing values
Data aggregation by period
DATA WAREHOUSE
Data mining
Classification problem
2 classes: solvent and insolvent customers
Distribution among classes in original dataset: 99% solvent customers, 1% insolvent customers
Very small number of insolvencies
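With a 99% / 1% split, overall accuracy says little about how well insolvencies are detected. The sketch below is illustrative only: the population size of 10,000 and the "always predict solvent" baseline are assumptions made for the example, not part of the study.

```python
# Hypothetical population that follows the 99% / 1% split quoted above.
n_customers = 10_000                    # assumed size, for illustration only
n_insolvent = n_customers // 100        # 1% insolvent
n_solvent = n_customers - n_insolvent   # 99% solvent

# Trivial baseline (an assumption): label every customer as solvent.
correct = n_solvent                     # every solvent customer is classified correctly
caught_insolvent = 0                    # no insolvent customer is ever flagged

accuracy = correct / n_customers
insolvent_recall = caught_insolvent / n_insolvent

print(f"accuracy         = {accuracy:.2%}")
print(f"insolvent recall = {insolvent_recall:.2%}")
```

The baseline is right for 99 customers out of 100 yet finds no insolvency at all, which is why results for this problem are better read per class than as a single accuracy figure.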
Latency
Count_X_charges
TrendDif5
AverageUnits
TrendCount5
CountInstallments
Decision Tree
Classification Results E21
                                     Predicted Group
                        Category       0        1     Total
Cases Selected
  Original     Count       0         101       35      136
                           1           9     1203     1212
               %           0       74.26    25.74      100
                           1        0.74    99.26      100
Cases not Selected
  Original     Count       0          42       22       64
                           1          16      638      654
               %           0       65.62    34.38      100
                           1        2.45    97.55      100
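The percentage rows of the decision tree table can be recomputed from the count rows. The following is a minimal sketch: the counts are copied from the table, while the dictionary layout and the helper name `row_percentages` are illustrative choices, not taken from the original tool output.

```python
# Counts copied from the decision tree table (E21).
# Keys 0 and 1 are the original categories; each tuple holds the number of
# cases assigned to predicted group 0 and predicted group 1.
counts = {
    "selected": {0: (101, 35), 1: (9, 1203)},
    "not selected": {0: (42, 22), 1: (16, 638)},
}

def row_percentages(row):
    """Return each count as a percentage of its row total (hypothetical helper)."""
    total = sum(row)
    return tuple(100 * c / total for c in row)

for subset, rows in counts.items():
    for category, row in rows.items():
        to_0, to_1 = row_percentages(row)
        print(f"{subset:>12}, category {category}: "
              f"{to_0:.2f}% predicted 0, {to_1:.2f}% predicted 1")
```

For each category, the percentage where the predicted group equals the original category is the share of that category classified correctly.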
Neural Network
Classification Results E30
                                     Predicted Group
                        Category       0        1     Total
Cases Selected
  Original     Count       0          65       69      136
                           1           8     1203     1212
               %           0        47.7     50.7      100
                           1         0.6     99.2      100
Cases not Selected
  Original     Count       0          24       40       64
                           1          11      643      654
               %           0        37.5     62.5      100
                           1         1.6     98.3      100
Actual cases
Performance over 90% in the majority class and over 83% in the minority class.
Stage                                  DK        Type of DK
(a) Problem definition                 HIGH      Business and domain knowledge, requirements; implicit, tacit knowledge
(b) Creating target data set           MEDIUM    Attribute relations, semantics of corporate DB
(c) Data preprocessing                 HIGH      Tacit and implicit knowledge for inferences
(d) Feature and algorithm selection    MEDIUM    Interpretation of the selected features
(e) Data Mining                        LOW       Inspection of discovered knowledge
(f) Evaluation of learned knowledge    MEDIUM    Definition of criteria related to business objectives
(g) Fielding the knowledge base        HIGH
Conclusion
Data mining is a knowledge-driven process
All stages contribute to the success of the process
Domain experts play a significant role in most phases of the process
Need for selection of algorithms and techniques that support interpretation of mined knowledge
Need for integrated tools and adequate techniques to support involvement of domain experts in the process