Fourth International Conference on Knowledge Discovery & Data Mining, Friday, August 28, 1998, New York, New York
Tutorial Goals
Compare and Summarize Data Mining Tools which:
Offer multiple modeling and classification algorithms
Support project stages surrounding model construction
Stand alone
Are general-purpose
Cost a lot
We could get our hands on
Topics
Products covered
Review of algorithms
Comparative tables of properties
Screen shots exemplifying qualities
Summary of distinctives
Caveats
We don't know every tool well (and are sure to have missed some!)
Level of exposure noted for each tool
Tools Evaluated
Product | Company | Version Tested | Our Experience | URL
Clementine | Integral Solutions, Ltd. | 4 | Moderate | http://www.isl.co.uk/clem.html
Darwin | Thinking Machines, Corp. | 3.0.1 | Moderate | http://www.think.com/html/products/products.htm
DataCruncher | DataMind | 2.1.1 | High | http://www.datamindcorp.com
Enterprise Miner | SAS Institute | Beta | Moderate | http://www.sas.com/software/components/miner.html
GainSmarts | Urban Science | 4.0.3 | Low | http://www.urbanscience.com/main/gainpage.htm
Intelligent Miner | IBM | 2 | Low | http://www.software.ibm.com/data/iminer/
MineSet | Silicon Graphics, Inc. | 2.5 | Low | http://www.sgi.com/Products/software/MineSet/
Model 1 | Group 1/Unica Technologies | 3.1 | Moderate | http://www.unica-usa.com/model1.htm
ModelQuest | AbTech Corp. | 1 | Moderate | http://www.abtech.com
PRW | Unica Technologies, Inc. | 2.1 | High | http://www.unica-usa.com/prodinfo.htm
CART | Salford Systems | 3.5 | Moderate | http://www.salford-systems.com
NeuroShell | Ward Systems Group, Inc. | 3 | Moderate | http://www.wardsystems.com/neuroshe.htm
OLPARS | PAR Government Systems | 8.1 | High | mailto://olpars@partech.com
Scenario | Cognos | 2 | Moderate | http://www.cognos.com/busintell/products/index.html
See5 | RuleQuest Research | 1.07 | Moderate | http://www.rulequest.com/see5-info.html
S-Plus | MathSoft | 4 | High | http://www.mathsoft.com/splus/
WizWhy | WizSoft | 1.1 | Moderate | http://www.wizsoft.com/why.html
Data Input and Model Output Options
Usability Ratings
Visualization Capabilities
Modeling Automation Methods
Platforms
Columns: Unix Standalone; PC Standalone (95/NT); NT Server / PC Client; Database Connectivity
Key: no capability (blank); some capability; good capability; excellent capability
Tools: Clementine; Darwin; DataCruncher; Enterprise Miner; GainSmarts; Intelligent Miner; MineSet; Model 1; ModelQuest; PRW; CART; Scenario; NeuroShell; OLPARS; See5; S-Plus; WizWhy
[Table: per-tool ratings not shown]
Tool Groupings
Desktop
PC (standalone)
Flat Files
One or Two Algorithms
Data Fits into RAM
High End
Multiple Platforms, Client-Server
Flat Files or Direct Database Access
Multiple Algorithm Types
Large Databases
Technical
Algorithm Options
Knobs to enhance model performance
Model Automation
Simplify model design cycle
Documentation of steps used in generating models (repeatability)
Descriptive Reporting
Domain terminology
Graphical representations
Automatic Header
Summary Reports
Decision Trees
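[Figure: binary decision tree with tests a > 4, b > 3.5, b > 2, and a > 1; each test has n (no) and y (yes) branches, and the leaves are labeled 0 or 1]
1998 Elder Research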
Updated October 19, 1998
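As a rough illustration of how a tree of this kind classifies a case, the sketch below chains threshold tests on two inputs, a and b. The thresholds (4, 3.5, 2, 1) are taken from the slide; how the branches connect and which leaf carries which label are assumptions made only for the example, since the figure's layout is not recoverable here.

```python
def classify(a: float, b: float) -> int:
    """Hypothetical tree: thresholds from the slide, branch layout assumed."""
    if a > 4:
        # Assumed "yes" branch of the root: refine on b.
        return 1 if b > 3.5 else 0
    if b > 2:
        # Assumed "no" branch of the root: test b, then a.
        return 1 if a > 1 else 0
    return 0


if __name__ == "__main__":
    for case in [(5, 4), (5, 3), (2, 3), (0.5, 3), (2, 1)]:
        print(case, "->", classify(*case))
```

Each case follows exactly one path from root to leaf, which is why tree models are comparatively easy to read off as rules.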
Polynomial Networks
$z_{17} = 3.1 + 0.4a - 0.15b^2 + 0.9bc - 0.62abc + 0.5c^3$
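A single node of such a network is just a low-order polynomial in a few inputs. The sketch below evaluates the example node, reading the squared and cubed terms as b^2 and c^3 (an assumption about the slide's flattened superscripts); the function name is illustrative only.

```python
def z17(a: float, b: float, c: float) -> float:
    """Evaluate the example three-input (Triple) node from the slide.

    Assumes the slide's 'b2' and 'c3' denote b squared and c cubed.
    """
    return (3.1 + 0.4 * a - 0.15 * b ** 2
            + 0.9 * b * c - 0.62 * a * b * c + 0.5 * c ** 3)


if __name__ == "__main__":
    print(z17(1.0, 1.0, 1.0))
```

In a full network, outputs like this one would feed later layers as inputs to further polynomial nodes.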
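[Figure: polynomial network diagram. Layer 0 (Normalizers N1, N4, N5, N6, N8, N9) scales the inputs a, k, f, d, h; Layers 1 to 3 contain Single, Double, Triple, and MultiLinear nodes (Single 14, MultiLinear 15, Double 16, Triple 17, Double 19, Double 20, Triple 21) with outputs labeled z; Unitizers U2 and U7 produce the outputs Y1 and Y2]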
Consensus Models
Parametrically Summarize Data Points
orders, terms
Decision Trees
(e.g., CART, CHAID, C5)
Histogram
family, order
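To make the idea of parametrically summarizing data points concrete, here is a minimal sketch that reduces a data set to two parameters (intercept and slope) by ordinary least squares, after which the data are no longer needed for prediction. The choice of a straight line and the function names are assumptions for illustration; the slide does not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(params, x):
    """Predict from the stored parameters alone; the data are not consulted."""
    intercept, slope = params
    return intercept + slope * x


if __name__ == "__main__":
    params = fit_line([0, 1, 2], [1, 3, 5])
    print(params, predict(params, 3))
```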
Contributory Models
Retain data points; each potentially affects the estimate at a new point.
Kernels: shape, spread
k-Nearest Neighbor: k, distance metric
Goal, iterations
Spread, index
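The two contributory methods named above can be sketched in a few lines. Both keep the training points and consult them at prediction time; they differ in how the points are weighted. The one-dimensional inputs, absolute-difference distance, and Gaussian kernel shape are assumptions chosen for brevity, not choices stated on the slide.

```python
import math


def knn_estimate(points, x, k=3):
    """Average the targets of the k stored points nearest to x."""
    nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def kernel_estimate(points, x, spread=1.0):
    """Gaussian-weighted average of all stored targets (Nadaraya-Watson form)."""
    weights = [math.exp(-0.5 * ((px - x) / spread) ** 2) for px, _ in points]
    return sum(w * y for w, (_, y) in zip(weights, points)) / sum(weights)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (10.0, 10.0)]
    print(knn_estimate(data, 1.0, k=3))
    print(kernel_estimate(data, 1.0, spread=1.0))
```

The parameters k and spread play the role of the "knobs" listed for each method: larger values smooth the estimate, smaller values make it follow nearby points more closely.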
Properties of Algorithms
Algorithms rated: Classical (LR, LDA); Neural Networks; Visualization; Decision Trees; Polynomial Networks; K-Nearest Neighbors; Kernels
Properties rated: Accurate; Scalable; Interpretable; Useable; Robust; Versatile; Fast; Hot
Key: good; neutral; bad
[Table: per-algorithm ratings not shown]
Algorithms
Algorithm types compared: Multi-layer Perceptrons; Polynomial Networks; Sequential Discovery; Decision Trees; Association Rules; Nearest Neighbor; Linear/Statistical; Rule Induction; Time Series; Kohonen; K Means; Bayes
Tools: Clementine; Darwin; Datamind; Enterprise Miner; GainSmarts; Intelligent Miner; MineSet; Model 1; ModelQuest; PRW; CART; Cognos; NeuroShell; OLPARS; See5; S-Plus; WizWhy
[Table: per-tool ratings not shown]
Multi-Layer Perceptrons
Options compared: Normalize Inputs; Cross-Validation; Network Visual; Learning Rate; Parameter Summary; Momentum
Tools: Clementine; Darwin; Enterprise Miner; Intelligent Miner; Model 1; PRW; NeuroShell; OLPARS
[Table: per-tool ratings not shown]
Decision Trees
Options compared: Classification Costs; Pruning Severity; Missing Data; C5 or C4.5; "CART"; Visual Trees; CHAID; Priors
Tools: Clementine; Darwin; Enterprise Miner; GainSmarts; Intelligent Miner; MineSet; Model 1; ModelQuest; CART; Scenario; S-Plus; See5
[Table: per-tool ratings not shown]
Other
Regression / Stats
Options compared: Linear; Logistic; Input Selection; Factor Analysis
Tools: Clementine; Enterprise Miner; GainSmarts; Intelligent Miner; MineSet; Model 1; ModelQuest Enterprise; PRW; S-Plus; Scenario
[Table: per-tool ratings not shown]
Usability
Tools: Clementine; Darwin; DataCruncher; Enterprise Miner; GainSmarts; Intelligent Miner; MineSet; Model 1; ModelQuest Enterprise; PRW; CART; Scenario; NeuroShell; OLPARS; See5; S-Plus; WizWhy
Rated on: Model Building; Model Understanding; Technical Support; Overall
[Table: per-tool ratings not shown]
Visualization
Tools: Clementine; Darwin; DataCruncher; Enterprise Miner; GainSmarts; Intelligent Miner; MineSet; Model 1; ModelQuest Enterprise; PRW; CART; Scenario; NeuroShell; OLPARS; See5; S-Plus; WizWhy
Rated on: Scatter/Line Plots; Classification Decision Regions; Rotating Scatter; Conditional Plots; Correlation Plots
[Table: per-tool ratings not shown]
Automation
Clementine Darwin DataCruncher Enterprise Miner GainSmarts Intelligent Miner MineSet Model 1 ModelQuest PRW CART Scenario NeuroShell OLPARS See5 S-Plus WizWhy
Method of Automation Visual Programming, Programming Language Programming Language (Task manager) Visual Programming, Programming Language Macro Language, Wizards (Wizards) Data History, Log Model Wizard Batch Agenda Experiment Manager; Macros Built-in Basic Scripting
Distinctives
Tool | Strengths | Weaknesses
Clementine | visual interface; algorithm breadth | scalability
Darwin | efficient client-server; intuitive interface options | no unsupervised; limited visualization
DataCruncher | ease of use | single algorithm
Enterprise Miner | depth of algorithms; visual interface | harder to use; new product issues
GainSmarts | data transformations, built on SAS; algorithm option depth | no unsupervised; limited visualization
Intelligent Miner | algorithm breadth; graphical tree/cluster output | few algorithm options; no automation
MineSet | data visualization | few algorithms; no model export
Model 1 | ease of use; automated model discovery | really a vertical tool
ModelQuest | breadth of algorithms | some non-intuitive interface options
PRW | extensive algorithms; automated model selection | limited visualization
CART | depth of tree options | difficult file I/O; limited visualization
Scenario | ease of use | narrow analysis path
NeuroShell | multiple neural network architectures | unorthodox interface; only neural networks
OLPARS | multiple statistical algorithms; class-based visualization | dated interface; difficult file I/O
See5 | depth of tree options | limited visualization; few data options
S-Plus | depth of algorithms; visualization; programmable/extendable | limited inductive methods; steep learning curve
WizWhy | ease of use; ease of model understanding | limited visualization
Closing Observations
Data Mining Tools Can:
Enhance inference process
Speed up design cycle
Forthcoming Report
The report provides a detailed comparison of high-end data mining tools, including capabilities, ease of use, and practical tips. Available for $695 from Elder Research (http://www.datamininglab.com), Q4 1998. Purchasers receive a brief free consulting session to explore the report's findings in more detail, if desired.
Note: The analyses and reviews were performed completely independently, and were made possible by the cooperation of the vendors, for which Elder Research is very grateful. The companies, however, provided no financial support, and had no influence on its editorial content.