You are on page 1of 69

Visual Data Mining:

Background, Techniques, and


Drug Discovery Applications

Mihael Ankerst
The Boeing Company

Georges Grinstein
UMass Lowell and AnVil Inc.

Daniel Keim
AT&T Research and University of Konstanz

A color version of the tutorial notes can be found via


http://www.fmi.uni-konstanz.de/~keim

KDD2002 Conference
Emails and URLs
Data Exploration

Mihael Ankerst
Definition
Mihael.Ankerst@boeing.com Data Exploration is the process of searching and analyzing
http://www.visualclassification.com/ankerst databases to find implicit but potentially useful information
Daniel A. Keim
Keim@research.att.com more formally
keim@informatik.uni-konstanz.de
Data Exploration is the process of finding a
http://www.fmi.uni-konstanz.de/~keim
subset D of the database D and
George Grinstein
Grinstein@cs.uml.edu hypotheses Hu(D,C)
http://genome.uml.edu that a user U considers useful in an application context C
http://www.anvilinfo.com

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 2 Georges Grinstein, UMass Lowell and AnVil Inc. 5

Overview Abilities of Humans and Computers


Part I: Visualization Techniques
1. Introduction
2. Visual Data Exploration Techniques abilities of Data Storage
3. Distortion and Interaction Techniques the computer Numerical Computation
4. Visual Data Mining Systems
Searching
Part II: Specific Visual Data Mining Techniques
1. Association Rules Planning
2. Classification Diagnosis
3. Clustering Logic
Prediction
4. Text Mining
5. Tightly Integrated Visualization Perception
Part III: Drug Discovery Applications Creativity
1. Biology and Chemistry General Knowledge
2. Bioinformatics and Cheminformatics
3. Examples human abilities
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 3 Georges Grinstein, UMass Lowell and AnVil Inc. 6

Goals of Visualization Techniques Brief Historical Overview of Exploratory


Data Visualization Techniques (cf. [WB 95])
Presentation
pioneering work of Tufte [Tuf 83, Tuf 90] and Bertin [Ber 81]
starting point: facts to be presented are fixed a priori
focuses on
process: choice of appropriate presentation techniques
result: high-quality visualization of the data to present facts visualization of data with inherent 2D-/3D-semantics
general rules for layout, color composition, attribute mapping, etc.
Confirmatory Analysis
starting point: hypotheses about the data
development of visualization techniques for different types
process: goal-oriented examination of the hypotheses
of data with an underlying physical model
result: visualization of data to confirm or reject the hypotheses
geographic data, CAD data, flow data, image data, voxel data, etc.
Exploratory Analysis
starting point: no hypotheses about the data development of visualization techniques for arbitrary
process: interactive, usually undirected search for structures, trends multidimensional data (without an underlying physical model)
result: visualization of data to lead to hypotheses about the data applicable to databases and other information resources

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 4 Georges Grinstein, UMass Lowell and AnVil Inc. 7

1
Data Preprocessing Techniques Overview
Techniques for Dimension Reduction Part I: Visualization Techniques
(Set of d-dim Data Items -> Set of k-dim. Data Items; k<<d) 1. Introduction
2. Visual Data Exploration Techniques
Principal Component Analysis [DE 82] 3. Distortion and Interaction Techniques
Determines a minimal set of principal components (linear combinations of the 4. Visual Data Mining Systems
original dimensions) which explain the main variations of the data.
Part II: Specific Visual Data Mining Techniques
Factor Analysis [Har 67] 1. Association Rules
Determines a set of unobservable common factors which explain the main 2. Classification
variations of the data. The original dimensions are linear combinations of the 3. Clustering
common factors. 4. Text Mining
Multidimensional Scaling [SRN 72] 5. Tightly Integrated Visualization
Uses the similarity (or dissimilarity) matrix of the data as defining coordinate Part III: Drug Discovery Applications
axes in multidimensional space. The Euclidean distance in that space is a 1. Biology and Chemistry
measure of the data items.
2. Bioinformatics and Cheminformatics
Fastmap [FL 95] 3. Examples
Fastmap also operates on a given similarity matrix and iteratively reduces the 4. Bioinformatics Packages
number of dimensions while preserving the distances as much as possible. 5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 8 Georges Grinstein, UMass Lowell and AnVil Inc. 11

Data Preprocessing Techniques Visual Data Exploration Techniques


Subsetting Techniques
(Set of Data Items -> Subset of Data Items)
Sampling (determines a representative subset of a database)
Querying (determines a certain, usually a-priori fixed subset of the Standard 2D/3D Displays
database
Segmentation Techniques Geometric Transformations
(Set of Data-Items -> Set of (Set of Data Items))

Segmentation based upon attribute values or attribute ranges Iconic Displays


Aggregation Techniques
(Set of Data-Items -> Set of Aggregate Values) Dense Pixel Displays
Aggregation (sum, count, min, max,...) based upon
- attribute values Stacked Displays
- topological properties, etc.
Visualization of Aggregations:
- Histograms
- Pie Charts, Bar Charts, Line Graphs, etc.
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 9 Georges Grinstein, UMass Lowell and AnVil Inc. 12

Classification Standard 2D/3D Displays


Data Type to be Visualized
Examples from the VisualInsights WebPage

1. one-dimensional

2. two-dimensional
Visualization Technique

3. multi-dimensional
Stacked Display

4. text/web Dense Pixel Display

Iconic Display
5. hierarchies/graphs
Geometrically-transformed Display

6. algorithm/software Standard 2D/3D Display

Standard Projection Filtering Zoom Distortion Link&Brush


Interaction and Distortion Technique

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 10 Georges Grinstein, UMass Lowell and AnVil Inc. 13

2
Geometric Transformations Geometric Transformations
Prosection Views [FB 94, STDS 95]
Basic Idea:
Visualization of geometric transformations and

used by permission of R. Spence, Imperial College London


used by permission of R. Spence, Imperial College London
projections of the data

Scatterplot-Matrices [And 72, Cle 93]


Landscapes [Wis 95]
Projection Pursuit Techniques [Hub 85]
(D techniques for finding meaningful projections of multidimensional data)

Prosection Views [FB 94, STDS 95 schematic representation example


Hyperslice [WL 93] matrix of all orthogonal projections where the result of the selected
multidimensional range is colored differently
Parallel Coordinates [Ins 85, ID 90] (combination of selections and projections)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 14 Georges Grinstein, UMass Lowell and AnVil Inc. 17

Geometric Transformations Geometric Transformations


Hyperslice [ 93]
Scatterplot-Matrices [Cle 93]
Used by ermission of M. Ward, Worcester Polytechnic Institute

used by permission of J. J. van Wijk

matrix of scatterplots
(x-y-diagrams) of the k-dim.
data [total of (k2/2-k) scatterplots]

matrix of k slices through the k-dim. Data (the slices are determined interactively)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 15 Georges Grinstein, UMass Lowell and AnVil Inc. 18

Geometric Transformations Geometric Transformations


Landscapes [Wis 95] Parallel Coordinates [Ins 85, ID 90]
n equidistant axes which are parallel to one of the screen axes
and correspond to the attributes
Used by permission of B. Wright, Visible Decisions Inc.

the axes are scaled to the [minimum, maximum] - range of the


news articles
corresponding attribute
visualized as
a landscape every data item corresponds to a polygonal line which intersects each
of the axes at the point which corresponds to the value for the attribute

visualization of the data as perspective landscape


the data needs to be transformed into a (possibly artificial)
2D spatial representation which preserves the characteristics Attr. 1 Attr. 2 Attr. 3 Attr. k

of the data
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 16 Georges Grinstein, UMass Lowell and AnVil Inc. 19

3
Geometric Transformations Geometric Transformations
Parallel Coordinates (contd)
x5
(x1, x2, x3, x4, x5, x6,
x7)

x1
x6 x7
x4
x2 x3

15.000 data items with noise 5 % of the data (750 data items)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 20 Georges Grinstein, UMass Lowell and AnVil Inc. 23

Geometric Transformations Geometric Transformations


Parallel Coordinates (contd) Parallel Coordinates (contd)

15.000 data items with a query-dependet coloring


Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 21 Georges Grinstein, UMass Lowell and AnVil Inc. 24

Geometric Transformations Geometric Transformations


Parallel Coordinates (contd) (x1, x2, x3, x4, x5, x6, x7)
x7
x6

x1

x5

x2
RadViz
x4
used by permission of A. Inselberg, Tel Aviv University, Israel used by permission of A. Inselberg
x3
points on a line in 10-dim. space points on a circle in 2-dim. space
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 22 Georges Grinstein, UMass Lowell and AnVil Inc. 25

4
Iconic Displays Iconic Displays
Basic Idea: Visualization of the data values as Stick Figures (contd)
features of icons.

Overview

used by permission of G. Grinstein, University of Massachusettes at Lowell


Chernoff-Faces [Che 73, Tuf 83]
5-dim. Image
data from the
Stick Figures [Pic 70, PG 88] great lake region

Shape Coding [Bed 90]

Color Icons [Lev 91, KK 94]

TileBars [Hea 95]


(a use of small icons representing the relevance feature
vectors in document retrieval)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 26 Georges Grinstein, UMass Lowell and AnVil Inc. 29

Iconic Displays Iconic Displays


Chernoff-Faces [Che 73, Tuf 83]
H. Chernoff, according to [Che 73]

ExVis
Census
Icons

visualization of the multidim. data using the properties of a face icon


(shape of nose, mouth, eyes, and the shape of the face itself)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 27 Georges Grinstein, UMass Lowell and AnVil Inc. 30

Iconic Displays Iconic Displays


Stick Figures [pic 70, PG 88]
visualization of the multidim. data using the properties of a face icon

two attributes of the data are mapped to the display axes and the
remaining attributes are mapped to the angle and/or length of the limbs
texture patterns in the visualization show certain data
characteristics
Census
data image

Stick Figure Icon A Family of Stick Figures


Outliers
Trends
Clusters
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 28 Georges Grinstein, UMass Lowell and AnVil Inc. 31

5
Iconic Displays Iconic Displays
Stick Figures (contd) Color Icons [Lev 91, KK 94]
visualization of the data using color icons
color icons are arrays of color fields representing the
attribute values
arrangement is query-dependent (e.g., spiral)

schematic
representation
of 6-dim. data attr. 1 attr. 2 attr. 3

attr. 4 attr. 5 attr. 6

properties of the triangulation of molecute data

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 32 Georges Grinstein, UMass Lowell and AnVil Inc. 35

Iconic Displays Iconic Displays


Shape Coding [Bed 90] Color Icons (contd)
the data are visualized using small arrays of fields
each field represents one attribute value
arrangement of attribute fields (e.g., 12-dimensional data):

random data
attr. 2 attr. 3 containing several
attr. 1 attr. 4 clusters

attr. 6 attr. 7

attr. 5 attr. 8
attr. 9 attr. 12
attr. 10 attr. 11

arrays are arranged line-by-line according to a given sorting


(e.g., the time attribute for time-series data)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 33 Georges Grinstein, UMass Lowell and AnVil Inc. 36

Iconic Displays Iconic Displays


Shape Coding (contd) Color Icon

FBI Homicide Data


time series of
NASA earth
observation data
used by permission of J. Beddow

14 Dimensions

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 34 Georges Grinstein, UMass Lowell and AnVil Inc. 37

6
Iconic Displays Dense Pixel Displays
Color Icon Basic Idea
each attribute value is represented by one colored pixel
(the value ranges of the attributes are mapped to a fixed colormap)
used by permission of G. Grinstein, University of Massachusettes at Lowell

the attribute values for each attribute are presented in separate


subwindows
Fused MRI
Image data example:
attribute 1 attribute 2 attribute 3

visualization of
attribute values
six-dim. data of a data item

attribute 4 attribute 5 attribute 6

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 38 Georges Grinstein, UMass Lowell and AnVil Inc. 41

Iconic Displays Four Questions


3D Color Icon
How should the pixels be arranged
Dimension 1 Dimension 2 Dimension 3

within the subwindows?


used by permission of G. Grinstein, University of Massachusettes at Lowell

Dimension 4 Dimension 5 Dimension 6

Fused MRI
Image data
Are alternative shapes of the subwindows possible?

How can an appropriate ordering of the dimensions


be achieved?

What can be done with geometry-related data?

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 39 Georges Grinstein, UMass Lowell and AnVil Inc. 42

Dense Pixel Displays Arrangement of Pixels


Task: data exploration and analysis

Principle:
very large amounts of multidimensional data
use the perceptual abilities of humans
Given: Ordered Set of n data items{a1, ..., an}
consisting of k data values each ( a ,L , a )
1
i
k
i

adequate presentation of as much information as possible


Goal: Two-dim. arrangement of the data values,
Goal: use each pixel of the display to visualize one data value i.e. bijective mapping f: {1...n} {1...b}{1...h}
about 1.3 million data values may be displayed at one point (n b * h), such that the function
of time

(b )
n n
Idea: map each data value to a colored pixel and arrange them d(f ( i) , f ( j) ) - d ( 0 ,0 ) , i- j
n
,h
i- j
n
adequately i = 1 j= 1

pixel-oriented visualization techniques


is minimal, where d(f (i), f (j)) is the Lp-distance
(p=1,2) of the pixels belonging to ai and aj
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 40 Georges Grinstein, UMass Lowell and AnVil Inc. 43

7
Example of a Structured Arrangement
IBM DOLLA R IBM DOLLAR Column 1 Column 2 Column 12
...

Row 1

...
Row 2

...

...

...

...
...

Row 8
DOW JONES GOLD .US$ DOW JONES GOLD.US$
(w1, h1) =(3, 3), (w2, h2) = (5,1), (w3, h3) =(1, 4), (w4, h4) = (12, 1) and (w5, h5) = (1, 8)
Peano-Hilbert Morton (Z-Curve)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 44 Georges Grinstein, UMass Lowell and AnVil Inc. 47

IBM Dollar

Dow Jones Gold Price


Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 45 Georges Grinstein, UMass Lowell and AnVil Inc. 48

FAZ Stock Index (Jan. 74 - Apr. 95)


Recursive Pattern Technik
Idea: recursive generalization of line- and
column-oriented arrangements
semantic arrangement by allowing user
interaction to determine the height hi and
width wi for each recusion level

Algorithm for recursion level i:


Draw wi pattern of recursion level (i-1)
in left-right direction and repeat this hi times
in top-down direction

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 46 Georges Grinstein, UMass Lowell and AnVil Inc. 49

8
FAZ Aktien Index (Jan. 74 - Apr. 95) 50 Stocks of the FAZ Stock Index

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 50 Georges Grinstein, UMass Lowell and AnVil Inc. 53

Shape of Subwindows Comparison of Circle Segments and


Recursive Pattern Technique
Idea: Pixel, which belong to the k data values of the same
data item, should have a small distance.
Dimension 1 Dimension 2 Dimension 3

Goal: Shape of the Subwindows, such that


n
( )
k k Dimension 4 Dimension 5 Dimension 6

n k
1 1 d f (a ik ), f (a kj ) is minimal
k =1 i = 1 j= 1
i
( j
)
where d f (a k ), f (a k ) is the Lp-distance
i
of two pixels a k and a kj belonging to two
different dimensions
Circle Segments Recursive Pattern
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 51 Georges Grinstein, UMass Lowell and AnVil Inc. 54

Geometry-related Visualizations
Shape of Subwindows
Two Possibilities for the Shape of the Subwindows: Task: Visualizing Spatial Data
1. two-dim. array of rectangles
Problem: Overlap in the Visualization
2. segmented circle
Example: Lightning Strikes in Southern Germany

Recursive Pattern & Spiral Technique Circle Segments Technique


Dimension 1 Dimension 2 Dimension 3
Dimension 1
Dimension 6

one data item


Dimension 2

one data item

Dimension 5

Dimension 3
Dimension 4 Dimension 5 Dimension 6
Dimension 4

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 52 Georges Grinstein, UMass Lowell and AnVil Inc. 55

9
The VisualPoints System Stacked Displays
Dimensional Stacking [contd]

Visualization of
oil mining data
with longitude
and latitude
mapped to the
outer x-, y-axes
and ore grade
and depth
mapped to the
inner x-, y-axes

Application of
Gridfit Alg. to
Telcom Data Used by permission of M. Ward, Worcester Polytechnic Institute

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 56 Georges Grinstein, UMass Lowell and AnVil Inc. 59

Stacked Displays Stacked Displays


Basic Idea: Visualization of the data using a hierarchical
Treemap [contd]
partitioning into subspaces.

Overview

Dimensional Stacking [LWW 90]


Worlds-within-Worlds [FB 90a/b]
Treemap [Shn 92, Joh 93]
Cone Trees [RMC 91]
InfoCube [RG 93] Used by permission of B. Shneiderman, University of Maryland Used by permission of B. Shneiderman, University of Maryland

Venn Diagram Tree-Map

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 57 Georges Grinstein, UMass Lowell and AnVil Inc. 62

Stacked Displays Stacked Displays


Dimensional Stacking [LWW 90] Treemap [contd]
Treemap of a
file system
attribute 4 containing
attribute 2 about 1000
files
attribute 3

attribute 1
partitioning of the n-dimensional attribute space in
2-dimensional subspaces which are stacked into
each other
partitioning of the attribute value ranges into classes
the important attributes should be used on the outer levels
adequate especially for data with ordinal attributes of low
cardinality Used by permission of B. Shneiderman, University of Maryland

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 58 Georges Grinstein, UMass Lowell and AnVil Inc. 63

10
Stacked Displays Comparison of the Techniques
Cone Trees [RMC 91, CK 95] Comparsion: An Attempt
cluster- multi- no. of no. of cate- visual learning
animated 3 D Visualizations of hierarchical data ing variate variates data gorical overlap curve
hot spot items data
Scatterplot Matrices ++ ++ + + - o ++
Landscapes + + - o o + +
File system Geometric
structure Prosection Views ++ ++ + + - o +
Transformations
visualized as Hyperslices + + + + - o o
a cone tree Parallel Coordinates o ++ ++ - o -- o
Stick Figures o o + - - - o
Iconic Displays Shape Coding o - ++ + - + -
Color Icon o - ++ + - + -
Query-Independent + + ++ ++ - ++ +
Pixel Displays
Query-Dependent + + ++ ++ - ++ -
Dimensional Stacking + + o o ++ o o
Worlds-within-Worlds o o o + o o o
Stacked
Treemaps + o + o ++ + o
Displays
Cone Trees + + o + o + +
InfoCube o o - - o o +
Used by permission of S. Card, Xerox Parc

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 64 Georges Grinstein, UMass Lowell and AnVil Inc. 68

Hybrid Techniques Overview


Basic Idea: Integrated use of multiple techniques in one or multiple Part I: Visualization Techniques
windows to enhance the expressiveness of the 1. Introduction
visualizations. 2. Visual Data Exploration Techniques
linking diverse visualization techniques may 3. Distortion and Interaction Techniques
4. Visual Data Mining Systems
provide additional information
virtually all visualization techniques are Part II: Specific Visual Data Mining Techniques
1. Association Rules
combined with dynamics & interactivity
2. Classification
3. Clustering
Examples:
4. Text Mining
IVEE [AW 95a/b] uses Starfield Displays [AS 94] which are scatterplots 5. Tightly Integrated Visualization

of icons with dynamic zooming and mapping (combination of geometric, Part III: Drug Discovery Applications
icon-based, and dynamic techniques) 1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
XmDv [War 94] allows to dynamically link and brush scatterplot matrices,
3. Examples
star icons, parallel coordinates, and dimensional stacking combination of
4. Bioinformatics Packages
geometric, icon-based, hierarchical and dynamic techniques)
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 66 Georges Grinstein, UMass Lowell and AnVil Inc. 69

Comparison of the Technique Distortion and Interaction Techniques


Criteria for Comparison [KK 96]
comparison of the described information visualization Projection
techniques based on their suitability for certain
data characteristics Filtering
(e.g., no. of variates, no. of data items, categorical data, ...)
task characteristics Zooming
(e.g., clustering, multi variate hot spots, ...)
visualization characteristics Linking and Brushing
(e.g., visual overlap, learning curve, ...)

Disclaimer: The following comparison table expresses my personal opinion obtained from reading the literature and Distortion
experimenting with several of the described techniques. Many of the ratings are arguable and largely depend on the
considered data, the exploration task, experience of the user, etc. In addition, implementations of the techniques in
real systems usually avoid the drawbacks of a single technique by combining it with other techniques, which is also
not reflected in the ratings.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 67 Georges Grinstein, UMass Lowell and AnVil Inc. 70

11
Interactive Projections Interactive Filtering

dynamic or interactive variation of the projections Filter-Flow Model [YS 93]


visualization of the remaining parameters in 2D or 3D selection based on a dataflow-oriented model:
the data flows through filter-units which reduce the flow
automatic variation results in an animation of the data
examples: GrandTour [Asi 85] especially useful for an intuitive specification of
complex boolean queries:
S Plus [BCW 88]
XGobi [SCB 92, BCS 96] AND-connected query portions may be specified using
Influence & Attribute Explorer [STDS 95, multiple filter units in a pipeline fashion
SDTS 95] OR-connected query portions may be specified using
... multiple independent flows which reunite into a single
bigger flow

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 71 Georges Grinstein, UMass Lowell and AnVil Inc. 74

Interactive Filtering Interactive Filtering


Filter-Flow Model [YS 93]
dynamic or interactive determination of subsets of the
database Complex boolean
distinction between query:
selection: direct selection of the desired subset Find the
querying: specification of properties of the desired subset accountants or
specific problem: specification of complex boolean engineers from
conditions Georgia who are
managed by
examples: Magic Lenses [Bie 93] / Moveable Filter [FS 95] Elizabeth or the
Filter-Flow Model [YS 93] clerks from Georgia
InfoCrystal [Spo 93] who make more
DEVise [Liv 97] than 30.000!
Dynamic Queries [AS 94, Eic 94, GR 94]
...
Used by permission of B. Shneiderman, university of Maryland
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 72 Georges Grinstein, UMass Lowell and AnVil Inc. 75

Interactive Filtering Interactive Filtering


Magic Lenses / Moveable Filter InfoCrystal [Spo 93]
[Bie 93, SFB 94, FS 95]
specification of complex boolean queries using an intuitive
interactive selection using lens-like tools which selectively model for specifying complex subsets
filter the data in the considered areas
basic idea:
multiple lenses / moveable filters can be used for a A
multi-level filtering (allowing complex conditions)

A 1

3
C B 2 4

5 6 7

Xerox PARC
C B
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 73 Georges Grinstein, UMass Lowell and AnVil Inc. 76

12
Interactive Zooming Interactive Linking and Brushing
visualization of large amounts of data in reduced form
to provide an overview of the data prerequisite: multiple visualizations of the same data
(e.g., visualizations of different projections)
variable zooming of the data with automatic changes
of the visualization modes to present more details interactive changes made in one visualization are
automatically reflected in the other visualizations
examples: PAD++ [PF 93, Bed 94, BH 94]
IVEE [AW 95a/b] examples: Xmdv-Tool [War 94]
DataSpace [ADLP 95]
S Plus [BCW 88]
...
XGobi [SCB 92, BCS 96]
a comparison of fisheye and zooming techniques DataDesk [Vel 92, WUT 95]
can be found in [Sch 93] ...

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 77 Georges Grinstein, UMass Lowell and AnVil Inc. 80

Interactive Zooming Interactive Linking and Brushing


IVEE / Spotfire [AW 95a/b] XGobi [SCB 92, BSC 96]
Used by permission of A. Buja, Bell Labs, Am. Stat. Assoc

taken from [BSC 96]


.

used by permission of C. Ahlberg, IVEE Development


climate and housing data of the US

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 78 Georges Grinstein, UMass Lowell and AnVil Inc. 81

Interactive Zooming Distortion Techniques


InfoZoom [Hum 01]
Basic Idea: Distortion of the image to allow a visualization
of larger amounts of data
web-pages Overview [LA 94]
of the
German Perspective Wall MRC 91]
Automobile Bifocal Displays [SA 82]
Association
TableLens [RC 94]
(ADAC)
Graph. Fisheye Views [Fur 86, SB 94]
Hyperbolic Repr. [LR 94, LRP 95]

Hyperbolic Repr. [LR 94, LRP 95]


3D-Hyperbolic Repr. [MB 95]
used by permission of H.-G. Lindner, Human IT Hyperbox [AC 91]
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 79 Georges Grinstein, UMass Lowell and AnVil Inc. 82

13
Distortion Techniques Distortion Techniques
Fisheye View [Fur 86, SB 94]
Perspective Wall [MRC 91]

used by permission of G. Furnas, University of Michigan


presentation of the data on a perspective wall

the data outside the focal area are


perspectively reduced in size

the perspective wall is a variant of the bifocal lens display


[SA 82] which horizontally compresses the sides of the
workspace by direct scaling
original graph fisheye view of the graph
shows an area of interest quite large and with detail and the
other areas successively smaller and in less detail
graph visualization using a fisheye perspective
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 83 Georges Grinstein, UMass Lowell and AnVil Inc. 86

Distortion Techniques Distortion Techniques


Perspective Wall (contd) Hyperbolic Trees [LR 94, LRP 95]

documents
arranged on a
visualization
perspective of a large
wall organizational
hierarchy

used by permission of R. Rao, Xerox PARC

used by permission of S. Card, Xerox Parc visualization of tree structure in hyperbolic space with different foci
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 84 Georges Grinstein, UMass Lowell and AnVil Inc. 87

Distortion Techniques Distortion Techniques


Hyperbolic Trees [LR 94, LRP 95]
Table Lens [RC 94]

Visualization of a
used by permission of T. Munzner, Stanford University

baseball database visualization


with a few rows of a large
being selected in number of
full detail connected
web-pages

used by permission of R. Rao, Xerox PARC

compact visualization of a table (spreadsheet/ database) with the visualization of a graph in 3D hyperbolic conetree-like representation
possibility of viewing portions of the table in more details
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 85 Georges Grinstein, UMass Lowell and AnVil Inc. 88

14
Distortion Techniques Visual Data Mining Systems
Statistics-oriented Systems
Hyperbox [AC 91]
S Plus [BCW 88] / Trellis [BCS 96]
(J generic system for statistical analysis and visualization)

Parallel XGobi [XGobi, SCB 92, BSC 96]


processing (J extensible lisp-based system for statistical analysis and visualization)
used by permission of B. Alpern, IBM Watson Research Center

performance Data Desk [Vel 92, WUT 95]


data visualized (J commercial system for statistical analysis and visualization;
as a hyperbox features: dyn. linking & brushing of scatterplots and histograms)
Diamond (SPSS)
(J commercial system for statistical analysis and visualization;
features: dyn. linking & brushing of scatterplots, parallel coordinates, etc.)
DataSpace [ADLP 95]
(J 3D-arrangement of a large number of arbitrary visualizations)
mapping of scatterplots onto a hyperbox
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 89 Georges Grinstein, UMass Lowell and AnVil Inc. 92

Overview Visual Data Mining Systems


Part I: Visualization Techniques
Visualization-oriented Systems
1. Introduction ExVis [GPW 89]
2. Visual Data Exploration Techniques ( features: stick figure and other icon-based techniques)
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems Parallel Visual Explorer (IBM)
( features: parallel coordinate technique with query-based coloring, etc.)
Part II: Specific Visual Data Mining Techniques
1. Association Rules XmDv [War 94, MW 95]
2. Classification ( features: scatterplot matrices, star icons, parallel coordinates,
3. Clustering
dimensional stacking, dynamic linking and brushing)
4. Text Mining
5. Tightly Integrated Visualization Influence & Attribute Explorer [STDS 95, SDTS 95]
Part III: Drug Discovery Applications ( features: scatterplot and prosection matrices, histograms,
1. Biology and Chemistry dynamic linking and brushing)
2. Bioinformatics and Cheminformatics
Information Visualizer (Xerox) [HC 86, CRY 96]
3. Examples
4. Bioinformatics Packages ( features: diverse information visualization techniques including
5. Cheminformatics Packages perspective wall, table lens, cone trees)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 90 Georges Grinstein, UMass Lowell and AnVil Inc. 93

Visual Data Mining Systems


Visual Data Mining Systems
Database-oriented Systems
Hy+ [CM 93]
Overview ( features: query and visualizations of hygraphs)
TreeViz [Joh 93]
Statistics-oriented Systems ( features: treemap technique)
VisDB [KK 94, KK 95]
Visualization-oriented Systems ( system for interactive slider-based exploration of very large databases
features: stick figure, parallel coordinate, and pixel-oriented techniques)
Database-oriented Systems IVEE [AW 95a/b] / Spotfire
( commercial system for database exploration;
Special Purpose Visualization Systems features: generic interactive slider-based visualization environment)
DEVise [Liv 97]
( system for the generation of interactive special purpose
database visualizations)
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 91 Georges Grinstein, UMass Lowell and AnVil Inc. 94

15
Visual Data Mining Systems The Humans Role (I)
Special Purpose Visualization Systems
Typical KDD Process
Software & Algorithm Visualization
(e.g., SeeSoft [ESS 92, BE 96] - a listing of Software & Algorithm User provides domain knowledge
Interfaces can be found under
Focus Pre-
Pre- Trans-
Trans- Data Evaluation
http://wwwbroy.informatik.tu-muenchen.de/~trilk/sv.html formation Mining
processing of patterns
for an overview paper see [SP 92])
Web Visualization
(e.g., Narcissus [HDWB 95], WebBook and WebForager [CRY 96] Knowledge
Database
- a listing of Web Visualization Interfaces can be found under
http://www.geog.ucl.ac.uk/casa/martin/geography_of_cyberspace.html)
Trial and
Visual Information Retrieval
Iterate ! Error ?
(e.g., Vibe [Ols 93] - a bibliography of Information Retrieval Interfaces can
be found under http://www.pitt.edu/~korfhage/viri_bib.htm;
for an overview paper see [Car 96])
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 95 Georges Grinstein, UMass Lowell and AnVil Inc. 98

Overview The Humans Role (II)


Part I: Visualization Techniques Human-centered KDD Process
1. Introduction
2. Visual Data Exploration Techniques User provides domain knowledge
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems
Focus Pre-
Pre- Trans-
Trans- Data Evaluation
Part II: Specific Visual Data Mining Techniques processing formation Mining of patterns
1. Association Rules
2. Classification
3. Clustering
4. Text Mining Database Knowledge
5. Tightly Integrated Visualization

Part III: Drug Discovery Applications Input of


1. Biology and Chemistry
Iterate ! domain
2. Bioinformatics and Cheminformatics knowledge
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 96 Georges Grinstein, UMass Lowell and AnVil Inc. 99

Specific Visual Data Mining Data Mining <-> Visualization


Techniques

Association Rules Data Mining Algorithms Visualization

Classification Actionable + -
Clustering Evaluation + -
Text Mining Flexibility - +
Tightly Integrated Visualization User Interaction - +

Visual Data Mining


Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 97 Georges Grinstein, UMass Lowell and AnVil Inc. 100

16
Human Involvement Association Rules
When ?

Definitions:
Right before the data mining step
Display initial data I = {i1,...,im} I Items,
Focus on/ narrow relevant search space
tI t Transactions,
D = {t1,...,tN}, ti I D Database,
During the data mining step X, Y I
Display intermediate results {t D : X t}
Direct the search
Support of X, s(X): D
After the data mining step
Confidence of X and Y, c(X,Y): s(X Y )
s(X )
Display the result

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 101 Georges Grinstein, UMass Lowell and AnVil Inc. 104

Overview Association Rules

Problem description:
Data
Data Data
Visualization + Interaction

DM-Algorithm
Visualization of
Visualization of step 1 Find all association rules X -> Y with
the data DM-Algorithm
the result

DM-Algorithm
step n
s(XY) smin
DM-Algorithm Result
and c(X, Y) cmin
Visualization of Result
Result the result

Knowledge Knowledge Knowledge

Preceding Subsequent Tightly integrated


Visualization (PV) Visualization (SV) Visualization (TIV)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 102 Georges Grinstein, UMass Lowell and AnVil Inc. 105

Subsequent Visualization:
Overview Association Rules
Part I: Visualization Techniques
1. Introduction Rule Visualizer (MineSet) [Min 01]
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems

Part II: Specific Visual Data Mining Techniques


1. Association Rules
2. Classification LHS and RHS items are mapped to
3. Clustering
x-, y-axis
4. Text Mining Confidence, support correspond to
5. Tightly Integrated Visualization height of the bar or disc, respectively

Part III: Drug Discovery Applications Interestingness is mapped to Color


1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 103 Georges Grinstein, UMass Lowell and AnVil Inc. 106

17
Subsequent Visualization: Subsequent Visualization:
Association Rules Association Rules
Rule Visualizer (contd)
Market basket Analysis Visualizer [HHD+ 01]
Initialization

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 107 Georges Grinstein, UMass Lowell and AnVil Inc. 110

Subsequent Visualization: Subsequent Visualization:


Association Rules Association Rules
Rule-to-item visualization [WWT 99] MAV contd
Relaxation
Limitations of item-to-item visualizations:

A->C, B->C or A+B -> C ? Object occlusion <-> metadata

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 108 Georges Grinstein, UMass Lowell and AnVil Inc. 111

Subsequent Visualization: Subsequent Visualization:


Association Rules Association Rules
Rule-to-item visualization [WWT 99]
MAV contd
Navigation

Items are mapped to


one axis
Rules can be sorted
according to a criterion

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 109 Georges Grinstein, UMass Lowell and AnVil Inc. 112

18
Subsequent Visualization:
Association Rules Overview
Part I: Visualization Techniques
Association Ball Graph (DBMiner) [DBM 01] 1. Introduction
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems

Part II: Specific Visual Data Mining Techniques


1. Association Rules
Items are visualized 2. Classification
as balls 3. Clustering
Arrows indicate rule 4. Text Mining
implication 5. Tightly Integrated Visualization
Size represents Part III: Drug Discovery Applications
support 1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 113 Georges Grinstein, UMass Lowell and AnVil Inc. 116

Subsequent Visualization:
Classification
Association Rules
Interactive Mosaic Plots [HSW 00]
Problem description:
Given a set of objects with known class
labels.
Age Description
a Build model describing the data
a a
a with respect to the class
a b b
a Prediction
b
b b b
Use model to predict the class
label of objects
Height

Visualization of contingency table of attributes within a rule


Recursive height/width splitting

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 114 Georges Grinstein, UMass Lowell and AnVil Inc. 117

Subsequent Visualization:
Classification
Association Rules
Double Decker Plots [HSW 00]
Decision Tree

age < 35

Age Salary Sex Class


25 15 M Y
42 40 M N
Y sal < 67
29 63 F Y
81 45 F N
57 89 M Y

Recursive width splitting Y N

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 115 Georges Grinstein, UMass Lowell and AnVil Inc. 118

19
Subsequent Visualization: Subsequent Visualization:
Classification Classification
Decision Tree Visualizer (MineSet) [Min 01]
SAS EM [SAS 01]
Tree Viewer

Color corresponds to relative frequency of a class in a node


Branch line thickness is proportional to the square root of the objects

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 119 Georges Grinstein, UMass Lowell and AnVil Inc. 122

Subsequent Visualization: Subsequent Visualization:


Classification Classification
Option Tree Visualizer (MineSet) [Min 01] SAS EM [SAS 01]

Tree Ring Tree Map

Color corresponds to relative frequency of a class in a node


Number of objects in a node are reflected proportionally

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 120 Georges Grinstein, UMass Lowell and AnVil Inc. 123

Subsequent Visualization: Subsequent Visualization:


Classification Classification
SPSS AnswerTree [SPS+ 01] Visual Classification [AEK 00]

Level 1
Level 2
...

leaf

split point
inherited split point

Level 18

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 121 Georges Grinstein, UMass Lowell and AnVil Inc. 124

20
Subsequent Visualization:
Cluster Analysis
Classification
ParallAX [IA 00]

Cluster analysis Problem description:


Given a set of objects.
Debt ..
.. . ..
Group data into clusters so that
.. . . . . objects within a cluster are

. .. very similar

.. . . objects not in the same


cluster are dissimilar

Income

Select and order subset of predicting attributes


Visualize the result based on the parallel coordinates technique

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 125 Georges Grinstein, UMass Lowell and AnVil Inc. 128

Subsequent Visualization: Subsequent Visualization:


Classification Cluster Analysis
WinViz [LO 96] 3D Dynamic Projections [Yan 00]

3D subspace is determined by
centroids of 4 clusters 0, 1, 3, 5
Projection preserves inter-cluster
distances
Projection-determining cluster
centroids are visualized as big spheres
Other cluster centroids are
represented as small cubes

Left of attribute value: width of box indicates number of objects


Right of attribute value: class histograms

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 126 Georges Grinstein, UMass Lowell and AnVil Inc. 129

Subsequent Visualization:
Overview Cluster Analysis
Part I: Visualization Techniques
1. Introduction 3D Dynamic Projections [Yan 00]
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems Volume rendering (by splatting) of multi-
dimensional volume data to overcome clutter
Part II: Specific Visual Data Mining Techniques
1. Association Rules
2. Classification
3. Clustering
4. Text Mining
5. Tightly Integrated Visualization

Part III: Drug Discovery Applications


1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 127 Georges Grinstein, UMass Lowell and AnVil Inc. 130

21
Subsequent Visualization: Subsequent Visualization:
Cluster Analysis Cluster Analysis
VxInsight [VXI 02]
3D Dynamic Projections [Yan 00]

Cluster similarity graph can be overlaid on Clusters are visualized as hills


to data projections
SQL query to database
User-defined threshold for distance
between two cluster centroids Multi-resolution exploration

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 131 Georges Grinstein, UMass Lowell and AnVil Inc. 134

Subsequent Visualization: Subsequent Visualization:


Cluster Analysis Cluster Analysis

H-BLOB (Hierarchical BLOB) [SBG 00] OPTICS [ABKS 99]


Motivation
OPTICS = Ordering Points To Identify the Clustering Structure

Insensitive to Parameters

Augmented Cluster Ordering

Reachability-distance: Basis for Interactive Cluster Analysis

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 132 Georges Grinstein, UMass Lowell and AnVil Inc. 135

Subsequent Visualization: Subsequent Visualization:


Cluster Analysis Cluster Analysis
H-BLOB (Hierarchical BLOB) [SBG 00] OPTICS [ABKS 99]
Cluster hierarchies are shown for 1,
5, 10 and 20 clusters The Reachability-
Reachability-Plot
{
{

Represents the density-based clustering structure


Easy to analyze
Independent of the dimension of the data

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 133 Georges Grinstein, UMass Lowell and AnVil Inc. 136

22
Subsequent Visualization:
Overview Text Mining
Part I: Visualization Techniques Starlight [STA 01]
1. Introduction
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems

Part II: Specific Visual Data Mining Techniques


1. Association Rules
2. Classification
3. Clustering
4. Text Mining
5. Tightly Integrated Visualization

Part III: Drug Discovery Applications


1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 137 Georges Grinstein, UMass Lowell and AnVil Inc. 140

Subsequent Visualization:
Text Mining
Text Mining
Shape-based Visual Interface for Text Retrieval [RSE 99]

Problem description: Document terms are features


Term vector proportional to term
Given unstructured documents. weight
Term vectors are spread evenly
Cluster documents about the sphere (for all documents)

Summarize documents Spherical density source field is


used to form a surface for each
Classify documents document

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 138 Georges Grinstein, UMass Lowell and AnVil Inc. 141

Subsequent Visualization: Subsequent Visualization:


Text Mining Text Mining
ThemeRiver [WCF+ 00]
Shape-based Visual Interface for Text Retrieval

Shape-based visualization of query result


(Macbeth query by example)
Visualization of thematic Changes in documents Close-up of King Lear
Vertical distance indicates collective strength of the themes
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 139 Georges Grinstein, UMass Lowell and AnVil Inc. 142

23
TIV:
Subsequent Visualization
Visual Classification [AEK 00]
... and a lot more ... !
attr.1 attr.2 ... class attr. 1 class attr. 2 class
0.3 23.3 ... Y 0.2 Y 0.5 N
2.4 2.0 ... N 0.3 Y 1.3 Y
e.g. Homepage of visualization group of PNNL: ...
0.3 Y 2.0 N
http://www.pnl.gov/InfoViz/technologies.html
0.5 N 2.5 Y

...

...

...

...
1.1 Y 5.1 N
or SOM-based Visualization:

...

...
...

...
http://www.cis.hut.fi/~juuso/
Each attribute is sorted and visualized separately
Each attribute value is mapped onto a unique pixel
The color of a pixel is determined by the class label of the object
The order is reflected by the arrangement of the pixels

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 143 Georges Grinstein, UMass Lowell and AnVil Inc. 146

TIV:
Overview Visual Classification [AEK 00]
Part I: Visualization Techniques
1. Introduction
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
... Pixel Arrangement
4. Visual Data Mining Systems Attribute 1
Part II: Specific Visual Data Mining Techniques ...
Attribute 2
1. Association Rules PBC Color Scale
2. Classification
3. Clustering
4. Text Mining
An Example:
Example:
n-ary split
5. Tightly Integrated Visualization

Part III: Drug Discovery Applications Attribute 1


1. Biology and Chemistry
2. Bioinformatics and Cheminformatics Attribute 2
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages binary split

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 144 Georges Grinstein, UMass Lowell and AnVil Inc. 147

Tightly Integrated Visualization TIV:


(TIV) Visual Classification [AEK 00]

Visualization of algorithmic decisions Shuttle data set


Data
Visualization + Interaction

DM-Algorithm Data and patterns are better understood


step 1

DM-Algorithm User can make decisions based on


step n perception

Segment data set


Result User can make decisions based on
domain knowledge

Knowledge
Visualization of result enables user
specified feedback for next algorithmic run

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 145 Georges Grinstein, UMass Lowell and AnVil Inc. 148

24
Visual Classification Visual Classification
Decision Tree Visualization for the Segment Dataset
A New Visualization of a Decision Tree

age < 35 Level 1


Level 2

...
leaf

split point
inherited split point

Level 18

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 149 Georges Grinstein, UMass Lowell and AnVil Inc. 152

TIV: HD-Eye [HKW 99]


Visual Classification
Combining a high-dimensional clustering algorithm with visualization
techniques
A New Visualization of a Decision Tree
The OptiGrid Algorithm

age < 35 1. Determine a set of contracting projections {P0,,Pk}


2. Determine the best q separators {H0,Hq} in the projections
3. If there are no good separators exit;
Otherwise:
4. Determine a multi-dimensional grid based on {H0,, Hk}
G 5. Find Clusters Ci in the grid by determining highly-populated
Salary grid cells
6. For each Ci: OptiGrid(Ci)
< 40 > 80
[40,80]
- > map problem space of finding projections and separators onto
visualization space

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 150 Georges Grinstein, UMass Lowell and AnVil Inc. 153

TIV: HD-Eye [HKW 99]


Visual Classification
Visualization techniques for finding contracting projections

A New Visualization of a Decision Tree

age < 35

G Salary

< 40 > 80
[40,80]

G V P

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 151 Georges Grinstein, UMass Lowell and AnVil Inc. 154

25
TIV: HD-Eye [HKW 99]
Overview
Visualization techniques for finding separators Part I: Visualization Techniques
1. Introduction
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems

Part II: Specific Visual Data Mining Techniques


1. Association Rules
2. Classification
3. Clustering
4. Text Mining
5. Tightly Integrated Visualization

Part III: Drug Discovery Applications


1. Biology and Chemistry
in 1D projections in 2D projections 2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 155 Georges Grinstein, UMass Lowell and AnVil Inc. 158

TIV: HD-Eye [HKW 99]


Biology 101 (Slide 1)
The HD-Eye system
Genomics is the study of the entire
genome sequence of an organism
Genes are turned on and off at different times in
response to different environmental and biological
factors
Gene expression determines for example
The differentiation of organs (ontogeny)
The susceptibility and onset to specific diseases

Proteomics is the study of the entire


protein output from all genes and
related activities
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 156 Georges Grinstein, UMass Lowell and AnVil Inc. 159

Summary Biology 101 (Slide 2)


Visual data mining architectures
Cells
Preceding Subsequent Tightly int.
fundamental working units of every living system
Visualization Visualization Visualization
Present/ display
DNA
Benefits of Visualization/ Interaction

patterns Arrangement of bases along a strand


Search problem space Genome
with perception Organisms complete DNA
Incorporate domain Gene
knowledge
Basic physical and functional unit of heredity
Provide trust and Chromosome
understandability of
patterns Physically separated linear molecules consisting of
genes
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 157 Georges Grinstein, UMass Lowell and AnVil Inc. 160

26
Biology 101 (Slide 3) Terminology

Information technology based


Active Site Bioinformatics
Cheminformatics (Chemoinformatics)
Computational technology based
Computational biology
Computational chemistry
Protein Lots of others
(Enzyme) Lots of overlap
DNA RNA
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 161 Georges Grinstein, UMass Lowell and AnVil Inc. 164

Chemistry 101 (Slide 1) Overview


Proteins Part I: Visualization Techniques
1. Introduction
Perform most life functions 2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
Make up majority of cellular structures 4. Visual Data Mining Systems
Made up of linear array of amino acids (from 20 Part II: Specific Visual Data Mining Techniques
different types) which folds into specific 3D structures 1. Association Rules
2. Classification
Proteome 3. Clustering
4. Text Mining
Collection of all proteins in a cell 5. Tightly Integrated Visualization

Part III: Drug Discovery Applications


1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 162 Georges Grinstein, UMass Lowell and AnVil Inc. 165

Chemistry 101 (Slide 2) State of the Landscape

Proteins include
Structural materials (muscle), enzymes (carry out chemistry),
Tremendous IT demands due to
receptors (recognize and release messengers), antibodies Clean room robotics
(recognize foreign objects), peptides (small proteins that act Nanotechnology
as hormones and messengers)
High performance computing
Proteins carry out most of the biochemical processes
of the living organism Few tools integrate analysis and
Some are bound to DNA, RNA, and other molecules forming visualization
very large macromolecules
They must be present at a the right time in the right
Very few tools integrate biology and
amount chemistry!
Both present a great opportunity

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 163 Georges Grinstein, UMass Lowell and AnVil Inc. 166

27
Classic Informatics as IT Current Bioinformatics and
Applied to Biology and Chemistry Computational Biology
Information management, access, Information mining
manipulation, and retrieval More mature than similarity querying
Biology and Chemistry guides and textbooks Sequence similarity
Journal and database access Expression array analyses
Searches Lots of analysis tools
Author, Corporate Name, Subject, Patent, Gene Statistical
Name (various), SNP, Chemical Name and Formula, Machine learning
Structure, Analytical Chemistry, Physical Property, Some visualizations
Chemical Synthesis or Reactions, Chemical Safety or
Toxicology Information, Few integrated analysis and visualizations
Exact and similarity matches

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 167 Georges Grinstein, UMass Lowell and AnVil Inc. 170

Modern Computational Informatics Cheminformatics Goals


Predict a compounds benefit and liabilities
Computations (ADMET) as early as possible
Select (predict)
Most promising lead candidates
Based on genomic, biological, and chemical
Simulations Models databases
Run simulations and experiments
And then To validate drug classes and activities
Search results
Mine results

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 168 Georges Grinstein, UMass Lowell and AnVil Inc. 171

Current Computational Chemistry and


Cheminformatics Goals (2)
Cheminformatics
Lots of 3D modeling and visualizations
Queries, analyses, predictions Target compounds to a specific protein
Structures from NMR and X-ray Induce binding
Physical properties Modulate amount
Chemical synthesis or reactions (chemical and Inhibit binding or produce competition
enzymatic) Identify proteins responsive to a
Chemical safety or toxicology compound (or classes of proteins)

Identify compounds responsive to a
Lots of algorithms
specific protein (or classes of proteins)
Information mining
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 169 Georges Grinstein, UMass Lowell and AnVil Inc. 172

28
Bioinformatics Goals (1) Others: Macromolecular Structure
Generate high-quality reference
Determine the 3D structure of a molecule
sequences for the entire human genome
Four approaches
Identify X-ray crystallography
All human genes Nuclear magnetic resonance
Their variants Mass spectrometry
Gene control elements and networks Computational chemistry
Heuristics
Haplotypes Molecular mechanics
Sequence and identify the genomes of all Quantum mechanics

living organisms Lots of new tools (docking programs, pattern


discovery, database searching, QSAR, )
Study human variations

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 173 Georges Grinstein, UMass Lowell and AnVil Inc. 176

Bioinformatics Goals (2) Integrated Analysis and Visualization Systems

Molecular medicine
Gene tests KXEN R, S-Plus
Gene therapy NeuroGenetic SAS
Create drugs based on molecular information Optimizer Matlab
Risk assessment AnswerTree CrossGraphs
Clementine Intelligent Miner
Bioarcheology Alice DecisionSite
DNA identification CART, MARS Partek
Cubist SOMine
Agriculture and livestock breeding
See5 (C5.0) OmniVis
Bioprocessing
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 174 Georges Grinstein, UMass Lowell and AnVil Inc. 177

Others: Proteomics Bio/Cheminformatics Tools

Isolate, identify, determine proteins Examples


Sequence matching
Determine function, interactions and
Gene expression analysis
pathways
QSAR and 3D-QSAR modeling
Key issues ADME and toxicology prediction
Large number of proteins (many more than genes) Pattern recognition
One gene can form possibly 8 alternative spicing variants Molecular similarity analysis
The proteome varies continuously in each of the 200 or so
types of cells over time Diversity analysis
Population
Some enzymes cut and rearrange proteins!
Structure
Many proteins interact indirectly via messengers
Post translational modifications (5 per protein)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 175 Georges Grinstein, UMass Lowell and AnVil Inc. 178

29
Overview Simple Array Data
Part I: Visualization Techniques Each assay provides expression of genes
1. Introduction
2. Visual Data Exploration Techniques
3. Distortion and Interaction Techniques
Each measures different conditions
4. Visual Data Mining Systems
The array is a matrix of images which are
Part II: Specific Visual Data Mining Techniques
1. Association Rules converted to a matrix of numbers
2. Classification
3. Clustering Informatics activities
4. Text Mining
Manipulate, convert, store, validate, compute, identify, search,
5. Tightly Integrated Visualization
mine, present,
Part III: Drug Discovery Applications
1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 179 Georges Grinstein, UMass Lowell and AnVil Inc. 182

Computations
Statistics that are meaningful are hard to
get need replicates
Bioinformatics Goal is to come up with hypotheses
Specific Example: Classes, clusters, relationships, outliers

In many cases only a small number of


Gene Expression Array genes change expression levels
significantly
Lots of noise
Some missing values
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 183

Array Instrumentation Define Expression Levels


Array chips Threshold differences
Silicon-based, Oligos, 10,000+ genes
Custom or potted cDNA arrays, glass, 1,000-5,000+ genes
N-fold ratio change
PE glass beads Statistics (f, T, non-parametric)
Fiber optics T assumes normal distributions
Lots of other arrays Large data set requires Bonferroni correction
Too small (p = .05, 10,000 genes, yields p/tests=
.05/10,000 = .000005)

Use novel techniques


Use known-data

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 181 Georges Grinstein, UMass Lowell and AnVil Inc. 184

30
Compute Cancer Classifier
Analyze and mine 38 Training Patients

Define similarity and other metrics


Training Data Set
Use Euclidean or other cost functions
Reduce dimensionality (sigh)

Use external information (databases, 17

structures, domain experts) 9

Predict or classify or
Cluster 27

Identify associations
21
Delineate outliers Legend:
20 [Class]
ALL
AML

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 185 Georges Grinstein, UMass Lowell and AnVil Inc. 188

In Summary Cancer Classifier

34 Test Patients
Test Data Set
Normal

Tissue
Group 1
Samples

Group 2

Analysis and Massive Data


Sets
Visualization Legend:
[Class]
ALL
AML

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 186 Georges Grinstein, UMass Lowell and AnVil Inc. 189

Golub & Slonim et al. Class Discovery & Identifying Gene Targets

Molecular Classification of Cancer: Class GA reduction to


Discovery and Class Prediction by Gene 3 genes
20
Expression Monitoring Golub & Slonim, et al., 2
(1999) SCIENCE 286 531-537. 1. KIAA0102 gene 27
2. IGB Immunoglobulin-
associated beta 21
Whitehead Institute/MIT Center for Genomic Research
(B29)
3. LEPR Leptin receptor 1
ALL - Acute Lymphoblastic Leukemia
AML - Acute Myeloid Leukemia Legend:
9 [Class]
B ALL
ALL
3 T ALL
Training 17
AML
Set

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 187 Georges Grinstein, UMass Lowell and AnVil Inc. 190

31
Picking Best Answers

NN SVM NB LR K

5 Highest & 5 Lowest GS Values

76 Principal Uncorrelated Genes 0 0 0


Cheminformatics
114 Principal Uncorrelated Genes 0 0 0 Specific Example:
35 Absent/Presented Genes 1
63 RadViz Selected Genes

39 User Selected Genes


Structural Predictive Toxicology
Zyxin Gene (only)

2 GA (Reduced 6817) Genes

0 1 2 3 +4 Number of Miss-classified Test Samples

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 191

PURS Cancer Classifier Gene Expression and Drug Toxicity

Training
Test Data
Data
Set
Set
Technical Objective
p< 0.10 Find marker genes informing on drug hepatotoxicity by animal dose
Visual 7 genes
and gender.
Verification
p < 0.05 of
6 genes
PURS Gene
p < 0.01 4 genes Clients Data
Selection 2 genes
p < 0.001
Affymetrix DNA microarray data
Dose and gender correlations to PK/TK and histopathology
Pilot study with experimental design problems and noisy data
Legend:
[Class]
ALL
AML

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 192 Georges Grinstein, UMass Lowell and AnVil Inc. 195

Testing the Classifier Example Predicative Toxicology


100,000 chemicals (records)
Virtaneva, Wright, Tanner et al. (2001) PNAS 98(3) 1124-
1129 76 PURS Selected Genes 280 data fields (variables)
Normal cytogenetics AML in 55% of cases 1 biological assay
38 Golub Training

34 Golub Testing

Trisomy 8(+8) most common numerical aberration 4 liver isozyme assays


Classification
25 Virtaneva

Apply Classifier to New Samples 275 chemical descriptors


9 AML
9 AML
Virtaneva
+8 AML 8(+8) WEKA Classification
Algorithms
Patients

Patients

Patients

14
7/9 correct Algorithms
166 Substructure Search Keys ISIS/Host
14/14
21/23
correct
correct
9 AML Normal cytogenetics
78%
100%
91.3% 109 Electro-topological State Indicators - MolConnZ

0 0 0/3 Logistic Regression


0 0 0/2 Neural Network
0 0 0/1 Support Vector Machine

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 193 Georges Grinstein, UMass Lowell and AnVil Inc. 196

32
Toxicity Distinction

Non-Toxic
Enriched
Before Genetic Algorithm Example Web Site
After Genetic Algorithm
Centre for Molecular and
Biomolecular Informatics
(CMBI)
test
University of Nijmegen, NL
Toxic Enriched Http://www.Kun.nl/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 197

Toxicity Prediction CMBI Bioinformatics Services


Provide access to very fast database
searches
Smith & Waterman, FrameSearch and Profile searches
on the Bioccelerators, dedicated computers for high-speed,
sensitive searches
1. Identify new compounds
2. Compute descriptors and keys
Sequence Data Distribution Service
3. Plot in RadViz display For commercial users
Provides up-to-date sequence databases for local (in house)
access
test
may use this service for data transfer

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 198 Georges Grinstein, UMass Lowell and AnVil Inc. 201

Efficacy & Toxicity Programs

RadViz
General analysis packages

High Dose 1x F

Drug DoseGender
Phylogeny inference
Low Dose 12x F

Veh 12x F
(Hi/Lo) (F/M)
Database search
Marker Genes for Dose & Gender
GENE_ACC
U49513
FC(Hi/Lo)
2.6
FC(F/M) Description
0.9 Small inducible cytokine A9
Multiple alignment

AF045887 1.6 1.3 Angiotensinogen
High Dose 12x F U22516
X63023
1.6
1.5
1.5
1.0
Angiogenin
Cytochrome P450, steroid inducible 3a13 Linkage analysis
Low Dose 12x M U47737 1.5 1.3 thymic shared antigen-1 (TSA-1)

Modeling and 3D structure retrieval


U09010 1.4 0.6 Mannose binding lectin, liver (A)
X81580 1.3 1.2 Insulin-like growth factor binding protein 2
X04480 1.0 -1.2 Insulin-like growth factor 1

Gene Clusters D89613 -1.1 1.0 Cytokine inducible SH2-containing protein

Dose
X74938
X62940
M10062
-1.8
3.2
2.3
2.3
-2.6
-1.8
Hepatocyte nuclear factor 3 gamma (winged helix transcription factor)
Transforming growth factor beta 1 induced transcript 4
Mouse IgE-binding factor mRNA
Online documentation
U24428 1.5 -2.0 mu-class glutathione s-transferase (mGSTM5)
Gender AF109906
Y08135
1.1
1.1
-1.4
-1.4
MHC class III region RD
ASM-like phosphodiesterase 3a
Y14004 1.1 -2.2 acyl-CoA thioesterase
X15963 -1.8 -1.3 Cytochrome C oxidase, subunit Va
X70764 -2.0 2.0 ELKL motif kinase
U31993 -2.3 3.2 Interleukin 17 receptor
Mihael Ankerst, The Boeing Company
X53451 -- Daniel
-2.8 A. Keim,1.4 AT&T and Univ.
Glutathione of Konstanz
S-transferase, pi 2 Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein,
AF109905 UMass-4.5Lowell and8.0 AnVil
majorInc.
histocompatibility locus class III regions Hsc70t 199 Georges Grinstein, UMass Lowell and AnVil Inc. 202
J02980 -4.8 1.6 Alkaline phosphatase 2, liver

33
Databases Database Search Packages
SRS (Sequence Retrieval System, Etzold)
Major sequence and related databases The CMBI SRS Server features one of the largest collections of
interlinked databases available for this system
Genome databases Bioccelerator software & hardware
(Compugen)
NCBI software, like BLAST and ENTREZ
The FASTA suite of programs (Pearson)
The ACeDB database system (Durbin &
Thierry-Mieg)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 203 Georges Grinstein, UMass Lowell and AnVil Inc. 206

General Analyses Packages Alignment Algorithms


The GCG package (a.k.a. The Oxford Molecular (OML) Software
WISCONSIN package), version 10.1 (8.1 CAMELEON and IDITIS

still accessible) CLUSTALW (Higgins et al.)


EMBOSS version 1.0 T-Coffee (Notredame et al.)
EGCG version 8.1, extensions to the GCG Principal Coordinate Analysis programs
package (rice et al.) (Higgins)
The STADEN package (Staden), version TkDCSE
2000.0 a multiple alignment editor (De Rijk)

Many local extensions and interfaces


Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 204 Georges Grinstein, UMass Lowell and AnVil Inc. 207

Phylogeny Inference Packages Secondary Structure Prediction


PredictProtein (Rost & Sander)
The PHYLIP package for inferring GOR IV (Garnier et al.)
phylogenies (Felsenstein) DSSP (Kabsch & Sander)
The PAUP* package (Swofford)
Stride, Predator (Argos et al.)
The MOLPHY program suite (Hasegawa) DSC (King)
PUZZLE (Strimmer & von Haeseler) ASSP, Alscript (Barton)
SecCons (Leunissen et al.)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 205 Georges Grinstein, UMass Lowell and AnVil Inc. 208

34
Linkage Analysis Online Documentation
CRI-MAP GCG Program Manual
constructs multilocus linkage maps (Green)
EGCG Program Manual
The MapMaker programs EXP and QTL
PHYLIP Manual
(Lincoln, Daly, and Lander)
Clustal W Manual
Fastlink
faster versions of the standard Linkage programs And more
parallel version also available (Schaffer)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 209 Georges Grinstein, UMass Lowell and AnVil Inc. 212

Modeling and 3D Structure Major Sequence and


Retrieval Sequence-Related Databases
The WhatIf program (Vriend)
EMBL TREMBL
SYBYL
Tripos MM/MD program with Basic, BioPolymer, Advanced
EMBLNEW SBASE
Computation, Dynamics, Composer and QSAR/CoMFA Modules GENBANK PRODOM
MolScript SWISS-PROT KABAT
a program to generate publication-quality plots of protein SWISSNEW PROSITE
structures
PIR PRINTS
ProCheck OWL REBASE
to verify protein structures
PDB HOVERGEN
GENESEQ BLOCKS

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 210 Georges Grinstein, UMass Lowell and AnVil Inc. 213

Modeling and 3D Structure Retrieval Genome Databases


CSD Cambridge Crystallographic The Dutch mirror of the Human Genome
Database with programs Quest, Pluto, Database (GDB) from the HosPital of Sick
Gstat, and Vista Children in Toronto
PDB (Protein, PDBBrowse) The Dutch mirror of the Mouse Genome
X-ray protein structure data Database (MGD) from The Jackson
MMCC Results Laboratory
A browse and search service for the issues Vol. 3 to Vol. 6 (1994
- 1997) of the MMCC Results abstracts A mirror of the GeneCards system from
the Weizmann Institute of Science in Isral

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 211 Georges Grinstein, UMass Lowell and AnVil Inc. 214

35
Other Specific Databases Remote Tools
Arabidopsis thaliana (AAtDB) SeWeR
Sequence Analysis using Web Resources (CMBI mirror)
Caenorhabditis elegans (ACeDB)
Sequence elongation, expression and
Saccharomyces cerevisiae (AScDB)
function
Mycobacterium (MycDB) Compugen's Lab On Web

Literature search, alert, and document


delivery
TheScientificWorld

And more

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 215 Georges Grinstein, UMass Lowell and AnVil Inc. 218

Bioinformatics Services Cheminformatics Services


Sequence Retrieval Server (SRS 6) Synthesis Planning
Human Genome Database (GDB) Reaction and Structure Retrieval
Mouse Genome Database (MGD) 3-D Structure Retrieval
GeneCards Modeling
human genes, proteins and diseases
Computational Chemistry
Bioinformatics Toolkit
BLAST, Clustal, PHD,
Visualization Tools and Utilities

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 216 Georges Grinstein, UMass Lowell and AnVil Inc. 219

Local Tools Synthesis Planning


LHASA, an expert system to assist organic
Database searches chemists in the design of efficient routes to
BLAST (assembled human genome or completed bacterial
genomes), FASTA, PSI-BLAST (profile searching) target molecules
Multiple alignment
Clustal W, T-Coffee, "Roll your own" PAM matrix

Protein 2D structure prediction


PredictProtein

Restriction enzyme mapping


TACG

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 217 Georges Grinstein, UMass Lowell and AnVil Inc. 220

36
Reaction and Structure Retrieval Computational Chemistry
Beilstein CrossFire
chemical database system containing a very large database of
MOPAC
semi-empirical molecular orbital package
compounds and their properties, reactions, and citations
(including abstracts) solid state and molecular structures and reactions

ISIS DeFT
density functional deMon LCGTO-DFT-based program
chemical information management system containing a
compound database and several reaction databases GAMESS-UK
SYNLIB ab initio molecular electronic structure program
SCF- and MCSCF-gradient calculations with post Hartree-Fock
organic reaction retrieval system containing a current-literature calculations
database
QCLDB
Quantum Chemistry Literature Database (web access)

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 221 Georges Grinstein, UMass Lowell and AnVil Inc. 224

3-D Structure Retrieval Visualization Tools and Utilities

CSD MOLDEN
Cambridge Crystallographic Database with programs molecular and electronic structures visualizations
ConQuest, Quest, Pluto, and Vista.
X-ray and neutron diffraction studies on organic compounds, Draw_Plot
organometallic compounds and metal complexes PLUTON-based
PDB visualization of small molecules retrieved from 3D databases
or model-based
PDBBrowse, X-ray protein structure data
Molscript
ICSD publication-quality plots of protein structures
X-ray analyses of inorganic compounds
Searchable using CRYSTIN software from Hundt and RasMol
Sievers, UofBonn, or Hewat's web interface proteins and nucleic acids visualizations

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 222 Georges Grinstein, UMass Lowell and AnVil Inc. 225

Modeling Limited Access Tools


SYBYL Quanta/CHARMm
Tripos' MM/MD program with Basic, BioPolymer, Advanced from Molecular Simulations Inc.
Computation, Dynamics, Composer and QSAR/CoMFA Modules
Insight II/Discover
MacroModel from Molecular Simulations Inc. (formerly Biosym Technologies
Prof. Clark Still, Columbia University, NY. Inc.)
MM/MD program including Batchmin and Xcluster

CSECM
Crystal Symmetry Modeler

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 223 Georges Grinstein, UMass Lowell and AnVil Inc. 226

37
Local Tools
Cambridge Crystallographic Database (CSD)
Free, though limited (15 CPU minutes/month) access is available to
academics from continental Europe
Crystal Symmetry Modeler (CSECM)
Examine space groups of arbitrary dimension Molecular Mining
Manipulate and inspect Wyckoff positions, point group symmetry and
systematic extinctions
Visualize space groups or space group elements (thru VRML - Virtual http://www.molecularmining.com/GeneLinker/Gold/
Reality Modeling Language)
Quantum Chemistry Literature Data Base
MOLDEN) Visualization of molecular and electronic
structure

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 227

Overview Features
Part I: Visualization Techniques A set of analysis tools which handle a
1. Introduction
2. Visual Data Exploration Techniques variety of tasks, including
3. Distortion and Interaction Techniques
data filtering and transformation
4. Visual Data Mining Systems
sorting and clustering
Part II: Specific Visual Data Mining Techniques
1. Association Rules classification and prioritization of genes
2. Classification
3. Clustering A suite of visualization tools which
4. Text Mining
5. Tightly Integrated Visualization
provides easily understood presentation of
Part III: Drug Discovery Applications data mining results
1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 228 Georges Grinstein, UMass Lowell and AnVil Inc. 231

Packages Described Normalization


Molecular Mining Iobion
GeneLinker GeneTraffic
Silicon Genetics Gene-IT
GeneSpring LASSAP
BioDiscovery Rosetta Biosciences
Genesight (Merck)
Resolver

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 229 Georges Grinstein, UMass Lowell and AnVil Inc. 232

38
Clustering Centroid and Clustering Plots

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 233 Georges Grinstein, UMass Lowell and AnVil Inc. 236

Gene-Linker

GeneSpring

http://www.sigenetics.com/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 234

Matrix Tree Plot Features

Easy data loading


Straightforward normalization
Powerful analysis tools
Numerous visualization tools
Automated gene ontology construction
and gene annotation tools
Convenient data management

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 235 Georges Grinstein, UMass Lowell and AnVil Inc. 238

39
Physical Position Gene Comparison View

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 239 Georges Grinstein, UMass Lowell and AnVil Inc. 242

Hierarchical Clustering

GeneSight
http://www.biodiscovery.com/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 240

Scatterplot Features
Dataset builder
Pre-processing of data
Unregulated gene finding
Clustering of genes
Statistical significance analysis
Time series analysis

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 241 Georges Grinstein, UMass Lowell and AnVil Inc. 244

40
2D Self Organizing Maps Hybridization Navigator

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 245 Georges Grinstein, UMass Lowell and AnVil Inc. 248

Labeled Extract Description

GeneTraffic

http://www.iobion.com/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 249

GeneTraffic Scanned Image View


A microarray gene expression data mining
system which
Stores
Analyzes
Visualizes

Compliant with MIAME & MAML standard


Features normalization, clustering, PCA,
multidimensional scaling,

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 247 Georges Grinstein, UMass Lowell and AnVil Inc. 250

41
Gene Report Hybridization Normalization

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 251 Georges Grinstein, UMass Lowell and AnVil Inc. 254

Scatter Plot Spot View

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 252 Georges Grinstein, UMass Lowell and AnVil Inc. 255

M vs A Scatter Plot Gene View

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 253 Georges Grinstein, UMass Lowell and AnVil Inc. 256

42
Cluster View Multidimensional Scaling View

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 257 Georges Grinstein, UMass Lowell and AnVil Inc. 260

Manage Data Set K-means Clustering View

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 258 Georges Grinstein, UMass Lowell and AnVil Inc. 261

Hierarchical Clustering

LASSAP

Http://www.geneit.Com/lassap.html

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 259

43
Functionality Features
Multi-platform capability
Comparison of entire sequence databases
Simultaneous handling of sequence and annotation
Error-models and quality statistics
Sequence comparison algorithms Multi-dimensional analysis
Parallel execution of all algorithms on Analysis and data visualization
multiprocessor platforms
Efficient handling of large amounts of data without
size limitation
Results can be used as input to further analysis
Extensive result analysis capabilities

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 263 Georges Grinstein, UMass Lowell and AnVil Inc. 266

Features Clustering Analysis Tools


Sequence database management Analysis Algorithms
Agglomerative, Divisive, K-means, K-medians, Self-Organizing
High-throughput sequence comparison Maps
engine Custom algorithms integrated using the Rosetta Resolver SDK

Result analysis Analysis Visualization


Flat Tree Viewer, 2D Matrix or Heat Map, Hyperbolic Lens
Viewer, Table Viewer

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 264 Georges Grinstein, UMass Lowell and AnVil Inc. 267

Hyperbolic Lens Viewer

Rosetta BioSciences
http://www.rosettabio.com/products/resolver/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 268

44
ROAST rm Pathway Viewer

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 269 Georges Grinstein, UMass Lowell and AnVil Inc. 272

Class Prediction Algorithms Bayesian


Analysis and Data Visualization Classifiers
Image viewer
Histogram viewer
Intensity plot viewer
Synchronization and broadcast
Trend viewer

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 270 Georges Grinstein, UMass Lowell and AnVil Inc. 273

Synchronization and Broadcast Pattern Growing Algorithm

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 271 Georges Grinstein, UMass Lowell and AnVil Inc. 274

45
Analysis of Variance (ANOVA)

HyperChem
At the Thomson Instrument Company

http://www.hplc1.com/hyperchem/
hc5_features.html

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 275

Overview HyperChem 5
Part I: Visualization Techniques Molecular modeling program
1. Introduction
2. Visual Data Exploration Techniques 3-D visualization and animation
3. Distortion and Interaction Techniques
4. Visual Data Mining Systems
Quantum chemical calculations
Part II: Specific Visual Data Mining Techniques
1. Association Rules
2. Classification
Customizable with user controls
3. Clustering
4. Text Mining
Interfaces easily to other programs
5. Tightly Integrated Visualization

Part III: Drug Discovery Applications


1. Biology and Chemistry
2. Bioinformatics and Cheminformatics
3. Examples
4. Bioinformatics Packages
5. Cheminformatics Packages

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 276 Georges Grinstein, UMass Lowell and AnVil Inc. 279

Packages Described HyperChem Capabilities


HyperChem 5 CCDC
Accelrys Elsevier
MDL
CACTVC Beilstein
ACD/HNMR Scitegic
Tripos BioReason
Meteor

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 277 Georges Grinstein, UMass Lowell and AnVil Inc. 280

46
Examples: Electrostatic Potential

Accelrys

http://www.accelrys.com/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 281

Molecular Dynamics Accelrys


Desktop productivity tools
Developer tools
Core databases and clients
Chemistry databases
ADME/Tox tools
Chemistry workflow products

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 282 Georges Grinstein, UMass Lowell and AnVil Inc. 285

Examples: Display of results

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 283 Georges Grinstein, UMass Lowell and AnVil Inc. 286

47
DIVA Capabilities
Assemble data
Generate Different Views
Observe Patterns and Trends
Plot Graphs and Charts
Statistical Analyses
Generate custom Reports

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 287 Georges Grinstein, UMass Lowell and AnVil Inc. 290

DIVA

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 288 Georges Grinstein, UMass Lowell and AnVil Inc. 291

TOPKAT Accord Database Explorer

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 289 Georges Grinstein, UMass Lowell and AnVil Inc. 292

48
Accord ChemExplorer CACTVS Capabilities
2D structure editor with advanced features
2D structure browser
Spectra display
IR, MS, Cont. MS, NMR, EPR, UV/VIS,GC
for JCAMP files
XY data plotter
WWW Chemical MIME access tool
Chemical GIF image viewer and generator
Networked WWW substructure search tool
Networked NMR shift archive access tool

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 293 Georges Grinstein, UMass Lowell and AnVil Inc. 296

2D Structure Browser

CACTVS
http://www2.chemie.uni-erlangen.de/ software/cactvs/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 297

CACTVS 2D Structure Editor


Distributed client/server system;
Uses a world-wide network of databases
with many attribute
Visual programming paradigm
Searches its network of databases to
retrieve information

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 295 Georges Grinstein, UMass Lowell and AnVil Inc. 298

49
Sharc/Shigo ab initio
NMR Shift Archives

ACD/HNMR
http://www.acdlabs.com/products/spec_lab/predict_nm
r/hnmr/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 299

Infrared Spectra Display and Image


Viewer
ACD/HNMR
General Features & Prediction Power
Calculates accurate HNMR spectra under any basic frequency
Explains the predicted shift
Allows user to increase the accuracy of prediction using a Self
Training System
Uses 3D molecular structure minimization and Karplus
relationships to predict proton-proton coupling constants

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 300 Georges Grinstein, UMass Lowell and AnVil Inc. 303

ACD/HNMR - Examples

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 301 Georges Grinstein, UMass Lowell and AnVil Inc. 304

50
ACD/HNMR - Examples ACD/Structure Elucidator Example

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 305 Georges Grinstein, UMass Lowell and AnVil Inc. 308

ACD/HNMR - Examples ACD/Structure Elucidator Example

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 306 Georges Grinstein, UMass Lowell and AnVil Inc. 309

ACD/Structure Elucidator ACD/Structure Elucidator Example


Determine the chemical structure from
experimental spectral data for an unknown
compound
Uses spectral search, structure generation
and filtration by spectral data
Ultimately propose a small set of
structures which fully correspond to the
input spectral data

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 307 Georges Grinstein, UMass Lowell and AnVil Inc. 310

51
ACD/LC Simulator Gradient Optimization

Simulates elution order and retention


times for LC separation of organic
chemical mixtures
Uses experimentally measured retention
times for a set of compounds and
calculates predicting equation
This equation can be used to extrapolate
retention times for new compounds under
given experimental conditions
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 311 Georges Grinstein, UMass Lowell and AnVil Inc. 314

Structure-based pH Robustness Prediction

Tripos
http://www.tripos.com/index.html

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 312

Temperature Optimization MM3 (Tripos)

Molecular dynamics
Normal Mode Vibrational Analysis
Stochastic conformational searches
Thermodynamic Property Calculations
Molecular mechanics calculations
IR Frequencies
Integrated with SYBYL Interface

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 313 Georges Grinstein, UMass Lowell and AnVil Inc. 316

52
Sybyl Integrated with SYBYL
Molecular design and analysis
Comprehensive computational toolkit
Molecular spreadsheet with tools
Conformational searching
Biopolymer modeling
Combinatorial Chemistry and Library design
QSAR and advanced CoMFA
Protein homology modeling and analysis

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 317 Georges Grinstein, UMass Lowell and AnVil Inc. 320

Conformational Analysis Phosphoramide Peptide Analog


Conformational analysis
intramolecular contacts and strain energies
distance constraints or constrain conformers based on results
from previous searches on similar molecules
Ring searching

Random searching for systems too large


to handle systematically
Torsion driver for energy profiling of
specific rotatable bonds
Molecular Spreadsheet for analysis
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 318 Georges Grinstein, UMass Lowell and AnVil Inc. 321

Molecular Spreadsheet UNITY


2D and 3D database retrieval
Stores compound structure and associated
data
Searches and retrieves Oracle data
Enterprise-wide access to structures and
data
Management, post-processing and
analysis of search results

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 319 Georges Grinstein, UMass Lowell and AnVil Inc. 322

53
UNITY-3D Search

Meteor

http://www.chem.leeds.ac.uk/LUK/mete
or/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 323

UNITY - Examples LHASA Meteor


Contains a wide range of both Phase I and
Phase II biotransformations
Justifies predictions with relevant literature
references
Visualization of metabolic sequences
Knowledge base editor for the development of
proprietary knowledge
Uses Log P to estimate the lipophilicity of
metabolites
External links to Derek for toxicity prediction

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 324 Georges Grinstein, UMass Lowell and AnVil Inc. 327

UNITY - Example METEOR Analysis

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 325 Georges Grinstein, UMass Lowell and AnVil Inc. 328

54
METEOR Example QUEST

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 329 Georges Grinstein, UMass Lowell and AnVil Inc. 332

VISTA

Cambridge Crystallographic Data


Centre

http://www.ccdc.cam.ac.uk/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 333

The Cambridge Structural Database ISOSTAR

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 331 Georges Grinstein, UMass Lowell and AnVil Inc. 334

55
SuperStar Features

Obtain 3D structural information


Explore and sample conformations
Align compounds
Visualize 3D relationships
Facilitate communication

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 335 Georges Grinstein, UMass Lowell and AnVil Inc. 338

Examples: Expanded Applications SCULPT

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 336 Georges Grinstein, UMass Lowell and AnVil Inc. 339

Paste Align

Elsevier
Molecular Design, Ltd.

http://www.mdli.com/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 340

56
Set

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 341 Georges Grinstein, UMass Lowell and AnVil Inc. 344

Elsevier
Beilstein CrossFire
http://www.beilstein.com/beilst_1.shtml
or
http://www.mimas.ac.uk/crossfire/overvie
w.html

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 345

CrossFire Beilstein

Scitegic

http://www.scitegic.com/

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 343

57
Pipeline Pilot Data Modeling
Streamlines
The analysis and
The management of data

generate a model of biological activity using the hits and failures from a high-throughput screening (HTS) run

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 347 Georges Grinstein, UMass Lowell and AnVil Inc. 350

Data Pipelining

LeadPharmer
and ClassPharmer

http://www.bioreason.com

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 348

Capabilities Packages
Avoids data integration
ClassPharmer
Learns from data Chemical family analyzer

Easy customization and extensibility LeadPharmer


Classifies compound data according to learned common
substructures
DataPharmer
Identifies and isolates duplicate compound IDs and
structures
Identifies data exceptions (metals, salts, solvents)
DrugPharmer
view analysis results

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 349 Georges Grinstein, UMass Lowell and AnVil Inc. 352

58
Classification Results Using
LeadPharmer ClassPharmer Suite

Automated high throughput screening


Identifies drug leads and how they leads
work.
Three-level analysis

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 353 Georges Grinstein, UMass Lowell and AnVil Inc. 356

Three-Level Analysis

Summary and
Conclusions

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 354

Underlying Technology Summary (1)


Self-organizing maps Lots of commercial and academic products
Genetic algorithms Key differentiating factors include
Automated production of GASP (genetic Usability
algorithm similarity program) models Number of tools
Flexibility
Decision trees Robustness
Rules
Sensitivity
To guide tree growth
Interoperability
To choose input sets
Knowledge-based systems Some have newer and more novel
To analyze tree results visualization techniques which are
Spreadsheet visualizations applicable to database exploration
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 355 Georges Grinstein, UMass Lowell and AnVil Inc. 358

59
Summary (2)
Some have newer and more novel
visualization techniques which are
applicable to database exploration
Different techniques apply to different
types of data (relational tables,
hierarchies, graphs, etc.)
no guarantee of success
many of the techniques are applicable to traditional relational
information sources

Customization of tools is still necessary


Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 359

Research Issues
Develop integrated information visualization and
exploration systems
with techniques from statistics, machine learning, databases, ...

Perform in-depth evaluations and comparisons


of visualization techniques for database
exploration
there are possibilities for improvement

Use more dynamics & interaction to steer the


mining process
Perform more case studies in a variety of
application areas
Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 360

60
Bibliography
[ABKS 99] Ankerst M., Breunig M., Kriegel H.P., Sander J.:OPTICS: Ordering Points To Identify the Clustering Structure, Proc. ACM
SIGMOD99, Int. Conf on Management of Data, Philadelphia, PA, 1999, pp. 49-60.
[AC 91] Alpern B., Carter L.: Hyperbox, Proc. Visualization 91, San Diego, CA, 1991, pp. 133-139.
[ADLP 95] Anupam V., Dar S., Leibfried T., Petajan E.: DataSpace: 3-D Visualization of Large Databases, Proc.
Int. Symp. on Information Visualization, Atlanta, GA, 1995, pp. 82-88.
[AEK 00] Ankerst M., Ester M., Kriegel H.P.:Towards an Effective Cooperation of the Computer and the User for Classification, Proc.
ACM SIGKDD Int. Conf. On Knowledge Discovery & Data Mining (KDD 2000), Boston, MA, 2000, pp. 179-188.
[AKK 96] Ankerst M., Keim D.A., Kriegel H.P.: Circle Segments: A Technique for Visually Exploring Large
Multidimensional Data Sets, Visualization 96, Hot Topic Session, San Francisco, CA, 1996.
[And 72] Andrews D. F.: Plots of High-Dimensional Data, in: Biometrics, Vol. 29, 1972, pp. 125-136.
[And 95] Andrews K.: Visualizing Cyberspace: Information Visualization in the Harmony Internet Browser, Proc.
Int. Symp. on Information Visualization, Atlanta, GA, 1995, pp. 97-104.
[AS 94] Ahlberg C., Shneiderman B.: Visual Information Seeking: Tight Coupling of Dynamic Query Filters with
Starfield Displays, Proc. Human Factors in Computing Systems CHI 94 Conf., Boston, MA, 1994, pp. 313-317.
[AS 95] Ayers E. Z., Stasko J. T.: Using Graphic History in Browsing the World Wide Web, Proc. 4th Int. World
Wide Web Conf., Boston, MA, 1995.
[Asi 85] Asimov D.: The Grand Tour: A Tool For Viewing Multidimensional Data, SIAM Journal of Science &
Stat. Comp., Vol. 6, 1985, pp. 128-143.
[AW 95a] Ahlberg C., Wistrand E.: IVEE: An Environment for Automatic Creation of Dynamic Queries
Applications, Proc. Human Factors in Computing Systems CHI 95 Conf., Demo Program, Denver, CO, 1995.
[AW 95b] Ahlberg C., Wistrand E.: IVEE: An Information Visualization and Exploration Environment, Proc. Int.
Symp. on Information Visualization, Atlanta, GA, 1995, pp. 66-73.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 361

Bibliography
[BCS 96] Becker R. A., Cleveland W. S., Shyu M.-J.: The Visual Design and Control of Trellis Display, Journal of Computational and
Graphical Statistics, Vol. 5, No. 2, 1996, pp. 123-155.

[BSC 96] Buja A., Swayne D. F., Cook D.: Interactive High-Dimensional Data Visualization, Journal of
Computational and Graphical Statistics, Vol. 5, No. 1, 1996, pp. 78-99.
[BCW 88] Becker R., Chambers J. M., Wilks A. R.: The New S Language, Wadsworth & Brooks/Cole Advanced
Books and Software, Pacific Grove, CA, 1988.
[BE 96] Ball T., Eick S.: Software Visualization in the Large, Computer, Vol. 29, No. 4, 1996, pp. 33-43.
[Bed 90] Beddow J.: Shape Coding of Multidimensional Data on a Mircocomputer Display, Visualization 90,
San Francisco, CA, 1990, pp. 238-246.
[Bed 94] Bederson B.: Pad++: Advances in Multiscale Interfaces, Proc. Human Factors in Computing Systems
CHI 94 Conf., Boston, MA, 1994, p. 315.
[BEMW 90] Becker R. A., Eick S. G., Miller E. O., Wilks A. R.: Dynamic Graphics for Network Visualization, Proc.
Visualization 90, San Francisco, CA, 1990, pp. 93-95.
[Ben 95] Benford S., Snowdon D., Greenhaigh C., Ingram R., Knox I., Brown C.: Vr-vibe: A Virtual Environment
for Co-operative Information Retrieval, Proc. Eurographics 95, Maastricht, The Netherlands, 1995, pp. 349-360.
[Ber 81] Bertin J.: Graphics and Graphic Information Processing, Berlin, 1981.
[BETT 94] Battista G. D., Eades P., Tamassia R., Tollis I.: Annotated Bibliography on Graph Drawing Algorithms,
Computational Geometry: Theory and Applications, Vol. 4, 1994, pp. 235-282.
[BEW 95] Becker R. A., Eick S. G., Wilks A. R.: Visualizing Network Data, Transactions on Visualization and
Computer Graphics, Vol. 1, No. 1, 1995, pp. 16-28.
[BF 93] Beshers C., Feiner S.: AutoVisual: Rule-Based Design of Interactive Multivariate Visualizations, IEEE
Computer Graphics and Applications, Vol. 13, No. 4, 1993, pp. 41-49.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 362

1
Bibliography
[BFN 86] Boecker H.-D., Fischer G., Nieper H.: The Enhancement of Unterstanding through Visual
Representations, Proc. Human Factors in Computing Systems CHI 86 Conf., Boston, MA, 1986, pp. 44-58.

[BH 86] Bryce D., Hull R.: SNAP: A Graphics Based Schema Manager, Proc. 2nd Int. Conf. on Data
Engineering, Los Angeles, CA, 1986, pp. 151-164.
[BH 91] Brown M. H., Hershberger J.: Color and Sound in Algorithm Animation, Proc. IEEE Symp. on Visual
Languages, 1991, pp. 10-17.
[BH 94] Bederson B. B., Hollan J. D.: Pad++: A Zooming Graphical Interface for Exploring Alternate Interface
Physics, Proc. UIST, 1994, pp. 17-26.
[Bie 93] Bier E. A., Stone M. C., Pier K., Buxton W., DeRose T.: Toolglass and Magic Lenses: The see-through
Interface, Proc. SIGGRAPH 93, Anaheim, CA, 1993, pp. 73-80.
[BMMS 91] Buja A., McDonald J. A., Michalak J., Stuetzle W.: Interactive Data Visualization Using Focusing and
Linking, Visualization 91, San Diego, CA, 1991, pp. 156-163.
[BN 93] Brown M. H., Najork M. A.: Algorithm Animation Using 3D Interactive Graphics, Proc. UIST, 1993, pp. 93-100.
[Bro 91] Brown M. H.: Zeus: A System for Algorithm Animation and Multi-View Editing, Proc. IEEE Symp. on
Visual Languages, 1991, pp. 4-9.
[Car 96] Card S. K.: Visualizing Retrieved Information: A Survey, IEEE CG&A, 1996, pp. 63-67.
[CE 95] Cox K. C., Eick S. G.: 3D Displays of Internet Traffic, Proc. Symp. on Information Visualization,
Atlanta, GA, 1995, pp. 129-131.
[CELR 94] Cohen R. F., Eades P., Lin T., Ruskey F.: Three-dimensional Graph Drawing, Proc. Graph Drawing,
1994, in: Lecture Notes in Computer Science, Vol. 894, Springer, 1994, pp. 1-11.
[Che 73] Chernoff H.: 'The Use of Faces to Represent Points in k-Dimensional Space Graphically, Journal
Amer. Statistical Association, Vol. 68, pp. 361-368.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 363

Bibliography
[CK 95] Carrire J., Kazman R.: Interacting with Huge Hierarchies: Beyond Cone Trees, Proc. Symp. on
Information Visualization, Atlanta, GA, 1995, pp. 90-96.
[Cle 93] Cleveland W. S.: Visualizing Data, AT&T Bell Laboratories, Murray Hill, NJ, Hobart Press, Summit NJ, 1993.
[Cle 94] Cleveland W. S.: The Elements of Graphing Data, Hobart Press, Summit, NJ, revised edition.
[CM 93] Consens M. P., Mendelzon A. O.: Hy+: A Hygraph-based Query and Visualization System, Proc. ACM
SIGMOD Int. Conf. on Management of Data, Washington, DC, 1993, pp. 511-516.
[Con 94] Consens M. P., Eigler F. Ch., Hasan M. Z., Mendelzon A. O., Noik E. G., Ryman A. G., Vista D.:
Architecture and Applications of the Hy+ Visualization System, IBM Systems Journal, Vol. 33, No. 3,
1994, pp. 458-476.
[CPS 95] Chan M. C., Pacifici G., Stadler R.: Managing Real-Time Services in Multimedia Networks Using
Dynamic Visualization and high-level Controls, Proc. Multimedia 95, San Francisco, CA, 1995.
[CR 92] Cox K. C., Roman G.-C.: Abstraction in Algorithm Animation, Proc. Symp. on Visual Languages,
1992, pp. 18-23.
[CRMK 95] Chuah M. C., Roth S. F., Mattis J., Kolojejchick J.: SDM: Malleable Information Graphics, Proc. Symp.
on Information Visualization 95, Atlanta, GA, 1995, pp. 36-42.
[CRY 96] Card S. K., Robertson G. G., York W.: The WebBook and the WebForager: An Information Workspace
for the World Wide Web, Proc. Human Factors in Computing Systems CHI 96 Conf., 1996.
[CT 94] Cruz I. F., Tamassia R.: How to Visualize a Graph: Specification and Algorithms, 1994.
[DBM 01] http://www.dbminer.com
[DE 82] Dunn G., Everitt B.: An Introduction to Mathematical Taxonomy, Cambridge University Press,
Cambridge, MA, 1982.
[DH 95] Drew N. S., Hendley R. J.: Visualizing Complex Interacting Systems, Proc. Human Factors in
Computing Systems CHI95 Conf., Denver, CO, 1995, pp. 204-205.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 364

2
Bibliography
[Doe 94] Doemel P.: Webmap - A Graphical Hypertext Navigation Tool, Proc. 2nd Int. World Wide Web Conf.,
Chicago, IL, 1994.
[Eic 94] Eick S. G.: Data Visualization Sliders, Proc. ACM UIST, 1994, pp. 119-120.
[Eic 96] Eick S. G.: Aspects of Network Visualization, Proc. IEEE CG&A, 1996, pp. 69-72.
[ES 92] Eick S. G., Steffen J. L.: Visualizing Code Profiling Line Oriented Statistics, Proc. Visualization 92,
Boston, MA, 1992, pp. 210-217.
[ESS 92] Eick S. G., Steffen J. L., Sumner E. E.: SeeSoft - A Tool for Visualizing Software, IEEE Transactions on
Software Engineering, Vol. 18, No. 11, 1992, pp. 957-968.
[EW 93] Eick S. G., Wills G. J.: Navigating Large Networks with Hierarchies, Visualization 93, San Jose, CA,
1993, pp. 204-210.
[FB 90a] Feiner S., Beshers C.: Visualizing n-Dimensional Virtual Worlds with n-Vision, Computer Graphics, Vol.
24, No. 2, 1990, pp. 37-38.
[FB 90b] Feiner S., Beshers C.: World within World: Metaphors for Exploring n-dimensional Virtual Worlds, Proc.
UIST, 1990, pp. 76-83.
[FB 94] Furnas G. W., Buja A.: Prosections Views: Dimensional Inference through Sections and Projections,
Journal of Computational and Graphical Statistics, Vol. 3, No. 4, 1994, pp. 323-353.
[FB 95] Furnas G. W., Bederson B. B.: Space-Scale Diagrams: Unterstanding Multiscale Interfaces, Proc. Human
Factors in Computing Systems CHI 95 Conf., Denver, CO, 1995.
[FL 95] Faloutsos C., Lin K.: Fastmap: A fast Algorithm for Indexing, Data-Mining and Visualization of Traditional
and Multimedia Datasets, Proc. ACM SIGMOD Int. Conf. on Management of Data, San Jose, CA, 1995,
pp. 163-174.
[FPF 88] Fairchild K., Poltrok S., Furnas G.: SemNet: Three Dimensional Graphic Representations of Large
Knowledge Bases, Lawrence Erlbaum (ed.), 1988, pp. 201-233.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 365

Bibliography
[FS 95] Fishkin K., Stone M. C.: Enhanced Dynamic Queries via Movable Filters, Proc. Human Factors in
Computing Systems CHI 95 Conf., Denver, CO, 1995, pp. 415-420.
[Fur 86] Furnas G.: Generalized Fisheye Views, Proc. Human Factors in Computing Systems CHI86 Conf.,
Boston, MA, 1986, pp. 18-23.
[FW 94] Frolich M., Werner M.: Demonstration of the Interactive Graph Visualization System da vinci, Proc.
Graph Drawing, 1994, in: Lecture Notes in Computer Science, Vol. 894, Springer, 1994, pp. 266-269.
[FZ 94] Furnas G. W., Zacks J.: Multitrees: Enriching and Reusing Hierarchical Structures, Proc. Human Factors
in Computing Systems CHI 94 Conf., Boston, MA, 1994, pp. 330-336.
[Ger 95] Gershon N., LeVasseur J., Winstead J., Croall J., Pernick A., Ruh W.: Visualizing Internet Resources,
Proc. Symp on Information Visualization 95, Atlanta, GA, 1995, pp. 122-128.
[Ger 96] Gershon N.: Moving Happily Through the World Wide Web, Proc. IEEE CG&A, 1996, pp. 72-74.
[GKNV 93] Gansner E. R., Koutsofois E., North S. C., Vo K.-P.: A Technique for Drawing Directed Graphs, IEEE
Transactions on Software Engineering, Vol. 19, No. 3, 1993, pp. 214-229.
[GPW 89] Grinstein G, Pickett R., Williams M. G.: EXVIS: An Exploratory Visualization Environment, Proc.
Graphics Interface 89, London, Ontario, Canada, 1989.
[GR 94] Goldstein J., Roth S. F.: Using Aggregation and Dynamic Queries for Exploring Large Data Sets, Proc.
Human Factors in Computing Systems CHI 94 Conf., Boston, MA, 1994, pp. 23-29.
[GT 96] Garg A., Tamassia R.: Giotto3d: A System for Visualizing Hierarchical Structures in 3d, Proc. Graph
Drawing, 96, in: Lecture Notes in Computer Science, Springer, 1996.
[Har 67] Harman H. H.: Modern Factor Analysis, University of Chicago Press, 1967.
[HC 86] J.D.A. Henderson and S.K. Card: Rooms: The Use of Multiple Virtual Workspaces to Reduce Space
Contention in Window-based Graphical User Interfaces, ACM Trans. on Graphics, Vol. 5, No. 3, July
1986, pp. 211-241.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 366

3
Bibliography
[HDWB 95] Hendley R. J., Drew N. S., Wood A. M., Beale R.: Narcissus: Visualizing Information, Proc. Int. Symp.
on Information Visualization, Atlanta, GA, 1995, pp. 90-94.
[HDD+ 01] Hao M., Hsu M ., Dayal U., Wei S.F., Sprenger T., Holenstein T.:Market Basket Analysis Visualization on a
Spherical Surface, http://www.hpl.hp.com/techreports/2001/HPL-2001-3.pdf
[Hea 95] Hearst M.: TileBars: Visualization of Term Distribution Information in Full Text Information Access,
Proc. of ACM Human Factors in Computing Systems Conf. (CHI'95), 1995, pp. 59-66.
[Him 94] Himsolt M.: Graphed: A Graphical Platform for the Implementation of Graph Algorithms, Proc. Graph
Drawing 94, in: Lecture Notes in Computer Science, Vol. 894, Springer, pp. 182-193.
[HKW 99] Hinneburg A., Keim D.A., Wawryniuk M.: HD-Eye: Visual Mining of High-Dimensional Data, IEEE Computer
Graphics and Applications, Vol. 19, No. 5, 1999.
[HSW 00] Hofmann H., Siebes A., Wilhelm A.:Visualizing Association Rules with Interactive Mosaic Plots, Proc. ACM
SIGKDD Int. Conf. On Knowledge Discovery & Data Mining (KDD 2000), Boston, MA, 2000.
[Hub 85] Huber P. J.: Projection Pursuit, The Annals of Statistics, Vol. 13, No. 2, 1985, pp. 435-474.
[Hum 01] HumanIT Web Page: http://www.humanit.de/infozoom, Feb. 2001.
[IA 00] Inselberg A., Avidan T.:Classification and Visualization for High-Dimensional Data, Proc. ACM SIGKDD Int.
Conf. On Knowledge Discovery & Data Mining (KDD 2000), Boston, MA, 2000.
[Ins 85] Inselberg A.: The Plane with Parallel Coordinates, Special Issue on Computational Geometry, The
Visual Computer, Vol. 1, 1985, pp. 69-97.
[ID 90] Inselberg A., Dimsdale B.: Parallel Coordinates: A Tool for Visualizing Multi-Dimensional Geometry,
Visualization 90, San Francisco, CA, 1990, pp. 361-370.
[JJ 94] Jeron T., Jard C.: 3D Layout of Reachability Graphics of Communicating Processes, Proc. Graph
Drawing 94, in: Lecture Notes in Computer Science, Vol. 894, 1994, pp. 25-32.
[Joh 93] Johnson B.: Visualizing Hierarchical and Categorical Data, Ph.D. Thesis, Department of Computer
Science, University of Maryland, 1993.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 367

Bibliography
[JS 91] Johnson B., Shneiderman B.: Treemaps: A Space-filling Approach to the Visualization of Hierarchical
Information, Proc. Visualization 91 Conf., 1991, pp. 284-291.
[JS 95] Jerding D. F., Stasko J. T.: The Information Mural: A Technique for Displaying and Navigating Large
Information Spaces, Proc. Symp. on Information Visualization 95, Atlanta, GA, 1995, pp. 43-50.
[Kei 94] D. A. Keim: Visual Support for Query Specification and Data Mining, Shaker-Verlag, 1994, ISBN 3-
8265-0594-8.
[Kei 95] D. A. Keim: Enhancing the Visual Clustering of Query-dependent Database Visualization Techniques
using Screen-Filling Curves, Proc. Workshop on Database Issues for Data Visualization, Atlanta, GA,
1995, Springer, in: Lecture Notes in Computer Science, No. 1183, Springer, pp. 101-110.
[Kei 95] D. A. Keim: Enhancing the Visual Clustering of Query-dependent Database Visualization Techniques
using Screen-Filling Curves, Proc. Workshop on Database Issues for Data Visualization, Atlanta, GA,
1995, Springer, in: Lecture Notes in Computer Science, No. 1183, Springer, pp. 101-110.
[Kei 96] Keim D. A.: Pixel-oriented Visualization Techniques for Exploring Very Large Databases, Journal of
Computational and Graphical Statistics, Vol. 5, No. 1, 1996, pp. 58-77.
[KK 94] Keim D. A., Kriegel H.-P.: VisDB: Database Exploration using Multidimensional Visualization, Computer
Graphics & Applications, Sept. 1994, pp. 40-49.
[KK 95] Keim D. A., Kriegel H.-P.: VisDB: A System for Visualizing Large Databases, System Demonstration,
Proc. ACM SIGMOD Int. Conf. on Management of Data, San Jose, CA, 1995, p. 482.
[KKA 95] Keim D. A., Kriegel H.-P., Ankerst M.: Recursive Pattern: A Technique for Visualizing Very Large
Amounts of Data, Proc. Visualization 95, Atlanta, GA, 1995, pp. 279-286.
[KKS 93] D. A. Keim, H.-P. Kriegel, T. Seidl: Visual Feedback in Querying Large Databases, Proc. Visualization
93, San Jose, CA, 1993, pp. 158-165.
[KKS 94] Keim D. A., Kriegel H.-P., Seidl T.: Supporting Data Mining of Large Databases by Visual Feedback
Queries, Proc. 10th Int. Conf. on Data Engineering, Houston, TX, 1994, pp. 302-313.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 368

4
Bibliography
[KK 96] Keim D. A., Kriegel H.-P.: Visualization Techniques for Mining Large Databases: A Comparison,
Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996, pp. 923-938.
[KL 93] Keim D. A., Lum V.: GRADI: A Graphical Database Interface for a Multimedia DBMS, Proc. Int. Workshop
on Interfaces to Databases, Glasgow, England, 1992, in: Workshops in Computing, Springer, 1993,
pp. 95-112.
[KRB 94] Kaugars K., Reinfelds J., Brazma A.: A Simple Algorithm for Drawing Large Graphs on Small Screens,
Proc. Graph Drawing 94, in: Lecture Notes in Computer Science, Vol. 894, Springer, 1994, pp. 278-281.
[LA 94] Leung Y., Apperley M.: A Review and Taxonomy of Distortion-oriented Presentation Techniques, Proc.
Human Factors in Computing Systems CHI 94 Conf., Boston, MA, 1994, pp. 126-160.
[Lev 91] Levkowitz H.: Color icons: Merging color and texture perception for integrated visualization of multiple
parameters, In Visualization 91, San Diego, CA, October 22-25 1991.
[Liv 97] M. Livny, R. Ramakrishnan, K. Beyer, G. Chen, D. Donjerkovic, S. Lawande, J. Myllymaki, and K. Wenger:
DEVise: Integrated Querying and Visual Exploration of Large Datasets, Proc. of ACM SIGMOD, May, 1997.
[LO 96] Lee H.-Y., Ong K.-L.:Visualization Support for Data Mining, IEEE Expert, Vol. 11, I. 5, pp.69-75, 1996.
[LR 94] Lamping J., Rao R.: Laying out and Visualizing Large Trees Using a Hyperbolic Space, Proc. UIST, 1994,
pp. 13-14.
[LRP 95] Lamping J., Rao R., Pirolli P.: A Focus + Context Technique Based on Hyperbolic Geometry for
Visualizing Large Hierarchies, Proc. Human Factors in Computing Systems CHI 95 Conf., Denver, CO,
1995, pp. 401-408.
[LSR 96] Lamm S. E., Scullin W. H., Reed D. A.: Real-time Geographic Visualization of World Wide Web Traffic,
Proc. WWW, 1996.
[LWW 90] LeBlanc J., Ward M. O., Wittels N.: Exploring N-Dimensional Databases, Visualization 90, San
Francisco, CA, 1990, pp. 230-239.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 369

Bibliography
[MB 95] Munzner T., Burchard P.: Visualizing the Structure of the World Wide Web in 3D Hyperbolic Space, Proc.
VRML 95 Symp, San Diego, CA, 1995, pp. 33-38.
[MF 95] Mukherjea S., Foley J. D.: Showing the Context of Nodes in the World-Wide Web, Proc. ACM SIGCHI,
Denver, CO, 1995.
[MF 95] Mukherjea S., Foley J. D.: Visualizing Complex Hypermedia Networks through Multiple Hierarchical
Views, Proc. ACM SIGCHI, Denver, CO, 1995.
[MF 95] Mukherjea S., Foley J. D.: Visualizing the World-Wide Web with the Navigational View Builder, Proc. 3rd
Int. World Wide Web Conf., Darmstadt, Germany, 1995.
[MGTS 90] Mihalisin T., Gawlinski E., Timlin J., Schwendler J.: Visualizing A Scalar Field on an N-dimensional
Lattice, Visualization 90, San Francisco, CA, 1990, pp. 255-262.
[MHCF 96] Munzner T., Hoffman E., Claffy K., Fenner B.: Visualizing the Global Topology of the mbone, Proc.
Symp. on Information Visualization, 1996.
[Min 01] http://www.sgi.com/software/mineset/
[MMIK 95] Masui T., Minakuchi M., IV G. R., Kashiwagi K.: Multiple-View Approach for Smooth Information
Retrieval, Proc. UIST, 1995, pp. 199-206.
[MRC 91] Mackinlay J. D., Robertson G. G., Card S. K.: The Perspective Wall: Detail and Context Smoothly
Integrated, Proc. Human Factors in Computing Systems CHI 91 Conf., New Orleans, LA, 1991,
pp. 173-179.
[MSG 94] McCreary C., Shieh F.-S., Gill H.: Cg: A Graph Drawing System Using Graph-Grammar Parsing,
Proc. Graph Drawing 94, in: Lecture Notes in Computer Science, Vol. 894, Springer, 1994, pp. 270-273.
[MW 95] Martin A. R., Ward M. O.: High Dimensional Brushing for Interactive Exploration of Multivariate Data,
Visualization 95, Atlanta, GA, 1995, pp. 271-278.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 370

5
Bibliography
[Mye 94] Myers B. A.: Visual Programming, Programming by Example and Program Visualization: A
Taxonomy, Proc. Human Factors in Computing Systems CHI 94 Conf., Boston, MA, 1994, pp. 59-66.
[NK 94] North S. C., Koutsofois E.: Applications of Graph Visualization, Proc. Graphics Interface, 1994, pp. 235-245.
[Noi 93] Noik E. G.: Layout-independent Fisheye Views of Nested Graphs, Proc. Symp. on Visual Languages,
1993, pp. 336-341.
[Noi 94] Noik E. G.: Encoding Presentation Emphasis Algorithms for Graphs, Proc. Graph Drawing 94, in:
Lecture Notes in Computer Science, Vol. 894, Springer, 1994, pp. 428-235.
[Noi 94] Noik E. G.: A Space of Presentation Emphasis Techniques for Visualizing Graphs, Proc. Graphics
Interface, 1994, pp. 225-233.
[Ols 93] Olsen K.A. et al.: Visualization of a Document Collection: The VIBE System, Information Processing
and Management, Vol. 29, No. 1, 1993, pp. 69-81.
[OW 93] Ozoyoglu G., Wang H.: Example-Based Graphical Database Query Languages, Computer, Vol. 26,
No. 5, 1993, pp. 25-38.

[PB 94] Pitkow J., Bharat K.: Webviz: A Tool for World-Wide Web Access Log Visualization, Proc. 1st Int. World
Wide Web Conf., Geneva, Switzerland, 1994.
[PF 93] Perlin K., Fox D.: Pad: An Alternative Approach to the Computer Interface, Proc. SIGGRAPH, Anaheim,
CA, 1993, pp. 57-64.
[PG 88] Pickett R. M., Grinstein G. G.: Iconographic Displays for Visualizing Multidimensional Data, Proc. IEEE
Conf. on Systems, Man and Cybernetics, IEEE Press, Piscataway, NJ, 1988, pp. 514-519.
[Pic 70] Pickett R. M.: Visual Analyses of Texture in the Detection and Recognition of Objects, in: Picture
Processing and Psycho-Pictorics, Lipkin B. S., Rosenfeld A. (eds.), Academic Press, New York, 1970.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 371

Bibliography
[Rao 92] Rao R., Card S. K., Jellinek H. D., Mackinlay J. D., Robertson G. G.: The Information Grid: A Framework
for Information Retrieval and Retrieval-centered Applications, Proc. UIST, 1992, pp. 23-32.
[RC 94] Rao R., Card S. K.: The Table Lens: Merging Graphical and Symbolic Representation in an Interactive
Focus+Context Visualization for Tabular Information, Proc. Human Factors in Computing Systems CHI
94 Conf., Boston, MA, 1994, pp. 318-322.
[RCM 93] Robertson G. G., Card S. K., Mackinlay J. D.: Information Visualization Using 3D Interactive Animation,
Communications of the ACM, Vol. 36, No. 4, 1993, pp. 57-71.
[Reg 93] Regan B.: Two Algorithms for Drawing Trees in Three Dimensions, Proc. Graph Drawing, 1993.
[Rei 93] Reiss S. P.: A Framework for Abstract 3D Visualization, Proc. Symp. on Visual Languages, 1993,
pp. 108-115.
[Rei 94] Reiss S. P.: 3-D Visualization of Program Information, Proc. Graph Drawing, 1994, in: Lecture Notes in
Computer Science, Vol. 894, Springer, 1994, pp. 12-24.
[Ren 94] Rennison E.: Galaxy of News: An Approach to Visualizing and Understanding Expansive News
Landscapes, Proc. UIST, 1994, pp. 3-12.
[RG 93] Rekimoto J., Green M.: The Information Cube: Using Transparency in 3d Information Visualization, Proc.
3rd Annual Workshop on Information Technologies & Systems (WITS 93), 1993, pp. 125-132.
[RKJ+ 99] Ribarsky W., Katz J., Jiang F., Holland A.: Discovery Visualization using Fast Clustering, IEEE Computer
Graphics and Applications, Vol. 19, No. 5, 1999.
[RM 93] Robertson G. G., Mackinlay J. D.: The Document Lens, Proc. UIST, 1993, pp. 101-108.
[RMC 91] Robertson G. G., Mackinlay J. D., Card S. K.: Cone Trees: Animated 3D Visualizations of Hierarchical
Information, Proc. Human Factors in Computing Systems CHI 91 Conf., New Orleans, LA, 1991,
pp. 189-194.
[RSE 99] Rohrer R.M., Sibert J.L., Ebert D.S.:Shape-based Visual Interface for Text Retrieval, IEEE Computer Graphics
and Applications, Vol. 19, No. 5, 1999.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 372

6
Bibliography
[SA 82] Spence R., Apperley M.: Data Base Navigation: An Office Environment for the Professional, Behaviour and Information
Technology, Vol. 1, No. 1, pp. 43-54.
[SAS 01] http://www.sas.com
[San 94] Sander G.: Graph Layout through the VCG Tool, Proc. Graph Drawing, 1994, in: Lecture Notes in
Computer Science, Springer, Vol. 894, Springer, 1994, pp. 194-205.
[SB 94] Sarkar M., Brown M.: Graphical Fisheye Views, Communications of the ACM, Vol. 37, No. 12, 1994,
pp. 73-84.
[SCB 92] Swayne D. F., Cook D., Buja A.: Users Manual for XGobi: A Dynamic Graphics Program for Data
Analysis, Bellcore Technical Memorandum, 1992.
[Sch 93] Schaffer, Doug, Zuo, Zhengping, Bartram, Lyn, Dill, John, Dubs, Shelli, Greenberg, Saul, Roseman, Mark:
Comparing Fisheye and Full-Zoom Techniques for Navigation of Hierarchically Clustered Networks, Proc. Graphics
Interface (GI 93), Toronto, Ontario, 1993, in: Canadian Information Processing Soc., Toronto, Ontario, Graphics Press,
Cheshire, CT, 1993, pp. 87-96.
[SDTS 95] Su H., Dawkes H., Tweedie L., Spence R.: An Interactive Visualization Tool for Tolerance Design,
Technical Report, Imperial College, London, 1995.
[SFB 94] Stone M. C., Fishkin K., Bier E. A.: The Movable Filter as a User Interface Tool, Proc. Human Factors in
Computing Systems CHI 94 Conf., Boston, MA, 1994, pp. 306-312.
[SGB 00] Sprenger T.C., Brunella R., Gross M.H.:Hierarchical Visual Clustering Method using Implicit Surfaces, ETH Zurich, Tech report,
http://graphics.ethz.ch, 2000
[Shn 92] Shneiderman B.: Tree Visualization with Treemaps: A 2D Space-Filling Approach, ACM Transactions on
Graphics, Vol. 11, No. 1, pp. 92-99, 1992.
[SM 94] Sugiyama K., Misue K.: A Simple and Unified Method for Drawing Graphs: Magnetic-spring Algorithm, Proc.
Graph Drawing, 1994, in: Lecture Notes in Computer Science, Vol. 894, Springer, 1994, pp. 364-375.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 373

Bibliography
[SP 92] Stasko, John T. and Patterson, Charles, "Understanding and Characterizing Software Visualization
Systems", Proc. of the 1992 IEEE Workshop on Visual Languages, Seattle, WA, September 1992, pp. 3-10.
[Spo 93] Spoerri A.: InfoCrystal: A Visual Tool for Information Retrieval, Visualization 93, San Jose, CA, 1993,
pp. 150-157.
[SPS+ 01]http://www.spss.com
[STA 01] http://www.pnl.gov/nsd/commercial/starlight, 2001
[STDS 95] Spence R., Tweedie L., Dawkes H., Su H.: Visualization for Functional Design, Proc. Int. Symp. on
Information Visualization (InfoVis 95), Atlanta, GA, 1995, pp.4-10.
[SRN 72] Shepard R. N., Romney A. K., Nerlove S. B.: Multidimensional Scaling, Seminar Press, New York, 1972.
[SW 93] Stasko J. T., Wehrli J. F.: Three-dimensional Computation Visualization, Proc. Symp. on Visual
Languages, Bergen, Norway, 1993, pp. 100-107.
[TD 95] Tal A., Dobkin D.: Visualization of Geometric Algorithms, Transactions on Visualization and Computer
Graphics, Vol. 1, No. 2, 1995, pp. 194-204.
[Tuf 83] Tufte E. R.: The Visual Display of Quantitative Information, Graphics Press, Cheshire, CT, 1983.
[Tuf 90] Tufte E. R.: Envisioning Information, Graphics Press, Cheshire, CT, 1990.
[Vas 94] Vasudevan V.: Supporting High Bandwidth Navigation in Object-Bases, Proc. 10th Int. Conf. on Data
Engineering, Houston, TX, 1994, pp. 294-301.
[Vel 92] Velleman P. F: Data Desk 4.2: Data Description, Ithaca, NY, 1992.
[VXI 02] http://www.cs.sandia.gov/projects/VxInsight.html, 2001
[War 94] Ward M. O.: XmdvTool: Integrating Multiple Methods for Visualizing Multivariate Data, Visualization94,
Washington, DC, 1994, pp. 326-336.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 374

7
Bibliography
[WCF+ 00]Wong P., Cowley W., Foote H., Jurrus E., Thomas J.:Visualizing Sequential Patterns for Text Mining,
InfoVis00.
[Wis 95] Wise J. A., Thomas J. J., Pennock K., Lantrip D., Pottier M., Schur A., Crow V.: Visualizing the Non-
Visual: Spatial Analysis and Interaction with Information from Text Documents, Proc. Symp. on
Information Visualization, Atlanta, GA, 1995, pp. 51-58.
[WB 95] Wong P.C., Bergeron R. D.: 30 Years of Multidimensional Multivariate Visualization, Proc. Workshop on
Scientific Visualization, IEEE Computer Society Press, 1995.
[WDBH 95] Wood A., Drew N., Beale R., Hendley B.: Hyperspace: Web Browsing with Visualisation, Proc. 3rd Int.
World-Wide Web Conf., Poster, 1995, Darmstadt, Germany.
[WL 93] van Wijk J. J., van Liere R.. D.: Hyperslice, Proc. Visualization 93, San Jose, CA, 1993, pp. 119-125.
[Wri 95] Wright W.: Information Animation Applications in the Capital Markets, Proc. Int. Symp. on Information
Visualization, Atlanta, GA, 1995, pp. 19-25.
[WUT 95] Wilhelm A., Unwin A.R., Theus M.: Software for Interactive Statistical Graphics - A Review, Proc. Int.
Softstat 95 Conf., Heidelberg, Germany, 1995.
[WWT 99]Wong P., Whitney P., Thomas J.:Visualizing Association Rules for Text Mining, InfoVis99.
[XGobi] XGobi Web-Page including pointers to publications and the most recent release of the XGobi software:
http://www.research.att.com/~andreas/xgobi/.
[Yan 00] Yang L.:Interactive Exploration of Very Large Relational Data Sets Through 3D Dynamic Projections, Proc.
ACM SIGKDD Int. Conf. On Knowledge Discovery & Data Mining (KDD 2000), Boston, MA, 2000.
[YS 93] Young D. and Shneiderman B.: A graphical filter/flow model for boolean queries: An implementation and
experiment, Journal of the American Society for Information Science Vol. 44,No. 6, 1993, pp. 327-339.
[Zlo 77] Zloof M. M.: Query-By-Example: A Data Base Language, IBM Systems Journal, Vol. 4, 1977,
pp. 324-343.

Mihael Ankerst, The Boeing Company -- Daniel A. Keim, AT&T and Univ. of Konstanz
Georges Grinstein, UMass Lowell and AnVil Inc. 375

You might also like