You are on page 1of 29

Tecniche di Apprendimento Automatico per

Applicazioni di Data Mining

Visualization Techniques
in Data Mining
Prof. Pier Luca Lanzi
Laurea in Ingegneria Informatica
Politecnico di Milano
Polo di Milano Leonardo

Outline

Goals of visualization
Advantages
Methodologies
Techniques
User interaction
Problems

Pier Luca Lanzi

Goals of Data Visualization


Today there is the need to manage a huge

amount of data, and computer systems help


us in this task
Visual Data Mining help to deal with this
flood of information, integrating the human
in the data analysis process
Visual Data Mining allows the user to gain
insight into the data, drawing conclusions
and directly interacting with the data

Pier Luca Lanzi

Advantages of visualization
techniques
The main advantages of the application of Visual
data mining techniques are:
Visual data exploration can easily deal with very large,
highly non homogeneous and noisy amount of data

Visual data exploration requires no understanding of


complex mathematical or statistical algorithms

Visualization techniques provide a qualitative overview


useful for further quantitative analysis

Pier Luca Lanzi

Approach methodologies
Presentation:

starting point: facts to be presented are fixed a priori


result: high-quality visualization of the data presenting the facts
Confirmative Analysis:

starting point: hypotheses about the data


result: visualization of the data allowing confirmation or rejection of
the hypotheses

Explorative Analysis:

starting point: data without hypotheses


result: visualization of the data, which can provide hypotheses
about data distribution

Pier Luca Lanzi

Visualization techniques
Geometric techniques: scatterplots matrices, Hyperslice,
parallel coordinates
Pixel-oriented techniques: simple line-by-line, spiral and
circle segments
Hierarchical techniques: Treemap, cone trees
Graph-based techniques: 2D and 3D graph
Distortion techniques: hyperbolic tree, fisheye view,
perspective wall
User interaction: brushing, linking, dynamic projections and
rotations, dynamic queries

Pier Luca Lanzi

Geometric techniques
Basic idea:

Visualization of geometric transformations and


projections of the data

Methods:

Scatterplot matrices
Hyperslice
Parallel coordinates
Pier Luca Lanzi

Scatterplot matrices
A scatterplot matrix

is composed of
scatter plots of all
possible pairs of
variables in a dataset

Assuming a

N-dimension dataset,
there are (N2-N)/2
pairs of two
dimension plots

Pier Luca Lanzi

Hyperslice
HyperSlice is an

extension of the
scatterplot matrix

They represent a

multi-dimensional
function as a
matrix of
orthogonal
two-dimensional
slices

Pier Luca Lanzi

Parallel Coordinates
The axes are defined as parallel
vertical lines separated

A point in Cartesian coordinates


correspond to a polyline in
parallel coordinates

Able to visualize data that may be

occluded in Cartesian coordinates

Pier Luca Lanzi

Pixel-oriented techniques
Basic idea:
The basic idea of pixel-oriented techniques is to map each

data value to a colored pixel


Each attribute value is represented by a pixel with a color
tone proportional to a relevance factor in a separate
window

Methods:

Simple Arrangement Line-by-Line


Spiral and Circle Segments Techniques

Pier Luca Lanzi

Pixel-oriented techniques
Simple arrangement line-by-line

Pier Luca Lanzi

Pixel-oriented techniques
Spiral

Circle segments

Pier Luca Lanzi

Hierarchical techniques
Basic idea:
Visualization of the data using a hierarchical
partitioning into two- or three-dimensional
subspaces

Methods:

Treemap
Cone trees
Pier Luca Lanzi

Treemap

Visualization of hierarchical collections of quantitative data as files


on a hard drive, financial analysis, bioinformatics, etc..

Divide a limited screen space display area into a sequence of

rectangles whose areas correspond to an attribute of data set

http://www.smartmoney.com/marketmap/

Pier Luca Lanzi

Cone trees
3-dimensional extension of the more familiar
2-D hierarchical tree structures, to a more
intuitive navigation and display of information

Pier Luca Lanzi

Graph-based visualization
Graphs (edges + nodes) with labels and
attributes
Used where emphasis is on data relationship
(databases, telecom)
Coordinates not always meaningful
Useful for discovering patterns

Pier Luca Lanzi

Graph-based visualization
Color and thickness code values
Asymmetric relations:

Pier Luca Lanzi

Graph-based visualization
E-mail (SeeNet)

Pier Luca Lanzi

Graph-based visualization
3D graphs:
more room for objects
different points of view

Example (hypertexts Narcissus):

Pier Luca Lanzi

Focus vs. context


Too much data in too small screens
Solutions:
dual views (detailed + global)
distorted view (e.g. fisheye view)

Pier Luca Lanzi

Distortion
Hyperbolic tree

Fisheye view

Perspective wall

Pier Luca Lanzi

User interaction
Brushing: selecting points or regions
Linking: more views work together

Pier Luca Lanzi

User interaction
Dynamic projections and rotations
Interactively and continuously moving through
subspaces

Dynamic queries
Visual interface (button and sliders)
Incremental behavior (undo)

Pier Luca Lanzi

Problems
Missing attributes
Ignore
Fill blanks with:
a predefined constant
a value extracted according to the inferred
distribution

Assess the effect of interpolated values

Pier Luca Lanzi

Problems
Large data sets
Typical screens have one million pixels
Subsampling
Voxel/pixel bins
Jittering

Large number of attributes


Principal component analysis
Factor analysis
Etc.
Pier Luca Lanzi

Conclusions
Human and computer skills can be integrated
with visual data mining
Visualization may be useful for:
understanding what is happening
searching novel patterns

User interaction is paramount in these

Pier Luca Lanzi

References (I)

D. A. Keim. Visual Techniques for Exploring Databases. Int.


Conference on Knowledge Discovery in Databases, 1997.
D. A. Keim. Information visualization and visual data mining. IEEE
Trans. on Visualization and Computer Graphics, jan 2002, vol. 8, no. 1,
pp. 1-8
J. Van Wijk, R. Van Liere. HyperSlice - Visualization of scalar
functions of many variables. IEEE Visualization, 1993, pp.119-125.
P. C. Wong, A. H. Crabb, R. D. Bergeron. Dual multiresolution
HyperSlice for multivariate data visualization. InfoVis 1996
D. A. Keim. Pixel-oriented Database Visualizations. SIGMOD
RECORD, Special Issue on Information Visualization, 1996.
M. Ankerst, D. A. Keim, H.-P. Kriegel. Circle Segments: A Technique
for Visually Exploring Large Multidimensional Data Sets. Visualization
'96, 1996.
B. B. Bederson, B. Shneiderman, M. Wattenberg. Ordered and
Quantum Treemaps: Making Effective Use of 2D Space to Display
Hierarchies. ACM Transactions on Graphics, 2002, pp. 833-854.
Pier Luca Lanzi

References (II)

R. A. Becker, S. G. Eick, A. R. Wilks. Visualizing Network Data.


IEEE Trans. on Visualization and Computer Graphics, mar 1995, vol. 1,
no. 1, pp. 16-28
R. J. Hendley, N. S. Drew, A. M. Wood, R. Beale. Narcissus:
visualising information. InfoVis 1995, p. 90
T. A. Keahey, E. L. Robertson (1996). Techniques for non-linear
magnification transformations. InfoVis 1996
J. Lamping, R. Rao, P. Pirolli. A focus+context technique based on
hyperbolic geometry for visualizing large hierarchies. CHI '95, pp.
401-408
J. D. Mackinlay, G. G. Robertson, S. K. Card. The perspective wall:
detail and context smoothly integrated. CHI '91, pp. 173-176

Pier Luca Lanzi

You might also like