
IEEE TRANSACTIONS ON JOURNAL NAME, MANUSCRIPT ID

Enhanced Visual Clustering by Reordering of Dimensions in Parallel Coordinates


K. AMEUR, N. BENBLIDIA, S. OUKID-KHOUAS

Abstract: High-dimensional datasets present a serious challenge for visualization techniques such as Parallel Coordinates. The order and arrangement of dimensions in Parallel Coordinates has a major impact on the user's analysis task. Therefore, we need to find an expressive and effective order that helps the user to explore and analyze the visual display of data mining results. This problem is the key motivation of our work. In this paper, we extend the concept of relative entropy and use it as a distance measure between dimensions. Applied to several datasets, the proposed measure reorders the dimensions and reorganizes the set of clusters so that the user can detect where the clusters' behavior is similar and where it differs. Moreover, it clearly shows the user the most important dimensions that can be used to analyze data mining results with Parallel Coordinates.

Index Terms: Parallel Coordinates, relative entropy, visual data mining, multidimensional data.

Khadidja AMEUR is a PhD student with the Laboratory for the Development of Computer System, Faculty of Sciences, Saad Dahlab University, Blida, Algeria. E-mail: ameur_khadidja@hotmail.fr
Nadjia BENBLIDIA is with the Laboratory for the Development of Computer System, Faculty of Sciences, Saad Dahlab University, Blida, Algeria. E-mail: benblidia@yahoo.com
Saliha OUKID-KHOUAS is with the Laboratory for the Development of Computer System, Faculty of Sciences, Saad Dahlab University, Blida, Algeria. E-mail: osalyha@yahoo.com

1. INTRODUCTION
The important goal of visual data exploration is to allow the user to get an overview of the data, draw conclusions, and interact directly with the data using visualization techniques. These techniques provide a much higher degree of confidence in the visual exploration of data mining results. This leads to a high demand for visualization techniques and makes them indispensable in every data mining process. For visual data mining techniques to be effective, the dimensions must be presented in an order that helps the user in the analysis task.

The order and arrangement of dimensions in Parallel Coordinates has a major influence on the visual data mining process. We therefore need to find an expressive and effective order that helps the user to explore and analyze the visual display of data. This problem is considered NP-hard [1]. To solve it, we first need to define the similarity measures (distance measures) used to place similar dimensions next to each other. In the literature, few works treat the problem of ordering and arranging dimensions in Parallel Coordinates directly, such as [1] [3]. Some works treat the ordering of dimensions in order to reduce visual clutter in multi-dimensional data visualization [4] [2] [5]. Selecting the most appropriate similarity measure (distance measure) depends on the application field and the data characteristics. Examples are Euclidean measures, Pearson's correlation and Spearman's correlation for continuous variables, or Pearson's chi-square, 2-way interaction I(A;B)H(A;B), and 3-way interaction I(A;B)H(A;B) for discrete variables.

After computing the similarity matrix, the next step is to define the best arrangement of dimensions. There are two ways [3]:
1. Use an optimal algorithm that evaluates every possible permutation of dimensions based on the neighboring matrix and returns an optimal order. However, this problem is NP-hard, which means that as the number of dimensions grows, finding the best arrangement becomes increasingly expensive or impossible in practice.
2. Use a heuristic or meta-heuristic algorithm such as Ant Colony, genetic algorithms, neural networks and others.

In [1], the authors treat the One-Dimensional Arrangement Problem (linear and circular case) and the Two-Dimensional Arrangement Problem, based on general and partial similarity measures. They show that this problem is NP-hard and use an Ant Colony system to search for an optimal order of dimensions. To reorder dimensions, [3] proposes DOSFA, an interactive hierarchical dimension ordering approach based on dimension hierarchies derived from similarities among dimensions, which reduces the complexity of the problem. Generally, the arrangement of dimensions has a major impact on the visual data mining process, and this is the key motivation for our proposed measure. The following section gives more details of our contribution.
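The exhaustive option mentioned in the Introduction can be illustrated with a minimal Python sketch. It assumes a symmetric dissimilarity matrix and scores a linear arrangement by the sum of dissimilarities between adjacent dimensions; the function name `best_linear_order`, the dimension labels and the matrix values are hypothetical and are not taken from the paper or from [1].

```python
from itertools import permutations

def best_linear_order(names, dissimilarity):
    """Return (cost, order) minimizing the summed dissimilarity of adjacent dimensions."""
    best_cost, best_order = None, None
    for order in permutations(range(len(names))):
        # With a symmetric matrix, an arrangement and its mirror image cost the same,
        # so only one of the two is evaluated.
        if order[0] > order[-1]:
            continue
        cost = sum(dissimilarity[a][b] for a, b in zip(order, order[1:]))
        if best_cost is None or cost < best_cost:
            best_cost, best_order = cost, order
    return best_cost, [names[i] for i in best_order]

# Hypothetical symmetric dissimilarity matrix for four dimensions (illustrative values only).
names = ["A", "B", "C", "D"]
dissimilarity = [
    [0, 1, 4, 6],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [6, 5, 3, 0],
]
cost, order = best_linear_order(names, dissimilarity)
print(cost, order)
```

The loop visits n! permutations (half of them are skipped as mirror images), which is why this approach is only practical for a handful of dimensions and why heuristics are used for larger datasets.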

2. MATERIALS AND METHODS


The goal of our work is to provide an effective order of dimensions for the presentation, exploration, validation, and refinement of data mining (clustering, classification) results using the Parallel Coordinates visualization technique, and to help the user extract more knowledge. To reach this aim, we reorder dimensions according to the quantity of information that clearly separates the set of clusters. Based on the Kullback-Leibler distance, we propose a measure that is used to reorder dimensions. It is a degree of relativity between dimensions in terms of their class (cluster) distributions. After computing the distance matrix, we reorder dimensions from low to high degree.

2.1 Definition of Relative Entropy [6]
The relative entropy or Kullback-Leibler distance between two probability mass functions p(x) and q(x) is defined as:

$$D(p \,\|\, q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)} \qquad (1)$$

The relative entropy is always non-negative and is zero if and only if p = q. It is not symmetric and does not satisfy the triangle inequality.

2.2 Proposed Relative Entropy
Based on the definition of relative entropy (Eq. 1), the following measure is proposed to evaluate dimensions with the aim of ordering them sequentially. We take into consideration the distribution of classes in each value of dimensions X and Y.

$$D_1(X \,\|\, Y) = \sum_{x \in X,\; y \in Y} P(c_{xy}) \, P(c_x) \log \frac{P(c_x)}{P(c_y)} \qquad (2)$$

With:
$P(c_x) = |c_x| / |x|$: the probability of the elements of cluster c in value x;
$P(c_y) = |c_y| / |y|$: the probability of the elements of cluster c in value y;
$P(c_{xy}) = |c_{xy}| / |xy|$: the injected probability in this measure.

This measure is defined in R, and the sign of each term depends on the relation between the two probabilities:

$$P(c_{xy}) \, P(c_x) \log \frac{P(c_x)}{P(c_y)} \;
\begin{cases}
= 0 & \text{if } P(c_x) = P(c_y) \text{ or } P(c_{xy}) = 0 \\
> 0 & \text{if } P(c_x) > P(c_y) \\
< 0 & \text{if } P(c_x) < P(c_y)
\end{cases}$$

$D_1(X \,\|\, Y) = 0$ is the case where X and Y have the same class distribution, which means that X and Y have the same behavior. In the case $D_1 > 0$, X comes before Y in the reordering of dimensions according to the degree of separation of classes; we can say that X is more general than Y. Otherwise ($D_1 < 0$), X comes after Y.

In the following example (Table 1), we use our proposed measure and show the results using a graph representation of the neighbors matrix (Table 2) to find a Hamilton path.

TABLE 1
TEST DATA SET

X    Y    W    Z    U    CLASS
0    2    2    0    1    2
0    2    2    3    1    2
0    1    2    0    2    1
0    1    2    3    3    3
1    2    1    0    1    2
1    1    1    3    2    1
1    1    1    0    2    1
1    2    1    3    3    3

TABLE 2
NEIGHBORS MATRIX OF DIMENSIONS

       X       Y       W       Z       U
X      0.0    -2.07    0.0    -1.0    -5.88
Y      2.07    0.0     2.07    1.47   -1.68
W      0.0    -2.07    0.0    -1.0    -5.88
Z      0.75   -1.32    0.75    0.0    -5.13
U      7.5     3.63    7.5     4.5     0.0

We see clearly in the neighbors matrix that the distance between X and W is equal to 0, which means that the two dimensions have the same distribution of classes across their values, and there is no difference whether we order X before W or W before X.

2.3 Process of dimensions reordering
In the first step, we use the proposed relative entropy measure to compute the neighbors matrix of dimensions. The next step is to select the best order, one that helps us to present the clustering results clearly, to detect sub-clusters if they exist, or to detect where the clusters are similar and where they differ. In this step, we use a graph representation of the problem and select the optimal path that contains all dimensions (if possible), that is, we search for a Hamilton path. In Figure 1, we detect the optimal path (red line) from W to the final destination U. The new order of dimensions is W->X->Z->Y->U.

Fig. 1. Graph representation of dimensions of Test dataset.

Figure 2 shows the result of the dimension reordering. After application of the proposed measure (Fig. 2.b), it clearly shows the effect of this order on the class (cluster) visualization and how the clusters are reorganized so that the user can detect the similar and different behavior of clusters. Figure 2.b shows the effectiveness of this measure in reordering the dimensions according to the degree of separation of clusters, from the most general dimension to the most specific.
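The search for a Hamilton path described in Section 2.3 can be sketched in Python on the Table 2 values. This is only an illustration under stated assumptions: each labelled line of Table 2 is read as holding D1(a || b) for the line's own dimension b, a reading chosen because it reproduces the published order; a non-negative D1(a || b) is taken to mean that a may precede b; and the helper names `d1` and `hamilton_paths` are hypothetical. The paper does not state how its optimal path is selected, so the sketch simply enumerates every admissible path.

```python
# Table 2 as printed: one labelled line per dimension.
DIMS = ["X", "Y", "W", "Z", "U"]
TABLE2 = {
    "X": [0.0, -2.07, 0.0, -1.0, -5.88],
    "Y": [2.07, 0.0, 2.07, 1.47, -1.68],
    "W": [0.0, -2.07, 0.0, -1.0, -5.88],
    "Z": [0.75, -1.32, 0.75, 0.0, -5.13],
    "U": [7.5, 3.63, 7.5, 4.5, 0.0],
}

def d1(a, b):
    """Assumed reading: the line labelled b holds D1(a || b) at the position of a."""
    return TABLE2[b][DIMS.index(a)]

def hamilton_paths(dims, may_precede):
    """Yield every ordering of dims in which each dimension may precede its successor."""
    def extend(path, remaining):
        if not remaining:
            yield tuple(path)
            return
        for nxt in sorted(remaining):
            if not path or may_precede(path[-1], nxt):
                yield from extend(path + [nxt], remaining - {nxt})
    yield from extend([], frozenset(dims))

paths = list(hamilton_paths(DIMS, lambda a, b: d1(a, b) >= 0))
for p in paths:
    print(" -> ".join(p))
```

Under these assumptions the enumeration returns two paths that differ only in the order of X and W, which matches the remark that a distance of 0 between X and W leaves their relative order free.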

Fig. 2. Display of Test dataset using Parallel Coordinates (a) before and (b) after applying the proposed reordering measure.
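To make the notion of a class distribution per dimension value concrete, the following Python sketch applies the standard Kullback-Leibler distance of Eq. (1) to the test data of Table 1. It is not the proposed measure D1: it only compares the class distributions of individual values. The row-wise reading of Table 1, the base-2 logarithm and the helper names `class_distribution` and `kl` are assumptions made for this illustration.

```python
from collections import Counter
from math import log2

# Table 1 read row by row (columns X, Y, W, Z, U, CLASS); the row-wise reading is an assumption.
ROWS = [
    (0, 2, 2, 0, 1, 2),
    (0, 2, 2, 3, 1, 2),
    (0, 1, 2, 0, 2, 1),
    (0, 1, 2, 3, 3, 3),
    (1, 2, 1, 0, 1, 2),
    (1, 1, 1, 3, 2, 1),
    (1, 1, 1, 0, 2, 1),
    (1, 2, 1, 3, 3, 3),
]
COLS = {"X": 0, "Y": 1, "W": 2, "Z": 3, "U": 4}
CLASS = 5

def class_distribution(dim, value):
    """P(class | dim == value), estimated by counting rows."""
    counts = Counter(r[CLASS] for r in ROWS if r[COLS[dim]] == value)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def kl(p, q):
    """Eq. (1) in bits; infinite when q gives zero mass to a class that p uses."""
    if any(q.get(c, 0) == 0 for c in p):
        return float("inf")
    return sum(pc * log2(pc / q[c]) for c, pc in p.items())

# X = 0 and W = 2 select the same rows, so their class distributions coincide.
print(kl(class_distribution("X", 0), class_distribution("W", 2)))
# The two values of X mix the classes differently.
print(kl(class_distribution("X", 0), class_distribution("X", 1)))
# Every value of U contains a single class.
print(class_distribution("U", 1))
```

In this reading, X and W split the rows identically, which is consistent with the zero distance between them in Table 2, while each value of U isolates one class, which is consistent with U being placed last as the most specific dimension.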

3. EXPERIMENTAL EVALUATION
To evaluate our proposed measure, we use 3 data sets from the literature [7]. Figures 3 to 5 provide examples of the application of our proposed measure. These figures show the influence of this measure on the visualization of clusters using Parallel Coordinates and demonstrate the effect of dimension reordering on the visual analysis process.

Figure 3 presents the result of using the Parallel Coordinates technique to visualize the Letter Recognition data set, which contains 7 classes and 16 attributes, with the default order (Figure 3.a) and after reordering with the proposed measure (Figure 3.b). It clearly shows the effect of this measure on the arrangement of dimensions and how the reorganized clusters help the user to detect where the clusters have similar and different behavior.

After the application of the proposed measure to the TIC data set, which has 10 dimensions and 2 clusters, Figure 4.b shows how this measure helps to detect the set of dimensions that clearly separate the clusters.

Even for a small number of dimensions, as in the case of the Iris data (4 dimensions, Figure 5), our proposed measure reorders dimensions in a way that lets us detect the difference between dimensions based on the class distribution of the values of each dimension. Consider, for example, dimensions 1 and 2 in Figure 5.b, where we can easily see that the clusters are superposed on each other, which can be explained by the clusters being distributed across the values of these dimensions. We conclude that dimensions 1 and 2 cannot be used to separate the clusters, whereas dimensions 3 and 4 separate them well.

4. CONCLUSION AND FUTURE WORK
In this paper, we proposed a new measure that is used to reorder dimensions for visualizing clustering and classification results. Overall, our experimental results clearly show the impact of this measure on the visual display of clustering results, and that it can make visual exploratory analysis more semantic and meaningful for the user by reordering dimensions in an effective and expressive order. The proposed measure detects where the set of clusters are similar and different by reordering dimensions based on the degree of separation of clusters. In future work, we will try to apply the proposed measure to other visualization techniques such as RadViz and Circle Segments, in order to support more data mining techniques.

REFERENCES
[1] ANKERST M., BERCHTOLD S., KEIM D. A.: Similarity clustering of dimensions for an enhanced visualization of multidimensional data. In Proc. of IEEE Symp. on Information Visualization (1998), pp. 52-60.
[2] THEISEL H.: Higher order parallel coordinates. In Proc. of the Conference on Vision Modeling and Visualization (2000), pp. 415-420.
[3] YANG J., PENG W., WARD M. O., RUNDENSTEINER E. A.: Interactive hierarchical dimension ordering, spacing and filtering for exploration of high dimensional datasets. In Proc. of IEEE Symp. on Information Visualization (2003), pp. 105-112.
[4] PENG W., WARD M. O., RUNDENSTEINER E. A.: Clutter reduction in multi-dimensional data visualization using dimension reordering. In Proc. of IEEE Symp. on Information Visualization (2004), pp. 89-96.
[5] ZHOU H., YUAN X., QU H., CUI W., CHEN B.: Visual clustering in parallel coordinates. IEEE-VGTC Symposium on Visualization 2008, A. Vilanova, A. Telea, G. Scheuermann, and T. Möller (Guest Editors). Volume 27 (2008), Number
[6] COVER T. M., THOMAS J. A.: Elements of Information Theory. John Wiley & Sons, Inc. (1991). Print ISBN 0-471-06259-6, Online ISBN 0-471-20061-1.
[7] UC Irvine Machine Learning Repository, available at http://archive.ics.uci.edu/ml/datasets/

IEEE TRANSACTIONS ON XXXXXXXXXXXXXXXXXXXX, VOL. #, NO. #, MMMMMMMM 2013

Fig. 3. Display of Letter Recognition dataset using Parallel Coordinates (a) before and (b) after applying the proposed reordering measure.

Fig. 4. Display of TIC dataset using Parallel Coordinates (a) before and (b) after applying the proposed reordering measure.

Fig. 5. Display of Iris dataset using Parallel Coordinates (a) before and (b) after applying the proposed reordering measure.
