
Information visualization based on self-organized maps (SOM)

ERNESTO GUTIERREZ, Universidad de las Americas Puebla


Self-organized maps (SOM) are a well-known and stable neural network model with a wide range of applications, including data clustering and information visualization. Two characteristics are particularly useful when visualizing a self-organized map: the relations between entities and the preservation of the data's topology. Many efforts have been made to present the results obtained from clustering a dataset with SOM, including showing classes, relations and clusters within the map. After all, the main objective of any visualization technique is to give the user insight into the collection and help him/her understand it. We present information visualization techniques based on self-organized maps from a Human-Computer Interaction (HCI) perspective, and we discuss the advantages, open problems and future directions of these techniques.

Categories and subject descriptors: H.3.3 Information Search and Retrieval; H.5.2 User Interfaces: Graphical User Interfaces (GUI); I.3.6 Computer Graphics: Methodology and Techniques: Interaction Techniques

General Terms: Visualization

Additional Key Words and Phrases: Self-organized maps, information visualization

1. INTRODUCTION

Handling and visualizing big collections is one of the main challenges for Human-Computer Interaction (HCI), Computer Graphics, Visual Design and Psychology. This work focuses on the HCI perspective. According to (Card, Mackinlay and Shneiderman 1999), one of the main objectives of information visualization is to provide insight. Even though information visualization requires complex computing processes, algorithms and sophisticated design techniques, its ultimate purpose should be to give the user a way to understand the data being presented.

There are four largely distinct processes through which users gain insight while using an information visualization system (Yi et al 2008): 1) Provide Overview, 2) Adjust, 3) Detect Pattern, 4) Match Mental Model.

Provide Overview is the process through which the user gains a global understanding of the collection being examined. An important underlying concept here is what (Chang et al 2004) call collection understanding: having a general idea of the whole collection by viewing the entities that constitute it from a wide perspective, without previous knowledge of its contents. Under this overview approach the user starts a learning process in which he/she discovers and explores the collection.

Adjust is the process through which the user filters the data being presented. Collections may contain information that is not interesting to the user. By applying filters or selecting ranges in the data being presented, the user gains better insight.

Detect Pattern means that the visualization facilitates the discovery of trends, distributions, frequencies or structure in the collection.

Match Mental Model refers to the cognitive process by which the user understands the data presented by the visualization. Visualization techniques should provide a mental model that is easy for the user to manage, so that it does not impose a high cognitive load.

That said, this work presents visualization techniques for big collections based on self-organized maps. They are reviewed from the HCI perspective, in terms of the characteristics defined above and other inherent SOM characteristics.

1.1 Visualizing Information using SOM vs. Visualizing SOM

Self-organized maps have been used to visualize multidimensional datasets as well as to cluster and view relationships between the elements of those datasets. However, few approaches tackle collection understanding and HCI aspects such as usability, interactivity and the previously defined concept of insight. Throughout this survey two different approaches are discussed: 1) Visualizing Information using SOM and 2) Visualizing SOM. The former takes HCI aspects into account, while the latter only tries to visualize SOM characteristics. Nevertheless, the second approach is always necessary to generate the first one. In this survey, we first review the techniques used to visualize SOM and then review techniques that take advantage of SOM characteristics to visualize information.

2. SELF-ORGANIZED MAPS (SOM)

2.1 Neural Network Model

A self-organized map (SOM), or Kohonen map, is a neural network whose neurons compete by means of mutual lateral interaction (Kohonen 1990). A SOM consists of neurons organized in a low-dimensional grid (typically two dimensions) that defines the output layer. Each neuron in the output layer is represented by an n-dimensional weight vector (a.k.a. prototype vector or codebook vector). The input layer is a vector of the same dimensionality n; over a succession of iterations it represents, in turn, each entity in the collection. The main difference between a self-organized network and a conventional one is that the correct output cannot be defined a priori; therefore a SOM uses an unsupervised learning algorithm. This algorithm classifies the collection and presents it in a 2-dimensional grid while preserving the topology of the original n-dimensional dataset. The 2-dimensional grid obtained from the algorithm is then used for visualization through the techniques described next.

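The learning procedure described above can be illustrated with a minimal sketch. It is not taken from the surveyed papers: the function names `train_som` and `best_matching_unit`, the linear decay schedules, the Gaussian neighborhood and all default parameter values are assumptions chosen only to keep the example short, and it uses plain Python lists rather than a numerical library.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM; weights[r][c] is a list of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    # Assumption: initialise every weight vector with a randomly chosen sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Neurons near the winner on the grid are pulled more strongly.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

Because neighboring neurons on the grid are updated together, nearby grid positions end up with similar weight vectors, which is the topology-preserving behavior the visualizations below rely on.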
2.2 Visualizing SOM Techniques

U-Matrix

The U-Matrix (unified distance matrix) is the classic visualization method. It shows the distances between neighboring neurons encoded with a color scheme (Ultsch and Siemon 1990). Darker colors indicate larger distances, while lighter colors indicate smaller distances, which form clusters. Figure 1 shows this method. It originally used a gray scale to depict distances between neurons, but the distances can also be encoded with an RGB scheme. The objective of this visualization is to identify clusters.
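A minimal sketch of how such distance values could be computed is given here. It is a simplified variant that stores one value per neuron (the mean distance to its 4-connected grid neighbors) rather than extra cells between neurons; the function name `u_matrix` and the `weights[r][c]` layout (a grid of weight vectors as lists of floats) are assumptions made for illustration.

```python
import math


def u_matrix(weights):
    """Mean Euclidean distance from each neuron to its 4-connected neighbours."""
    rows, cols = len(weights), len(weights[0])
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1x1 map has no neighbours; report zero distance in that case.
            result[r][c] = sum(dists) / len(dists) if dists else 0.0
    return result
```

Mapping large values to dark colors and small values to light colors would give the kind of picture described above, with light regions suggesting clusters and dark ridges suggesting boundaries.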

Figure 1. U-Matrix Visualization

P-Matrix

While the U-Matrix works well for well-separated clusters, it has problems identifying clusters that overlap. The P-Matrix visualization is based on the density measured at the prototype vectors. The P-Matrix (Ultsch 2003) displays the local density measured with Pareto Density Estimation (PDE). Figure 2 shows the P-Matrix for the same dataset used in Figure 1. The objective of this visualization is to identify clusters.
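The density idea behind the P-Matrix can be sketched as follows. This is only an illustration and not the PDE procedure itself: the function name `p_matrix` is an assumption, and the `radius` parameter stands in for the Pareto radius that PDE would determine from the data.

```python
import math


def p_matrix(weights, data, radius):
    """Count, for each neuron, the data points within `radius` of its weight vector."""
    return [
        [sum(1 for x in data if math.dist(w, x) <= radius) for w in row]
        for row in weights
    ]
```

Neurons whose prototype vectors sit in densely populated regions of the data space receive high counts, so overlapping clusters may still show up as separate density peaks.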

Figure 2. P-Matrix Visualization

U*-Matrix

The U*-Matrix combines the distance-based (U-Matrix) and density-based (P-Matrix) visualizations (Ultsch 2004). Figure 3 depicts the U*-Matrix for the same dataset shown in Figure 1 and Figure 2. The objective of this visualization is to identify clusters.
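One possible way to combine the two matrices is sketched below. It assumes a scaling rule in which distance heights are left unchanged at average density, damped to zero at maximum density and amplified in sparse regions; this rule, the function name `u_star_matrix` and the input layout (two grids of equal shape) are assumptions for illustration, and the exact definition should be checked against the cited work.

```python
def u_star_matrix(u, p):
    """Scale U-heights by density: factor 1 at mean density, 0 at maximum density."""
    flat_p = [v for row in p for v in row]
    mean_p = sum(flat_p) / len(flat_p)
    max_p = max(flat_p)
    if max_p == mean_p:
        # Uniform density: there is nothing to rescale.
        return [list(row) for row in u]
    return [
        [uv * (pv - max_p) / (mean_p - max_p) for uv, pv in zip(u_row, p_row)]
        for u_row, p_row in zip(u, p)
    ]
```

Under this assumption, distances inside dense regions are suppressed, so cluster interiors look flat, while distances in sparse regions between clusters are emphasized.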

Figure 3. U*-Matrix Visualization

Smoothed Data Histograms

The objective of Smoothed Data Histograms is to visualize clusters by estimating the probability density of the high-dimensional data (Pampalk, Rauber and Merkl 2002). This is achieved by counting, for each sample, a number of its most likely positions on the map. The resulting visualization is a landscape with islands and mountains in densely occupied regions and oceans in between. Figure 4 shows a Smoothed Data Histogram visualization. The objective of this visualization is to identify clusters.
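The counting step can be sketched as a voting scheme. The sketch assumes that each sample gives `s` points to its closest unit, `s - 1` to the second closest, and so on down to 1; the function name `smoothed_data_histogram`, the default `s=3` and the `weights[r][c]` layout are assumptions, and normalization and interpolation for display are left out.

```python
import math


def smoothed_data_histogram(weights, data, s=3):
    """Let every sample vote for its s closest units with s, s-1, ..., 1 points."""
    rows, cols = len(weights), len(weights[0])
    votes = [[0.0] * cols for _ in range(rows)]
    units = [(r, c) for r in range(rows) for c in range(cols)]
    for x in data:
        # Rank all units by the distance of their weight vector to the sample.
        ranked = sorted(units, key=lambda rc: math.dist(weights[rc[0]][rc[1]], x))
        for rank, (r, c) in enumerate(ranked[:s]):
            votes[r][c] += s - rank
    return votes
```

With `s = 1` this reduces to plain hit counting; larger values of `s` spread each sample over several units, which is what smooths the landscape.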

Figure 4. Smoothed Data Histogram Visualization

Hit Histogram

This visualization shows the number of hits (items mapped onto each neuron) encoded by size, by color or by both. In addition, a label with the number of hits can be displayed in each neuron to clarify small differences not noticeable to the human eye. Figure 5 shows a SOM Hit Histogram visualization in which the number of hits is encoded by size and a label with the number is also added. This visualization is useful for identifying the structure and tendencies of the data.
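Counting hits amounts to finding the closest neuron for every item, as in this minimal sketch. The function name `hit_histogram` and the `weights[r][c]` layout (a grid of weight vectors as lists of floats) are assumptions made for illustration.

```python
import math


def hit_histogram(weights, data):
    """Count how many data items are mapped onto each neuron (its hits)."""
    rows, cols = len(weights), len(weights[0])
    hits = [[0] * cols for _ in range(rows)]
    for x in data:
        # The best-matching unit is the neuron with the closest weight vector.
        r, c = min(
            ((r, c) for r in range(rows) for c in range(cols)),
            key=lambda rc: math.dist(weights[rc[0]][rc[1]], x),
        )
        hits[r][c] += 1
    return hits
```

The resulting counts could then drive the marker size, the color or the numeric label of each neuron.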

Figure 5. Hit Histogram Visualization

Neighborhood Graph

This is another density-based visualization, like the P-Matrix, U*-Matrix and Smoothed Data Histogram visualizations. This method defines graphs that result from calculating distances between neurons (nearest neighbor and radius-based) (Poelzbauer, Rauber and Dittenbach 2005). The addition of a graph-based approach provides a visualization that shows relations between neurons. Figure 6 shows this graph-based visualization. It is useful for understanding relations between the items of the map.

Figure 6. Neighborhood Graph Visualization

Vector Fields

This method displays a flow diagram of vectors that either point to the center of the most likely cluster or point in a way that emphasizes cluster boundaries. It is designed for users who are comfortable reasoning abstractly about vectors (such as engineers). A careful analysis of the vectors helps to identify the clustering structure, correlations and dependencies of the data. Figure 7 shows an example of this visualization in which the vectors point to the center of the cluster.

Figure 7. Vector Field Visualization

Sky Metaphor Visualization

Instead of placing each neuron at the center of its map unit, this visualization shifts it towards its closest neighbors (Latif and Mayer 2007). Its purpose is to reveal more details about the relations between the elements that are mapped onto the same unit. Figure 8 shows an example of this technique, which is useful for discovering underlying relations between the elements of the dataset.

Figure 8. Sky Metaphor Visualization

Metro Map

This visualization method helps to identify the influence of single variables on clustering. It uses the metro-map metaphor, in which each line categorizes a variable (Neumayer et al 2007). This allows different components to be seen in a single plot. Figure 9 shows a Metro-Map visualization in which it is possible to observe that highly correlated variables tend to form a cluster. Therefore, by matching this visualization with another one focused on clusters, we can determine which variables are decisive in forming clusters.

Figure 9. Metro-Map Visualization

Class Visualization

The Class Visualization technique helps to discover the distribution and arrangement of classes over the map. With this visualization the user gains a better understanding, and thus a better analysis, of the data presented in the map. Figure 10 shows how the Class Map visualization smoothly colors a SOM according to the distribution and location of the given class labels (Mayer, Aziz and Rauber 2007). If manual labels are available, it helps to assess and compare manual vs. automatic labeling. To achieve this visualization, a Voronoi diagram is constructed over the SOM, a graph algorithm is applied to establish the boundaries, and then each Voronoi region is colored according to the class or classes mapped onto it.

Figure 10. Class Visualization

2.3 SOM Visualization techniques from an HCI perspective

While the visualizations listed above were developed to help users understand the data being presented and discover the information underlying self-organized maps, most of them require a high level of abstraction (a high cognitive load due to a complex mental model), a profound understanding of what a self-organized map is and of which characteristics are being displayed or highlighted by the visual elements used, and in some cases (as with Vector Fields) good engineering knowledge to take full advantage of them. It is interesting to note that none of the above visualizations was evaluated with users. Perhaps these visualization techniques were developed to help understand very complex datasets from a scientific perspective. Following the HCI perspective, we remark on some interesting characteristics that could be exploited to help designers construct visualizations based on self-organized maps.

Table 1. Characteristics of different SOM visualization techniques

Columns (best characteristic): Clustering, Relation discovery, Structure understanding
Rows (visualization): U, U*, P Matrices; Hit Histogram; Smoothed Data Histogram; Neighborhood Graph; Vector Field; Sky Metaphor; Metro-Map; Class Visualization
X X X X X X X X X X X

As we may observe in Table 1, each visualization method was developed to highlight some of the main characteristics of SOM. For example, the U, U* and P matrix visualizations help to discover clusters within the SOM; however, their visual implementation does not show those clusters as clearly as the Smoothed Data Histogram does. Class Visualization vs. Vector Field gives another example of clarity: both visualizations try to show clustering and to provide a better understanding of the structure of the dataset, yet Class Visualization is much easier to understand because of the engineering background needed for the Vector Field visualization. That said, it would be necessary to evaluate visualization techniques in order to select not only those with the best characteristics but also those that are easiest for users to understand.

In Table 2 we show our perspective on how difficult or easy each visualization is from a user perspective.

Table 2. Cognitive load for users presented by SOM visualizations

SOM Visualization          Cognitive load for user
U, U*, P Matrices          High
Hit Histogram              Low
Smoothed Data Histogram    Medium
Neighborhood Graph         High
Vector Field               High
Sky Metaphor               Medium
Metro-Map                  High
Class Visualization        Low

3. INFORMATION VISUALIZATION BASED ON SELF-ORGANIZED MAPS

As we stated in the introduction, the visualization techniques listed above are designed to visualize SOM characteristics, leaving the HCI perspective aside. In this section, we discuss Information Visualization methods based on self-organized maps. The main difference between this approach and the previous one (Visualizing SOM) is that Information Visualization techniques try to incorporate HCI concepts such as interactivity, usability, insight and collection understanding in order to give users useful visualizations that facilitate information seeking, knowledge discovery, relationship discovery, exploration and analysis of collections. We present Information Visualization methods based on self-organized maps and analyze their characteristics from the HCI perspective.

3.1 WEBSOM

WEBSOM is the traditional example of information visualization based on SOM. Proposed by (Kaski et al 1999), the WEBSOM method organizes a collection of text documents and displays the resulting categorization in a 2D self-organized map using the U-Matrix. Table 3 shows the principal characteristics of this method. It provides an overview of the collection, and through navigation it is possible to view more details about the categorization; however, there is no information about context, and it is easy to get lost while exploring the sea of documents. Inherited from the U-Matrix visualization technique, this method presents a highly complex mental model that imposes a large cognitive load on the user. In Table 4, WEBSOM is evaluated according to its data presentation approach: the data is presented as a topic cloud. For users familiar with SOM it is clear that relations between documents rely on spatial proximity. Nevertheless, this information about relations is not encoded at all, which easily leads users to get lost in a sea of labels where no extra information is provided. Table 5 makes the lack of interaction in this method noticeable: it offers neither zoom nor details on demand. One important issue is the lack of context information while navigating, a characteristic that is important when navigating big collections. In general, the idea of using SOM to present big collections was a good starting point, but because it does not consider important HCI aspects it represents only a first step towards a useful visualization. The WEBSOM method is shown in Figure 11.

Figure 11. WEBSOM Method


3.2 SOMLib

SOMLib is an Information Visualization method based on SOM. Its objective is to represent a digital library system, taking advantage of the organization and categorization provided by SOM (Rauber and Merkl 1999). In this visualization, the authors add a bookshelf metaphor that helps users understand the contents of the library intuitively while also providing an overview of the collection held. Table 3 shows that this visualization method gives the user a good overview of the contents of the collection. Navigation options such as zoom and pan also allow the level of abstraction to be adjusted. The bookshelf metaphor is easy to understand and to navigate intuitively; however, it limits the visualization to documents. In Table 4, SOMLib is analyzed according to its data presentation approach. The clustering characteristic is highlighted both by the SOM-based construction of the visualization and by the bookshelf metaphor. Topic labels are relevant for a proper understanding of the collection, and each label represents a cluster. Table 5 presents the interaction characteristics of SOMLib. Figure 12 shows the SOMLib visualization (LibViewer according to the author). One small disadvantage of this visualization is that the relations between the elements of the collection are not clearly presented because of the bookshelf organization.

Figure 12. SOMLib Visualization


3.3 ThemeView and ThemeScape Visualizations

Even though ThemeView and ThemeScape use a different algorithm for clustering, they use visualization techniques similar to SOM visualizations. ThemeView uses the Sky Metaphor visualization in a 3D fashion to visualize document collections. The sky metaphor can visualize tendencies in data, clusters and relations, but it is not well suited to visualizing the structure of the whole collection. In Table 3, we can observe that, as a cluster-based visualization, it provides a comprehensive overview of the collection being visualized and also offers an easy metaphor that users understand well when topic labels are displayed. A series of interactive tools such as zoom, pan, selection and filtering makes this visualization good for user interaction. ThemeView is shown in Figure 13.

Figure 13. ThemeView Visualization

ThemeScape uses a visualization similar to the Smoothed Data Histogram in a 3D landscape-like fashion. It also provides an overview of the collection. The main visual difference from ThemeView is the coloring. Figure 14 shows this visualization method.

Figure 14. ThemeScape Visualization

3.4 ET Map Visualization

ET Map visualization is a scalable, multilayered and graphical SOM approach for Internet categorization; it was developed by (Chen et al 1998). This method presents information about web pages in a hierarchical navigation structure. It uses a rudimentary class visualization similar to that of (Lin et al 1991), but the general idea of hierarchical navigation and presentation of classes is very useful in helping users understand the universe of pages organized by the map. This class visualization helps to distinguish between clusters but hides the relations among the elements. The navigation presents an easy map metaphor that users understand without effort. Because of its emphasis on navigation, this visualization does not show its full hierarchical structure; it shows only one layer at a time, as we may observe in Figure 15.

Figure 15. ET Map Visualization


3.5 Principal characteristics of Visualizations

The following tables show the main characteristics previously discussed for each visualization method.

Table 3. Main characteristics that visualizations should provide

Visualization            Complexity of Mental Model
ThemeScape               Low
WEBSOM                   High
Viscovery                Medium
Koua Visualization       High
ET Map                   Low
SOMLib (libViewer)       Low
Kartoo                   Low
Grokker                  Medium
Cropcircles              Medium
Docuburst                Low
Information Slices       Low
Treemaps                 Medium
Voronoi Treemaps         Medium
ThemeRiver               Medium

Overview: X X X X X X X X
Adjust abstraction level: X X X X X X X X X X X
X X X

Table 4. Presentation of data approach and visualizations

Columns: Hierarchy, Clusters, Topic based, Network
Rows: ThemeScape, WEBSOM, Viscovery, Koua Visualization, ET Map, SOMLib (libViewer), Kartoo, Grokker, Cropcircles, Docuburst, Information Slices, Treemaps, Voronoi Treemaps
Clusters X X X X X X X X X X X
Topic based X X
Network X (not very clear) X
X X X X X X
X X

Table 5. Characteristics of visualizations according to interaction

Columns: Zoom, Filtering, Details on demand, Animation and transitions
Rows: ThemeScape, WEBSOM, Viscovery, Koua Visualization, ET Map, SOMLib (libViewer), Kartoo, Grokker
X X
Filtering X
Details on demand X
X X X X
X X X
Rows (continued): Cropcircles, Docuburst, Information Slices, Treemaps, Voronoi Treemaps, ThemeRiver
X X X X X
* * X X

4. CONCLUSION

We have presented two distinct approaches to visualizing data from a self-organized map. The first one (SOM visualization) is oriented only towards showing SOM characteristics such as neighbor distances, the number of elements mapped, clusters within the map and relationships between elements. The second approach (Information Visualization using SOM) is oriented towards helping the user understand the collection and gain insight, and towards adding interaction to visualizations. From our point of view, some of these characteristics have not been tackled well in the second approach. Most visualization methods shown under the second approach are still largely attached to the first approach, so they do not provide insight. It is here that further efforts should be made in order to accomplish this objective of Information Visualization.

We have also presented other Information Visualization methods that do not use SOM. We contrasted and presented all visualizations together in Table 3, Table 4 and Table 5 to understand the advantages and disadvantages of using SOM to present data. For more information about other information visualization methods, the references provide useful papers.

In general, self-organized maps provide useful characteristics that help in collection understanding. In this sense, inherent SOM characteristics should be exploited from the HCI perspective in order to provide insight to the user. SOM visualization techniques can be used together to improve visualization. Another important factor is user interaction; current Information Visualization methods using SOM do not tackle this issue properly. Coloring is a good example: none of these methods took coloring techniques into account to help the user understand some non-evident SOM characteristics.

REFERENCES

Andrews, K., & Heidegger, H. (1998). Information slices: Visualising and exploring large hierarchies using cascading, semi-circular discs. Proc. of IEEE Infovis '98 Late Breaking Hot Topics (pp. 9-11). Retrieved from http://www.iicm.tugraz.at/liberation/iicm_papers/ivis98.pdf?classname=ChangeLanguageHandler&methodname=showLanguage&static=true&language=en

Balzer, M., Deussen, O., & Lewerentz, C. (2005). Voronoi treemaps for the visualization of software metrics. Proceedings of the 2005 ACM symposium on Software visualization, SoftVis '05 (pp. 165-172). New York, NY, USA: ACM. doi:10.1145/1056018.1056041

Card, S. K., Mackinlay, J. D., & Shneiderman, B. (1999). Readings in Information Visualization: Using Vision to Think. The Morgan Kaufmann Series in Interactive Technologies. Morgan Kaufmann. Retrieved from http://books.google.co.uk/books?id=wdh2gqWfQmgC

Chen, H., Houston, A. L., Sewell, R. R., & Schatz, B. R. (1998). Internet Browsing and Searching: User Evaluation of Category Map and Concept Space Techniques. Journal of the American Society for Information Science, Special Issue on AI Techniques for Emerging Information Systems Applications, 49(7). Retrieved from http://hdl.handle.net/10150/105118

Havre, S., Hetzler, B., & Nowell, L. (2000). ThemeRiver: Visualizing theme changes over time. Information Visualization, 2000. InfoVis 2000. IEEE Symposium on (pp. 115-123). Retrieved from http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=885098

Kohonen, T., Schroeder, M. R., & Huang, T. S. (Eds.). (2001). Self-Organizing Maps (3rd ed.). Secaucus, NJ, USA: Springer-Verlag New York, Inc.

Lagus, K., Honkela, T., Kaski, S., & Kohonen, T. (1999). Websom for Textual Data Mining. Artif. Intell. Rev., 13(5-6), 345-364.

Latif, K., & Mayer, R. (2007). Sky-Metaphor Visualisation for Self-Organising Maps. Proceedings of the 7th International Conference on Knowledge Management (I-KNOW '07). Graz, Austria.

Lin, X., Soergel, D., & Marchionini, G. (1991). A self-organizing semantic map for information retrieval. Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '91 (pp. 262-269). New York, NY, USA: ACM. doi:10.1145/122860.122887

Mayer, R., Aziz, T. A., & Rauber, A. (2007). Visualising Class Distribution on Self-organising Maps. In J. M. de Sá, L. A. Alexandre, W. Duch, & D. P. Mandic (Eds.), ICANN (2), Lecture Notes in Computer Science (Vol. 4669, pp. 359-368). Springer. Retrieved from http://dblp.uni-trier.de/db/conf/icann/icann2007-2.html#MayerAR07

Neumayer, R., Mayer, R., Pölzlbauer, G., & Rauber, A. (2007). The Metro Visualisation of Component Planes for Self-Organising Maps. IJCNN (pp. 2788-2793). IEEE. Retrieved from http://dblp.uni-trier.de/db/conf/ijcnn/ijcnn2007.html#NeumayerMPR07

Pampalk, E., Rauber, A., & Merkl, D. (2002). Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps. In J. R. Dorronsoro (Ed.), ICANN, Lecture Notes in Computer Science (Vol. 2415, pp. 871-876). Springer. Retrieved from http://dblp.uni-trier.de/db/conf/icann/icann2002.html#PampalkRM02

Pölzlbauer, G., Dittenbach, M., & Rauber, A. (2006). Advanced visualization of Self-Organizing Maps with vector fields. Neural Networks, 19(6-7), 911-922.

Pölzlbauer, G., Rauber, A., & Dittenbach, M. (2005). Advanced Visualization Techniques for Self-organizing Maps with Graph-Based Methods. In J. Wang, X. Liao, & Z. Yi (Eds.), ISNN (2), Lecture Notes in Computer Science (Vol. 3497, pp. 75-80). Springer. Retrieved from http://dblp.uni-trier.de/db/conf/isnn/isnn2005-2.html#PolzlbauerRD05

Rauber, A., & Merkl, D. (1999). SOMLib: A Digital Library System Based on Neural Networks. ACM DL (pp. 240-241). ACM. Retrieved from http://dblp.uni-trier.de/db/conf/dl/dl99.html#RauberM99

Shneiderman, B. (1992). Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics (TOG), 11(1), 92-99. doi:http://doi.acm.org/10.1145/102377.115768

Ultsch, A., & Siemon, H. P. (1990). Kohonen's Self Organizing Feature Maps for Exploratory Data Analysis. Proceedings of International Neural Networks Conference (INNC) (pp. 305-308). Paris: Kluwer Academic Press. Retrieved from http://www.uni-marburg.de/fb12/datenbionik/pdf/pubs/1990/UltschSiemon90

Ultsch, Alfred. (2003a). Maps for the Visualization of high-dimensional Data Spaces. In Proceedings Workshop on Self-Organizing Maps (WSOM 2003) (pp. 225-230). Kyushu, Japan.

Ultsch, Alfred. (2003b). U*-matrix: a tool to visualize clusters in high dimensional data (No. 36). Philipps-University Marburg, Germany. Retrieved from http://www.mathematik.uni-marburg.de/ databionics/en/downloads/papers/ultsch03ustar.pdf

Wang, T. D., & Parsia, B. (2006). CropCircles: Topology Sensitive Visualization of OWL Class Hierarchies. In I. F. Cruz, S. Decker, D. Allemang, C. Preist, D. Schwabe, P. Mika, M. Uschold, et al. (Eds.), International Semantic Web Conference, Lecture Notes in Computer Science (Vol. 4273, pp. 695-708). Springer. Retrieved from http://dblp.uni-trier.de/db/conf/semweb/iswc2006.html#WangP06a

Wise, J. A., Thomas, J. J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., & Crow, V. (1995). Visualizing the non-visual: spatial analysis and interaction with information from text documents. In N. D. Gershon & S. G. Eick (Eds.), INFOVIS (pp. 51-58). IEEE Computer Society. Retrieved from http://dblp.uni-trier.de/db/conf/infovis/infovis1995.html#WiseTPLPSC95

Yi, J. S., Kang, Y., Stasko, J. T., & Jacko, J. A. (2008). Understanding and characterizing insights: how do people gain insights using information visualization? Proceedings of the 2008 Workshop on BEyond time and errors: novel evaLuation methods for Information Visualization, BELIV '08 (pp. 4:1-4:6). New York, NY, USA: ACM. doi:10.1145/1377966.1377971
