
Visualization and categorization of emerging technologies using keywords of papers

M. Pavankumar Reddy (EE10B020), M S Santosh Kumar (EE10B017)
November 28, 2012

Abstract

A large number of papers is published in every field of emerging technology. As that number grows, it has become very difficult to locate trends in technologies and to analyze the evolving literature. Many methods have been developed in recent years to address this problem, and visualization techniques are still considered the best among them. In this respect, our project tries out a visualization method that is different from the ones presently used. We use the author keywords from papers of a target technology and cluster the papers using the k-means clustering algorithm and graph partitioning. With these clusters we form a semantic network of keywords, combined with the publication dates of the papers and the frequency of the keywords, and hence present the information in a much more comprehensible way. This set of clusters can also be used for dividing the papers into categories. A similar visualization technique has been proposed for patent networks in the paper "Visualization of patent analysis for emerging technologies" by Young Gil Kim. We use a similar approach in our case.

Introduction

Several techniques are in use to provide different details about paper networks. However, few of them give an overview of the sub-categories involved in a field or of the evolution of technologies in that field. It is essential to have an overview of the emerging technologies in the field of interest and to organize the papers accordingly, in order to reduce the effort and time spent in finding the correct category of interest. Since most of the papers are technical, it is quite difficult for a non-expert to identify the approximate categories of interest using the present search-based paper network analysis. Using a semantic network of keywords, it becomes much easier to identify the category by looking at the visualization and then following up with the related papers in that particular category.

This project is an effort to bring such unstructured paper networks into a more structured and comprehensible form. The next few sections describe the methodologies we used for our project and the outcome of these techniques applied to a particular test case (human-computer interaction in this case).

Points to ponder

inserting formula ideal number of groups from k-means clustering

Methodology

Our methodology can be summed up in the following steps:

1) Collecting paper information and keywords related to a particular field.
2) Creating a semantic network of papers and keywords, and defining a feature vector for each paper based on keywords.
3) Grouping different papers together using the k-means clustering algorithm and a graph-partitioning algorithm, and analyzing the results in both cases.
4) Associating the networks with publishing dates and keyword frequency, finally leading to a visualization of the paper network related to a particular field.
5) Testing our techniques on the chosen field, analyzing the advantages and drawbacks of these techniques, and shedding light on further developments.

We describe these steps briefly in the following sections.

3.1 Collection of paper information and keywords


We begin by identifying some keywords related to the target field (these can be collected from experts in that particular field). We expand our set of keywords by aggregating the initial set with the keywords from the papers obtained using the initial set. We proceed until a considerable number of keywords has been collected. Then we collect all the papers related to those keywords and link them together. In a real-case scenario, we would need to do this for all the papers in that particular field. However, in our test case we use a limited number of papers due to resource constraints. The papers are obtained from the official IEEE website, and the keyword list is derived from the author keywords of this set of papers.
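The keyword expansion described above can be sketched as a simple loop. This is only an illustration under stated assumptions: the `papers` dictionary is a hypothetical in-memory stand-in for a real paper search, and the function names, the `min_keywords` threshold, and the sample keywords are ours, not part of the project.

```python
def expand_keywords(seed_keywords, papers, min_keywords):
    """Grow a keyword set by repeatedly adding the author keywords of matching papers.

    `papers` maps a paper title to its set of author keywords
    (hypothetical stand-in for querying a paper database).
    """
    keywords = set(seed_keywords)
    while len(keywords) < min_keywords:
        # Papers that share at least one keyword with the current set.
        matched = [kws for kws in papers.values() if kws & keywords]
        expanded = keywords.union(*matched)
        if expanded == keywords:  # nothing new was found, so stop
            break
        keywords = expanded
    return keywords


def collect_papers(keywords, papers):
    """Return the titles of all papers linked to at least one keyword."""
    return sorted(title for title, kws in papers.items() if kws & keywords)


# Made-up sample data, for illustration only.
papers = {
    "P1": {"gesture recognition", "touch screen"},
    "P2": {"touch screen", "haptics"},
    "P3": {"haptics", "wearable"},
    "P4": {"power grid"},
}
keywords = expand_keywords({"gesture recognition"}, papers, min_keywords=4)
linked = collect_papers(keywords, papers)
print(sorted(keywords))
print(linked)
```

The loop stops either when enough keywords have been gathered or when a pass adds nothing new, so an unrelated paper such as `P4` is never pulled in.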

3.2 Creating a semantic network of papers and keywords, and defining a feature vector for each paper based on keywords

We first form a list of the keywords collected from all the papers. We then form a keyword existence matrix with a column index of keywords (1, 2, ..., j, ..., m) and a row index of papers (1, 2, ..., i, ..., n). We fill this matrix by the following rule: if the j-th keyword exists in the i-th paper, then element (i, j) of the matrix is filled with a 1, and with a 0 otherwise. So we now have a keyword existence matrix filled with 0s and 1s. We use the rows of the matrix as feature vectors for the papers, so the i-th row forms the feature vector of the i-th paper. After forming the feature vectors, we cluster the papers using different techniques. In this project we have clustered using two methods: the k-means clustering algorithm and graph partitioning.

We now explain (as explained in the paper) how we form the semantic network of keywords. Let us assume papers 'A' and 'B' belong to group 1. Paper 'A' has keywords 'a' and 'c', and paper 'B' has keywords 'b' and 'c'. Then group 1 consists of the three keywords 'a', 'b' and 'c'. We investigate the keywords for each group in this way.

The formation of the semantic network of keywords is as follows. This is explained in the figure we have included below. Let group 1 have keywords 'a', 'b' and 'c', and let group 2 have keywords 'c' and 'd'. The two groups share 'c', and therefore the relationship between the two groups can be represented by three nodes: (a, b), (c) and (d). Here the shared node ranks higher than the others, so arrows are drawn from (c) to (a, b) and to (d). In this way we form a semantic network of keywords consisting of nodes with one or more keywords. The number of keywords is dependent on the k-means clustering algorithm. We perform this n times, which results in n semantic networks. From the n semantic networks we have to choose one for visualization. The problem here is
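The keyword existence matrix and the node formation for two groups, as described in this section, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are ours, paper 'C' is a hypothetical paper added so that group 2 has keywords 'c' and 'd', and the group assignment is written by hand as a stand-in for the output of k-means or graph partitioning.

```python
def keyword_existence_matrix(paper_keywords):
    """Build the n x m 0/1 matrix; row i is the feature vector of paper i."""
    papers = sorted(paper_keywords)
    vocabulary = sorted(set().union(*paper_keywords.values()))
    matrix = [
        [1 if kw in paper_keywords[p] else 0 for kw in vocabulary]
        for p in papers
    ]
    return papers, vocabulary, matrix


def group_keywords(paper_keywords, groups):
    """Union of the keywords of all papers in each group."""
    return {
        g: set().union(*(paper_keywords[p] for p in members))
        for g, members in groups.items()
    }


def semantic_network(kw1, kw2):
    """Nodes and arrows for two groups: the shared node points to each exclusive node."""
    shared = tuple(sorted(kw1 & kw2))
    only1 = tuple(sorted(kw1 - kw2))
    only2 = tuple(sorted(kw2 - kw1))
    nodes = [n for n in (only1, shared, only2) if n]
    edges = [(shared, n) for n in (only1, only2) if shared and n]
    return nodes, edges


# Papers 'A' and 'B' follow the example in the text; 'C' is hypothetical.
paper_keywords = {"A": {"a", "c"}, "B": {"b", "c"}, "C": {"c", "d"}}
# Hand-written grouping, standing in for a clustering result.
groups = {1: ["A", "B"], 2: ["C"]}

papers, vocabulary, matrix = keyword_existence_matrix(paper_keywords)
per_group = group_keywords(paper_keywords, groups)
nodes, edges = semantic_network(per_group[1], per_group[2])
print(matrix)
print(nodes)
print(edges)
```

With this input the nodes come out as (a, b), (c) and (d), with arrows from (c) to the other two, matching the example in the text. Extending the sketch to more than two groups would need a rule for keywords shared by only some of the groups, which is not covered here.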
