TECHNICAL REPORT AND USER MANUAL FOR THE PROJECT

SPATIAL PATTERN IN LARGE DATA: IDENTIFYING CLUSTERS

Prepared for
GIS Course

Instructor
Prof S. Rajagopalan

Team Members:
Vignesh.R(MT2014133)
Vignesh.S(MT2014134)
Satish Kulkarni(MT2014106)

Table of contents

1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms and Abbreviations
1.4 Overview of the Document

2. System Overview

3. User Manual
3.1 Launching the Application
3.2 Choosing the Clustering Algorithm
3.3 Choosing Dataset
3.4 Choosing ATM type
3.5 Choose number of clusters
3.6 Visualize

4. Results
4.1 Rainfall data: DBSCAN clustering
4.2 Rainfall data: Hierarchical clustering
4.3 ATM Public data: K-means, 3 clusters
4.4 ATM Private data: K-means, 5 clusters
4.5 ATM Public data: DBSCAN clustering
4.6 ATM Private data: Hierarchical clustering

1. Introduction
1.1 Purpose
The purpose of this project is to develop a web application for viewing clusters in spatial data
(Rainfall and ATM datasets). The application lets users view clusters of data points produced by
different clustering techniques.
1.2 Scope
The scope of this project is to develop a web application that presents the clusters found in
spatial datasets to the user on a map. The application allows the user to select the dataset in
which clusters need to be identified and the clustering technique to be applied.
1.3 Definitions, Acronyms and Abbreviations
Centroid based Clustering:
Clusters are represented by a central vector, which may not necessarily be a member of the dataset.
E.g. K-Means Clustering: When the number of clusters is fixed to k, the objective of the K-Means
algorithm is to find the k cluster centres and assign each object to its nearest cluster centre, such
that the sum of squared distances from the objects to their cluster centres is minimized.
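
The K-Means objective described above can be illustrated with a minimal sketch of the standard iterative procedure (Lloyd's algorithm) in plain Python. This is not the project's implementation; the function names, the 2-D tuple representation of points, and the seed and iteration defaults are assumptions made for illustration.

```python
import random


def nearest_centre(point, centres):
    """Index of the centre with the smallest squared distance to point."""
    return min(
        range(len(centres)),
        key=lambda c: (point[0] - centres[c][0]) ** 2 + (point[1] - centres[c][1]) ** 2,
    )


def kmeans(points, k, iterations=100, seed=0):
    """Cluster 2-D points into k groups; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # assumption: initial centres drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [nearest_centre(p, centres) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centres.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centres.append(centres[c])  # keep the centre of an empty cluster
        if new_centres == centres:  # no centre moved, so the assignment is stable
            break
        centres = new_centres
    return centres, labels
```

Calling `kmeans(points, 3)` on a list of (longitude, latitude) pairs would return three centres and one label per point. The centres are means of their members, so they need not coincide with any data point, as noted in the definition.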
Density based Clustering:
Clusters are defined as areas of higher density than the remainder of the data set. Objects in
sparse areas, which are required to separate clusters, are usually considered to be noise or border
points.
E.g. Mean Shift Clustering: a clustering approach in which each object is moved to the densest
area in its vicinity, based on kernel density estimation. Eventually, objects converge to local
maxima of density. These local maxima act as cluster representatives.
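
The mean shift idea can be sketched as follows, assuming a Gaussian kernel and 2-D points. The `bandwidth`, `merge_radius`, iteration limit and tolerance are illustrative parameters chosen for this sketch, not values taken from the project.

```python
import math


def mean_shift(points, bandwidth=1.0, iterations=100, tol=1e-6):
    """Move a copy of each 2-D point uphill on a Gaussian kernel density estimate."""
    modes = []
    for x, y in points:
        for _ in range(iterations):
            # Weight every data point by its kernel value at the current position.
            weights = [
                math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            ]
            total = sum(weights)
            # The weighted mean of the data is the next, denser position.
            nx = sum(w * px for w, (px, _) in zip(weights, points)) / total
            ny = sum(w * py for w, (_, py) in zip(weights, points)) / total
            moved = math.hypot(nx - x, ny - y)
            x, y = nx, ny
            if moved < tol:
                break
        modes.append((x, y))
    return modes


def group_modes(modes, merge_radius=0.5):
    """Give the same label to points whose converged positions lie close together."""
    representatives, labels = [], []
    for m in modes:
        for i, r in enumerate(representatives):
            if math.hypot(m[0] - r[0], m[1] - r[1]) < merge_radius:
                labels.append(i)
                break
        else:
            representatives.append(m)  # a new local maximum becomes a representative
            labels.append(len(representatives) - 1)
    return representatives, labels
```

Unlike K-Means, the number of clusters is not fixed in advance here; it follows from how many distinct density maxima the points converge to, which in turn depends on the chosen bandwidth.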
Distribution based Clustering:
Clusters are defined as sets of objects that most likely belong to the same distribution.
E.g. Gaussian Mixture Models: a prominent method, fitted using the expectation-maximization
algorithm. Here, the data set is usually modelled with a fixed number of Gaussian distributions
that are initialized randomly and whose parameters are iteratively optimized to better fit the
data set.
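
The expectation-maximization loop for a Gaussian mixture can be sketched as below. To keep the example short it assumes one-dimensional data; the function names, the seeded random initialization from the data, the unit starting variance and the variance floor are all assumptions of this sketch rather than details from the project.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(data, k=2, iterations=100, seed=0):
    """Fit k Gaussians to 1-D data with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    means = rng.sample(data, k)  # random initialization from the data
    variances = [1.0] * k        # assumption: unit starting variance
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in data:
            dens = [w * gaussian_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted members.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-6)  # floor avoids a collapsed component
    return weights, means, variances
```

After fitting, each value can be assigned to the component with the highest responsibility, which gives the "most likely distribution" cluster membership described above.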
1.4 Overview of the Document
This technical report describes the techniques and tools used to develop the application.
It also serves as a user manual explaining how to use the application.

2. System Overview
This section provides a detailed overview of the application.
2.1 Application Perspective
This application allows the user to select the dataset in which clusters are to be identified and
the type of clustering to visualize. The application shows the resulting clusters on a map.
Visualization techniques that improve the viewing experience are incorporated, such as a different
colour for each cluster and highlighting of outliers, depending on the dataset.
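
As an illustration of the colouring scheme, the sketch below maps cluster labels to colours and flags outliers before the points are handed to the map layer. The palette values, the outlier colour, the use of -1 as the outlier label and the record field names are all hypothetical choices for this example, not the application's actual settings.

```python
# Hypothetical palette; the application's real colours are not specified in this report.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]
OUTLIER_COLOUR = "#000000"
NOISE_LABEL = -1  # assumption: outliers carry the label -1


def colour_for(label):
    """Return the display colour for a cluster label."""
    if label == NOISE_LABEL:
        return OUTLIER_COLOUR
    return PALETTE[label % len(PALETTE)]  # reuse colours if there are many clusters


def style_points(points, labels):
    """Attach a colour and an outlier flag to each (longitude, latitude) point."""
    return [
        {"lon": lon, "lat": lat, "colour": colour_for(label),
         "outlier": label == NOISE_LABEL}
        for (lon, lat), label in zip(points, labels)
    ]
```

Records of this shape could be serialized to JSON and drawn by the front end, with the outlier flag used to pick a distinct marker.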
2.2 Operating environment
a. Windows operating system for development.
b. D3 for Visualization
2.3 Dependencies
The project uses D3 to show the results of the clustering algorithms on a map.
2.4 User Classes and Characteristics
The application should cater to the following user classes:
Primary User: Those who wish to view the clusters identified in a dataset by the different
algorithms.
Developer: Implements the algorithms and keeps the application up and running.

3. User Manual
3.1 Launching the Application
The application is hosted on a server and can be accessed through its URL. Once launched, the
application displays a plain map of India and controls for choosing the dataset and the clustering
technique.

3.2 Choosing the Clustering Algorithm


The first step is to choose the clustering algorithm. The application supports three clustering
techniques: K-means, DBSCAN and hierarchical clustering.

3.3 Choosing Dataset


The second step is to choose the dataset. The application has two datasets, Rainfall data and ATM
data. Choose one from the dropdown.

3.4 Choosing ATM type


The ATM data has two types, public banks and private banks. Choose one from the dropdown. If
you have selected Rainfall data, this dropdown is disabled.

3.5 Choose number of clusters


The next step is to choose the number of clusters. The application supports 2, 3 or 5 clusters. Note
that this option is available only if the algorithm you chose is K-means.
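
The selection rules from sections 3.2 to 3.5 can be summarized in a small validation sketch. The option strings and the function name are hypothetical; only the rules themselves (three algorithms, two datasets, ATM type for ATM data only, and 2, 3 or 5 clusters for K-means only) come from this manual.

```python
# Hypothetical option identifiers; the rules they encode come from sections 3.2 to 3.5.
ALGORITHMS = {"kmeans", "dbscan", "hierarchical"}
DATASETS = {"rainfall", "atm"}
ATM_TYPES = {"public", "private"}
CLUSTER_COUNTS = {2, 3, 5}


def validate_selection(algorithm, dataset, atm_type=None, clusters=None):
    """Return a list of problems with the chosen options (empty if the choice is valid)."""
    problems = []
    if algorithm not in ALGORITHMS:
        problems.append("unknown algorithm")
    if dataset not in DATASETS:
        problems.append("unknown dataset")
    # The ATM type applies to ATM data only; it is disabled for Rainfall data.
    if dataset == "atm" and atm_type not in ATM_TYPES:
        problems.append("ATM data needs an ATM type: public or private")
    if dataset == "rainfall" and atm_type is not None:
        problems.append("ATM type does not apply to Rainfall data")
    # The number of clusters is offered only for K-means.
    if algorithm == "kmeans" and clusters not in CLUSTER_COUNTS:
        problems.append("K-means needs 2, 3 or 5 clusters")
    if algorithm != "kmeans" and clusters is not None:
        problems.append("number of clusters applies to K-means only")
    return problems
```

For example, `validate_selection("kmeans", "atm", atm_type="public", clusters=3)` returns an empty list, while choosing a cluster count together with DBSCAN produces a problem message.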

3.6 Visualize
Once you click the Submit button, the clusters are displayed on the India map.

4. Results
Here we present some of the visualizations for the different options chosen.
4.1 Rainfall data: DBSCAN clustering

4.2 Rainfall data: Hierarchical clustering

4.3 ATM Public data: K-means, 3 clusters

4.4 ATM Private data: K-means, 5 clusters

4.5 ATM Public data: DBSCAN clustering

4.6 ATM Private data: Hierarchical clustering