
ANNEXURE- I

MINOR RESEARCH PROJECT PROPOSAL

PART - A : GENERAL INFORMATION

1 Basic Subject Area of Research : Engineering and Technology (Data Mining)

2 Title of the Proposed Project : Design of iterative clustering algorithm to mine Big Data

3 Name, Qualification and Designation of the Principal Investigator / Co-Investigator :

Principal Investigator
Name : Dr. M. Vijayalakshmi
Qualification : PhD
Designation : Professor

Co-Investigator
Name : Parth Chandarana
Qualification : M.E
Designation : Assistant Professor

4 Teaching and Research Experience of Principal Investigator :
Teaching : 27 Years
Research : 15 Years

5 Name and address of the Institution where the proposal will be executed : VES Institute of
Technology, Hashu Advani Memorial Complex, VES Campus II, Chembur - 400074

6 Whether the College / University is approved by the UGC : Yes

7 Details of facilities provided / to be made available at the College / University : Computer
laboratory, external hard drives, router, intranet and internet

8 Have you ever applied before for Minor Research Project? If yes, give details : NO

9 Whether the Project or part of Project is approved by the University for the Doctoral Degrees.
If Yes, give details : NO

10 Details of the Research Project and research funding (Major /Minor) received in the past
and/ongoing projects : NA

PART - B : PROJECT DETAILS

1 Details of the proposed project to be undertaken:


(Attach additional pages if required)

Origin, Need and Objective of the Research Proposal:


Origin:
In the last few years, the amount of data generated and collected has increased manifold.
Although the systems that store and process such huge amounts of data have also been upgraded,
the data generated in the digital universe is far more than modern IT systems can store and,
most importantly, process. This data is very different from the traditional data generated
on the web since the inception of the web and computer systems. It is termed Big Data
because of its innate characteristics: sheer volume, high velocity, variety and veracity.
Big data has forced us to review our existing processes for storing and processing data and
the way we use data for decision making. This has created tremendous opportunities for
researchers and industry to devise solutions that tackle and effectively utilize the power of
big data, because traditional data storage, management and processing systems and techniques
are unable to handle data at this scale. Moreover, such data-centric applications will not only
solve the challenges posed, but also open up several domains where data-centric solutions were
previously unimaginable. The application domains of big data analytics are numerous; health
care, banking and finance, e-commerce, governance and agriculture are a few.
Need:
Without data-centric applications, it is very difficult to cater to the fast-paced requirements
of technology users. To provide data-enabled solutions to customers in any domain, data
storage and processing models have to be updated to solve the challenges posed by the
characteristics of big data.
More specifically, any big data application requires effective techniques to process large amounts
of multi-source data in a timely manner. The nature of big data makes it clear that we need
data mining algorithms that can bring value out of raw data without any prior knowledge of the
underlying data structure. Because data clustering algorithms by nature require no such prior
knowledge, they are the first choice among researchers.
However, traditional data clustering algorithms pose certain challenges. Most data clustering
algorithms require substantial computational power and are iterative in nature. Hence,
traditional data clustering algorithms cannot be applied directly to big data on traditional
infrastructure.
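The iterative nature mentioned above can be illustrated with a minimal sketch. The proposal does not name a specific algorithm; k-means (Lloyd's algorithm) is assumed here purely as a well-known example of a traditional clustering algorithm, and the function name `kmeans` and its parameters are hypothetical. The point of the sketch is that every iteration makes a full pass over all points, which is what becomes expensive when the data no longer fits on a single machine.

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Illustrative Lloyd's k-means on a list of equal-length numeric tuples.

    Assumption: this is a textbook baseline, not the algorithm proposed
    in this project. Returns (centroids, clusters, iterations_used).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # Assignment step: one full pass over the data per iteration.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters, iteration


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters, iterations = kmeans(data, k=2)
    print("centroids:", centroids)
    print("cluster sizes:", [len(c) for c in clusters])
    print("full passes over the data:", iterations)
```

Each pass costs on the order of (number of points × k × dimensions) distance computations, and the number of passes is not known in advance, which is why such algorithms are hard to run directly on big data using traditional infrastructure.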
Objective:
The project aims to fulfill the following objectives:
1. To study the existing data clustering algorithms and identify those suitable for big data analytics.
2. To design a data clustering algorithm suitable for iterative big data analytics.
Rationale for taking up the proposed project and its interdisciplinary relevance:
This research aims to design mining algorithms that incorporate the key characteristics of big
data in order to enhance data-driven decision making. To this end, we will first assess some
key traditional data mining algorithms used to mine multivariate and stream data. Next, we will
conduct an empirical analysis of these algorithms on big data. The results of this analysis will
help in deciding the key design principles for big data mining algorithms. Lastly, a qualitative
analysis of the newly designed algorithm on a big data use case will validate its applicability.
The project aims to propose an iterative data clustering algorithm that can be used to
develop mobile and web-based analytics applications. Personalized banking services, personalized
and preventive health care systems, intelligent agriculture and real-time security systems are
some example domains where big data analytics solutions built on the proposed algorithm can be
used.
Review of Research and Development in the field :
The applicability of classical data mining algorithms to big data analytics is a popular topic
among researchers and industry. Several efforts have been made to use data mining algorithms for
big data mining. One such study [1] describes the use of several data mining algorithms in
several domains. Among data mining algorithms, several new modifications of existing data
clustering algorithms have been proposed. A modified version of hierarchical clustering has been
proposed to gain better customer insight and to support customer analytics [2]. Another recent
study clusters customer transactional data using a modified version of hierarchical clustering
and generates a hierarchy of the resulting clusters [3]. All of these recent research efforts
clearly indicate the importance of big data analytics algorithms for years to come.
Relevance to social benefit by this R&D in the proposed area :
The proposed project aims to design a data clustering algorithm and to study the suitability of
existing data mining algorithms for big data. The results can then be used to develop
applications that address critical issues. For instance, a mobile app can be developed using data
mining algorithms to increase awareness of road conditions. Similarly, the data mining algorithms
to be studied as part of the project can be used to develop a mobile personal health care
assistant, or a personalized educational application for children deprived of basic education
facilities.
Moreover, the importance of data mining techniques in the design of governance applications
cannot be overlooked. One example of many possible applications is a mobile application that uses
data mining techniques to create awareness, in the local language, about agricultural aspects
before every season.
Work Plan (including Detailed Methodology and Time Schedule):
The proposed project will progress in three stages, as described below:

Stage     Proposed work                                           Time required

Stage 1   In this stage, we will study the applicability of       2 months
          the classical data mining algorithms to several
          benchmark big data sets. This will enable us to
          identify the challenges of processing big data sets.

Stage 2   This stage will propose the clustering algorithm        2 months
          for iterative data clustering for exploratory
          analysis of big data.

Stage 3   In this stage, we will analyze the performance of       2 months
          the proposed algorithm against the benchmark big
          data sets used in Stage 1.

Expected Results, Conclusion and Future Plans:


The project aims to identify the challenges that big data poses for data mining algorithms and
to propose an iterative data clustering algorithm for exploratory analysis. Hence, the outcome
of the proposed work is the design of an iterative data clustering algorithm, together with
suggestions on its suitability for application domains.

Conclusion and future scope

The proposed project will shed light on certain critical aspects of big data analytics for
application domains. More specifically, several newer applications can be developed if suitable
data processing techniques are available in the literature. This project will extend the
literature on data clustering algorithms and will act as a stepping stone for the development of
newer data-centric applications, which can benefit not only researchers but every individual.
Although the proposed algorithm tries to provide a strategy for exploratory analysis, it has
certain limitations. Huge video data, such as data from surveillance cameras and video from
social media, is not considered. The algorithm can be extended in the future to make it more
suitable for diverse data types.

2 Collaboration for the proposed project (if any) : NA

3 Details of financial requirements with justification

Sr. No.   Head                                                    Amount

1         Consumables and Chemicals                               NA

2         Equipments (minor): Server                              25000

3         Travel                                                  5000

4         Books & Peripherals / Conference and Journal            10000
          Publication

5         Contingency                                             5000

          Total                                                   50000
Justification
Equipment: A server with high-capacity RAM and storage is required for testing the proposed
algorithm.
Conference and Journal Publication: We aim to extend the existing literature in the area of the
proposed project. Hence, journals and conferences in the relevant field will be chosen to publish
the knowledge acquired through empirical analysis.
Travel and Contingency: The travel amount is for attending conferences. The contingency amount
is reserved for purchasing other small components required during the course of the project.

4 Any other information in support of the proposed project

PART - C : Bio-data and Endorsement

Detailed Bio-data of the Principal Investigator as per Annexure -II

Statement from the Present Employer as per Annexure-III


ANNEXURE - II

Detailed Bio-data

1. Name of the Applicant: Dr. M. Vijayalakshmi


2. Mailing Address: Vivekanand Education Society's Institute of Technology,
Hashu Advani Memorial Complex, Collector's Colony, Chembur, Mumbai - 400074
a. Telephone: 022-61532592
b. Fax: 022-61532555
c. E-mail: m.vijayalakshmi@ves.ac.in
3. Date of Birth: 17/08/1966
4. Educational Qualification (Starting from Graduation onwards):

Sr. No.   Degree   University   Year   Subjects                        Percentage / CGPA

1         PhD      IIT Bombay   2013   Data Mining                     -

2         M.Tech   IIT Bombay   2000   Database and Data Management    CPI - 9.5

5. A. Details of Professional Training and Research Experience, specifying period

i. Teaching Experience: UG : 27 Years, PG : 15 Years
ii. Research Experience : 15 Years

B. Details of Employment (Past & Present): Professor, Information Technology Department,
Vivekanand Education Society's Institute of Technology (August, 1992)
C. List of significant publications (Research Papers and books) during last five years.(with details)

Each entry lists: Author/s as they appear in the publication; Title; Name of the Book & the
Publishers / Journal, Issue no. and year of Publication, page nos.; ISSN/ISBN Number.

1. Authors : Vijayalakshmi M. & Menezes B.
Publication : OPSEARCH, December 2013, Volume 50, Issue 4, pp 455-474, Springer Journals
ISSN/ISBN : ISSN: 0030-3887 (Print), 0975-0320 (Online)

2. Authors : Darshana Chande, M.Vijayalakshmi
Publication : International Journal of Computer Applications 61(12):31-38, January 2013.
Published by Foundation of Comp. Science, New York, USA.
ISSN/ISBN : ISSN: 0975 - 8887

3. Authors : Manisha Garhiwal, M.Vijayalakshmi
Publication : International Journal of Advanced Studies in Computer Science and Engineering
(IJASCSE); Published by: IAASSE (International Association of Academicians, Scholars,
Scientists & Engineers), Vol 2 Issue 1 2013: 55-66
ISSN/ISBN : ISSN 2278 7917

4. Authors : Monica Sawlani, M.Vijayalakshmi
Publication : International Journal of Data Mining & Knowledge Management Process (IJDKP),
AIRCC Publications; Vol.3, No.1, January 2013; 39-56
ISSN/ISBN : ISSN: 2230-9608 (Online), 2231-007x (print)

5. Authors : Chandarana, P. and Vijayalakshmi, M.
Title : Big Data Analytics Frameworks
Publication : Proceedings of the International Conference on Circuits, Systems, Communication
and Information Technology Applications (CSCITA), Mumbai, 4-5 April 2014, 430-434.
ISSN/ISBN : http://dx.doi.org/10.1109/cscita.2014.6839299

6. Authors : C. S Lifna, M Vijayalakshmi
Title : Identifying Concept-drift in Twitter Streams
Publication : Procedia Computer Science 45, 86-94, 2015

7. Authors : C. S. Lifna, M Vijayalakshmi
Title : SOFT GRID - Big Data Analytics for Smart Grid
Publication : International Technological Conference-2014 (I-TechCON), 54 - 59

8. Authors : Chandarana, Parth, and M. Vijayalakshmi.
Title : "Big data information retrieval using Apache Solr."
Publication : Big data analytics for business (BDAB), 2014 international conference.

9. Authors : Masooda M.Aslam Modak, Dr. Vijayalakshmi M.
Title : Privacy Preserving Data Mining Techniques In The Cloud: A Comparative Analysis
Publication : VESIT, International Technological Conference-2014 (I-TechCON), Jan. 03-04
ISSN/ISBN : ISSN : 2347 - 8446 (Online), ISSN : 2347 - 9817 (Print), www.ijarcst.com

10. Authors : Masooda M.Aslam Modak, Dr. Vijayalakshmi M.
Title : Privacy Preserving Association Rule Hiding on Horizontally Partitioned data Using
Concept hierarchy
Publication : 8th International Conference on Advanced Computing and Communication
Technologies (ICACCT-2014)
ISSN/ISBN : ISBN: 978-93-84935-00-9.

11. Authors : Monica G Tolani, Dr. M.Vijayalakshmi
Title : Identification of Community in Social Network.
Publication : International Journal of Innovative Research in Computer and Communication
Engineering, Vol. 4, Issue 10, October 2016
ISSN/ISBN : ISSN (Online): 2320-9801, ISSN (Print): 2320-9798

12. Authors : Dr. Vijayalakshmi M, Shalu Chopra, Sangeeta Oswal, Deepshikha Chaturvedi
Title : The How, When and Why of sentiment analysis
Publication : (IJCTA) International Journal for computer Technology and Applications, vol 4,
July-August 2013
ISSN/ISBN : ISSN: 2229-6093

13. Authors : Dr. Vijayalakshmi M, Sangeeta Oswal
Title : Boost Up! Sentiment Classification for Hindi Movies Using Machine Learning Techniques
Publication : Proceeding of International Conference and workshop on advance computing
(ICWAC-2014), Multicon Mumbai.
ISSN/ISBN : ISBN: 978-0-9884925-4-7

14. Authors : Aarti Sahitya, Dr. Vijayalakshmi M
Title : Feature Extraction From Big Data
Publication : International Journal of Advance Foundation and Research in Computer (IJAFRC),
Volume 2, Issue 10, October - 2015.
ISSN/ISBN : ISSN 2348 4853

15. Authors : Sahil Gandhi, Tejash Desai, Pranav Murlidhar, Sankalp Gupta, Vijayalakshmi M and
Girish Bhole.
Title : An enterprise friendly book recommendation system for very sparse data
Publication : IEEE International Conference on Computing, Analytics and Security Trends,
19-21 December 2016, COEP Pune, India.

16. Authors : M. Vijayalakshmi, Radha Shankarmani
Title : Big Data Analytics
Publication : Wiley India, December 2015

17. Authors : M. Vijayalakshmi, Radha Shankarmani
Title : Big Data Analytics
Publication : Wiley India, July 2016

6. Professional Recognition, Awards, Fellowships received: Faculty Research Award, Microsoft 2008
7. Any other information: NIL

Place & Date: Mumbai, 19/08/2017
Signature of the Applicant: m.vijayalakshmi
