Professional Documents
Culture Documents
Abstract— Data Mining is a process of discovering useful 1.1 Data mining algorithms
information in large data repositories. Data mining
techniques have been used to enhance information retrieval Association analysis involves the discovery of
systems. Privacy-preserving data mining (PPDM) refers to association rules, showing the attribute value and
the area of data mining that is primarily concerned with conditions that are frequently in a given set of data.
protecting against disclosure of individual data records i.e., Classification is the process of finding a set of models
to safeguard sensitive information. To address about that describe and distinguish data classes or concepts, for
privacy researchers in data mining community have the purpose of being able to use the model to predict the
proposed various solutions. In this paper we present an class of objects whose class label is unknown.
extensive review of all privacy preserving data mining Clustering analysis is a process of decomposing the data
(PPDM) techniques. We use a classification scheme, which set into groups so that points in one group are similar to
is adopted from earlier studies, to review the techniques. each other and are different from the points in other groups.
Keywords— PPDM, privacy; data privacy, Data Mining, 2. Classification of existing PPDM techniques
data perturbation
PPDM techniques are mainly classified into three major
categories: data partitioning, data modification and data
1. Introduction restriction as shown in the below Fig 1.
1
Engineering and Scientific International Journal (ESIJ) ISSN 2394-7179 (Print)
Volume 2, Issue 1, January - March 2015 ISSN 2394-7187 (Online)
Noise Addition Techniques: In this technique some noise is 2.3 Data Distribution:
added to the original data to prevent the identification of
confidential information relating to a particular individual. Privacy preserving Data Mining Algorithms can be
To hide the patterns the attribute values can be randomly applied on centralized database or distributed database. In
shuffled. centralized database environment data are all stored in a
single database, where as in distributed database
These are of two types: environment data is distributed in different locations. On
centralized Data base we can perform Data Hiding and
Data swapping techniques: In this technique the value Rule Hiding privacy, whereas on Distributed database we
of individual records will be interchanged [7]. can perform data hiding privacy only. To perform data
Data distortion techniques: In this technique some hiding can be performed using classification, clustering and
distortion will be added to the data [8]. Most commonly association rules algorithms. To perform rule hiding also
used distortions are Uniform distribution over an interval [- we can use classification, clustering and Association rules
∞, ∞] and Gaussian distribution with mean µ=0. algorithms [13].
Data hiding refers to the cases where sensitive data from
Space Transformation Techniques: These techniques are original dataset like identity, name, and address that are
used to protect the data values used to perform clustering linked directly or indirectly to an individual person to hide.
without affecting the similarity between objects under Rule hiding refers to the removal of sensitive knowledge
analysis. derived from original databases after applying data mining
algorithms.
These are of three types: The privacy preserving techniques used in a distributed
database is mainly based on cryptography techniques.
Rotation Based Transformation: In this technique the
idea is to that the attributes of a database are split into pair 3. Conclusion
wise attributes selected randomly. One attribute can be
selected and rotated more than once and the angle Ѳ PPDM is emerged as a new field of study. In this paper
between an attribute pair is also selected randomly. This we have surveyed different privacy preserving data mining
technique is used for cluster analysis [9]. techniques. And we also identified which type of privacy
2
Engineering and Scientific International Journal (ESIJ) ISSN 2394-7179 (Print)
Volume 2, Issue 1, January - March 2015 ISSN 2394-7187 (Online)
will be provided by using which type of data mining [9] S. R. M. Oliveira and O.R. Zaiane. Achieving Privacy Preservation
When Sharing Data for Clustering. In Proc. of the Workshop on
technique. For a new comer, this paper provides a brief
Secure Data Management in a Connected World (SDM’04) in
review about existing privacy preserving techniques. conjuction with VLDB’2004, pages 67-82, Toronto, Ontario,
Canada, August 2004.
References [10] Y. Saygin, V. S. Verykios, and C. Clifton. Using Unknowns to
Prevent Discovery of Association Rules. SIGMOD Record,
30(4):45-54, December 2001.
[1] C. Clifton, M. Kantarcioglu, and J. Vaidya. Defining Privacy For [11] A. P. Felty and S. Matwin: Privacy- Oriented Data Mining by
Data Mining. In Proc. of the National Science Foundation Proof Checking. In Proc. of the 6th European Conference on
Workshop on Next Generation Data Mining, pages 126-133, Principals of Data Mining and Knowledge Discovery (PKDD),
Baltimore, MD, USA, November 2002. Pages138-149, Helsinki, Finland, August 2002.
[2] R. Agarwal and R. Srikanth. Privacy- Preserving Data mining. In [12] A. Mucsi-Nagy and S. Matwin. Digital Fingerprinting for Sharing
Proc. of the 2000 ACM SIGMOD International Conference on of Confidential Data. In Proc. of the Workshop on Privacy and
Management of Data, pages 439-450, Dallas, Texas, May 2000. Security Issues in Data Mining, pages 11-26, Pisa, Italy, Septe,ber
[3] M. Kantarcioglu and J. Vaidya. Privacy Preserving Naïve Bayes 2004.
Classifier for Horizontally Partitioned Data. In Proc. of the IEEE [13] Xiaodan Wu, Yunfeng Wang, Chao-Hsien Chu. A Close Look at
ICDM Workshop on Privacy Preserving Data Mining, Pages 3-9, Privacy Preserving Data Mining Methods. In Proc. of The Tenth
Melbourne, FL, USA, November 2003. Pasific Asia Conference on Information Systems (PACIS 2006),
[4] J. Vaidya and C. Clifton. Privacy Preserving Association Rule pages167-173.
Mining in Vertically Partitioned Data. In Proc. of the 8 th ACM
SIGKDD Intl. Cong. On Knowledge Discovery and Data Mining, M.Jyothi received B.Tech degree in Computer
Pages 639-644, Edmonton, AB, Canada, July 2002. Science and Information Technology from AITAM
[5] J. Vaidya and C. Clifton. Privacy Preserving K-Means Clustering College, Tekkali, India and M.Tech degree in
Over Vertically Partitioned Data. In Proc. of the 9 th ACM SIGKDD Computer Science and Engineering from JNTUK,
Intl. Cong. On Knowledge Discovery and Data Mining, Pages 206- Kakinada, India. She is pursuing her Ph.D from
215, Washington, DC, USA, August 2003. Acharya Nagarjuna University Guntur, in the area of
[6] S. Meregu and J.Ghosh. Privacy- Preserving Distributed Clustering Data Mining. Currently she is working as Assistant
using Generative Models. In Proc. of the 3rd IEEE International professor in Information Technology Department at
Conference on Data Mining (ICDM’03), pages 211-218, GMRIT, Rajam.
Melbourne, Florida, USA, November 2003.
[7] V. Estivill-Castro and L. Brankovic. Data Swapping: Balancing M.Suneetha received B.Tech degree in Computer
Privacy Against Precision in Mining of Logic Rules. In Proc. of Science and Information Technology from AITAM
Data Warehousing and Knowledge Discovery DaWaK-99, pages College, Tekkali, India, M.Tech degree in Software
389-398, Florence, Italy, August 1999. Engineering from JNTUH, Hyderabad, India. She is
[8] C. W. wu. Privacy Preserving Data Mining: A Signal Processing pursuing her Ph.D from Acharya Nagarjuna
Perspective and A Simple Data Perturbation Protocol. In Proc. of University Guntur, in the area of Data Mining.
the IEEE ICDM Workshop on Privacy Preserving Data Mining, Currently she is working as a Assistant professor in
pages 10-17, Melbourne, FL, USA, November 2003. Information Technology Department at GMRIT,
Rajam.