Professional Documents
Culture Documents
ISSN: 2319-1163
M.E Scholar, CTA, 2 Asst. Prof, CSE, SSCET, Bhilai, CG, India,
jharna.chopra@gmail.com, sampada.satav@gmail.com
Abstract
In this paper different privacy preservation techniques are compared. Classification is the most commonly applied data mining
technique, which employs a set of pre-classified examples to develop a model that can classify the population of records at large.
Fraud detection and credit risk applications are particularly well suited to this type of analysis. This approach frequently employs
decision tree or neural network-based classification algorithms. The data classification process involves learning and classification.
In Learning the training data are analyzed by classification algorithm. In classification test data are used to estimate the accuracy of
the classification rules. If the accuracy is acceptable the rules can be applied to the new data tuples . For a fraud detection
application, this would include complete records of both fraudulent and valid activities determined on a record-by-record basis. The
classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination.
The algorithm then encodes these parameters into a model called a classifier
Index Terms: Data Mining, Privacy Preservation, Clustering, Classification Techniques, Naive Bayes.
-----------------------------------------------------------------------***----------------------------------------------------------------------1. INTRODUCTION
Data mining is the process of discovering new patterns from
large data sets involving methods at the intersection of
artificial intelligence, machine learning, statistics and database
systems. The goal of data mining is to extract knowledge from
a data set in a human-understandable structure and involves
database and data management, data pre-processing, model
and inference considerations, interestingness metrics,
complexity considerations, post-processing of found structure,
visualization and online updating.
Privacy has become an increasingly important issue in data
mining [5]. Privacy concerns restrict the free flow of
information. Privacy is one of the most important properties of
information. The protection of sensible information has a
relevant role Organization for legal a commercial do not want
to reveal their private database and information. Even the
individual does not want to reveal their personal data to other
than those they give permission to. This implies that revealing
of an instance to be classified may be equivalent to revealing
secret and private information .The protection of sensible
information has a relevant role .Privacy preserving data
mining algorithms have been recently introduced with the aim
of preventing the discovery of sensible information .
Privacy-Preserving Data Mining is developing models without
seeing the data is receiving growing attention [4]. Privacypreserving data mining considers the problem of running data
mining algorithms on confidential data that is not supposed to
be revealed even to the party running the algorithm. There are
two significance settings for privacy-preserving data mining In
the first, the data is divided amongst two or more different
__________________________________________________________________________________________
Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org
537
(divisive)
ISSN: 2319-1163
__________________________________________________________________________________________
Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org
538
ISSN: 2319-1163
__________________________________________________________________________________________
Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org
539
ISSN: 2319-1163
Classification
Algorithm
DECISION
TABLE
K-.NEAREST
NEIGHBOURS
ARTIFICIAL.N
EURAL
NETWORK
SUPPORT
VECTOR
MACHINE
Advantage
Disadvantage
1)Because all the work is done at runtime-KNN can have poor run-time
performance if the training set is
large.
2)k-NN is very sensitive to irrelevant
or
redundant features because all
features
contribute to the similarity and thus
to
the classification. This can be
ameliorated by careful feature
selection or feature weighting.
__________________________________________________________________________________________
Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org
540
ISSN: 2319-1163
REFERENCES
[1]
Murat
kantarcioglu
and
Jaideep
Vaidya
.PrivacyPreserving
Naive
Bayes
Classifier
for
HorizontallyPartitioned Data. Purdue University.
[2] Zhihua Wei, Hongyun Zhang ,Zhifei Zhang, Wen Li, and
duoquian Miao. July 2011.A Naive Bayesian Multi-label
Classification Algorithm with Application to Visualize Text
Search Result.Shanghai ,China.
[3]Zhiqiang Yang, Sheng Zhong, Rebecca n.Wright PrivacyPreserving Classification of Customer Data without Loss of
Accuracy.Rutgers University,Piscataway.
[4] Nidhi Bhatia and Kiran Jyoti . An Analysis of Heart
Disease Prediction using Different Data Mining Techniques.
International Journal of Engineering Research & Technology
Vol.1 Issue 8,ISSN:2278-0181
[5] Emmanouil Magkos , Manolis Maragoudakis, Vassilis
Chrissikopoulos
and
Stefanos
Gritzalis.13
April
2009.Accurate and Large Scale Privacy Preserving Data
mining Using the Election Paradigm.
[6] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic
Smyth, From Data Mining to Knowledge Discovery in
Databases, Providence, Rhode Island July 2731, 1997.
[7] Mrs. Bharati and M. Ramageri ,Data mining Technique
and Applications Indian Journal of Computer Science and
Engineering Vol. 1 No. 4 301-305, ISSN : 0976- 5166.
[8] Lior Rokach and Oded Maimon, Decision Tree ,
DataMining and Knowledge Discovery Handbook.
[9] Megha Gupta and Naveen Aggarwal, Classification
techniques Analysis, NCCI 2010 -National Conference on
Computational Instrumentation CSIO Chandigarh, INDIA, 1920 March 2010.
[10]Goldreich and Oded, Foundations of Cryptography:
Volume 2, Basic Applications. Vol. 2. Cambridge university
press, 2004.
[11] Benny Pinkas, Cryptographic techniques for privacy
preserving data mining, SIGKDD Explorations. Volume 4,
Issue 2 - page 18.
[12] Jaideep Vaidya ,Murat Kantarco and glu Chris
Clifton, Privacy-preserving Nave classification, Received:
30 September 2005 / Revised: 15 March 2006 / Accepted: 25
July 2006 / Published online: 3 February 2007 SpringerVerlag 2007.
[13] Rivest, R.; A. Shamir; L. Adleman (1978). "A Method for
Obtaining Digital Signatures and Public-Key Cryptosystems.
Communications
of
the
ACM
21
(2):120126.
doi:10.1145/359340.359342.
[14] Tjen-Sien Lim, Wei-Yin Loh , and Yu-Shih.A
Comparison of Prediction Accuracy ,Complexity ,and
__________________________________________________________________________________________
Volume: 02 Issue: 04 | Apr-2013, Available @ http://www.ijret.org
541