Privacy preserving distributed data mining by using tree based randomization

Abstract

Due to increasing privacy concerns, organizations must take steps to protect the privacy of individuals when they reveal customer records during data mining activities. In this paper we present a methodology that preserves the statistical relationships between attributes using a kd-tree based tree perturbation method, together with mix nets to provide sender anonymity. The tree perturbation method partitions the data set into homogeneous subsets. The confidential data in each final subset are then replaced with the subset average. The perturbed data is sent to a centralized mix net, which protects the privacy of participants by obscuring the sender information. Finally, the data is sent to the data miner for analysis.

Keywords— Data Mining, Privacy Preservation, Tree Perturbation method, Mix-nets

Abstract
Introduction
Related work
1. Privacy preserving data mining
2. Tree based perturbation technique
3. Algorithm for tree based perturbation
Proposed system
1. Steps in distributed perturbation algorithm
The mixnet system
Experimental evaluation
Conclusion
References

Extract

[...] Another approach to privacy-preserving data mining is to use Secure Multi-party Computation (SMC) techniques. Several SMC-based privacy-preserving data mining schemes have been proposed. The basic idea of multi-party computation is that three or more parties want to discover useful knowledge by combining their databases, each of which contains private data. Each party learns only the final result and nothing about the other parties' data. Multi-party techniques provide privacy-preserving protocols that eliminate the need for a trusted third party while ensuring that each party learns nothing more than it would in the ideal model, in which each party sends its input to a trusted third party who carries out the computation on the received inputs and sends the appropriate results back to each party. [...]
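The extract does not name a specific SMC protocol. As an illustration only, the following minimal sketch uses additive secret sharing to compute a sum over three parties' private values, simulated in a single process. The names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical, and the sketch assumes honest-but-curious parties and a public modulus larger than any possible total.

```python
import secrets

# Hypothetical public modulus; assumed larger than any possible total.
MODULUS = 2**61 - 1


def make_shares(value, n_parties):
    """Split a private value into n random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values):
    """Simulate a secure sum: no single share or subtotal reveals an input."""
    n = len(private_values)
    # shares[i][j] is the share that party i sends to party j.
    shares = [make_shares(v, n) for v in private_values]
    # Each party j adds the shares it received and publishes only the subtotal.
    subtotals = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Anyone can combine the published subtotals into the final result.
    return sum(subtotals) % MODULUS


if __name__ == "__main__":
    print(secure_sum([12, 30, 58]))
```

Each party sees only random-looking shares from the others, so it learns the final total and nothing else, which matches the ideal model described above without needing the trusted third party.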

[...] Perturb the data by replacing these values with their average, $\bar{x}_t = \frac{1}{n_t} \sum_{k=1}^{n_t} x_{tk}$, where $n_t$ is the number of values in leaf $t$. Repeat this step for each leaf in the tree built in step 3. The tree based perturbation algorithm recursively divides a data set into smaller subsets such that the data points within each subset become more homogeneous after each partition. This method is primarily used for numeric data. The algorithm typically selects the attribute with the largest variance and splits the data into two subsets at the median or midrange of that attribute. [...]
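The partition-and-average procedure described above might be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `tree_perturb`, the `max_leaf_size` stopping rule, and the choice to split at the median without normalizing attributes are all assumptions.

```python
from statistics import mean, median, pvariance


def tree_perturb(rows, confidential_col, max_leaf_size=2):
    """Partition numeric rows kd-tree style, then average one column per leaf.

    rows is a list of equal-length lists of numbers. The stopping rule
    (max_leaf_size) is an assumption made for this sketch.
    """
    def as_leaf(leaf_rows):
        # Replace every confidential value in the leaf with the leaf average.
        avg = mean(r[confidential_col] for r in leaf_rows)
        return [r[:confidential_col] + [avg] + r[confidential_col + 1:]
                for r in leaf_rows]

    if len(rows) <= max_leaf_size:
        return as_leaf(rows)

    # Select the attribute with the largest variance and split at its median.
    n_cols = len(rows[0])
    split_col = max(range(n_cols), key=lambda c: pvariance([r[c] for r in rows]))
    cut = median(r[split_col] for r in rows)
    left = [r for r in rows if r[split_col] <= cut]
    right = [r for r in rows if r[split_col] > cut]
    if not right:
        # Ties at the median left nothing to split off; treat as a leaf.
        return as_leaf(rows)

    return (tree_perturb(left, confidential_col, max_leaf_size)
            + tree_perturb(right, confidential_col, max_leaf_size))


if __name__ == "__main__":
    data = [[1, 10], [2, 20], [10, 100], [11, 200]]
    for row in tree_perturb(data, confidential_col=1):
        print(row)
```

Because each leaf keeps its own average, the column total is unchanged by the perturbation. Note that this sketch returns rows grouped by leaf, so the output order can differ from the input order.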

[...]
Wenliang Du and Zhijun Zhan, "Using Randomized Response Techniques for Privacy-Preserving Data Mining," Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244, 2003.
M. Kantarcioglu and J. Vaidya, "Privacy Preserving Naive Bayes Classifier for Horizontally Partitioned Data," in IEEE ICDM Workshop on Privacy Preserving Data Mining, pp.
A. Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal and Johannes Gehrke, "Privacy Preserving Mining of Association Rules," IBM Almaden Research Center, ACM158113567X/02/
J. Domingo-Ferrer and J.M. Mateo-Sanz, "Practical Data-Oriented Microaggregation for Statistical Disclosure Control," IEEE Trans. Knowledge and Data Eng., vol no pp.
[...]

[...] Here the author considered the situation where all data are owned by a single organization, and focused on how to protect individual privacy when the organization releases the data to a third party for data mining. But there are situations where the data is spread across multiple sites. Sharing these data can bring mutual gain to all parties. Because of privacy concerns, the data holders randomize their data with the tree based randomization technique before sending it to the central site. [...]

[...] In the randomization approach, privacy preserving randomization algorithms have been proposed for classification techniques, including decision trees on randomized data and naïve Bayes classification, and for association rule mining. A well-studied technique for masking sensitive information, primarily studied in statistics, is to randomize sensitive attributes by adding random error to their values. Recently, this technique was studied in data mining. In these works, privacy was quantified by how closely the original values of a randomized attribute can be estimated. Evfimievski et al. proposed privacy-preserving data mining that extends traditional data mining techniques to handle randomized data. [...]
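To make the idea of adding random error concrete, here is a small sketch under stated assumptions: the noise is zero-mean Gaussian (the extract does not specify a distribution), and the function name `randomize` and the `noise_scale` parameter are hypothetical. It shows that individual values are masked while an aggregate such as the mean stays close to the original.

```python
import random
from statistics import mean


def randomize(values, noise_scale, rng):
    """Mask each value by adding zero-mean Gaussian noise (assumed distribution)."""
    return [v + rng.gauss(0.0, noise_scale) for v in values]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    ages = [rng.randint(20, 60) for _ in range(10_000)]
    noisy_ages = randomize(ages, noise_scale=10.0, rng=rng)

    # Individual records differ from the originals...
    print(ages[0], noisy_ages[0])
    # ...but the aggregate the data miner cares about is nearly preserved.
    print(mean(ages), mean(noisy_ages))
```

A larger `noise_scale` makes the original values harder to estimate, which is the sense in which these works quantify privacy, at the cost of noisier aggregates.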