A Micro-Aggregation Algorithm Based on Density Partition Method for Anonymizing Biomedical Data

Xiang Wu; Yuyang Wei; Tao Jiang; Yu Wang; Shuguang Jiang

Abstract

Objective: Biomedical data can be de-identified via micro-aggregation, which achieves k-anonymity privacy. However, existing micro-aggregation algorithms yield low similarity within the equivalence classes and thus produce low-utility anonymous data when applied to sparse biomedical datasets. To balance data utility and anonymity, we develop a novel micro-aggregation framework.
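To make the utility argument concrete, the following minimal Python sketch shows micro-aggregation on a single numeric attribute: each value is replaced by the mean of its group, so every released value is shared by at least k records. The function names, the toy `ages` list, and the use of the sum of squared errors (SSE) as an information-loss measure are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from statistics import mean


def aggregate(values, groups):
    """Replace each value with the mean of its group (micro-aggregation)."""
    released = list(values)
    for group in groups:
        group_mean = mean(values[i] for i in group)
        for i in group:
            released[i] = group_mean
    return released


def is_k_anonymous(released, k):
    """True if every released value is shared by at least k records."""
    return all(count >= k for count in Counter(released).values())


def sse(values, released):
    """Sum of squared errors: an assumed, commonly used information-loss measure."""
    return sum((v - r) ** 2 for v, r in zip(values, released))


# Hypothetical toy data: three young patients, three older ones.
ages = [21, 22, 23, 60, 62, 91]

# Homogeneous grouping: similar ages share an equivalence class.
tight = aggregate(ages, [[0, 1, 2], [3, 4, 5]])
# Heterogeneous grouping: dissimilar ages are forced together.
loose = aggregate(ages, [[0, 3, 5], [1, 2, 4]])

print(is_k_anonymous(tight, 3), sse(ages, tight))
print(is_k_anonymous(loose, 3), sse(ages, loose))
```

Both groupings satisfy 3-anonymity, but the heterogeneous one distorts the data far more, which is the low-similarity problem the objective describes.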

Methods: Combining a density-based clustering method with a classical micro-aggregation algorithm, we propose a density-based second-division micro-aggregation framework called DBTP. The framework allows the anonymous sets to achieve the optimal k-partition with increased homogeneity of the tuples in each equivalence class. Based on this framework, we propose a k-anonymity algorithm, DBTP-MDAV, and an l-diversity algorithm, DBTP-l-MDAV, to respond to different attacks.
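The abstract does not describe the density partition step, so the sketch below covers only the classical building block it names: a plain MDAV-style (Maximum Distance to Average Vector) grouping, written from the commonly published description of that algorithm. It is not the authors' DBTP-MDAV. The helper names, the Euclidean distance, and the distinct-value form of the l-diversity check are assumptions made for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def farthest(points, indices, ref):
    """Index (from indices) of the point farthest from ref."""
    return max(indices, key=lambda i: math.dist(points[i], ref))


def take_group(points, remaining, seed, k):
    """Remove and return the k remaining records closest to the seed record."""
    group = sorted(remaining, key=lambda i: math.dist(points[i], points[seed]))[:k]
    for i in group:
        remaining.remove(i)
    return group


def mdav(points, k):
    """Partition record indices into groups of at least k (if len(points) >= k)."""
    remaining = list(range(len(points)))
    groups = []
    while len(remaining) >= 3 * k:
        c = centroid([points[i] for i in remaining])
        r = farthest(points, remaining, c)          # most extreme record
        s = farthest(points, remaining, points[r])  # record farthest from r
        groups.append(take_group(points, remaining, r, k))
        groups.append(take_group(points, remaining, s, k))
    if len(remaining) >= 2 * k:
        c = centroid([points[i] for i in remaining])
        r = farthest(points, remaining, c)
        groups.append(take_group(points, remaining, r, k))
    if remaining:
        groups.append(remaining)  # leftover records form the last group
    return groups


def is_distinct_l_diverse(groups, sensitive, l):
    """Assumed variant: each group holds at least l distinct sensitive values."""
    return all(len({sensitive[i] for i in g}) >= l for g in groups)


# Hypothetical quasi-identifier vectors and sensitive values.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
diagnosis = ["flu", "asthma", "flu", "diabetes", "flu", "asthma", "flu"]
groups = mdav(pts, 3)
print(groups, is_distinct_l_diverse(groups, diagnosis, 2))
```

A density-based pre-partition, as the framework proposes, would presumably run before a grouping step like this one so that distant records in sparse regions are not forced into the same equivalence class.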

Conclusions: Experiments on real-life biomedical datasets confirm that the anonymization algorithms developed under the proposed framework achieve higher data utility than existing algorithms.

Keywords: Privacy protection, micro-aggregation, k-anonymity, l-diversity, clustering, biomedical data.




DOI: https://dx.doi.org/10.2174/1574893614666190416152025
Print ISSN: 1574-8936
Online ISSN: 2212-392X
Publisher Name: Bentham Science Publisher

Current Bioinformatics
