A Micro-Aggregation Algorithm Based on Density Partition Method for Anonymizing Biomedical Data

Author(s): Xiang Wu, Yuyang Wei, Tao Jiang, Yu Wang, Shuguang Jiang*.

Journal Name: Current Bioinformatics

Volume 14, Issue 7, 2019



Abstract:

Objective: Biomedical data can be de-identified via micro-aggregation to achieve k-anonymity. However, when applied to sparse biomedical datasets, existing micro-aggregation algorithms produce equivalence classes with low within-class similarity and therefore yield anonymized data of low utility. To balance data utility and anonymity, we develop a novel micro-aggregation framework.
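To make the micro-aggregation step concrete, the sketch below shows the classical MDAV (Maximum Distance to Average Vector) heuristic, presumably the classical algorithm referred to in the name DBTP-MDAV. Records are partitioned into groups of k to 2k-1 records, and each record is replaced by its group centroid, so every published record is indistinguishable from at least k-1 others on the quasi-identifiers. This is a minimal pure-Python illustration, not the authors' code; it assumes purely numeric, already-standardized quasi-identifiers and Euclidean distance, and the function names `mdav` and `microaggregate` are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def mdav(records, k):
    """Classical MDAV: partition record indices into groups of k to 2k-1 records."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    groups = []

    def take_group(seed):
        # Group the k remaining records closest to the seed coordinates.
        chosen = sorted(remaining, key=lambda i: dist(records[i], seed))[:k]
        for i in chosen:
            remaining.remove(i)
        groups.append(chosen)

    while len(remaining) >= 3 * k:
        c = centroid([records[i] for i in remaining])
        r = max(remaining, key=lambda i: dist(records[i], c))  # farthest from centroid
        s = max(remaining, key=lambda i: dist(records[i], records[r]))  # farthest from r
        take_group(records[r])  # group formed around r
        take_group(records[s])  # group formed around s
    if len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        take_group(records[max(remaining, key=lambda i: dist(records[i], c))])
    groups.append(list(remaining))  # the last k to 2k-1 records form one group
    return groups


def microaggregate(records, k):
    """Replace every record by the centroid of its MDAV group."""
    out = [None] * len(records)
    for g in mdav(records, k):
        c = centroid([records[i] for i in g])
        for i in g:
            out[i] = c
    return out
```

For example, with k = 3 the records (1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10) fall into two groups, and the published values become (4/3, 4/3) for the first three records and (31/3, 31/3) for the last three.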

Methods: Combining a density-based clustering method with a classical micro-aggregation algorithm, we propose a density-based micro-aggregation framework with a second division step, called DBTP. The framework allows the anonymous sets to achieve the optimal k-partition while increasing the homogeneity of the tuples within each equivalence class. Based on this framework, we propose a k-anonymity algorithm, DBTP-MDAV, and an l-diversity algorithm, DBTP-l-MDAV, to counter different types of attack.
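The abstract does not specify how the density-based division and the micro-aggregation step interact, so the following is only a hypothetical reading of the idea, not the authors' DBTP. Records are first split into density-connected clusters with DBSCAN; noise points and clusters smaller than k are pooled; each resulting part is then divided a second time into groups of k to 2k-1 records with a simplified centroid-based grouping (repeatedly take the record farthest from the part's centroid together with its nearest neighbours). A distinct l-diversity check is included to illustrate the extra requirement an l-diversity variant must meet: every equivalence class holds at least l distinct sensitive values. The parameters `eps` and `min_pts`, the rule that merges a too-small leftover pool into the part with the nearest centroid, and all function names are assumptions.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def dbscan(records, eps, min_pts):
    """Minimal O(n^2) DBSCAN: one cluster label per record, -1 for noise."""
    def neighbours(i):
        return [j for j in range(len(records)) if dist(records[i], records[j]) <= eps]

    labels = [None] * len(records)
    cluster = -1
    for i in range(len(records)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(nj)
    return labels


def group_part(records, part, k):
    """Second division: split one part (>= k indices) into groups of k to 2k-1."""
    remaining = list(part)
    groups = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        r = max(remaining, key=lambda i: dist(records[i], c))
        chosen = sorted(remaining, key=lambda i: dist(records[i], records[r]))[:k]
        groups.append(chosen)
        remaining = [i for i in remaining if i not in chosen]
    groups.append(remaining)
    return groups


def density_two_stage(records, k, eps, min_pts):
    """Hypothetical two-stage grouping: density split first, then fixed-size split."""
    if len(records) < k:
        raise ValueError("need at least k records")
    clusters = {}
    for i, lab in enumerate(dbscan(records, eps, min_pts)):
        clusters.setdefault(lab, []).append(i)
    parts = [m for lab, m in clusters.items() if lab != -1 and len(m) >= k]
    leftover = [i for lab, m in clusters.items() if lab == -1 or len(m) < k for i in m]
    if leftover:
        if len(leftover) >= k or not parts:
            parts.append(leftover)
        else:
            # Assumed rule: merge a too-small pool into the part with the nearest centroid.
            lc = centroid([records[i] for i in leftover])
            best = min(parts, key=lambda p: dist(centroid([records[i] for i in p]), lc))
            best.extend(leftover)
    return [g for p in parts for g in group_part(records, p, k)]


def is_distinct_l_diverse(groups, sensitive, ell):
    """True if every group contains at least `ell` distinct sensitive values."""
    return all(len({sensitive[i] for i in g}) >= ell for g in groups)
```

Splitting by density first keeps second-stage groups from straddling sparse gaps between clusters, which is one plausible way to raise within-class homogeneity on sparse data. The quadratic neighbour search is for clarity only.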

Conclusions: Experiments on real-life biomedical datasets confirm that the anonymization algorithms built on the proposed framework achieve higher data utility than existing algorithms.
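The abstract does not name the utility measure used in the experiments. A common choice in the micro-aggregation literature is the information loss SSE/SST: the within-group sum of squared distances to each group centroid, divided by the total sum of squared distances to the global centroid, so lower values mean higher utility. The sketch below computes it for a given grouping, assuming numeric quasi-identifiers and Euclidean distance; `information_loss` is a hypothetical helper name.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def information_loss(records, groups):
    """SSE/SST as a percentage: 0 means no loss; lower means higher utility."""
    g_all = centroid(records)
    sst = sum(dist(x, g_all) ** 2 for x in records)  # total sum of squares
    sse = 0.0  # within-group sum of squared errors
    for g in groups:
        c = centroid([records[i] for i in g])
        sse += sum(dist(records[i], c) ** 2 for i in g)
    return 100.0 * sse / sst if sst else 0.0
```

For the records (0, 0), (0, 2), (10, 0), (10, 2), grouping the two nearby pairs gives a loss of about 3.8%, while pairing distant records gives about 96.2%, which is why homogeneous equivalence classes translate directly into higher utility.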

Keywords: Privacy protection, micro-aggregation, k-anonymity, l-diversity, clustering, biomedical data.




Article Details

VOLUME: 14
ISSUE: 7
Year: 2019
Pages: 667-675 (9 pages)
DOI: 10.2174/1574893614666190416152025
