Big Data Security Challenges and Solution of Distributed Computing in Hadoop Environment: A Security Framework

Author(s): Gurjit Singh Bhathal*, Amardeep Singh Dhiman

Journal Name: Recent Advances in Computer Science and Communications
Formerly Recent Patents on Computer Science

Volume 13, Issue 4, 2020


Abstract:

Background: In the current Internet landscape, large amounts of data are generated and processed. The Hadoop framework is widely used to store and process big data in a highly distributed manner. However, it is argued that the Hadoop framework is not mature enough to withstand current cyberattacks on that data.

Objective: The main objective of the proposed work is to provide a complete security approach comprising authentication and authorisation for users and Hadoop cluster nodes, and to secure data at rest as well as in transit.

Methods: The proposed algorithm uses the Kerberos network authentication protocol for authentication and authorisation, validating both the users and the cluster nodes. Ciphertext-Policy Attribute-Based Encryption (CP-ABE) is used to protect data at rest and data in transit. A user encrypts a file with their own set of attributes and stores it on the Hadoop Distributed File System (HDFS). Only intended users, whose parameters match, can decrypt that file.
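The access rule behind "only intended users can decrypt" can be sketched as follows. In the general CP-ABE construction, a ciphertext carries an access policy built from threshold gates over attributes, and a user's key carries a set of attributes; decryption succeeds only if the attributes satisfy the policy. The Python sketch below models that satisfaction check only, with no cryptography. The names `Attr`, `Gate`, `AND`, `OR`, and `satisfies`, and the attribute strings, are hypothetical and are not taken from the paper or from any CP-ABE library.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Attr:
    """Leaf of the access policy: a single required attribute (hypothetical model)."""
    name: str


@dataclass(frozen=True)
class Gate:
    """Threshold gate: satisfied when at least `threshold` children are satisfied."""
    threshold: int
    children: Tuple["Policy", ...]


Policy = Union[Attr, Gate]


def AND(*children: Policy) -> Gate:
    # An AND gate is an n-of-n threshold gate.
    return Gate(len(children), tuple(children))


def OR(*children: Policy) -> Gate:
    # An OR gate is a 1-of-n threshold gate.
    return Gate(1, tuple(children))


def satisfies(policy: Policy, user_attrs: FrozenSet[str]) -> bool:
    """Return True if the user's attribute set satisfies the access policy.

    This only mirrors the access logic; a real CP-ABE scheme enforces it
    cryptographically rather than with an explicit check.
    """
    if isinstance(policy, Attr):
        return policy.name in user_attrs
    met = sum(satisfies(child, user_attrs) for child in policy.children)
    return met >= policy.threshold


# Illustrative policy and users (assumed values, not from the paper).
policy = AND(Attr("dept:finance"), OR(Attr("role:analyst"), Attr("role:auditor")))
auditor = frozenset({"dept:finance", "role:auditor"})
outsider = frozenset({"role:auditor"})
```

Here `satisfies(policy, auditor)` holds while `satisfies(policy, outsider)` does not, which is the sense in which a file stored on HDFS would be readable only by users with matching attributes.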

Results: The proposed algorithm was implemented with data sets of different sizes. The data were processed with and without encryption. The results show little difference in processing time: performance was affected by 0.8% to 3.1%, a range that also includes the impact of other factors such as system configuration, the number of parallel jobs running, and the virtual environment.
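The abstract does not state how the percentage impact was computed. One common reading, assumed here, is the relative increase in processing time of the encrypted run over the unencrypted run. The function name `relative_overhead_percent` is hypothetical, and the timings in the example are made-up illustrative values, not measurements from the paper.

```python
def relative_overhead_percent(t_plain: float, t_encrypted: float) -> float:
    """Relative slowdown of the encrypted run, in percent of the plain run.

    Assumed formula: 100 * (t_encrypted - t_plain) / t_plain.
    """
    if t_plain <= 0:
        raise ValueError("t_plain must be positive")
    return 100.0 * (t_encrypted - t_plain) / t_plain


# Hypothetical timings in seconds, for illustration only.
example = relative_overhead_percent(200.0, 204.0)  # 2.0
```

Under this reading, a job that takes 200 s without encryption and 204 s with it would show a 2.0% overhead, which falls inside the reported 0.8% to 3.1% range.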

Conclusion: The solutions available for handling the big data security problems faced in the Hadoop framework are inefficient or incomplete. A complete security framework is proposed for the Hadoop environment. The solution is shown experimentally to have little effect on system performance for datasets of different sizes.

Keywords: Big data, CP-ABE, encryption, Hadoop security, Kerberos, vulnerabilities.




Article Details

VOLUME: 13
ISSUE: 4
Year: 2020
Published on: 18 October, 2020
Page: [790 - 797]
Pages: 8
DOI: 10.2174/2213275912666190822095422
