Malicious Apps Identification in Android Devices Using Machine Learning Algorithms

Ravinder Ahuja; Vineet Maheshwari; Siddhant Manglik; Abiha Kazmi; Rishika Arora; Anuradha Gupta

Abstract

Background & Objective: In this paper, a malicious app detection system is implemented using machine learning algorithms. For this purpose, 330 permission-based features of 558 Android applications are taken into consideration.

Methods: The main aim of this work is to develop a model that can effectively distinguish malicious apps from benign ones. We use six feature selection techniques to extract the important features from the 330 permission-based features of the 558 apps, and then apply fourteen classification algorithms using the Python language.
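The select-then-classify workflow described under Methods can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the six feature selection techniques or the fourteen classifiers, so a Pearson chi-square score and a Bernoulli naive Bayes classifier are used here as stand-ins, the permission matrix is synthetic (558 apps by 330 binary permissions, of which the first 10 are made informative), and every function name is hypothetical.

```python
import math
import random


def make_synthetic_apps(n_apps=558, n_features=330, n_informative=10, seed=0):
    """Build a synthetic binary permission matrix (NOT the paper's dataset)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_apps):
        label = i % 2  # assumption: 1 = malicious, 0 = benign, balanced classes
        row = []
        for j in range(n_features):
            if j < n_informative:
                p = 0.8 if label == 1 else 0.2  # permission correlates with label
            else:
                p = 0.3  # uninformative permission
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y


def chi2_score(column, labels):
    """Pearson chi-square statistic of the 2x2 table (binary feature vs. label)."""
    n = len(labels)
    table = [[0, 0], [0, 0]]  # table[feature_value][label]
    for f, lab in zip(column, labels):
        table[f][lab] += 1
    score = 0.0
    for f in (0, 1):
        for lab in (0, 1):
            row_total = table[f][0] + table[f][1]
            col_total = table[0][lab] + table[1][lab]
            expected = row_total * col_total / n
            if expected > 0:
                score += (table[f][lab] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Return the (sorted) indices of the k features with the highest score."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def project(rows, columns):
    """Keep only the selected feature columns."""
    return [[row[j] for j in columns] for row in rows]


def train_bernoulli_nb(X, y):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing."""
    model = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        log_prior = math.log(len(rows) / len(X))
        probs = [(sum(col) + 1) / (len(rows) + 2) for col in zip(*rows)]
        model[c] = (log_prior, probs)
    return model


def predict(model, row):
    """Return the class with the larger log-posterior."""
    def log_posterior(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if x else 1 - p) for x, p in zip(row, probs)
        )
    return max((0, 1), key=log_posterior)


X, y = make_synthetic_apps()
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

# Features are selected on the training split only, to avoid leaking test labels.
selected = select_top_k(X_train, y_train, k=10)
model = train_bernoulli_nb(project(X_train, selected), y_train)
predictions = [predict(model, row) for row in project(X_test, selected)]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)

print(f"selected features: {selected}")
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the results reported in the paper. In practice, one would likely swap in several selectors and classifiers from a library such as scikit-learn and compare them on the same train/test split.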

Results: In this paper, an efficient model for detecting malicious apps has been proposed.

Conclusion: The proposed model detects malicious apps approximately 3% better than the existing system.

Keywords: Classification techniques, feature selection, malware identification, ensemble algorithms, static analysis, Python language.




DOI: https://dx.doi.org/10.2174/2210327909666191204125100
Print ISSN: 2210-3279
Online ISSN: 2210-3287
Publisher: Bentham Science Publishers

International Journal of Sensors, Wireless Communications and Control
