Comparative Analysis of Classification Methods with PCA and LDA for Diabetes

Author(s): Dilip Kumar Choubey*, Manish Kumar, Vaibhav Shukla, Sudhakar Tripathi, Vinay Kumar Dhandhania

Journal Name: Current Diabetes Reviews

Volume 16, Issue 8, 2020

Background: Modern society is prone to many life-threatening diseases that can be controlled, and in some cases cured, if they are diagnosed at an early stage. The development and implementation of disease diagnostic systems have therefore gained considerable popularity over the years. Environmental factors, a sedentary lifestyle, and genetic (hereditary) predisposition are major contributors to life-threatening diseases such as diabetes, which has become one of the leading chronic diseases of the modern era. One of the prime needs of this generation is thus a state-of-the-art expert system that can predict diabetes at a very early stage, quickly and with minimal complexity. The primary objective of this work is to develop an indigenous and efficient diagnostic technique for the detection of diabetes.

Method & Discussion: The proposed methodology comprises two phases. In the first phase, the Pima Indian Diabetes Dataset (PIDD) was collected from the UCI machine learning repository, and the Localized Diabetes Dataset (LDD) was gathered from Bombay Medical Hall, Upper Bazar, Ranchi, Jharkhand, India. In the second phase, the datasets were processed through two different approaches. The first approach applies four classifiers directly to both PIDD and LDD: Adaboost, Classification via Regression (CVR), Radial Basis Function Network (RBFN), and K-Nearest Neighbor (KNN). The second approach first applies Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) as feature reduction methods and then uses the same set of classifiers as the first approach. Among all of the implemented classification methods, PCA_CVR achieves the maximum performance on both datasets.
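The two approaches described above (classification on all features versus feature reduction followed by the same classifier) can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the data are synthetic stand-ins for PIDD and LDD, only PCA and KNN are shown (LDA, Adaboost, CVR, and RBFN are omitted), and every function name and parameter here (`top_components`, `run_demo`, the 70/30 split, `k=3`, two retained components) is an assumption made for the example.

```python
import math
import random


def center(X, means=None):
    """Subtract per-feature means (computed from X unless supplied)."""
    d = len(X[0])
    if means is None:
        means = [sum(row[j] for row in X) / len(X) for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X], means


def covariance(Xc):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(C, v):
    """Multiply matrix C (list of rows) by vector v."""
    return [sum(c * x for c, x in zip(row, v)) for row in C]


def top_components(C, k, iters=300, seed=0):
    """Leading k eigenvectors of symmetric C: power iteration with deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))
        comps.append(v)
        # Deflate: remove the direction just found before the next pass.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(Xc, comps):
    """Map centered rows onto the retained principal components."""
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in Xc]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def knn_accuracy(train_X, train_y, test_X, test_y, k=3):
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def run_demo(seed=0, n_per_class=60, n_features=8, n_components=2):
    # Synthetic two-class data (hypothetical): only the first two features
    # carry class signal, the remaining ones are noise.
    rng = random.Random(seed)
    data = [([rng.gauss(3.0 * label if j < 2 else 0.0, 1.0)
              for j in range(n_features)], label)
            for label in (0, 1) for _ in range(n_per_class)]
    rng.shuffle(data)
    split = int(0.7 * len(data))
    train_X, train_y = [x for x, _ in data[:split]], [y for _, y in data[:split]]
    test_X, test_y = [x for x, _ in data[split:]], [y for _, y in data[split:]]

    # Approach 1: classify on all features.
    acc_raw = knn_accuracy(train_X, train_y, test_X, test_y)

    # Approach 2: fit PCA on the training split only, then classify.
    train_c, means = center(train_X)
    comps = top_components(covariance(train_c), n_components)
    test_c, _ = center(test_X, means)
    acc_pca = knn_accuracy(project(train_c, comps), train_y,
                           project(test_c, comps), test_y)
    return acc_raw, acc_pca


acc_raw, acc_pca = run_demo()
print(f"KNN on all features: {acc_raw:.3f}")
print(f"KNN after PCA:       {acc_pca:.3f}")
```

Fitting the projection on the training split only, and reusing its means and components for the test split, keeps test information out of the feature reduction step. Results on synthetic data say nothing about the figures reported in the article.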

Conclusion: This article presents a comparative analysis of the outcomes obtained with and without PCA and LDA for the same set of classification methods, with respect to performance assessment. It is concluded that both PCA and LDA are useful for removing insignificant features, which decreases expense and computation time while improving ROC and accuracy. The methodology may similarly be applied to other medical diseases.
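The two performance measures named above can be illustrated with a short sketch. It assumes binary labels (1 = diabetic, 0 = non-diabetic) and a classifier that outputs a score per patient; the function names, the 0.5 threshold, and the four sample scores are hypothetical and are not taken from the article. The area under the ROC curve is computed here as the fraction of positive/negative pairs that the scores rank correctly, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking of positives vs negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly when score >= threshold means positive."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


# Hypothetical scores for four patients.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))      # 0.75
print(accuracy_at(labels, scores))  # 0.75
```

Accuracy depends on the chosen threshold, whereas the ROC area summarizes ranking quality across all thresholds, which is one reason the two measures are often reported together.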

Keywords: Adaboost, classification, CVR, feature reduction, KNN, localized diabetes dataset, LDA, RBFN, PCA, Pima Indian diabetes dataset.

Article Details

Year: 2020
Published on: 11 September, 2020
Page: [833 - 850]
Pages: 18
DOI: 10.2174/1573399816666200123124008
