Prediction of K562 Cells Functional Inhibitors Based on Machine Learning Approaches

Author(s): Yuan Zhang, Zhenyan Han, Qian Gao, Xiaoyi Bai, Chi Zhang*, Hongying Hou*.

Journal Name: Current Pharmaceutical Design

Volume 25, Issue 40, 2019


Abstract:

Background: β-thalassemia is a common monogenic genetic disease that is very harmful to human health. The disease arises from the deletion of or defects in the β-globin gene, which reduce synthesis of the β-globin chain and leave a relative excess of α-chains. The resulting inclusion bodies deposit on the cell membrane and reduce the deformability of red blood cells, which are then destroyed in large numbers in the spleen, producing a group of hereditary haemolytic diseases.

Methods: In this work, machine learning algorithms were employed to build a prediction model for inhibitors against K562 cells, based on 117 inhibitors and 190 non-inhibitors.
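As an illustration of how a dataset of this size could be partitioned for 10-fold cross-validation, the following is a minimal sketch in plain Python. The class sizes (117 and 190) come from the abstract; the stratified assignment, the function name `stratified_folds`, and the random seed are assumptions made for the example, since the abstract does not describe how the folds were drawn.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the indices of one class, then deal them out round-robin.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# Class sizes from the abstract: 117 inhibitors (1) and 190 non-inhibitors (0).
labels = [1] * 117 + [0] * 190
folds = stratified_folds(labels, k=10)

for n, fold in enumerate(folds):
    inhibitors = sum(labels[i] for i in fold)
    print(f"fold {n}: {len(fold)} samples, {inhibitors} inhibitors")
```

In each cross-validation round, one fold would serve as the validation set and the other nine as training data, so every compound is predicted exactly once.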

Results: The overall accuracy (ACC) of a 10-fold cross-validation test and an independent set test using Adaboost were 83.1% and 78.0%, respectively, surpassing Bayes Net, Random Forest, Random Tree, C4.5, SVM, KNN and Bagging.
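To make the reported metric and classifier concrete, here is a minimal sketch of discrete AdaBoost with single-feature decision stumps, together with the accuracy computation (ACC = correct predictions / total predictions). It is written in plain Python on a ten-point toy dataset; the data, function names, and number of boosting rounds are illustrative assumptions, not the descriptors, software, or settings used in the study.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] <= thr else -sign for row in X]
                err = sum(wi for wi, p, yi in zip(w, pred, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] <= thr else -sign

def adaboost_fit(X, y, rounds=10):
    """Fit discrete AdaBoost; labels must be +1 (e.g. inhibitor) or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        # Clamp the error so the logarithm below stays finite.
        err = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1

def accuracy(y_true, y_pred):
    """ACC: fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy one-feature data that no single stump can separate.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost_fit(X, y, rounds=1)
three_rounds = adaboost_fit(X, y, rounds=3)
acc_one = accuracy(y, [adaboost_predict(one_round, r) for r in X])
acc_three = accuracy(y, [adaboost_predict(three_rounds, r) for r in X])
print(acc_one, acc_three)
```

The toy run reports training accuracy only. The figures in the abstract are cross-validation and independent-test accuracies, which would be computed by fitting on the training portion and calling `accuracy` on held-out compounds.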

Conclusion: This study indicated that Adaboost could be applied to build a learning model for the prediction of inhibitors against K562 cells.

Keywords: Machine learning, cross-validation test, independent set test, Adaboost, feature selection, K562 cells.




Article Details

VOLUME: 25
ISSUE: 40
Year: 2019
Page: [4296 - 4302]
Pages: 7
DOI: 10.2174/1381612825666191107092214