Review Article

Recent Progress of Machine Learning in Gene Therapy

Author(s): Cassandra Hunt, Sandra Montgomery, Joshua William Berkenpas, Noel Sigafoos, John Christian Oakley, Jacob Espinosa, Nicola Justice, Kiyomi Kishaba, Kyle Hippe, Dong Si, Jie Hou, Hui Ding and Renzhi Cao*

Volume 22, Issue 2, 2022

Published on: 22 June, 2021

Page: [132 - 143] Pages: 12

DOI: 10.2174/1566523221666210622164133

With new developments in biomedical technology, altering genes with techniques like CRISPR is now a viable therapeutic option. At the same time, whole genome sequencing is becoming steadily cheaper, which has driven rapid advances in gene therapy and gene editing for precision medicine. Understanding the current industry and academic applications of gene therapy provides an important backdrop for future scientific developments. In addition, machine learning and artificial intelligence techniques can reduce the time and money spent developing new gene therapy products and techniques. In this paper, we survey the current progress of gene therapy treatments for several diseases and explore machine learning applications in gene therapy. We also discuss the ethical implications of gene therapy and of the use of machine learning in precision medicine. Both machine learning and gene therapy are receiving growing attention in the literature, and we conclude that there is still room for further research on, and application of, machine learning techniques in the gene therapy field.

Keywords: Machine learning, gene therapy, cancer, hemophilia, cardiovascular disease, neurodegenerative disease, CRISPR, ethics.

