HS-MMGKG: A Fast Multi-objective Harmony Search Algorithm for Two-locus Model Detection in GWAS

Author(s): Liyan Sun, Guixia Liu*, Lingtao Su, Rongquan Wang

Journal Name: Current Bioinformatics

Volume 14, Issue 8, 2019

Abstract:

Background: Genome-wide association studies (GWAS) play an important role in identifying the causes of disease. Most existing methods for detecting genetic interactions in GWAS are designed for a single correlation model, so their performance varies considerably across disease models. These methods also tend to have high computational cost and low accuracy.

Methods: We present a new multi-objective heuristic optimization method, HS-MMGKG, for detecting genetic interactions. HS-MMGKG uses harmony search with five objective functions to improve efficiency and accuracy. A new strategy based on the p-value and multifactor dimensionality reduction (MDR) is adopted to generate more reasonable results. The Boolean representation used in BOOST is modified to calculate the five functions rapidly. Together, these strategies reduce the time cost and improve accuracy when detecting potential models.
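The abstract does not say how the Boolean representation is modified, so the following is only a minimal sketch of the general idea behind a Boolean genotype encoding of the kind used in BOOST, not the authors' implementation. Each SNP is stored as three bitmasks (one per genotype value), and the joint genotype counts of a SNP pair are obtained with a bitwise AND followed by a bit count. The function names `encode_snp` and `contingency_table` and the toy genotypes are assumptions made for illustration.

```python
def encode_snp(genotypes):
    """Return three integer bitmasks, one per genotype value (0, 1, 2).

    Bit i of mask g is set when sample i carries genotype g.
    """
    masks = [0, 0, 0]
    for i, g in enumerate(genotypes):
        masks[g] |= 1 << i
    return masks


def contingency_table(masks_a, masks_b):
    """Build the 3x3 table of joint genotype counts for two SNPs.

    Each cell is the number of set bits in the AND of two bitmasks,
    so no per-sample loop is needed once the SNPs are encoded.
    """
    return [
        [bin(masks_a[ga] & masks_b[gb]).count("1") for gb in range(3)]
        for ga in range(3)
    ]


# Toy data (made up for illustration): genotypes of two SNPs in six samples.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 1, 1, 2, 2]
table = contingency_table(encode_snp(snp_a), encode_snp(snp_b))
```

In a case-control setting, one such table would be built separately for cases and for controls. Objective functions defined on genotype counts could then be evaluated from the same two tables, which is presumably why a fast table construction helps when several objectives are computed per candidate SNP pair.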

Results: We compared HS-MMGKG with CSE, MACOED and FHSA-SED on 26 simulated datasets. The experimental results show that our method outperforms the others in both accuracy and computation time. Applied to the WTCCC dataset, our method identified many two-locus SNP combinations associated with seven diseases. Some of these SNPs have direct evidence in the CTD database. The results may help to further explain the pathogenesis of these diseases.

Conclusion: We anticipate that the proposed algorithm can be used in GWAS to help in understanding disease mechanisms, diagnosis and prognosis.

Keywords: Single-nucleotide polymorphism, epistasis, genome-wide association study, harmony search, optimization.




Article Details

VOLUME: 14
ISSUE: 8
Year: 2019
Page: [749 - 761]
Pages: 13
DOI: 10.2174/1574893614666190409110843