A Method for Analyzing Two-locus Epistasis of Complex Diseases based on Decision Tree and Mutual Entropy

Author(s): Xiong Li*, Hui Yang, Kaifu Wen, Xiaoming Zhong, Xuewen Xia, Liyue Liu, Dehao Qin

Journal Name: Current Proteomics

Volume 16, Issue 5, 2019


Abstract:

Background: Epistasis makes complex diseases difficult to understand, especially when heterogeneity is also present. Heterogeneity makes the distribution of the case population harder to characterize. However, traditional methods for detecting epistasis often ignore heterogeneity, which lowers the power of association studies.

Methods: In this study, we first use rank information from the Classification Decision Tree and Mutual Entropy (CTME) to construct two evaluation scores, which serve as multiple objectives. We also improve the calculation of joint entropy between SNPs and the disease label, which raises the efficiency of CTME. An ant colony algorithm is then applied to search the space of two-locus epistatic combinations. To handle potential heterogeneity, all candidate two-locus SNP combinations are merged to identify multiple distinct epistatic combinations. Finally, all resulting solutions are tested with the χ² test.
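
The two statistical ingredients named in the Methods, an entropy-based score between a SNP pair and the disease label and a final χ² test, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`entropy`, `mutual_information`, `chi_square`) and the toy data are hypothetical, the entropy score uses the textbook definition I(A,B; Y) = H(A,B) + H(Y) - H(A,B,Y) rather than the improved joint-entropy calculation mentioned above (which the abstract does not describe), and the decision-tree score and ant colony search are omitted.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable observations."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(snp_a, snp_b, labels):
    """Textbook mutual information between a two-locus genotype and the label.

    Assumption: genotypes are coded as small integers (e.g. 0, 1, 2) and
    labels as 0 (control) / 1 (case).
    """
    pair = list(zip(snp_a, snp_b))
    joint = list(zip(snp_a, snp_b, labels))
    return entropy(pair) + entropy(labels) - entropy(joint)


def chi_square(snp_a, snp_b, labels):
    """Pearson chi-square statistic and degrees of freedom for the
    (observed two-locus genotype) x (label) contingency table."""
    pair = list(zip(snp_a, snp_b))
    n = len(labels)
    row_totals = Counter(pair)
    col_totals = Counter(labels)
    observed = Counter(zip(pair, labels))  # missing cells count as 0
    stat = 0.0
    for genotype, row_count in row_totals.items():
        for label, col_count in col_totals.items():
            expected = row_count * col_count / n
            stat += (observed[(genotype, label)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Toy XOR-like data (hypothetical): neither SNP is informative alone,
# but the pair determines the label completely.
snp_a = [0, 0, 1, 1] * 2
snp_b = [0, 1, 0, 1] * 2
labels = [0, 1, 1, 0] * 2

print(mutual_information(snp_a, snp_b, labels))  # 1.0 bit
print(chi_square(snp_a, snp_b, labels))          # (8.0, 3)
```

In this toy case the pair carries one full bit of information about the label even though each SNP alone carries none, which is the kind of purely epistatic signal a two-locus search is meant to surface.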

Results and Conclusion: Experiments show that CTME improves the power of association studies and, more importantly, detects multiple epistatic SNPs contributing to heterogeneity. The results indicate that CTME has advantages in both power and efficiency.

Keywords: Complex diseases, epistasis, heterogeneity, data mining, entropy, association study.




Article Details

VOLUME: 16
ISSUE: 5
Year: 2019
Page: [366 - 373]
Pages: 8
DOI: 10.2174/1570164616666190123150236
