A Spectral Rotation Method with Triplet Periodicity Property for Planted Motif Finding Problems

Author(s): Xun Wang, Shudong Wang, Tao Song*

Journal Name: Combinatorial Chemistry & High Throughput Screening
Accelerated Technologies for Biotechnology, Bioassays, Medicinal Chemistry and Natural Products Research

Volume 22, Issue 10, 2019


Background: Genes are functional patterns in the genome that are presumed to have biological significance: they can indicate binding sites for transcription factors, and they encode proteins. Finding genes in biological sequences is a major task in computational biology and is central to unraveling the mechanisms of gene expression.

Objective: Planted motif finding problems are a class of mathematical models abstracted from the process of detecting genes in a genome. A specific gene carrying a number of mutations is planted into a randomly generated background sequence, and gene-finding algorithms are then tested to check whether the planted gene can be found in feasible time.
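The instance construction described in the Objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generator: the background is assumed to be i.i.d. uniform over A, C, G, T, the mutations are assumed to be substitutions only, and the mutated gene is assumed to be inserted (rather than overwriting background bases) at a random offset. The function names `mutate` and `plant_instance` are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(gene: str, n_subs: int, rng: random.Random) -> str:
    """Copy of `gene` with exactly `n_subs` substitutions at distinct positions."""
    seq = list(gene)
    for pos in rng.sample(range(len(seq)), n_subs):
        # Choose a different base so every selected position really changes.
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)


def plant_instance(gene: str, background_len: int, n_subs: int, seed: int = 0):
    """Return (sequence, offset): a uniform random background (assumption) with a
    mutated copy of `gene` inserted starting at index `offset`."""
    rng = random.Random(seed)
    background = "".join(rng.choice(BASES) for _ in range(background_len))
    planted = mutate(gene, n_subs, rng)
    offset = rng.randrange(background_len + 1)
    return background[:offset] + planted + background[offset:], offset


if __name__ == "__main__":
    gene = "ATGGCTAAAGGTTAA"  # hypothetical 15-base "gene"
    seq, offset = plant_instance(gene, background_len=200, n_subs=3, seed=42)
    print(offset, seq[offset:offset + len(gene)])
```

A gene-finding algorithm under test would receive only `seq`; `offset` and the original `gene` serve as ground truth for checking whether the planted copy was recovered.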

Methods: In this work, a spectral rotation method based on the triplet periodicity property of protein-coding sequences (a period-3 pattern in their base composition) is proposed to solve planted motif finding problems.
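Triplet periodicity is commonly quantified through the discrete Fourier transform of the four binary nucleotide indicator sequences, evaluated at frequency 1/3 cycles per base (index k = N/3 for a window of length N). The sketch below computes that period-3 power in sliding windows. It is a simplified stand-in, not the proposed method: spectral rotation measures, as generally described, additionally rotate each nucleotide's coefficient by a nucleotide-specific phase angle estimated from known coding sequences before combining them, and that step is omitted here. The window length of 351 bases, the step of 3, and the names `period3_power` and `sliding_profile` are illustrative assumptions.

```python
import cmath

BASES = "ACGT"
W = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at frequency 1/3 cycles per base


def period3_power(window: str) -> float:
    """Sum over b in {A, C, G, T} of |U_b|^2, where U_b is the DFT at frequency
    1/3 of the indicator sequence of base b (1 where the window has b, else 0)."""
    coeffs = {b: 0j for b in BASES}
    for n, base in enumerate(window):
        if base in coeffs:  # skip ambiguous symbols such as N
            coeffs[base] += W ** (n % 3)  # e^{-2*pi*i*n/3} depends only on n mod 3
    return sum(abs(c) ** 2 for c in coeffs.values())


def sliding_profile(seq: str, win: int = 351, step: int = 3) -> list:
    """Period-3 power of each full-length window, sliding by `step` bases."""
    return [period3_power(seq[i:i + win])
            for i in range(0, len(seq) - win + 1, step)]
```

Calling `sliding_profile(seq)` on a planted instance gives one score per window; windows whose base composition varies with codon position score higher, which is the kind of signal a spectral method uses to separate a planted gene from a random background.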

Results: The proposed method tolerates a significant number of base mutations in genes. Specifically, genes carrying a number of substitutions can be detected in randomly generated background sequences. Experimental results on a genomic data set from Saccharomyces cerevisiae show that genes can be visually distinguished. The results suggest that genes with about 50% mutations can be detected in randomly generated background sequences.

Conclusion: The method fails to find the planted genes when about 5 insertions or deletions are introduced. In the particular case where the deleted bases are located at the beginning of the gene, that is, the bases are not deleted at random, the method's tolerance of base deletion increases.
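One plausible intuition for this behaviour (an illustration, not the authors' analysis) is that every insertion or deletion shifts the codon phase of all downstream bases, so the period-3 Fourier contributions of the segments on either side of the indel partially cancel; deletions clustered at the start shift the whole remaining gene together and leave its phase consistent. The sketch below checks this on a hypothetical, perfectly periodic toy sequence, using plain period-3 Fourier power rather than the paper's spectral rotation measure. The helper names are hypothetical.

```python
import cmath

BASES = "ACGT"
W = cmath.exp(-2j * cmath.pi / 3)  # DFT kernel at frequency 1/3 cycles per base


def period3_power(seq: str) -> float:
    """Sum over A, C, G, T of the squared period-3 Fourier magnitude."""
    coeffs = {b: 0j for b in BASES}
    for n, base in enumerate(seq):
        if base in coeffs:
            coeffs[base] += W ** (n % 3)
    return sum(abs(c) ** 2 for c in coeffs.values())


def delete_positions(seq: str, positions) -> str:
    """Remove the bases at the given 0-based positions."""
    drop = set(positions)
    return "".join(b for i, b in enumerate(seq) if i not in drop)


toy_gene = "ATG" * 100  # hypothetical, perfectly codon-periodic 300-base sequence

# Five deletions clustered at the start: all remaining bases shift together,
# so the codon phase stays consistent across the remaining 295 bases.
clustered = delete_positions(toy_gene, range(5))

# Five deletions spread through the sequence: each shifts the downstream phase
# by one position, so contributions from different segments cancel.
scattered = delete_positions(toy_gene, [50, 100, 150, 200, 250])

for label, s in [("intact", toy_gene), ("clustered", clustered), ("scattered", scattered)]:
    print(f"{label:9s} {period3_power(s):10.1f}")
```

On this toy input the clustered deletions keep almost all of the intact sequence's period-3 power, while the five scattered deletions remove nearly all of it, mirroring the asymmetry reported above.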

Keywords: Gene detection, motif finding, visualization method, fast algorithm, Fourier spectra, planted motif finding problem.

Article Details

Year: 2019
Published on: 28 November 2019
Pages: 683-693
Page count: 11
DOI: 10.2174/1386207322666191129112433