Hypothetical Proteins as Predecessors of Long Non-coding RNAs

Author(s): Girik Malik, Tanu Agarwal, Utkarsh Raj, Vijayaraghava Seshadri Sundararajan, Obul Reddy Bandapalli*, Prashanth Suravajhala*

Journal Name: Current Genomics

Volume 21, Issue 7, 2020

Hypothetical proteins [HPs] are transcripts predicted to be expressed in an organism, but for which no evidence exists in gene banks. Long non-coding RNAs [lncRNAs], on the other hand, are transcripts longer than 200 bases that might be present in the 5’ UTR or intergenic regions of genes. With known unknown [KU] regions of genomes rapidly accumulating in gene banks, there is a need to understand the role of open reading frames in the context of annotation. In this commentary, we emphasize that HPs could indeed be the predecessors of lncRNAs.

Keywords: Hypothetical proteins, lncRNA, aptamers, annotation, functional genomics, transcripts.

Article Details

Year: 2020
Page: [531 - 535]
Pages: 5
DOI: 10.2174/1389202921999200611155418