Single-molecule Real-time (SMRT) Isoform Sequencing (Iso-Seq) in Plants: The Status of the Bioinformatics Tools to Unravel the Transcriptome Complexity

Author(s): Yubang Gao, Feihu Xi, Hangxiao Zhang, Xuqing Liu, Huiyuan Wang, Liangzhen Zhao, Anireddy S.N. Reddy, Lianfeng Gu*.

Journal Name: Current Bioinformatics

Volume 14, Issue 7, 2019

Abstract:

Background: The advent of Single-Molecule Real-Time (SMRT) Isoform Sequencing (Iso-Seq) has paved the way to obtaining longer, full-length transcripts. This method has proven superior to Next-Generation Sequencing (NGS)-based short-read sequencing (RNA-Seq) in identifying full-length splice variants and other post-transcriptional events. Several bioinformatics tools for analyzing Iso-Seq data have been developed, and some are still being refined to address different aspects of transcriptome complexity. However, a comprehensive summary of the available tools and their utility is still lacking.

Objective: Here, we summarized the existing Iso-Seq analysis tools and presented an integrated bioinformatics pipeline for Iso-Seq analysis, which overcomes the limitations of NGS and generates long, contiguous Full-Length Non-Chimeric (FLNC) reads for the analysis of post-transcriptional events.

Results: In this review, we summarized recent applications of Iso-Seq in plants, which include improved genome annotation, identification of novel genes and lncRNAs, identification of full-length splice isoforms, and detection of novel Alternative Splicing (AS) and Alternative Polyadenylation (APA) events. We also discussed the bioinformatics pipeline for comprehensive Iso-Seq data analysis, including how to reduce the error rate in the reads and how to identify and quantify post-transcriptional events. Furthermore, we discussed approaches for visualizing Iso-Seq data. Finally, we discussed methods to combine Iso-Seq data with RNA-Seq data for transcriptome quantification.

Conclusion: Overall, this review demonstrates that Iso-Seq is pivotal for analyzing transcriptome complexity and that this method offers unprecedented opportunities to comprehensively understand transcript diversity.

Keywords: Pacific Biosciences (PacBio), SMRT Isoform Sequencing (Iso-Seq), Next-Generation Sequencing (NGS), Alternative Splicing (AS), Alternative Polyadenylation (APA), genome annotation, novel genes.


Article Details

VOLUME: 14
ISSUE: 7
Year: 2019
Page: [566 - 573]
Pages: 8
DOI: 10.2174/1574893614666190204151746