Systematic Comparisons of Positively Selected Genes between Gossypium arboreum and Gossypium raimondii Genomes

Author(s): Yue Guo, Zhen Peng, Jing Liu, Na Yuan, Zhen Wang, Jianchang Du*

Journal Name: Current Bioinformatics

Volume 14, Issue 7, 2019

Background: Genome-wide scans for Positively Selected Genes (PSGs) in microorganisms and mammals have provided insights into the dynamics of genome evolution and the genetic basis of differences between species. Systematic investigations and comparisons of PSGs in plants, however, are still limited.

Objective: A systematic comparison of PSGs between the genomes of two cotton species, Gossypium arboreum (G. arboreum) and G. raimondii, can help reveal molecular evolutionary differences in plants.

Methods: Genome sequences of G. arboreum and G. raimondii were compared with respect to Whole Genome Duplication (WGD) events and genomic features such as gene number, gene length, codon bias index, evolutionary rate, number of expressed genes, and retention of duplicated copies.

Results: Compared with G. raimondii, G. arboreum had more PSGs, and its PSGs had a smaller gene size and included fewer expressed genes. In addition, the PSGs evolved at a higher rate of synonymous substitution but were subject to lower selection pressure. After a single WGD event involving Gossypium, the PSGs in G. arboreum also retained fewer duplicate gene copies than those in G. raimondii.

Conclusion: These data indicate that PSGs in G. arboreum and G. raimondii differ not only in Ka/Ks but also in their evolutionary, structural, and expression properties. This suggests that the divergence of the two species was associated with differences in PSG evolutionary rates, gene length, expression patterns, and WGD retention in Gossypium.

Keywords: Positively selected genes, Ka, Ks, evolution, Gossypium arboreum, Gossypium raimondii.
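
The Ka/Ks ratio mentioned in the conclusion compares the nonsynonymous substitution rate (Ka) with the synonymous rate (Ks). As a minimal sketch, the Python below labels a gene pair by the conventional reading of that ratio: above 1 suggests positive selection, below 1 purifying selection. The threshold of 1, the function names, and the example values are assumptions made for illustration; the abstract does not state the criterion or the software settings the authors used.

```python
from typing import Optional


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is not positive (ratio undefined)."""
    if ks <= 0:
        return None
    return ka / ks


def classify_selection(ka: float, ks: float, threshold: float = 1.0) -> str:
    """Label a gene pair by its Ka/Ks ratio.

    The default threshold of 1.0 is the conventional cut-off and is an
    assumption here, not a value taken from the abstract.
    """
    omega = ka_ks_ratio(ka, ks)
    if omega is None:
        return "undefined"
    if omega > threshold:
        return "positive"
    if omega < threshold:
        return "purifying"
    return "neutral"


# Hypothetical (Ka, Ks) values for three ortholog pairs; not real data.
example_pairs = {
    "geneA": (0.30, 0.20),
    "geneB": (0.02, 0.40),
    "geneC": (0.05, 0.00),
}

labels = {name: classify_selection(ka, ks) for name, (ka, ks) in example_pairs.items()}
print(labels)
```

In practice, a single pairwise ratio is only a first screen; likelihood-based tests are normally used to assess whether a ratio above 1 is statistically supported.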

Article Details

Year: 2019
Page: [581 - 590]
Pages: 10
DOI: 10.2174/1574893614666190227151013