Analysis of Single-Cell RNA-seq Data by Clustering Approaches

Author(s): Xiaoshu Zhu, Hong-Dong Li, Lilu Guo, Fang-Xiang Wu, Jianxin Wang*

Journal Name: Current Bioinformatics

Volume 14, Issue 4, 2019

Background: Single-cell RNA sequencing (scRNA-seq) has attracted considerable attention because it measures gene expression in individual cells, whereas traditional bulk sequencing can only measure the mean gene expression of a cell population. scRNA-seq has been successfully applied to discover new cell subtypes, but analyzing scRNA-seq data poses new computational challenges.

Objective: We provide an overview of the features of different similarity calculation and clustering methods to help users select methods suited to their scRNA-seq data. We also aim to show that feature selection methods are important for improving clustering performance.
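
As a rough illustration of what feature selection can mean here, the sketch below ranks genes by their variance-to-mean ratio (Fano factor) and keeps the most variable ones before clustering. This is only one common heuristic and is not necessarily a method covered in the review. The expression matrix, the gene names, and the function `top_variable_genes` are hypothetical and chosen for illustration.

```python
from statistics import mean, pvariance

def top_variable_genes(matrix, gene_names, n_top):
    """Rank genes by variance-to-mean ratio and return the n_top names.

    matrix: list of cells, each a list of expression values (one per gene).
    """
    scores = []
    for g, name in enumerate(gene_names):
        column = [cell[g] for cell in matrix]
        mu = mean(column)
        # Genes never expressed get a score of 0 to avoid dividing by zero.
        fano = pvariance(column) / mu if mu > 0 else 0.0
        scores.append((fano, name))
    # Highest dispersion first.
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scores[:n_top]]

# Hypothetical toy counts: 4 cells (rows) x 4 genes (columns).
genes = ["G1", "G2", "G3", "G4"]
expr = [
    [5, 0, 20, 3],
    [5, 1, 0, 3],
    [6, 0, 25, 2],
    [5, 0, 1, 3],
]
selected = top_variable_genes(expr, genes, n_top=2)
print(selected)
```

Genes with nearly constant expression across cells (G1 and G4 in this toy matrix) carry little information for separating cell types, which is why they are dropped before similarities are computed.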

Results: We first describe similarity measurement methods and then review several recent clustering methods along with their algorithmic details. This analysis raises several open questions, including how to automatically estimate the number of clusters, how to discover novel subpopulations, and how to search for new marker genes using feature selection methods.
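
The steps named above (measure similarity, cluster, estimate the number of clusters) can be sketched in a few lines. The example below is a minimal illustration, not a method from the review: it assumes Pearson correlation as the similarity measure, average-linkage agglomerative clustering, and the mean silhouette width as the criterion for choosing the number of clusters. The toy expression values and all function names are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def distance_matrix(cells):
    """Pairwise cell-to-cell distances, defined as 1 - Pearson correlation."""
    n = len(cells)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = 1.0 - pearson(cells[i], cells[j])
    return d

def average_linkage(d, k):
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
            dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(d)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def silhouette(d, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [d[i][j] for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)
        b = min(
            sum(d[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

def estimate_k(d, k_max):
    """Try k = 2..k_max and return the k with the highest silhouette."""
    scores = {k: silhouette(d, average_linkage(d, k)) for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores

# Hypothetical toy data: 6 cells x 4 genes, two obvious groups.
cells = [
    [10, 9, 1, 0], [12, 8, 0, 1], [11, 10, 2, 1],
    [1, 0, 9, 12], [0, 2, 10, 11], [2, 1, 8, 10],
]
dist = distance_matrix(cells)
best_k, scores = estimate_k(dist, k_max=4)
labels = average_linkage(dist, best_k)
print(best_k, labels)
```

On real scRNA-seq data the choice of similarity measure and of the model-selection criterion can change the result substantially, so this sketch should be read as one possible combination rather than a recommended pipeline.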

Conclusion: When the number of cell types is not known in advance, clustering and semi-supervised learning methods are important tools for exploratory analysis of scRNA-seq data.

Keywords: Single-cell sequencing technology, single-cell RNA-seq data, similarity measurement, clustering of cell types, cluster method, feature selection.



Article Details

Year: 2019
Published on: 19 November, 2018
Page: [314 - 322]
Pages: 9
DOI: 10.2174/1574893614666181120095038
