Background: CpG islands (CGIs) are clusters of CpG dinucleotides in GC-rich regions that play a critical
role in the regulation of transcription. Although multiple programs have been developed to search for CGIs,
all of them have drawbacks, such as low accuracy or long running times.
Objective: The aim of this study was to develop a new CGI search tool, CpGIScan (CpG Islands
Scan), that improves upon previous programs.
Method: In this work, a CpG island is defined by three parameters: the window length, the guanine plus
cytosine (G + C) frequency, and the ratio of observed to expected CpGs (CpG o/e). The algorithm in
CpGIScan is based on the sliding-window method. To reduce the time required to identify CGIs, the program
uses multithreading. CpGIScan was benchmarked against existing, widely used tools.
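A sliding-window CGI search over these three parameters can be sketched as follows. This is a minimal illustration under stated assumptions, not CpGIScan's actual code: the names `Params`, `scan` and `scan_all` are hypothetical; the default thresholds (200 bp window, G + C frequency of at least 0.5, CpG o/e of at least 0.6) are the commonly used Gardiner-Garden and Frommer criteria, assumed here rather than taken from CpGIScan; CpG o/e is computed as (CpG count × window length) / (C count × G count); and multithreading is illustrated by simply scanning each sequence on its own `std::thread`.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical parameters; defaults are the Gardiner-Garden & Frommer criteria,
// assumed for illustration and not necessarily CpGIScan's defaults.
struct Params {
    std::size_t window = 200;  // window length (bp)
    double min_gc = 0.5;       // minimum G + C frequency
    double min_oe = 0.6;       // minimum CpG observed/expected ratio
};

using Interval = std::pair<std::size_t, std::size_t>;  // [start, end)

// True if a window of length w with these C, G and CpG counts meets all thresholds.
static bool passes(std::size_t c, std::size_t g, std::size_t cg, std::size_t w, const Params& p) {
    double gc = static_cast<double>(c + g) / w;
    if (gc < p.min_gc || c == 0 || g == 0) return false;
    double oe = static_cast<double>(cg) * w / (static_cast<double>(c) * g);
    return oe >= p.min_oe;
}

// Slides a fixed-length window one base at a time over an uppercase sequence,
// updating the C, G and CpG counts incrementally, and merges overlapping
// passing windows into islands.
std::vector<Interval> scan(const std::string& s, const Params& p) {
    std::vector<Interval> islands;
    const std::size_t w = p.window;
    if (w < 2 || s.size() < w) return islands;
    std::size_t c = 0, g = 0, cg = 0;
    for (std::size_t i = 0; i < w; ++i) {  // counts for the first window
        c += s[i] == 'C';
        g += s[i] == 'G';
        if (i + 1 < w && s[i] == 'C' && s[i + 1] == 'G') ++cg;
    }
    for (std::size_t start = 0;; ++start) {
        if (passes(c, g, cg, w, p)) {
            std::size_t end = start + w;
            if (!islands.empty() && start <= islands.back().second)
                islands.back().second = end;  // extend the current island
            else
                islands.emplace_back(start, end);
        }
        if (start + w >= s.size()) break;
        // Drop the base at `start` (and the CpG it begins), add the base at `start + w`
        // (and the CpG it completes).
        char out = s[start], in = s[start + w];
        c -= out == 'C';
        g -= out == 'G';
        if (out == 'C' && s[start + 1] == 'G') --cg;
        c += in == 'C';
        g += in == 'G';
        if (s[start + w - 1] == 'C' && in == 'G') ++cg;
    }
    return islands;
}

// Scans each sequence (e.g. one chromosome) on its own thread;
// results[k] holds the islands found in seqs[k].
std::vector<std::vector<Interval>> scan_all(const std::vector<std::string>& seqs, const Params& p) {
    std::vector<std::vector<Interval>> results(seqs.size());
    std::vector<std::thread> workers;
    for (std::size_t k = 0; k < seqs.size(); ++k)
        workers.emplace_back([&, k] { results[k] = scan(seqs[k], p); });
    for (auto& t : workers) t.join();
    return results;
}
```

For example, a 700 bp sequence made of 300 A's, 100 "CG" repeats and 300 A's yields a single merged island spanning positions 200 to 600, because every 200 bp window overlapping the CG block by at least 100 bp passes both thresholds. Updating the counts incrementally keeps each window step O(1), so one pass over a sequence is linear in its length. Real tools typically refine island boundaries further after merging, which this sketch omits.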
Results: Evaluations on a set of test sequences show that CpGIScan has high sensitivity and specificity. In
addition, CpGIScan is at least 4 times faster than existing tools, and it has a large performance advantage over
previous tools when searching for CpG islands across whole genomes. CpGIScan is written in C++ and released
under the GNU General Public License (GPL). It is freely available at https://github.com/jianzuoyi/CpGIScan.
Conclusion: CpGIScan was specifically developed for the ultrafast identification of CGIs in large sequence sets.
It combines the strengths of previous tools while significantly improving computational efficiency. CpGIScan will
be of value to researchers generating an initial genome-wide map of CpG islands.