Essential genes often play key roles in biological processes, and mutations in these genes can have a great
impact on an organism’s survival and reproduction. Studying lethal phenotypes provides important information about
the function of the gene product and can guide gene therapy. Traditionally, essential genes have been identified through
single-gene knockout experiments, transposon mutagenesis, or antisense RNA inhibition. However, experimental
methods are expensive, labor-intensive, and time-consuming. In addition, such experiments are not always possible as the
vast majority of microorganisms are unculturable. Computational methods for genome-scale essential gene prediction,
aided by the explosion of genome-scale data provided by high-throughput technologies in recent years, provide an
alternative way to study essential genes. Constraint-based modeling and machine learning have been applied in
this area with promising results. Features such as protein sequence, network topology, and gene expression data
have been used to predict essential genes. In this article, we will review recent bioinformatics
progress in the prediction of gene essentiality, including databases, computational methods, the most commonly used
features, machine learning classifier comparisons, and feature selection. Finally, we will discuss the challenges and future
directions of the field.
Computational modeling, essential genes, feature selection, flux balance analysis, machine learning, microbial, prediction.
Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 7024, Cincinnati, OH 45229, USA.