“Prediction”, the main task in machine learning research, has recently become popular in biomedicine and
bioinformatics, particularly since President Obama proposed the Precision Medicine Initiative. Precision medicine combines
biomedicine and bioinformatics and employs advanced machine learning techniques, especially big data analysis approaches.
Traditionally, medical big data have been analyzed with various statistical tests, which struggle to handle complex multimodal
data and the relationships among features.
In the past, machine learning was regarded as a “black box”. Nevertheless, it often works better than simple statistical
tests. In recent years, deep learning and big data machine learning techniques have developed quickly and shown impressive
performance in internet applications, image understanding, and automatic speech recognition. We expect precision
medicine to be the next major application of big data machine learning. However, we found that traditional classifiers,
such as SVM and random forest, still dominate recent work in biomedicine and bioinformatics. We therefore organized
this special issue to bridge the latest methods and their biological applications.
This thematic issue presents eight outstanding reviews from different countries and regions, including P.R.
China, the USA, and Germany.
Jing et al. conducted a systematic review of methods for predicting protein inter-residue contacts. As a low-dimensional
representation of protein tertiary structure, inter-residue contacts can help de novo protein structure prediction
methods reduce the conformational search space. Various methods for predicting protein inter-residue contacts have been
developed over the past two decades, and they can be roughly classified into five categories: correlated
mutation methods, machine-learning methods, fusion methods, template-based methods, and 3D model-based methods. In this
article, they reviewed three main methods for predicting protein inter-residue contacts and analyzed the constraints of each
category. They also compared several representative methods and discussed their performance in detail [1].
Zhang et al. performed a comprehensive review of recent developments in feature extraction methods for protein
sequences, including composition-based features, autocorrelation-based features, and profile-based features. The concepts and
main equations behind these features were introduced and discussed. Finally, some existing computational tools for generating
these features were introduced. This article is especially useful for researchers interested in studying the structures and
functions of proteins computationally [2].
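To make the notion of a composition-based feature concrete, the following minimal sketch computes the amino acid composition of a sequence, i.e. the fraction of each of the 20 standard residues. The function name, the alphabetical residue ordering, and the example sequence are illustrative assumptions and are not taken from the reviewed article.

```python
from collections import Counter

# The 20 standard amino acids in alphabetical one-letter order
# (the ordering is an assumption made for this illustration).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue."""
    seq = sequence.upper()
    # Ignore any character that is not a standard residue (e.g. X, gaps).
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 10-residue example sequence.
features = amino_acid_composition("MKVLAAGIVK")
```

The resulting fixed-length vector can be passed to any standard classifier regardless of the length of the original sequence, which is the main appeal of composition-based features.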
Liu et al. analyzed the role of bioinformatics in the modernization of Traditional Chinese Medicine. Traditional Chinese
Medicine (TCM) comprises a wide range of medical practices that have been developed in China and other East Asian countries
for over 2000 years. The aim of modernizing TCM is to use modern biological and medicinal knowledge to investigate the
treatment mechanisms of herbal medicine. Combined with the rich practical experience of TCM, this can efficiently accelerate
the development of modern drugs. In this process, bioinformatics, as a rising discipline, can help to explore and analyze TCM.
Bioinformatics already plays an important role in understanding biological data. In this article, they introduce and
discuss the applications of bioinformatics techniques in TCM for interested readers and look ahead to the potential development
of bioinformatics in TCM [3].
Lei et al. conducted a survey on the prediction of essential proteins and genes, discussing not only the current
development of diverse computational methods but also the challenges for future work. Predicting essential proteins with
computational methods is of considerable significance in bioinformatics, for example in disease research
and drug design. The authors introduced recent studies on predicting essential proteins and genes by analyzing different
kinds of computational methods, including topology-based methods, integrated methods, machine learning methods, and reliable
network methods. In addition, they presented techniques for evaluating predictive performance and some available databases
of essential genes, as well as future research directions. This review presents systematic knowledge of essential
protein and gene prediction, which can support studies in related fields [4].
Jia and Gong summarized the computational methods for linear B-cell epitope prediction. The determination of B-cell
epitopes is important for the synthesis of peptide vaccines, the preparation of diagnostic reagents, and the screening of
monoclonal antibodies. To identify such epitopes, many computational predictors have been proposed in the past 10 years.
In this review, they summarized the representative computational approaches developed for the identification of linear B-cell
epitopes. They mainly discussed the datasets, feature extraction methods, and classification methods used in previous work.
Moreover, they considered existing challenges and future perspectives for developing reliable methods for predicting linear
B-cell epitopes [5].
Yang et al. summarized the application of machine learning methods to protein sub-Golgi localization. The distribution of
proteins in the cell correlates with their functions. Because the Golgi apparatus plays important roles in protein storage,
packaging, and distribution, identifying protein sub-Golgi localization is very important for an in-depth understanding of
protein function. Because traditional biochemical experiments are time-consuming and require expensive materials, machine
learning methods are used to identify protein locations in the Golgi apparatus. In the review, Yang et al. briefly introduced
recent progress in protein sub-Golgi localization prediction, covering database construction, feature extraction and
optimization, prediction algorithms, and web server establishment. The advantages and disadvantages of these methods were
discussed. They also outlined the prospects for protein sub-Golgi localization prediction using machine learning methods [6].
Zhang et al. analyzed the cross-kingdom regulation of exogenous plant miRNAs in humans. As a new class of signaling molecule,
exogenous miRNAs regulate and influence the physiological functions of the recipient, revealing effects on human health and
metabolism at the molecular genetic level. In this process, the combination of bioinformatics and computational methods can
help find and analyze the cross-kingdom regulation of exogenous plant miRNAs in the human body. In this article, they
introduce and discuss recent advances concerning exogenous plant miRNAs in humans and their cross-kingdom regulation, and look
ahead to the potential development of research on the cross-kingdom regulation of exogenous plant miRNAs [7].
Qu et al. analyzed methods for predicting DNA-binding proteins. DNA-binding proteins are an important component of
prokaryotic and eukaryotic proteomes and participate in many life activities. Because of their importance,
there has been much research on predicting DNA-binding proteins, and many methods have been proposed. In earlier years,
biochemical methods were often used to identify these proteins, but with the continuous development of machine
learning, many machine-learning methods are now used to predict DNA-binding proteins. In this article, they analyzed the feature
representation methods and common classifiers, summarized the evaluation methods and existing websites, and compared the
results of different methods [8].
Each paper in this special issue was extensively peer-reviewed by more than two external reviewers. I would like to thank
all the authors for contributing their work to this hot-topic thematic issue and all the reviewers for their time and effort.