ISSN (Print): 1574-8936
ISSN (Online): 2212-392X
Volume 16, 10 Issues, 2021
Science Citation Index Expanded™ (SciSearch®), Journal Citation Reports/Science Edition, InCites, Biological Abstracts, BIOSIS, BIOSIS Previews, Scopus, EMBASE, Chemical Abstracts Service/SciFinder, Cambridge Scientific Abstracts (CSA)/ProQuest, Scilit, ChemWeb, Google Scholar, EMBiology, MediaFinder®-Standard Periodical Directory, Genamics JournalSeek, PubsHub, Index Copernicus, J-Gate, CNKI Scholar, Suweco CZ, TOC Premier, EBSCO, Ulrich's Periodicals Directory, JournalTOCs, QOAM and Dimensions.
Ranking and Category:
5 - Year: 1.1.98
In-Depth Exploration of miRNA: A New Approach to Study miRNA at the miRNA/isomiR Levels, 2014 : 9 5; 522- 530
Li Guo , Hui Zhang, Yang Zhao, Sheng Yang and Feng Chen
Bioinformatics in Educational Research
Guest Editor(s): Jun Shen, Wenli Chen, Yan Dong, Xuesong Zhai
I am very much satisfied with the review process and the communication made by the team for my manuscript. I have struggled a lot to prepare the pictures as per the requirements. But your team has patiently given feedback about the corrections and finally agreed to approve. Really, I am very much thankful for that.
Dr. R. R. Rajalaxmia (Department of CSE, Kongu Engineering College, Perundurai, Erode, TamilNadu, India)
11 Abstract Ahead of Print are available electronically
40 Articles Ahead of Print are available electronically
With the development and application of high-throughput technologies in biomedicine and biopharmacy, a huge amount of information
has been generated in these fields. Many public and commercial databases, such as GEO, TCGA, KEGG and DrugBank, have been set up to
store this information and provide services. Making full use of this information to investigate specific biomedical or
biopharmaceutical problems remains a great challenge. This thematic issue collects seven computational studies, which adopted computational methods to research different
There are two review articles in this thematic issue. Chen et al. focused mainly on the most popular statistical machine learning methods
for cancer diagnosis and gene selection. From the perspective of support vector machines and sparse regression, a dozen machine learning
methods for binary cancer diagnosis are summarized. Both binary classifier ensemble methods and global optimal model
methods are reviewed for multiple cancer diagnosis. Wang et al. reviewed the basic stochastic neighbor embedding (SNE) algorithm and
its variants, including t-distributed stochastic neighbor embedding (t-SNE). They described its application in visualizing molecular biological
data, including single-cell sequencing data, single nucleotide polymorphisms, and mass spectrometry imaging data. They also demonstrated
the superiority of t-SNE over principal component analysis (PCA), isometric feature mapping (Isomap) and locally linear embedding
(LLE), as well as the influence of the parameters on the quality of visualization.
In addition, five research articles were collected in the thematic issue. Four studies used computational methods to investigate various
disorders. The incidence and mortality of colorectal cancer are rising, and colorectal adenomas account for most of the precancerous
diseases of colorectal cancer. In the paper authored by Gao et al., feature selection was performed on the collected data, and a
colorectal adenoma prediction model based on a gradient boosting tree was built from the essential features extracted by the feature selection
procedure, reaching an accuracy of 0.821. This model can be used to screen patients at high risk of colorectal adenoma.
Shen et al. investigated the mechanisms by which long non-coding RNAs (lncRNAs) contribute to atrial fibrillation (AF) pathogenesis. From two
public datasets, they identified lncRNAs and mRNAs that are differentially expressed during the progression of AF. Several bioinformatics analyses were
adopted to explore the functions of the extracted lncRNAs in AF. Akter et al. identified the aggregation-prone regions (APRs) of antibody
sequences raised against different immunogens of Vibrio cholerae. Several bioinformatics algorithms were used to analyze probable APRs in 94
protein sequences, and a number of regions in the monoclonal antibodies were identified as APRs. He et al. explored the early classification
of ovarian cancer using a mass spectrometry (MS) data set. They proposed an MS data analysis method based on a star-like graph of proteins
and a support vector machine to classify the MS data. Combined with SELDI-TOF-MS technology, the model shows promise for early clinical
detection and diagnosis of cancer. The last study concerned biopharmacy. Peng et al. proposed a novel computational method for the identification
of carcinogenic chemicals. To extract informative features of chemicals, several chemical networks were constructed and the
network embedding algorithm Mashup was applied to these networks. A recurrent neural network (RNN), a deep learning algorithm,
was adopted to build the prediction model, which provided near-perfect performance.
The studies in this thematic issue deepen our understanding of some biomedical and biopharmaceutical problems. However, the design of efficient
computational methods is still a work in progress. It is hoped that this thematic issue will attract more investigators to study
related problems. Finally, I would like to express my thanks to all authors for contributing their work to this thematic issue and to all anonymous
reviewers for devoting their valuable time to evaluating these papers.
With the development of high-throughput sequencing technology, large biological data sets are being generated. Effectively
mining the hidden meaning behind big data is essential to further understanding the relationship between molecular
biology and disease. In recent years, artificial intelligence technology has been widely studied and applied in bioinformatics
and biomedicine, which has promoted the rapid development of these fields. This thematic series aims to bring together the latest
advances in artificial intelligence methods in bioinformatics and biomedicine.
Ali Ghulam et al. described the necessary information related to pathway mechanisms, pathway characteristics and
the feature annotations of pathway databases. Various difficulties related to data storage and data retrieval in biological pathway databases
were discussed. The discussion focuses on different techniques for retrieving annotations, features, and methods of digital pathway
databases for biological pathway analysis.
The main factors that drive proteins to form 3D structures are constraints between residues. Highly accurate prediction of
inter-residue contacts and distance information is of great significance for protein tertiary structure computations. He Huang et
al. summarized some recent algorithms and showed that they have obtained good results. Compared with contact map prediction,
distance map prediction is in its infancy. Much remains to be done, including improving the precision of distance map prediction
and incorporating distance maps into residue-residue distance-guided ab initio protein folding.
Liu Zhiping provided a review of predicting lncRNA-protein interactions by machine learning. Firstly, a computational
framework for predicting lncRNA-protein interactions was presented. Then, the author listed the currently available data resources
for the predictions. Furthermore, the crucial steps of the existing methods were reviewed within the prediction framework.
Finally, some important directions in bioinformatics for identifying essential lncRNA-protein interactions and deciphering the
consequences of lncRNA dysfunction were discussed.
Differences in the cell composition of a sample can alter the expression profile and introduce considerable bias into downstream
research. Matrix Factorization (MF) has contributed massively to deconvoluting genetic profiles at the expression level. Yuan
Liu et al. reviewed the use of MF methods for heterogeneity problems in genetic data at the expression level. Specifically,
the manuscript is organized into three sections: application scenarios, method categories and a summary of tools.
Based on this investigation, the study presents a relatively global picture that helps researchers get started more quickly with
deconvoluting genetic data in silico and select suitable MF methods for different scenarios.
Although spike-timing-based neuronal codes have significant computational advantages over rate-encoding schemes, the
exact spike-timing-based learning mechanism in the brain remains an open question. Recently, many learning algorithms have
been proposed that consider both synaptic weight plasticity and synaptic delay plasticity. Yawen Lan et al. gave an overview
of the existing synaptic delay-based learning algorithms in spiking neural networks, described the typical learning algorithms
and reported the experimental results. Furthermore, they discussed the properties and limitations of each algorithm and
made a comparison between them.
The segmentation of multiple abdominal organs of the human body from images of different modalities is challenging
because of the inter-subject variance among abdomens, as well as the complex intra-subject variance among organs. Qiang Li et
al. reviewed recent methods for Abdominal Multi-Organ Segmentation (AMOS) on medical images. The AMOS methods
can be categorized into traditional and deep learning-based methods. First, various approaches and related problems under
both segmentation categories are explained. Second, the advantages and disadvantages of these methods are discussed. Finally,
the review notes that AMOS remains an unresolved problem and that combining different methods can improve segmentation performance.
The immunogenicity of T cell epitopes depends on their source and on their stability in combination with Major Histocompatibility
Complex (MHC) molecules. Identification of epitopes is of great significance in research on vaccine design and T cell immune
response. Yang Liu et al. introduced some recent methods for identifying peptide-MHC binding, which have emerged to pre-select
candidate peptides for experimental testing. Some of them employ matrix models or machine learning models based on the sequence
characteristics of peptides or MHC to predict the binding ability of peptides to MHC. Others utilize the
three-dimensional structural information of peptides or MHC.
High-density Genetic Linkage Map Construction in Sunflower (Helianthus annuus L.) Using SNP and SSR Markers. Pin Lyu
et al. constructed a high-density genetic linkage map from an F7 population of sunflower using SNP and
SSR markers. The SLAF-seq strategy was employed to develop SNP markers, which were combined with SSR markers to construct the high-density
genetic map with the HighMap software. The final result demonstrated that the SLAF-seq strategy is suitable for SNP
With the rapid growth of biological information, biological science and technology have greatly enriched the data resources of biology
and medicine. The latest advances in deep learning have achieved state-of-the-art performance on high-dimensional,
unstructured and less interpretable biological data. Yongqing Zhang et al. gave an overview of deep learning techniques
and some of the state-of-the-art applications in biology and medicine. Specifically, the fundamentals of deep learning methods
were introduced, along with their successes in bioinformatics, biomedical imaging, biomedicine and drug discovery. Furthermore, the
challenges, limitations and possible improvements in this area were also discussed.
With the accumulation of biological and medical data, using traditional biochemical methods to identify and analyze these
samples becomes difficult because of their expensive experimental materials and long experimental periods. To overcome these
disadvantages, computational methods are a good choice. Thus, it is increasingly popular to develop artificial
intelligence methods to discriminate molecular types, predict molecular function, identify drug targets and predict disease
types. This thematic issue has collected nine computational works that used computational techniques in biological and medical
Protein function prediction is a hot topic in bioinformatics. Owing to the increase in sequencing data, more and more proteins
need functional annotation. Four works in this thematic issue developed machine learning-based methods to predict protein
functions. Liang and Zhang focused on the prediction of apoptosis protein subcellular localization because apoptotic
proteins are involved in many biological processes. They designed a computational model to predict apoptosis protein
subcellular localization. In their model, evolutionary information and nonnegative matrix factorization were used to represent
protein samples, and a Support Vector Machine (SVM) was used as the classification algorithm. They demonstrated the performance of their
model on three published datasets. Their model will provide a guide for apoptosis protein analysis. Yang et al. designed an
artificial intelligence model to identify cancerlectins. Cancerlectins are a group of lectin proteins that have an important
function in cancer initiation, growth and spread. Correctly identifying cancerlectins could provide important clues for cancer
therapy. In this work, they used sequence information to describe cancerlectins and used Analysis of Variance (ANOVA) to
reduce the feature dimension. Good prediction accuracy was obtained in a cross-validation test. The third work described a
computational strategy to improve the prediction of gram-negative bacterial secreted proteins. The model reported in this paper used
the position-specific scoring matrix as features, and an SVM was used to perform classification. Their model achieved very high
accuracy. Using a computational strategy, Ning et al. found that some peptides in both PbHRH and Romiplostim have the potential
to bind to MHC-I and MHC-II molecules and may act as potential epitopes.
RNA subcellular localization has attracted increasing attention from researchers. Two works in the thematic issue
focused on this topic. Yang et al. used unbalanced pseudo k nucleotide compositions to predict lncRNA subcellular
localization. ANOVA was applied to exclude noise and redundant information and thereby improve the accuracy of the proposed model.
Their final model produced an overall accuracy of 90.37%, which is higher than that of other published methods. Wu et
al. also developed a sequence-based method to classify ncRNA subcellular localization. In their model, a K-nearest
neighbor classifier was used, together with RNA secondary structure information. Very encouraging
results were obtained.
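As a rough illustration of the kind of sequence-derived feature vector such predictors start from, the sketch below computes a plain k-nucleotide (k-mer) composition. It is a simpler stand-in and not the unbalanced pseudo k nucleotide composition used by Yang et al.; the function name, the default k and the RNA alphabet are assumptions made for the example.

```python
from itertools import product

def kmer_composition(seq, k=2, alphabet="ACGU"):
    """Normalized frequency of every k-mer over `alphabet` (illustrative helper)."""
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    # Fixed ordering of the features, so vectors from different sequences align.
    return [counts[km] / total if total else 0.0 for km in kmers]
```

A vector of this kind (16 values for k = 2) could then be passed to a feature selection step and a classifier; those later stages are not shown here.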
Enhancers are important regulatory elements for gene transcription and expression. Zhang et al. developed an
information gain-based method to identify enhancers. Several kinds of features, including sequence features, transcriptional
features and epigenetic features, were utilized to represent enhancer samples. The change in information entropy was used for enhancer
recognition. They demonstrated that their model could effectively identify enhancers.
Gene regulation can influence cancer. Wang et al. focused on the analysis of transcription factor expression in cancer
cells. They found that up-regulated genes in cancer are related to the immune system of normal tissue. Their study provided a
new clue for understanding gene expression in cancer cells.
DNA methylation is an important epigenetic marker and plays an important role in various biological processes. Zuo et al.
computationally analyzed DNA methylation in cell reprogramming. Their study provides new insight into
the molecular mechanism of cell reprogramming.
In summary, the arrival of the era of big biological data and artificial intelligence techniques gives us the
opportunity to further clarify biological mechanisms. Bioinformatics is also an indispensable subject in the field of biological
With the exploration of big omics data in plants, we are faced with the problem of how to interpret and exploit the rapidly
accumulating sets of publicly available omics data. The development and application of computational algorithms, databases
and tools are desirable for the efficient processing, management and visualization of large-scale omics data. However,
analyzing such big data, deriving biological knowledge, and applying it back to predictions and further experimentation
have become challenging tasks. Bioinformatics analyses are important approaches to comprehensively understanding how a plant
system works by exploiting computational methods to integrate multiple levels of omics datasets. The main objective of this
special issue is to provide a forum for researchers to present the latest advances and state-of-the-art techniques, tools and
applications in analyzing plant omics data.
In this issue, a broad scope of plant bioinformatics is covered, from genetics, comparative genomics, transcriptomics, gene
regulatory networks, signaling pathways, proteomics, non-coding RNAs to high-throughput sequencing analyses and
Single-Molecule Real-Time (SMRT) Isoform Sequencing (Iso-Seq) has paved the way to obtaining longer full-length
transcripts and, compared with RNA-Seq, is significant for identifying full-length splice variants and other post-transcriptional events.
Yubang Gao et al. summarized the existing Iso-Seq analysis tools and presented an integrated bioinformatics
pipeline for Iso-Seq analysis, which overcomes the limitations of NGS and generates long contiguous FLNC reads for the analysis
of post-transcriptional events. Visualization approaches for Iso-Seq and the combination of Iso-Seq data with RNA-Seq for
transcriptome quantification were also discussed.
Pan Wei et al. evaluated the genetic variation in B. germanica based on two mitochondrial genes. Phylogenetic analysis
indicated that the B. germanica isolates from central China should be classified as a single population. Demographic analysis
rejected the hypothesis of a sudden expansion of the B. germanica population. These findings indicate that the 36
isolates of B. germanica sampled in the study have high genetic variation, which provides useful knowledge for monitoring
changes in parasite populations for future control strategies.
Genome-wide scans are useful for studying positively selected genes (PSGs) to understand the dynamics of genome
evolution and the genetic basis of differences between species. Yue Guo et al. investigated the differences in PSGs between two
species of cotton. They suggested that the PSGs evolved at a higher rate of synonymous substitutions (Ks), but were subjected
to lower selection pressure (Ka/Ks). These findings indicate that PSGs in G. arboreum and G. raimondii differ not only in
Ka/Ks but also in their evolutionary, structural, and expression properties, and that the divergence of G. arboreum and
G. raimondii was associated with differences in PSGs in terms of evolutionary rates, gene compactness, expression patterns,
and WGD retention in Gossypium.
Aravind Kumar K et al. identified the WRKY TFs that are differentially expressed in chickpea in response to herbicide stress and
deciphered their interacting partners. Comparison of the differentially expressed TFs, construction of co-functional gene
networks, and functional enrichment analysis were conducted. These systems biology approaches revealed a multi-stress responsive
WRKY transcription factor and stress-associated gene co-expression networks in chickpea.
In plants, interactions among various phytohormones play important roles in the response to various stresses. Maryam Mortezaeefar
et al. investigated the effects of crosstalk among the hormone signalling pathways in plants. The Weighted Gene Co-expression
Network Analysis (WGCNA) method was used to define modules containing genes with highly correlated
expression patterns in response to abscisic acid (ABA), jasmonic acid (JA), and salicylic acid (SA) in Arabidopsis. The results
indicate that plants undergo a major change in expression profile and control diverse cell functions, including the response to
environmental stresses and external factors, the cell cycle, and antioxidant activity.
Ubiquitination plays a significant role in the regulation of many biological processes, such as cell division, signal
transduction, apoptosis and immune response. Identification of ubiquitination sites is the crucial first step in investigating the
molecular mechanisms of ubiquitination-related biological processes. Jiajing Chen et al. introduced a ubiquitination site
prediction method based on a support vector machine (SVM) for Arabidopsis thaliana. The combination of the AAC and CKSAAP
encoding schemes yielded the best performance, with an accuracy of 81.35% and an AUC of 0.868 in the independent test.
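The two encodings named above can be sketched as follows, under the usual definitions: AAC is the fraction of each of the 20 standard amino acids in a sequence window, and CKSAAP is the frequency of each of the 400 residue pairs separated by k positions, for k = 0 up to some maximum. The function names, the default `k_max` and the handling of non-standard residues are assumptions for illustration, not settings taken from Chen et al.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def aac(window):
    """Amino acid composition: fraction of each standard residue in the window."""
    residues = [r for r in window if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]

def cksaap(window, k_max=2):
    """Composition of k-spaced amino acid pairs for k = 0..k_max (400 values per k)."""
    pairs = [a + b for a in AMINO_ACIDS for b in AMINO_ACIDS]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]  # residues with k positions between them
            if pair in counts:  # ignore pairs containing non-standard symbols
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features

def encode(window, k_max=2):
    """Concatenate AAC and CKSAAP into a single feature vector."""
    return aac(window) + cksaap(window, k_max)
```

With `k_max=2` the concatenated vector has 20 + 3 × 400 = 1220 entries per window, which would then be fed to the classifier.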
Long noncoding RNAs (lncRNAs), arbitrarily defined as transcripts longer than 200 nucleotides, play critical roles in diverse biological
processes. Youhuang Bai et al. introduced PlncRNADB, a database of lncRNA sequences and annotation in plants. PlncRNADB provides a
pipeline to quickly distinguish potential noncoding RNAs from protein-coding transcripts. The database also provides the
relationships between lncRNAs and various RNA-binding proteins (RBPs), which can be displayed through a user-friendly web interface.
PlncRNADB can serve as a reference database for investigating lncRNAs and their interactions with RNA-binding
proteins in plants.
In plant phenomics, one of the key technologies is phenotype image analysis. Bizhi Wu et al. proposed a deep convolutional
autoencoder architecture to segment biological phenotypic images and developed a phenotype retrieval system to enable a
better understanding of genotype–phenotype correlation. Their work focused on identifying phenotypic images similar to a query
by searching the biological phenotype database, including information about loss-of-function and gain-of-function.
We hope that the readership of Current Bioinformatics will find this collection of papers interesting and useful.
The growing role of computer modeling, bioinformatics and operations research in medicine and biology has been
remarkable over the last decade or so, owing to the rapid increase in computer processing speed and a greater demand for
realistic models capable of predicting actual biological or medical phenomena and optimizing discovery through big data
processing. Clearly, naturally occurring biological or physiological processes are complicated and often impossible to
simulate or automate adequately ex vivo using idealized 1D or 2D models, let alone to optimize. The trend
is now moving towards using sophisticated patient-specific real-time 3D models; some may even incorporate complex
biomaterials or design structures in an effort to produce accurate surrogates to represent biological or medical processes in the
laboratory. This drives a greater interest in the optimization of biomaterials, design structures, or biometric techniques with the
ultimate aim of repairing or replacing malfunctioning anatomical structures or translating laboratory research into real-life
applications. For most such problems, multi-objective optimization and data mining can provide the optimal solution
using an integrated informatics platform. Moreover, machine learning techniques such as artificial neural networks, deep learning,
evolutionary algorithms and genetic algorithms are other well-established computing techniques that we can explore for
generating solutions.
In the medical industry, using these algorithms can help biomedical engineers find the potential factors that affect
process design more accurately and improve bio-device performance. It can also identify trends that bridge the gaps among
fragments of seemingly unrelated information. In addition, the process of information management promotes the
development of bioinformatics, including membrane computing, gene expression, genetic computing, etc., for the development
of new products in the medical industry. These new technologies can offer much higher quality and personalized services to
The collection of papers in this issue is dedicated to reporting cutting-edge theoretical, experimental, biological, or clinical
investigations using advanced biomaterials, design structures, bioinformatics platforms, or biometric techniques. We aim at
elucidating the complex designs of life through laboratory investigations or computational modeling. We invite submissions
from all fields in biomedical engineering, applied biology or medical sciences with a focus on the following: 1) Optimization of
flow or structural motions that occur in nature for biometrics and/or bioinformatics studies. 2) Development or optimization of
biomaterials that mimic physiological structures, which ultimately aim at accurate understanding of disease processes and in
vivo dynamic interactions using a big data processing platform. 3) Novel design or experimental techniques based on multi-objective
optimization which allow better understanding of the physiological processes in humans and animals. 4) Understanding
physiological processes and incorporating optimization algorithms into the design of devices for superior performance. 5)
Evolutionary algorithm or artificial neural network models that help predict disease initiation, progression or end-point to guide
and optimize management through informatics. 6) Translation of advanced computational bioinformatics methods into real life
The first paper, “Bioinformatics study on serum triglyceride levels for analysis of a potential risk factor affecting blood
pressure variability” by Lin Xu et al., examines whether triglycerides (TGs) are related to blood pressure (BP)
variability and whether controlling TG levels leads to better management of BP variability and prevents cardiovascular disease
(CVD). In this study, the authors enrolled 106 hypertensive patients and 80 non-hypertensive patients. Pearson correlation and
partial correlation analyses were used to define the relationships between TG levels and BP variability in all the subjects.
Patients with hypertension were divided into two subgroups according to TG level: Group A (TG < 1.7 mmol/L) and Group B
(TG ≥ 1.7 mmol/L). The heterogeneity between the two subgroups was compared using t tests and covariance analysis.
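A minimal sketch of the two analysis steps just described, the Pearson correlation and the split at 1.7 mmol/L, might look like the following. The record layout, the field names `tg` and `bp_sd` (BP variability expressed as a standard deviation) and all numeric values are made up for illustration and are not data from the study; partial correlation, t tests and covariance analysis are not shown.

```python
from math import sqrt

TG_CUTOFF = 1.7  # mmol/L, the subgroup threshold quoted in the text

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def split_by_tg(records, cutoff=TG_CUTOFF):
    """Group A: TG below the cutoff; Group B: TG at or above it."""
    group_a = [r for r in records if r["tg"] < cutoff]
    group_b = [r for r in records if r["tg"] >= cutoff]
    return group_a, group_b

# Hypothetical records, for illustration only (not study data).
records = [
    {"tg": 1.0, "bp_sd": 9.0},
    {"tg": 1.4, "bp_sd": 10.0},
    {"tg": 1.7, "bp_sd": 11.5},
    {"tg": 2.3, "bp_sd": 13.0},
]
group_a, group_b = split_by_tg(records)
r = pearson_r([rec["tg"] for rec in records], [rec["bp_sd"] for rec in records])
```

Note that a patient with TG exactly equal to 1.7 mmol/L falls into Group B, matching the "≥" in the subgroup definition.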
The second paper, by Gang Xu et al., is entitled “Bioinformatics study of RNA Interference on the Effect of HIF-1α
on Apelin Expression in Nasopharyngeal Carcinoma Cells”. This paper investigates apelin expression in nasopharyngeal
carcinoma CNE-2 cells and its regulation by hypoxia-inducible factor-1α (HIF-1α) under hypoxic conditions. CoCl2 was used
to induce hypoxia in CNE-2 cells for 12 h, 24 h and 48 h. HIF-1α small interfering RNA (siRNA) was transfected into CNE-2
cells using a transient transfection method. HIF-1α and apelin mRNA levels were detected by real-time PCR. Western blot was
used to measure HIF-1α protein expression. The concentration of apelin in the cell culture supernatant was determined by
enzyme-linked immunosorbent assay (ELISA). HIF-1α and apelin mRNA levels and protein expression in CNE-2 cells
increased gradually with the duration of hypoxic exposure and were significantly reduced in HIF-1α siRNA-transfected
cells exposed to the same hypoxic conditions. Apelin expression was induced by hypoxia and regulated by HIF-1α in CNE-2
The third paper, by Xi-Wen Jiang et al., is entitled “MGB Blocker ARMS real-time PCR for diagnosis of
CYP2C19 mutation in Chinese population”. CYP2C19 is an important genetic factor modulating the clopidogrel dose
requirement. Therefore, a simple and economical genotyping method for predicting the clopidogrel dose of patients would be useful
in clinical applications. In this study, an MGB blocker ARMS real-time PCR assay containing two forward primers, two MGB
blockers and a common reverse primer was used to detect the CYP2C19*2, *3 and *17 substitutions. Results showed that heterozygotes
and homozygotes of CYP2C19*2, *3 and *17 could be distinguished successfully by the MGB blocker ARMS real-time PCR.
In the Chinese population studied, the allele frequencies of CYP2C19*2, *3, and *17 were 18.43%, 3.03% and 0.76%,
respectively. This study indicates that the MGB blocker ARMS real-time PCR will be a simple, economical method for the rapid
detection of SNPs in CYP2C19.
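For readers unfamiliar with how allele frequencies such as those above follow from genotype calls, the standard calculation for a diploid sample can be sketched as below: each heterozygote carries one copy of the variant allele and each homozygote carries two, out of 2N alleles in total. The function name and the counts in the usage note are hypothetical; the source does not report the underlying genotype counts.

```python
def allele_frequency(n_het, n_hom, n_subjects):
    """Variant allele frequency in a diploid sample, computed from genotype counts."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    # One variant copy per heterozygote, two per homozygote, 2 alleles per subject.
    return (n_het + 2 * n_hom) / (2 * n_subjects)
```

For example, 30 heterozygotes and 5 homozygotes among 100 subjects would give (30 + 10) / 200 = 0.2, i.e. 20%.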
In the last paper, “Bioinformatics analysis of quantitative PCR and reverse transcription PCR in detecting HCV RNA”, Wei
Lie et al. aimed to compare the sensitivity and specificity of quantitative real-time polymerase chain reaction
(Q-PCR) and reverse transcription PCR (RT-PCR) in detecting the ribonucleic acid (RNA) expression levels of hepatitis C
virus (HCV). Patients suffering from hepatitis C and 98 healthy participants with normal liver function were identified.
Venous blood samples were collected and analyzed for HCV RNA expression levels by Q-PCR and RT-PCR.
After this, the data obtained from the two detection methods were compared, including the sensitivity and
In summary, these 5 interesting papers provide readers with valuable information on the recent progress in bioinformatics.
Hopefully this collection of papers in this issue will provide a platform for researchers, academics and healthcare professionals
with the aim of promoting scientific study of bioinformatics. Also, it can be used as a valuable source of references for
researchers to produce more promising studies in this field.
In recent years, computational tools and models have been effectively employed in conjunction with medical imaging techniques
for analyzing various health conditions. Entropy is a mathematical tool for quantifying the disorder or
randomness associated with a system. Entropy can also be interpreted as the degree of uncertainty and hence is a measure
of the information content of a signal [1, 2]. The concept of entropy is derived from the second law of thermodynamics and its
applications have branched into several areas of science and technology such as signal and image processing and mathematical
Recently, various entropy measures such as Tsallis entropy, Renyi entropy, etc. have been successfully utilized for the
development of medical diagnostic systems and decision support systems [6, 7]. In the field of image analysis, entropy can
provide a good level of information to describe a given image. Entropy measures are highly useful for segmentation, feature
extraction and texture characterization of images. They have also proved to be highly useful for the segmentation of
medical images and the extraction of valuable features from biomedical signals and images, supporting the development of systems for
classifying normal and abnormal cases and for analyzing the severity of diseases [10, 11].
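To make the notion of entropy as an image descriptor concrete, the sketch below computes the Shannon entropy H = -Σ p_i log2 p_i of the gray-level distribution of an image, where p_i is the fraction of pixels with gray level i. Representing the image as a flat list of integer intensities and the function name are assumptions for the example; Tsallis and Renyi entropies are not shown.

```python
from collections import Counter
from math import log2

def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the gray-level distribution of a flat pixel list."""
    counts = Counter(pixels)
    n = len(pixels)
    # Sum -p * log2(p) over the gray levels that actually occur in the image.
    return sum(-(c / n) * log2(c / n) for c in counts.values())
```

A uniform image has entropy 0, while an image whose pixels are spread evenly over four gray levels has entropy 2 bits; richer textures generally give higher values.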
The journal Current Bioinformatics offers an excellent forum for the dissemination and understanding of interdisciplinary
knowledge between biomedical researchers and the healthcare industry. This special issue of Current Bioinformatics, entitled
“Entropy measures for medical image analysis”, unites the authors’ efforts to present recent technological discussion on the
applications of entropy measures in the field of medical image processing. This issue comprises three excellent contributions
from varied geographical locations around the globe and presents the applications of entropy in diverse medical images such as
microscopic images, echocardiography ultrasound images and Magnetic Resonance Images (MRI), obtained from various
Liver fibrosis is the formation of scar tissue in a dysfunctional liver and has become a major health problem in recent years. One of the
major causes of liver fibrosis is alcoholism; however, several other factors such as infections and hereditary causes may also lead to
fibrosis of the liver. The first paper of this special issue, authored by Yu Wang et al., employs entropy-based features extracted
from microscopic images of mouse liver for the analysis of liver fibrosis.
The second paper in this special issue, authored by Luminita Moraru et al., addresses another important problem, namely
age-related heart muscle damage. The authors have utilized the fuzzy c-means classification algorithm along with texture-based
image features such as entropy and homogeneity for classification of healthy subjects and patients with myocardial
In the final paper of this special issue, Suresh Chandra Satapathy et al. present a semi-automated
examination procedure for inspecting the severity of stroke lesions using MRI images. The authors employ a novel
heuristic optimization algorithm, the Social Group Optimization (SGO) algorithm, together with a thresholding procedure
based on Shannon’s entropy, Kapur’s entropy and Otsu’s function for preprocessing the adopted images.
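To make the idea of entropy-based thresholding concrete, here is a minimal sketch of bi-level thresholding with Kapur's entropy criterion. It is not the authors' implementation: it uses an exhaustive search over all thresholds instead of the SGO algorithm, and the function name `kapur_threshold` and the histogram input format are assumptions made for the example.

```python
import math


def kapur_threshold(hist):
    """Return the gray level t that maximizes Kapur's entropy criterion.

    `hist` is a list of pixel counts per gray level (assumed non-empty, with
    at least one non-zero count). Pixels with level <= t form one class and
    pixels with level > t form the other. Exhaustive search is used here in
    place of a heuristic optimizer.
    """
    total = sum(hist)
    probs = [count / total for count in hist]

    def class_entropy(ps):
        mass = sum(ps)
        if mass == 0:
            return None  # empty class: this threshold is not admissible
        # Entropy of the class distribution after renormalizing by its mass
        return -sum((p / mass) * math.log(p / mass) for p in ps if p > 0)

    best_t, best_score = None, float("-inf")
    for t in range(len(probs) - 1):
        h_low = class_entropy(probs[: t + 1])
        h_high = class_entropy(probs[t + 1:])
        if h_low is None or h_high is None:
            continue
        score = h_low + h_high  # Kapur's objective: sum of class entropies
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Toy bimodal histogram with six gray levels (made-up data)
print(kapur_threshold([10, 10, 0, 0, 10, 10]))
```

For multi-level thresholding the search space grows combinatorially with the number of thresholds, which is the usual motivation for replacing the exhaustive loop with a heuristic optimizer.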
This special issue clearly highlights the applicability and significance of entropy measures for the analysis of medical
images. We are indebted to the Editor-in-Chief, Prof. Dr. Yi-Ping Phoebe Chen for her valuable support. We thank Ms. Nida
Hatif, Assistant Manager Publications, Bentham Science Publishers for her assistance and support. The editors would like to
thank the reviewers of this special issue for their excellent reviews and comments.
“Prediction”, the main task in machine learning research, has become popular in biomedicine and
bioinformatics since Obama proposed the precision medicine project. Precision medicine combines biomedicine
and bioinformatics, and employs advanced machine learning techniques, especially big data analysis approaches.
Traditionally, medical big data have been analyzed with various statistical testing methods, which have difficulty handling multimodal
complex data and the relationships among features.
Machine learning was long considered a “black box”. However, it works better than simple statistical
testing methods. In recent years, deep learning and big data machine learning techniques have developed rapidly and shown impressive
performance in internet applications, image understanding, and automatic speech recognition. We believe that precision
medicine will be the next application for big data machine learning. However, we found that traditional classifiers,
such as SVM and random forest, still dominate recent work in biomedicine and bioinformatics. Therefore, we organized
this special issue to bridge the latest methods and their biological applications.
In this thematic issue, eight outstanding reviews have been presented from different countries and regions, including P.R.
China, USA, and Germany.
Jing et al. conducted a systematic review of methods for predicting protein inter-residue contacts. As a low-dimensional
representation of protein tertiary structure, protein inter-residue contacts can help de novo protein structure prediction
methods reduce the conformational search space. Various methods have been developed for the prediction of protein inter-residue
contacts over the past two decades, and those methods can be roughly classified into five categories: correlated
mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this
article, they reviewed three main methods for prediction of protein inter-residue contacts and analyzed constraints of each
category. They also compared several representative methods and discussed the performance of these methods in detail.
Zhang et al. performed a comprehensive review of recent developments in feature extraction methods for protein
sequences, including composition-based features, autocorrelation-based features, and profile-based features. The underlying concepts and
main equations were introduced and discussed. Finally, some existing computational tools for generating these features were
introduced. This article is especially useful for researchers who are interested in computationally studying the structures and
functions of proteins.
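As a simple illustration of a composition-based feature, the sketch below computes the amino acid composition of a protein sequence, that is, the relative frequency of each of the 20 standard residues. It is not drawn from the reviewed article; the function name `amino_acid_composition`, the residue ordering, and the choice to ignore non-standard characters are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by one-letter code


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues (e.g. 'X') are ignored;
    this is a simplifying assumption of the sketch.
    """
    counted = [residue for residue in sequence.upper() if residue in AMINO_ACIDS]
    if not counted:
        raise ValueError("sequence contains no standard amino acids")
    return [counted.count(aa) / len(counted) for aa in AMINO_ACIDS]


features = amino_acid_composition("MKVLAAGIVX")  # made-up peptide
print(dict(zip(AMINO_ACIDS, features)))
```

The resulting fixed-length vector can be fed to any standard classifier regardless of sequence length, which is the main appeal of composition-based features.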
Liu et al. analyzed the role of bioinformatics in the modernization of Traditional Chinese Medicine. Traditional Chinese
Medicine (TCM) encompasses a wide range of medical practices, developed in China and other East Asian countries over
more than 2000 years. The aim of the modernization of TCM is to use modern biological and medicinal knowledge to investigate the
treatment mechanisms of herbal medicine. Combined with the rich practical experience of TCM, it can accelerate
modern drug development. During this process, bioinformatics, as a rising discipline, can help to discover and analyze TCM.
Bioinformatics already plays an important role in understanding biological data. In this article, they introduce and
discuss the applications of bioinformatics techniques in TCM for interested readers and look ahead to the potential development
of bioinformatics in TCM.
Lei et al. conducted a survey on the prediction of essential proteins and genes, which discusses not only the current
developments of diverse computational methods, but also the challenges for future work. It is well known that the prediction of
essential proteins using computational methods is highly significant in areas of bioinformatics such as disease research
and drug design. The authors introduced recent studies on predicting essential proteins and genes by analyzing different
kinds of computational methods, including topology-based methods, integrated methods, machine learning methods and reliable
network methods. In addition, they presented techniques for evaluating predictive performance and some available databases
of essential genes, as well as future research directions. This review presents systematic knowledge of essential
protein and gene prediction, which can help studies in related fields.
Jia and Gong summarized the computational methods in linear B-cell epitope prediction. The determination of B-cell
epitopes is important for the synthesis of peptide vaccines, the preparation of diagnostic reagents, and the screening of
monoclonal antibodies. For identifying such epitopes, many computational predictors have been proposed in the past 10 years.
In this review, they summarized the representative computational approaches developed for the identification of linear B-cell
epitopes. They mainly discussed the datasets, feature extraction methods and classification methods used in previous work.
Moreover, they also considered existing challenges and future perspectives for developing reliable methods for predicting linear
B-cell epitopes.
Yang et al. summarized the application of machine learning methods in protein sub-Golgi localization. The distribution of
proteins in the cell correlates with their functions. Because the Golgi apparatus plays important roles in protein storage, packaging and distribution, the
identification of protein sub-Golgi localization is very important for an in-depth understanding of protein function. Because
traditional biochemical experiments are time-consuming and require expensive materials, machine learning methods are used to
identify protein location in the Golgi apparatus. In the review, Yang et al. briefly introduced recent progress in protein
sub-Golgi localization prediction, covering database construction, feature extraction and optimization, prediction
algorithms and webserver establishment. The advantages and disadvantages of these methods were discussed. They also
discussed the prospects of protein sub-Golgi localization prediction using machine learning methods.
Zhang et al. analyzed the cross-kingdom regulation of exogenous plant miRNAs in humans. As a new signaling molecule,
exogenous miRNAs regulate and influence the physiological functions of inhalers, revealing the effects on human health and
metabolism at the gene molecular level. In this process, the combination of bioinformatics and computational methods can help
find and analyze the cross-kingdom regulation of exogenous plant miRNAs in the human body. In this article, they
introduce and discuss recent advances concerning exogenous plant miRNAs in humans and their cross-kingdom regulation, and look
ahead to the potential development of cross-kingdom regulation of exogenous plant miRNAs.
Qu et al. analyzed methods for predicting DNA-binding proteins. DNA-binding proteins are an important component of
prokaryotic and eukaryotic proteomes and participate in many life activities. Owing to the importance of DNA-binding
proteins, there has been much research on their prediction, and many methods have been proposed. In earlier years,
biochemical methods were often used to identify these proteins, but with the continuous development of machine
learning, many machine-learning methods are now used to predict DNA-binding proteins. In this article, they analyzed the feature
representation methods and common classifiers, summarized the evaluation methods and existing websites, and compared the
results of different methods.
Each paper in this special issue was extensively peer-reviewed by more than two external reviewers. I would like to thank
all the authors for contributing their work to this thematic issue and all the reviewers for their time and efforts.
The success of Bioinformatics in recent years has been prompted by research in Molecular Biology and Molecular Medicine
in several initiatives. These initiatives gave rise to an exponential increase in the volume and diversification of data, including
next generation sequencing data and their annotations, high-throughput experimental (omics) data, biomedical literature, among
many others. Systems Biology is a related research area that has been replacing the “reductionist” view that dominated Biology
research in the last decades, requiring the coordinated efforts of biological researchers with those related to data analysis,
mathematical modeling, computer simulation and optimization.
The accumulation and exploitation of large-scale databases prompt the development of new computational technology and
research on these issues. In this context, many widely successful computational models and tools used by biologists in these
initiatives, such as clustering and classification methods for omics data, are based on Computer Science/Artificial Intelligence
(CS/AI) techniques. In fact, these methods have been helping in tasks related to knowledge discovery, modeling and
optimization, aiming at the development of computational models so that the response of complex biological systems to
any perturbation can be predicted.
In this context, the interaction of researchers from different scientific fields is, more than ever, of foremost importance,
boosting the research efforts in the field and contributing to the education of a new generation of bioinformatics scientists. The
Practical Applications in Computational Biology and Bioinformatics (PACBB) conference has been contributing to this effort,
promoting this fruitful interaction over the last 7 years. This special issue gathers four contributions, selected and significantly
extended from the rich PACBB'15 technical program, which included papers spanning many different sub-fields in
bioinformatics and computational biology.
This volume gathers four extended articles selected from the work presented at the PACBB’2015 conference, showing
distinct and meaningful practical applications of bioinformatics and computational biology. These range from text mining, to
next generation sequencing and gene expression data applications.
Calderon-Mantilla et al. propose a pipeline architecture for inferring and visualizing gene networks from expression data,
applied to the specific case of coffee plants. Rodriguez-Gonzalez et al. present a comparison of two distinct text mining
approaches for extracting diagnostic related knowledge from MedLine Plus articles. The work by Graña et al. proposes
nextpresso, a pipeline for the analysis of next generation sequencing (RNA-seq) data that covers the most common
requirements of these experimental data. Finally, the work by Fernandez-Gonzalez et al. addresses a relevant problem in
text retrieval, namely the influence of class imbalance when developing classifier models for assessing relevant biomedical
The major feature of the current life sciences is the rapid increase of biological data, which are presented in many forms
and reflect the characteristics of biological systems at various levels, including the genome, transcriptome, epigenome, proteome
and metabolome. This is the so-called biological big data we are facing. Biological big data bring both challenges and
opportunities to bioinformatics. Tools and techniques for analyzing big biological data enable us to translate massive amounts of
information into a better understanding of the basic biomedical mechanisms, which can further be applied to translational or
This thematic issue, with the theme “Bioinformatics in Biological Big Data Era”, aims to showcase the latest
developments and achievements in bioinformatics in this biological big data era. The ten papers in this thematic issue were
selected from the 1st CCF Bioinformatics Conference (CBC 2016), which was sponsored by China Computer Federation (CCF).
The selected papers cover methods and algorithms for processing biological big data in addressing various bioinformatics
issues. Before being submitted to the special issue, these papers went through strict reviewing organized by CBC 2016. Further
reviewing was organized by the guest editors.
In what follows, we give a brief review of the 10 papers included in this thematic issue.
Liu et al. in their paper “Sparse linear modeling kinase inhibition network for predicting combinatorial drug sensitivity in
cancer cells” used a sparse linear model called uncertain group sparse representation (UGSR) to infer essential kinases
governing the cellular responses to drug treatments, based on the massively collected drug-kinase interactions and drug
sensitivity datasets over hundreds of cancer cell lines.
In the paper “TagNovo: A dictionary based approach to predict peptide theory spectra”, Wang et al. presented a new
theoretical spectrum prediction model called TagNovo, which builds a “tag dictionary” from an existing spectrum library and uses
it for theoretical spectrum prediction.
In the paper “Large-scale Investigation of Long Noncoding RNA Secondary Structures in Human and Mouse” by Guo et al.,
the authors conducted a large-scale investigation of lncRNA secondary structures, especially the hairpin structural motif, in
human and mouse based on computational prediction using the RNAfold software, and found that the secondary structures of
lncRNAs have many characteristics, most of which are similar to those of mRNAs.
Nie et al., in their paper “Prediction of protein S-Sulfenylation sites using a deep belief network”, developed a computational
method, DBN-Sulf, to effectively predict S-sulfenylation sites by using optimally extracted properties based on a Deep Belief
Network (DBN) with Restricted Boltzmann Machines (RBMs). DBN-Sulf shows significantly better performance than the
existing methods.
In the paper “Feature identification for phenotypic classification based on genes and gene pairs”, Su, Zhang and Pan
proposed a new algorithm called FSGGP to select both feature genes and feature gene pairs on the binary-value gene
expression data.
Chan et al., in their paper “MyPhi: Efficient Levenshtein Distance Computation on Xeon Phi based Architectures”,
introduced MyPhi, an ultra-fast implementation of the Myers algorithm on Intel Xeon Phi based architectures for efficiently
computing the Levenshtein distance between genome sequences.
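For readers unfamiliar with the quantity being computed, the following sketch shows the textbook dynamic-programming definition of the Levenshtein distance. It is deliberately not the Myers bit-parallel algorithm that MyPhi accelerates, nor any code from that paper; the function name `levenshtein` is chosen only for this example.

```python
def levenshtein(a, b):
    """Levenshtein (edit) distance between sequences `a` and `b`.

    Classic dynamic programming over a two-row table, using O(len(a) * len(b))
    time and O(len(b)) memory. Insertions, deletions and substitutions each
    cost 1.
    """
    previous = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(levenshtein("ACGTTGCA", "ACTTGGCA"))  # toy DNA fragments
```

The quadratic cost of this formulation on long genome sequences is what motivates bit-parallel and hardware-accelerated implementations.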
The paper “A Metric on the Space of Rooted Phylogenetic Trees” by Wang and Guo proposed a new metric on the space of
rooted phylogenetic trees, which can be calculated in time polynomial in the size of the compared trees.
Liao et al. developed a method to classify small GTPases and non-small GTPases in their paper “Classification of small
GTPases with hybrid protein features and advanced machine learning techniques”. In the paper “Identification of Attention
Deficit/Hyperactivity Disorder in Children Using Multiple ERP Features”, Li et al. used non-invasive event-related potential
(ERP) features for attention deficit hyperactivity disorder (ADHD) prediction. The paper “Low Rank Representation and
its application in bioinformatics” by You, Cai and Huang reviews the theoretical and numerical models based on low rank
representation and their applications in bioinformatics.
This thematic issue is the result of many people’s contributions, support and cooperation. We are grateful to the authors for
submitting their work to this thematic issue, and we thank the reviewers for their hard work in reviewing the papers. We also
thank the Editor in Chief and the staff of Current Bioinformatics for their valuable help to make this issue possible.
With the development of sequencing technologies, a wide variety of biological data, including DNA, RNA and protein sequences and gene expression profiles, have been generated and accumulated. These data are an external manifestation of the mechanisms at work in the cell. How to uncover cellular mechanisms from these data is a major challenge facing scientists today. Computational approaches, including bioinformatics and systems biology, have proved essential for analyzing these complicated data; as Markowetz declared, all biology is computational biology. Recently, a number of computational techniques and theories, such as BLAST [2, 3], machine learning [4-7] and network theory [8-10], have facilitated the discovery of molecular structures and functions. Therefore, this thematic issue is intended to summarize recent progress of these computational techniques and theories in genomics and proteomics.
Comparison between molecular sequences is very helpful for exploring evolutionary relationships among different tissues, organisms or species, and for subsequent functional analysis. It is not an exaggeration to say that comparison between molecules is an important foundation of life exploration. The natural vector is a method of characterizing protein or DNA sequences and is applicable to classification and evolutionary analysis [11, 12]. Yu reviewed the natural vector method and its application in virus phylogenetic classification.
Biological images are especially useful for phenotype quantification. High-throughput, quantitative biological phenotypes derived from images are becoming increasingly important for both the quantification of phenotypes and the visualization of biological molecular structure and activity. Bioimage informatics is becoming a new area for exploring life. Chen et al. discussed the major studies based on biological images and summarized the computational techniques of biological image analysis.
Long noncoding RNAs (lncRNAs) are transcripts of more than 200 nucleotides and are a type of non-protein-coding RNA. LncRNAs have recently been discovered to perform a variety of functions. However, the identification of lncRNAs is challenging. Yao et al. reviewed computational strategies for recognizing lncRNAs and current progress, and discussed existing difficulties in the prediction of lncRNAs, especially with machine learning methods.
Biomedical data are commonly big data, which require both high-performance computers and highly effective computational methods. Deep learning, proposed by Hinton et al., has become a prominent research field and has advanced machine intelligence a big step. Peng et al. reviewed the application of deep learning in omics data processing, biological image processing and biomedical diagnosis, and discussed the challenges.
With the development of meteorological science, a large amount of meteorological data, such as temperature, humidity, rainfall, air pressure and wind speed, has been collected. How can the key meteorological indexes causing an epidemic be identified? How can these key indexes be integrated to predict the epidemic? These have become vital tasks. Liu et al. reviewed two categories of models for predicting epidemics related to meteorological factors: deterministic models and stochastic models.
A single-nucleotide polymorphism (SNP) is a variation of a single nucleotide that occurs at a specific position in the genome. SNPs have been found to be associated with a wide range of diseases and traits, such as inflammatory and autoimmune disorders, Alzheimer's disease and breast cancer. The study of SNP-disease associations will help fulfil the promise of precision medicine. Li reviewed computational methods for identifying SNP-disease associations and discussed directions for improvement: data quality improvement, high-performance computing platforms and advanced computational methods.
Although most genes have been detected, little is known about their functions. Numerous computational methods have recently been proposed to find gene functions. Loh et al. summarized these computational methods, compared them and analyzed their strengths and weaknesses.
Protein post-translational modification (PTM) is a biochemical reaction that occurs after translation, in which the protein is covalently modified by different functional groups. PTMs are involved in every cellular process of life and are a key regulating mechanism in the cell [29-32]. The first important step in exploring PTMs is to identify PTM types and sites. Currently, biochemical or biophysical experiments and computational approaches complement one another in identifying PTMs. Computational prediction generally consists of data collection, representation of PTMs (feature extraction), training on known samples and prediction of new samples. Therefore, feature extraction occupies a central position in the computational prediction of PTMs. Huang reviewed the methods of feature extraction that have recently been developed for PTM prediction and discussed several of their properties.