The advent of advanced experimental techniques for discovering conservation and interactions between
molecules has prompted the systematic study of biological regulation networks. The increasing growth of data on organisms,
genome-scale conservation and co-expression yields valuable sources for exploring the functional and regulatory roles of
biological systems. A graph can readily represent a given biological network by mapping its basic entities and
interactions to nodes and edges, respectively. It is important to develop effective methods for comparing such networks across
organisms and discovering their common or frequent subgraphs. This can help reveal their functions and the exact features or
schemes by which these functions are carried out. There have been increasing efforts to find characteristic interaction patterns
conserved across several biological networks. The purpose of this special issue is to discuss the state of the art in techniques and methods
for discovering diverse regulatory networks with respect to metabolism, PPI (protein-protein interaction), gene expression, and
THE ADVENT OF BIOLOGY BIG DATA
There has been an explosion of biological data owing to advanced experimental technologies such as next-generation
sequencing or deep sequencing. For example, the European Bioinformatics Institute (EBI), “one of the world's largest
biology-data repositories, currently contains 20 petabytes (1 petabyte is 10^15 bytes) of data and back-ups regarding genes,
proteins and small molecules”. This not only makes it possible to perform a comprehensive analysis of the genome and the
transcriptome of a specific species, but also poses a major challenge in handling, processing and extracting information from
such massive data sets. These data have been widely used to produce various research solutions. Since high-throughput
instruments have greatly reduced costs, small biology labs can also generate big data. Big data users may also come from
small labs that lack such facilities but can access online data from public repositories.
Biological data are usually more heterogeneous than conventional transaction data, since a variety of
experiments give rise to different kinds of information, such as protein-protein interactions (PPI), various sequences, and
RNA secondary structures or detections in the transcriptome. This generates a demand for biological data mining to access big
data sets and to integrate, analyze, compare and interpret the complex data. In many ongoing studies, data sharing has become a
popular route to large-scale genomic or proteomic comparisons. However, the traditional approach of downloading the
data, storing them on one's own computer and analyzing them there is time-consuming and costly. Moreover, not all users
have the required computational facilities, such as supercomputers and software. This creates a need for
flexible, public computational platforms, including intelligent strategies for the storage, management and analysis of
biology big data.
A number of companies, institutes and labs have been established for different commercial and academic purposes, such as
the National Center for Biotechnology Information (NCBI), EBI and Beijing Genomics Institute (BGI). They provide open
access data sources, including data download and search, and usually allow one to obtain a data set from one location.
Helping scientists share their data across countries and regions also requires sharing computational resources,
so that users can access hardware and software on demand.
Cloud computing is a recently emerging technology for coping with biology big-data mining. Cloud platforms not only offer virtual
storage for data, software and results that a selected group of collaborators can share, but also prevent unauthorized
users from accessing them. Scientists can download and upload data and use software via cloud-based platforms.
Many data sets and software programs are hosted in large offsite centers. For example, IT Center for Science
at http://www.csc.fi/english is a high-performance computing centre run and funded by the government of Finland. Embassy
Cloud is a cloud-computing component for ELIXIR by EBI, which provides safe computational environments and a data
download service for comparison purposes. These cloud-based infrastructures address the continued data growth and give
scientists quick access to the information they need. Nevertheless, big-data transfer between local and remote sites
and data sharing between collaborators remain a major challenge owing to unexpected interruptions during transfer.
Regulatory networks have become a prevalent way to store and manage a large volume of biological data by modeling the
molecular interactions. In-depth study of biomolecular networks and their correlations assists in understanding cellular
behaviors and uncovering their functions in cellular systems. As a result, protein systems biology, one of the most important
forms of network systems biology, will play a central role in life science. Life science is becoming data-driven. Big data science,
including data management, sharing and analysis, is useful for constructing a dynamic and interacting protein regulatory network in an
THE IMPORTANCE OF PROTEIN SYSTEMS BIOLOGY
Systems biology has been a hot research topic since 2000, spanning the construction of diverse biological systems, data
visualization, and big-data management and analysis in molecular biology and biomedicine. Molecules, cells, tissues and
organs are not independent but perform their functions together in a systematic way. Systems biology has been widely applied in both
biological and biomedical studies to explore complex interactions between components within biological systems by virtue of computational
methods and mathematical models. Gene networks and protein networks are two typical networks of systems biology, in which
the properties and patterns of protein-based regulation and gene-related components play an important role in understanding
the functions and behaviors of biological systems.
A number of commercial or academic research institutes, centers and labs have been established for systems biology investigation.
The FAS Center for Systems Biology is an interdepartmental initiative at Harvard University, which aims to explain the
structure, behavior and evolution of cells and organisms by combining quantitative and systematic measurement,
including genomics, proteomics and computational biology, with mathematical models to extract and describe the dynamical
behavior of groups of interacting components. Systems problems have become an important topic across computational biology
research and medicine design. The New South Wales Systems Biology Initiative was funded by the Australian Research Council
and the NSW State Government (http://www.systemsbiology.org.au/). It is located at the University of New South Wales and
aims to develop bioinformatics algorithms and tools for genomics and proteomics. The Systems Biology Institute (SBI) was
established in 2000 to facilitate systems biology research in several important areas of healthcare and global sustainability. It has
taken part in a number of research programs, mostly supported by the Japanese government and private foundations.
“Pathways have been viewed as a convenient way of summarizing the results of a collection of experiments to describe
the flow of signals or metabolites in a cell. A number of databases regarding metabolic and signaling pathways are developed to
represent the relationships between molecules involved in various events, including reactions or as activation or inhibition”.
Notwithstanding many attempts to extract properties and details of the interactions, such as phosphorylation sites, there are generally
insufficient functional details to interpret the actual meaning of the link between two proteins. Knowledge of the relevant molecules,
their identified binding sites and their interactions can greatly improve the understanding of protein systems biology.
THE MOTIVATION FOR NOVEL COMPUTATIONAL METHODS
A great many molecular interactions have been unveiled, but knowledge of their precise details is still far from
complete. The difficulty of predicting the behavior of the genes and proteins involved arises mainly from the complexity of
turning an abstract biological system into models that faithfully reflect it, and from the heterogeneity and size of biological
big data drawn from multiple sources. The paradigm of systems biology thus generates a demand for computational methods,
interaction prediction and network construction.
High-throughput sequencing projects have identified a collection of involved components that function in an organism.
Many studies in post-genomic projects aim to extract their relationships. Systems biology is thus motivated to make sense of
these relationships by considering them together, and to simulate how the participating molecules work together
to obtain a designated outcome or perform targeted functions. As a result, traditional molecular biology, which focused on studying
single molecules, has given way to systems biology, which explores pathways, complexes or even an entire organism. To understand
diverse pathways and/or networks of gene regulation, scientists must have a good knowledge of the correlations
between proteins, between proteins and metabolites, and between proteins and nucleic acids.
Structural information offers a comprehensive understanding of interactions between molecules by
relying on atomic details of binding. However, it takes time to obtain detailed structural information for large complexes or
whole systems. This calls for new computational methods to discover and model the relationships
between interacting molecules.
CONTRIBUTIONS TO THIS ISSUE
The articles included in this special issue are classified into protein function prediction, protein-protein interaction, and protein
Methods for the construction and characterization of amino acid networks are reviewed by Jianhong Zhou et al. The authors
summarize and discuss network properties applied to native structure selection, and provide a future perspective on the
application of amino acid networks to detecting native folds among decoy sets.
Wei Peng et al. propose an unbalanced Bi-random walk (UBiRW) algorithm to predict protein functions. It iteratively
walks a different number of steps in each of the two networks to find protein-GO term associations according to some known
“The interface in a complex involves two structurally matched protein subunits, and the binding sites can be predicted by
identifying structural matches at protein surfaces”. Understanding the energetics and mechanisms of complexes remains one of
the essential problems in binding site prediction. Fei Guo et al. developed a system, PBinder, for identifying binding sites based
on structural compatibility, side-chain conformations, amino acid types and contact energies. The system reports improvements
in prediction correctness, in terms of both accuracy and coverage.
Among the most important networks maintaining biological functions, protein-protein interactions range from local binary
interactions to an entire cell. It remains a long-sought scientific goal to understand how the interacting partners recognize and bind
each other precisely. Compared with other existing methods, the least squares regression (LSR) approach proposed by De-Shuang Huang et
al. is a powerful tool for characterizing protein-protein correlations and inferring PPI, while maintaining high performance in the
prediction of PPI networks.
The review article written by Chiranjib Chakraborty et al. enhances our knowledge of how PPI network architecture can
be used to validate a drug target. In conclusion, future directions for PPI in target discovery and drug design are suggested.
After reviewing the key regulators in the hydroxylated triacylglycerol ricinoleate biosynthesis pathway of castor bean,
Yujie Chen et al. analyze several of them in terms of structure/function prediction and similar expression
pattern mechanisms, aiming to give insight into the biosynthesis of this energy-rich
molecule and the role of the key regulators in the pathway.
Lili Liu et al. define the organelle-focused proteome and interactome of rice based on manual annotation, manual adjustment
and cross-validation of predictors. Furthermore, the cross-talk bias between different organelles and the functional organization
of nine organelles are explored.
Wei Lan et al. explore the overlooked positions of microRNAs (miRNAs) based on sequential and structural features, since
miRNAs have been recognized as important regulators in a wide range of biological processes. These functions may be exploited
for miRNA-mediated regulation of protein expression. Collision entropy is applied to measure the degree of importance of
each miRNA position. In particular, two thresholds are used to prune unimportant positions. The findings unveil important
positions of miRNAs related to biogenesis and function.
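For readers unfamiliar with the measure, collision entropy (the Rényi entropy of order 2) of a discrete distribution p is
H2 = -log2(sum_i p_i^2). The following is only a minimal sketch of how a per-position score could be computed over a set of
aligned, equal-length sequences. It is not the procedure of Wei Lan et al.: the function names and the toy sequences are
hypothetical, and the two pruning thresholds are left out because their definition is not given here.

```python
import math
from collections import Counter


def collision_entropy(symbols):
    """Collision (Renyi order-2) entropy, in bits, of the symbols in one column."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one symbol is required")
    # Probability that two independent draws from the column coincide.
    collision_probability = sum((c / total) ** 2 for c in counts.values())
    return -math.log2(collision_probability)


def position_entropies(sequences):
    """Collision entropy at each position of equal-length, aligned sequences.

    Assumption: the input is already aligned; no alignment is performed here.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    # zip(*sequences) yields one tuple of symbols per alignment column.
    return [collision_entropy(column) for column in zip(*sequences)]


# Toy, made-up sequences used only to exercise the functions.
toy_sequences = ["ACGU", "AGGU", "AUGA", "AAGC"]
toy_scores = position_entropies(toy_sequences)
```

A fully conserved column scores zero and a column with four equally frequent nucleotides scores two bits, so the score
describes how variable a position is. How that score is turned into a notion of importance is a matter for the original article.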
“Rapid advances in network biology indicate that cellular networks are governed by universal laws and offer a new conceptual
framework that could potentially revolutionize our view of biology and disease pathologies in the twenty-first century”.
Systems biology has been successful in predicting the behavior of sets of molecules involved in biological systems and in
understanding their interactions. As an important branch of systems biology, protein systems biology focuses on investigating
the properties and patterns of protein-related interactions. Owing to the application of high-throughput sequencing
techniques, biological big data has become a major challenge to both biologists and computer scientists. Traditional computational
methods based on local data and local computational facilities have shown their limitations in handling large volumes of
biological data from multiple sources. It is therefore crucial to develop public computational platforms, including equipment and
software for the storage, management and analysis of biological big data. This requires collaboration among scientists from biology, computer
science and mathematics. The effort to understand the properties of components in biological systems and their relationships has
given rise to a number of interesting research topics, which are described in this special issue.
We wish to thank all the authors who have contributed their work to foster the dissemination of scientific excellence in
the protein network biology field, and all the reviewers for giving their time and expertise to evaluate manuscripts submitted for this
publication. The work reported in this paper was partially supported by National Natural Science Foundation of China projects
61363025 and 31371328, and by two key projects of the Natural Science Foundation of Guangxi, 053006 and 019029.