Protein complexes are involved in most, if not all, essential biological processes in a living cell. Many efforts
have been devoted to identifying protein complexes with computational methods, most of which exploit protein-protein
interaction networks to search for groups of densely interacting proteins as candidate complexes. Beyond identifying
protein complexes, knowing their biological functions may help reveal their molecular mechanisms and their roles in
related biological processes. It is therefore also desirable to predict the functions of protein complexes computationally.
However, we found no literature that addresses this problem. This paper attempts to address it using yeast as the
model organism, for which a total of 50 protein complexes were collected whose functions have been validated by solid
experimental evidence. Each complex was encoded as a numeric vector based on its graph and functional properties.
Feature selection techniques, namely Minimum Redundancy Maximum Relevance and Incremental Feature Selection,
were adopted to extract core features for the prediction. Three prediction methods, the Nearest Neighbor Algorithm,
Bayesian network, and Sequential Minimal Optimization, were evaluated with the jackknife cross-validation test.
The Nearest Neighbor Algorithm coupled with 22 core features achieved the highest accuracy. These core features
are regarded as the most important for determining the biological functions of protein complexes. Of the 22 core
features, 19 were derived from functional properties, indicating that the functions of the individual protein
components probably constrain the overall function of the protein complex.
Keywords: Bayesian network, gene ontology, incremental feature selection, jackknife test, minimum redundancy maximum
relevance, nearest neighbor algorithm, prediction of functions of protein complex, protein complex, sequential minimal
optimization.
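
The selection-and-evaluation loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the features have already been ranked (for example by Minimum Redundancy Maximum Relevance, which is not implemented here), that the nearest neighbor is found by Euclidean distance (the abstract does not specify the metric), and that each complex has a single function label. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor_predict(train_x: Sequence[Sequence[float]],
                             train_y: Sequence[str],
                             query: Sequence[float],
                             features: Sequence[int]) -> str:
    """Return the label of the training sample closest to `query`.

    Distance is Euclidean over the chosen feature indices (an assumption;
    the source does not state which metric was used).
    """
    best_label, best_dist = None, math.inf
    for x, y in zip(train_x, train_y):
        dist = math.sqrt(sum((x[f] - query[f]) ** 2 for f in features))
        if dist < best_dist:
            best_dist, best_label = dist, y
    return best_label


def jackknife_accuracy(xs: Sequence[Sequence[float]],
                       ys: Sequence[str],
                       features: Sequence[int]) -> float:
    """Jackknife (leave-one-out) test: each sample is predicted in turn
    from all the other samples."""
    xs, ys = list(xs), list(ys)
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbor_predict(train_x, train_y, xs[i], features) == ys[i]:
            correct += 1
    return correct / len(xs)


def incremental_feature_selection(xs: Sequence[Sequence[float]],
                                  ys: Sequence[str],
                                  ranked_features: Sequence[int]
                                  ) -> Tuple[List[int], float]:
    """Evaluate the top-1, top-2, ... ranked features and keep the
    smallest prefix that reaches the highest jackknife accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        acc = jackknife_accuracy(xs, ys, ranked_features[:k])
        if acc > best_acc:  # strict '>' keeps the smaller set on ties
            best_k, best_acc = k, acc
    return list(ranked_features[:best_k]), best_acc
```

Calling `incremental_feature_selection(xs, ys, ranked)` with one feature vector per complex returns the selected feature indices together with their jackknife accuracy. In the study, the analogous procedure yielded 22 core features with the Nearest Neighbor Algorithm.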