With the development of high-throughput analysis, a growing volume of gene expression profile data has become available. The function of a novel gene can be predicted by analyzing its sequence homology and co-expression relationship with known marker genes. To improve the accuracy of gene function prediction, this paper predicts gene function with a computational biology approach that combines sequence homology and gene co-expression. Given the amino acid sequence encoded by a novel gene whose homologous and co-expressed genes encode proteins with enzyme function, the following questions are often asked. Is the encoded protein an enzyme or a non-enzyme? If it is an enzyme, to which main family class does it belong? Going one level deeper, to which sub-functional class does it belong? The training dataset in this paper is the one used by Shen and Chou. Sequence information is represented by amino acid composition, low-frequency power spectral density, increment of diversity values, dipeptide composition and pseudo amino acid composition, and a support vector machine (SVM) or increment of diversity (ID) predictor of the enzyme function of novel genes is proposed. The biological functions of 25 novel genes were obtained. The article also presents some promising patents on predicting gene function by combining computational biology with gene homology and co-expression.
Keywords: Computational biology, gene homology, co-expression, prediction algorithm, gene function, enzyme function, support vector machine, low-frequency power spectral density, computational system, ID algorithm
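Two of the sequence encodings named in the abstract, amino acid composition and dipeptide composition, can be illustrated with a minimal sketch. The abstract does not give the exact normalization or the handling of non-standard residues, so the choices below (fractions that sum to 1, unknown residues skipped) are assumptions, and the function names `amino_acid_composition`, `dipeptide_composition` and `encode` are hypothetical. The other descriptors and the SVM or ID classification step are not shown.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return 20 fractions: count of each residue / number of counted residues.

    Assumption: characters outside the 20 standard residues are skipped.
    """
    counts = dict.fromkeys(AMINO_ACIDS, 0)
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return 400 fractions: count of each adjacent pair / number of counted pairs.

    Assumption: pairs containing a non-standard residue are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[dp] / total if total else 0.0 for dp in DIPEPTIDES]


def encode(seq):
    """Concatenate both encodings into one 420-dimensional feature vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's dataset.
    features = encode("MKVLAAGIVKAC")
    print(len(features))
```

A fixed-length vector of this kind is what makes sequences of different lengths comparable, which is the property a classifier such as an SVM needs from its input.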