Deep Forest-based Prediction of Protein Subcellular Localization

Author(s): Lingling Zhao*, Junjie Wang, Mahieddine Mohammed Nabil, Jun Zhang*

Journal Name: Current Gene Therapy

Volume 18, Issue 5, 2018


Motivation: Knowing the correct subcellular localization of a protein is necessary for understanding its function and for revealing the mechanisms of the many human diseases caused by protein mislocalization, and this knowledge is required before gene therapy can be used to treat a disease. In addition, gene therapy is well known to be an effective way to overcome disease by targeting a gene therapy product to a specific subcellular compartment. Deep neural networks have become increasingly popular for predicting protein function, owing to the large increase in available genomics data and to their strong non-linear classification ability. However, they still have drawbacks, such as a large number of hyper-parameters and the need for a sufficient amount of labeled data.

Results: We present a deep forest-based algorithm that predicts protein subcellular location from sequence information. The prediction model uses a multi-layered network of random forests to identify the subcellular regions of proteins. The model was trained and tested on a protein dataset from the latest UniProt release, and we demonstrate that our deep forest predicts the subcellular location of proteins with high accuracy given only the protein sequence, outperforming current state-of-the-art algorithms. Moreover, unlike deep neural networks, it has significantly fewer parameters and is much easier to train.
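
The general idea of a multi-layered forest can be illustrated with a minimal sketch. This is not the authors' implementation: the amino-acid composition encoding, the fixed number of layers, the pairing of a random forest with an extra-trees forest in each layer, the toy sequences, and the two location labels are all assumptions made for illustration, and the names `composition`, `CascadeForest`, and `toy_sequences` are hypothetical. The sketch assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Fraction of each of the 20 standard amino acids (assumed encoding)."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return np.array([seq.count(a) / n for a in AMINO_ACIDS])


class CascadeForest:
    """Simplified cascade: each layer sees the raw features plus the
    class probabilities produced by the forests of the previous layer."""

    def __init__(self, n_layers=2, n_trees=50, cv=3, seed=0):
        self.n_layers, self.n_trees, self.cv, self.seed = n_layers, n_trees, cv, seed

    def _make_forests(self, layer):
        s = self.seed + 2 * layer
        return [
            RandomForestClassifier(n_estimators=self.n_trees, random_state=s),
            ExtraTreesClassifier(n_estimators=self.n_trees, random_state=s + 1),
        ]

    def fit(self, X, y):
        self.layers_ = []
        feats = X
        for layer in range(self.n_layers):
            forests = self._make_forests(layer)
            outs = []
            for forest in forests:
                # Out-of-fold probabilities feed the next layer, which
                # limits leakage of training labels between layers.
                outs.append(cross_val_predict(
                    forest, feats, y, cv=self.cv, method="predict_proba"))
                forest.fit(feats, y)  # refit on all data for prediction time
            self.layers_.append(forests)
            feats = np.hstack([X] + outs)
        self.classes_ = self.layers_[-1][0].classes_
        return self

    def predict_proba(self, X):
        feats = X
        for forests in self.layers_:
            outs = [forest.predict_proba(feats) for forest in forests]
            feats = np.hstack([X] + outs)
        # Final output: average over the forests of the last layer.
        return np.mean(outs, axis=0)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


rng = np.random.default_rng(0)


def toy_sequences(enriched, n, length=60):
    """Synthetic sequences enriched in the given residues (toy data only)."""
    p = np.ones(len(AMINO_ACIDS))
    for a in enriched:
        p[AMINO_ACIDS.index(a)] += 5
    p /= p.sum()
    letters = list(AMINO_ACIDS)
    return ["".join(rng.choice(letters, size=length, p=p)) for _ in range(n)]


# Two made-up classes; real work would use annotated UniProt sequences.
seqs = toy_sequences("KR", 30) + toy_sequences("LIV", 30)
labels = np.array(["nucleus"] * 30 + ["membrane"] * 30)
X = np.array([composition(s) for s in seqs])

model = CascadeForest().fit(X, labels)
print(model.predict(X[:3]), model.predict(X[-3:]))
```

The only settings exposed here are the number of layers, the number of trees per forest, and the number of cross-validation folds, which gives a rough sense of why such a model has fewer hyper-parameters to tune than a typical deep neural network.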

Keywords: Protein subcellular location, Machine learning, Deep forest, Sequence information, UniProt, Algorithms.

Article Details

Year: 2018
Published on: 12 November, 2018
Page: [268 - 274]
Pages: 7
DOI: 10.2174/1566523218666180913110949
