Background: Proline hydroxylation, which produces hydroxyproline, is one type of post-translational
modification (PTM). Because protein sequences contain many uncharacterized proline (P) residues, the
question that needs to be answered is: which ones can be hydroxylated, and which ones cannot? The answer
will not only deepen our understanding of the hydroxylation mechanism but may also aid drug development. The ever-growing
demand for better handling of protein sequences in the post-genomic age presents new challenges.
Objective: To address these challenges, we aim to develop computational methods that identify these
sites quickly and accurately.
Method: We propose a new approach for predicting hydroxyproline sites using a deep learning model,
the convolutional neural network (CNN). We employ pseudo amino acid composition (PseAAC) to
identify these proteins and use the position-specific scoring matrix (PSSM) to represent samples
as input to the CNN model.
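As an illustration only, the sketch below shows one common way a PSSM can be turned into fixed-size samples for a CNN: a window of PSSM rows is cut around each proline residue and zero-padded at the sequence ends. The function name `proline_windows`, the `flank` parameter, and the zero-padding choice are assumptions made for this example and are not taken from the method described here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def proline_windows(sequence: str, pssm: Matrix, flank: int = 2) -> List[Tuple[int, Matrix]]:
    """Return (position, window) pairs, one per proline (P) residue.

    Each window holds 2 * flank + 1 rows of PSSM scores centered on the
    proline. Rows that would fall outside the sequence are filled with
    zeros (an assumed padding scheme, chosen for simplicity).
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same number of rows")

    n_cols = len(pssm[0]) if pssm else 0
    samples: List[Tuple[int, Matrix]] = []

    for i, residue in enumerate(sequence):
        if residue != "P":
            continue  # only proline residues are candidate sites
        window: Matrix = []
        for j in range(i - flank, i + flank + 1):
            if 0 <= j < len(sequence):
                window.append(list(pssm[j]))    # copy the PSSM row
            else:
                window.append([0.0] * n_cols)   # zero-pad past the ends
        samples.append((i, window))

    return samples
```

A real PSSM has 20 score columns per residue, so each sample would be a (2 * flank + 1) x 20 matrix that a CNN can treat like a small single-channel image.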
Results and Conclusion: In our experiments, K-fold cross-validation on benchmark datasets
demonstrated the potential of the CNN for identifying protein hydroxyproline sites as well as
proteins with other PTM types.