Background: Heat shock proteins (HSPs) are ubiquitously expressed in both prokaryotes and
eukaryotes. Based on their molecular mass and function, HSPs are classified into different families
that are structurally distinct and perform different functions in biological processes. Although some
efforts have been made to identify the types of HSPs, no method is yet available for identifying
the types of HSPs in plants.
Method: The amino acid distributions in the different types of HSPs were analyzed. HSPs were encoded
using the reduced amino acid alphabet (RAAA). The optimal feature vector was obtained by comparing the
predictive capability of models based on RAAA composition with alphabets of different sizes. A
support vector machine (SVM) based model was then developed to identify the types of HSPs using the
optimal feature vector.
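As a rough illustration of the encoding step, the Python sketch below re-encodes a protein sequence in a reduced alphabet and computes its composition vector, i.e. the fraction of residues falling into each group. The groupings `HYPOTHETICAL_GROUPS` (5 groups) and `HP_GROUPS` (2 groups) are assumptions chosen for illustration only; the abstract does not state which RAAA clusterings or alphabet sizes were compared. Passing alphabets of different sizes to the same function mirrors the size comparison described above.

```python
from collections import Counter

# HYPOTHETICAL 5-group reduced alphabet (illustrative assumption, NOT the
# grouping used in the study): residues clustered by rough physicochemical class.
HYPOTHETICAL_GROUPS = {
    "A": "AGILMPV",  # aliphatic / hydrophobic
    "F": "FWY",      # aromatic
    "S": "CNQST",    # polar, uncharged
    "K": "HKR",      # positively charged
    "D": "DE",       # negatively charged
}

# HYPOTHETICAL 2-group alphabet: hydrophobic (H) vs. polar (P).
HP_GROUPS = {
    "H": "AFGILMPVWY",
    "P": "CDEHKNQRST",
}


def build_mapping(groups: dict[str, str]) -> dict[str, str]:
    """Map each amino acid letter to the label of the group it belongs to."""
    return {aa: label for label, members in groups.items() for aa in members}


def raaa_composition(seq: str, groups: dict[str, str]) -> list[float]:
    """Return the frequency of each reduced-alphabet group, in the groups' key order.

    Residues not covered by the alphabet (e.g. 'X') are skipped.
    """
    mapping = build_mapping(groups)
    reduced = [mapping[aa] for aa in seq.upper() if aa in mapping]
    counts = Counter(reduced)
    total = len(reduced) or 1  # avoid division by zero for an empty sequence
    return [counts[label] / total for label in groups]
```

For example, `raaa_composition("MKDEFW", HP_GROUPS)` returns `[0.5, 0.5]`, because M, F and W fall in the hydrophobic group and K, D and E in the polar group. The length of the feature vector equals the alphabet size, so smaller alphabets give more compact but coarser features.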
Results: The amino acid distributions differ among the families of HSPs. In the rigorous
jackknife test, the proposed method achieved an accuracy of 93.65% in identifying the five families
of HSPs in plants.
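The jackknife test is commonly understood as leave-one-out cross-validation: each protein is held out in turn, the model is trained on all remaining proteins, and the held-out protein is predicted. The Python sketch below shows only this evaluation loop. Because an SVM would require an external library, it plugs in a simple nearest-centroid classifier (`nearest_centroid_predict`) as a clearly swapped-in stand-in; this is NOT the classifier used in the study, and any `predict(train_X, train_y, x)` function could be substituted.

```python
import math


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT the study's SVM): return the label whose mean vector is closest to x."""
    sums, counts = {}, {}
    for vec, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = math.dist(centroid, x)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Leave-one-out test: hold out each sample, train on the rest, and count correct predictions."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)
```

Each sample is predicted by a model that never saw it, which is why the jackknife test is regarded as a rigorous, deterministic estimate; its cost is one training run per sample.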
Conclusions: We hope the proposed method will become a useful tool for identifying the types of HSPs in
plants.