A novel method for predicting the location and length of transmembrane helices in GPCRs is proposed. The method uses a “one by one” amino acid feature extraction window, which allows it to learn the amino acid distribution in the helical segments of GPCR proteins. It is based on a hidden Markov model (HMM) with a specific architecture that uses the Viterbi decoding algorithm and observed frequency values to adjust the transition probabilities.
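To illustrate how Viterbi decoding turns an HMM into helix boundaries, the following is a minimal sketch and not the architecture used in this work. It assumes a hypothetical two-state model (`loop` and `helix`), a toy emission model that only distinguishes hydrophobic from non-hydrophobic residues, and illustrative probabilities chosen by hand. The names `viterbi`, `helix_segments` and `emission_logp` are introduced here for illustration only.

```python
import math

# Assumption: a two-state simplification; the real model has its own architecture.
STATES = ("loop", "helix")

# Assumption: a coarse hydrophobic alphabet used only for this toy emission model.
HYDROPHOBIC = set("AILMFVWC")


def emission_logp(state, residue):
    """Log-probability of observing a hydrophobic / non-hydrophobic residue."""
    p_hydrophobic = 0.7 if state == "helix" else 0.3
    p = p_hydrophobic if residue in HYDROPHOBIC else 1.0 - p_hydrophobic
    return math.log(p)


def viterbi(sequence, start_p, trans_p):
    """Return the most probable state path for an amino acid sequence."""
    if not sequence:
        return []
    # scores[s] holds the best log-probability of any path ending in state s.
    scores = {
        s: math.log(start_p[s]) + emission_logp(s, sequence[0]) for s in STATES
    }
    backpointers = []
    for residue in sequence[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda r: scores[r] + math.log(trans_p[r][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(trans_p[best_prev][s])
                + emission_logp(s, residue)
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def helix_segments(path):
    """Collapse a state path into (start, end) index pairs, end exclusive."""
    found, start = [], None
    for i, state in enumerate(path):
        if state == "helix" and start is None:
            start = i
        elif state != "helix" and start is not None:
            found.append((start, i))
            start = None
    if start is not None:
        found.append((start, len(path)))
    return found
```

Each decoded segment gives both the location (start index) and the length (end minus start) of a predicted helix. Working in log space avoids numerical underflow on long sequences, which matters when no sequence length limit is imposed.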
The prediction capability of the method was evaluated in terms of per-protein, per-segment and per-residue accuracy on two datasets extracted from the UniProt database, one of 649 sequences (at least one GPCR from each family) and one of 2898 sequences (all GPCRs), and was compared with that of other commonly used methods. In all three assessments, the prediction accuracies of the new method on the larger dataset (2898 GPCRs) were higher than those obtained by the other methods. The method was able to predict the topology of GPCR proteins without any sequence length limitation, with accuracies of 88.9% and 87.4% for the small (649 GPCRs) and large (2898 GPCRs) datasets, respectively. (Availability status: The source code is available upon request from the authors.)
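The three assessment levels named above can be made concrete with a short sketch. The definitions used here are common conventions and are assumptions, not necessarily those applied in this evaluation: a helix counts as found if a predicted helix overlaps it by at least `min_overlap` residues (a hypothetical threshold), and a protein counts as correct if the number of helices matches and every observed helix is found. Topologies are encoded as strings in which `H` marks a helical residue.

```python
def per_residue_accuracy(observed, predicted):
    """Fraction of residues whose predicted label equals the observed label."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have the same length")
    if not observed:
        return 0.0
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def segments(labels, target="H"):
    """Return (start, end) pairs, end exclusive, for runs of the target label."""
    found, start = [], None
    for i, label in enumerate(labels):
        if label == target and start is None:
            start = i
        elif label != target and start is not None:
            found.append((start, i))
            start = None
    if start is not None:
        found.append((start, len(labels)))
    return found


def per_segment_recall(observed, predicted, min_overlap=5):
    """Fraction of observed helices overlapped by some predicted helix."""
    obs, pred = segments(observed), segments(predicted)
    if not obs:
        return 0.0
    hits = sum(
        any(min(o_end, p_end) - max(o_start, p_start) >= min_overlap
            for p_start, p_end in pred)
        for o_start, o_end in obs
    )
    return hits / len(obs)


def protein_correct(observed, predicted, min_overlap=5):
    """True if helix counts match and every observed helix is found."""
    same_count = len(segments(observed)) == len(segments(predicted))
    return same_count and per_segment_recall(observed, predicted, min_overlap) == 1.0
```

Per-protein accuracy over a dataset is then the fraction of sequences for which `protein_correct` returns true, which is the strictest of the three measures.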