Antifreeze proteins (AFPs) play a distinctive role in maintaining homeostasis in living
organisms and protect their cells and bodies from freezing in extremely cold conditions.
Owing to the high diversity of protein sequences and structures, discriminating AFPs from
non-AFPs through experimental approaches is expensive and time-consuming. It is, therefore, highly
desirable to develop an intelligent, high-throughput computational model that identifies AFPs quickly and accurately.
To this end, a new predictor called “iAFP-gap-SMOTE” is proposed for the identification of
AFPs. Protein sequences are encoded numerically using three feature extraction schemes, namely
Split Amino Acid Composition, G-gap Dipeptide Composition, and Reduced Amino Acid Alphabet
Composition. On an imbalanced dataset, a classifier is usually biased towards the majority class.
The Synthetic Minority Over-sampling Technique (SMOTE) is therefore employed to increase the
number of minority-class instances and control this bias. A 10-fold cross-validation test is applied
to assess the success rates of the “iAFP-gap-SMOTE” model. In the empirical investigation, the
“iAFP-gap-SMOTE” model obtained an accuracy of 95.02%. The comparison suggests that the accuracy
of the “iAFP-gap-SMOTE” model is higher than that of the existing techniques in the literature.
The proposed “iAFP-gap-SMOTE” model may therefore be helpful to the research community and
academia.
Keywords: Antifreeze proteins, SMOTE, KNN, PNN, SVM, AFPs.
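
The abstract names the G-gap dipeptide encoding and minority-class oversampling without defining them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the common definition in which a g-gap dipeptide is a pair of residues separated by g intervening positions, with frequencies normalised by the number of counted pairs, and it illustrates the SMOTE idea as linear interpolation between a minority sample and one of its k nearest minority neighbours. The function names, the 20-letter alphabet constant, and the defaults for `gap`, `k`, and `seed` are all hypothetical choices made for illustration.

```python
import random
from itertools import product

# Assumption: the 20 standard amino acids; the abstract does not list the alphabet.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def g_gap_dipeptide_composition(sequence, gap=0):
    """Return 400 frequencies of residue pairs separated by `gap` positions.

    gap=0 counts adjacent residues; gap=1 skips one residue, and so on.
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    offset = gap + 1
    for i in range(len(sequence) - offset):
        pair = sequence[i] + sequence[i + offset]
        if pair in counts:  # ignore pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    # An all-zero vector is returned when the sequence is too short to form a pair.
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create `n_new` synthetic minority samples by SMOTE-style interpolation.

    Requires at least two minority samples. Each synthetic point lies on the
    segment between a randomly chosen sample and one of its k nearest
    minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: squared_distance(base, x))[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + t * (n - b) for b, n in zip(base, neighbour)])
    return synthetic
```

In such a sketch, each minority-class sequence would first be encoded with `g_gap_dipeptide_composition`, and the resulting vectors passed to `smote_like_oversample` to balance the classes before training. When this is combined with cross-validation, oversampling is normally applied to the training folds only, so that synthetic points do not leak into the held-out fold.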