Title:Predicting Inhibitors for Multidrug Resistance Associated Protein-2 Transporter by Machine Learning Approach
VOLUME: 21 ISSUE: 8
Author(s):Sahil Kharangarh, Hardeep Sandhu, Sujit Tangadpalliwar and Prabha Garg*
Affiliation:Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S Nagar, Punjab-160062
Keywords:Multidrug Resistance Associated Protein-2 (MRP2), Machine Learning, Support Vector Machine (SVM), Random
Forest (RF), k-Nearest Neighbor (k-NN), model development.
Abstract:Background: The efflux transporter multidrug resistance associated protein-2 belongs to the
ATP-binding cassette superfamily and plays an important role in multidrug resistance and drug-drug
interactions. Efflux transporters are considered important targets for increasing the
efficacy of drugs, and the importance of computational studies of efflux transporters for predicting
substrates, non-substrates, inhibitors and non-inhibitors is well documented. Previous work on
predictive models for inhibitors of the multidrug resistance associated protein-2 efflux transporter
showed that machine learning methods produced good results.
Objective: The aim of the present work was to develop a machine learning predictive model to
classify inhibitors and non-inhibitors of the multidrug resistance associated protein-2 transporter using a
well-refined dataset.
Method: In this study, three machine learning algorithms were used to develop the
predictive models: support vector machine, random forest and k-nearest neighbor. Features
generated by PyDPI were selected using the variance threshold, SelectKBest, random forest and
recursive feature elimination methods. A total of 239 molecules, consisting of 124 inhibitors and 115
non-inhibitors, were used for model development.
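The workflow described above (feature filtering, recursive feature elimination, a support vector machine classifier, and accuracy on training, 5-fold cross-validation and external sets) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation: the descriptor matrix is a synthetic placeholder rather than PyDPI output, and the 80/20 split, the number of retained features, the RFE step size and the kernel settings are all assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a descriptor matrix: 239 "molecules", 50 placeholder
# features, two roughly balanced classes (assumption, not the paper's dataset).
X, y = make_classification(
    n_samples=239, n_features=50, n_informative=10, random_state=0
)

# Hold out an external set; the 80/20 stratified split is an assumption.
X_train, X_ext, y_train, y_ext = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = Pipeline([
    # Drop constant features (zero variance).
    ("variance", VarianceThreshold(threshold=0.0)),
    # Scale features so the SVM is not dominated by large-valued descriptors.
    ("scale", StandardScaler()),
    # Recursive feature elimination ranked by a linear SVM's coefficients;
    # keeping 15 features and removing 5 per round are illustrative choices.
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=15, step=5)),
    # Final classifier; the RBF kernel and C=1.0 are assumptions.
    ("svm", SVC(kernel="rbf", C=1.0, gamma="scale")),
])

# 5-fold cross-validation on the training set; feature selection is refit
# inside each fold because it is part of the pipeline.
cv_acc = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()

model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
ext_acc = model.score(X_ext, y_ext)
n_selected = int(model.named_steps["rfe"].support_.sum())

print(f"features kept: {n_selected}")
print(f"train={train_acc:.2f}  cv={cv_acc:.2f}  external={ext_acc:.2f}")
```

Placing the selection steps inside the pipeline keeps the cross-validation estimate free of information from the held-out folds; whether the original study arranged its selection this way is not stated in the abstract.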
Results: The best multidrug resistance associated protein-2 inhibitor model showed prediction
accuracies of 0.76, 0.72 and 0.79 for training, 5-fold cross-validation and external sets, respectively.
Conclusion: The support vector machine model built on features selected using the
recursive feature elimination method showed the best performance. The developed model can be used
in the early stages of drug discovery for identifying inhibitors of the multidrug resistance associated
protein-2 efflux transporter.