Amino acid substitutions in HIV-1 proteins critical to the viral replication cycle have the
potential to undermine successful inhibition of those targets, with some mutations leading to either
reduced susceptibility to certain medications or complete drug resistance. Phenotypic tests are best
suited for quantifying the effects of complex mutational patterns on drug resistance; however, the
relatively high cost and long turnaround time associated with phenotyping have increased the demand
for in silico drug-specific models capable of accurately predicting phenotype directly from the target
protein sequences. This study focuses on the HIV-1 integrase (IN) enzyme, which mediates
integration of reverse-transcribed viral DNA into the host cell genome, and on the development of predictive statistical
learning models of resistance to the IN inhibitors Raltegravir (RAL) and Elvitegravir (EVG). Models were trained on
datasets of IN protein sequence variants, each with a known phenotype measured in an experimental assay and expressed as
the fold change in susceptibility to the respective inhibitor. A sequence-based approach using the relative frequencies of
n-grams was implemented to uniquely characterize each IN variant as a feature vector of input attributes.
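The n-gram encoding described above can be illustrated with a minimal sketch. It assumes that n-grams are overlapping, contiguous runs of n amino-acid residues, that each count is normalized by the total number of n-grams in the sequence, that the vocabulary is all 20^n combinations of the 20 standard residues, and that n = 1 and n = 2 are concatenated. The function names and the choice of n values are hypothetical and not taken from the study.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids (one-letter codes) define the vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_vocabulary(n):
    """All 20**n possible n-grams, in a fixed lexicographic order."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]


def ngram_relative_frequencies(sequence, n):
    """Relative frequency of each vocabulary n-gram among the overlapping n-grams of `sequence`.

    N-grams containing non-standard symbols (e.g. 'X') count toward the total
    but have no vocabulary slot, so the vector may then sum to less than 1.
    """
    grams = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return [counts[g] / total if total else 0.0 for g in ngram_vocabulary(n)]


def feature_vector(sequence, ns=(1, 2)):
    """Concatenate the relative-frequency vectors for each n in `ns` (hypothetical choice)."""
    vec = []
    for n in ns:
        vec.extend(ngram_relative_frequencies(sequence, n))
    return vec
```

For example, `feature_vector("ACDA", ns=(1,))` yields 0.5 for A and 0.25 each for C and D, with zeros elsewhere. Because the vocabulary is fixed, every variant maps to a vector of the same length (420 for n = 1, 2), which is what a downstream classifier or regressor expects.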
Models classifying IN variants as susceptible or resistant reach cross-validated balanced accuracies of 89% for
RAL and 85% for EVG. Additionally, regression models achieve Pearson correlation coefficients between
experimental and predicted log-transformed phenotypic fold-change values as high as r = 0.80 for RAL and r = 0.76
for EVG. Our results suggest that, as additional training data become publicly available, these models may hold promise
as supplementary tools for making treatment decisions.
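The two evaluation metrics reported above can be sketched as follows. Balanced accuracy is the mean of sensitivity (resistant class) and specificity (susceptible class), and the regression score is Pearson's r between log-transformed observed and predicted fold changes. The abstract does not state the fold-change cutoff used to assign class labels or the base of the logarithm, so `FC_CUTOFF` and the use of log10 are assumptions; all function names are hypothetical.

```python
import math

# Hypothetical cutoff separating "susceptible" (0) from "resistant" (1);
# the study's actual threshold is not given in the abstract.
FC_CUTOFF = 3.0


def label_resistant(fold_change, cutoff=FC_CUTOFF):
    """Binary label: 1 (resistant) if fold change exceeds the assumed cutoff, else 0."""
    return 1 if fold_change > cutoff else 0


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; both classes must be present in y_true."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_fc_correlation(observed_fc, predicted_fc):
    """Pearson r on log10 fold changes (r does not depend on the log base chosen)."""
    return pearson_r([math.log10(v) for v in observed_fc],
                     [math.log10(v) for v in predicted_fc])
```

Balanced accuracy matters when one class dominates: with one resistant and nine susceptible variants, a model that always predicts "susceptible" has a plain accuracy of 0.9 but a balanced accuracy of only 0.5. Correlating log-transformed values keeps a few very large fold changes from dominating r.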
Keywords: Drug resistance, genotype-phenotype correlations, HIV-1 integrase, n-grams, regression, statistical learning
models, supervised classification.