Selecting proper targets for X-ray crystallography will greatly benefit the biological research
community. Several computational models have been proposed to predict the propensity for successful
protein production and diffraction-quality crystallization from protein sequences. We reviewed a comprehensive
collection of 22 such predictors that were developed in the last decade. We found that almost
all of these models are easily accessible as webservers and/or standalone software, and we demonstrated
that some of them are widely used by the research community. We empirically evaluated and compared
the predictive performance of seven representative methods. The analysis suggests that these methods
produce quite accurate propensities for diffraction-quality crystallization. We also summarized the results
of the first study of the relation between these predicted propensities and the resolution of crystallizable
proteins. We found that the propensities predicted by several methods are significantly higher
for proteins that have high-resolution structures than for those with low-resolution structures.
Moreover, we tested a new meta-predictor, MetaXXC, which averages the propensities generated by the
three most accurate predictors of diffraction-quality crystallization. MetaXXC generates putative
values of resolution that correlate modestly with the experimental resolutions, and it offers
a lower mean absolute error than any of the seven considered methods. We conclude that protein
sequences can be used to predict fairly accurately whether their corresponding protein structures can
be solved using X-ray crystallography. We also find that sequences can be used to predict, with reasonable
accuracy, the resolution of the resulting protein crystals.
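
The averaging and evaluation steps mentioned above can be illustrated with a minimal sketch. It assumes only what the abstract states: a meta-predictor that averages the propensities of three base predictors, and an evaluation by mean absolute error and correlation against experimental resolutions. The function names and the use of Pearson correlation are illustrative assumptions, and the sketch does not reproduce how MetaXXC converts averaged propensities into putative resolution values.

```python
import math


def average_propensity(propensities):
    """Average the propensities of the base predictors for one protein.

    `propensities` holds one score per base predictor (hypothetical input).
    """
    return sum(propensities) / len(propensities)


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between putative and experimental values."""
    if len(predicted) != len(observed):
        raise ValueError("inputs must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient (assumed here; the abstract does not
    name the correlation measure that was used)."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

For example, `average_propensity([0.2, 0.4, 0.6])` combines three made-up base scores into a single meta-propensity, and `mean_absolute_error` can then compare any list of putative resolutions with the experimental ones.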
Keywords: X-ray crystallography, diffraction-quality crystallization, protein production, resolution of protein crystals,
meta-prediction, protein structure, prediction.