Background: CRISPR/Cas9, a new generation of targeted gene editing technology with low
cost and simple operation, has been widely employed in the field of gene editing. The erroneous cutting
of off-target sites by CRISPR/Cas9 is called the off-target effect, and it is the biggest complication that
CRISPR/Cas9 confronts in practical applications. Specifically, off-target effects can lead to unexpected
gene editing results. Accurately predicting CRISPR/Cas9 off-target effects is therefore an
important task. Predicting off-target effects of CRISPR/Cas9 with machine learning methods is feasible,
but most existing off-target prediction tools pay little attention to how gene encoding affects prediction.
Methods: We compared three One-Hot-based encoding methods and combined the gene sequence
with four CRISPR/Cas9 off-target prediction tools to build an XGBoost ensemble model, designated
XGBCRISPR. Grid search was employed to find the parameters that give the best performance.
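To make the encoding step concrete, the following is a minimal sketch of two common ways to One-Hot encode an sgRNA/off-target DNA pair. The abstract does not state which three variants were compared, so the function names and the two combination schemes shown here (per-position concatenation and per-position bitwise OR) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: the three One-Hot variants compared in the paper
# are not specified in the abstract. Both schemes below are assumptions.
BASES = "ACGT"


def one_hot(base):
    """Return a 4-element one-hot vector for one nucleotide (all zeros if unknown)."""
    return [1 if base == b else 0 for b in BASES]


def encode_concat(sgrna, target):
    """Concatenate the one-hot vectors of both sequences: 8 values per position."""
    if len(sgrna) != len(target):
        raise ValueError("sequences must have equal length")
    features = []
    for g, t in zip(sgrna, target):
        features.extend(one_hot(g) + one_hot(t))
    return features


def encode_or(sgrna, target):
    """Bitwise OR of the two one-hot vectors: 4 values per position.

    A matched position has a single 1; a mismatched position has two.
    """
    if len(sgrna) != len(target):
        raise ValueError("sequences must have equal length")
    features = []
    for g, t in zip(sgrna, target):
        features.extend(a | b for a, b in zip(one_hot(g), one_hot(t)))
    return features
```

Either feature vector could then be extended with the scores of existing prediction tools before being passed to a gradient-boosted classifier; how the paper actually joins them is not described in the abstract.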
Results: Performance was compared with that of existing tools using the ROC value and PRC value. The
experimental results show that the XGBCRISPR model outperforms the existing tools.
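As an illustration of the two evaluation metrics, the sketch below computes the area under the ROC curve and a step-wise area under the precision-recall curve (average precision) in plain Python. It assumes that "ROC value" and "PRC value" refer to these areas, and that labels are 0/1 with higher scores meaning "more likely off-target"; the function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve.

    Averages the precision observed at the rank of each positive example.
    Tied scores are ordered arbitrarily, which is a simplification.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos
```

Because off-target sites are typically far rarer than non-off-target candidates, the precision-recall measure is usually the more informative of the two on such imbalanced data.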
Conclusion: The new model achieves better prediction results than existing tools, and its accuracy
could be improved further as more off-target scores become available.