Background and Objective: Colorectal cancer (CRC) is a common malignant tumor of
the digestive system and is associated with high morbidity and mortality. However, early
prediction of colorectal adenoma (CRA), a precancerous condition in most patients with CRC,
provides an opportunity to plan appropriate prevention, early diagnosis, and
treatment. This study aimed to develop a machine learning model to predict CRA that could help
physicians identify high-risk patients, make informed choices, and prevent CRC.
Methods: Patients who had undergone a colonoscopy at the Sixth People
Hospital of Shanghai, China, from July 2018 to November 2018 were asked to fill out a questionnaire. A classification
model with the gradient boosting decision tree (GBDT) was developed to predict CRA. This
model was compared with three other models, namely, random forest (RF), support vector
machine (SVM), and logistic regression (LR). The area under the receiver operating characteristic
curve (AUC) was used to evaluate the performance of the models.
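To illustrate the evaluation metric, the AUC can be read as the probability that a randomly chosen patient with CRA receives a higher predicted score than a randomly chosen patient without CRA. The following is a minimal rank-based sketch of that calculation, not the code used in the study; the function name `roc_auc`, the 0/1 label encoding, and the toy scores are assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Rank-based AUC (normalized Mann-Whitney U statistic).

    labels: sequence of 0/1 values (1 = CRA present; hypothetical encoding)
    scores: sequence of predicted risk scores, higher = more likely positive
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)

    # Assign 1-based ranks; tied scores share the average of their ranks.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative case")

    # Sum of the ranks held by positive cases, converted to U and normalized.
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: two patients without CRA, two with CRA (made-up scores).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the four models are compared.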
Results: Among the 245 included patients, 65 had CRA. The AUCs of GBDT, RF, SVM, and LR
with 10-fold cross-validation were 0.8131, 0.74, 0.769, and 0.763, respectively. An online
prediction service, the CRA Inference System, was also built to put the proposed solution into
practice for patients with CRA.
Conclusion: Four classification models for CRA prediction were developed and compared, and
the GBDT model showed the highest performance. Implementing a GBDT model for screening
can reduce time and financial costs and help physicians identify high-risk groups for primary