Background: Sepsis is a life-threatening disease caused by a dysregulated host
response to infection and is a major cause of death among patients in the intensive care unit (ICU).
Objective: Early diagnosis of sepsis could significantly reduce in-hospital mortality. Although
sepsis originates from infection, its development follows its own pathophysiological course and
patterns, and it varies with sex, health status and other factors. Hence, analyzing large-scale data
with bioinformatics tools and machine learning is a promising approach to early diagnosis.
Methods: We collected miRNA and mRNA expression data from sepsis blood samples in the Gene
Expression Omnibus (GEO) and ArrayExpress databases, screened for differentially expressed
genes (DEGs) using R, predicted miRNA targets with TargetScanHuman and miRTarBase,
and performed Gene Ontology (GO) term and KEGG pathway enrichment analyses on the
overlapping DEGs. The STRING database and Cytoscape were used to build a protein-protein
interaction (PPI) network and identify hub genes. Finally, we constructed a Random Forest model
from the hub genes to classify sample type.
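The overlap and hub-gene steps can be sketched in a few lines. This is a minimal illustration, not the study's pipeline: it assumes "overlapping DEGs" means mRNA DEGs that are also predicted targets of differentially expressed miRNAs, and it ranks hub genes by degree (number of interaction partners), which is only one of several centrality measures available in Cytoscape; the abstract does not say which was used. The function names and gene labels are hypothetical.

```python
from collections import Counter


def overlapping_degs(mrna_degs, mirna_targets):
    """Return genes that are both differentially expressed and predicted miRNA targets.

    Assumption: this is the meaning of "overlapping DEGs" (hypothetical helper).
    """
    return sorted(set(mrna_degs) & set(mirna_targets))


def rank_hub_genes(edges, genes, top_k):
    """Rank genes by degree in an undirected PPI edge list restricted to `genes`.

    Degree centrality is an assumed choice; other metrics could be substituted.
    """
    keep = set(genes)
    seen = set()
    degree = Counter()
    for a, b in edges:
        if a == b or a not in keep or b not in keep:
            continue  # skip self-interactions and edges leaving the gene set
        key = (a, b) if a < b else (b, a)
        if key in seen:
            continue  # an exported edge list may contain both A-B and B-A
        seen.add(key)
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties broken alphabetically so the result is deterministic
    ranked = sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_k]]


# Toy usage with hypothetical gene names
degs = ["G1", "G2", "G3", "G4", "G5"]
targets = ["G2", "G3", "G5", "G9"]
overlap = overlapping_degs(degs, targets)
edges = [("G2", "G3"), ("G3", "G2"), ("G3", "G5"), ("G5", "G7"), ("G3", "G7")]
print(overlap, rank_hub_genes(edges, overlap, 2))
```

Restricting the network to the overlapping genes mirrors building the STRING network from those genes only; edges to genes outside the set (here `G7`) do not contribute to degree.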
Results: Bioinformatic analysis of the GEO data revealed 46 overlapping DEGs in sepsis. PPI
network analysis identified five hub genes: SOCS3, KBTBD6, FBXL5, FEM1C and WSB1.
A Random Forest model based on these five hub genes was applied to the GSE95233 and GSE95233
datasets, yielding areas under the receiver operating characteristic (ROC) curve (AUCs) of 0.900
and 0.7988, respectively, which supports the efficacy of this model.
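For readers unfamiliar with the AUC metric, it can be computed without drawing the curve: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). The sketch below assumes binary labels (1 = sepsis, 0 = control) and that each score is the model's predicted probability of sepsis; the abstract does not state which software or settings were used to compute the reported AUCs, and `roc_auc` is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """Area under the empirical ROC curve via pairwise comparison.

    labels: 1 for positive (assumed sepsis), 0 for negative (assumed control).
    scores: higher means more likely positive (e.g. a predicted probability).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive correctly ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy usage with made-up scores (not data from the study)
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.6, 0.4, 0.2]
print(roc_auc(labels, scores))
```

Here 8 of the 9 positive-negative pairs are ranked correctly, so the AUC is 8/9. The pairwise loop is quadratic in sample count, which is acceptable for cohorts of the size typical of GEO datasets; a rank-based version would scale better.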
Conclusion: The integrated analysis of gene expression in sepsis and the effective Random Forest
model built in this study may offer a promising approach to diagnosing sepsis.