Background: Francisella tularensis is a stealth pathogen fatal to animals and humans.
Its ease of propagation, coupled with its high capacity to cause illness and death, makes it a
potential candidate for use as a biological weapon.
Objective: Work on the pathogen’s classification and on the factors affecting its prolonged
persistence in soil has been limited to statistical measures. Machine learning, as an alternative
to conventional analysis methods, may be applied to improve epidemiological modeling for this
soil-borne pathogen.
Methods: Three feature-ranking algorithms (Relief, correlation and OneR) are used to rank the
soil attributes. In addition, five classification algorithms (SVM, random forest, naive Bayes,
logistic regression and MLP) are used to classify the soil attribute dataset into Francisella
tularensis positive and negative soils.
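The ranking step described above can be illustrated with a minimal sketch in plain Python. It covers only two of the three rankers: a correlation score and a simplified OneR score (accuracy of a one-attribute rule over equal-width bins). Relief is not shown. The function names, the bin count and the toy soil values are all hypothetical and are not taken from the study's dataset or tooling.

```python
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between one attribute and the class label."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no ranking information
    return abs(cov / sqrt(vx * vy))


def one_r_accuracy(xs, ys, bins=3):
    """Simplified OneR score: split the attribute into equal-width bins,
    predict the majority class in each bin, and return training accuracy."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant columns
    votes = defaultdict(Counter)
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), bins - 1)  # clamp the maximum value
        votes[b][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in votes.values())
    return correct / len(ys)


def rank(features, labels, score):
    """Return attribute names ordered from highest to lowest score."""
    return sorted(features, key=lambda name: score(features[name], labels),
                  reverse=True)


# Hypothetical toy values, NOT the study's data: 1 = positive soil, 0 = negative.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "potassium": [5, 9, 7, 6, 8, 7],
    "clay": [40, 42, 38, 12, 15, 10],
}

print("correlation ranking:", rank(features, labels, pearson_abs))
print("OneR ranking:", rank(features, labels, one_r_accuracy))
```

In a workflow like the one described, the top-ranked attributes from each ranker would then be passed to the classifiers, so that accuracy can be compared across ranking methods.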
Results: The feature-ranking methods indicated that clay, nitrogen, organic matter, soluble salts,
zinc, silt and nickel are the most significant attributes, while potassium, phosphorus, iron,
calcium, copper, chromium and sand are the least contributing risk factors for the persistence of
the pathogen. Overall, clay is the most significant attribute and potassium the least contributing.
Data analysis suggests that feature ranking using Relief produced classification accuracies of
84.35% for the multilayer perceptron (MLP), 82.99% for logistic regression, 80.27% for both SVM
and random forest, and 78.23% for naive Bayes, which is better than the other ranking methods.
MLP outperforms the other classifiers, achieving accuracies of 84.35%, 82.99% and 81.63% with
feature ranking by Relief, correlation and OneR, respectively.
Conclusion: These models can significantly improve classification accuracy and minimize the risk
of incorrect classification. They can further help in controlling epidemics, thereby minimizing
the socio-economic impact on society.