Title:Deep Learning Model for Pathogen Classification Using Feature Fusion and Data Augmentation
VOLUME: 16 ISSUE: 3
Author(s):Fareed Ahmad*, Amjad Farooq and Muhammad Usman Ghani Khan
Affiliation:Department of Computer Science and Engineering, Faculty of Electrical Engineering, University of Engineering and Technology, Lahore, Department of Computer Science and Engineering, Faculty of Electrical Engineering, University of Engineering and Technology, Lahore, Department of Computer Science and Engineering, Faculty of Electrical Engineering, University of Engineering and Technology, Lahore
Keywords:Pathogen classification, augmentation, feature fusion, deep learning models, fine-tuning, transfer learning, zoonotic
diseases, disease outbreaks.
Abstract:Background: Bacterial pathogens are deadly for animals and humans. The ease of their
dissemination, coupled with their high capacity to cause illness and death in infected individuals, makes
them a threat to society.
Objective: Due to the high similarity among genera and species of pathogens, it is sometimes
difficult for microbiologists to differentiate between them. Automatic classification using deep-learning
models can help obtain reliable and accurate results.
Methods: Four deep-learning models, namely AlexNet, GoogleNet, ResNet101, and InceptionV3, are used
with several variations: training the model from scratch, fine-tuning without pre-trained
weights, fine-tuning with the weights of the initial layers frozen, fine-tuning with the
weights of all layers adjusted, and augmenting the dataset by random translation and reflection. Because the
dataset is small, fine-tuning and data augmentation are applied to avoid overfitting and
produce a generalized model. A merged feature vector is produced from the two best-performing models,
and accuracy is calculated by applying the XGBoost algorithm to this feature vector with cross-validation.
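The feature-fusion and cross-validation step can be illustrated with a minimal sketch. It is not the authors' implementation: the feature values are synthetic, the helper names (`fuse`, `k_fold_indices`, `cross_val_accuracy`) are hypothetical, and a simple nearest-centroid classifier stands in for XGBoost so that the example needs only the Python standard library. It shows only the general pattern of concatenating per-sample features from two networks and scoring a classifier on the merged vector with k-fold cross-validation.

```python
import random
from statistics import mean


def fuse(features_a, features_b):
    """Concatenate two per-sample feature vectors into one merged vector."""
    return [a + b for a, b in zip(features_a, features_b)]


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*rows)]


def nearest_centroid_predict(train_x, train_y, x):
    """Stand-in classifier (NOT XGBoost): pick the class with the closest mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        c = centroid([r for r, y in zip(train_x, train_y) if y == label])
        dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_val_accuracy(x, y, k):
    """Mean accuracy over k folds, each fold held out once for testing."""
    scores = []
    for fold in k_fold_indices(len(x), k):
        held_out = set(fold)
        train_x = [x[i] for i in range(len(x)) if i not in held_out]
        train_y = [y[i] for i in range(len(x)) if i not in held_out]
        correct = sum(
            nearest_centroid_predict(train_x, train_y, x[i]) == y[i] for i in fold
        )
        scores.append(correct / len(fold))
    return mean(scores)


# Synthetic stand-ins for features extracted by two networks (hypothetical data).
rng = random.Random(42)
labels = [i % 2 for i in range(40)]
feats_a = [[lab + rng.gauss(0, 0.1) for _ in range(4)] for lab in labels]
feats_b = [[-lab + rng.gauss(0, 0.1) for _ in range(3)] for lab in labels]

fused = fuse(feats_a, feats_b)          # 4 + 3 = 7 features per sample
accuracy = cross_val_accuracy(fused, labels, k=5)
print(len(fused[0]), accuracy)
```

In a real pipeline the two feature lists would come from the penultimate layers of the fine-tuned networks, and the stand-in classifier would be replaced by a gradient-boosted tree model; the fold count `k=5` here is arbitrary and chosen only to keep the example small.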
Results: Fine-tuned models with augmentation produce the best results. Of these,
the two best-performing deep models (ResNet101 and InceptionV3), selected for feature fusion,
each produced a validation accuracy of 95.83% with losses of 0.0213 and 0.1066, and testing
accuracies of 97.92% and 93.75%, respectively. The proposed model used XGBoost to attain a
classification accuracy of 98.17% with 35-fold cross-validation.
Conclusion: Automatic classification using these models can help experts identify
pathogens correctly. Consequently, the models can help in controlling epidemics and thereby
minimizing the socio-economic impact on the community.