Background: Mitochondria are membrane-bound organelles found in most eukaryotic cells. Their most prominent functions are the generation of ATP and the regulation of cellular metabolism. Because of this central role in metabolism, mitochondrial dysfunction has been associated with many diseases. Disorders reported to result from mitochondrial damage and dysfunction include cancer, diabetes mellitus and neurodegenerative diseases, which affect millions of people worldwide. This has made mitochondrial processes an attractive and novel target for potential therapeutic intervention. Cheminformatics tools make it possible to prioritize and study in depth small molecules with mitochondrial phenotypes much faster and at lower cost than traditional high-throughput screening.
Results: We used a publicly available dataset of inhibitors of mitochondrial fusion to build accurate predictive cheminformatics models. We applied machine learning based classification algorithms and further enhanced this approach with a maximum common substructure (MCS) analysis. Three classification algorithms, namely Naive Bayes, Random Forest and J48, were used in the present study. The Random Forest based model was found to be the most accurate, with an accuracy of about 80%. As a proof of application, the model was further used to prioritize a subset of drug-like molecules from a large chemical library, ZINC, and to annotate potential new mechanisms of action of molecules with anti-cancer activities.
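To make the classification step concrete, the following is a minimal sketch of one of the three named algorithms, Naive Bayes, applied to binary molecular fingerprints. It is an illustration under stated assumptions, not the implementation used in the study: the 4-bit fingerprints, the labels, and the function names `train_bernoulli_nb`, `predict` and `accuracy` are all hypothetical, and the Bernoulli variant with Laplace smoothing is our choice for the example.

```python
import math

def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on binary feature vectors.

    X: list of equal-length lists of 0/1 bits (hypothetical fingerprints).
    y: list of class labels (1 = active, 0 = inactive).
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    n_features = len(X[0])
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        prior = len(rows) / len(X)
        # Smoothed probability that each bit is set within this class.
        p_on = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
        model[label] = (math.log(prior), p_on)
    return model

def predict(model, x):
    """Return the label with the highest log posterior for bit vector x."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, p_on) in model.items():
        score = log_prior
        for bit, p in zip(x, p_on):
            score += math.log(p if bit else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, X, y):
    """Fraction of vectors in X whose predicted label matches y."""
    correct = sum(predict(model, x) == t for x, t in zip(X, y))
    return correct / len(y)

# Toy, made-up data: 4-bit fingerprints with activity labels.
X_train = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
y_train = [1, 1, 1, 0, 0, 0]
model = train_bernoulli_nb(X_train, y_train)
```

Calling `predict(model, fingerprint)` on an unseen bit vector returns the more probable class, which is the operation a prioritization run over a large library would repeat for every molecule. In practice, accuracy would be estimated on held-out data or by cross-validation rather than on the training set.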
Conclusions: We show that machine learning approaches can be used effectively to build highly accurate classification models for high-throughput screening datasets. As a proof of concept, we show that such models can be used to screen and prioritize large datasets in silico for further experimental validation, and to assign potential mechanisms of action to molecules.