MegaMiner: A Tool for Lead Identification Through Text Mining Using Chemoinformatics Tools and Cloud Computing Environment
Virtual screening is an indispensable tool for coping with the massive amount of data generated by high-throughput
omics technologies. To enhance the automation of the virtual screening process, a
robust portal termed MegaMiner has been built on a cloud computing platform, wherein the user submits a text query
and directly accesses the proposed lead molecules along with their drug-like, lead-like and docking scores. Textual
chemical structural data representation is fraught with ambiguity in the absence of a global identifier. We have used a
combination of statistical models, a chemical dictionary and regular expressions to build a disease-specific dictionary. To
demonstrate the effectiveness of this approach, a case study on malaria has been carried out in the present work.
MegaMiner offered superior results compared to other text mining search engines, as established by F score analysis. A
single query term 'malaria' in the portlet led to the retrieval of related PubMed records, protein classes, drug classes and 8000
scaffolds, which were internally processed and filtered to suggest new molecules as potential anti-malarials. The results
obtained were validated by docking the virtual molecules into relevant protein targets. It is hoped that MegaMiner will
serve as an indispensable tool not only for identifying hidden relationships between various biological and chemical
entities but also for building better corpora and ontologies.
Keywords: Chemoinformatics, cloud computing, malaria, text mining, virtual screening.
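The abstract describes combining a chemical dictionary with regular expressions and evaluating retrieval by F score. The following is a minimal illustrative sketch of that general idea, not MegaMiner's actual implementation: the dictionary entries, the suffix pattern, the example sentence and the function names are all assumptions made for illustration, and the statistical-model component is omitted.

```python
import re

# Hypothetical mini-dictionary of known anti-malarial names (illustrative only).
DICTIONARY = {"chloroquine", "artemisinin", "quinine"}

# Hypothetical regular expression: lowercase words ending in a drug-like suffix.
SUFFIX_PATTERN = re.compile(r"\b[a-z]+(?:quine|misinin)\b")


def tag_entities(text):
    """Return the set of candidate chemical names found by dictionary or regex."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))
    hits = tokens & DICTIONARY                    # exact dictionary matches
    hits |= set(SUFFIX_PATTERN.findall(lowered))  # pattern-based matches
    return hits


def f_score(predicted, gold, beta=1.0):
    """F-beta score of a predicted entity set against a gold-standard set."""
    tp = len(predicted & gold)   # correctly retrieved entities
    fp = len(predicted - gold)   # retrieved but not in the gold standard
    fn = len(gold - predicted)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    sentence = "Chloroquine and mefloquine were compared with artemisinin."
    found = tag_entities(sentence)
    gold = {"chloroquine", "mefloquine", "artemisinin", "quinine"}
    print(sorted(found), round(f_score(found, gold), 3))
```

In this sketch the regular expression recovers a name that is absent from the dictionary, which is the kind of complementarity a combined approach relies on; the F score then summarises precision and recall against a manually curated gold set.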