Background: The development of diagnostic decision support systems (DDSS) requires
a reliable and consistent knowledge base of diseases and their symptoms, signs, and diagnostic
tests. Physicians are typically the source of this knowledge, but it is not always possible to obtain all the
desired information from them. Other valuable sources are medical books and articles describing the
diagnosis of diseases, but again, extracting this information is a hard and time-consuming task.
Objective: In this paper we present the results of our research comparing two well-known tools
used to perform NLP in the medical domain. We used these tools to perform Named Entity
Recognition (NER) in order to extract diagnostic terms from texts contained in MedLine Plus.
Method: We have used Web scraping, natural language processing (NLP) techniques, a variety of
publicly available sources of diagnostic knowledge and two widely known medical concept identifiers,
MetaMap and cTAKES, to extract diagnostic criteria for infectious diseases from MedLine Plus articles.
Results: We present a performance comparison of MetaMap and cTAKES. Although the overall
differences between the two systems are not substantial, there are noticeable differences in the
results each system provides.
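The abstract does not say which metrics were used for the comparison. As a purely illustrative sketch, assuming each system's output for an article has been reduced to a set of normalized terms (for example, UMLS concept identifiers, which both MetaMap and cTAKES map text to) and that a manually curated reference set exists, precision, recall and F1 per system, plus the terms on which the two systems disagree, could be computed as follows. The function names `prf`, `compare` and `disagreement` and all term sets are hypothetical and are not results from this study.

```python
# Hypothetical sketch: score the diagnostic terms extracted by several NER systems
# against a manually curated reference set. Terms are assumed to be normalized
# (e.g. UMLS concept identifiers) so that exact set matching is meaningful.
from typing import Dict, Set


def prf(extracted: Set[str], reference: Set[str]) -> Dict[str, float]:
    """Precision, recall and F1 of an extracted term set against a reference set."""
    tp = len(extracted & reference)  # terms found by the system that are in the reference
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reference) if reference else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def compare(systems: Dict[str, Set[str]], reference: Set[str]) -> Dict[str, Dict[str, float]]:
    """Score every system against the same reference set."""
    return {name: prf(terms, reference) for name, terms in systems.items()}


def disagreement(first: Set[str], second: Set[str]) -> Dict[str, Set[str]]:
    """Terms returned by only one of two systems."""
    return {"only_first": first - second, "only_second": second - first}


if __name__ == "__main__":
    # Made-up term sets for illustration only.
    reference = {"fever", "cough", "chest x-ray", "sputum culture"}
    systems = {
        "MetaMap": {"fever", "cough", "chest x-ray", "headache"},
        "cTAKES": {"fever", "cough", "sputum culture"},
    }
    for name, scores in compare(systems, reference).items():
        print(name, {k: round(v, 2) for k, v in scores.items()})
    print(disagreement(systems["MetaMap"], systems["cTAKES"]))
```

Exact string or identifier matching is the simplest choice; a real evaluation might also need partial-match or synonym handling, which this sketch does not attempt.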
Conclusion: The extraction of diagnostic terms is an important task for building databases that
contain this information. NLP systems capable of extracting these terms from text are valuable
tools that need to be implemented and evaluated in order to maximize the accuracy of this process.