Biomolecular-Level Event Detection: A New Representation of Generating Short Sentence and Sample Selection Strategy

Author(s): Yang Lu, Xiaolei Ma, Yinan Lu*, Zhili Pei.

Journal Name: Current Bioinformatics

Volume 14, Issue 4, 2019

Background: Biomolecular-level event extraction is one of the most important branches of information extraction. With the rapid growth of the biomedical literature, researchers find it difficult to manually locate information of interest, e.g. previously unknown information about diseases that threaten human health or about particular biological processes. Automatic acquisition of biomolecular-level event information is therefore of considerable interest. However, annotated biomolecular-level event corpora are limited in size and highly imbalanced, which degrades the performance of classification algorithms and can even lead to over-fitting.

Method: This paper introduces a new approach to biomolecular-level event extraction that combines the Pairwise model with a convolutional neural network. The method identifies positive instances in unlabeled data more accurately and uses them to enlarge the labeled data. First, unlabeled samples are categorized using the Pairwise model. Then, the shortest dependency path, augmented with additional information, is generated. Next, two input forms with a new representation are presented for the convolutional neural network model: the dependency word sequence and the dependency relation sequence. Finally, under the sample selection strategy, labeled samples expanded from the unlabeled domain corpus incrementally enlarge the training data to improve the performance of the classifier.
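As an illustration of the two input forms named above, the following is a minimal sketch, not the authors' implementation, of how a shortest dependency path between two tokens might be turned into a dependency word sequence and a dependency relation sequence. The function name, the triple format, the example sentence, and its hand-written parse are all hypothetical; the sketch also assumes each token string is unique within the sentence and omits the "additional information" mentioned in the abstract, which is not specified there.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return (word_sequence, relation_sequence) for the shortest path
    between two tokens, or None if they are not connected.

    `edges` is a list of (head, relation, dependent) triples. This format
    is an assumption made for illustration only.
    """
    # Treat the dependency tree as an undirected graph, keeping the
    # relation label on every edge.
    graph = {}
    for head, relation, dependent in edges:
        graph.setdefault(head, []).append((dependent, relation))
        graph.setdefault(dependent, []).append((head, relation))

    # Breadth-first search records, for each reached token, the token it
    # was reached from and the relation on the connecting edge.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbour, relation in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, relation)
                queue.append(neighbour)

    if target not in previous:
        return None

    # Walk back from the target to the source, then reverse both lists.
    words, relations = [target], []
    node = target
    while previous[node] is not None:
        node, relation = previous[node]
        words.append(node)
        relations.append(relation)
    words.reverse()
    relations.reverse()
    return words, relations


# Hand-written toy parse of "IL-2 induces phosphorylation of STAT5"
# (hypothetical; a real system would take this from a dependency parser).
toy_edges = [
    ("induces", "nsubj", "IL-2"),
    ("induces", "dobj", "phosphorylation"),
    ("phosphorylation", "nmod", "STAT5"),
]

word_seq, relation_seq = shortest_dependency_path(toy_edges, "IL-2", "STAT5")
print(word_seq)
print(relation_seq)
```

In a pipeline of the kind the abstract describes, each of the two sequences would then be mapped to embeddings and passed to the convolutional neural network as a separate input; that step is left out here because the abstract does not give its details.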

Result & Conclusion: The proposed method achieved better performance than other leading systems. We attribute this to the new representation of the generated short sentences and to the proposed sample selection strategy, which greatly improved classification accuracy. The extensive experimental results indicate that the new method can effectively incorporate unlabeled data to improve classifier performance for biomolecular-level event extraction.

Keywords: Biomolecular-level event, protein complex event, short sentence generation, short sentence representation, sample selection strategy, word embedding.



Article Details

Year: 2019
Page: [359 - 370]
Pages: 12
DOI: 10.2174/1574893614666190204153531
