Designing a Chat-bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Radha Guha

Abstract

Background: In an era of information overload, it is difficult for a human reader to quickly make sense of the vast amount of information available on the internet. Even within a specific domain such as a college or university website, a user may struggle to browse through all the links and find relevant answers quickly.

Objective: In this scenario, a chat-bot that can answer questions about college information and compare colleges would be both useful and novel.

Methods: In this paper, a novel conversational chat-bot application with information retrieval and text summarization skills is designed and implemented. First, the chat-bot has a simple dialog skill: when it can understand the intent of the user query, it responds from a stored collection of answers. Second, for unknown queries, the chat-bot can search the internet and then summarize the retrieved text using advanced techniques of natural language processing (NLP) and text mining (TM).
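The two-stage flow described in the Methods can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `STORED_ANSWERS` table, the bag-of-words cosine matching, the `threshold` value and the `fallback` callable (standing in for the search-and-summarize step) are all assumptions introduced here.

```python
import math
import re
from collections import Counter

# Hypothetical stored collection: example phrasing of an intent -> canned answer.
STORED_ANSWERS = {
    "what are the admission criteria": "Admission criteria are listed on the admissions page.",
    "what are the course fees": "Fee details are published by the accounts office.",
}


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def answer(query, fallback, threshold=0.5):
    """Return a stored answer when the query matches a known intent,
    otherwise delegate to `fallback` (assumed to search and summarize)."""
    q = bag_of_words(query)
    best_phrase, best_score = None, 0.0
    for phrase in STORED_ANSWERS:
        score = cosine(q, bag_of_words(phrase))
        if score > best_score:
            best_phrase, best_score = phrase, score
    if best_phrase is not None and best_score >= threshold:
        return STORED_ANSWERS[best_phrase]
    return fallback(query)
```

A query such as `answer("What are the admission criteria?", fallback)` is served from the stored answers, while an unrelated query falls through to `fallback`. A real system would likely replace the bag-of-words matcher with a trained intent classifier.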

Results: Advances in NLP for information retrieval and text summarization using the machine learning techniques of Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vectors (GloVe) and TextRank are first reviewed and compared in this paper, and then implemented in the chat-bot design. The chat-bot improves the user experience by returning concise answers to specific queries, which takes less time than reading the entire document. Students, parents and faculty can more efficiently get answers on a variety of topics such as admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profiles, research papers and patents.
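Of the techniques named above, TextRank is the most direct route to an extractive summary. The sketch below is a simplified, dependency-free illustration under stated assumptions: sentences are split with a naive regular expression, similarity is word overlap normalized by log sentence lengths, and the parameters `k`, `damping` and `iterations` are illustrative defaults rather than values taken from the paper.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    """Set of lowercase alphabetic tokens in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_summary(text, k=2, damping=0.85, iterations=50):
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    words = [tokens(s) for s in sentences]
    # Weighted, undirected sentence graph with no self-loops.
    weights = [
        [similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    # Fixed number of PageRank-style updates over the sentence graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    top = sorted(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    return [sentences[i] for i in top]
```

Sentences that share no vocabulary with the rest of the text receive only the baseline score and are therefore unlikely to be selected. Embedding-based similarity (for example from Word2Vec or GloVe vectors) could be substituted for the word-overlap measure without changing the ranking loop.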

Conclusion: The purpose of this paper was to follow advances in NLP technologies and implement them in a novel application.

Keywords: Chat-bot, natural language processing, text mining, information retrieval, text summarization, topic modeling, latent semantic analysis, latent Dirichlet allocation, Word2Vec, GloVe, word embedding, TextRank.


DOI: https://dx.doi.org/10.2174/2665997201999201022191540
Print ISSN: 2665-9972
Online ISSN: 2665-9964
Publisher: Bentham Science Publishers
Journal: Current Chinese Computer Science
