Recent Patents on Engineering

ISSN (Print): 1872-2121
ISSN (Online): 2212-4047

Research Article

An Investigation of Data Requirements for the Detection of Depression from Social Media Posts

Author(s): Sumit Dalal*, Sarika Jain and Mayank Dave

Volume 17, Issue 3, 2023

Published on: 30 September, 2022

Article ID: e120822207438 Pages: 13

DOI: 10.2174/1872212117666220812110956

Abstract

Background: Only a fraction of the social media data produced is usable for mental health assessment, which raises the question of whether enough training data is available for deep learning approaches. Data sufficiency can be expressed in terms of the number of users or the number of posts per user.

Objective: We examine the data requirements of machine learning and deep learning models for a practical system, so that researchers can choose the best-fitting model for the type of dataset available to them. We perform distinct experiments to determine how the number of users and the number of posts per user affect depression classification under various approaches.

Methods: We evaluated several machine learning and deep learning techniques on multiple versions of datasets drawn from Twitter and Reddit, varying the number of users and the number of posts per user. Diagnosed and control users were combined in different ratios to assess the impact of class imbalance.
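
The sampling design described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `make_dataset_version`, its parameters, and the toy data are hypothetical and chosen only for illustration. It assumes each group is available as a mapping from user ID to that user's list of posts, and it builds one dataset version by fixing the number of diagnosed users, the control-to-diagnosed ratio, and an optional cap on posts per user.

```python
import random
from typing import Dict, List, Optional, Tuple

Posts = Dict[str, List[str]]  # user id -> that user's posts


def make_dataset_version(
    diagnosed: Posts,
    control: Posts,
    n_diagnosed: int,
    control_ratio: float = 1.0,
    max_posts: Optional[int] = None,
    seed: int = 0,
) -> List[Tuple[str, List[str], int]]:
    """Sample one dataset version as (user_id, posts, label) rows.

    Hypothetical helper: label 1 = diagnosed, label 0 = control.
    control_ratio > 1 yields an imbalanced version with more controls.
    """
    rng = random.Random(seed)
    n_control = round(n_diagnosed * control_ratio)
    if n_diagnosed > len(diagnosed) or n_control > len(control):
        raise ValueError("not enough users for the requested version")

    rows = []
    for pool, n, label in ((diagnosed, n_diagnosed, 1), (control, n_control, 0)):
        # Sort the ids first so that sampling is reproducible for a given seed.
        for user in rng.sample(sorted(pool), n):
            # Slicing with None keeps every post; otherwise keep the first max_posts.
            rows.append((user, pool[user][:max_posts], label))
    rng.shuffle(rows)
    return rows


# Toy data standing in for real Twitter or Reddit users (assumption).
diagnosed = {f"d{i}": [f"post {j}" for j in range(5)] for i in range(10)}
control = {f"c{i}": [f"post {j}" for j in range(5)] for i in range(40)}

# 7 diagnosed users, twice as many controls, at most 3 posts per user.
version = make_dataset_version(
    diagnosed, control, n_diagnosed=7, control_ratio=2.0, max_posts=3
)
```

Calling the helper repeatedly with `n_diagnosed` set to 70, 150, 350, and 550 and `control_ratio=1.0` would produce balanced versions of the sizes reported in the Results, each of which could then be passed to whichever classifier is under study.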

Results: SVM achieved 68% accuracy in depression classification with 70 users each from the diagnosed and control groups. Its accuracy decreased with 150 users from each group but recovered with 350 and 550 users from each group. Naive Bayes reached 64% on the same dataset fragment (1). Among the deep learning algorithms, HAN and BiLSTM performed better than the others as the imbalance ratio increased.

Conclusion: Our main finding is that classification accuracy increases with the number of users, the number of posts per user, and the imbalance between the numbers of diagnosed and control users. We also found that Reddit posts yield higher accuracy than tweets.

Keywords: Mental health, neural network, depression, word embedding, machine learning, psycholinguistic.

