Datasets for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings
Description
Introduction
This record contains the datasets for training and evaluating our model for explainable depression detection on Twitter aided by metaphor concept mappings, proposed in the following paper:
Han, Sooji, Rui Mao, and Erik Cambria. "Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings." In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022. In press.
Source code for our model is available at github.com/soojihan/HAN.
These datasets were derived from the dataset proposed by Shen et al. (2017). The original dataset is available at github.com/sunlightsgy/MDDL.
Description
There are three datasets.
1. mdl_HAN: This dataset contains tweets and metaphor concept mappings (MCMs) for 5,899 positive (i.e., depressed) and 4,469 negative users. The tweets were extracted from the original MDDL dataset (Shen et al., 2017), and the MCMs were extracted using MetaPro (Mao et al., 2022). Please refer to our paper for more details. Each subfolder under the 'positive' and 'negative' folders is named after a Twitter user ID. Each user's folder contains one or two JSON files:
- [userid].json: This JSON file contains tweet text objects, each of which is represented by [timestamp, tweet text].
- [userid]_cm.json: This JSON file contains MCMs, each of which is represented by [timestamp, MCM].
Note that some users do not have MCMs; their folders contain no [userid]_cm.json file.
2. imdl_HAN: This dataset has the same contents and structure as mdl_HAN except that explicit linguistic cues for depression (i.e., “I’m/I was/I am/I’ve been diagnosed depression” and words containing “depress”, “diagnos”, “anxiety”, “bipolar” and “disorder”) were removed from all tweets.
3. sampled_training_eval_data: This dataset contains five randomly sampled cross-validation sets. Each of the five sets contains train.csv, test.csv and dev.csv. Each CSV file contains user IDs.
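The folder layout described for mdl_HAN and imdl_HAN can be read with a few lines of Python. The following is a minimal sketch, not the loader used in the paper: the function names `load_user` and `iter_users` are hypothetical, and the code assumes that each JSON file holds a list of two-element entries ([timestamp, tweet text] or [timestamp, MCM]), which has not been verified against the released files.

```python
import json
from pathlib import Path


def load_user(user_dir):
    """Load one user's tweets and, if present, their MCMs.

    Assumption: each JSON file is a list of [timestamp, value] pairs,
    as the dataset description suggests.
    """
    user_dir = Path(user_dir)
    user_id = user_dir.name  # the folder name is the user ID
    with open(user_dir / f"{user_id}.json", encoding="utf-8") as f:
        tweets = json.load(f)
    mcms = []
    cm_path = user_dir / f"{user_id}_cm.json"
    if cm_path.exists():  # some users have no MCM file
        with open(cm_path, encoding="utf-8") as f:
            mcms = json.load(f)
    return {"user_id": user_id, "tweets": tweets, "mcms": mcms}


def iter_users(dataset_root):
    """Yield (label, user record) for each folder under positive/ and negative/."""
    root = Path(dataset_root)
    for label in ("positive", "negative"):
        for user_dir in sorted((root / label).iterdir()):
            if user_dir.is_dir():
                yield label, load_user(user_dir)
```

Users without MCMs are returned with an empty `mcms` list, so downstream code does not need to handle a missing file separately.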
Remarks
If you use the datasets, please cite our paper:
Han, Sooji, Rui Mao, and Erik Cambria. "Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings." In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022. In press.
Contact
If you have any questions about the datasets or source code for our model, please contact Sooji Han.
References
Shen, Guangyao, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua, and Wenwu Zhu. "Depression detection via harvesting social media: A multimodal dictionary learning solution." In IJCAI, pp. 3838-3844. 2017.
Mao, Rui, Xiao Li, Mengshi Ge, and Erik Cambria. "MetaPro: A computational metaphor processing model for text pre-processing." Information Fusion, 86-87:30–43. 2022.
Files
Total size: 62.3 MB
Checksum | Size |
---|---|
md5:e5925c67ebc874e094f909ec2e5a642a | 31.0 MB |
md5:f418e87176bff292113742468d28b6df | 31.1 MB |
md5:7bf56504c4b51de05ca43d8184b247c6 | 128.1 kB |
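The MD5 checksums listed above can be used to confirm that a download is complete and uncorrupted. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are hypothetical, and the file path in the usage comment is a placeholder because the file names are not given in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Compare a file's MD5 digest with the expected hex string."""
    # Accept the "md5:" prefix used in the file listing.
    expected = expected_md5.lower().removeprefix("md5:")
    return md5_of_file(path) == expected


# Example (placeholder path, not an actual file name from this record):
# verify("path/to/downloaded_file", "md5:e5925c67ebc874e094f909ec2e5a642a")
```

Note that `str.removeprefix` requires Python 3.9 or later.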