Datasets for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings

doi:10.5281/zenodo.7095100

Published September 20, 2022 | Version v1

Dataset Open

1. Nanyang Technological University

Introduction

These datasets were used to train and evaluate our model for explainable depression detection on Twitter aided by metaphor concept mappings, proposed in the following paper:

Sooji Han, Rui Mao, and Erik Cambria. "Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings." In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022. In press.

Source code for our model is available at github.com/soojihan/HAN.

These datasets were derived from the dataset proposed by Shen et al. (2017). The original dataset is available at github.com/sunlightsgy/MDDL.

Description

There are three datasets.

1. mdl_HAN: This dataset contains tweets and metaphor concept mappings (MCMs) for 5,899 positive (i.e., depressed) and 4,469 negative users. The tweets were extracted from the original MDDL dataset (Shen et al., 2017), and the MCMs were extracted using MetaPro (Mao et al., 2022). Please refer to our paper for more details.

Each subfolder under the 'positive' and 'negative' folders is named after a Twitter user id. Each user's folder contains one or two JSON files:

[userid].json: This json file contains tweet text objects, each of which is represented by [timestamp, tweet text].
[userid]_cm.json: This json file contains MCMs, each of which is represented by [timestamp, MCM].

Note that some users have no MCMs. The folders of such users do not contain a [userid]_cm.json file.

2. imdl_HAN: This dataset has the same contents and structure as mdl_HAN except that explicit linguistic cues for depression (i.e., “I’m/I was/I am/I’ve been diagnosed depression” and words containing “depress”, “diagnos”, “anxiety”, “bipolar” and “disorder”) were removed from all tweets.

3. sampled_training_eval_data: This dataset contains five randomly sampled cross-validation sets. Each of the five sets contains train.csv, test.csv and dev.csv. Each csv file contains user ids.
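The folder layout and split files described above could be read along the following lines. This is a minimal sketch, not part of the released source code: the function names are hypothetical, each JSON file is assumed to hold a list of [timestamp, value] pairs, and each csv file is assumed to list one user id per row in its first column with no header row. Adjust these assumptions after inspecting the extracted archives.

```python
import csv
import json
from pathlib import Path


def load_user(user_dir):
    """Return (tweets, mcms) for one user folder.

    Assumption: [userid].json and [userid]_cm.json each hold a list of
    [timestamp, value] pairs. mcms is an empty list when the user has
    no [userid]_cm.json file.
    """
    user_dir = Path(user_dir)
    user_id = user_dir.name
    with open(user_dir / f"{user_id}.json", encoding="utf-8") as f:
        tweets = json.load(f)
    mcms = []
    cm_path = user_dir / f"{user_id}_cm.json"
    if cm_path.exists():
        with open(cm_path, encoding="utf-8") as f:
            mcms = json.load(f)
    return tweets, mcms


def index_users(root):
    """Map user id -> (label, folder) for every user under root.

    root is the extracted mdl_HAN or imdl_HAN folder, which contains
    the 'positive' and 'negative' subfolders.
    """
    root = Path(root)
    index = {}
    for label in ("positive", "negative"):
        label_dir = root / label
        if not label_dir.is_dir():
            continue
        for user_dir in label_dir.iterdir():
            if user_dir.is_dir():
                index[user_dir.name] = (label, user_dir)
    return index


def read_user_ids(csv_path):
    """Read user ids from one split file.

    Assumption: one id per row, first column, no header row.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def load_fold(fold_dir):
    """Return {'train': [...], 'dev': [...], 'test': [...]} for one set."""
    fold_dir = Path(fold_dir)
    return {
        name: read_user_ids(fold_dir / f"{name}.csv")
        for name in ("train", "dev", "test")
    }
```

A typical use would be to call load_fold on the folder of one of the five sets, look up each returned id in the dictionary from index_users, and pass the matching folder to load_user. The names of the set folders inside the archive are not stated here, so the path has to be supplied by the reader.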
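The cue removal described for imdl_HAN can be illustrated with a short sketch. It is not the preprocessing code used to build the dataset: the function name, the regular expressions, case-insensitive matching, and the treatment of a word as a whitespace-delimited token are all assumptions made for illustration. Only the phrases and substrings themselves come from the description above.

```python
import re

# Substrings listed in the dataset description.
CUE_SUBSTRINGS = ("depress", "diagnos", "anxiety", "bipolar", "disorder")

# Assumption: a "word" is a run of non-whitespace characters, so attached
# punctuation is removed together with the word.
WORD_WITH_CUE = re.compile(
    r"\S*(?:" + "|".join(CUE_SUBSTRINGS) + r")\S*", re.IGNORECASE
)

# Assumption: the self-report phrases are matched literally, ignoring case,
# with either a straight or a curly apostrophe.
SELF_REPORT = re.compile(
    r"\bI(?:['’]m| was| am|['’]ve been) diagnosed depression\b",
    re.IGNORECASE,
)


def remove_explicit_cues(text):
    """Remove self-report phrases and cue-bearing words from one tweet."""
    text = SELF_REPORT.sub(" ", text)    # whole phrases first
    text = WORD_WITH_CUE.sub(" ", text)  # then any remaining cue words
    return " ".join(text.split())        # collapse leftover whitespace
```

Removing the phrases before the individual words matters in this sketch: otherwise "diagnosed" and "depression" would be deleted first and the leading "I was" would be left behind.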

Remarks

If you use the datasets, please cite our paper:

Sooji Han, Rui Mao, and Erik Cambria. "Hierarchical Attention Network for Explainable Depression Detection on Twitter Aided by Metaphor Concept Mappings." In Proceedings of the 29th International Conference on Computational Linguistics (COLING), 2022. In press.

Contact

If you have any questions about the datasets or source code for our model, please contact Sooji Han.

References

Shen, Guangyao, Jia Jia, Liqiang Nie, Fuli Feng, Cunjun Zhang, Tianrui Hu, Tat-Seng Chua, and Wenwu Zhu. "Depression detection via harvesting social media: A multimodal dictionary learning solution." In IJCAI, pp. 3838-3844. 2017.

Mao, Rui, Xiao Li, Mengshi Ge, and Erik Cambria. "MetaPro: A computational metaphor processing model for text pre-processing." Information Fusion, 86-87, pp. 30-43. 2022.

Files

Files (62.3 MB)

Name	MD5	Size
imdl_HAN.tar.gz	e5925c67ebc874e094f909ec2e5a642a	31.0 MB
mdl_HAN.tar.gz	f418e87176bff292113742468d28b6df	31.1 MB
sampled_training_eval_data.tar.gz	7bf56504c4b51de05ca43d8184b247c6	128.1 kB
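The MD5 checksums listed above can be used to check that a download is complete. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the archives are assumed to sit in the current directory under their original names.

```python
import hashlib
from pathlib import Path

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "imdl_HAN.tar.gz": "e5925c67ebc874e094f909ec2e5a642a",
    "mdl_HAN.tar.gz": "f418e87176bff292113742468d28b6df",
    "sampled_training_eval_data.tar.gz": "7bf56504c4b51de05ca43d8184b247c6",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(folder="."):
    """Return {file name: True/False/None} for each listed archive.

    None means the file was not found in the folder.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(folder) / name
        results[name] = md5_of(path) == expected if path.exists() else None
    return results
```

Calling verify_downloads() after downloading reports a mismatch as False, in which case the affected archive should be downloaded again before extraction.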
