Dataset Open Access

eDiseases Dataset

Carrillo-de-Albornoz, Jorge; Rodríguez-Vidal, Javier; Plaza, Laura

The eDiseases dataset contains patient data from the MedHelp health site (http://www.medhelp.org/), where different communities share information and opinions about diseases. Each community consists of a number of conversations, each of which is a sequence of comments posted by patients.

To build the dataset, we automatically extracted 10 conversations from each of the following three communities: allergies, crohn and breast cancer. We selected a set of diseases that, according to medical experts, show high heterogeneity in both the degree of medical understanding of the disease and the profile of the users. The conversations were selected randomly, but we automatically filtered out conversations with fewer than 10 posts. In total, we extracted 146 posts for allergies, 191 posts for crohn, and 142 posts for breast cancer, comprising 983 sentences for allergies, 1780 sentences for crohn, and 1029 sentences for breast cancer, and covering a six-year time interval. Three frequent users of health forums annotated each sentence in the dataset as:

Factuality: OPINION, FACT, EXPERIENCE.
Polarity: POSITIVE, NEUTRAL, NEGATIVE.

In case of doubt, the annotators labeled the sentence as NOT_LABELED. As a result, we collected 967 labeled sentences for allergies, 1,709 labeled sentences for crohn, and 959 labeled sentences for breast cancer.
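The annotation scheme and the reported counts can be summarized in a short Python sketch. It is an illustration only: the function names `is_labeled` and `coverage` are hypothetical, the layout of the files inside the archive is not described here, and treating NOT_LABELED as a value that may appear in either dimension is an assumption. The sentence counts are the ones quoted in the description above.

```python
# Label sets as listed in the dataset description.
FACTUALITY = {"OPINION", "FACT", "EXPERIENCE"}
POLARITY = {"POSITIVE", "NEUTRAL", "NEGATIVE"}
NOT_LABELED = "NOT_LABELED"  # used by annotators in case of doubt

# Sentence counts quoted in the description: (total, labeled).
COUNTS = {
    "allergies": (983, 967),
    "crohn": (1780, 1709),
    "breast cancer": (1029, 959),
}


def is_labeled(factuality, polarity):
    """Return True when both dimensions carry one of the listed labels.

    Assumption: a NOT_LABELED value in either dimension makes the
    sentence count as unlabeled.
    """
    return factuality in FACTUALITY and polarity in POLARITY


def coverage(counts):
    """Share of labeled sentences per community, as a fraction."""
    return {name: labeled / total for name, (total, labeled) in counts.items()}


if __name__ == "__main__":
    for name, share in coverage(COUNTS).items():
        print(f"{name}: {share:.1%} labeled")
```

With the quoted counts, roughly 98% of the allergies sentences, 96% of the crohn sentences, and 93% of the breast cancer sentences carry a label.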

Files (105.5 kB)
Name: eDiseases-dataset-V2.0.rar
Size: 105.5 kB
md5: 29ea0e06b236a4c75e5deb451e4c6aaf
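
The md5 checksum listed for the archive can be used to check a downloaded copy. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are hypothetical, and the local path passed to them is an assumption about where the file was saved.

```python
import hashlib

# Checksum listed for eDiseases-dataset-V2.0.rar on this page.
EXPECTED_MD5 = "29ea0e06b236a4c75e5deb451e4c6aaf"


def md5_of_file(path, chunk_size=65536):
    """Compute the hex md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True when the file's md5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("eDiseases-dataset-V2.0.rar")` would return True for an intact download stored under that name in the current directory.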
Statistics (all versions / this version)
Views: 745 / 744
Downloads: 235 / 234
Data volume: 24.8 MB / 24.7 MB
Unique views: 701 / 700
Unique downloads: 227 / 226
