There is a newer version of the record available.

Published April 10, 2019 | Version v1
Conference paper Open

RedDust: a Large Reusable Dataset of Reddit User Traits

  • 1. Max Planck Institute for Informatics, Saarbrücken, Germany

Description

Social media is a rich source of assertions about personal attributes,
such as "I am a doctor" or "my hobby is playing tennis."
Precisely identifying explicit assertions is difficult, though,
because of the users' highly varied vocabulary and language expressions.
Identifying implicit assertions like "I've been at work treating patients all day" is even more challenging.
We present RedDust, a data resource consisting of personal attribute labels for over 300k Reddit users across five predicates: profession, hobby, family status, age, and gender.
We construct RedDust using a diverse set of high-precision patterns.
To the best of our knowledge, RedDust is the first semantic data resource
about Reddit users at large scale. We envision further use cases of RedDust for providing background
knowledge about user traits, to enhance personalized search and recommendation as well as
conversational agents.
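
As an illustration of what pattern-based labeling of explicit assertions can look like, here is a minimal Python sketch. The two regular expressions, the tiny profession vocabulary, and the function name `extract_traits` are hypothetical and chosen only for this example; they are not the patterns used to construct RedDust.

```python
import re

# Hypothetical patterns for illustration only; NOT the patterns behind RedDust.
PATTERNS = {
    "profession": re.compile(
        r"\bI(?: am|'m) an? (doctor|nurse|teacher|engineer)\b", re.IGNORECASE
    ),
    "hobby": re.compile(r"\bmy hobby is (\w+(?: \w+)?)", re.IGNORECASE),
}


def extract_traits(text):
    """Return (predicate, value) pairs for explicit assertions found in text."""
    found = []
    for predicate, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((predicate, match.group(1).lower()))
    return found


if __name__ == "__main__":
    print(extract_traits("I am a doctor and my hobby is playing tennis."))
    # An implicit assertion matches none of the explicit patterns.
    print(extract_traits("I've been at work treating patients all day"))
```

Patterns of this kind favor precision over recall: they only fire on explicit statements, so an implicit assertion such as the second example yields no label.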

Files

age.txt

Files (235.8 MB)

Checksum                              Size
md5:926fbc8b2684c7e589965d7a1606a145  11.2 kB
md5:4313316b460da94728880769b2f9dc5c  8.4 MB
md5:cf8b465ce6f7bf5c52bef09fe71052fe  39.1 MB
md5:3adfc8db5d747dfccc0bda293668f6db  98.3 MB
md5:4e552ffdee75b6beaa56f2a39156de3e  90.0 MB
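
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. A minimal sketch using only Python's standard `hashlib` follows; the helper name `md5_matches` is an assumption for this example, and the file path in the usage note is a placeholder, since the listing does not say which checksum belongs to which file.

```python
import hashlib


def md5_matches(path, expected, chunk_size=1 << 20):
    """Return True if the MD5 digest of the file at `path` equals `expected`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # Accept the checksum with or without the "md5:" prefix used in the listing.
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return digest.hexdigest() == expected_hex
```

For example, `md5_matches("downloaded_file.txt", "md5:926fbc8b2684c7e589965d7a1606a145")` would return `True` only if the placeholder file is the one that checksum was computed from.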