Published April 10, 2019 | Version v4
Conference paper Open

RedDust: a Large Reusable Dataset of Reddit User Traits

  • 1. Max Planck Institute for Informatics, Saarbrücken, Germany

Description

Social media is a rich source of assertions about personal attributes,
such as "I am a doctor" or "my hobby is playing tennis."
Precisely identifying explicit assertions is difficult, though,
because users' vocabulary and phrasing vary widely.
Identifying implicit assertions like "I've been at work treating patients all day" is even more challenging.
We present RedDust, a data resource consisting of personal attribute labels for over 300k Reddit users across five predicates: profession, hobby, family status, age, and gender.
We construct RedDust using a diverse set of high-precision patterns.
To the best of our knowledge, RedDust is the first semantic data resource
about Reddit users at large scale. We envision further use cases of RedDust for providing background
knowledge about user traits, to enhance personalized search and recommendation as well as
conversational agents.
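
As a rough illustration of what pattern-based extraction of explicit assertions can look like, here is a minimal sketch in Python. The two regular expressions and the function name `extract_assertions` are hypothetical and chosen only to match the example sentences above; they are not the patterns used to build RedDust.

```python
import re

# Hypothetical patterns for illustration only; NOT the patterns used for RedDust.
PATTERNS = {
    "profession": re.compile(
        r"\bI(?: am|'m) an? (doctor|nurse|teacher|engineer)\b", re.IGNORECASE
    ),
    "hobby": re.compile(r"\bmy hobby is ((?:playing )?\w+)", re.IGNORECASE),
}


def extract_assertions(text):
    """Return (predicate, value) pairs for every pattern match in text."""
    found = []
    for predicate, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((predicate, match.group(1).lower()))
    return found


if __name__ == "__main__":
    for sentence in [
        "I am a doctor",
        "my hobby is playing tennis.",
        "I've been at work treating patients all day",
    ]:
        print(sentence, "->", extract_assertions(sentence))
```

Narrow patterns of this kind favor precision over recall: the implicit sentence about treating patients yields no match, which is the gap the description points to.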

Files

age.txt

Files (445.5 MB)

MD5 checksum                       Size
fd6560afccc2c4e056d12371f60959e2   121.8 MB
c1440982195695f22281381ece3935c5   12.3 MB
5d074d873c72ca64d34810ffa5c4254a   56.5 MB
2bd1a483986ab5fcecce5c3087ab2fef   124.9 MB
70f33634a2f3dc016f0fb24f11ad23fe   130.0 MB
3e2d4c24e21981c1b8acf57049b8bad9   722 Bytes
b8c3d874688afe70a3cd731c6d7f0fbd   47.4 kB
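
The MD5 checksums listed above can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` and the local path in the usage note are assumptions, and the listing does not say which checksum belongs to which file name.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected hex digest."""
    expected = expected_md5.lower()
    # Accept the "md5:" prefix used in the file listing.
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of(path) == expected
```

For example, `verify("downloads/some_file.txt", "md5:<checksum from the listing>")` returns `True` only when the local copy matches; the path here is a placeholder.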