RedDust: a Large Reusable Dataset of Reddit User Traits
- 1. Max Planck Institute for Informatics, Saarbrücken, Germany
Description
Social media is a rich source of assertions about personal attributes,
such as "I am a doctor" or "my hobby is playing tennis."
Precisely identifying explicit assertions is difficult, though,
because of the users' highly varied vocabulary and language expressions.
Identifying implicit assertions like "I've been at work treating patients all day" is even more challenging.
We present RedDust, a data resource consisting of personal attribute labels for over 300k Reddit users across five predicates: profession, hobby, family status, age, and gender.
We construct RedDust using a diverse set of high-precision patterns.
To the best of our knowledge, RedDust is the first semantic data resource
about Reddit users at large scale. We envision further use cases of RedDust for providing background
knowledge about user traits, to enhance personalized search and recommendation as well as
conversational agents.
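
To make the distinction between explicit and implicit assertions concrete, here is a minimal Python sketch of pattern-based extraction. The two regular expressions and the `extract_assertions` helper are hypothetical illustrations; they are not the patterns used to build RedDust, which are not listed in this description.

```python
import re

# Hypothetical patterns for illustration only; NOT the patterns used for RedDust.
PATTERNS = {
    # Matches "I am a doctor", "I'm an engineer", ...
    "profession": re.compile(r"\bI(?: am|'m) an? (?P<value>[a-z]+)\b", re.IGNORECASE),
    # Matches "my hobby is playing tennis", "my hobby is chess", ...
    "hobby": re.compile(r"\bmy hobby is (?P<value>[a-z]+(?: [a-z]+)?)\b", re.IGNORECASE),
}


def extract_assertions(text):
    """Return (predicate, value) pairs for every explicit pattern match in text."""
    found = []
    for predicate, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((predicate, match.group("value").lower()))
    return found


print(extract_assertions("I am a doctor"))
print(extract_assertions("My hobby is playing tennis."))
# An implicit assertion contains no explicit cue, so nothing is extracted.
print(extract_assertions("I've been at work treating patients all day"))
```

Patterns this naive would also fire on sentences such as "I am a fan", so a high-precision pattern set would need tighter constraints (for example, a vocabulary of known professions). The last call shows why implicit assertions cannot be captured by surface patterns alone.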
Files
(445.5 MB)
MD5 checksum | Size |
---|---|
md5:fd6560afccc2c4e056d12371f60959e2 | 121.8 MB |
md5:c1440982195695f22281381ece3935c5 | 12.3 MB |
md5:5d074d873c72ca64d34810ffa5c4254a | 56.5 MB |
md5:2bd1a483986ab5fcecce5c3087ab2fef | 124.9 MB |
md5:70f33634a2f3dc016f0fb24f11ad23fe | 130.0 MB |
md5:3e2d4c24e21981c1b8acf57049b8bad9 | 722 Bytes |
md5:b8c3d874688afe70a3cd731c6d7f0fbd | 47.4 kB |
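
The `md5:` values listed for each file can be used to check that a download arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` and the local path in the usage comment are assumptions made for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a listed value such as 'md5:fd65...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; pair each file with its own listed checksum):
# matches_listing("downloads/some_file.txt", "md5:fd6560afccc2c4e056d12371f60959e2")
```

Reading in chunks keeps memory use flat, which matters for the files here that exceed 100 MB.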