Dataset Open Access
Hiba Arnaout; Simon Razniewski; Jeff Z. Pan
These datasets contain statements about the demographics and outliers of Wiki-based Communities of Interest.
Group-centric dataset (sample):
{
  "title": "winners of Priestley Medal",
  "recorded_members": 83,
  "topics": ["STEM.Chemistry"],
  "demographics": [
    "occupation-chemist",
    "gender-male",
    "citizen-U.S."
  ],
  "outliers": [
    {
      "reason": "NOT(chemist) unlike 82 recorded members",
      "members": [
        "Francis Garvan (lawyer, art collector)"
      ]
    },
    {
      "reason": "NOT(male) unlike 80 recorded members",
      "members": [
        "Mary L. Good (female)",
        "Darleane Hoffman (female)",
        "Jacqueline Barton (female)"
      ]
    }
  ]
}
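As a minimal sketch of how a group-centric record might be consumed, the Python below reads a JSON Lines file and splits each outlier "reason" into the negated value and the number of conforming members. It assumes one JSON object per line and a reason pattern inferred from the sample above only; the helper names (`iter_jsonl`, `parse_reason`, `summarize_outliers`) are hypothetical and not part of the dataset.

```python
import json
import re

# Assumed pattern, inferred from the sample record only:
#   "NOT(<value>) unlike <n> recorded members"
REASON_RE = re.compile(r"NOT\((?P<value>.+)\) unlike (?P<count>\d+) recorded members")


def iter_jsonl(path):
    """Yield one parsed record per non-empty line (assumes JSON Lines layout)."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def parse_reason(reason):
    """Return (negated_value, conforming_count), or None if the text does not match."""
    match = REASON_RE.search(reason)
    if match is None:
        return None
    return match.group("value"), int(match.group("count"))


def summarize_outliers(record):
    """Return (negated_value, conforming_count, outlier_count) for each outlier entry."""
    rows = []
    for outlier in record.get("outliers", []):
        parsed = parse_reason(outlier["reason"])
        if parsed is not None:
            value, count = parsed
            rows.append((value, count, len(outlier["members"])))
    return rows
```

Typical use would be `for record in iter_jsonl("group_centric.jsonl"): print(record["title"], summarize_outliers(record))`. Records whose reason text follows a different pattern are skipped rather than guessed at.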
Subject-centric dataset (sample):
{
  "subject": "Serena Williams",
  "statements": [
    {
      "statement": "NOT(sport-basketball) but (tennis) unlike 4 recorded winners of Best Female Athlete ESPY Award.",
      "score": 0.36
    },
    {
      "statement": "NOT(occupation-politician) but (tennis player, businessperson, autobiographer) unlike 20 recorded winners of Michigan Women's Hall of Fame.",
      "score": 0.17
    }
  ]
}
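A subject-centric record could be filtered and ranked along the following lines. This is a sketch under two assumptions that the sample does not confirm: that every statement carries a numeric "score", and that a higher score marks a more notable statement. The function name `top_statements` and the `min_score` parameter are hypothetical.

```python
def top_statements(record, min_score=0.0):
    """Return the record's statements with score >= min_score, highest score first.

    Assumes each statement is a dict with "statement" and "score" keys,
    as in the sample; statements without a score are ignored.
    """
    scored = [
        item for item in record.get("statements", [])
        if isinstance(item.get("score"), (int, float)) and item["score"] >= min_score
    ]
    return sorted(scored, key=lambda item: item["score"], reverse=True)
```

Applied to the Serena Williams sample with `min_score=0.2`, only the statement scored 0.36 would remain.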
This data can also be browsed at: https://wikiknowledge.onrender.com/demographics/
Name | MD5 | Size |
---|---|---|
group_centric.jsonl | 7143c37a9190cdc2628ca64266b0fc99 | 63.7 MB |
subject_centric.jsonl | 578265b26d600f6ecb1bccf7dcfe2438 | 172.0 MB |
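The MD5 checksums listed for the two files can be used to check a download for corruption. The sketch below uses Python's standard `hashlib` and reads the file in chunks so that the 172 MB file is not loaded into memory at once. The helper names (`md5_of_file`, `verify_download`) and the assumption that the files sit in the current directory under their listed names are illustrative only.

```python
import hashlib

# Checksums as listed on this page.
EXPECTED_MD5 = {
    "group_centric.jsonl": "7143c37a9190cdc2628ca64266b0fc99",
    "subject_centric.jsonl": "578265b26d600f6ecb1bccf7dcfe2438",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 digest equals the expected hex string."""
    return md5_of_file(path) == expected_md5.lower()
```

For example, `verify_download("group_centric.jsonl", EXPECTED_MD5["group_centric.jsonl"])` should return `True` for an intact download. MD5 is suitable here as an integrity check only, not as protection against deliberate tampering.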
Metric | All versions | This version |
---|---|---|
Views | 109 | 52 |
Downloads | 15 | 8 |
Data volume | 1.2 GB | 834.7 MB |
Unique views | 79 | 46 |
Unique downloads | 11 | 5 |