Dataset Open Access

BuzzFeed-Webis Fake News Corpus 2016

Potthast, Martin; Kiesel, Johannes; Reinartz, Kevin; Bevendorff, Janek; Stein, Benno

The corpus comprises the output of nine publishers during a week close to the 2016 US elections. Among the selected publishers are six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream ones (see Table 1). All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. On seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the nine publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked: 826 mainstream, 256 left-wing, and 545 right-wing. The imbalance between categories results from differing publication frequencies.
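The category imbalance can be quantified directly from the counts stated above. The following is a minimal sketch that only restates those three figures; the names `COUNTS` and `category_shares` are illustrative and not part of the corpus or its tooling.

```python
# Article counts per category, copied from the corpus description.
COUNTS = {"mainstream": 826, "left-wing": 256, "right-wing": 545}


def category_shares(counts):
    """Return each category's fraction of the total article count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    total = sum(COUNTS.values())
    print(f"total articles: {total}")
    for name, share in category_shares(COUNTS).items():
        print(f"{name}: {share:.1%}")
```

Roughly half of the articles are mainstream, so any classifier trained on the corpus may need stratified sampling or class weighting, depending on the task.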

Files (5.6 GB)
Name               Size       MD5
articles.zip       4.0 MB     5899194f95016cc90aae5e6d0e7e5042
overview.csv       253.8 kB   eab4909641bd707fffe514e9cd201430
README.txt         1.4 kB     18e572550d8e6b03a099d4b73d496c4d
schema.xsd         2.7 kB     99c40ec4f49cad8353f61e82a1265040
web-archives.zip   5.6 GB     944cdafb0df4ea1070250a56b1342bdc
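The MD5 checksums listed for the files can be used to confirm that a download completed intact, which is worth doing for the 5.6 GB web archive in particular. The sketch below assumes the files were saved under their listed names in a single local directory; the function names `md5_of` and `verify` and the directory argument are illustrative, not part of the dataset.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table of this record.
EXPECTED_MD5 = {
    "articles.zip": "5899194f95016cc90aae5e6d0e7e5042",
    "overview.csv": "eab4909641bd707fffe514e9cd201430",
    "README.txt": "18e572550d8e6b03a099d4b73d496c4d",
    "schema.xsd": "99c40ec4f49cad8353f61e82a1265040",
    "web-archives.zip": "944cdafb0df4ea1070250a56b1342bdc",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Return a status ('ok', 'mismatch', or 'missing') per listed file."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Assumes the downloads are in the current working directory.
    for name, status in verify().items():
        print(f"{name}: {status}")
```

Reading in chunks keeps memory use flat regardless of file size. A "mismatch" status usually indicates a truncated or corrupted download, and the file should be fetched again.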
  • Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A Stylometric Inquiry into Hyperpartisan and Fake News. In 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), pages 231-240, July 2018. Association for Computational Linguistics.

                   All versions   This version
Views              1,744          1,280
Downloads          1,444          1,185
Data volume        2.4 TB         1.4 TB
Unique views       1,462          1,131
Unique downloads   668            532
