BuzzFeed-Webis Fake News Corpus 2016

Potthast, Martin; Kiesel, Johannes; Reinartz, Kevin; Bevendorff, Janek; Stein, Benno

doi:10.5281/zenodo.1239675

Published February 20, 2018 | Version ACL 2018

Dataset Open

1. Bauhaus-Universität Weimar

The corpus comprises the output of nine publishers during a week close to the 2016 US elections. The selected publishers include six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream ones (see Table 1). All publishers had earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. On seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the nine publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked: 826 mainstream, 256 left-wing, and 545 right-wing. The imbalance between categories results from differing publication frequencies.
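To make the class imbalance concrete, the stated counts can be turned into shares of the total. The following is a minimal sketch that uses only the figures quoted in the description; the dictionary name and labels are chosen here for illustration and are not part of the corpus files.

```python
# Article counts per publisher orientation, as stated in the description.
counts = {"mainstream": 826, "left-wing": 256, "right-wing": 545}

total = sum(counts.values())
# Sanity check: the three categories should add up to the stated total.
assert total == 1627

# Share of each orientation among all fact-checked articles.
shares = {orientation: n / total for orientation, n in counts.items()}

for orientation, n in counts.items():
    print(f"{orientation:<11} {n:>5}  {shares[orientation]:6.1%}")
```

Roughly half of the articles come from mainstream publishers, which is worth keeping in mind when splitting the data or choosing evaluation metrics.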

Files (5.6 GB)

Name	MD5	Size
articles.zip	5899194f95016cc90aae5e6d0e7e5042	4.0 MB
overview.csv	eab4909641bd707fffe514e9cd201430	253.8 kB
README.txt	18e572550d8e6b03a099d4b73d496c4d	1.4 kB
schema.xsd	99c40ec4f49cad8353f61e82a1265040	2.7 kB
web-archives.zip	944cdafb0df4ea1070250a56b1342bdc	5.6 GB
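Since the listing publishes an MD5 checksum for every file, a download can be checked against it before use. The following is a minimal sketch using only Python's standard library; it assumes the files were saved under their original names in a single directory, and the helper names `md5_of` and `verify` are hypothetical, not part of the corpus.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file listing above.
EXPECTED_MD5 = {
    "articles.zip": "5899194f95016cc90aae5e6d0e7e5042",
    "overview.csv": "eab4909641bd707fffe514e9cd201430",
    "README.txt": "18e572550d8e6b03a099d4b73d496c4d",
    "schema.xsd": "99c40ec4f49cad8353f61e82a1265040",
    "web-archives.zip": "944cdafb0df4ea1070250a56b1342bdc",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each expected file name to True if it exists and its MD5 matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results


if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

Reading in chunks matters mainly for web-archives.zip, which at 5.6 GB should not be loaded into memory at once.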

Additional details

Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A Stylometric Inquiry into Hyperpartisan and Fake News. In 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), pages 231-240, July 2018. Association for Computational Linguistics.

