Published February 20, 2018 | Version ACL 2018
Dataset Open

BuzzFeed-Webis Fake News Corpus 2016

Description

The corpus comprises the output of nine publishers during a week close to the 2016 US elections. Among the selected publishers are six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream publishers (see Table 1). All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. On seven weekdays (September 19 to 23, and September 26 and 27), every post and linked news article of the nine publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked: 826 mainstream, 256 left-wing, and 545 right-wing. The imbalance between categories results from differing publication frequencies.
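Since the categories are imbalanced, it can help to look at their shares before training or evaluating on the corpus. The following is a minimal sketch that uses only the counts stated above; the inverse-frequency weighting is an illustrative assumption and not something the corpus or its authors prescribe.

```python
# Article counts per publisher category, copied from the description.
counts = {"mainstream": 826, "left-wing": 256, "right-wing": 545}

total = sum(counts.values())

# Fraction of the corpus that each category accounts for.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency class weights (assumption: one common way to
# compensate for imbalance; other schemes are equally valid).
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(
        f"{label:<11} n={counts[label]:>4}  "
        f"share={shares[label]:.1%}  weight={weights[label]:.2f}"
    )
```

The smallest category (left-wing) receives the largest weight, which reflects the differing publication frequencies noted in the description.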

Files

articles.zip

Files (5.6 GB)

Checksum                                  Size
md5:5899194f95016cc90aae5e6d0e7e5042      4.0 MB
md5:eab4909641bd707fffe514e9cd201430      253.8 kB
md5:18e572550d8e6b03a099d4b73d496c4d      1.4 kB
md5:99c40ec4f49cad8353f61e82a1265040      2.7 kB
md5:944cdafb0df4ea1070250a56b1342bdc      5.6 GB
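A downloaded file can be checked against the MD5 checksums listed above. The sketch below uses Python's standard `hashlib` module and reads the file in chunks so that the 5.6 GB archive does not have to fit in memory. The helper names `md5_of_file` and `matches_listed_checksum` and the local path in the usage comment are hypothetical; which checksum belongs to which file must be taken from the record itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; pick the checksum listed for that file):
# ok = matches_listed_checksum("downloads/articles.zip", "md5:<hex from the list>")
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a safeguard against deliberate tampering.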

Additional details

References

  • Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A Stylometric Inquiry into Hyperpartisan and Fake News. In 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), pages 231-240, July 2018. Association for Computational Linguistics.