Published February 20, 2018 | Version ACL 2018
Dataset Open

BuzzFeed-Webis Fake News Corpus 2016


The corpus comprises the output of nine publishers over seven weekdays close to the 2016 US elections. The selected publishers include six prolific hyperpartisan ones (three left-wing and three right-wing) and three mainstream ones (see Table 1). All publishers had earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. On those seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the nine publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked: 826 mainstream, 256 left-wing, and 545 right-wing. The imbalance between categories results from differing publication frequencies.
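To make the category imbalance concrete, the following minimal sketch recomputes the total from the three counts stated in the description and derives each category's share of the corpus. The inverse-frequency weights at the end are an illustrative assumption about how one might compensate for the imbalance when training a classifier; they are not part of the corpus or of the authors' method, and the variable names are hypothetical.

```python
# Article counts per category, taken from the corpus description.
counts = {"mainstream": 826, "left-wing": 256, "right-wing": 545}

# Total number of fact-checked articles; should match the stated 1,627.
total = sum(counts.values())

# Fraction of the corpus that each category accounts for.
shares = {label: n / total for label, n in counts.items()}

# ASSUMPTION: inverse-frequency class weights, a common heuristic for
# imbalanced data. This is an illustration, not the authors' procedure.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:<11} {counts[label]:>5} {shares[label]:6.1%} weight={weights[label]:.2f}")
```

Under these counts, mainstream articles make up roughly half of the corpus, and the left-wing category, being the smallest, receives the largest weight.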


Files (5.6 GB)

Size
4.0 MB
253.8 kB
1.4 kB
2.7 kB
5.6 GB

Additional details


  • Martin Potthast, Johannes Kiesel, Kevin Reinartz, Janek Bevendorff, and Benno Stein. A Stylometric Inquiry into Hyperpartisan and Fake News. In 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018), pages 231-240, July 2018. Association for Computational Linguistics.