There is a newer version of this record available.

Dataset Open Access

ClaimBuster: A Benchmark Dataset of Check-worthy Factual Claims

Fatma Arslan; Naeemul Hassan; Chengkai Li; Mark Tremayne

The ClaimBuster dataset consists of statements extracted from all U.S. general election presidential debates (1960-2016), along with human-annotated check-worthiness labels. It contains 23,533 sentences, each assigned to one of three categories: non-factual statement, unimportant factual statement, or check-worthy factual statement.

The work is partially supported by NSF grants IIS-1408928, IIP-1565699, IIS-1719054, OIA-1937143, a Knight Prototype Fund from the Knight Foundation, and subawards from Duke University as part of a grant to the Duke Tech & Check Cooperative from the Knight Foundation and Facebook. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.

Files (9.3 MB)

Name               Size      MD5
all_sentences.csv  5.3 MB    686ac1dd5123d9ca0d229ee9760d4962
crowdsourced.csv   3.9 MB    af9649bc3cc93edbc804893720a50bde
groundtruth.csv    167.7 kB  1577f6d45bf33eabe9cf760f0fb66da3
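
The MD5 digests listed for the three files can be used to confirm that a download is intact. The following is a minimal sketch, assuming the files were saved under their listed names in a single local directory; the helper names `md5_of_file` and `verify_downloads` are illustrative and are not part of the dataset or of any ClaimBuster tooling.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing on this record.
EXPECTED_MD5 = {
    "all_sentences.csv": "686ac1dd5123d9ca0d229ee9760d4962",
    "crowdsourced.csv": "af9649bc3cc93edbc804893720a50bde",
    "groundtruth.csv": "1577f6d45bf33eabe9cf760f0fb66da3",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its digest matches."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of_file(path) == expected
    return results


if __name__ == "__main__":
    # Assumption: the CSV files sit in the current working directory.
    for name, ok in verify_downloads().items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch most likely indicates an incomplete download or a file from a different version of the record, since a newer version exists and its files may carry different digests.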

                  All versions  This version
Views             842           492
Downloads         594           555
Data volume       2.6 GB        2.4 GB
Unique views      699           441
Unique downloads  368           347
