ClaimBuster: A Benchmark Dataset of Check-worthy Factual Claims

Fatma Arslan; Naeemul Hassan; Chengkai Li; Mark Tremayne

doi:10.5281/zenodo.3609356

There is a newer version of the record available.

Published January 15, 2020 | Version v1

Dataset Open

1. University of Texas at Arlington
2. University of Maryland

The ClaimBuster dataset consists of statements extracted from all U.S. general election presidential debates (1960-2016), along with human-annotated check-worthiness labels. It contains 23,533 sentences, each assigned to one of three categories: non-factual statement, unimportant factual statement, or check-worthy factual statement.

Notes

The work is partially supported by NSF grants IIS-1408928, IIP-1565699, IIS-1719054, OIA-1937143, a Knight Prototype Fund from the Knight Foundation, and subawards from Duke University as part of a grant to the Duke Tech & Check Cooperative from the Knight Foundation and Facebook. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the funding agencies.

Files

Files (9.3 MB)

Name	MD5	Size
all_sentences.csv	686ac1dd5123d9ca0d229ee9760d4962	5.3 MB
crowdsourced.csv	af9649bc3cc93edbc804893720a50bde	3.9 MB
groundtruth.csv	1577f6d45bf33eabe9cf760f0fb66da3	167.7 kB
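
The MD5 digests listed for each file can be used to check that a download is complete and uncorrupted. The following is a minimal sketch, assuming the three files have been saved under their original names in the current directory; the helper names md5_of and verify_downloads are illustrative and not part of the dataset.

```python
import hashlib
from pathlib import Path

# MD5 digests copied from the file listing of this record.
EXPECTED_MD5 = {
    "all_sentences.csv": "686ac1dd5123d9ca0d229ee9760d4962",
    "crowdsourced.csv": "af9649bc3cc93edbc804893720a50bde",
    "groundtruth.csv": "1577f6d45bf33eabe9cf760f0fb66da3",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory, expected=EXPECTED_MD5):
    """Map each expected file name to True if it exists and its digest matches."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == want
    return results


if __name__ == "__main__":
    # Assumption: the files were downloaded into the current directory.
    for name, ok in verify_downloads(".").items():
        print(f"{name}: {'OK' if ok else 'missing or checksum mismatch'}")
```

A mismatch usually indicates a truncated download, or a file taken from a different version of the record.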

	All versions	This version
Views	9,541	4,048
Downloads	4,370	3,786
Data volume	24.4 GB	21.4 GB

Resource type: Dataset

Publisher: Zenodo

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: January 16, 2020
Modified: July 2, 2020
