Dataset Open Access

Claim Detection and Matching for Indian Languages

Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale


JSON-LD (schema.org) Export

{
  "inLanguage": {
    "alternateName": "hin", 
    "@type": "Language", 
    "name": "Hindi"
  }, 
  "description": "<p>Two datasets are included in this repository: claim matching and claim detection datasets.&nbsp;The collections contain&nbsp;data in 5 languages: Bengali, English, Hindi, Malayalam and Tamil.</p>\n\n<p>The &quot;claim detection&quot;&nbsp;dataset contains textual claims from social media and fact-checking websites annotated for the&nbsp; &quot;fact-check worthiness&quot; of the claims in each message. Data points have one of the three labels of &quot;Yes&quot; (text contains one or more check-worthy claims), &quot;No&quot; and &quot;Probably&quot;.&nbsp;</p>\n\n<p>The &quot;claim matching&quot; dataset is a curated collection of pairs of textual claims from social media and fact-checking websites for the purpose of automatic and multilingual claim matching.&nbsp;Pairs of data have one of the four labels of &quot;Very Similar&quot;, &quot;Somewhat Similar&quot;, &quot;Somewhat Dissimilar&quot; and &quot;Very Dissimilar&quot;.</p>\n\n<p>All personally identifiable information (PII) including phone numbers, email addresses,&nbsp;license plate numbers and addresses have been replaced with general tags (e.g. &lt;PHONE#&gt;, &lt;ADDRESS&gt;, etc)&nbsp;to protect user anonymity. A detailed explanation on the curation and annotation process is provided in our ACL 2021 paper:&nbsp;<br>\n<a href=\"https://arxiv.org/abs/2106.00853\">Kazemi, A.; Garimella, K.; Gaffney, D.; and Hale, S. A. 2021. Claim Matching Beyond English to Scale Global Fact-Checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, ACL 2021.</a></p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "University of Michigan & Meedan", 
      "@type": "Person", 
      "name": "Ashkan Kazemi"
    }, 
    {
      "affiliation": "MIT", 
      "@type": "Person", 
      "name": "Kiran Garimella"
    }, 
    {
      "affiliation": "Meedan", 
      "@type": "Person", 
      "name": "Devin Gaffney"
    }, 
    {
      "affiliation": "University of Oxford & Meedan", 
      "@type": "Person", 
      "name": "Scott A. Hale"
    }
  ], 
  "url": "https://zenodo.org/record/4890950", 
  "datePublished": "2021-06-01", 
  "version": "1.0", 
  "@type": "Dataset", 
  "keywords": [
    "nlp", 
    "fact-checking", 
    "misinformation", 
    "multilingual", 
    "claim matching", 
    "claim detection", 
    "whatsapp", 
    "tamil", 
    "malayalam", 
    "bengali"
  ], 
  "@context": "https://schema.org/", 
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211/claim_detection_dataset.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }, 
    {
      "contentUrl": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211/claim_matching_dataset.csv", 
      "encodingFormat": "csv", 
      "@type": "DataDownload"
    }
  ], 
  "identifier": "https://doi.org/10.5281/zenodo.4890950", 
  "@id": "https://doi.org/10.5281/zenodo.4890950", 
  "workFeatured": {
    "url": "https://2021.aclweb.org/", 
    "location": "Online", 
    "@type": "Event", 
    "name": "ACL-IJCNLP 2021"
  }, 
  "name": "Claim Detection and Matching for Indian Languages"
}
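As a minimal sketch of how the export above might be consumed, the Python snippet below reads an abbreviated copy of the record (only the `description` and `distribution` fields, with the description cut to its first paragraph), lists the two CSV download URLs, and converts the HTML-encoded description to plain text. The helper names `download_urls` and `plain_description` are illustrative assumptions and are not part of Zenodo or schema.org.

```python
import html
import json
import re

# Abbreviated copy of the JSON-LD export above (only the fields used here;
# the description is truncated to its first paragraph for brevity).
RECORD_JSON = """
{
  "@type": "Dataset",
  "name": "Claim Detection and Matching for Indian Languages",
  "description": "<p>Two datasets are included in this repository: claim matching and claim detection datasets.&nbsp;The collections contain&nbsp;data in 5 languages: Bengali, English, Hindi, Malayalam and Tamil.</p>",
  "distribution": [
    {
      "contentUrl": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211/claim_detection_dataset.csv",
      "encodingFormat": "csv",
      "@type": "DataDownload"
    },
    {
      "contentUrl": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211/claim_matching_dataset.csv",
      "encodingFormat": "csv",
      "@type": "DataDownload"
    }
  ]
}
"""


def download_urls(record: dict) -> dict:
    """Map each distributed file name to its contentUrl (hypothetical helper)."""
    urls = {}
    for item in record.get("distribution", []):
        url = item["contentUrl"]
        # The file name is the last path segment of the URL.
        urls[url.rsplit("/", 1)[-1]] = url
    return urls


def plain_description(record: dict) -> str:
    """Strip HTML tags, then decode entities (hypothetical helper).

    Tags are removed before entities are decoded so that escaped
    placeholders such as &lt;PHONE#&gt; are not mistaken for markup.
    """
    text = re.sub(r"<[^>]+>", " ", record.get("description", ""))
    text = html.unescape(text).replace("\xa0", " ")
    return re.sub(r"\s+", " ", text).strip()


record = json.loads(RECORD_JSON)
files = download_urls(record)
summary = plain_description(record)

for name, url in files.items():
    print(name, "->", url)
print(summary)
```

The same two functions should work unchanged on the full export. The column layout of the CSV files is not described in this record, so inspect the header row of each file before relying on specific column names.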
                  All versions   This version
Views             315            315
Downloads         212            212
Data volume       682.2 MB       682.2 MB
Unique views      287            287
Unique downloads  176            176
