Dataset Open Access

Claim Detection and Matching for Indian Languages

Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale


JSON Export

{
  "files": [
    {
      "links": {
        "self": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211/claim_detection_dataset.csv"
      }, 
      "checksum": "md5:55206b1786829a5a1d31ecb2e69da090", 
      "bucket": "8f069b8d-3064-4838-965f-de37d9c2c211", 
      "key": "claim_detection_dataset.csv", 
      "type": "csv", 
      "size": 3110606
    }, 
    {
      "links": {
        "self": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211/claim_matching_dataset.csv"
      }, 
      "checksum": "md5:0088a150cb6c9ef5fbc4a59cbd4aeb88", 
      "bucket": "8f069b8d-3064-4838-965f-de37d9c2c211", 
      "key": "claim_matching_dataset.csv", 
      "type": "csv", 
      "size": 4144285
    }
  ], 
  "owners": [
    224060
  ], 
  "doi": "10.5281/zenodo.4890950", 
  "stats": {
    "version_unique_downloads": 174.0, 
    "unique_views": 284.0, 
    "views": 312.0, 
    "version_views": 312.0, 
    "unique_downloads": 174.0, 
    "version_unique_views": 284.0, 
    "volume": 675968198.0, 
    "version_downloads": 210.0, 
    "downloads": 210.0, 
    "version_volume": 675968198.0
  }, 
  "links": {
    "doi": "https://doi.org/10.5281/zenodo.4890950", 
    "conceptdoi": "https://doi.org/10.5281/zenodo.4890949", 
    "bucket": "https://zenodo.org/api/files/8f069b8d-3064-4838-965f-de37d9c2c211", 
    "conceptbadge": "https://zenodo.org/badge/doi/10.5281/zenodo.4890949.svg", 
    "html": "https://zenodo.org/record/4890950", 
    "latest_html": "https://zenodo.org/record/4890950", 
    "badge": "https://zenodo.org/badge/doi/10.5281/zenodo.4890950.svg", 
    "latest": "https://zenodo.org/api/records/4890950"
  }, 
  "conceptdoi": "10.5281/zenodo.4890949", 
  "created": "2021-06-01T17:39:42.420797+00:00", 
  "updated": "2021-06-06T17:33:13.522234+00:00", 
  "conceptrecid": "4890949", 
  "revision": 5, 
  "id": 4890950, 
  "metadata": {
    "access_right_category": "success", 
    "doi": "10.5281/zenodo.4890950", 
    "description": "<p>Two datasets are included in this repository: claim matching and claim detection datasets.&nbsp;The collections contain&nbsp;data in 5 languages: Bengali, English, Hindi, Malayalam and Tamil.</p>\n\n<p>The &quot;claim detection&quot;&nbsp;dataset contains textual claims from social media and fact-checking websites annotated for the&nbsp; &quot;fact-check worthiness&quot; of the claims in each message. Data points have one of the three labels of &quot;Yes&quot; (text contains one or more check-worthy claims), &quot;No&quot; and &quot;Probably&quot;.&nbsp;</p>\n\n<p>The &quot;claim matching&quot; dataset is a curated collection of pairs of textual claims from social media and fact-checking websites for the purpose of automatic and multilingual claim matching.&nbsp;Pairs of data have one of the four labels of &quot;Very Similar&quot;, &quot;Somewhat Similar&quot;, &quot;Somewhat Dissimilar&quot; and &quot;Very Dissimilar&quot;.</p>\n\n<p>All personally identifiable information (PII) including phone numbers, email addresses,&nbsp;license plate numbers and addresses have been replaced with general tags (e.g. &lt;PHONE#&gt;, &lt;ADDRESS&gt;, etc)&nbsp;to protect user anonymity. A detailed explanation on the curation and annotation process is provided in our ACL 2021 paper:&nbsp;<br>\n<a href=\"https://arxiv.org/abs/2106.00853\">Kazemi, A.; Garimella, K.; Gaffney, D.; and Hale, S. A. 2021. Claim Matching Beyond English to Scale Global Fact-Checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, ACL 2021.</a></p>", 
    "language": "hin", 
    "title": "Claim Detection and Matching for Indian Languages", 
    "license": {
      "id": "CC-BY-4.0"
    }, 
    "relations": {
      "version": [
        {
          "count": 1, 
          "index": 0, 
          "parent": {
            "pid_type": "recid", 
            "pid_value": "4890949"
          }, 
          "is_last": true, 
          "last_child": {
            "pid_type": "recid", 
            "pid_value": "4890950"
          }
        }
      ]
    }, 
    "version": "1.0", 
    "keywords": [
      "nlp", 
      "fact-checking", 
      "misinformation", 
      "multilingual", 
      "claim matching", 
      "claim detection", 
      "whatsapp", 
      "tamil", 
      "malayalam", 
      "bengali"
    ], 
    "publication_date": "2021-06-01", 
    "creators": [
      {
        "affiliation": "University of Michigan & Meedan", 
        "name": "Ashkan Kazemi"
      }, 
      {
        "affiliation": "MIT", 
        "name": "Kiran Garimella"
      }, 
      {
        "affiliation": "Meedan", 
        "name": "Devin Gaffney"
      }, 
      {
        "affiliation": "University of Oxford & Meedan", 
        "name": "Scott A. Hale"
      }
    ], 
    "meeting": {
      "url": "https://2021.aclweb.org/", 
      "dates": "2-4 August 2021", 
      "place": "Online", 
      "title": "ACL-IJCNLP 2021"
    }, 
    "access_right": "open", 
    "resource_type": {
      "type": "dataset", 
      "title": "Dataset"
    }, 
    "related_identifiers": [
      {
        "scheme": "doi", 
        "identifier": "10.5281/zenodo.4890949", 
        "relation": "isVersionOf"
      }
    ]
  }
}
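Each entry under "files" in the record above carries a "checksum" of the form "md5:<hex digest>". The following is a minimal sketch of checking a locally downloaded copy against that value. The function names `parse_checksum` and `file_matches_checksum` and the local file path are hypothetical, introduced only for illustration. The sketch assumes the file has already been downloaded by some other means.

```python
import hashlib
from typing import Tuple


def parse_checksum(checksum: str) -> Tuple[str, str]:
    """Split a value such as 'md5:<hex>' into (algorithm, hex digest)."""
    algorithm, _, digest = checksum.partition(":")
    if not algorithm or not digest:
        raise ValueError(f"unexpected checksum format: {checksum!r}")
    return algorithm.lower(), digest.lower()


def file_matches_checksum(path: str, checksum: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` hashes to the digest in `checksum`."""
    algorithm, expected = parse_checksum(checksum)
    hasher = hashlib.new(algorithm)
    # Read in chunks so multi-megabyte CSV files are not loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected
```

For example, assuming claim_detection_dataset.csv has been saved to the working directory, `file_matches_checksum("claim_detection_dataset.csv", "md5:55206b1786829a5a1d31ecb2e69da090")` should return True for an intact download.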
                  All versions   This version
Views             312            312
Downloads         210            210
Data volume       676.0 MB       676.0 MB
Unique views      284            284
Unique downloads  174            174
