Leader: 00000nmm##2200000uu#4500
Record ID: 4890950
DOI: 10.5281/zenodo.4890950
OAI identifier: oai:zenodo.org:4890950
Title: Claim Detection and Matching for Indian Languages
Creators:
  Kiran Garimella (MIT)
  Devin Gaffney (Meedan)
  Scott A. Hale (University of Oxford & Meedan)
  Ashkan Kazemi (University of Michigan & Meedan)
Access rights: info:eu-repo/semantics/openAccess
License: Creative Commons Attribution 4.0 International (cc-by-4.0, spdx) https://creativecommons.org/licenses/by/4.0/legalcode
Keywords: nlp, fact-checking, misinformation, multilingual, claim matching, claim detection, whatsapp, tamil, malayalam, bengali

Description:
This repository includes two datasets: a claim matching dataset and a claim detection dataset. The collections contain data in 5 languages: Bengali, English, Hindi, Malayalam and Tamil.

The "claim detection" dataset contains textual claims from social media and fact-checking websites, annotated for the "fact-check worthiness" of the claims in each message. Each data point has one of three labels: "Yes" (the text contains one or more check-worthy claims), "No" and "Probably".

The "claim matching" dataset is a curated collection of pairs of textual claims from social media and fact-checking websites, intended for automatic and multilingual claim matching. Each pair has one of four labels: "Very Similar", "Somewhat Similar", "Somewhat Dissimilar" and "Very Dissimilar".

All personally identifiable information (PII), including phone numbers, email addresses, license plate numbers and addresses, has been replaced with generic tags (e.g. <PHONE#>, <ADDRESS>, etc.) to protect user anonymity.

A detailed explanation of the curation and annotation process is provided in our ACL 2021 paper: <a href="https://arxiv.org/abs/2106.00853">Kazemi, A.; Garimella, K.; Gaffney, D.; and Hale, S. A. 2021. Claim Matching Beyond English to Scale Global Fact-Checking. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, ACL 2021.</a>

Language: hin
Publisher: Zenodo
Publication date: 2021-06-01
Resource type: info:eu-repo/semantics/other
Record timestamp: 20210606173313.0
Files:
  https://zenodo.org/records/4890950/files/claim_matching_dataset.csv
    Size: 4144285
    Checksum: md5:0088a150cb6c9ef5fbc4a59cbd4aeb88
  https://zenodo.org/records/4890950/files/claim_detection_dataset.csv
    Size: 3110606
    Checksum: md5:55206b1786829a5a1d31ecb2e69da090
Access: open
Related identifier: 10.5281/zenodo.4890949 (isVersionOf, doi)
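
The record lists an MD5 checksum for each of the two CSV files, so a downloaded copy can be checked against it before use. The following is a minimal sketch of that check, assuming the files have already been saved locally under their original file names. The names EXPECTED_MD5, md5_of_file and verify_download are hypothetical helpers introduced for illustration and are not part of the dataset or of Zenodo.

```python
import hashlib
from pathlib import Path

# Checksums copied from the record; keys are the original file names.
EXPECTED_MD5 = {
    "claim_matching_dataset.csv": "0088a150cb6c9ef5fbc4a59cbd4aeb88",
    "claim_detection_dataset.csv": "55206b1786829a5a1d31ecb2e69da090",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the local file's MD5 matches the checksum in the record.

    Assumption: the local file keeps the name used in the record, since the
    expected checksum is looked up by file name.
    """
    path = Path(path)
    if path.name not in EXPECTED_MD5:
        raise KeyError(f"no checksum listed in the record for {path.name}")
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

For example, verify_download("claim_matching_dataset.csv") would return False for a truncated or altered download. MD5 is used here only because it is the checksum the record provides; it detects accidental corruption, not deliberate tampering.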