Published May 16, 2022 | Version 4
Dataset Open

CT-FAN: A Multilingual dataset for Fake News Detection

  • 1. University of Duisburg-Essen
  • 2. University of Applied Sciences Potsdam
  • 3. University of Hildesheim
  • 4. University of Klagenfurt
  • 5. Darmstadt University of Applied Sciences

Contributors

  • 1. University of Applied Sciences Potsdam
  • 2. University of Hildesheim

Description

By downloading the data, you agree with the terms & conditions mentioned below:

Data Access: The data in the research collection may only be used for research purposes. Portions of the data are copyrighted and have commercial value as data, so you must be careful to use them only for research purposes. 

Summaries, analyses and interpretations of the linguistic properties of the information may be derived and published, provided it is impossible to reconstruct the information from these summaries. You may not try identifying the individuals whose texts are included in this dataset. You may not try to identify the original entry on the fact-checking site. You are not permitted to publish any portion of the dataset besides summary statistics or share it with anyone else.

We grant you the right to access the collection's content as described in this agreement. You may not otherwise make unauthorised commercial use of, reproduce, prepare derivative works, distribute copies, perform, or publicly display the collection or parts of it. You are responsible for keeping and storing the data in a way that others cannot access. The data is provided free of charge.

Citation

Please cite our work as

@InProceedings{clef-checkthat:2022:task3,
author = {K{\"o}hler, Juliane and Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Wiegand, Michael and Siegel, Melanie and Mandl, Thomas},
title = "Overview of the {CLEF}-2022 {CheckThat}! Lab Task 3 on Fake News Detection",
year = {2022},
booktitle = "Working Notes of CLEF 2022---Conference and Labs of the Evaluation Forum",
series = {CLEF~'2022},
address = {Bologna, Italy},}
@article{shahi2021overview,
  title={Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection},
  author={Shahi, Gautam Kishore and Stru{\ss}, Julia Maria and Mandl, Thomas},
  journal={Working Notes of CLEF},
  year={2021}
}

Problem Definition: Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other (e.g., claims in dispute) and detect the topical domain of the article. This task will run in English and German.

Task 3: Multi-class fake news detection of news articles (English) Sub-task A would detect fake news designed as a four-class classification problem. Given the text of a news article, determine whether the main claim made in the article is true, partially true, false, or other. The training data will be released in batches and roughly about 1264 articles with the respective label in English language. Our definitions for the categories are as follows:

  • False - The main claim made in an article is untrue.

  • Partially False - The main claim of an article is a mixture of true and false information. The article contains partially true and partially false information but cannot be considered 100% true. It includes all articles in categories like partially false, partially true, mostly true, miscaptioned, misleading etc., as defined by different fact-checking services.

  • True - This rating indicates that the primary elements of the main claim are demonstrably true.

  • Other- An article that cannot be categorised as true, false, or partially false due to a lack of evidence about its claims. This category includes articles in dispute and unproven articles.

Cross-Lingual Task (German)

Along with the multi-class task for the English language, we have introduced a task for low-resourced language. We will provide the data for the test in the German language. The idea of the task is to use the English data and the concept of transfer to build a classification model for the German language.

Input Data

The data will be provided in the format of Id, title, text, rating, the domain; the description of the columns is as follows:

  • ID- Unique identifier of the news article
  • Title- Title of the news article
  • text- Text mentioned inside the news article
  • our rating - class of the news article as false, partially false, true, other

Output data format

  • public_id- Unique identifier of the news article
  • predicted_rating- predicted class

Sample File

public_id, predicted_rating
1, false
2, true

IMPORTANT!

  1. We have used the data from 2010 to 2022, and the content of fake news is mixed up with several topics like elections, COVID-19 etc.

Baseline: For this task, we have created a baseline system. The baseline system can be found at https://zenodo.org/record/6362498

Related Work

  • Shahi GK. AMUSED: An Annotation Framework of Multi-modal Social Media Data. arXiv preprint arXiv:2010.00502. 2020 Oct 1.https://arxiv.org/pdf/2010.00502.pdf
  • G. K. Shahi and D. Nandini, “FakeCovid – a multilingual cross-domain fact check news dataset for covid-19,” in workshop Proceedings of the 14th International AAAI Conference on Web and Social Media, 2020. http://workshop-proceedings.icwsm.org/abstract?id=2020_14
  • Shahi, G. K., Dirkson, A., & Majchrzak, T. A. (2021). An exploratory study of covid-19 misinformation on twitter. Online Social Networks and Media22, 100104. doi: 10.1016/j.osnem.2020.100104
  • Shahi, G. K., Struß, J. M., & Mandl, T. (2021). Overview of the CLEF-2021 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.
  • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeno, A., Míguez, R., Shaar, S., ... & Mandl, T. (2021, March). The CLEF-2021 CheckThat! lab on detecting check-worthy claims, previously fact-checked claims, and fake news. In European Conference on Information Retrieval (pp. 639-649). Springer, Cham.
  • Nakov, P., Da San Martino, G., Elsayed, T., Barrón-Cedeño, A., Míguez, R., Shaar, S., ... & Kartal, Y. S. (2021, September). Overview of the CLEF–2021 CheckThat! Lab on Detecting Check-Worthy Claims, Previously Fact-Checked Claims, and Fake News. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 264-291). Springer, Cham.

Files

FakeNews_Task3_2022.zip

Files (4.6 MB)

Name Size Download all
md5:4f4b90e10910b74816d10e144221e1fb
4.6 MB Preview Download

Additional details

References

  • Köhler, J., Shahi, G. K., Struß, J. M., Wiegand, M., Siegel, M., Mandl, T., & Schütz, M. (2022). Overview of the CLEF-2022 CheckThat! lab task 3 on fake news detection. Working Notes of CLEF.