Published March 9, 2026 | Version v1
Dataset Open

CLIC24 – Climate Change Multilingual Media Corpus 2024

Description

The CLIC24 corpus is a multilingual collection of journalistic texts on climate-change-related topics. It contains texts in Czech, German, English, and Spanish, collected automatically with a custom extraction script.
For each language, a predefined set of topic-specific keywords related to climate change was used. An article was included in the corpus if at least one keyword occurred either in the article title or in the article body.
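The inclusion rule above can be illustrated with a minimal sketch. It assumes case-insensitive substring matching, which the description does not confirm; the actual extraction script may handle word boundaries or inflected forms differently. The function name and the example keywords are hypothetical and are not taken from the corpus documentation.

```python
def matches_any_keyword(title: str, body: str, keywords: list[str]) -> bool:
    """Return True if at least one keyword occurs in the title or in the body.

    Assumption: plain case-insensitive substring matching.
    """
    title_cf = title.casefold()
    body_cf = body.casefold()
    for keyword in keywords:
        keyword_cf = keyword.casefold()
        if keyword_cf in title_cf or keyword_cf in body_cf:
            return True
    return False


# Hypothetical keywords, for illustration only.
example_keywords = ["climate change", "global warming"]
print(matches_any_keyword("Global Warming accelerates", "", example_keywords))
```

Title and body are checked separately so that a match can never span the boundary between the two fields.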
During data extraction, texts were cleaned of navigation elements, advertisements, and other typical website noise. Articles identified as incomplete (e.g., due to paywalls) were excluded. Duplicate texts were removed based on URL comparison.
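URL-based duplicate removal could look roughly like the sketch below. It assumes exact string comparison and keeps the first record seen for each URL; the description does not say whether URLs were normalized (for example, stripped of tracking parameters) before comparison. The `url` field name and the example records are hypothetical.

```python
def deduplicate_by_url(records: list[dict]) -> list[dict]:
    """Keep the first record for each URL, preserving input order.

    Assumption: URLs are compared as exact strings, without normalization.
    """
    seen_urls = set()
    unique_records = []
    for record in records:
        url = record["url"]  # hypothetical field name
        if url not in seen_urls:
            seen_urls.add(url)
            unique_records.append(record)
    return unique_records


# Hypothetical records, for illustration only.
sample = [
    {"url": "https://example.org/a", "title": "A"},
    {"url": "https://example.org/b", "title": "B"},
    {"url": "https://example.org/a", "title": "A copy"},
]
print(len(deduplicate_by_url(sample)))
```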
Only articles published between January 1, 2024 and December 31, 2024 were included. Records without a reliably traceable publication date were excluded.
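The date criterion can be sketched as follows. This assumes both bounds are inclusive and that a record without a reliable publication date is represented as `None`; the function name and that representation are hypothetical and are not taken from the extraction script.

```python
from datetime import date
from typing import Optional

# Collection window stated in the description (assumed inclusive on both ends).
WINDOW_START = date(2024, 1, 1)
WINDOW_END = date(2024, 12, 31)


def in_collection_window(published: Optional[date]) -> bool:
    """Return True only for records dated within the 2024 window.

    Records with no traceable publication date (None) are excluded.
    """
    if published is None:
        return False
    return WINDOW_START <= published <= WINDOW_END


print(in_collection_window(date(2024, 6, 15)))
print(in_collection_window(None))
```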
The corpus documentation includes an overview of individual media sources, the number of included texts, and the list of keywords used for data collection.
The dataset was created for linguistic and interdisciplinary research on media discourse, disinformation, and climate change communication.

Access and Licensing Conditions:
This is a metadata-only dataset. Due to copyright restrictions, the original full-text articles of the CLIC24 corpus cannot be redistributed.
Use of the dataset outside the scope of the grant project requires permission from the authors.

Files

CLIC24_Metadata_Czech_Media.csv

Files (5.4 MB)

Checksum Size
md5:e413f06799e466eda3b19f6b4c1c333a 210.9 kB
md5:b08b120f421275eeb890628a926b6dd8 3.9 MB
md5:a4dd21883e2cd1b5359c638c3ca1fc07 951.5 kB
md5:6763d8e5efee2a7136dae3611eeac771 389.5 kB
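A downloaded file can be checked against its listed MD5 checksum with a short sketch like the one below. The function name is hypothetical. The listing here does not show which checksum belongs to which file, so the caller has to supply the value shown next to the file on the record page.

```python
import hashlib


def md5_matches(path: str, listed_checksum: str) -> bool:
    """Compare a file's MD5 digest with a checksum such as 'md5:e413...'.

    The optional 'md5:' prefix is stripped before comparison.
    """
    expected = listed_checksum.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

A call would take the local file path and the checksum string copied from the record page; a `False` result suggests an incomplete or altered download.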

Additional details

Funding

Ministry of Education, Youth and Sports
Biography of Fake News with a Touch of AI: Dangerous Phenomenon through the Prism of Modern Human Sciences (grant no. CZ.02.01.01/00/23_025/0008724)

Dates

Collected
2024
Time range of published articles