CLIC24 – Climate Change Multilingual Media Corpus 2024

Kubát, Miroslav; Nogolová, Michaela; Mostýn, Martin; Místecký, Michal; Beneš Kováčová, Dominika; Šlechta, Petr; Pišl, Milan; Lukl, Jiří; Chen, Xinying; Vankova, Lenka

doi:10.5281/zenodo.18924734

Published March 9, 2026 | Version 1.0

Dataset Open


1. University of Ostrava

The CLIC24 corpus is a multilingual collection of journalistic texts on topics related to climate change. The corpus contains texts in Czech, German, English, and Spanish, collected automatically using a custom extraction script.
For each language, a predefined set of topic-specific keywords related to climate change was used. An article was included in the corpus if at least one keyword occurred either in the article title or in the article body.
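The inclusion rule above can be sketched as follows. This is a minimal illustration, not the authors' extraction script: the function name `matches_keywords` is hypothetical, and case-insensitive substring matching is an assumption, since the record does not say how keywords were matched (case handling, word boundaries, or lemmatization).

```python
def matches_keywords(title: str, body: str, keywords: list[str]) -> bool:
    """Return True if at least one keyword occurs in the title or the body.

    Assumption: plain case-insensitive substring matching. The actual
    matching strategy used for CLIC24 is not documented in this record.
    """
    text = f"{title}\n{body}".casefold()
    return any(keyword.casefold() in text for keyword in keywords)


# Hypothetical keywords for illustration only; the real per-language lists
# are part of the corpus documentation.
example_keywords = ["climate change", "global warming"]
print(matches_keywords("Global Warming accelerates", "", example_keywords))
```

Substring matching is the simplest reading of "at least one keyword occurred"; a stricter variant could require word boundaries to avoid partial-word hits.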
During data extraction, texts were cleaned of navigation elements, advertisements, and other typical website noise. Articles identified as incomplete (e.g., due to paywalls) were excluded. Duplicate texts were removed based on URL comparison.
Only articles published between January 1, 2024 and December 31, 2024 were included. Records without a reliably traceable publication date were excluded.
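The URL-based deduplication and the date criterion described above might look roughly like this. It is a sketch under stated assumptions: the record fields `url` and `published` and the function `filter_records` are hypothetical names, and URLs are compared as exact strings because the record does not describe any URL normalization.

```python
from datetime import date

# Publication window stated in the description.
START, END = date(2024, 1, 1), date(2024, 12, 31)


def filter_records(records):
    """Keep the first record per URL whose publication date falls in the window.

    Assumptions: each record is a dict with a "url" string and a "published"
    value that is a datetime.date, or None when no reliable date is known.
    """
    seen_urls = set()
    kept = []
    for record in records:
        published = record.get("published")
        # Drop records without a traceable date or outside the 2024 window.
        if published is None or not (START <= published <= END):
            continue
        # Drop duplicates by exact URL comparison.
        if record["url"] in seen_urls:
            continue
        seen_urls.add(record["url"])
        kept.append(record)
    return kept
```

Both boundary dates are treated as inclusive here, which matches the usual reading of "between January 1, 2024 and December 31, 2024".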
The corpus documentation includes an overview of individual media sources, the number of included texts, and the list of keywords used for data collection.
The dataset was created for linguistic and interdisciplinary research on media discourse, disinformation, and climate change communication.

Access and Licensing Conditions:
Metadata-only dataset.
This is a closed dataset; only the metadata are open. The public files attached to this record contain only a list of the articles included in the original dataset, with their URLs.
Due to copyright restrictions, the original full-text articles included in the CLIC24 dataset cannot be redistributed and carry no license. A license will be specified individually in response to a legitimate data request. Use of the dataset outside the scope of the grant project requires permission from the authors.

Contact person: miroslav.kubat@osu.cz


Files

Files (5.4 MB)

Name	MD5	Size
CLIC24_Metadata_Czech_Media.csv	e413f06799e466eda3b19f6b4c1c333a	210.9 kB
CLIC24_Metadata_English_Media.csv	b08b120f421275eeb890628a926b6dd8	3.9 MB
CLIC24_Metadata_German_Media.csv	a4dd21883e2cd1b5359c638c3ca1fc07	951.5 kB
CLIC24_Metadata_Spanish_Media.csv	6763d8e5efee2a7136dae3611eeac771	389.5 kB
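The MD5 checksums listed for the files can be used to check that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the helper names `md5_of` and `verify` are hypothetical, and the code assumes the four CSV files were saved under their original names in a single directory.

```python
import hashlib
from pathlib import Path

# Checksums as listed in this record.
EXPECTED_MD5 = {
    "CLIC24_Metadata_Czech_Media.csv": "e413f06799e466eda3b19f6b4c1c333a",
    "CLIC24_Metadata_English_Media.csv": "b08b120f421275eeb890628a926b6dd8",
    "CLIC24_Metadata_German_Media.csv": "a4dd21883e2cd1b5359c638c3ca1fc07",
    "CLIC24_Metadata_Spanish_Media.csv": "6763d8e5efee2a7136dae3611eeac771",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory="."):
    """Map each file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of(path) == expected) if path.exists() else None
    return results


if __name__ == "__main__":
    for name, status in verify(".").items():
        print(f"{name}: {status}")
```

A result of `False` suggests a corrupted or modified download; `None` means the file was not found in the given directory.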

Additional details

Ministry of Education, Youth and Sports
Biography of Fake News with a Touch of AI: Dangerous Phenomenon through the Prism of Modern Human Sciences (grant no. CZ.02.01.01/00/23_025/0008724)

Collected: 2024 (time range of published articles)

