SemEval 2019 Task 4 - Hyperpartisan News Detection

Johannes Kiesel; Martin Potthast; Maria Mestre; Rishabh Shukla; Benno Stein; David Corney; Emmanuel Vincent; Payam Adineh

doi:10.5281/zenodo.1406208

Published September 3, 2018 | Version Trial v3

Dataset Open

1. Bauhaus-Universität Weimar
2. Leipzig University
3. Factmata Ltd.

Third trial dataset for the SemEval 2019 Task 4: Hyperpartisan News Detection.

The dataset contains 1 million articles. It is split into a training set (200,000 left, 400,000 least, 200,000 right) and a validation set (50,000 left, 100,000 least, 50,000 right); no publisher that occurs in the training set also occurs in the validation set. All articles are labeled with the overall bias of their publisher, as provided by BuzzFeed journalists or MediaBiasFactCheck.com.
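The split sizes above can be checked against a downloaded ground-truth file with a short script. The following is a minimal sketch, not part of the dataset's tooling: it assumes that each label is stored as a `bias` attribute on an `article` element, which is not stated in this record and should be confirmed against ground-truth.xsd. The function name `count_labels` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Split sizes as stated in the dataset description.
EXPECTED = {
    "training": {"left": 200_000, "least": 400_000, "right": 200_000},
    "validation": {"left": 50_000, "least": 100_000, "right": 50_000},
}


def count_labels(source, tag="article", attribute="bias"):
    """Stream through an XML file (path or binary file object) and count labels.

    ASSUMPTION: the element name `article` and the attribute name `bias`
    are guesses; check ground-truth.xsd for the actual names.
    """
    counts = Counter()
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            counts[element.get(attribute, "<missing>")] += 1
        element.clear()  # drop attributes and text already processed
    return counts


# The stated split sizes add up to the stated total of 1 million articles.
total = sum(sum(split.values()) for split in EXPECTED.values())
print(total)
```

Comparing `count_labels(path_to_unzipped_ground_truth)` with the matching entry of `EXPECTED` shows whether a local copy has the stated label distribution.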

The trial data is not fully cleaned. Because of an encoding error, some characters have been replaced by question marks. However, all files already conform to the XML schema files (article.xsd and ground-truth.xsd).
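Because the archives are large, it can be convenient to stream the XML directly from the zip file instead of extracting it first. The sketch below does this with the Python standard library only. It checks well-formedness (the parser raises an error otherwise), which is weaker than validation against the XSD files. The element name `article` and the assumption that each archive holds a single .xml member are not stated in this record, and `count_elements_in_zip` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET


def count_elements_in_zip(zip_source, tag="article"):
    """Count `tag` elements in the first .xml member of a zip archive.

    ASSUMPTIONS: the archive contains one .xml member and articles are
    stored as `article` elements; neither is confirmed by this record.
    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    with zipfile.ZipFile(zip_source) as archive:
        xml_members = [name for name in archive.namelist() if name.endswith(".xml")]
        if not xml_members:
            raise ValueError("no .xml member found in archive")
        count = 0
        with archive.open(xml_members[0]) as stream:
            # iterparse reads the member incrementally, so the full
            # document never has to fit in memory at once.
            for _event, element in ET.iterparse(stream, events=("end",)):
                if element.tag == tag:
                    count += 1
                element.clear()
        return count
```

For example, `count_elements_in_zip("articles-validation-20180831.xml.zip")` would be expected to return the validation set size if the assumptions hold.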

Files

Files (2.0 GB)

Name	MD5	Size
article.xsd	31e3fb439c98b18cd74a7d936a65b218	2.1 kB
articles-training-20180831.xml.zip	c3e85da69f0ec76d30a2c1a0b22d3150	1.4 GB
articles-validation-20180831.xml.zip	5dd17f5043f130407cf599d585ba4ca9	547.9 MB
ground-truth-training-20180831.xml.zip	7ea315edde4f500b554571f388a2fa46	30.0 MB
ground-truth-validation-20180831.xml.zip	50fdbd01f9eef4902a0e6fe93a360577	7.0 MB
ground-truth.xsd	81dd0e153d6f78ca10a5599da6aac66e	1.6 kB
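The MD5 checksums listed for the files can be used to confirm that a download is complete and uncorrupted. The following is a minimal sketch using Python's standard `hashlib` module. The checksums are copied from the file list; the helper names `md5_of_file` and `verify` and the download directory are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksums as published in the file list of this record.
EXPECTED_MD5 = {
    "article.xsd": "31e3fb439c98b18cd74a7d936a65b218",
    "articles-training-20180831.xml.zip": "c3e85da69f0ec76d30a2c1a0b22d3150",
    "articles-validation-20180831.xml.zip": "5dd17f5043f130407cf599d585ba4ca9",
    "ground-truth-training-20180831.xml.zip": "7ea315edde4f500b554571f388a2fa46",
    "ground-truth-validation-20180831.xml.zip": "50fdbd01f9eef4902a0e6fe93a360577",
    "ground-truth.xsd": "81dd0e153d6f78ca10a5599da6aac66e",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Map each file name to True (match), False (mismatch), or None (file absent)."""
    results = {}
    for name, expected_hash in expected.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected_hash) if path.is_file() else None
    return results
```

Calling `verify("downloads")`, where `downloads` is whatever directory holds the files, returns one entry per listed file; any `False` value indicates a file that should be downloaded again.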

Additional details

Is referenced by: https://pan.webis.de/semeval19/semeval19-web/ (URL)

	All versions	This version
Views	21,599	8,720
Downloads	10,443	739
Data volume	11.7 TB	1.0 TB
