There is a newer version of this record available.

Dataset Restricted Access

SemEval 2019 Task 4 - Hyperpartisan News Detection

Johannes Kiesel; Martin Potthast; Maria Mestre; Rishabh Shukla; Benno Stein; David Corney; Emmanuel Vincent; Payam Adineh

Third trial dataset for the SemEval 2019 Task 4: Hyperpartisan News Detection.

The dataset contains 1 million articles. It is split into a training set (200,000 left, 400,000 least, 200,000 right) and a validation set (50,000 left, 100,000 least, 50,000 right); no publisher that occurs in the training set also occurs in the validation set. All articles are labeled by the overall bias of the publisher as provided by BuzzFeed journalists or

The trial data is not fully cleaned: because of an encoding error, some characters have been replaced by question marks. However, all files are already fully compatible with the XML schema files.
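
The split described above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the per-label counts are taken from the description, while the helper names (`split_totals`, `shared_publishers`) and the publisher names are hypothetical and not part of the dataset or its tooling.

```python
# Split sizes as stated in the dataset description.
SPLIT_SIZES = {
    "training": {"left": 200_000, "least": 400_000, "right": 200_000},
    "validation": {"left": 50_000, "least": 100_000, "right": 50_000},
}


def split_totals(sizes):
    """Return the number of articles in each split."""
    return {split: sum(by_label.values()) for split, by_label in sizes.items()}


def shared_publishers(training_publishers, validation_publishers):
    """Return the publishers that appear in both splits (expected: none)."""
    return set(training_publishers) & set(validation_publishers)


totals = split_totals(SPLIT_SIZES)
print(totals)                # articles per split
print(sum(totals.values()))  # articles overall

# Hypothetical publisher names, for illustration only.
train_pubs = ["publisher-a.example", "publisher-b.example"]
valid_pubs = ["publisher-c.example"]
print(shared_publishers(train_pubs, valid_pubs))  # empty set if the splits are disjoint
```

How publisher names would be read from the actual files depends on the XML schema shipped with the data, so that step is left out here.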

Restricted Access

You may request access to the files in this upload, provided that you fulfil the conditions below. The decision whether to grant/deny access is solely under the responsibility of the record owner.

Access is restricted to participants and organizers of the challenge for now. The data will be publicly available after the evaluation period.

                   All versions   This version
Views              16,181         8,441
Downloads          6,629          1,762
Data volume        2.0 TB         901.0 GB
Unique views       13,607         8,024
Unique downloads   1,605          456