SemEval 2019 Task 4 - Hyperpartisan News Detection

doi:10.5281/zenodo.1400316

Published July 11, 2018 | Version Trial

Dataset Restricted

1. Bauhaus-Universität Weimar
2. Leipzig University
3. Factmata Ltd.

Second trial dataset for the SemEval 2019 Task 4: Hyperpartisan News Detection.

The dataset contains ~1 million articles. It is split into training and validation sets such that no publisher that occurs in the training set also occurs in the validation set. Due to imbalance in our raw data, the training set of this version contains more articles that are hyperpartisan (533334: 266667 left and 266667 right) than not (26667). The validation set is balanced, as the test set will be: 50% hyperpartisan (33333 left and 33333 right) and 50% not (66666). All articles are labeled by the overall bias of their publisher, as provided by BuzzFeed journalists or MediaBiasFactCheck.com.
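The publisher-disjoint split and the label counts described above can be checked with a few lines of code. The following is a minimal sketch, not part of the dataset tooling: it assumes each article has already been reduced to a (publisher, label) pair, and the function name `check_split` and the label strings are hypothetical.

```python
from collections import Counter


def check_split(train, valid):
    """Report publisher overlap and label counts for two splits.

    `train` and `valid` are lists of (publisher, label) pairs.
    This record layout is an assumption made for illustration only.
    """
    train_publishers = {publisher for publisher, _ in train}
    valid_publishers = {publisher for publisher, _ in valid}
    return {
        # Must be empty if no publisher occurs in both splits.
        "shared_publishers": train_publishers & valid_publishers,
        # Per-label article counts, e.g. to compare left / right / not.
        "train_labels": Counter(label for _, label in train),
        "valid_labels": Counter(label for _, label in valid),
    }


# Toy records, not taken from the dataset.
train = [("pubA", "left"), ("pubA", "left"), ("pubB", "not"), ("pubC", "right")]
valid = [("pubD", "not"), ("pubE", "left")]
report = check_split(train, valid)
```

An empty `shared_publishers` set indicates that the two splits are publisher-disjoint; the two counters make any class imbalance visible at a glance.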

The trial data is not fully cleaned. Due to an encoding error, some characters have been replaced by question marks. However, all files are already fully compatible with the XML schema files. Unlike the first trial version of this dataset, the <q> tag is used instead of <quote> (to be compatible with HTML).
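To get a feel for the two points above, one could scan the articles for <q> elements and count question marks in the text. The sketch below uses Python's standard xml.etree.ElementTree on an invented sample; the element and attribute names other than <q> (`articles`, `article`, `p`, `id`, `title`) are assumptions about the file layout, and `summarize_articles` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Invented sample; the real files may be structured differently.
SAMPLE = """<articles>
  <article id="0000001" title="Example title">
    <p>She said <q>this is a quote</q> and left.</p>
    <p>A caf? on the corner.</p>
  </article>
</articles>"""


def summarize_articles(xml_text):
    """Return id, quoted passages, and question-mark count per article."""
    root = ET.fromstring(xml_text)
    summary = []
    for article in root.iter("article"):
        # itertext() yields element text only, not attribute values.
        text = "".join(article.itertext())
        summary.append({
            "id": article.get("id"),
            "quotes": [q.text or "" for q in article.iter("q")],
            "question_marks": text.count("?"),
        })
    return summary


result = summarize_articles(SAMPLE)
```

A question-mark count cannot distinguish genuine question marks from replaced characters, so it is only a rough indicator of how strongly an article may be affected by the encoding problem.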

Files

The record is publicly accessible, but files are restricted to users with access.

Access is restricted to participants and organizers of the challenge for now. The data will be publicly available after the evaluation period.

Additional details

Is referenced by: https://pan.webis.de/semeval19/semeval19-web/ (URL)
