10.5281/zenodo.1400316
https://zenodo.org/records/1400316
oai:zenodo.org:1400316
Johannes Kiesel
0000-0002-1617-6508
Bauhaus-Universität Weimar
Martin Potthast
Leipzig University
Maria Mestre
Factmata Ltd.
Rishabh Shukla
Factmata Ltd.
Benno Stein
Bauhaus-Universität Weimar
David Corney
Emmanuel Vincent
Factmata Ltd.
Payam Adineh
Bauhaus-Universität Weimar
SemEval 2019 Task 4 - Hyperpartisan News Detection
Zenodo
2018
Hyperpartisan news
SemEval
SemEval 2019
SemEval 2019 Task 4
Biased news
News bias
Hyperpartisan
Hyperpartisanship
2018-07-11
eng
https://pan.webis.de/semeval19/semeval19-web/
10.5281/zenodo.1310145
https://zenodo.org/communities/pan
https://zenodo.org/communities/webis
Trial
Second trial dataset for the SemEval 2019 Task 4: Hyperpartisan News Detection.
The dataset contains ~1 million articles. It is split into a training set and a validation set such that no publisher that occurs in the training set also occurs in the validation set. Due to an imbalance in our raw data, the training set of this version contains more hyperpartisan articles (533334: 26667 left and 26667 right) than non-hyperpartisan ones (26667). The validation set is balanced, as the test set will be: 50% hyperpartisan (33333 left and 33333 right) and 50% not (66666). All articles are labeled by the overall bias of their publisher, as provided by BuzzFeed journalists or MediaBiasFactCheck.com.
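The publisher-disjoint property of the split can be checked with a few lines of code. The following is a minimal sketch, assuming each article has already been reduced to an (article id, publisher) pair; the function name, the pair layout, and the example publishers are hypothetical and not part of the dataset's own tooling.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: one (article_id, publisher) pair per article.
Article = Tuple[str, str]


def shared_publishers(train: Iterable[Article],
                      validation: Iterable[Article]) -> Set[str]:
    """Return the publishers that occur in both splits.

    An empty result means the split is publisher-disjoint.
    """
    train_publishers = {publisher for _, publisher in train}
    validation_publishers = {publisher for _, publisher in validation}
    return train_publishers & validation_publishers


# Made-up example data, for illustration only.
train_articles = [("a1", "example-left.com"), ("a2", "example-right.com")]
validation_articles = [("a3", "example-neutral.com")]

if shared_publishers(train_articles, validation_articles):
    print("Split is not publisher-disjoint")
else:
    print("Split is publisher-disjoint")
```

Comparing sets of publishers rather than individual articles keeps the check linear in the number of articles.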
The trial data is not fully cleaned. Due to an encoding error, some characters have been replaced by question marks. However, all files are already fully compatible with the XML schema files. Unlike the first trial version of this dataset, this version uses the <q> tag instead of <quote> (for compatibility with HTML).
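Code written against the first trial version may still expect or produce <quote> elements. The following is a minimal sketch of renaming them to <q> with Python's standard library; the function name and the sample article markup are assumptions for illustration and do not reproduce the actual schema.

```python
import xml.etree.ElementTree as ET


def quote_to_q(xml_text: str) -> str:
    """Rename every <quote> element to <q>, leaving its content untouched."""
    root = ET.fromstring(xml_text)
    for element in root.iter("quote"):
        # Only the tag name changes; text, tail, and children are preserved.
        element.tag = "q"
    return ET.tostring(root, encoding="unicode")


# Made-up sample markup, not taken from the dataset.
sample = "<article><p>He said <quote>hi</quote>.</p></article>"
converted = quote_to_q(sample)
print(converted)
```

For files too large to parse in memory, the same renaming could be done in a streaming fashion, for example with `xml.etree.ElementTree.iterparse`.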