SemEval 2019 Task 4 - Hyperpartisan News Detection
Creators
- Bauhaus-Universität Weimar
- Leipzig University
- Factmata Ltd.
Description
Third trial dataset for the SemEval 2019 Task 4: Hyperpartisan News Detection.
The dataset contains 1 million articles. It is split into a training set of 800,000 articles (200,000 left, 400,000 least, 200,000 right) and a validation set of 200,000 articles (50,000 left, 100,000 least, 50,000 right). No publisher that occurs in the training set also occurs in the validation set. All articles are labeled with the overall bias of their publisher, as provided by BuzzFeed journalists or MediaBiasFactCheck.com.
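The split sizes above can be checked with a little arithmetic, and the publisher-disjointness property can be expressed as an empty set intersection. The following is a minimal sketch: the counts are taken from the description, while the function names and the idea of passing publishers as plain strings are assumptions made for illustration.

```python
# Split sizes as stated in the dataset description.
SPLITS = {
    "training": {"left": 200_000, "least": 400_000, "right": 200_000},
    "validation": {"left": 50_000, "least": 100_000, "right": 50_000},
}


def split_totals(splits):
    """Sum the per-label counts of each split."""
    return {name: sum(counts.values()) for name, counts in splits.items()}


def shared_publishers(train_publishers, validation_publishers):
    """Return publishers present in both splits (hypothetical helper).

    For this dataset the result should be empty, because no training
    publisher is supposed to occur in the validation set.
    """
    return set(train_publishers) & set(validation_publishers)


totals = split_totals(SPLITS)
grand_total = sum(totals.values())
```

With the stated counts, `totals` comes out as 800,000 training and 200,000 validation articles, which together give the 1 million articles mentioned above.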
The trial data is not fully cleaned: because of an encoding error, some characters have been replaced by question marks. All files are nevertheless fully compatible with the XML schema files.
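To get a rough sense of how many articles might be affected by the replaced characters, one option is to stream the XML and count articles whose text contains a question mark. This is only a sketch: the element name `article` and the sample markup are assumptions, not taken from the schema files, and a question mark can of course also be legitimate punctuation, so the count is an upper bound at best.

```python
import io
import xml.etree.ElementTree as ET


def count_question_marks(xml_stream, article_tag="article"):
    """Return (articles, articles_containing_a_question_mark).

    `article_tag` is an assumed element name; adjust it to the schema.
    The stream is parsed incrementally so large files fit in memory.
    """
    total = 0
    flagged = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == article_tag:
            total += 1
            text = "".join(elem.itertext())
            if "?" in text:
                flagged += 1
            elem.clear()  # release the parsed subtree
    return total, flagged


# Tiny made-up sample, not real data from the dataset.
SAMPLE = b"""<articles>
<article id="1"><p>Caf? society</p></article>
<article id="2"><p>Plain text.</p></article>
</articles>"""

result = count_question_marks(io.BytesIO(SAMPLE))
```

The same function accepts an open file object, so it could be pointed at an extracted XML file instead of the in-memory sample.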
Files (2.0 GB)
Previewed file: articles-training-20180831.xml.zip

| MD5 checksum | Size |
|---|---|
| 31e3fb439c98b18cd74a7d936a65b218 | 2.1 kB |
| c3e85da69f0ec76d30a2c1a0b22d3150 | 1.4 GB |
| 5dd17f5043f130407cf599d585ba4ca9 | 547.9 MB |
| 7ea315edde4f500b554571f388a2fa46 | 30.0 MB |
| 50fdbd01f9eef4902a0e6fe93a360577 | 7.0 MB |
| 81dd0e153d6f78ca10a5599da6aac66e | 1.6 kB |
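The MD5 checksums listed with the files can be used to confirm that a download is intact. Below is a minimal sketch using Python's standard `hashlib`; the helper names are illustrative, and which checksum belongs to which file name has to be taken from the download page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat even for the 1.4 GB archive. A call would look like `matches_checksum(local_path, listed_md5)`, where both arguments are supplied by the user.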
Additional details
Related works
- Is referenced by: https://pan.webis.de/semeval19/semeval19-web/ (URL)