A Multidimensional Dataset for Analyzing and Detecting News Bias based on Crowdsourcing

Färber, Michael; Burkard, Victoria; Jatowt, Adam; Lim, Sora

doi:10.5281/zenodo.3885351

Published June 8, 2020 | Version v1

Dataset Open

1. Karlsruhe Institute of Technology
2. Kyoto University

We provide a large data set of 2,057 sentences from 90 news articles, together with crowdworker annotations of bias itself and of the following bias dimensions:

hidden assumptions
subjectivity
representation tendencies

Our data set contains 44,547 labels in total (43,197 sentence labels and 1,350 article labels).
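The label counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures stated in this description; the per-sentence and per-article averages are derived here for illustration and are not numbers reported by the authors.

```python
# Figures quoted in the data set description.
SENTENCES = 2_057
ARTICLES = 90
SENTENCE_LABELS = 43_197
ARTICLE_LABELS = 1_350
TOTAL_LABELS = 44_547


def label_summary():
    """Return the recomputed label total and the average number of labels per unit."""
    total = SENTENCE_LABELS + ARTICLE_LABELS
    per_sentence = SENTENCE_LABELS / SENTENCES  # derived value, not from the source
    per_article = ARTICLE_LABELS / ARTICLES     # derived value, not from the source
    return total, per_sentence, per_article


total, per_sentence, per_article = label_summary()
print(total == TOTAL_LABELS)      # does the sum match the stated total?
print(per_sentence, per_article)  # average labels per sentence and per article
```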

The news articles deal with the Ukraine crisis. They were published in 33 countries in total and were selected based on the data set of Cremisini et al. (Cremisini, A., Aguilar, D., & Finlayson, M. A. A Challenging Dataset for Bias Detection: The Case of the Crisis in the Ukraine, Proc. of SBP-BRiMS'19, pp. 173-183, 2019).

Each sentence was annotated by 5 crowdworkers. In total, we spent $3,335 on the crowdworkers' annotations.

More information can be found in our GitHub repository. The file format is described in the codebook attached to the data set.
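Before working through the codebook, it can help to see what the archive actually contains. The following is a minimal sketch that assumes the archive holds UTF-8 encoded `.json` files (suggested by its name, not confirmed here); it makes no assumption about field names, and the helper `summarize_json_archive` is a hypothetical name introduced for illustration.

```python
import json
import zipfile


def summarize_json_archive(zip_path):
    """Map each .json member of the archive to the type name of its top-level value."""
    summary = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Skip directories and non-JSON members such as a codebook document.
            if not name.lower().endswith(".json"):
                continue
            with archive.open(name) as handle:
                data = json.load(handle)
            summary[name] = type(data).__name__
    return summary


# Example call, assuming the archive was downloaded to the working directory:
# print(summarize_json_archive("all-data-as-json.zip"))
```

Knowing whether each file is a top-level list or object makes it easier to match the files against the field descriptions in the codebook.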

Please cite our data set as follows:

@unpublished{Faerber2020Bias,
 author = {Michael F{\"{a}}rber and Victoria Burkard and Adam Jatowt and Sora Lim},
 title  = {{A Multidimensional Dataset for Analyzing and Detecting News Bias based on Crowdsourcing}},
 year   = {2020}
}

Files (237.0 kB)

Name	MD5	Size
all-data-as-json.zip	1bd110fa8c3f6bc1d9e8fd01c30f3df9	237.0 kB
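The MD5 checksum listed for the archive can be used to confirm that a download is intact. The sketch below uses Python's standard `hashlib`; the local file path is an assumption, and the function names are hypothetical helpers introduced for illustration.

```python
import hashlib

# Checksum as listed for all-data-as-json.zip in the file table.
EXPECTED_MD5 = "1bd110fa8c3f6bc1d9e8fd01c30f3df9"


def md5_of_file(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()


# Example call, assuming the archive was saved under its original name:
# print(matches_listed_checksum("all-data-as-json.zip"))
```

MD5 is adequate here for detecting a corrupted or truncated download; it is not a guarantee against deliberate tampering.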
