Navigating News Narratives: A Media Bias Analysis Dataset

Raza, Shaina

doi:10.6084/m9.figshare.24422122

Published October 23, 2023 | Version v2

Dataset Open

Raza, Shaina (Other)¹

1. Vector Institute

Contributors

Other:

Raza, Shaina¹

1. Vector Institute for Artificial Intelligence

The prevalence of bias in the news media has become a critical issue, affecting public perception on a range of important topics such as political views, health, insurance, resource distribution, religion, race, age, gender, occupation, and climate change. The media has a moral responsibility to disseminate accurate information and to raise awareness of important issues and the risks associated with them. These concerns highlight the need for a solution that helps mitigate the spread of false or misleading information and restore public trust in the media.

Data description: This is a dataset for news media bias covering several dimensions of bias: political, hate speech, toxicity, sexism, ageism, gender identity, gender discrimination, race/ethnicity, climate change, occupation, and spirituality. This breadth of dimensions makes it a unique contribution. The dataset does not contain any personally identifiable information (PII).

Data Format: Each record has the following fields:

ID: Numeric unique identifier.
Text: Main content.
Dimension: Categorical descriptor of the text.
Biased_Words: List of words considered biased.
Aspect: Specific topic within the text.
Label: Neutral, Slightly Biased, or Highly Biased.
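As an illustration of working with these fields, here is a minimal sketch that streams the CSV and tallies the Label values. It assumes the CSV header uses the field names exactly as listed above (for example `Label`), which the description does not confirm, so check the real header first. The function name `count_labels` is hypothetical. Streaming row by row avoids loading the whole 1.0 GB file into memory.

```python
import csv
from collections import Counter


def count_labels(csv_path, label_column="Label"):
    """Stream the CSV row by row and tally the values in the label column.

    Assumption: the header row contains a column named `label_column`.
    Rows where that column is missing or empty are counted under "".
    """
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(label_column) or "").strip()
            counts[value] += 1
    return counts


# Example call (path is wherever you saved the download):
# print(count_labels("newsmediabias-full.csv"))
```

The same loop can be adapted to filter on Dimension or Aspect by testing `row.get("Dimension")` inside the loop, under the same assumption about column names.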

Annotation Scheme: The annotation scheme is based on active learning, an iterative process: Manual Labeling --> Semi-Supervised Learning --> Human Verification.

Bias Label: Indicate the presence/absence of bias (e.g., no bias, mild, strong).
Words/Phrases Level Biases: Identify specific biased words/phrases.
Subjective Bias (Aspect): Capture biases related to content aspects.

List of datasets used: We curated news from different categories (climate crisis news summaries, occupational, spiritual/faith, and general) using RSS to capture different dimensions of news media bias. The annotation is performed using active learning to label each sentence (neutral, slightly biased, or highly biased) and to pick out biased words from the news.

We also use publicly available data from the following sources, which we credit here:

MBIC (media bias): Spinde, Timo, Lada Rudnitckaia, Kanishka Sinha, Felix Hamborg, Bela Gipp, and Karsten Donnay. "MBIC--A Media Bias Annotation Dataset Including Annotator Characteristics." arXiv preprint arXiv:2105.11910 (2021). https://zenodo.org/records/4474336

Hyperpartisan news: Kiesel, Johannes, Maria Mestre, Rishabh Shukla, Emmanuel Vincent, Payam Adineh, David Corney, Benno Stein, and Martin Potthast. "Semeval-2019 task 4: Hyperpartisan news detection." In Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 829-839. 2019. https://huggingface.co/datasets/hyperpartisan_news_detection

Toxic comment classification: Adams, C.J., Jeffrey Sorensen, Julia Elliott, Lucas Dixon, Mark McDonald, Nithum, and Will Cukierski. 2017. "Toxic Comment Classification Challenge." Kaggle. https://kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge.

Jigsaw Unintended Bias: Adams, C.J., Daniel Borkan, Inversion, Jeffrey Sorensen, Lucas Dixon, Lucy Vasserman, and Nithum. 2019. "Jigsaw Unintended Bias in Toxicity Classification." Kaggle. https://kaggle.com/competitions/jigsaw-unintended-bias-in-toxicity-classification.

Age Bias: Díaz, Mark, Isaac Johnson, Amanda Lazar, Anne Marie Piper, and Darren Gergle. "Addressing age-related bias in sentiment analysis." In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pp. 1-14. 2018. Data: Age Bias Training and Testing Data, Age Bias and Sentiment Analysis Dataverse (harvard.edu)

Multi-dimensional news Ukraine: Färber, Michael, Victoria Burkard, Adam Jatowt, and Sora Lim. "A multidimensional dataset based on crowdsourcing for analyzing and detecting news bias." In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 3007-3014. 2020. https://zenodo.org/records/3885351#.ZF0KoxHMLtV

Social biases: Sap, Maarten, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. "Social bias frames: Reasoning about social and power implications of language." arXiv preprint arXiv:1911.03891 (2019). https://maartensap.com/social-bias-frames/

Goal of this dataset: We want to offer open and free access to the dataset, ensuring a wide reach to researchers and AI practitioners across the world. The dataset should be user-friendly, and uploading and accessing the data should be straightforward.

If you use this dataset, please cite us.

Files

newsmediabias-full.csv

Files (1.0 GB)

Name	Size
newsmediabias-full.csv (md5:a65f4855d4fc2c975230f972cccfd6e1)	1.0 GB
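Because the file is large, it can be worth confirming that a download matches the MD5 checksum listed above before using it. The following is a minimal sketch using only the Python standard library. The function name `md5_of_file` and the local path in the commented example are hypothetical; the expected checksum is the one shown in the file listing.

```python
import hashlib

# Checksum listed for newsmediabias-full.csv in the file table.
EXPECTED_MD5 = "a65f4855d4fc2c975230f972cccfd6e1"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example check (path is wherever you saved the download):
# assert md5_of_file("newsmediabias-full.csv") == EXPECTED_MD5
```

Reading in chunks keeps memory use constant regardless of file size. MD5 is used here only as an integrity check against accidental corruption, since that is the checksum the record provides.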

Additional details

Other: 2023

Navigating News Narratives: A Media Bias Analysis Dataset © 2023 by Shaina Raza, Vector Institute is licensed under CC BY-NC 4.0
