Dataset for "Is Wikipedia Politically Biased?"
Description
· This work aims to determine whether there is evidence of political bias in English Wikipedia articles.
· Wikipedia is one of the most visited domains on the Web, attracting hundreds of millions of unique users per month. Wikipedia content is also routinely used for training Large Language Models (LLMs), which are the core engines driving cutting-edge AI systems.
· To study political bias in Wikipedia content, we analyze the sentiment (positive, neutral, or negative) with which a set of target terms (N=1,628) with political connotations (e.g. names of recent U.S. presidents, U.S. congressmembers, U.S. Supreme Court Justices, or Prime Ministers of Western countries) are used in Wikipedia articles.
· We do not cherry-pick the set of terms to be included in the analysis but instead use publicly available, pre-existing lists of terms from Wikipedia and other sources.
· We find a mild to moderate tendency in Wikipedia articles to associate right-of-center public figures with more negative sentiment than left-of-center public figures.
· These favorable associations for left-leaning public figures are apparent for names of recent U.S. Presidents, U.S. Supreme Court Justices, U.S. Senators, members of the U.S. House of Representatives, U.S. State Governors, Western countries’ Prime Ministers, and prominent U.S.-based journalists and media organizations.
· Despite being common, these politically asymmetrical sentiment associations are not ubiquitous. We find no evidence of them in the sentiment with which names of U.K. MPs and U.S.-based think tanks are used in Wikipedia articles.
· We also find stronger associations of negative emotions (i.e. anger and disgust) with right-leaning public figures and of positive emotion (i.e. joy) with left-leaning public figures.
· The trends just described constitute suggestive evidence of political bias embedded in Wikipedia articles.
· We also find that some of the political sentiment associations embedded in Wikipedia articles appear in OpenAI’s language models. This suggests that biases in Wikipedia content may percolate into widely used AI systems.
· Wikipedia’s neutral point of view policy (NPOV) aims for articles in Wikipedia to be written in an impartial and unbiased tone. Our results suggest that Wikipedia’s neutral point of view policy is not achieving its stated goal of political viewpoint neutrality.
· This report highlights areas where Wikipedia can improve in how it presents political information. Nonetheless, we want to acknowledge Wikipedia’s significant and valuable role as a public resource. We hope this work inspires efforts to uphold and strengthen Wikipedia’s principles of neutrality and impartiality.
The set of 1,653 target terms used in our analysis, the sample of Wikipedia paragraphs where they occur (as of 2022), and their sentiment and emotion annotations are provided in the following files:
- WikipediaParagraphsWithTargetNGramsAndSentiment.csv
- WikipediaParagraphsWithTargetNGramsAndEmotion.csv
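As a rough starting point for exploring the sentiment file, the sketch below averages a numeric sentiment score per target term using only the Python standard library. The column names (`term`, `sentiment`), the label spellings, and the -1/0/+1 coding are assumptions made for illustration and are not documented here; inspect the CSV header and adjust them before relying on the output.

```python
import csv
from collections import defaultdict
from statistics import mean

# Assumed numeric coding of the sentiment labels (illustrative only).
SENTIMENT_SCORES = {"negative": -1.0, "neutral": 0.0, "positive": 1.0}


def load_rows(path):
    """Yield each CSV row as a dict keyed by the header names."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)


def mean_sentiment_by_term(rows, term_col="term", sentiment_col="sentiment"):
    """Return {term: mean sentiment score} for an iterable of row dicts.

    `term_col` and `sentiment_col` are hypothetical column names;
    rows whose label is not in SENTIMENT_SCORES are skipped.
    """
    scores = defaultdict(list)
    for row in rows:
        label = row[sentiment_col].strip().lower()
        if label in SENTIMENT_SCORES:
            scores[row[term_col]].append(SENTIMENT_SCORES[label])
    return {term: mean(values) for term, values in scores.items()}
```

With the real column names substituted, a call such as `mean_sentiment_by_term(load_rows("WikipediaParagraphsWithTargetNGramsAndSentiment.csv"), term_col=..., sentiment_col=...)` would give one average per target term, which could then be compared across groups of terms.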