SEN - Sentiment analysis of Entities in News headlines
Creators
- 1. Polish-Japanese Academy of Information Technology
- 2. Polish-Japanese Academy of Information Technology / Institute of Computer Science Polish Academy of Sciences
Description
If you wish to use this data please cite:
Katarzyna Baraniak, Marcin Sydow,
A dataset for Sentiment analysis of Entities in News headlines (SEN),
Procedia Computer Science,
Volume 192,
2021,
Pages 3627-3636,
ISSN 1877-0509,
https://doi.org/10.1016/j.procs.2021.09.136.
(https://www.sciencedirect.com/science/article/pii/S1877050921018755)
bibtex: users.pja.edu.pl/~msyd/bibtex/sydow-baraniak-SENdataset-kes21.bib
SEN is a novel, publicly available, human-labelled dataset for training and testing machine learning algorithms for entity-level sentiment analysis of political news headlines.
On-line news portals play a very important role in the information society. Fair media should present reliable and objective information. In practice, however, there is an observable positive or negative bias concerning named entities (e.g. politicians) mentioned in on-line news headlines.
Our dataset consists of 3819 human-labelled political news headlines coming from several major on-line media outlets in English and Polish.
Each record contains a news headline, a named entity mentioned in the headline, and a human-annotated label (one of “positive”, “neutral”, “negative”). Our SEN dataset package consists of 2 parts: SEN-en (English headlines, split into SEN-en-R and SEN-en-AMT) and SEN-pl (Polish headlines). Each headline-entity pair was annotated either by a team of volunteer researchers (the whole SEN-pl dataset and a subset of 1271 English records: the SEN-en-R subset, “R” for “researchers”) or via the Amazon Mechanical Turk service (a subset of 1360 English records: the SEN-en-AMT subset).
During analysis of the annotations, outlying annotations were identified and removed. A separate version of the dataset without outliers is marked by "noutliers" in the data file name.
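The record structure described above (headline, entity, one of three labels) can be sketched in Python as follows. This is only an illustrative sketch: the column names `headline`, `entity` and `label`, the comma delimiter, and the helper names `SenRecord`, `read_records` and `label_distribution` are assumptions, not the documented layout of the SEN files, and the sample rows are made-up placeholders rather than data from the dataset. Check the actual files and adjust accordingly.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass

# The three labels named in the dataset description.
VALID_LABELS = {"positive", "neutral", "negative"}


@dataclass(frozen=True)
class SenRecord:
    """One headline-entity pair with its sentiment label."""
    headline: str
    entity: str
    label: str


def read_records(handle, delimiter=","):
    """Parse headline/entity/label rows from an open text file.

    ASSUMPTION: the column names "headline", "entity" and "label" and the
    delimiter are hypothetical; the real SEN files may differ.
    """
    records = []
    for row in csv.DictReader(handle, delimiter=delimiter):
        label = row["label"].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        records.append(
            SenRecord(row["headline"].strip(), row["entity"].strip(), label)
        )
    return records


def label_distribution(records):
    """Count how many records carry each sentiment label."""
    return Counter(record.label for record in records)


# Made-up placeholder rows, NOT taken from the SEN dataset.
SAMPLE = io.StringIO(
    "headline,entity,label\n"
    "Placeholder headline praising Person A,Person A,positive\n"
    "Placeholder headline mentioning Person B,Person B,neutral\n"
    "Placeholder headline criticising Person A,Person A,negative\n"
)

records = read_records(SAMPLE)
distribution = label_distribution(records)
```

With a real file, the in-memory `SAMPLE` would be replaced by a handle from `open(path, encoding="utf-8", newline="")`. Running the same function on the full and the "noutliers" versions would give a quick view of how removing outliers changes the label balance.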
Details of the dataset preparation process and of its analysis are presented in the paper.
In case of any questions, please contact one of the authors. Email addresses are given in the paper.
Additional details
Related works
- Is published in
- Conference paper: 10.1016/j.procs.2021.09.136 (DOI)