Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text

Mello, Caio; Cheema, Gullal S.

doi:10.5281/zenodo.6326348

Published March 3, 2022 | Version v1

Dataset Open

1. School of Advanced Study, University of London
2. TIB - Leibniz Information Center for Science and Technology

Sentiment analysis outputs, based on the combination of three classifiers, for news headlines and body text covering the Olympic legacy of Rio 2016 and London 2012. The articles were retrieved via the Google search engine. The dataset consists of sentiment labels assigned to 1271 news articles in total.

News outlets:

BBC
Daily Mail
The Telegraph
The Guardian
Globo
Estadao
Folha de S. Paulo

Events covered by the articles:

London 2012 Olympic legacy
Rio 2016 Olympic legacy

All classifiers were applied to texts in English. Texts originally published in Portuguese by the Brazilian media were automatically translated.

Sentiment classifiers used:

VADER
BERT (trained on Amazon data)
BERT (trained on Twitter data - 140)

Each file (an xlsx spreadsheet) covers one outlet and one event (London 2012 or Rio 2016).

How were labels assigned to the texts?

The labels combine the outputs of the three sentiment classifiers listed above. If at least two classifiers agree on a label, that label is taken as the final one. Otherwise, the label 'other' is assigned.
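The voting rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `combine_labels` is hypothetical, and the label strings follow the ones listed in this record (Pos, Neg, Neutral, other).

```python
from collections import Counter


def combine_labels(labels):
    """Return the label shared by at least two classifiers, else 'other'.

    `labels` holds one label per classifier, e.g. ["Pos", "Neg", "Pos"].
    """
    # most_common(1) gives the most frequent label and how often it occurs.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else "other"


# Two classifiers agree, so their label wins.
agreed = combine_labels(["Pos", "Neg", "Pos"])
# All three disagree, so the result is inconclusive.
conflict = combine_labels(["Pos", "Neg", "Neutral"])
```

With three classifiers and three possible labels, 'other' arises only when every classifier returns a different label.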

For news article body text: the proportion of sentences of each sentiment type was used to assign labels to the whole article instead of averaging the sentence scores. For example, if the proportion of sentences with negative labels is greater than 50%, then the article is assigned a negative label.
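A rough sketch of this proportion-based rule is given below. The record only states the "greater than 50%" case, so two points here are assumptions: the function name `article_label` is hypothetical, and returning `None` when no label exceeds the threshold is a placeholder, since the description does not say how such articles were handled.

```python
from collections import Counter


def article_label(sentence_labels, threshold=0.5):
    """Label an article from the share of sentences carrying each label.

    Returns the label whose share is strictly greater than `threshold`.
    Returns None otherwise (assumption: the source does not specify
    what happens when no label holds a majority of sentences).
    """
    if not sentence_labels:
        return None
    label, count = Counter(sentence_labels).most_common(1)[0]
    if count / len(sentence_labels) > threshold:
        return label
    return None


# Three of four sentences are negative (75%), so the article is negative.
label = article_label(["Neg", "Neg", "Pos", "Neg"])
```

Counting labels rather than averaging scores means a few strongly scored sentences cannot outweigh a majority of mildly scored ones.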

The documents are composed of the following columns:

Rank: the article's position in the Google search ranking
Date: the article's publication date (DD/MM/YYYY)
Link: article's link
Title: article's title
Sentiment_Title: final sentiment for article headline
Sentiment_Text: final sentiment for article's body text

Note: the spreadsheets do not include the articles' body text.
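As an illustration of the column layout, the sketch below normalises one spreadsheet row that has already been read into a dictionary keyed by the column names above. The function name `parse_row` and all values in `example` are invented for demonstration and do not come from the dataset. It also assumes that a spreadsheet reader may deliver the Date cell either as DD/MM/YYYY text or as a datetime object.

```python
from datetime import date, datetime


def parse_row(row):
    """Convert one row (column name -> cell value) into typed fields."""
    raw_date = row["Date"]
    if isinstance(raw_date, datetime):
        # Some spreadsheet readers return date cells as datetime objects.
        published = raw_date.date()
    else:
        # Day comes first: 05/08/2016 is 5 August 2016, not 8 May.
        published = datetime.strptime(str(raw_date).strip(), "%d/%m/%Y").date()
    return {
        "rank": int(row["Rank"]),
        "date": published,
        "link": row["Link"],
        "title": row["Title"],
        "sentiment_title": row["Sentiment_Title"],
        "sentiment_text": row["Sentiment_Text"],
    }


# Invented example row, for illustration only.
example = {
    "Rank": 1,
    "Date": "05/08/2016",
    "Link": "https://example.com/article",
    "Title": "Placeholder headline",
    "Sentiment_Title": "Neg",
    "Sentiment_Text": "other",
}
record = parse_row(example)
```

Parsing the date explicitly avoids the day and month being swapped by tools that default to a month-first format.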

Sentiment is presented in labels as follows:

Pos: Positive
Neg: Negative
Neutral: Neutral
other: inconclusive - if each of the 3 classifiers assigned a different label to the article, the label 'other' was used. Therefore, 'other' identifies contradictory results.

Files

Files (194.4 kB)

Name	Checksum	Size
london_bbc.xlsx	md5:4688dc2aa55abc736580565b75640c35	19.4 kB
london_dailymail.xlsx	md5:0d8d9d7120291ab3574a37dab1be0136	19.7 kB
london_estadao.xlsx	md5:c4a8f2bad2c100be11554e774491158a	7.1 kB
london_folha.xlsx	md5:82a18a88cb203aaa4ad8b60b7a380c67	6.1 kB
london_globo.xlsx	md5:ae5d2964ca4667a9b4946058bc46d2e6	7.6 kB
london_telegraph.xlsx	md5:cbda439130de289e441a9b667ebc4f0a	25.1 kB
london_theguardian.xlsx	md5:bbe6d517f5769777a62fec0a81edac41	22.6 kB
rio_bbc.xlsx	md5:0ad631b56cb2dcbd66c14ed52a229a8d	6.0 kB
rio_dailymail.xlsx	md5:4e0dc34ade055ecf7e2410d98394814d	6.5 kB
rio_estadao.xlsx	md5:67c88f148c6b8e91394e1428dabbb4e0	17.6 kB
rio_folha.xlsx	md5:1642aa06df5ef81f49eef57cd4b0e178	11.5 kB
rio_globo.xlsx	md5:4f3bbfc9a6578574fd41e54a8841cce5	29.7 kB
rio_telegraph.xlsx	md5:aa6cd83d968124f346be86f402a8ee34	5.8 kB
rio_theguardian.xlsx	md5:c7b0831a4122757c6adbbd3f41e6917e	9.7 kB
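The MD5 checksums in the listing can be used to confirm that a downloaded file is intact. The sketch below uses only Python's standard `hashlib` module. The helper names `md5_of_file` and `matches_listing` are hypothetical, and the local file path in the usage comment is an assumption about where the file was saved.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:4688dc...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage, assuming london_bbc.xlsx was saved to the working directory:
# matches_listing("london_bbc.xlsx", "md5:4688dc2aa55abc736580565b75640c35")
```

MD5 is adequate here for detecting accidental corruption during download. It is not a safeguard against deliberate tampering.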

Additional details

European Commission
Cleopatra - Cross-lingual Event-centric Open Analytics Research Academy 812997

	All versions	This version
Views	313	313
Downloads	564	564
Data volume	9.5 MB	9.5 MB
