Published March 3, 2022 | Version v1
Dataset Open

Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text

  • 1. School of Advanced Study, University of London
  • 2. TIB - Leibniz Information Center for Science and Technology

Description

Sentiment analysis outputs based on the combination of three classifiers for news headlines and body text covering the Olympic legacy of Rio 2016 and London 2012. The articles were found through Google Search. The dataset consists of sentiment labels assigned to 1271 news articles in total.

News outlets:

  • BBC
  • Daily Mail
  • The Telegraph
  • The Guardian
  • Globo
  • Estadao
  • Folha de S. Paulo

Events covered by the articles:

  • London 2012 Olympic legacy
  • Rio 2016 Olympic legacy

All classifiers were applied to English text. Articles originally published in Portuguese by the Brazilian media were automatically translated into English.

Sentiment classifiers used:

  • VADER
  • BERT (trained on Amazon data)
  • BERT (trained on Twitter data - 140)

Each file (an .xlsx spreadsheet) covers one outlet and one event (London 2012 or Rio 2016).

How were labels assigned to the texts?

Each label is a combination of the outputs of the three sentiment classifiers listed above. If at least two of them agreed on a label, that label was taken as the final one. Otherwise, the label ‘other’ was assigned.
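The two-out-of-three rule described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `combine_labels` is hypothetical, and the label strings are taken from the label list in this description.

```python
from collections import Counter


def combine_labels(labels):
    """Combine the labels of three classifiers by majority agreement.

    Hypothetical helper: returns the label that at least two classifiers
    share, or 'other' when all three disagree.
    """
    if len(labels) != 3:
        raise ValueError("expected exactly three classifier labels")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else "other"


if __name__ == "__main__":
    print(combine_labels(["Pos", "Pos", "Neg"]))      # two classifiers agree
    print(combine_labels(["Pos", "Neg", "Neutral"]))  # all three disagree
```

With three classifiers and three possible labels, 'other' can only arise when every classifier returns a different label.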

For news article body text: the proportion of sentences of each sentiment type was used to assign labels to the whole article instead of averaging the sentence scores. For example, if the proportion of sentences with negative labels is greater than 50%, then the article is assigned a negative label.
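As a rough sketch of the proportion rule for body text: the function below assigns an article the label carried by more than 50% of its sentences. The name `article_label` and the `threshold` parameter are hypothetical. The description does not say what happens when no label exceeds 50%, so returning `None` in that case is purely an assumption of this sketch.

```python
from collections import Counter


def article_label(sentence_labels, threshold=0.5):
    """Label an article from the share of its sentences carrying each label.

    Hypothetical helper. Returns the label whose proportion is strictly
    greater than `threshold`, or None when no label reaches it
    (ASSUMPTION: the source does not specify this case).
    """
    if not sentence_labels:
        return None
    label, count = Counter(sentence_labels).most_common(1)[0]
    if count / len(sentence_labels) > threshold:
        return label
    return None


if __name__ == "__main__":
    # Two of three sentences are negative, so the share is above 50%.
    print(article_label(["Neg", "Neg", "Pos"]))
```

Unlike averaging sentence scores, this approach ignores how strongly each sentence is scored and only counts how many sentences fall under each label.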

The documents are composed of the following columns:

  • Rank: the position of the article in the Google search ranking
  • Date: date of article's publication (DD/MM/YYYY)
  • Link: article's link
  • Title: article's title
  • Sentiment_Title: final sentiment for article headline
  • Sentiment_Text: final sentiment for article's body text

Note: the spreadsheets do not include the articles' body text.

Sentiment is presented in labels as follows:

  • Pos: Positive
  • Neg: Negative
  • Neutral: Neutral
  • other: inconclusive - if each of the 3 classifiers assigned a different label to the article, the label 'other' was used. Therefore, 'other' identifies contradictory results.
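For readers who load the spreadsheets into rows of column/value pairs, the following sketch shows one way to parse the Date column and tally the labels. It makes several assumptions: rows are plain dictionaries keyed by the column names above, dates are stored as DD/MM/YYYY strings (a spreadsheet reader may return date objects instead), and the helper names are hypothetical. The rows in the usage part are placeholders, not data from the dataset.

```python
from collections import Counter
from datetime import datetime

# Mapping from the label strings used in the spreadsheets to their meaning.
LABEL_NAMES = {
    "Pos": "Positive",
    "Neg": "Negative",
    "Neutral": "Neutral",
    "other": "inconclusive",
}


def parse_date(value):
    """Parse a DD/MM/YYYY string (assumed storage format) into a date."""
    return datetime.strptime(value, "%d/%m/%Y").date()


def label_distribution(rows, column="Sentiment_Title"):
    """Count how often each label occurs in `column`, using full names."""
    counts = Counter(row[column] for row in rows)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {LABEL_NAMES[label]: n for label, n in counts.items()}


if __name__ == "__main__":
    # Placeholder rows for illustration only.
    rows = [
        {"Date": "01/02/2020", "Sentiment_Title": "Pos", "Sentiment_Text": "Neutral"},
        {"Date": "15/08/2019", "Sentiment_Title": "other", "Sentiment_Text": "Neg"},
    ]
    print(parse_date(rows[0]["Date"]))
    print(label_distribution(rows, column="Sentiment_Text"))
```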

 

Files

Files (194.4 kB)

MD5 checksum                          Size
md5:4688dc2aa55abc736580565b75640c35  19.4 kB
md5:0d8d9d7120291ab3574a37dab1be0136  19.7 kB
md5:c4a8f2bad2c100be11554e774491158a  7.1 kB
md5:82a18a88cb203aaa4ad8b60b7a380c67  6.1 kB
md5:ae5d2964ca4667a9b4946058bc46d2e6  7.6 kB
md5:cbda439130de289e441a9b667ebc4f0a  25.1 kB
md5:bbe6d517f5769777a62fec0a81edac41  22.6 kB
md5:0ad631b56cb2dcbd66c14ed52a229a8d  6.0 kB
md5:4e0dc34ade055ecf7e2410d98394814d  6.5 kB
md5:67c88f148c6b8e91394e1428dabbb4e0  17.6 kB
md5:1642aa06df5ef81f49eef57cd4b0e178  11.5 kB
md5:4f3bbfc9a6578574fd41e54a8841cce5  29.7 kB
md5:aa6cd83d968124f346be86f402a8ee34  5.8 kB
md5:c7b0831a4122757c6adbbd3f41e6917e  9.7 kB
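The md5 values listed above can be used to check that a downloaded spreadsheet is intact. A minimal sketch using Python's standard `hashlib` module follows. The function name `md5_matches` is hypothetical, and the file path in the usage comment is a placeholder, since the file names are not shown in this listing.

```python
import hashlib


def md5_matches(path, expected):
    """Return True if the file at `path` has the expected MD5 checksum.

    `expected` may be given with or without the 'md5:' prefix used in
    the file listing.
    """
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected.lower()


# Example call (placeholder path, checksum taken from the listing):
# md5_matches("downloaded_file.xlsx", "md5:4688dc2aa55abc736580565b75640c35")
```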

Additional details

Funding

European Commission
Cleopatra - Cross-lingual Event-centric Open Analytics Research Academy (grant 812997)