Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text
Authors/Creators
- 1. School of Advanced Study, University of London
- 2. TIB - Leibniz Information Center for Science and Technology
Description
Sentiment Analysis outputs based on the combination of three classifiers for news headlines and body text covering the Olympic legacy of Rio 2016 and London 2012. Articles were retrieved through the Google search engine. The dataset consists of sentiment labels assigned to a total of 1,271 news articles.
News outlets:
- BBC
- Daily Mail
- The Telegraph
- The Guardian
- Globo
- Estadao
- Folha de S. Paulo
Events covered by the articles:
- London 2012 Olympic legacy
- Rio 2016 Olympic legacy
All classifiers were applied to English-language text. Texts originally published in Portuguese by the Brazilian outlets were machine-translated into English.
Sentiment classifiers used:
- VADER
- BERT (trained on Amazon data)
- BERT (trained on Twitter data, Sentiment140)
Each file is an XLSX spreadsheet covering one outlet and one event (London 2012 or Rio 2016).
How were labels assigned to the texts?
Each label combines the outputs of the three sentiment classifiers listed above by majority vote: if at least two classifiers agree on a label, that label is assigned. Otherwise, the label 'other' is assigned.
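The two-out-of-three rule can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each classifier's raw output (for example VADER's scores or a BERT model's class) has already been mapped onto one of Pos, Neg or Neutral, a step this description does not detail. The function and parameter names are hypothetical.

```python
from collections import Counter


def combine_labels(vader: str, bert_amazon: str, bert_twitter: str) -> str:
    """Return the label that at least two of the three classifiers agree on.

    Hypothetical helper: inputs are assumed to be already mapped to
    'Pos', 'Neg' or 'Neutral'. If all three disagree, 'other' is returned.
    """
    # most_common(1) yields the most frequent label and its vote count.
    label, votes = Counter([vader, bert_amazon, bert_twitter]).most_common(1)[0]
    return label if votes >= 2 else "other"


print(combine_labels("Pos", "Pos", "Neg"))      # Pos
print(combine_labels("Pos", "Neg", "Neutral"))  # other
```

With three voters and three possible labels, "no majority" can only mean that every classifier chose a different label, which is exactly the case the 'other' label marks.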
For article body text, the label for the whole article was derived from the proportion of sentences carrying each sentiment label, rather than from an average of sentence scores. For example, if more than 50% of an article's sentences are labelled negative, the article is labelled negative.
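A sketch of the proportion rule for body text, assuming sentence-level labels are already available. Two points are assumptions rather than documented behaviour: the 50% threshold from the negative-label example is applied to every label, and because the description does not say how an article is labelled when no label exceeds 50%, this sketch returns `None` in that case. `article_label` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def article_label(sentence_labels: list[str], threshold: float = 0.5) -> Optional[str]:
    """Label an article by the share of its sentences carrying each label.

    Returns the most frequent sentence label if its share strictly exceeds
    `threshold`; returns None otherwise (behaviour assumed, not documented).
    """
    if not sentence_labels:
        return None
    label, count = Counter(sentence_labels).most_common(1)[0]
    # Compare the proportion of sentences, not an average of sentence scores.
    return label if count / len(sentence_labels) > threshold else None


print(article_label(["Neg", "Neg", "Pos", "Neg", "Neutral"]))  # Neg (3/5 = 60%)
print(article_label(["Pos", "Neg", "Pos", "Neutral"]))         # None (2/4 = 50%)
```

Note that with a threshold of 0.5 at most one label can strictly exceed it, so the result does not depend on how ties among less frequent labels are ordered.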
Each file contains the following columns:
- Rank: the article's position in the Google search results
- Date: publication date of the article (DD/MM/YYYY)
- Link: URL of the article
- Title: headline of the article
- Sentiment_Title: final sentiment label for the article headline
- Sentiment_Text: final sentiment label for the article body text
Note: the files do not include the articles' body text.
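Because dates are stored day-first (DD/MM/YYYY), they are easy to misread as month-first. The sketch below turns one spreadsheet row, given as a dict of column name to cell value, into a typed record. It assumes the header row uses exactly the column names listed above. How the row dicts are obtained is left open: for example, `pandas.read_excel(path).to_dict("records")` or rows from `openpyxl` would both work. Because a reader may already return the Date cell as a date or datetime object, the sketch accepts those as well. `ArticleRecord` and `parse_row` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class ArticleRecord:
    rank: int
    published: date
    link: str
    title: str
    sentiment_title: str
    sentiment_text: str


def _to_date(value) -> date:
    """Accept a date/datetime cell or a DD/MM/YYYY string."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    # Day-first format as documented for the Date column.
    return datetime.strptime(str(value).strip(), "%d/%m/%Y").date()


def parse_row(row: dict) -> ArticleRecord:
    """Convert one spreadsheet row (column name -> cell value) into a record."""
    return ArticleRecord(
        rank=int(row["Rank"]),
        published=_to_date(row["Date"]),
        link=str(row["Link"]),
        title=str(row["Title"]),
        sentiment_title=str(row["Sentiment_Title"]),
        sentiment_text=str(row["Sentiment_Text"]),
    )
```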
Sentiment is encoded with the following labels:
- Pos: Positive
- Neg: Negative
- Neutral: Neutral
- other: inconclusive. This label is assigned when each of the three classifiers gave a different label, so 'other' marks contradictory results.
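When tallying the Sentiment_Title or Sentiment_Text column, it helps to check that every value is one of the four documented codes. In particular, 'other' is lowercase while the remaining codes are capitalised, so the check below is deliberately case-sensitive. This is a minimal sketch; `LABEL_NAMES` and `label_distribution` are hypothetical names.

```python
from collections import Counter

# Label codes used in the Sentiment_Title and Sentiment_Text columns.
LABEL_NAMES = {
    "Pos": "Positive",
    "Neg": "Negative",
    "Neutral": "Neutral",
    "other": "Inconclusive (all three classifiers disagreed)",
}


def label_distribution(labels) -> Counter:
    """Count each label code, rejecting values outside the documented set."""
    labels = list(labels)
    unknown = set(labels) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected label(s): {sorted(unknown)}")
    return Counter(labels)


counts = label_distribution(["Pos", "Neg", "Pos", "other"])
print({LABEL_NAMES[code]: n for code, n in counts.items()})
```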
Files
14 files, 194.4 kB in total.

| MD5 checksum | Size |
|---|---|
| 4688dc2aa55abc736580565b75640c35 | 19.4 kB |
| 0d8d9d7120291ab3574a37dab1be0136 | 19.7 kB |
| c4a8f2bad2c100be11554e774491158a | 7.1 kB |
| 82a18a88cb203aaa4ad8b60b7a380c67 | 6.1 kB |
| ae5d2964ca4667a9b4946058bc46d2e6 | 7.6 kB |
| cbda439130de289e441a9b667ebc4f0a | 25.1 kB |
| bbe6d517f5769777a62fec0a81edac41 | 22.6 kB |
| 0ad631b56cb2dcbd66c14ed52a229a8d | 6.0 kB |
| 4e0dc34ade055ecf7e2410d98394814d | 6.5 kB |
| 67c88f148c6b8e91394e1428dabbb4e0 | 17.6 kB |
| 1642aa06df5ef81f49eef57cd4b0e178 | 11.5 kB |
| 4f3bbfc9a6578574fd41e54a8841cce5 | 29.7 kB |
| aa6cd83d968124f346be86f402a8ee34 | 5.8 kB |
| c7b0831a4122757c6adbbd3f41e6917e | 9.7 kB |
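Downloaded files can be checked against the MD5 checksums listed for this record. The sketch below is one possible approach, under stated assumptions: all spreadsheets are saved in a single directory with the `.xlsx` extension. Because the listing pairs checksums with sizes rather than file names, it only tests whether each file's checksum appears in the listed set. `md5_of`, `unverified_files` and `EXPECTED_MD5` are hypothetical names.

```python
import hashlib
from pathlib import Path

# MD5 checksums listed for the 14 files of this record.
EXPECTED_MD5 = {
    "4688dc2aa55abc736580565b75640c35",
    "0d8d9d7120291ab3574a37dab1be0136",
    "c4a8f2bad2c100be11554e774491158a",
    "82a18a88cb203aaa4ad8b60b7a380c67",
    "ae5d2964ca4667a9b4946058bc46d2e6",
    "cbda439130de289e441a9b667ebc4f0a",
    "bbe6d517f5769777a62fec0a81edac41",
    "0ad631b56cb2dcbd66c14ed52a229a8d",
    "4e0dc34ade055ecf7e2410d98394814d",
    "67c88f148c6b8e91394e1428dabbb4e0",
    "1642aa06df5ef81f49eef57cd4b0e178",
    "4f3bbfc9a6578574fd41e54a8841cce5",
    "aa6cd83d968124f346be86f402a8ee34",
    "c7b0831a4122757c6adbbd3f41e6917e",
}


def md5_of(path, chunk_size: int = 1 << 16) -> str:
    """Compute a file's MD5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def unverified_files(directory) -> list[str]:
    """Names of .xlsx files in `directory` whose MD5 is not in EXPECTED_MD5."""
    return sorted(
        p.name for p in Path(directory).glob("*.xlsx")
        if md5_of(p) not in EXPECTED_MD5
    )
```

An empty list from `unverified_files("downloads")` (directory name hypothetical) means every spreadsheet in that folder matches one of the listed checksums.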