Published March 2, 2021 | Version v1
Dataset Restricted

Underlying data for "PERCEIVE - 'ENGAGING THE PEOPLE': IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?"

  • 1. Università di Roma Tor Vergata
  • 2. Wirtschaftsuniversität Wien

Contributors

Project manager:

  • 1. Università di Roma Tor Vergata
  • 2. Wirtschaftsuniversität Wien
  • 3. Università di Bologna

Description

README file

Data Set Title: “PERCEIVE - ‘ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”

Data Set Authors:

Vitaliano Barberio (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0002-2615-5006;

Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;

 

Data Set Contributors:

Ines Kuric (Wirtschaftsuniversität Wien);

Edoardo Mollona (Università di Bologna), ORCID http://orcid.org/0000-0001-9496-8618;

Markus Höllerer (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0003-2509-2696.

 

Data Set Contact Person:

Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;

luca.pareschi@uniroma2.it

 

Data Set License: this data set is distributed under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

 

Publication Year: 2021

Project Info: PERCEIVE (Perception and Evaluation of Regional and Cohesion Policies by Europeans and Identification with the Values of Europe), funded by the European Union under the Horizon 2020 Programme, Grant Agreement no. 693529; https://www.perceiveproject.eu/.

Data set Contents

The data set consists of:

 

  • 1 README file
  • 6 qualitative text files saved in .txt format

“stoplist_file_[nation].txt”

  • 12 quantitative text files saved in .txt format

“[source]-keys.txt”: 6 files

“[source]-composition.txt”: 6 files

  • 2 quantitative Excel files saved in .xlsx format

“SentimentFB.xlsx”

“topics_prevalence_and_clustering.xlsx”

 

 

 

Data set Documentation

Abstract

This data set contains the underlying data of the paper “‘ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”.

The data openly available in this data set are a subset of two larger data sets, which contain all the relevant data of Work Package 3 and Work Package 5 of the PERCEIVE project.

For the paper we collected Facebook posts referring to EU Cohesion Policy (CP). We do not have permission to share these data (as they are protected by copyright), but all the sources are described in Deliverable 5.2, which is public (see http://doi.org/10.6092/unibo/amsacta/5726 or http://doi.org/10.5281/zenodo.1318184). We analyzed the textual content of the data to construct a database of discursive topics in Task 5.4. The data set includes the results of the topic modeling and of a sentiment analysis performed on the Facebook homepages of the Local Managing Authorities (LMAs) of the PERCEIVE case study regions.

 

Content of the files:

 

  • 1 sub-folder, named “A_Stopword”, which contains all the stopword lists used for performing Topic Modeling. These are 6 .txt files, one for each national case: Austrian, Italian, Polish, Romanian, Spanish, Swedish (“stoplist_file_[nation].txt”).
  • 1 sub-folder, named “B_Facebook”, which contains the Topic Modeling results for the Facebook profiles of the Local Managing Authorities for Austria, Italy, Poland, Romania, Spain, and Sweden (12 .txt files). For each case, a file “[source]-keys.txt” lists the 100 most important words for each topic, while a file “[source]-composition.txt” details the topic composition of each textual source. These files were obtained with the MALLET software [1].
  • File “SentimentFB.xlsx” contains the results of the sentiment analysis of content on the Facebook homepages of Local Managing Authorities. The first column indicates the country, as well as row labels (see below). Columns 2-21 indicate the numeric ID of the topics for each topic model (national level). The three rightmost columns of the file represent, respectively: a) the name of the lexicon used to detect sentiment orientation (i.e. “VADER”); b) the average sentiment score for positive, neutral and average words for each lexicon and each country; and c) the sentiment score across all topics in a country.
  • File “topics_prevalence_and_clustering.xlsx” contains data on the three clusters of topics analyzed in the paper. The first column gives the ID of each topic; the second column reports the cluster to which each topic belongs; the third and fourth columns report the average prevalence of each topic (rows) in posts and comments, respectively. As these data refer to regional case studies, these two columns refer to the first region of each country; the sixth and seventh columns report the average prevalence of each topic (rows) in posts and comments for the second region analyzed (only for those countries where we analyzed two regions); the eighth and ninth columns report the average prevalence of each topic in posts and comments, respectively, for each country; and finally the tenth column reports the country to which the data in the previous two columns refer.
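
The “[source]-keys.txt” and “[source]-composition.txt” files described above can be read with a few lines of Python. The following is only a minimal sketch and not part of the data set: it assumes the usual tab-separated MALLET output layouts (keys: topic ID, Dirichlet weight, space-separated words; composition: document index, document name, one proportion per topic), which should be checked against the actual files, since older MALLET versions write the composition as topic/proportion pairs. The function names and the sample strings are hypothetical and invented for illustration.

```python
from typing import Dict, List, Tuple


def parse_topic_keys(text: str) -> Dict[int, Tuple[float, List[str]]]:
    """Parse the content of a "[source]-keys.txt" file.

    Assumed layout per line (tab-separated): topic ID, Dirichlet weight,
    then the top words of the topic separated by spaces.
    """
    topics: Dict[int, Tuple[float, List[str]]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        topic_id, weight, words = line.strip().split("\t", 2)
        topics[int(topic_id)] = (float(weight), words.split())
    return topics


def parse_composition(text: str) -> Dict[str, List[float]]:
    """Parse the content of a "[source]-composition.txt" file.

    Assumed layout per line (tab-separated): document index, document
    name, then one proportion per topic. Lines starting with '#' are
    treated as headers and skipped.
    """
    docs: Dict[str, List[float]] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.strip().split("\t")
        docs[fields[1]] = [float(value) for value in fields[2:]]
    return docs


def dominant_topic(proportions: List[float]) -> int:
    """Return the index of the topic with the highest proportion."""
    return max(range(len(proportions)), key=lambda i: proportions[i])


# Invented sample content (NOT taken from the data set), two topics only.
sample_keys = "0\t0.25\tfondi regione progetto\n1\t0.25\teuropa sviluppo bando\n"
sample_composition = "0\tpost_001\t0.8\t0.2\n1\tpost_002\t0.1\t0.9\n"

topics = parse_topic_keys(sample_keys)
composition = parse_composition(sample_composition)
for name, proportions in composition.items():
    best = dominant_topic(proportions)
    print(name, "-> topic", best, topics[best][1])
```

To apply the sketch to a real file, its content can be loaded with `pathlib.Path(...).read_text(encoding="utf-8")` and passed to the matching function; the UTF-8 encoding is likewise an assumption.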

 

 

 

[1] McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Funding

European Commission
PERCEIVE - Perception and Evaluation of Regional and Cohesion policies by Europeans and Identification with the Values of Europe 693529