underlying data for "PERCEIVE - ENGAGING THE PEOPLE": IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?
Authors/Creators
- 1. Università di Roma Tor Vergata
- 2. Wirtschaftsuniversität Wien
Contributors
Data curators:
Project manager:
Project members:
- 1. Università di Roma Tor Vergata
- 2. Wirtschaftsuniversität Wien
- 3. Università di Bologna
Description
README file
Data Set Title: “PERCEIVE - ‘ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”
Data Set Authors:
Vitaliano Barberio (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0002-2615-5006;
Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;
Data Set Contributors:
Ines Kuric (Wirtschaftsuniversität Wien);
Edoardo Mollona (Università di Bologna), ORCID http://orcid.org/0000-0001-9496-8618;
Markus Höllerer (Wirtschaftsuniversität Wien), ORCID http://orcid.org/0000-0003-2509-2696.
Data Set Contact Person:
Luca Pareschi (Università di Roma Tor Vergata), ORCID http://orcid.org/0000-0002-4402-9329;
Data Set License: this data set is distributed under a Creative Commons Attribution (CC BY) 4.0 International license
Publication Year: 2021
Project Info: PERCEIVE (Perception and Evaluation of Regional and Cohesion Policies by Europeans and Identification with the Values of Europe), funded by European Union, Horizon 2020 Programme. Grant Agreement num. 693529; https://www.perceiveproject.eu/.
Data set Contents
The data set consists of:
- 1 README file
- 6 textual qualitative files saved in .txt format
“stoplist_file_[nation].txt”
- 12 textual quantitative files saved in .txt format
“[source]-keys.txt”: 6 files
“[source]-composition.txt”: 6 files
- 2 Excel quantitative files saved in .xlsx format
“SentimentFB.xlsx”
“topics_prevalence_and_clustering.xlsx”
Data set Documentation
Abstract
This data set contains the underlying data of the paper “‘ENGAGING THE PEOPLE’: IS SOCIAL MEDIA COVERAGE OF EU POLICY ASSOCIATED WITH PUBLIC SUPPORT FOR EUROPEAN INTEGRATION?”.
The data openly available within this data set are a subset of the following two data sets, which contain all the relevant data of Work Package 3 and Work Package 5 of the PERCEIVE project:
- Data set: “PERCEIVE: WP3: Effectiveness of communication strategies of EU projects” https://doi.org/10.5281/zenodo.3371133
- Data set: “PERCEIVE: WP5: The multiplicity of shared meanings of EU and Cohesion Regional and Urban Policy at different discursive levels” https://doi.org/10.5281/zenodo.3371174
For the paper we collected Facebook posts referring to EU Cohesion Policy (CP). We do not have permission to share these data (as they are protected by copyright), but all the sources are described in Deliverable 5.2, which is public (see http://doi.org/10.6092/unibo/amsacta/5726 or http://doi.org/10.5281/zenodo.1318184). We analyzed the textual content of the data to construct a database of discursive topics in Task 5.4. The data set includes the results of topic modeling and of a sentiment analysis performed on the Facebook homepages of the Local Managing Authorities (LMAs) of the PERCEIVE case study regions.
Content of the files:
- 1 sub-folder, named “A_Stopword”, which contains all the stopword lists used for performing Topic Modeling. These are 6 .txt files, one for each national case: Austrian, Italian, Polish, Romanian, Spanish, Swedish (“stoplist_file_[nation].txt”).
- 1 sub-folder, named “B_Facebook”, which contains the Topic Modeling results for the Facebook profiles of the Local Managing Authorities for Austria, Italy, Poland, Romania, Spain, and Sweden (12 .txt files). For each case, a file “[source]-keys.txt” lists the 100 most important words for each topic, while a file “[source]-composition.txt” details the topic composition of each textual source. These files were obtained with the Mallet software[1].
- File “SentimentFB.xlsx” contains the sentiment analysis data for the contents of the Facebook homepages of the Local Managing Authorities. The first column indicates the country, as well as row labels (see below). Columns 2-21 indicate the numeric ID of the topics for each topic model (national level). The three rightmost columns of the file represent, respectively: a) the name of the lexicon used to detect sentiment orientation (i.e. “VADER”); b) the average sentiment score for positive, neutral and average words for each lexicon and each country; and c) the sentiment score across all topics in a country.
- File “topics_prevalence_and_clustering.xlsx” contains data regarding the three clusters of topics analyzed in the paper. The first column represents the ID of each topic; the second column reports the cluster of each topic; the third and the fourth columns report the average prevalence of each topic (rows) in posts and comments, respectively. As these data refer to a regional case study, these columns refer to the first region for each country; the sixth and the seventh columns report the average prevalence of each topic (rows) in posts and comments for the second region analyzed (only for those countries where we analyzed two regions); the eighth and ninth columns report the average prevalence of each topic in posts and comments, respectively, for each country; and finally the tenth column reports the country to which the data in the previous two columns refer.
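As an illustration of how the “[source]-keys.txt” files might be read, the following is a minimal Python sketch. It assumes the files follow the usual Mallet topic-keys layout, in which each line holds a topic ID, a topic weight, and a space-separated word list, with the three fields separated by tabs. This layout should be checked against the actual files before use. The function name `parse_topic_keys` and the sample words are invented for the example and are not taken from the data set.

```python
import io


def parse_topic_keys(lines):
    """Parse topic-keys lines into {topic_id: (weight, [words])}.

    Assumed layout per line (not verified against this data set):
    <topic id> TAB <weight> TAB <word1 word2 ...>
    """
    topics = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        topic_id, weight, words = line.split("\t", 2)
        # Some locales write the weight with a decimal comma; normalise it.
        topics[int(topic_id)] = (float(weight.replace(",", ".")), words.split())
    return topics


# Made-up sample content standing in for a real "[source]-keys.txt" file.
sample = io.StringIO(
    "0\t0.25\tfondi europei regione\n"
    "1\t0.25\tprogetto sviluppo\n"
)
topics = parse_topic_keys(sample)
for topic_id, (weight, words) in sorted(topics.items()):
    print(topic_id, weight, " ".join(words))
```

To read a real file, the same function can be given an open file handle instead of the in-memory sample, for instance one obtained with `open(path, encoding="utf-8")`, where `path` points to one of the keys files in the “B_Facebook” sub-folder. The UTF-8 encoding is itself an assumption.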
[1] McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002.