A Data Quality Multidimensional Model for Social Media Analysis
- 1. Universitat Jaume I
Description
This dataset comprises the data used in the paper to assess the quality of several metrics for determining the relevance of users.
The dataset consists of data extracted from Twitter for the automotive domain, where the query consisted of several car brands and models. We provide three files:
| users_all_metrics2.txt | User_id, statuses, listed, friends, followers, tweets on domain (dataset), Screen name, User language, User location, Verified account (True/False), Coherence of profile (entropy of text under domain model), #Performed actions, #Received actions |
| tweets_all_metrics.txt.gz | Tweet_id, replies, retweets, favourites, User_id, statuses, listed, friends, followers, tweets on domain (dataset), Screen name, User language, User location, Verified account (True/False), Coherence of profile, Date of publication (created_at), Tweet Language, processed text, coherence of text, repetitions of text in collection, user's received actions, user's generated actions, text polarity, number of facts, number of linked opinion expressions, number of linked entities |
| relevant_new.txt | Screen names of the users deemed relevant for the domain |
Datasets are "|"-separated text files with no header (see the table above for the column names).
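Because the files carry no header, the column names have to be supplied when loading them. The following is a minimal sketch of one way to do that with the Python standard library, not an official loader. It assumes UTF-8 encoding, no quoting, and no literal "|" inside free-text fields, none of which is stated in this record. The snake_case column identifiers and the helper names `read_pipe_rows` and `open_text` are hypothetical; the column order is transcribed from the table above for users_all_metrics2.txt.

```python
import csv
import gzip
import io

# Column order transcribed from the table above for users_all_metrics2.txt.
# The snake_case identifiers are our own naming, not part of the dataset.
USER_COLUMNS = [
    "user_id", "statuses", "listed", "friends", "followers",
    "tweets_on_domain", "screen_name", "user_language", "user_location",
    "verified", "profile_coherence", "performed_actions", "received_actions",
]


def read_pipe_rows(lines, columns):
    """Yield one dict per row of a header-less, "|"-separated text stream."""
    # QUOTE_NONE: we assume the files do not use quote characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if len(row) != len(columns):
            # A mismatch may mean a free-text field contains a literal "|".
            raise ValueError(
                f"line {line_no}: expected {len(columns)} fields, got {len(row)}"
            )
        yield dict(zip(columns, row))


def open_text(path):
    """Open a plain or gzip-compressed file in text mode (UTF-8 assumed)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, encoding="utf-8", newline="")


# Self-contained demonstration on one made-up row (not real data).
sample = io.StringIO("1|120|3|50|75|4|some_user|en|Castellón|False|0.42|10|7\n")
users = list(read_pipe_rows(sample, USER_COLUMNS))
print(users[0]["screen_name"])
```

With the downloaded file in place, `with open_text("users_all_metrics2.txt") as f: rows = list(read_pipe_rows(f, USER_COLUMNS))` would load it under the same assumptions. All values come back as strings, so numeric columns need explicit conversion. The same function can read tweets_all_metrics.txt.gz if given a column list built from its row of the table.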
Additional details
Funding
- Ministerio de Ciencia, Innovación y Universidades
- Prueba de Concepto para la Plataforma de Análisis Social Dinámico en el Contexto del Turismo Sostenible PDC2021-121097-I00
- Ministerio de Ciencia, Innovación y Universidades
- XAI4SOC: Explainable Artificial Intelligence for Healthy Aging and Social Wellbeing PID2021-123152OB-C22