Published February 8, 2024 | Version v1
Dataset Restricted

A Data Quality Multidimensional Model for Social Media Analysis

Description

This dataset comprises the data used in the paper to assess the quality of several metrics for determining the relevance of users.

The dataset consists of data extracted from Twitter for the automotive domain, where the query consisted of several car brands and models. We provide three files:

users_all_metrics2.txt

User_id, statuses, listed, friends, followers, tweets on domain (dataset), Screen name, User language, User location, Verified account (True/False), Coherence of profile (entropy of text under domain model), #Performed actions, #Received actions

tweets_all_metrics.txt.gz

Tweet_id, replies, retweets, favourites, User_id, statuses, listed, friends, followers, tweets on domain (dataset), Screen name, User language, User location, Verified account (True/False), Coherence of profile, Date of publication (created_at), Tweet Language, processed text, coherence of text, repetitions of text in collection, user's received actions, user's generated actions, text polarity, number of facts, number of linked opinion expressions, number of linked entities

relevant_new.txt

Screen names of the users deemed relevant for the domain

All three files are "|"-separated text files with no header row (see the column lists above for the column names).
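Because the files carry no header row, column names have to be supplied when reading them. The following is a minimal sketch of one way to do that with the Python standard library, not the authors' own loading code. The snake_case column names, the helper names `read_pipe_file` and `load_relevant`, the UTF-8 encoding, and the one-screen-name-per-line layout of relevant_new.txt are all assumptions made for illustration. The sketch also assumes that no field contains a literal "|" or an embedded line break, and it raises an error on any row whose field count does not match.

```python
import csv
import gzip

# Hypothetical snake_case names for the 13 columns listed for
# users_all_metrics2.txt; the files themselves have no header row.
USER_COLUMNS = [
    "user_id", "statuses", "listed", "friends", "followers",
    "tweets_on_domain", "screen_name", "user_language", "user_location",
    "verified", "profile_coherence", "performed_actions", "received_actions",
]


def read_pipe_file(path, columns, encoding="utf-8"):
    """Yield one dict per row of a header-less, "|"-separated file.

    Files ending in ".gz" are decompressed on the fly. The encoding is an
    assumption; adjust it if the files turn out to use another one.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding=encoding, newline="") as handle:
        # QUOTE_NONE: treat quote characters in tweet text as ordinary text.
        reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) != len(columns):
                raise ValueError(
                    f"line {reader.line_num}: expected {len(columns)} "
                    f"fields, got {len(row)}"
                )
            yield dict(zip(columns, row))


def load_relevant(path, encoding="utf-8"):
    """Return the set of screen names, assuming one name per line."""
    with open(path, "rt", encoding=encoding) as handle:
        return {line.strip() for line in handle if line.strip()}
```

All values are returned as strings, so numeric fields such as `followers` need an explicit conversion before use. A relevance label can then be derived per user by checking whether `row["screen_name"]` is in the set returned by `load_relevant("relevant_new.txt")`. The same reader can be applied to tweets_all_metrics.txt.gz by passing a list of names for its 26 columns.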

Files

Restricted

The record is publicly accessible, but files are restricted.

Additional details

Funding

Ministerio de Ciencia, Innovación y Universidades
Prueba de Concepto para la Plataforma de Análisis Social Dinámico en el Contexto del Turismo Sostenible PDC2021-121097-I00
Ministerio de Ciencia, Innovación y Universidades
XAI4SOC: Explainable Artificial Intelligence for Healthy Aging and Social Wellbeing PID2021-123152OB-C22