Published January 31, 2024 | Version v1
Dataset Restricted

Multi-dimensional author profiling by business roles

Description

This dataset contains the data used in the paper "Multidimensional Author Profiling for Social Business Intelligence", more specifically, the gold standard (GS) and silver standard (SS) created for training and validating text classifiers for business profiling of social network users.

The GS dataset is a CSV file with the following columns:

  • screen-name, user-id, verified-user (boolean), multi-level-label, manual-verification, textual-description, followers (int), friends (int), source (not used)

The attribute "multi-level-label" contains a label representing the user's business profile from three perspectives: role, collective-vs-individual, and on-domain. The attribute "manual-verification" records a second pass by experts to validate the assigned label.
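As an illustration of the column layout above, the following is a minimal Python sketch for reading the GS file. Several points are assumptions rather than facts from the record: that the columns appear in the listed order, that the file has no header row (the hypothetical `has_header` flag covers the other case), and that booleans are written as "true" or "1". The function name `read_gs` and the sample row are made up for the example.

```python
import csv
import io

# Column names follow the order given in the description; whether the real
# file carries a header row is an assumption to verify before use.
GS_COLUMNS = [
    "screen-name", "user-id", "verified-user", "multi-level-label",
    "manual-verification", "textual-description", "followers", "friends",
    "source",
]

def read_gs(stream, has_header=False):
    """Parse GS rows from a text stream into a list of dictionaries."""
    reader = csv.DictReader(stream, fieldnames=GS_COLUMNS)
    if has_header:
        next(reader, None)  # skip the header row
    rows = []
    for row in reader:
        # Assumed boolean encoding: "true" or "1" (case-insensitive) is True.
        row["verified-user"] = row["verified-user"].strip().lower() in ("true", "1")
        row["followers"] = int(row["followers"])
        row["friends"] = int(row["friends"])
        rows.append(row)
    return rows

# Made-up sample row, for illustration only.
sample = 'some_user,12345,True,example-label,ok,"Hotel manager, Valencia",250,180,x\n'
gs_rows = read_gs(io.StringIO(sample))
print(gs_rows[0]["followers"], gs_rows[0]["verified-user"])
```

With the real file, the same function could be called as `read_gs(open(path, newline="", encoding="utf-8"))`, where the path and the encoding are again assumptions.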
 
The SS dataset is a "|"-separated text file with the following columns:
  • screen-name|user-id|verified-user|multi-level-label|textual-description
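The "|"-separated layout can be read with a similar sketch. It assumes one record per line, no header row, and no quoting. Because free-text descriptions might themselves contain "|", the split is limited to the first four separators so that any further ones stay inside the last column; this is a design choice for the example, not a documented property of the file. The function name `read_ss` and the sample line are hypothetical.

```python
import io

SS_COLUMNS = [
    "screen-name", "user-id", "verified-user",
    "multi-level-label", "textual-description",
]

def read_ss(stream):
    """Parse SS lines from a text stream into a list of dictionaries."""
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Split on the first four separators only, so extra "|" characters
        # remain part of the textual description.
        parts = line.split("|", len(SS_COLUMNS) - 1)
        if len(parts) != len(SS_COLUMNS):
            continue  # skip lines with too few fields
        rows.append(dict(zip(SS_COLUMNS, parts)))
    return rows

# Made-up sample line, for illustration only.
sample = "some_user|12345|False|example-label|Travel blogger | foodie\n"
ss_rows = read_ss(io.StringIO(sample))
print(ss_rows[0]["textual-description"])
```

All values are kept as strings here; converting `verified-user` to a boolean would follow the same assumed encoding as in any GS loader.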
     
The SS dataset is generated with an unsupervised method starting from an initial seed of bigrams. It can therefore contain wrong or incomplete labels, hence the name silver standard (SS).
 
As the data were captured from Twitter, we can only release them under restricted conditions.

Files

Restricted

The record is publicly accessible, but files are restricted to users with access.

Additional details

Funding

Ministerio de Ciencia, Innovación y Universidades
Prueba de Concepto para la Plataforma de Análisis Social Dinámico en el Contexto del Turismo Sostenible PDC2021-121097-I00
Ministerio de Ciencia, Innovación y Universidades
XAI4SOC: Explainable Artificial Intelligence for Healthy Aging and Social Wellbeing PID2021-123152OB-C22