Published January 31, 2024
| Version v1
Dataset
Restricted
Multi-dimensional author profiling by business roles
Authors/Creators
- 1. Universitat Jaume I
Description
This dataset contains the data used in the paper "Multidimensional Author Profiling for Social Business Intelligence", more specifically, the gold standard (GS) and silver standard (SS) created for training and validating text classifiers for business profiling of social network users.
The GS dataset is a CSV file with the following columns:
- screen-name, user-id, verified-user (boolean), multi-level-label, manual-verification, textual-description, followers (int), friends (int), source (not used)
The attribute "multi-level-label" contains a label representing the user's business profile from three perspectives: role, collective-vs-individual, and on-domain. The attribute "manual-verification" records a second pass by experts to validate the assigned label.
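As an illustration of the column layout above, the following minimal sketch reads GS rows with Python's standard csv module. Several things here are assumptions rather than facts from the dataset: whether the file has a header row (the has_header flag is hypothetical), how booleans are serialized (text such as "True"/"False" is assumed), and the sample row, which is synthetic and uses placeholder label values.

```python
import csv
import io

# Column order as listed in the dataset description.
GS_COLUMNS = [
    "screen-name", "user-id", "verified-user", "multi-level-label",
    "manual-verification", "textual-description", "followers", "friends",
    "source",
]

def parse_bool(value):
    # Assumption: booleans are stored as text such as "True"/"False" or "1"/"0".
    return value.strip().lower() in {"true", "1", "yes"}

def read_gs(handle, has_header=False):
    """Yield one dict per GS row, converting the boolean and integer columns."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if the file has one
    for row in reader:
        if len(row) != len(GS_COLUMNS):
            raise ValueError(f"expected {len(GS_COLUMNS)} fields, got {len(row)}")
        record = dict(zip(GS_COLUMNS, row))
        record["verified-user"] = parse_bool(record["verified-user"])
        record["followers"] = int(record["followers"])
        record["friends"] = int(record["friends"])
        yield record

# Synthetic row for illustration only; not taken from the dataset.
sample = (
    'example_user,12345,False,PLACEHOLDER_LABEL,PLACEHOLDER_LABEL,'
    '"Bio text, with a comma",10,20,x\n'
)
records = list(read_gs(io.StringIO(sample)))
print(records[0]["followers"])
```

The user-id column is kept as a string on purpose, since identifiers are not meant for arithmetic and string storage avoids any precision issues in downstream tools.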
The SS dataset is a "|"-separated text file with the following columns:
- screen-name|user-id|verified-user|multi-level-label|textual-description
The SS dataset is generated with an unsupervised method that starts from an initial seed of bigrams. It can therefore contain wrong or incomplete labels, hence the name silver standard (SS).
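A minimal sketch of parsing one SS line in Python follows. It assumes the file uses no quoting or escaping, so any extra "|" characters are treated as part of the final textual-description field; that behaviour is an assumption, not something the dataset description states. The sample line is synthetic and the label value is a placeholder.

```python
# Column order as listed in the dataset description.
SS_COLUMNS = [
    "screen-name", "user-id", "verified-user",
    "multi-level-label", "textual-description",
]

def parse_ss_line(line):
    """Split one "|"-separated SS line into a dict keyed by column name."""
    # Split at most four times so that any further "|" stays in the description.
    parts = line.rstrip("\n").split("|", maxsplit=len(SS_COLUMNS) - 1)
    if len(parts) != len(SS_COLUMNS):
        raise ValueError(f"expected {len(SS_COLUMNS)} fields, got {len(parts)}")
    return dict(zip(SS_COLUMNS, parts))

# Synthetic line for illustration only; not taken from the dataset.
line = "example_user|12345|False|PLACEHOLDER_LABEL|Bio text | with a pipe\n"
record = parse_ss_line(line)
print(record["textual-description"])
```

Because SS labels may be wrong or incomplete, a consumer would likely want to keep these records separate from GS records rather than merging them into one training table without a provenance flag.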
As the data were captured from Twitter, we can only release them under restricted conditions.
Files
Additional details
Funding
- Ministerio de Ciencia, Innovación y Universidades
- Prueba de Concepto para la Plataforma de Análisis Social Dinámico en el Contexto del Turismo Sostenible PDC2021-121097-I00
- Ministerio de Ciencia, Innovación y Universidades
- XAI4SOC: Explainable Artificial Intelligence for Healthy Aging and Social Wellbeing PID2021-123152OB-C22