Detection of Real-World Influence through Social Media
- 1. Université d'Avignon
- 2. Université d'Orléans
Description
Description. This dataset corresponds to the resources produced for the following conference paper and its extended version:
- J.-V. Cossu, N. Dugué, and V. Labatut, “Detecting Real-World Influence Through Twitter,” in 2nd European Network Intelligence Conference (ENIC), 2015, pp. 83–90. ⟨hal-01164453⟩ DOI: 10.1109/ENIC.2015.20
- J.-V. Cossu, V. Labatut, and N. Dugué, “A Review of Features for the Discrimination of Twitter Users: Application to the Prediction of Offline Influence,” Social Network Analysis and Mining 6:25, 2016. ⟨hal-01203171⟩ DOI: 10.1007/s13278-016-0329-x
Raw data are available through the official RepLab page: http://nlp.uned.es/replab2014/ (direct archive link: http://nlp.uned.es/replab2014/replab2014-dataset.tar.gz)
Source code. The source code used to generate these outputs is available on GitHub: https://github.com/CompNet/Influence
Funding. This work was partly funded by the French National Research Agency (ANR), through the project ImagiWeb ANR-12-CORD-0002.
Contact. Jean-Valère Cossu <jean-valere.cossu@alumni.univ-avignon.fr>
Citation. If you use these data, please cite the first paper listed above (Cossu et al., 2015):
@InProceedings{Cossu2015,
author = {Cossu, Jean-Valère and Dugué, Nicolas and Labatut, Vincent},
title = {Detecting Real-World Influence Through {Twitter}},
booktitle = {2\textsuperscript{nd} European Network Intelligence Conference},
year = {2015},
pages = {83-90},
address = {Karlskrona, SE},
publisher = {IEEE Publishing},
doi = {10.1109/ENIC.2015.20},
}
Details. This archive contains all ranking outputs, formatted for the TREC-EVAL tool. For each domain, an output is a list of users ranked from the most influential to the least influential. For a classification-type evaluation, consider users with a score higher than 0.5 to be influential.
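The thresholding rule above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: it assumes the files follow the usual six-column TREC run layout (query id, `Q0`, document id, rank, score, run tag), with the domain in the query-id column and the user in the document-id column. The sample lines, user names, and run tag are made up for the example.

```python
from collections import defaultdict


def read_run(lines):
    """Parse TREC-style run lines into {domain: [(user, score), ...]}.

    Assumed layout (not confirmed by the dataset description):
    domain Q0 user rank score run_tag
    """
    runs = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        domain, _q0, user, _rank, score, _tag = parts
        runs[domain].append((user, float(score)))
    # Order each domain from most to least influential.
    for domain in runs:
        runs[domain].sort(key=lambda pair: pair[1], reverse=True)
    return dict(runs)


def influential_users(run, threshold=0.5):
    """Keep users whose score is strictly higher than the threshold."""
    return {
        domain: [user for user, score in users if score > threshold]
        for domain, users in run.items()
    }


# Hypothetical sample lines, for illustration only.
sample = [
    "automotive Q0 user_a 1 0.91 demo",
    "automotive Q0 user_b 2 0.35 demo",
    "banking Q0 user_c 1 0.50 demo",
    "banking Q0 user_d 2 0.72 demo",
]
run = read_run(sample)
labels = influential_users(run)
```

To process a real file, pass an open file handle instead of `sample`, for instance `read_run(open(path, encoding="utf-8"))`, after checking that the column layout matches the assumption above.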
File names correspond either to a system or to a feature name. Names starting with Cos* identify a system and indicate:
- the method: BoT for Bag-of-Tweets, UaD for User-as-Document;
- the use of the Tweet-Selection strategy, denoted Artex;
- the learning process, with a Global model or separate models (noted Multi);
- the decision strategy for Bag-of-Tweets: Counting or Sum.
Files starting with out_* contain the ranking outputs of the logistic regression. Files matrix_auto.dat and matrix_bank.dat contain the data used to feed the PLS model (code: plspm4influence.R).
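As a rough aid for navigating the archive, the naming convention above can be turned into a small helper that sorts files into the four groups mentioned. This is a sketch based only on the prefixes and file names stated here; the function name, the group labels, and the example file names other than matrix_auto.dat and matrix_bank.dat are hypothetical.

```python
def describe_output_file(name):
    """Return a coarse description of an archive file, based on its name."""
    if name in ("matrix_auto.dat", "matrix_bank.dat"):
        return "PLS input matrix"
    if name.startswith("out_"):
        return "logistic regression ranking"
    if name.startswith("Cos"):
        return "system ranking"
    # Anything else is assumed to be named after a single feature.
    return "feature ranking"


# Hypothetical file names, for illustration only.
examples = ["Cos_example.txt", "out_example.txt", "matrix_bank.dat", "followers.txt"]
groups = {name: describe_output_file(name) for name in examples}
```

The tokens inside Cos* names (BoT, UaD, Artex, Global, Multi, Counting, Sum) are deliberately not parsed, since their exact spelling and separators in the file names are not specified in this description.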
RepLab 2014 uses Twitter data in English and Spanish. The balance between both languages depends on the availability of data for each of the profiles included in the dataset.
The training dataset consists of 7,000 Twitter profiles (all with at least 1,000 followers) related to the automotive and banking domains; evaluation is performed separately for each domain. Each profile consists of (i) the author name, (ii) the profile URL, and (iii) the last 600 tweets published by the author at crawling time. Profiles have been manually labelled by reputation experts as either “opinion maker” (i.e. authors with reputational influence) or “non-opinion maker”. The objective is to find out which authors have more reputational influence (who the opinion makers are) and which profiles are less influential or have no influence at all.
Since the Twitter ToS do not allow redistribution of tweets, only tweet IDs and screen names are provided. The RepLab organizers provide details on how to download the tweets.
Files
ACTIA.png
Additional details
Related works
- Is compiled by
- Software: https://github.com/CompNet/Influence (URL)
- Is documented by
- Conference paper: 10.1109/ENIC.2015.20 (DOI)
- Journal article: 10.1007/s13278-016-0329-x (DOI)
- Obsoletes
- Dataset: 10.6084/m9.figshare.1506785 (DOI)
Funding
- Agence Nationale de la Recherche
- ImagiWeb – Image on the Web: analysis of the image life cycle through the Web 2.0 (ANR-12-CORD-0002)