This archive contains the #Élysée2017fr dataset.
(Initially published at https://web.archive.org/web/20200530171644if_/https://dataverse.mpi-sws.org/dataverse/icwsm18 on June 24, 2018. As that dataverse is now defunct, we are reposting the dataset on Zenodo.)
The keywords used to collect the initial dataset, each presented with the start and stop dates of use (date format: YYYY-MM-DD).
The manual profile annotations. The file contains the following columns:
The profile's id used by Twitter
"individual" if the profile is managed by a single person, else "non individual". The "non individual" label is itself divided into 3 subcategories:
The profile's political affiliation(s), indicated by the abbreviation of the political party:
When a profile has 2 affiliations, they are separated by a slash (e.g. "ps/fi").
For individual profiles only. Indicates whether the profile's owner self-identifies as a media professional (journalist, editorialist, ...)
For individual profiles only. Indicates the sex of the profile's owner:
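For the political affiliation column described above, a slash-separated value such as "ps/fi" can be split into individual party abbreviations. The following is a minimal Python sketch; the function name split_affiliations is hypothetical and not part of the dataset, and it assumes the slash is the only separator used.

```python
from typing import List


def split_affiliations(value: str) -> List[str]:
    """Split an affiliation field such as "ps/fi" into ["ps", "fi"].

    Assumption: affiliations are separated only by "/" and an empty
    field means no affiliation was annotated.
    """
    # Drop surrounding whitespace and ignore empty fragments.
    return [part.strip() for part in value.split("/") if part.strip()]
```

For example, split_affiliations("ps/fi") gives the two abbreviations separately, while a single-affiliation value gives a one-element list.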
Files containing the tweet and retweet ids, divided according to the political affiliation of their authors for more flexibility.
Each file contains one tweet id per line.
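As an illustration of the one-id-per-line layout, the sketch below reads ids from any iterable of lines (for instance an open file). The function name read_tweet_ids is hypothetical; ids are kept as strings here, which is a choice made for this example rather than a requirement of the dataset.

```python
from typing import Iterable, List


def read_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return the tweet ids found in `lines`, one id per line.

    Blank lines are skipped. Ids are kept as strings so that they can
    be passed unchanged to whichever tool retrieves the tweets.
    """
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            ids.append(tweet_id)
    return ids
```

With a real file, this could be called as read_tweet_ids(open(path, encoding="utf-8")), where path points to one of the id files in the archive.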
Files containing the mention and retweet networks, in NCOL and GraphML format.
The NCOL files contain the directed weighted edges between profiles, one per line, in the following format: profile1_twitter_id profile2_twitter_id edge_weight
The GraphML files contain the directed weighted edges between profiles, as well as all the profile annotations presented in profiles_annotations.csv. They can be opened using graph visualisation software such as Gephi.
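The NCOL edge format described above (source id, target id, weight, separated by whitespace) can also be read without a graph library. The sketch below is one possible way to do it; the function name parse_ncol is hypothetical, and parsing the weight as a float is an assumption, since the exact numeric type of edge_weight is not stated here.

```python
from typing import Iterable, Iterator, Tuple


def parse_ncol(lines: Iterable[str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (source_id, target_id, weight) for each NCOL edge line.

    Assumptions: fields are whitespace-separated and the weight is
    numeric. Blank lines are skipped; malformed lines raise ValueError.
    """
    for line_number, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(parts)}"
            )
        source, target, weight = parts
        # The edge is directed: source -> target.
        yield source, target, float(weight)
```

Each yielded tuple corresponds to one directed edge, so the result can be fed into a dictionary of edge weights or into a graph library of your choice.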
Various tools can retrieve tweets from their ids; we suggest the following:
Fraisier Ophélie, Cabanac Guillaume, Pitarch Yoann, Besançon Romaric, Boughanem Mohand. 2018. #Élysée2017fr: the French Presidential Election on Twitter. In International Conference on Weblogs and Social Media. https://aaai.org/ocs/index.php/ICWSM/ICWSM18/paper/view/17821 (https://hal.archives-ouvertes.fr/hal-02319715)