Open dataset of scholars on Twitter
IMPORTANT NOTE: This dataset was created using the May 2022 OpenAlex data dump. In June 2023, OpenAlex announced the implementation of a new author disambiguation algorithm that replaced all the old IDs with new ones, essentially making the dataset unusable. A new version of the dataset using the new OpenAlex author IDs is being prepared and will be available shortly.
This dataset pairs OpenAlex author_ids (https://docs.openalex.org/about-the-data/author) with tweeter_ids (Twitter usernames).
The dataset includes 492,124 unique author_ids and 423,920 unique tweeter_ids forming 498,672 unique author-tweeter pairs. The file contains the following columns:
|Column|Description|
|---|---|
|author_id|author_id from OpenAlex|
|tweeter_id|tweeter_id of the Twitter user|
|criteria|A list of the different matching criteria that identified the pair|
|valid|Indicates whether the match has been manually checked: 0 marks a false positive and 1 marks a true positive. An empty value means the pair has not been manually validated.|
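The column layout above can be explored with a short script. The following is a minimal Python sketch, not part of the published R scripts. It assumes the file is comma-separated with a header row that uses the column names listed above, and that `valid` is stored as the text `0`, `1`, or an empty string. The function name `summarize_pairs` and the sample rows are hypothetical and only illustrate the structure.

```python
import csv
import io


def summarize_pairs(rows):
    """Count unique IDs and pairs, and tally pairs by manual-validation status."""
    authors, tweeters, pairs = set(), set(), set()
    status = {"true_positive": 0, "false_positive": 0, "unchecked": 0}
    for row in rows:
        pair = (row["author_id"], row["tweeter_id"])
        if pair in pairs:
            continue  # count each author-tweeter pair only once
        pairs.add(pair)
        authors.add(row["author_id"])
        tweeters.add(row["tweeter_id"])
        valid = (row.get("valid") or "").strip()
        if valid == "1":
            status["true_positive"] += 1
        elif valid == "0":
            status["false_positive"] += 1
        else:
            status["unchecked"] += 1  # empty value: not manually validated
    return {
        "unique_authors": len(authors),
        "unique_tweeters": len(tweeters),
        "unique_pairs": len(pairs),
        **status,
    }


# Hypothetical sample rows; the IDs and criteria values are made up.
SAMPLE = """author_id,tweeter_id,criteria,valid
A1,user_one,example_criterion,1
A2,user_two,example_criterion,0
A3,user_three,example_criterion,
A3,user_four,example_criterion,
"""

summary = summarize_pairs(csv.DictReader(io.StringIO(SAMPLE)))
print(summary)
```

To run this against the real file, replace the `io.StringIO(SAMPLE)` argument with an opened file handle, adjusting the delimiter if the file is not comma-separated. Because one author_id can be paired with several tweeter_ids (and vice versa), the pair count can exceed either unique-ID count, as it does in the figures reported above.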
When using the dataset, please cite the following preprint which provides details about the matching process:
Mongeon, P., Bowman, T. D., & Costas, R. (2022). An open dataset of scholars on Twitter (arXiv:2208.11065). arXiv. https://doi.org/10.48550/arXiv.2208.11065
Links to R scripts can be found here: https://github.com/pmongeon/scholars-on-twitter/.