Published August 21, 2022 | Version 2020-08-23
Dataset Open

Open dataset of scholars on Twitter

  • 1. Dalhousie University


IMPORTANT NOTE: This dataset was created using the May 2022 OpenAlex data dump. In June 2023, OpenAlex announced a new author disambiguation algorithm that replaced all the old author IDs with new ones, essentially making this version of the dataset unusable. A new version of the dataset using the new OpenAlex author IDs is being prepared and will be available shortly.

This is a dataset of paired OpenAlex author_ids and Twitter tweeter_ids (usernames).

The dataset includes 492,124 unique author_ids and 423,920 unique tweeter_ids, forming 498,672 unique author-tweeter pairs. The file contains the following columns:

Column     | Description
---------- | -----------
author_id  | Author ID from OpenAlex
tweeter_id | Username of the Twitter account
criteria   | List of the matching criteria that identified the pair
valid      | Whether the match has been manually checked: 0 = false positive, 1 = true positive; an empty value means the pair has not been manually validated
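The column descriptions above can be put to work with a short loading script. The following is a minimal Python sketch (not the authors' R scripts) that reads the file, maps the three states of the `valid` column, and recomputes the kind of counts quoted above. It assumes the download is a comma-separated file with a header row using the four documented column names; the delimiter, the way `criteria` is encoded, the file name in the comment, and the sample rows are all hypothetical. Because there are more unique pairs than unique author_ids or unique tweeter_ids, some authors must be linked to several accounts and some accounts to several author_ids, so the sketch also counts both cases.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Hypothetical rows for illustration only (NOT taken from the dataset).
# Assumed layout: comma-separated, header row with the four documented columns.
SAMPLE = (
    "author_id,tweeter_id,criteria,valid\n"
    'A1,user_a,"c1",1\n'
    'A1,user_b,"c1",0\n'
    'A2,user_a,"c2",\n'
    'A3,user_c,"c1,c2",1\n'
)


def parse_valid(value: Optional[str]) -> Optional[bool]:
    """Map `valid` to True (1, true positive), False (0, false positive)
    or None (empty, not manually validated)."""
    value = (value or "").strip()
    if value == "":
        return None
    if value == "1":
        return True
    if value == "0":
        return False
    raise ValueError(f"unexpected value in 'valid' column: {value!r}")


def summarize(rows: Iterable[Dict[str, str]]) -> Dict[str, object]:
    """Count unique IDs and pairs, validation states, and many-to-many links."""
    labels = {True: "true_positive", False: "false_positive", None: "unchecked"}
    validation = {label: 0 for label in labels.values()}
    pairs = set()
    accounts_per_author = defaultdict(set)  # author_id -> {tweeter_id, ...}
    authors_per_account = defaultdict(set)  # tweeter_id -> {author_id, ...}
    for row in rows:
        author, account = row["author_id"], row["tweeter_id"]
        pairs.add((author, account))
        accounts_per_author[author].add(account)
        authors_per_account[account].add(author)
        validation[labels[parse_valid(row["valid"])]] += 1
    return {
        "unique_author_ids": len(accounts_per_author),
        "unique_tweeter_ids": len(authors_per_account),
        "unique_pairs": len(pairs),
        "validation": validation,
        "authors_with_multiple_accounts": sum(
            1 for s in accounts_per_author.values() if len(s) > 1),
        "accounts_with_multiple_authors": sum(
            1 for s in authors_per_account.values() if len(s) > 1),
    }


def drop_false_positives(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep validated true positives and pairs that were never checked."""
    return [row for row in rows if parse_valid(row["valid"]) is not False]


if __name__ == "__main__":
    # With the real download, something like this (file name and delimiter
    # are hypothetical; adjust them to the actual file):
    # with open("scholars_twitter.csv", newline="", encoding="utf-8") as f:
    #     rows = list(csv.DictReader(f))
    rows = list(csv.DictReader(io.StringIO(SAMPLE)))
    print(summarize(rows))
    print(len(drop_false_positives(rows)), "rows after dropping known false positives")
```

If the file holds one row per author-tweeter pair, the first three values returned by `summarize` should match the figures quoted above. Note that these author_ids come from the May 2022 OpenAlex dump, so they will not match author records issued after the June 2023 ID change.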

When using the dataset, please cite the following preprint, which describes the matching process in detail:

Mongeon, P., Bowman, T. D., & Costas, R. (2022). An open dataset of scholars on Twitter (arXiv:2208.11065). arXiv.

Links to R scripts can be found here:




Files (96.5 MB)
