Published April 1, 2024 | Version v2
Dataset Open

Open dataset of scholars on Twitter (X)

Description

This is version 2 of a dataset pairing OpenAlex author IDs (https://docs.openalex.org/about-the-data/author) with Twitter (now X) user IDs.

Major update in this version

Following a significant update to OpenAlex's author identification system, the scholars on Twitter dataset, which linked Twitter IDs to OpenAlex author IDs, became outdated. Because no new Twitter data were available, the original method of matching Twitter profiles with scholarly authors could not be replicated, so a new approach was needed to re-establish the links. A bridge was constructed between the June 2022 snapshot of the OpenAlex database (used in the original matching process) and the most recent snapshot, from February 2024. This bridge used OpenAlex work IDs and DOIs to match authors in the two snapshots by their shared publications and identical primary names. When a connection was established between two authors with the same name, the new OpenAlex author ID was assigned to the corresponding Twitter ID. When no direct match on primary names was found, an attempt was made to match the June 2022 name against the alternative names listed for the author in the 2024 snapshot. This preserved the links between authors and Twitter profiles across the change in author IDs.
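The bridging logic described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' actual code: the function name, the input shapes, and the exact string comparison of names are all assumptions, and it keeps a single match per 2022 author ID for simplicity.

```python
from collections import defaultdict

def bridge_author_ids(old_authorships, new_authorships, new_names):
    """Map 2022 author IDs to 2024 author IDs via shared works and names.

    Hypothetical input shapes (assumptions, not the real snapshot schema):
      old_authorships: iterable of (work_id, old_author_id, old_name)
      new_authorships: iterable of (work_id, new_author_id)
      new_names: dict new_author_id -> (primary_name, [alternative_names])
    Returns dict old_author_id -> (new_author_id, alternative_flag),
    where the flag is 0 for a primary-name match and 1 for an
    alternative-name match.
    """
    # Index the 2024 authors by the works they appear on.
    authors_by_work = defaultdict(set)
    for work_id, new_id in new_authorships:
        authors_by_work[work_id].add(new_id)

    mapping = {}
    for work_id, old_id, old_name in old_authorships:
        current = mapping.get(old_id)
        if current is not None and current[1] == 0:
            continue  # a primary-name match was already found
        for new_id in sorted(authors_by_work.get(work_id, ())):
            primary, alternatives = new_names[new_id]
            if old_name == primary:
                mapping[old_id] = (new_id, 0)  # may replace an earlier fallback
                break
            if old_name in alternatives and old_id not in mapping:
                mapping[old_id] = (new_id, 1)  # fallback: alternative name
    return mapping

# Toy data with made-up identifiers.
old = [("W1", "A1", "Jane Doe"), ("W2", "A2", "J. Smith")]
new = [("W1", "A5001"), ("W2", "A5002")]
names = {
    "A5001": ("Jane Doe", []),
    "A5002": ("John Smith", ["J. Smith"]),
}
mapping = bridge_author_ids(old, new, names)
print(mapping)
```

In this sketch a primary-name match always takes precedence over an alternative-name match, mirroring the order in which the two strategies are described.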

Our method for re-establishing links between author IDs and Twitter profiles rematched 432,417 (88%) OpenAlex author IDs and restored connections for 388,968 unique Twitter users, which represents 92% of the original dataset. Of the rematched author IDs, 375,316 were matched using their primary names and 57,101 through alternative names. Although simple and quick to execute, the approach lost only 8% of the original Twitter-linked scholarly accounts.

The dataset includes 432,417 unique author_ids and 388,968 unique tweeter_ids, forming 462,427 unique author-tweeter pairs.
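As a quick sanity check, the reported figures can be compared with one another. The snippet below only restates numbers given on this page; the variable names are illustrative.

```python
# Figures as reported in this description.
rematched_authors = 432_417
primary_name_matches = 375_316
alternative_name_matches = 57_101
unique_tweeters = 388_968
unique_pairs = 462_427

# The two matching routes should account for every rematched author ID.
names_add_up = primary_name_matches + alternative_name_matches == rematched_authors

# More pairs than unique IDs on either side means the pairing is not
# one-to-one: some authors are paired with several tweeter IDs, and some
# tweeter IDs with several authors.
extra_pairs_vs_authors = unique_pairs - rematched_authors
extra_pairs_vs_tweeters = unique_pairs - unique_tweeters

# Share of rematched author IDs that needed an alternative name.
alternative_share = alternative_name_matches / rematched_authors

print(names_add_up, extra_pairs_vs_authors, extra_pairs_vs_tweeters)
print(f"{alternative_share:.1%}")
```

Since the pairing is many-to-many, analyses that assume one Twitter account per author should deduplicate or aggregate first.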

 

File descriptions

  • authors_tweeters_2024_02.csv is the main dataset of author IDs paired with tweeter IDs. The "alternative" column indicates whether the match was made with the primary name (0) or an alternative name (1).
  • mapping_tweeters_2022_2024.csv contains the mapping between the 2022 author IDs and the 2024 author IDs, including the names.
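A minimal sketch of how the main file might be read and summarized with the Python standard library is shown below. The column names author_id and tweeter_id are assumptions inferred from the description above (only the "alternative" column is named explicitly), so check the CSV header before relying on them; the helper summarize_pairs is hypothetical.

```python
import csv
import io
from collections import Counter

def summarize_pairs(csv_file):
    """Count unique authors, tweeters, and pairs in an open CSV file.

    Assumes columns named author_id, tweeter_id, and alternative
    (the first two names are guesses, not confirmed by the source).
    """
    authors, tweeters, pairs = set(), set(), set()
    rows_by_flag = Counter()
    for row in csv.DictReader(csv_file):
        author, tweeter = row["author_id"], row["tweeter_id"]
        authors.add(author)
        tweeters.add(tweeter)
        pairs.add((author, tweeter))
        rows_by_flag[row["alternative"].strip()] += 1
    return {
        "authors": len(authors),
        "tweeters": len(tweeters),
        "pairs": len(pairs),
        "primary_rows": rows_by_flag["0"],
        "alternative_rows": rows_by_flag["1"],
    }

# Tiny in-memory sample with made-up IDs, standing in for the real file.
sample = io.StringIO(
    "author_id,tweeter_id,alternative\n"
    "A1,T1,0\n"
    "A2,T1,1\n"
    "A3,T2,0\n"
)
summary = summarize_pairs(sample)
print(summary)

# With the downloaded file, the call would look like this:
# with open("authors_tweeters_2024_02.csv", newline="", encoding="utf-8") as f:
#     print(summarize_pairs(f))
```

Reading IDs as strings, as csv.DictReader does, avoids precision loss that can occur when long numeric Twitter IDs are parsed as floating-point numbers.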

 

How to cite

When using the dataset, please cite the following article providing details about the matching process:

Mongeon, P., Bowman, T. D., & Costas, R. (2023). An open data set of scholars on Twitter. Quantitative Science Studies, 1–11.

Files

Total size: 81.5 MB

Size      Checksum
22.7 MB   md5:c53da0871f2fb56e0758165c920143c8
58.8 MB   md5:f1c51c38f551c5eca785cc3fb82d43f8