Published June 12, 2020 | Version v1
Dataset Open

Datasets on contributorship and bibliometric variables for the study 'Task specialization across research careers'

  • 1. Delft Institute of Applied Mathematics, TU Delft, Netherlands
  • 2. Centre for Science and Technology Studies, Leiden University, Netherlands
  • 3. School of Informatics, Computing, and Engineering, Indiana University Bloomington, United States
  • 4. École de bibliothéconomie et des sciences de l'information, Université de Montréal, Canada


Datasets used in the study 'Task specialization and its effects on research careers'.

Dataset 1 (plos_contribution_data_set.csv). Seed dataset containing contribution and bibliometric data on a set of publications assigned to the Medical and Life Sciences from PLOS journals.

Dataset 2 (pub_history.csv). Dataset of author-publication combinations covering the complete publication history of 222,925 disambiguated authors across 6,236,239 distinct publications.
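Because the second file is large (several hundred MB), it may be easier to stream it row by row than to load it whole. The following is a minimal sketch, not part of the original record: it assumes the file is a comma-separated CSV with a header row, and the column names are passed in by the caller because the record does not document them. The function name summarize_pairs is hypothetical.

```python
import csv


def summarize_pairs(lines, author_col, pub_col):
    """Count rows, distinct authors and distinct publications in one pass.

    `lines` is any iterable of CSV text lines (an open file object works).
    `author_col` and `pub_col` are column names supplied by the caller;
    they are assumptions and are NOT taken from the dataset documentation.
    """
    reader = csv.DictReader(lines)
    authors, pubs = set(), set()
    rows = 0
    for row in reader:
        rows += 1
        authors.add(row[author_col])
        pubs.add(row[pub_col])
    return {"rows": rows, "authors": len(authors), "publications": len(pubs)}


# Example call (column names below are placeholders, check the file header first):
# with open("pub_history.csv", newline="") as fh:
#     print(summarize_pairs(fh, "author_id", "pub_id"))
```

If the delimiter or encoding of the actual files differs, the csv.DictReader call would need the corresponding arguments; inspecting the first few lines of each file before running the count is advisable.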


Summary of the paper

Research evaluation remains largely focused on individuals’ leadership and excellence, disregarding the collaborative nature of their work. We model a set of 70,694 publications and 347,136 distinct authors using Bayesian networks to predict scientists’ specific contributions on each of their publications. We predict the contributions of 222,925 authors in 6,236,239 publications, and apply an archetypal analysis to profile scientists by career stage. We divide scientific careers into four stages: junior, early-career, mid-career and late-career. Three scientific archetypes are found throughout the four career stages: 1) leader, 2) specialized, and 3) supporting. All three archetypes are encountered for the early- and mid-career stages, whereas for junior and late-career stages only two archetypes are found: specialized and supporting for junior scholars, and leader and supporting for late-career scholars. Scientists assigned to the leader and specialized archetypes tend to have longer careers than researchers who belong to the supporting archetype. There is consistent gender bias at all stages: the majority of male scientists belong to the leader archetype, while the larger proportion of women belong to the specialized archetype, especially for early- and mid-career researchers.


Paper using these datasets available here:



Files (894.2 MB)

Size
39.0 MB
855.2 MB

Additional details


Funding: European Commission, LEaDing Fellows (grant 707404)