Dataset Open Access
Datasets used in the study 'Task specialization and its effects on research careers'.
Dataset 1 (plos_contribution_data_set.csv). Seed dataset containing contribution and bibliometric data for a set of publications from PLOS journals assigned to the Medical and Life Sciences.
Dataset 2 (pub_history.csv). Dataset of author-publication combinations for the complete publication history of 222,295 disambiguated authors and 6,236,239 distinct publications.
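As a quick sanity check on an author-publication table such as Dataset 2, the distinct authors and publications can be counted and compared with the figures stated above. The following is a minimal sketch only: the column names author_id and pub_id are hypothetical (the actual header of pub_history.csv is not documented here and should be checked first), and the sample rows are made up for illustration.

```python
import csv
import io


def summarize_pairs(handle, author_col="author_id", pub_col="pub_id"):
    """Count rows, distinct authors, and distinct publications in an
    author-publication table read from an open text handle.

    The default column names are assumptions, not the dataset's real header.
    """
    reader = csv.DictReader(handle)
    authors, pubs, rows = set(), set(), 0
    for row in reader:
        authors.add(row[author_col])
        pubs.add(row[pub_col])
        rows += 1
    return {"rows": rows, "authors": len(authors), "publications": len(pubs)}


# Tiny in-memory stand-in for pub_history.csv (made-up values).
sample = io.StringIO(
    "author_id,pub_id\n"
    "a1,p1\n"
    "a1,p2\n"
    "a2,p1\n"
)
summary = summarize_pairs(sample)
print(summary)  # {'rows': 3, 'authors': 2, 'publications': 2}
```

To run this against the real file, open pub_history.csv in text mode with newline="" and pass the handle to summarize_pairs together with the actual column names. Reading row by row keeps memory use limited to the two sets of identifiers, which matters for a table with millions of rows.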
Summary of the paper
Research evaluation remains largely focused on individuals’ leadership and excellence, disregarding the collaborative nature of their work. We model a set of 70,694 publications and 347,136 distinct authors using Bayesian networks to predict scientists’ specific contributions to each of their publications. We predict the contributions of 222,925 authors in 6,236,239 publications, and apply an archetypal analysis to profile scientists by career stage. We divide scientific careers into four stages: junior, early-career, mid-career and late-career. Three scientific archetypes are found across the four career stages: 1) leader, 2) specialized, and 3) supporting. All three archetypes are found in the early- and mid-career stages, whereas only two are found in the junior and late-career stages: specialized and supporting for junior scholars, and leader and supporting for late-career scholars. Scientists assigned to the leader and specialized archetypes tend to have longer careers than those assigned to the supporting archetype. There is a consistent gender bias at all stages: the majority of male scientists belong to the leader archetype, while the larger proportion of women belong to the specialized archetype, especially among early- and mid-career researchers.