Presentation Open Access
In this study we develop a model that predicts, from bibliometric data, the probability that an author performed specific contribution roles. We use it to infer the contribution of each author in our dataset across all publications in their research career, and we profile career trajectories based on these predictions. As a seed dataset, we use contribution data from 71,083 articles and reviews in Medical and Health Sciences published in PLOS journals between 2006 and 2013. The contribution types are: 1) wrote the manuscript, 2) conceived the experiments, 3) performed the experiments, 4) analysed the data, 5) contributed tools, 6) approved the final version, and 7) other. A total of 348,710 unique authors are identified using an author name disambiguation algorithm (14).
We consider the following bibliometric variables to predict contribution types: document type, author order, academic age (15) at the time of publication, total number of papers published at the time of publication, and the total numbers of authors, countries and institutions on the publication. We employ a data-driven approach to construct a Bayesian network (16), which graphically represents the relations between the bibliometric and contribution variables and encodes the joint probability distribution that accounts for the dependencies between the model’s variables.
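To make the idea of a conditional dependency concrete, the following minimal sketch estimates a single conditional probability, P(performed the experiments | author position), by relative frequency. It is not the data-driven network construction described above: the records, the `conditional_probability` helper and the three position labels are hypothetical and chosen only for illustration. A Bayesian network would combine many such conditional tables, one per variable given its parents.

```python
from collections import Counter

# Toy records, invented for illustration only:
# (author position, whether the author performed the experiments)
records = [
    ("first", True), ("first", True), ("first", False),
    ("middle", True), ("middle", False), ("middle", False),
    ("last", False), ("last", False), ("last", True), ("last", False),
]


def conditional_probability(records, position):
    """Estimate P(performed experiments | author position) by relative frequency."""
    outcomes = Counter(done for pos, done in records if pos == position)
    total = sum(outcomes.values())
    if total == 0:
        raise ValueError(f"no records for position {position!r}")
    return outcomes[True] / total


if __name__ == "__main__":
    for position in ("first", "middle", "last"):
        print(position, round(conditional_probability(records, position), 3))
```

With real data, sparse combinations of parent values would typically call for smoothing rather than raw relative frequencies.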
We then divide our set of researchers into four career stages: junior (<5 years since first publication), early-career (≥5 to <15 years since first publication), mid-career (≥15 to <30 years since first publication) and late-career (≥30 years since first publication). At each stage, we perform an archetypal analysis (17) to identify profiles of researchers based on their predicted contributorship.
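The career-stage assignment can be sketched as a simple threshold function. The function name `career_stage` is hypothetical, and the sketch assumes that the mid-career interval excludes its upper bound, so that an author at exactly 30 years falls in the late-career stage.

```python
def career_stage(years_since_first_pub: float) -> str:
    """Map years since first publication to one of the four career stages."""
    if years_since_first_pub < 0:
        raise ValueError("years since first publication must be non-negative")
    if years_since_first_pub < 5:
        return "junior"
    if years_since_first_pub < 15:
        return "early-career"
    if years_since_first_pub < 30:  # assumption: upper bound is exclusive
        return "mid-career"
    return "late-career"


if __name__ == "__main__":
    for years in (0, 5, 14, 15, 30):
        print(years, career_stage(years))
```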