PhD-delay Dataset for Online Stats Training

Rens van de Schoot

This dataset is used for the online stats training website and is based on the data used by Van de Schoot, Yerkes, Mouw and Sonneveld (2013).


Among many other questions, the researchers asked the Ph.D. recipients how long it took them to finish their Ph.D. thesis (n=333). On average, Ph.D. recipients took 59.8 months (just under five years) to complete their Ph.D. trajectory. The variable B3_difference_extra measures the difference between planned and actual project time in months (mean=9.97, minimum=-31, maximum=91, sd=14.43). For the exercises, we are interested in whether the age (M = 31.7, SD = 6.86) of the Ph.D. recipients is related to a delay in their project. The relation between completion time and age is expected to be non-linear. One possible explanation is that at a certain point in life (i.e., the mid thirties), family life takes up more of one's time than in one's twenties or later in life. So, in our model the gap (B3_difference_extra) is the dependent variable and age (E22_Age) and age² (E22_Age_Squared) are the predictors.
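The quadratic model described above (gap = b0 + b1·age + b2·age²) can be sketched as an ordinary least squares fit. This is a minimal illustration, not the analysis used in the original study: the helper names `solve3`, `fit_quadratic` and `turning_point` are hypothetical, the example data are synthetic and invented for demonstration, and the sketch squares age itself rather than reading the E22_Age_Squared column. With real data you would pass the B3_difference_extra values as `gap` and the E22_Age values as `age`.

```python
"""Sketch: fit gap = b0 + b1*age + b2*age^2 by ordinary least squares (stdlib only).

The data generated in __main__ are synthetic, NOT the PhD-delay dataset.
"""
import random


def solve3(a, b):
    """Solve the 3x3 linear system a x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    n = 3
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_quadratic(age, gap):
    """Return [b0, b1, b2] minimising the sum of squared residuals of gap ~ age + age^2."""
    rows = [(1.0, a, a * a) for a in age]  # design matrix: intercept, age, age squared
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * g for r, g in zip(rows, gap)) for i in range(3)]
    return solve3(xtx, xty)


def turning_point(b1, b2):
    """Age where the fitted curve peaks (if b2 < 0) or bottoms out (if b2 > 0)."""
    return -b1 / (2.0 * b2)


if __name__ == "__main__":
    # Invented "true" coefficients for illustration only; not estimates from the study.
    true_b = (-47.0, 3.3, -0.045)
    rng = random.Random(1)
    age = [rng.uniform(25, 60) for _ in range(333)]
    gap = [true_b[0] + true_b[1] * a + true_b[2] * a * a + rng.gauss(0, 10) for a in age]
    b0, b1, b2 = fit_quadratic(age, gap)
    print(f"b0={b0:.2f} b1={b1:.3f} b2={b2:.4f} turning point={turning_point(b1, b2):.1f}")
```

A negative b2 together with a turning point in the mid thirties would be consistent with the expectation stated above that delays peak around that age. Solving the normal equations directly is fine for three predictors; for larger models a QR-based solver is numerically safer.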

For more information on the sample, instruments, methodology and research context, we refer the interested reader to Van de Schoot, Yerkes, Mouw and Sonneveld (2013).


Note that this dataset is for teaching purposes only.
