The dataset contained a large number of missing observations. We therefore applied several imputation schemes as a means of limiting the loss of statistical power caused by the missingness. Specifically, we implemented three imputation strategies: imputation under the missing completely at random (MCAR) assumption, and imputation under the missing at random (MAR) assumption using either (i) only other predictors to model the variable with missing values or (ii) both other predictors and components of the response variable from the survival time model, as suggested by White & Royston (2009). The variable to be imputed is denoted VI in the following.

The MCAR assumption implies, as the name suggests, that the mechanism producing the missing values is completely random, that is, independent of all marginal and multivariate distributions in the data. This imputation scheme is implemented by drawing values at random from the observed marginal distribution of the VI and inserting them in place of the missing values of the VI.
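The MCAR scheme can be illustrated with a minimal sketch. It assumes that "drawing from the marginal distribution" means sampling with replacement from the observed values of the VI, and that missing entries are coded as `None`. The function name `impute_mcar` and the example data are hypothetical and not taken from the analysis itself.

```python
import random

def impute_mcar(values, rng=None):
    """Fill missing entries (None) with random draws from the observed entries.

    Assumption: draws are made with replacement from the empirical
    marginal distribution of the observed VI values.
    """
    rng = rng or random.Random()
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to draw from")
    # Observed entries are kept as they are; only missing slots are filled.
    return [v if v is not None else rng.choice(observed) for v in values]

# Hypothetical VI with two missing observations
vi = [2.1, None, 3.4, None, 1.8, 2.9]
imputed = impute_mcar(vi, rng=random.Random(0))
```

Because the draws ignore all other variables, this scheme preserves the marginal distribution of the VI but not its association with the predictors or the survival outcome.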

The two other imputation methods involve fitting models with the VI as the response variable and a number of predictors. In both strategies, we used up to five predictors from the total set of predictors in the data to fit either normal linear models (for continuous responses) or logistic regression models (for binary responses). The predicted values for the missing observations from these models then constituted the imputations. The predictors used were the (up to five) variables with the strongest marginal effects on the response in terms of \(p\)-values; only variables with \(p \leq 0.05\) and no missing observations were considered. In the White & Royston setup, two additional predictors were included, namely the Nelson-Aalen estimate of the cumulative hazard and the indicator variable for censoring. White & Royston show in a simulation study that such a model has the best imputation properties compared with both of the other imputation schemes mentioned here and a number of other methods. The imputation models were fitted to the original, pre-imputation dataset, so imputed variables could not serve as predictors in other imputation models. The order in which the imputations were carried out is therefore irrelevant. Categorical variables with more than two levels were imputed using the MCAR strategy outlined above, as imputation models for this kind of variable were not investigated by White & Royston.
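The two additional predictors in the White & Royston setup can be sketched as follows. The Nelson-Aalen estimate of the cumulative hazard is \(\hat{H}(t) = \sum_{t_j \leq t} d_j / n_j\), where \(d_j\) is the number of events at time \(t_j\) and \(n_j\) the number of subjects at risk just before \(t_j\). The sketch evaluates \(\hat{H}\) at each subject's own follow-up time. The function name `nelson_aalen`, the coding of the indicator (1 = event observed, 0 = censored) and the example data are assumptions made for illustration; the original analysis may have coded the indicator differently.

```python
from collections import Counter

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard evaluated at each subject's own time.

    times  : observed follow-up times (event or censoring time)
    events : 1 if the event was observed, 0 if censored (assumed coding)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)      # subjects leaving the risk set at each time
    at_risk = len(times)
    cum_hazard = {}
    h = 0.0
    for t in sorted(leaving):
        h += deaths[t] / at_risk  # increment d_j / n_j (zero if no events at t)
        cum_hazard[t] = h
        at_risk -= leaving[t]     # events and censorings both leave the risk set
    return [cum_hazard[t] for t in times]

# Hypothetical survival data for five subjects
times = [5, 8, 8, 12, 15]
events = [1, 1, 0, 1, 0]
H = nelson_aalen(times, events)

# One (cumulative hazard, indicator) pair per subject, to be appended to
# the other predictors of the imputation model for the VI.
extra_predictors = list(zip(H, events))
```

In this example the cumulative hazard increases by 1/5 at time 5, by 1/4 at time 8 and by 1/2 at time 12, so the subject censored at time 15 receives the same value as the subject with an event at time 12.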