BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment

Validation

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://doi.org/10.5281/zenodo.6913045).

To comply with the data model, the dataset must pass a set of validation rules. The data are tested against these rules, and the results are summarized in the table below.
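As an illustration of how rules of this kind can be declared and applied, the following is a minimal sketch that assumes the R validate package. The `%vin%` operator in the rule expressions suggests that package, but this report does not confirm it. The toy records are invented, and only simplified versions of rules V01 and V02 are shown.

```r
library(validate)

# Invented toy records; column names follow the data model.
toy <- data.frame(
  age_nm = c(34, 3, NA, 115),
  sex_cd = c("1", "2", "9", "7"),
  stringsAsFactors = FALSE
)

# Simplified versions of V01 and V02 (assumption: plain bounds and %in%).
rules <- validator(
  V01 = is.na(age_nm) | (age_nm >= 5 & age_nm <= 115),
  V02 = is.na(sex_cd) | sex_cd %in% c("0", "1", "2", "9")
)

# Apply the rules to the data and summarize the outcome per rule.
out <- confront(toy, rules)
res <- summary(out)
print(res[, c("name", "items", "passes", "fails", "nNA")])
```

In this toy example each rule is violated by exactly one record: age 3 is below the lower bound, and "7" is not an allowed sex code.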

Rule name  Items    Passes   Fails  % fails  NAs    % NAs  Error  Warning  Validation rule
V01        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(age_nm) | age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08
V02        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
V03        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V04        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V05        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(doses_nm) | doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08
V06        1143805  1143366  439    0.04%    0      0%     FALSE  FALSE    fully_vaccinated_bl == FALSE | fully_vaccinated_bl == TRUE & !is.na(vaccination_schedule_cd)
V07        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(test_type_cd) | test_type_cd %vin% c("PCR", "AG", "other")
V08        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(variant_cd) | variant_cd %vin% c("alpha", "beta", "gamma", "delta", "omicron", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu")
V09        1143805  1133843  9962   0.87%    0      0%     FALSE  FALSE    is.na(pregnancy_bl) | pregnancy_bl == FALSE | (pregnancy_bl == TRUE & sex_cd == "2" & age_nm - 12 >= -1e-08 & age_nm - 55 <= 1e-08)
V10        1143805  1143801  4      0%       0      0%     FALSE  FALSE    is.na(essential_worker_bl) | essential_worker_bl == FALSE | (essential_worker_bl == TRUE & age_nm - 16 >= -1e-08 & age_nm - 70 <= 1e-08)
V11        1143805  1143356  449    0.04%    0      0%     FALSE  FALSE    (is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)
V12        1143805  1143614  191    0.02%    0      0%     FALSE  FALSE    (is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)
V13        1143805  1143805  0      0%       0      0%     FALSE  FALSE    is.na(previous_infection_dt) | is.na(confirmed_case_dt) | !is.na(previous_infection_dt) & !is.na(confirmed_case_dt) & (previous_infection_dt < confirmed_case_dt)
V14        1143805  1143791  14     0%       0      0%     FALSE  FALSE    is.na(confirmed_case_dt) | is.na(exitus_dt) | !is.na(confirmed_case_dt) & !is.na(exitus_dt) & (confirmed_case_dt <= exitus_dt)
V15        1143805  1143794  11     0%       0      0%     FALSE  FALSE    is.na(previous_infection_dt) | is.na(exitus_dt) | !is.na(previous_infection_dt) & !is.na(exitus_dt) & (previous_infection_dt <= exitus_dt)
V16        1143805  1143492  313    0.03%    0      0%     FALSE  FALSE    is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt
V17        1143805  1097965  181    0.02%    45659  3.99%  FALSE  FALSE    (!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)
V18        1143805  1143278  527    0.05%    0      0%     FALSE  FALSE    is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V19        1143805  1143260  545    0.05%    0      0%     FALSE  FALSE    is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V20        1143805  1143519  286    0.03%    0      0%     FALSE  FALSE    is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V21        1143805  1143366  0      0%       439    0.04%  FALSE  FALSE    (dose_1_brand_cd == "JJ" & !is.na(dose_1_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (dose_1_brand_cd != "JJ" & !is.na(dose_2_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (is.na(dose_1_brand_cd) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE) | (dose_1_brand_cd != "JJ" & is.na(dose_2_dt) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE)
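To make the Passes, Fails and NAs columns concrete, the following base R sketch evaluates two of the rules on invented toy records and counts the outcomes per rule. Each rule yields TRUE (pass), FALSE (fail) or NA per record; the `1e-08` terms act as a numeric tolerance on the bounds. The third rule, `unguarded`, is hypothetical and has no `is.na()` guard, which shows how a missing value can propagate to an NA outcome, as seen for V17 and V21. The helper `count_outcomes` is an illustrative name, not part of the original pipeline.

```r
# Invented toy records; column names follow the data model.
toy <- data.frame(
  age_nm    = c(34, 3, NA, 115),
  doses_nm  = c(2, 1, 0, NA),
  dose_1_dt = as.Date(c("2021-03-01", "2021-04-10", NA, "2021-05-01")),
  dose_2_dt = as.Date(c("2021-03-22", "2021-04-01", NA, NA))
)

eps <- 1e-08  # numeric tolerance that appears in the rule expressions

# One logical column per rule: TRUE = pass, FALSE = fail, NA = undetermined.
results <- with(toy, data.frame(
  V01       = is.na(age_nm) | (age_nm - 5 >= -eps & age_nm - 115 <= eps),
  V11       = is.na(dose_2_dt) | (!is.na(dose_1_dt) & dose_1_dt < dose_2_dt),
  unguarded = doses_nm - 0 >= -eps & doses_nm - 10 <= eps  # hypothetical rule
))

# Count records, passes, fails and NA outcomes for one rule.
count_outcomes <- function(x) {
  c(items  = length(x),
    passes = sum(x, na.rm = TRUE),
    fails  = sum(!x, na.rm = TRUE),
    nNA    = sum(is.na(x)))
}

summary_tbl <- t(sapply(results, count_outcomes))
print(summary_tbl)
```

In this sketch V01 and V11 each have one failing record (age 3, and a second dose dated before the first), while `unguarded` has no fails and one NA. For every rule, passes, fails and NAs add up to the number of items, as they do in the table above.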

The vertical bars in the validation plot indicate the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.

Non-compliance with the Common Data Model specification

All rules in this set are considered essential: a record must not violate any of them to be considered for the subsequent analysis. A logical variable flag_violating_val is created in the cohort_data table of the BY-COVID-WP5-BaselineUseCase-VE.duckdb database and set to TRUE when a record violates at least one of the pre-specified validation rules (and to FALSE otherwise).
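The flagging logic can be sketched in base R as follows. The per-record rule results are invented, and the sketch assumes that only an explicit FALSE counts as a violation, so an NA outcome does not set the flag. The report does not state how NA outcomes are handled, so this is an assumption rather than the pipeline's confirmed behaviour.

```r
# Hypothetical per-record rule results: rows are records, columns are rules.
results <- data.frame(
  V01 = c(TRUE, TRUE,  FALSE, TRUE),
  V11 = c(TRUE, FALSE, FALSE, TRUE),
  V17 = c(TRUE, TRUE,  TRUE,  NA)
)

# Assumption: NA outcomes are ignored; only FALSE counts as a violation.
res_mat <- as.matrix(results)
flag_violating_val <- unname(rowSums(!res_mat, na.rm = TRUE) > 0)

# Tabulate flagged versus unflagged records.
print(table(flag_violating_val))
```

A record that violates several rules is flagged only once, which is why the number of flagged records can be smaller than the sum of the per-rule fail counts.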

flag_violating_val==TRUE flag_violating_val==FALSE
11491 1132314