BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment

Validation

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://doi.org/10.5281/zenodo.6913045).

To comply with the data model, the dataset must pass a set of validation rules. Every record is tested against each rule, and the results of this validation process are summarized per rule in the table below.

Validation rules (rule name: validation rule)

V01: is.na(age_nm) | age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08
V02: is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
V03: is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V04: is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V05: is.na(doses_nm) | doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08
V06: fully_vaccinated_bl == FALSE | fully_vaccinated_bl == TRUE & !is.na(vaccination_schedule_cd)
V07: is.na(test_type_cd) | test_type_cd %vin% c("PCR", "AG", "other")
V08: is.na(variant_cd) | variant_cd %vin% c("alpha", "beta", "gamma", "delta", "omicron", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu")
V09: is.na(pregnancy_bl) | pregnancy_bl == FALSE | (pregnancy_bl == TRUE & sex_cd == "2" & age_nm - 12 >= -1e-08 & age_nm - 55 <= 1e-08)
V10: is.na(essential_worker_bl) | essential_worker_bl == FALSE | (essential_worker_bl == TRUE & age_nm - 16 >= -1e-08 & age_nm - 70 <= 1e-08)
V11: (is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)
V12: (is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)
V13: is.na(previous_infection_dt) | is.na(confirmed_case_dt) | !is.na(previous_infection_dt) & !is.na(confirmed_case_dt) & (previous_infection_dt < confirmed_case_dt)
V14: is.na(confirmed_case_dt) | is.na(exitus_dt) | !is.na(confirmed_case_dt) & !is.na(exitus_dt) & (confirmed_case_dt <= exitus_dt)
V15: is.na(previous_infection_dt) | is.na(exitus_dt) | !is.na(previous_infection_dt) & !is.na(exitus_dt) & (previous_infection_dt <= exitus_dt)
V16: is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt
V17: (!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)
V18: is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V19: is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V20: is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V21: (dose_1_brand_cd == "JJ" & !is.na(dose_1_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (dose_1_brand_cd != "JJ" & !is.na(dose_2_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (is.na(dose_1_brand_cd) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE) | (dose_1_brand_cd != "JJ" & is.na(dose_2_dt) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE)

Validation results per rule

Rule name  Items    Passes   Fails   Percentage of fails  Number of NAs  Percentage of NAs  Error  Warning
V01        3948815  3915229  33586   0.85%                0              0%                 FALSE  FALSE
V02        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V03        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V04        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V05        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V06        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V07        3948815  3817013  131802  3.34%                0              0%                 FALSE  FALSE
V08        3948815  3893433  55382   1.4%                 0              0%                 FALSE  FALSE
V09        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V10        3948815  3937254  11351   0.29%                210            0.01%              FALSE  FALSE
V11        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V12        3948815  3948813  2       0%                   0              0%                 FALSE  FALSE
V13        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V14        3948815  3948751  64      0%                   0              0%                 FALSE  FALSE
V15        3948815  3948787  28      0%                   0              0%                 FALSE  FALSE
V16        3948815  3948693  122     0%                   0              0%                 FALSE  FALSE
V17        3948815  3756176  192639  4.88%                0              0%                 FALSE  FALSE
V18        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V19        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
V20        3948815  3948813  2       0%                   0              0%                 FALSE  FALSE
V21        3948815  3948815  0       0%                   0              0%                 FALSE  FALSE
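
To illustrate how the summary columns above relate to a single rule, the following minimal base-R sketch evaluates simplified versions of V01 and V10 on a small invented data frame and counts passes, fails and NAs per rule. The toy data, the helper summarise_rule and the omission of the 1e-08 numeric tolerance are assumptions made for illustration only; this is not the code used to produce the report.

```r
# Toy data (invented for illustration; not taken from the cohort).
toy <- data.frame(
  age_nm              = c(34, 3, 50, NA, 80),
  essential_worker_bl = c(TRUE, FALSE, NA, TRUE, TRUE)
)

# Hypothetical helper: count passes, fails and NAs for one logical rule result.
summarise_rule <- function(name, result) {
  data.frame(
    name   = name,
    items  = length(result),
    passes = sum(result, na.rm = TRUE),   # rule evaluates to TRUE
    fails  = sum(!result, na.rm = TRUE),  # rule evaluates to FALSE
    nNA    = sum(is.na(result))           # rule cannot be evaluated
  )
}

# Simplified versions of V01 and V10 (assumption: no 1e-08 tolerance).
v01 <- with(toy, is.na(age_nm) | (age_nm >= 5 & age_nm <= 115))
v10 <- with(toy, is.na(essential_worker_bl) | essential_worker_bl == FALSE |
                 (essential_worker_bl == TRUE & age_nm >= 16 & age_nm <= 70))

rule_summary <- rbind(summarise_rule("V01", v01), summarise_rule("V10", v10))
rule_summary$pct_fails <- 100 * rule_summary$fails / rule_summary$items
print(rule_summary)
```

In this sketch, the record with essential_worker_bl == TRUE and a missing age_nm yields NA for V10 rather than a pass or a fail, because R's logical operators propagate NA when no operand settles the result.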

The vertical bars in the validation plot indicate, for each validation rule, the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.

Non-compliance with the Common Data Model specification

All validation rules in the set are considered ‘essential’: a record must not violate any of them to be considered for the subsequent analysis. A logical variable flag_violating_val is created in the cohort_data table of the BY-COVID-WP5-BaselineUseCase-VE.duckdb database and set to TRUE when at least one of the validation rules in the pre-specified set is violated (otherwise it is set to FALSE).

flag_violating_val==TRUE flag_violating_val==FALSE
414278 3534537
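
The flagging logic described above can be sketched in base R as follows. This is a minimal sketch on invented rule results, not the pipeline's code: the matrix rule_results is hypothetical, and treating an NA rule result as "not violated" is an assumption, since the report does not state how NAs are handled when the flag is set.

```r
# Hypothetical per-record rule results (rows = records, columns = rules);
# TRUE = pass, FALSE = fail, NA = rule could not be evaluated.
rule_results <- matrix(
  c(TRUE,  TRUE,  TRUE,
    TRUE,  FALSE, TRUE,
    FALSE, FALSE, TRUE,
    TRUE,  NA,    TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("V01", "V07", "V17"))
)

# Flag a record when at least one rule result is FALSE.
# Assumption: NA results are not counted as violations.
flag_violating_val <- rowSums(!rule_results, na.rm = TRUE) > 0

# Cross-tabulate flagged versus unflagged records.
print(table(flag_violating_val))
```

A record that fails several rules is flagged only once, which is consistent with the number of flagged records (414278) being smaller than the sum of the per-rule fail counts in the validation table (424978).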