BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment

Validation

Compliance with the Common Data Model specification

We check whether the imported dataset complies with the data model specification (https://doi.org/10.5281/zenodo.6913045).

To comply with the data model, the dataset must pass a set of validation rules. The data are tested against each rule, and the results are summarized in the table below.

Name  Items    Passes   Fails   % fails  NAs     % NAs   Error  Warning
V01   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V02   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V03   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V04   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V05   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V06   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V07   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V08   5308428  5121033  187395  3.53%    0       0%      FALSE  FALSE
V09   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V10   5308428  5307878  550     0.01%    0       0%      FALSE  FALSE
V11   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V12   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V13   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V14   5308428  5308419  9       0%       0       0%      FALSE  FALSE
V15   5308428  5308428  0       0%       0       0%      FALSE  FALSE
V16   5308428  5308351  77      0%       0       0%      FALSE  FALSE
V17   5308428  4478848  0       0%       829580  15.63%  FALSE  FALSE
V18   5308428  5307590  838     0.02%    0       0%      FALSE  FALSE
V19   5308428  5307141  1287    0.02%    0       0%      FALSE  FALSE
V20   5308428  5307115  1313    0.02%    0       0%      FALSE  FALSE
V21   5308428  5307686  0       0%       742     0.01%   FALSE  FALSE

Validation rule for each rule name:

V01: is.na(age_nm) | age_nm - 5 >= -1e-08 & age_nm - 115 <= 1e-08
V02: is.na(sex_cd) | sex_cd %vin% c("0", "1", "2", "9")
V03: is.na(dose_1_brand_cd) | dose_1_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V04: is.na(dose_2_brand_cd) | dose_2_brand_cd %vin% c("BP", "MD", "JJ", "AZ", "NV")
V05: is.na(doses_nm) | doses_nm - 0 >= -1e-08 & doses_nm - 10 <= 1e-08
V06: fully_vaccinated_bl == FALSE | fully_vaccinated_bl == TRUE & !is.na(vaccination_schedule_cd)
V07: is.na(test_type_cd) | test_type_cd %vin% c("PCR", "AG", "other")
V08: is.na(variant_cd) | variant_cd %vin% c("alpha", "beta", "gamma", "delta", "omicron", "epsilon", "zeta", "eta", "theta", "iota", "kappa", "lambda", "mu")
V09: is.na(pregnancy_bl) | pregnancy_bl == FALSE | (pregnancy_bl == TRUE & sex_cd == "2" & age_nm - 12 >= -1e-08 & age_nm - 55 <= 1e-08)
V10: is.na(essential_worker_bl) | essential_worker_bl == FALSE | (essential_worker_bl == TRUE & age_nm - 16 >= -1e-08 & age_nm - 70 <= 1e-08)
V11: (is.na(dose_1_dt) & is.na(dose_2_dt)) | is.na(dose_2_dt) | !is.na(dose_1_dt) & !is.na(dose_2_dt) & (dose_1_dt < dose_2_dt)
V12: (is.na(dose_2_dt) & is.na(dose_3_dt)) | is.na(dose_3_dt) | !is.na(dose_2_dt) & !is.na(dose_3_dt) & (dose_2_dt < dose_3_dt)
V13: is.na(previous_infection_dt) | is.na(confirmed_case_dt) | !is.na(previous_infection_dt) & !is.na(confirmed_case_dt) & (previous_infection_dt < confirmed_case_dt)
V14: is.na(confirmed_case_dt) | is.na(exitus_dt) | !is.na(confirmed_case_dt) & !is.na(exitus_dt) & (confirmed_case_dt <= exitus_dt)
V15: is.na(previous_infection_dt) | is.na(exitus_dt) | !is.na(previous_infection_dt) & !is.na(exitus_dt) & (previous_infection_dt <= exitus_dt)
V16: is.na(fully_vaccinated_dt) | is.na(exitus_dt) | !is.na(fully_vaccinated_dt) & !is.na(exitus_dt) & fully_vaccinated_dt <= exitus_dt
V17: (!is.na(dose_1_dt) & !is.na(dose_2_dt) & !is.na(dose_3_dt) & doses_nm - 3 >= -1e-08) | (!is.na(dose_1_dt) & !is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 2) <= 1e-08) | (!is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 1) <= 1e-08) | (is.na(dose_1_dt) & is.na(dose_2_dt) & is.na(dose_3_dt) & abs(doses_nm - 0) <= 1e-08)
V18: is.na(dose_1_dt) | (!is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V19: is.na(dose_2_dt) | (!is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V20: is.na(dose_3_dt) | (!is.na(dose_3_dt) & !is.na(dose_3_brand_cd) & !is.na(dose_2_dt) & !is.na(dose_2_brand_cd) & !is.na(dose_1_dt) & !is.na(dose_1_brand_cd))
V21: (dose_1_brand_cd == "JJ" & !is.na(dose_1_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (dose_1_brand_cd != "JJ" & !is.na(dose_2_dt) & !is.na(fully_vaccinated_dt) & fully_vaccinated_bl == TRUE) | (is.na(dose_1_brand_cd) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE) | (dose_1_brand_cd != "JJ" & is.na(dose_2_dt) & is.na(fully_vaccinated_dt) & fully_vaccinated_bl == FALSE)
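
To illustrate how rules of this kind produce the Items, Passes, Fails and NA counts, here is a minimal Python sketch. It is not the code used to produce this report (the rule expressions above are written in R syntax); it re-expresses V01, V02 and V06 by hand. The function names, the `summarise` helper, the toy records and the schedule value "JJ" are hypothetical, and `None` stands in for R's `NA`.

```python
from typing import Callable, Optional

TOL = 1e-08  # numeric tolerance that appears in the rule expressions


def v01_age_range(rec: dict) -> Optional[bool]:
    """V01: age_nm is missing, or lies in [5, 115] up to TOL."""
    age = rec.get("age_nm")
    if age is None:
        return True
    return age - 5 >= -TOL and age - 115 <= TOL


def v02_sex_code(rec: dict) -> Optional[bool]:
    """V02: sex_cd is missing, or is one of the allowed codes."""
    sex = rec.get("sex_cd")
    return sex is None or sex in {"0", "1", "2", "9"}


def v06_schedule_present(rec: dict) -> Optional[bool]:
    """V06: fully vaccinated records must carry a vaccination schedule.

    A missing fully_vaccinated_bl yields None, mirroring how NA
    propagates through the comparison in the R expression.
    """
    fully = rec.get("fully_vaccinated_bl")
    if fully is None:
        return None
    return fully is False or rec.get("vaccination_schedule_cd") is not None


def summarise(name: str, rule: Callable[[dict], Optional[bool]],
              records: list) -> dict:
    """Tally one rule over all records: passes, fails and NA results."""
    results = [rule(rec) for rec in records]
    return {
        "name": name,
        "items": len(results),
        "passes": sum(r is True for r in results),
        "fails": sum(r is False for r in results),
        "nNA": sum(r is None for r in results),
    }


# Hypothetical toy records, not taken from the cohort data.
toy_records = [
    {"age_nm": 34, "sex_cd": "1", "fully_vaccinated_bl": True,
     "vaccination_schedule_cd": "JJ"},
    {"age_nm": None, "sex_cd": "2", "fully_vaccinated_bl": False,
     "vaccination_schedule_cd": None},
    {"age_nm": 3, "sex_cd": "7", "fully_vaccinated_bl": True,
     "vaccination_schedule_cd": None},
    {"age_nm": 115, "sex_cd": None, "fully_vaccinated_bl": None,
     "vaccination_schedule_cd": None},
]

if __name__ == "__main__":
    for name, rule in [("V01", v01_age_range), ("V02", v02_sex_code),
                       ("V06", v06_schedule_present)]:
        print(summarise(name, rule, toy_records))
```

In this sketch every record lands in exactly one of the three outcomes, so passes, fails and NA results always add up to the number of items, as they do in each row of the table above.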

The vertical bars in the validation plot indicate, for each validation rule, the percentage of records ‘Passing’, ‘Failing’ and ‘Missing’.
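
The percentages behind such bars can be recomputed from the counts in the validation table. The following Python sketch does so for rules V08 and V17; the helper name `shares` and the rounding to two decimals are assumptions made for illustration, while the counts are those reported in the table.

```python
def shares(items: int, passes: int, fails: int, nas: int) -> dict:
    """Return the percentage of records passing, failing and missing."""
    if passes + fails + nas != items:
        raise ValueError("passes + fails + NAs must equal the item count")
    return {
        "passing": round(100 * passes / items, 2),
        "failing": round(100 * fails / items, 2),
        "missing": round(100 * nas / items, 2),
    }


# Counts taken from the validation table (rules V08 and V17).
v08 = shares(items=5308428, passes=5121033, fails=187395, nas=0)
v17 = shares(items=5308428, passes=4478848, fails=0, nas=829580)

if __name__ == "__main__":
    print("V08", v08)
    print("V17", v17)
```

For V08 this reproduces the 3.53% of fails reported in the table, and for V17 the 15.63% of NAs.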

Non-compliance with the Common Data Model specification

All validation rules in this set are considered ‘essential’: a record must not violate any of them to be considered for the subsequent analysis. A logical variable flag_violating_val is created in the cohort_data table in the BY-COVID-WP5-BaselineUseCase-VE.duckdb database and set to TRUE when at least one of the validation rules in the pre-specified set is violated (otherwise this variable is set to FALSE).

flag_violating_val == TRUE   flag_violating_val == FALSE
189831                       5118597
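
The flag described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's own code: the helper names and the toy cohort are hypothetical, only rules V11 and V18 are re-expressed, and a rule result of NA (`None` here) is assumed not to count as a violation. That assumption is consistent with the reported counts, since the 189831 flagged records are fewer than the 829580 NA results of V17, but the source does not state it explicitly.

```python
from collections import Counter
from datetime import date
from typing import Callable, Optional, Sequence

Rule = Callable[[dict], Optional[bool]]


def v11_dose_order(rec: dict) -> Optional[bool]:
    """V11: if a second dose date exists, dose 1 must precede dose 2."""
    d1, d2 = rec.get("dose_1_dt"), rec.get("dose_2_dt")
    if d2 is None:
        return True
    if d1 is None:
        return False
    return d1 < d2


def v18_dose1_brand(rec: dict) -> Optional[bool]:
    """V18: a recorded first dose date requires a first dose brand."""
    return rec.get("dose_1_dt") is None or rec.get("dose_1_brand_cd") is not None


def flag_violating_val(rec: dict, rules: Sequence[Rule]) -> bool:
    """True when at least one rule explicitly fails for this record.

    Assumption: a None (NA) result is not treated as a violation.
    """
    return any(rule(rec) is False for rule in rules)


# Hypothetical toy cohort, not taken from the cohort_data table.
toy_cohort = [
    {"dose_1_dt": date(2021, 3, 1), "dose_2_dt": date(2021, 3, 22),
     "dose_1_brand_cd": "BP"},
    {"dose_1_dt": date(2021, 4, 10), "dose_2_dt": date(2021, 4, 1),
     "dose_1_brand_cd": "MD"},   # violates V11: dose 2 before dose 1
    {"dose_1_dt": date(2021, 5, 5), "dose_2_dt": None,
     "dose_1_brand_cd": None},   # violates V18: date without brand
    {"dose_1_dt": None, "dose_2_dt": None, "dose_1_brand_cd": None},
]

rules = [v11_dose_order, v18_dose1_brand]
flags = [flag_violating_val(rec, rules) for rec in toy_cohort]
counts = Counter(flags)

if __name__ == "__main__":
    print("flag_violating_val == TRUE :", counts[True])
    print("flag_violating_val == FALSE:", counts[False])
```

Because a record is flagged once regardless of how many rules it violates, the number of flagged records can be smaller than the sum of the per-rule fail counts.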