Contributor: HHU Düsseldorf / Marschall Lab
Contact: Jana Ebler [jana.ebler@hhu.de] / Tobias Marschall [tobias.marschall@hhu.de]

Genotyping results produced for the Minigraph-Cactus paper, based on GRCh38 and CHM13. PanGenie version v2.1.0 was used.

Pipelines used:
https://bitbucket.org/jana_ebler/hprc-experiments/src/master/ (GRCh38)
https://bitbucket.org/jana_ebler/hprc-experiments/src/chm13-based-pipeline/ (CHM13)

GRCh38-based results:

1.) grch38_all-samples_bi_all.vcf.gz: PanGenie genotypes across all 300 pilot samples and the panel samples in bi-allelic representation. Contains the unfiltered set (= all variants). To obtain the final, filtered genotypes, extract all variants with confidence_level >= 1 as defined in grch38_bi_all_filters.tsv.gz. This can be done with the provided script using the following command (note that it refers to the decompressed filters file, grch38_bi_all_filters.tsv):

    zcat grch38_all-samples_bi_all.vcf.gz | python3 select_ids.py grch38_bi_all_filters.tsv filtered | bgzip -c > grch38_all-samples_bi_filtered.vcf.gz

2.) grch38_bi_all_filters.tsv.gz: filters computed across genotypes. The column "confidence_level" defines which variants belong to the unfiltered, positive and filtered sets:
    - unfiltered set (= all variants): confidence_level >= 0
    - positive set: confidence_level = 4
    - final filtered set: confidence_level >= 1

CHM13-based results:

1.) cactus_filtered_ids_chm13.vcf.gz: Input VCF used for PanGenie. Filtered and preprocessed version of the Minigraph-Cactus VCF for CHM13.

2.) chm13_all-samples_bi_all.vcf.gz: PanGenie genotypes across all 300 pilot samples and the panel samples in bi-allelic representation. Contains the unfiltered set (= all variants). To obtain the final, filtered genotypes, extract all variants with confidence_level >= 1 as defined in chm13_bi_all_filters.tsv.gz.
This can be done with the provided script using the following command (note that it refers to the decompressed filters file, chm13_bi_all_filters.tsv):

    zcat chm13_all-samples_bi_all.vcf.gz | python3 select_ids.py chm13_bi_all_filters.tsv filtered | bgzip -c > chm13_all-samples_bi_filtered.vcf.gz

3.) chm13_bi_all_filters.tsv.gz: filters computed across genotypes. The column "confidence_level" defines which variants belong to the unfiltered, positive and filtered sets:
    - unfiltered set (= all variants): confidence_level >= 0
    - positive set: confidence_level = 4
    - final filtered set: confidence_level >= 1

HGSVC GRCh38-based results:

1.) hgsvc_all-samples_bi_all.vcf.gz: PanGenie genotypes across all 300 pilot samples and the panel samples in bi-allelic representation. Contains the unfiltered set (= all variants). To obtain the final, filtered genotypes, extract all variants with confidence_level >= 1 as defined in hgsvc_bi_all_filters.tsv.gz. This can be done with the provided script using the following command (note that it refers to the decompressed filters file, hgsvc_bi_all_filters.tsv):

    zcat hgsvc_all-samples_bi_all.vcf.gz | python3 select_ids.py hgsvc_bi_all_filters.tsv filtered | bgzip -c > hgsvc_all-samples_bi_filtered.vcf.gz

2.) hgsvc_bi_all_filters.tsv.gz: filters computed across genotypes. The column "confidence_level" defines which variants belong to the unfiltered, positive and filtered sets:
    - unfiltered set (= all variants): confidence_level >= 0
    - positive set: confidence_level = 4
    - final filtered set: confidence_level >= 1
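To illustrate the confidence_level filtering described above, here is a minimal Python sketch of the same selection step. It is NOT the provided select_ids.py and does not reproduce its interface. It assumes that each filters table has a header row with a "confidence_level" column plus a column holding variant IDs (called "variant_id" here, a hypothetical name), and that these IDs match the ID column (column 3) of the VCF records. The script name filter_sketch.py in the usage comment is also hypothetical.

```python
import csv
import gzip
import sys
from typing import Iterable, Iterator, Set


def in_set(confidence_level: float, which: str) -> bool:
    """Set membership as defined in this README."""
    if which == "unfiltered":   # all variants
        return confidence_level >= 0
    if which == "positive":
        return confidence_level == 4
    if which == "filtered":     # final filtered set
        return confidence_level >= 1
    raise ValueError(f"unknown set: {which}")


def select_ids(filters: Iterable[str], which: str = "filtered",
               id_column: str = "variant_id") -> Set[str]:
    """Collect IDs of variants belonging to the requested set.

    ASSUMPTION: tab-separated table with a header row containing a
    'confidence_level' column; 'variant_id' is a hypothetical ID column name.
    """
    reader = csv.DictReader(filters, delimiter="\t")
    return {row[id_column] for row in reader
            if in_set(float(row["confidence_level"]), which)}


def filter_vcf(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield all header lines and the records whose ID field is in `keep`.

    ASSUMPTION: the VCF ID column (3rd field) holds the same IDs as the
    filters table.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
        elif line.strip() and line.split("\t", 3)[2] in keep:
            yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage (reads the gzipped filters table directly):
    #   zcat chm13_all-samples_bi_all.vcf.gz \
    #     | python3 filter_sketch.py chm13_bi_all_filters.tsv.gz \
    #     | bgzip -c > chm13_sketch_filtered.vcf.gz
    with gzip.open(sys.argv[1], "rt") as fh:
        ids = select_ids(fh)
    sys.stdout.writelines(filter_vcf(sys.stdin, ids))
```

Only the set thresholds (unfiltered: >= 0, positive: = 4, filtered: >= 1) are taken from this README; for reproducing the released filtered VCFs, the provided select_ids.py command remains the reference.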