Published January 4, 2020 | Version v2
Dataset | Open access

NA12878 WES Benchmark dataset

  • 1. Vilnius University

Contributors

Project leader:

Project members:

  • 1. CHEO

Description

This dataset brings together, in a single record, the NA12878 WES Benchmark files from the public UCSC Genome Browser (genome.ucsc.edu) session built on the GRCh37 genome assembly, so that these files can be used in other applications or in other genome browsers such as IGV.

The Galaxy page "Procedure and datasets to cross-reference OMIM genes with the genomic regions of interest", published under Shared Data Pages on the usegalaxy.org server, describes a practical procedure and several possible use cases for this dataset. The page is freely accessible to users logged into their usegalaxy.org accounts; please register if you do not yet have an account on the usegalaxy.org Galaxy server.
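At its core, cross-referencing OMIM genes with genomic regions of interest is an interval-overlap query between two BED files, for example Omim_Genes.bed against a laboratory's capture BED. The Python sketch below shows that query in isolation. It is not the Galaxy procedure itself. It assumes tab-separated BED lines whose fourth column holds the gene name (an assumption about the layout of Omim_Genes.bed), and it treats coordinates as 0-based and half-open, as the BED format specifies. Chromosome names must match between the two files (for example `chr1` versus `1`). The names `Interval`, `parse_bed` and `overlapping_genes` are illustrative.

```python
from typing import Dict, Iterable, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int  # 0-based, exclusive (BED convention)
    name: str = ""  # assumed to be BED column 4 when present


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse tab-separated BED lines, skipping blank, comment, track and browser lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\r\n").split("\t")
        name = fields[3] if len(fields) > 3 else ""
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def overlapping_genes(genes: List[Interval], regions: List[Interval]) -> List[str]:
    """Return names of genes that overlap at least one region by one base or more."""
    regions_by_chrom: Dict[str, List[Interval]] = {}
    for region in regions:
        regions_by_chrom.setdefault(region.chrom, []).append(region)
    hits = []
    for gene in genes:
        # Half-open intervals overlap when each one starts before the other ends.
        if any(r.start < gene.end and gene.start < r.end
               for r in regions_by_chrom.get(gene.chrom, [])):
            hits.append(gene.name)
    return hits
```

Passing the lines of Omim_Genes.bed as `genes` and those of a capture BED such as Agilent_CRE_v2.bed as `regions` would list the OMIM genes touched by that capture design. The linear scan per chromosome keeps the sketch short; dedicated tools such as `bedtools intersect` use sorted inputs and scale far better on genome-wide files.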

All genomic variant calls in all VCF files of this dataset were decomposed and normalized with vt. This dataset contains:

  1. Genome in a Bottle (GIAB) version 3.3.2 high-confidence (HC) variant calls and genomic regions for HapMap individual NA12878:
    1. GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz
    2. GIAB_v3.3.2_NA12878-decomposed-normalized.vcf.gz.tbi
    3. GIAB_v3.3.2_NA12878_HC_regions.bed
  2. HapMap individual NA12878 WES variant calls (VCF) and capture regions (BED) from diagnostic laboratories:
    • ARUP whole exome sequencing data (HiSeq 2000), publicly available from the NCBI GeT-RM Browser
      1. converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz
      2. converted_ARUP_NA12878_Exome-decomposed-normalized.vcf.gz.tbi
      3. ARUP_SeqCap_EZ_Exome.bed
    • UCSF whole exome sequencing data (HiSeq 2500), publicly available from the NCBI GeT-RM Browser
      1. converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz
      2. converted_UCSF_NA12878_WES_Agilent_V4_Custom-decomposed-normalized.vcf.gz.tbi
      3. UCSF_WES_Agilent_V4_Custom.bed
    • Whole exome sequencing data (NextSeq 500) generated in the CHEO diagnostic laboratory
      1. CHEO_NA12878_WES_S1dataset.vcf.gz
      2. CHEO_NA12878_WES_S1dataset.vcf.gz.tbi
      3. Agilent_CRE_v2.bed
  3. Genomic coordinates (BED) of OMIM genes for which the molecular basis of the associated disease is known (as of September 2019):
    • Omim_Genes.bed
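The variant calls listed above were decomposed and normalized with vt, so that the same variant has the same representation in every file, which matters when laboratory calls are compared with the GIAB calls. Decomposition splits a multi-allelic record into biallelic records. Normalization trims bases shared by REF and ALT and shifts indels to their leftmost equivalent position. The Python sketch below is a toy illustration of both operations on alleles only, not vt's code. vt also rewrites genotype and INFO/FORMAT fields and reads the reference from a FASTA file, whereas this sketch assumes the reference sequence is held in a Python string. The function names are illustrative.

```python
from typing import List, Tuple

# A biallelic variant: (1-based POS, REF allele, ALT allele), as in a VCF record.
Variant = Tuple[int, str, str]


def decompose(pos: int, ref: str, alts: List[str]) -> List[Variant]:
    """Split one multi-allelic record into one biallelic record per ALT allele.

    Toy version: only alleles are handled; genotypes and INFO/FORMAT fields
    are ignored here.
    """
    return [(pos, ref, alt) for alt in alts]


def normalize(pos: int, ref: str, alt: str, reference: str) -> Variant:
    """Left-align and parsimoniously trim a biallelic variant.

    `reference` is the chromosome sequence, where reference[0] is POS 1.
    """
    if ref == alt:
        raise ValueError("REF and ALT must differ")
    changed = True
    while changed:
        changed = False
        # Drop a shared last base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of position 1")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy sequence with a CA repeat (not real GRCh37 data).
reference = "GGGCACACAGGG"
for pos, ref, alt in decompose(8, "CAG", ["G", "TAG"]):
    print(normalize(pos, ref, alt, reference))
# (3, 'GCA', 'G')  the CA deletion shifted to its leftmost position
# (8, 'C', 'T')    the SNV trimmed to a single base
```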

Files

Files (432.2 MB)

  • 51.2 MB, md5:4427cc9e6411f38d79b57f7bc4a769b4
  • 4.7 MB, md5:9f5b4cf0bc7fedf52d43b7ce91cc1bf7
  • 208.7 MB, md5:abbddd37a52abd99eb21cc358a5107f5
  • 1.5 MB, md5:2549bdb7bffd64490eeca80f91b85ad4
  • 1.1 MB, md5:d58475a0ab622c14ab170eb5401d01b6
  • 144.5 kB, md5:bd145f43fb1de7aa5a85acebb6f044ee
  • 3.6 MB, md5:1e5121e446f97957de52a576ed95a7f0
  • 283.6 kB, md5:e6e58089605a6fc95b005d62e9a5ec1c
  • 139.9 MB, md5:b5447252fb60bdd1ea40b20c74136705
  • 1.6 MB, md5:6c0dffc6f46ba5b5b598bab9706c3a70
  • 14.3 MB, md5:d0c71cf4240e2c5bf111a26c3f741577
  • 128.7 kB, md5:aa52a98bdcf98dce38bafe9e211b5b86
  • 5.1 MB, md5:1de4675ac16d4b498154fa501b037d6e
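Each MD5 checksum above can be used to confirm that a downloaded file arrived intact. The minimal Python sketch below uses only the standard library; the function names are illustrative. On Linux, `md5sum <file>` gives the same hexadecimal digest. The listing does not show file names next to the checksums, so each downloaded file has to be paired with its listed value by hand, for example by size.

```python
import hashlib
from pathlib import Path
from typing import Union


def md5sum(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Union[str, Path], listed: str) -> bool:
    """Compare a file's MD5 digest with a value written as 'md5:<hex>'."""
    expected = listed[len("md5:"):] if listed.startswith("md5:") else listed
    return md5sum(path) == expected.strip().lower()
```

Reading in 1 MiB chunks keeps memory use flat even for the largest file in the listing (208.7 MB).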

Additional details

References

  • Pranckeviciene E, Potter R, Huang L, Jarinova O. Validation of bcbio-nextgen Pipeline Based on NextSeq500 Exome Sequencing. In 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI) 2019 May 19 (pp. 1-6). IEEE.