Published March 2, 2023 | Version v2
Dataset Open

AGC archives of human and SARS-CoV-2 genomes

  • 1. Silesian University of Technology
  • 2. DFCI & Harvard

Description

AGC is a tool to compress a collection of similar genomes. This Zenodo record provides pre-built AGC-3.0 archives of several datasets:

  • File "HPRC-yr1.agc" contains CHM13 and 94 haploid human assemblies released by HPRC in 2021. The telomere-to-telomere CHM13 v2 plus chrY from GRCh38 is used as the reference genome.
  • File "sars-cov-2_ncbi-620k.agc" contains 619,750 complete SARS-CoV-2 genomes with NC_045512.2 as the reference. It was created with the AGC command line "agc create -cb10000 -s3000". The genomes were downloaded from NCBI at the end of 2021. The original FASTA is provided as "sars-cov-2_ncbi-620k.fa.xz".
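
As a sanity check on a download, the genome count stated above can be compared against the number of records in the original FASTA. The following is a minimal sketch in Python, assuming the file "sars-cov-2_ncbi-620k.fa.xz" has been saved locally and that every genome is a single FASTA record with one ">" header line; the function name is illustrative and is not part of AGC.

```python
import lzma


def count_fasta_records(path):
    """Count FASTA records in an xz-compressed file.

    A record is taken to be any line starting with '>'. The file is
    streamed line by line, so it is never fully loaded into memory.
    """
    count = 0
    # errors="replace" keeps the scan going if a header holds odd bytes.
    with lzma.open(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                count += 1
    return count


# Example call (the local path is an assumption):
# n = count_fasta_records("sars-cov-2_ncbi-620k.fa.xz")
# print(n)
```

If the assumptions hold, the returned count should agree with the figure of 619,750 genomes given in the description.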

Files (1.5 GB)

Checksum                                Size
md5:23e32a54b73d05c786c51330bd261de5    1.4 GB
md5:15639996fc99f293fea5ccdbbc9f8adb    1.9 kB
md5:83dc01539645e85fd0ac7338fcb41b9f    22.5 MB
md5:3535a25b04460983f0f9864e9320ab3b    761 Bytes
md5:0a3140edb6a67e7133be748b5af422a4    56.3 MB
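
The MD5 checksums listed above can be used to confirm that a download is intact. The following is a minimal sketch in Python using only the standard library. The function names are illustrative, and because this listing does not show which checksum belongs to which file, the pairing in the usage comment is a placeholder to be filled in from the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example call (file name and checksum pairing are assumptions):
# ok = matches_checksum("HPRC-yr1.agc", "md5:<checksum from the record page>")
```

Reading in chunks keeps memory use flat even for the 1.4 GB file.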