There is a newer version of the record available.

Published October 19, 2024 | Version 1.2
Dataset Open

A collection of high-quality human assemblies

Description

A collection of high-quality human assemblies, including:

  • T2T-CHM13 v2.0 analysis set with HG002 chrY and rCRS chrM
  • GRCh38 no-alt analysis set with rCRS chrM
  • HG002 v1.1
  • CN1 v1.0.1
  • YAO v1.1
  • 156 HPRC "r2-v1" samples (312 assemblies)

Use AGC to extract individual genomes and use ropebwt3 to query the FM-index:

agc listset human320.agc   # list genomes
agc getset human320.agc 400131_HG02615.pat > HG02615.pat.fa # extract one genome
gzip -d human320.fmr.gz     # decompress the incremental index
ropebwt3 build -i human320.fmr -do human320.fmd  # convert to a faster query format
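
To extract every genome rather than one, the listset and getset commands above can be combined. The following is a minimal sketch, not part of the original record: the helper name plan_extract and the output naming (genome name plus ".fa") are assumptions made for illustration. It only prints the commands, so the plan can be inspected before anything is run.

```shell
# Hypothetical helper: read genome names (one per line) on stdin and print
# one `agc getset` command per name. Nothing is executed here.
# Assumption: each output file is named after the full genome name.
plan_extract() {
  archive=$1
  while IFS= read -r name; do
    # Skip blank lines in the listing
    [ -n "$name" ] || continue
    printf 'agc getset %s %s > %s.fa\n' "$archive" "$name" "$name"
  done
}

# Usage (requires agc on PATH; review the printed commands before piping to sh):
#   agc listset human320.agc | plan_extract human320.agc
#   agc listset human320.agc | plan_extract human320.agc | sh
```

Printing the commands first is a deliberate choice: 320 uncompressed genomes take a lot of disk space, so it may be worth filtering the list before running it.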

Note: HPRC samples are already available from GenBank but are not formally published. You may use the data for algorithm development or performance evaluation. If you want to use the genomes for biological discovery, please contact HPRC.

Files

Files (16.3 GB)

MD5                               Size
6310a4be1ab2b3cc66b4f383a84f3b4a  2.6 GB
9ece0874cc8f6e355c70acbe4aabe0ac  199.2 kB
856bf507c3cc73ac6b1198006f9c95c2  6.1 GB
c1fa13d87b428725ae906c878e96786a  7.7 GB
2d2e617ec6133664b9e351fd5f1b8833  161.0 kB
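
The MD5 digests listed above can be used to check a download before decompressing or indexing it. The following is a minimal sketch, assuming GNU coreutils md5sum is available (on macOS, md5 -q FILE is the usual equivalent). The function name verify_md5 is hypothetical, and the file name in the usage line is a placeholder because the listing does not give file names.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
# Returns 0 on match, 1 on mismatch. Assumes GNU coreutils `md5sum`.
verify_md5() {
  file=$1
  expected=$2
  # md5sum prints "<digest>  <filename>"; keep only the digest
  actual=$(md5sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}

# Usage (DOWNLOADED_FILE is a placeholder for whichever file has this digest):
#   verify_md5 DOWNLOADED_FILE 6310a4be1ab2b3cc66b4f383a84f3b4a
```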

Additional details

Related works

Continues
Dataset: 10.5281/zenodo.11533210 (DOI)
Is published in
Journal article: 10.1093/bioinformatics/btae717 (DOI)