Assemblies generated for "SEGMENTAL DUPLICATIONS AND THEIR VARIATION IN A COMPLETE HUMAN GENOME"
Creators
- University of Washington School of Medicine, Department of Genome Sciences
Description
This repository contains all assemblies used in the paper titled "SEGMENTAL DUPLICATIONS AND THEIR VARIATION IN A COMPLETE HUMAN GENOME". With the exception of T2T CHM13 v1.0 and GRCh38, all assemblies in this repository were assembled with Hifiasm v0.12 using default parameters. The human samples, with the exception of CHM1, were assembled using parental short-read data for phasing. All nonhuman primates and CHM1 were assembled without parental phasing information since none exists. The GRCh38 assembly contains only the chromosome-level sequences, with all other contigs removed, and the T2T assembly (v1.0) has the GRCh38 chrY added to facilitate identifying interchromosomal duplications shared with chrY.
The tarball "data.tar" contains SD annotations, methylation data, Liftoff gene models, RepeatMasker annotations, and the data used to make the figures.
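Because the archive is large, it can be useful to inspect its contents before extracting anything. The following is a minimal sketch using Python's standard `tarfile` module; the function name `list_tar_members` and the `limit` parameter are illustrative and not part of this repository, and the internal layout of "data.tar" is not described here, so no particular member names are assumed.

```python
import tarfile


def list_tar_members(tar_path, limit=20):
    """Return up to `limit` member names from a tar archive without extracting it.

    `list_tar_members` is a hypothetical helper, not something shipped
    with this repository.
    """
    names = []
    # Mode "r" lets tarfile detect compression (if any) automatically.
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names
```

Calling `list_tar_members("data.tar")` after downloading the archive returns the first few member paths, which helps in locating the annotation files of interest before unpacking the full tarball.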
The raw HiFi sequence data used for assembly is on the NCBI SRA:
- HiFi data for chimpanzee, macaque, and orangutan can be found under NCBI BioProject PRJNA659034; data for bonobo and gorilla can be found under PRJNA691628.
- HiFi data for all the human samples can be found under the following accessions and BioProjects:
  - CHM13: NCBI SRA SRR11292120-SRR11292123 (PRJNA530776)
  - HG00733: NCBI SRA ERX3831682
  - HG002: NCBI SRA SRR10382244, SRR10382245, SRR10382248, and SRR10382249
  - HG00514: NCBI SRA ERX4795966
  - NA19240: NCBI SRA ERX4787609, ERX4787607, ERX4787606, ERX4782632, and ERX4781730
  - CHM1:
  - HiFi data for all remaining human samples can be found under NCBI BioProject PRJNA701308
Files
(33.8 GB total)
Checksum | Size |
---|---|
md5:1771aa7693da8f12c3b89c3805cae5c4 | 2.7 GB |
md5:c9619b446cd957dace04738421dc84af | 22.3 GB |
md5:fc8a4c4953bc81dc6542850cc8065ce6 | 8.8 GB |
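Given the file sizes, it is worth confirming that each download completed intact by comparing it against the MD5 checksums listed above. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listed_md5` are illustrative, and since the listing does not show which file name belongs to which checksum, that pairing has to be taken from the download page.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed checksum such as 'md5:1771aa...'.

    Hypothetical helper: accepts the checksum with or without the
    'md5:' prefix used in the file listing.
    """
    listed = listed.strip().lower()
    if listed.startswith("md5:"):
        listed = listed[len("md5:"):]
    return md5_of_file(path) == listed
```

A result of `False` from `matches_listed_md5` means the local file differs from the published one, most often because of a truncated download, and the file should be fetched again.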