Published April 26, 2021 | Version v0.4
Journal article Open

Data files for: Segmental duplications and their variation in a complete human genome.

  • 1. University of Washington School of Medicine Department of Genome Sciences

Description

This repository contains all assemblies used in the paper titled: Segmental duplications and their variation in a complete human genome. With the exception of T2T-CHM13 v1.0 and GRCh38, all assemblies in this repository were generated with Hifiasm v0.12 using default parameters. The human samples, with the exception of CHM1, were assembled using parental short-read data for phasing. All nonhuman primates and CHM1 were assembled without parental phasing information because none exists. The GRCh38 assembly contains only the chromosome-level sequences; all other contigs were removed. The T2T assembly (v1.0) has the GRCh38 chrY added to facilitate identifying interchromosomal duplications shared with chrY.
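As an illustration of what restricting GRCh38 to chromosome-level sequences can look like, here is a minimal Python sketch that drops every FASTA record whose name is not a primary chromosome. It is not the procedure used for this repository. The UCSC-style names (chr1 to chr22, chrX, chrY), the inclusion of chrM, and the function name keep_primary are all assumptions made for the example; the description does not state how the filtering was done or whether chrM was retained.

```python
from typing import Iterable, Iterator

# Assumption: UCSC-style primary chromosome names. Whether chrM belongs in
# this set is not stated in the repository description; adjust as needed.
PRIMARY = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY", "chrM"}


def keep_primary(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the FASTA lines that belong to records named in PRIMARY."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The record name is the first whitespace-delimited token
            # after the '>' character.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            keep = name in PRIMARY
        if keep:
            yield line
```

Because the function takes any iterable of lines and yields lines, it can stream a large FASTA file from an open file handle without loading it into memory.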

  • The zip file annotation_bed_files.zip contains the SD annotations and masking files used in the paper.
  • The zip file TBC1D3-data-and-msa-analysis.zip contains the multiple sequence alignments and steps used for phylogenetic analysis of TBC1D3. 
  • The zip file plot_data.zip contains information used for figure generation including: SD annotations, methylation data, Liftoff gene models, WSSD copy number estimates, FISH results, and RepeatMasker annotations.
  • The tarball "important_biomedical_and _evolutionary_loci.tar" contains sequences and annotations for the 10 loci highlighted in the paper.

The raw HiFi sequence data used for assembly are available from the NCBI SRA:

  • HiFi data for Chimpanzee, Macaque, and Orangutan can be found under NCBI BioProject PRJNA659034; data for Bonobo and Gorilla are under PRJNA691628.
  • HiFi data for all the human samples can be found under the following accessions and BioProjects:
    • CHM13: NCBI SRA SRR11292120-SRR11292123 (PRJNA530776)
    • HG00733: NCBI SRA ERX3831682
    • HG002: NCBI SRA SRR10382244, SRR10382245, SRR10382248 and SRR10382249
    • HG00514: NCBI SRA ERX4795966
    • NA19240: NCBI SRA ERX4787609, ERX4787607, ERX4787606, ERX4782632, and ERX4781730
    • CHM1: 
    • HiFi data for all remaining human samples can be found under NCBI BioProject PRJNA701308
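
The accession list above mixes single accessions with a range (SRR11292120-SRR11292123). The following Python sketch shows one way to turn the list into individual accession strings before passing them to a download tool. The accessions are copied from the list; the names HUMAN_HIFI, expand_accession_range, and accessions_for are hypothetical helpers introduced for this example. CHM1 is omitted because no accession is given for it.

```python
import re
from typing import Dict, List

# Accessions as listed in this record; a hyphen denotes an inclusive range.
HUMAN_HIFI: Dict[str, List[str]] = {
    "CHM13": ["SRR11292120-SRR11292123"],
    "HG00733": ["ERX3831682"],
    "HG002": ["SRR10382244", "SRR10382245", "SRR10382248", "SRR10382249"],
    "HG00514": ["ERX4795966"],
    "NA19240": ["ERX4787609", "ERX4787607", "ERX4787606",
                "ERX4782632", "ERX4781730"],
}

_ACCESSION = re.compile(r"([A-Z]+)(\d+)")


def expand_accession_range(spec: str) -> List[str]:
    """Expand 'SRR1-SRR3' style ranges; return single accessions unchanged."""
    if "-" not in spec:
        return [spec]
    first, last = spec.split("-", 1)
    m1, m2 = _ACCESSION.fullmatch(first), _ACCESSION.fullmatch(last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError(f"not a valid accession range: {spec!r}")
    start, end = int(m1.group(2)), int(m2.group(2))
    if end < start:
        raise ValueError(f"range end precedes start: {spec!r}")
    width = len(m1.group(2))  # preserve any zero padding in the number
    return [f"{m1.group(1)}{n:0{width}d}" for n in range(start, end + 1)]


def accessions_for(sample: str) -> List[str]:
    """Return the fully expanded accession list for one sample."""
    result: List[str] = []
    for spec in HUMAN_HIFI[sample]:
        result.extend(expand_accession_range(spec))
    return result
```

For example, accessions_for("CHM13") yields the four run accessions SRR11292120 through SRR11292123. Note that the ERX identifiers are experiment accessions, not run accessions, so they may need to be resolved to runs before download.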

 

Code:

Notes

Upload for second submission.

Files

Files (34.7 GB)

  • md5:1ac84db26e6eec30dfddf37696aa67dc (492.2 MB)
  • md5:c9619b446cd957dace04738421dc84af (22.3 GB)
  • md5:461536fab5333da3e230e6ddd818c70e (693.3 MB)
  • md5:fc8a4c4953bc81dc6542850cc8065ce6 (8.8 GB)
  • md5:682bc904404e1f37a0e8f1bf8a8253d7 (2.4 GB)
  • md5:5a766488df4bce89fe0b93308c9a97f4 (132.1 kB)
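
The MD5 checksums listed above can be used to confirm that a download completed without corruption. A minimal Python sketch using only the standard library follows. The function names md5_of_file and matches_listing are hypothetical, and the file is read in chunks because several of the archives are multiple gigabytes in size.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path: str, listed: str) -> bool:
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call would look like matches_listing("downloaded_file", "md5:1ac84db26e6eec30dfddf37696aa67dc"), where "downloaded_file" is a placeholder: the listing above does not pair checksums with file names, so the correct pairing has to be taken from the repository page itself.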