
There is a newer version of the record available.

Published July 28, 2022 | Version v1.2
Dataset Open

Project Adotto Tandem-Repeat Regions and Annotations

Creators

Description

Collection of inputs and outputs from a project to catalog tandem-repeat regions in human genomes. Details on the project can be found on GitHub.

Notes

Changes:
- Updated with two additional pathogenic repeats:
  - FGF14, from https://www.nejm.org/doi/full/10.1056/NEJMoa2207406
  - THAP11, from https://pubmed.ncbi.nlm.nih.gov/37148549/
- Intersected with 118 phenotypic VNTRs from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8549062/

None of the phenotypic VNTRs overlapped a pathogenic repeat, so the phenotypic VNTR's gene name was added to the pathogenic column of the catalog. As a result, the v1.2 'patho' column is effectively a 'patho/pheno' column.

# File Structure:

| Column | Definition |
|---------------|------------------------------------------------------------------------------------------------|
| chr | Chromosome of the region |
| start | Start position of the region |
| end | End position of the region |
| ovl_flag | Overlap categories of annotations inside the region |
| up_buff | Number of bases upstream of the first annotation's start that are non-TR sequence |
| dn_buff | Number of bases downstream of the last annotation's end that are non-TR sequence |
| hom_span | Number of bases of the region found to be homopolymer repeats |
| n_filtered | Number of annotations removed from the region |
| n_annos | Number of annotations remaining in the region |
| n_subregions | Number of subregions in the region |
| mu_purity | Average purity of annotations in the region |
| pct_annotated | Percent of the region's range (excluding buffers) that is annotated |
| interspersed | Name of the interspersed repeat class found within the region by RepeatMasker v4.1.4 |
| patho/pheno | Name of the gene affected by a pathogenic or phenotypic tandem repeat in the region |
| codis | Name of the CODIS site contained in the region |
| gene_flag | Gene features intersecting the region (Ensembl v105) |
| biotype | Comma-separated gene biotypes intersecting the region (Ensembl v105) |
| annos | JSON of TRF annotations in the region (list of dicts with keys: motif, entropy, ovl_flag, etc.) |
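A minimal Python sketch for reading the catalog, assuming it is a tab-separated file (optionally gzip/bgzip-compressed) with the 18 columns in the order of the File Structure table and no column-name header. The on-disk column order, the compression, the handling of `#` lines, and the helper names `open_catalog`, `parse_row`, and `read_catalog` are illustrative assumptions not confirmed by this record. `ovl_flag` and the text annotation columns are kept as raw strings because their encoding is not specified here; only the `annos` column is decoded as JSON, as the table describes.

```python
import gzip
import json
from typing import Any, Dict, Iterable, Iterator, List, TextIO

# Assumed on-disk column order, taken from the File Structure table
# ("patho" is the column the table lists as patho/pheno).
COLUMNS = [
    "chr", "start", "end", "ovl_flag", "up_buff", "dn_buff", "hom_span",
    "n_filtered", "n_annos", "n_subregions", "mu_purity", "pct_annotated",
    "interspersed", "patho", "codis", "gene_flag", "biotype", "annos",
]
INT_COLUMNS = {"start", "end", "up_buff", "dn_buff", "hom_span",
               "n_filtered", "n_annos", "n_subregions"}
FLOAT_COLUMNS = {"mu_purity", "pct_annotated"}


def open_catalog(path: str) -> TextIO:
    """Open the catalog as text, decompressing if the name ends in .gz (hypothetical helper)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")  # bgzip output is gzip-compatible
    return open(path, "rt")


def parse_row(fields: List[str]) -> Dict[str, Any]:
    """Turn one split line into a dict with numeric columns converted and annos decoded."""
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
    row: Dict[str, Any] = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        row[name] = int(row[name])
    for name in FLOAT_COLUMNS:
        row[name] = float(row[name])
    # annos: JSON list of dicts (keys such as motif, entropy, ovl_flag)
    row["annos"] = json.loads(row["annos"])
    return row


def read_catalog(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed region per data line, skipping blank and '#' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("#"):
            # Split on tabs only; the JSON column may contain spaces.
            yield parse_row(line.split("\t"))
```

Typical use would be `with open_catalog(path) as fh: for region in read_catalog(fh): ...`. Splitting on tabs by hand, rather than using the `csv` module, avoids any quote handling interfering with the JSON in the `annos` column.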


Files (102.8 MB)

md5:2178373611d9ae5facc46bf52d401c1b (102.8 MB)
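A downloaded copy can be checked against the md5 checksum listed above. The sketch below streams the file in chunks with Python's standard `hashlib` so the whole archive never has to fit in memory; the function names and the 1 MiB chunk size are illustrative choices, not part of this record.

```python
import hashlib
from typing import Iterable

# Checksum published on this record
EXPECTED_MD5 = "2178373611d9ae5facc46bf52d401c1b"


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Return the hex MD5 digest of a sequence of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file by reading it in fixed-size binary chunks."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def verify_download(path: str, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected (case-insensitive) checksum."""
    return md5_of_file(path) == expected.lower()
```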