Published December 15, 2024 | Version 1.4
Dataset Open

Genome, repeat, and functional annotation associated with the naked mole-rat genome assembly, mHetGlaV3 (GCA_964261345.1)

  • 1. University of Toronto
  • 2. Hospital for Sick Children
  • 3. Ontario Institute for Cancer Research

Description

The naked mole-rat (NMR; Heterocephalus glaber) is a eusocial subterranean rodent with a highly unusual set of physiological traits, such as extreme longevity, that has attracted great interest amongst the scientific community. However, the genetic basis of most of these traits has not been elucidated. To facilitate our understanding of the molecular mechanisms underlying NMR physiology and behaviour, we generated a long-read chromosomal-level genome assembly of the NMR. This genome, mHetGlaV2, was subsequently annotated and incorporated into a “91 eutherian mammals” multiple whole genome alignment in Ensembl. 

We identified intra-chromosomal misassemblies within mHetGlaV2. To correct them and to place centromeres, we compared syntenic blocks between this assembly and the Canadian porcupine (EreDor) genome assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_028451465.1/), and consulted a FISH karyotype of the naked mole-rat completed by Romanenko et al., 2023 (PMID: 380307020). Chromosome numbering was derived from a composite karyogram of karyotypes from over 350 cells. This scaffold-corrected assembly is labelled mHetGlaV3 (https://www.ebi.ac.uk/ena/browser/view/GCA_964261345.1).

This repository stores the repeat, genome, and epigenome annotations for mHetGlaV3.

mHetGlaV3.primary.gtf.gz. Gene structures and gene symbols were transferred from the Ensembl annotation of mHetGlaV2 using Liftoff with default parameters. Additional gene symbols were identified using TOGA and manual curation.
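As a rough illustration of how the gene annotation might be read, the sketch below collects a gene ID to gene symbol mapping from GTF lines. It assumes the file follows the standard nine-column GTF layout with `key "value";` attributes, and that symbols are stored under a `gene_name` attribute; that attribute name is an assumption and may differ in this file.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_gtf_attributes(attr_field):
    """Return the key/value pairs of a GTF attribute column as a dict."""
    return dict(_ATTR_RE.findall(attr_field))


def gene_symbols(lines):
    """Map gene_id -> gene symbol for every 'gene' feature in GTF lines."""
    symbols = {}
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed rows whose feature type is 'gene'.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = parse_gtf_attributes(fields[8])
        if "gene_id" in attrs:
            # 'gene_name' is an assumed attribute; fall back to the gene ID.
            symbols[attrs["gene_id"]] = attrs.get("gene_name", attrs["gene_id"])
    return symbols


# Example usage with the downloaded file:
#   import gzip
#   with gzip.open("mHetGlaV3.primary.gtf.gz", "rt") as handle:
#       table = gene_symbols(handle)
```

Because the function takes any iterable of lines, it works equally on an open gzip handle or on a small list of test lines.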

mHetGlaV3.primary.gtf.gz. Simple repetitive regions and transposable elements were annotated with Earl Grey (https://github.com/TobyBaril/EarlGrey), using "Rodentia" annotations for RepeatMasker.

mHetGlaV3.primary.genesymbol_table.txt.txt.gz. A tab-delimited file where rows are gene IDs and columns are gene symbols generated with each method. "Consensus" shows the best matching gene symbol for each gene ID.
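A minimal sketch of loading the consensus symbols from this table follows. It assumes the file has a header row, that the first column holds the gene ID, and that the consensus column is literally named "Consensus"; the other column names used in the example are hypothetical.

```python
import csv


def consensus_symbols(lines, id_column=None, consensus_column="Consensus"):
    """Map gene ID -> consensus symbol from tab-delimited lines.

    `lines` is any iterable of text lines (e.g. an open gzip handle).
    Rows with an empty consensus symbol are skipped.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None:
        return {}
    # Assumption: the first column is the gene ID unless told otherwise.
    id_col = id_column or reader.fieldnames[0]
    result = {}
    for row in reader:
        symbol = (row.get(consensus_column) or "").strip()
        if symbol:
            result[row[id_col]] = symbol
    return result


# Example usage with the downloaded file:
#   import gzip
#   with gzip.open("mHetGlaV3.primary.genesymbol_table.txt.txt.gz", "rt") as fh:
#       table = consensus_symbols(fh)
```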

mHetGlaV3.primary_annotated_blacklist.bed.gz. Provides an assembly "blacklist" for mHetGlaV3. This blacklist is a BED file annotating assembly breakpoints between mHetGlaV2 and mHetGlaV3. It contains additional columns (e.g., closest gene, overlapping TE) and should therefore be reduced to the first three columns (chromosome, start, end) before being incorporated into traditional genomic pipelines.
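Most tools that consume a blacklist expect only the chromosome, start, and end fields. One way to strip the extra annotation columns is sketched below; it assumes the file is tab-delimited and that any header line begins with `#`, `track`, or `browser`, which has not been checked against the actual file.

```python
def to_bed3(lines):
    """Yield 'chrom<TAB>start<TAB>end' for each BED record in `lines`."""
    for line in lines:
        # Skip blank lines and the header styles commonly seen in BED files.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            raise ValueError("BED record has fewer than 3 columns: %r" % line)
        yield "\t".join(fields[:3])


# Example usage with the downloaded file:
#   import gzip
#   with gzip.open("mHetGlaV3.primary_annotated_blacklist.bed.gz", "rt") as src, \
#           open("blacklist.bed3", "w") as dst:
#       for record in to_bed3(src):
#           dst.write(record + "\n")
```

The output name `blacklist.bed3` is only a placeholder.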

mHetGlaV3.primary_hypothalamus_ABC_enhancer.bedpe.gz. Activity-By-Contact enhancer predictions (https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction) generated for the female subordinate naked mole-rat hypothalamus using Hi-C, H3K27ac ChIP-seq, ATAC-seq, and RNA-seq data.

mHetGlaV3.primary_hypothalamus_chromHMM.bed.gz. Chromatin states (called with ChromHMM) annotating the female subordinate naked mole-rat hypothalamus using H3K4me3 (promoter), H3K4me2 (promoter-enhancer), H3K27ac (active enhancer), H3K36me3 (elongation), H3K27me3 (Polycomb repressed), H3K9me3 (heterochromatin), and CTCF (whole brain) ChIP-seq data, as well as ATAC-seq and RNA-seq data.

mHetGlaV3.primary.fa.gz. Genome assembly FASTA file for the naked mole-rat (V3, primary assembly). This assembly matches the primary assembly stored on ENA; however, its chromosome names follow those used in the other files in this repository rather than the names assigned by ENA (e.g. chr 1 instead of "OZ179169.1 Heterocephalus glaber genome assembly, chromosome: 1").

 

UPDATES:

* The 1.2 update fixed unscaffolded contig names from those used in-lab to those compatible with ENA.

* The 1.3 update added small (50-100 kbp) contigs to mHetGlaV3.primary.fa.gz that were filtered out before the ENA submission.

* The 1.4 update fixed a small chromosome naming inconsistency spotted in the 1.3 update.

Files (829.8 MB)

MD5 checksum | Size
md5:aa871552a2b16503505ef105b5f43b18 | 754.2 MB
md5:1eb35ea646bf427e59da5951cd917bbf | 44.1 MB
md5:5bb7a097847469548fd51d8ff4d96e60 | 419.5 kB
md5:53b43fbe8b93af7a52f6b80266dd7ba8 | 23.4 MB
md5:7aefa9b4bf89345f504bdb77e9f40b7f | 10.3 kB
md5:afad0efd810b28e49854eee19c68db85 | 729.1 kB
md5:a1a44c870e924c9ff8c3fc50f1c36385 | 7.0 MB
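The MD5 checksums in the file listing can be used to confirm that a download is intact. The sketch below computes a file's MD5 in chunks, so that the large FASTA file does not need to fit in memory, and compares it with a listed value. The function names are illustrative, and which checksum belongs to which file must be taken from the repository page.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the hex MD5 digest of an iterable of bytes objects."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks by default."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def checksum_matches(actual_hex, expected):
    """Compare a digest with a listed value such as 'md5:aa87...'."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return actual_hex.lower() == expected


# Example usage (pair each file with its own listed checksum):
#   ok = checksum_matches(md5_of_file("mHetGlaV3.primary.fa.gz"), "md5:...")
```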

Additional details

Related works