Published November 2024 | Version v0.1
Dataset Open

Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements

Description

This repository is a data snapshot accompanying the manuscript "Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements", which is currently available as a preprint on bioRxiv.

Under an Access and Benefit Sharing agreement, these data are made available on an open access basis for research use only. Any person who wishes to use these data for any form of commercial purpose must first enter into a commercial licensing and benefit sharing arrangement with the Government of Malawi. For further information, contact the Access and Benefit-sharing National Focal Point (ABS NFP) for Malawi registered with CBD at https://www.cbd.int/information/nfp.shtml.

Data description

This repository provides the FASTA files of six new genome assemblies:

  • troMau: Tropheops sp. ‘mauve’, PacBio Sequel II
  • aulStu: Aulonocara stuartgranti, PacBio Sequel II
  • rhaChi: Rhamphochromis sp. ‘chillingali’ (male), PacBio Sequel II
  • otoArg: Otopharynx argyrosoma, R9 MinION
  • copChr: Copadichromis chrysonotus, R9 MinION
  • rhaChi2: Rhamphochromis sp. ‘chillingali’ (female), R9 MinION

Two previously published genomes from Ensembl v103 were also included in the pangenome graph: Astatotilapia calliptera (fAstCal1.2, GCF_900246225.1) and Maylandia zebra (M_zebra_UMD2a, GCA_000238955.4).

The repository also includes the following files:

  • malawi_haplochromines-graph.gfa: pangenome graph in GFA format constructed using the minigraph software package
  • malawi_haplochromines-variants.xlsx: detected structural variants, as defined on the fAstCal1.2 reference coordinates
  • malawi_haplochromines-genelists.xlsx: genes that overlap and do not overlap with structural variants
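As a quick sanity check on the graph file, the sketch below counts segments and links and totals the segment lengths. It assumes the file follows the rGFA conventions that minigraph normally writes (tab-separated S and L lines, with an SR:i tag giving the segment rank, where rank 0 is the reference); the function name summarize_gfa is hypothetical and not part of any released tool.

```python
from collections import Counter
from typing import Iterable


def summarize_gfa(lines: Iterable[str]) -> dict:
    """Count segments (S) and links (L) in a GFA stream and total segment lengths."""
    n_segments = 0
    n_links = 0
    total_bp = 0
    by_rank = Counter()  # segments per rGFA rank (SR tag), if present

    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 3:
            n_segments += 1
            # Optional tags have the form NAME:TYPE:VALUE
            tags = {}
            for field in fields[3:]:
                parts = field.split(":", 2)
                if len(parts) == 3:
                    tags[parts[0]] = parts[2]
            seq = fields[2]
            if seq != "*":
                total_bp += len(seq)
            elif "LN" in tags:
                # Sequence omitted; fall back to the declared length
                total_bp += int(tags["LN"])
            if "SR" in tags:
                by_rank[int(tags["SR"])] += 1
        elif fields[0] == "L":
            n_links += 1

    return {
        "segments": n_segments,
        "links": n_links,
        "total_bp": total_bp,
        "segments_by_rank": dict(by_rank),
    }


# Usage (assumes the graph has been downloaded to the working directory):
# with open("malawi_haplochromines-graph.gfa") as fh:
#     print(summarize_gfa(fh))
```

Reading line by line keeps memory use low, which matters for a graph file of this size.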

Access information for raw reads

Raw reads used to generate the new assemblies are accessible on NCBI.

Sample    BioProject     Genome            Biosample        Run ID(s)
troMau    PRJEB80840     GCA_964274065.1   SAMEA11293786    ERR12954135
aulStu    PRJEB80765     GCA_964273965.1   SAMEA115846654   ERR13382500
rhaChi    PRJEB80761     GCA_964273455.1   SAMEA115846655   ERR13382499
otoArg    PRJNA1144831   GCA_046255105.1   SAMN43044617     SRR30633342
copChr    PRJNA1144838   -                 SAMN43044710     SRR30633337, SRR30633338
rhaChi2   PRJNA1144843   -                 SAMN43044956     SRR30633436, SRR30633437, SRR30633438

Notes

Some of the assemblies are still being uploaded to NCBI, whose quality checks have flagged a few contigs:

  • ctg00001557 in otoArg (mitochondria)
  • ctg00005350 in copChr (BLAST hits to amphibian and fish E3 SUMO-protein ligase)
  • ctg00002210 in rhaChi2 (“worm” contaminant)

These contigs will very likely be removed from the final NCBI assemblies. However, none of them is included in the pangenome graph, so the findings of the paper are unaffected. A mapping between the Zenodo contigs and their NCBI counterparts will be provided at a later stage to facilitate coordinate conversion.
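Users who want the Zenodo FASTA files to resemble the expected final NCBI assemblies could drop the flagged contigs themselves. The following is a minimal sketch, assuming the FASTA record IDs (the header text up to the first whitespace) match the contig names listed above exactly; the names FLAGGED and drop_contigs are hypothetical, and the file names in the usage comment are placeholders.

```python
from typing import Iterable, Iterator, Set

# Contigs flagged by NCBI quality checks, keyed by assembly (taken from the notes above)
FLAGGED = {
    "otoArg": {"ctg00001557"},
    "copChr": {"ctg00005350"},
    "rhaChi2": {"ctg00002210"},
}


def drop_contigs(lines: Iterable[str], flagged: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, skipping every record whose ID is in `flagged`."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # Record ID = header text up to the first whitespace
            parts = line[1:].split()
            name = parts[0] if parts else ""
            keep = name not in flagged
        if keep:
            yield line


# Usage (placeholder file names; assumes an uncompressed FASTA file):
# with open("otoArg.fa") as src, open("otoArg.filtered.fa", "w") as dst:
#     dst.writelines(drop_contigs(src, FLAGGED["otoArg"]))
```

Filtering changes neither the names nor the coordinates of the remaining contigs, so results on the retained contigs stay comparable with the unfiltered files.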

Files

Files (3.0 GB)

MD5 checksum                           Size
md5:774fa3fc2105439d51be1dd4a17ae6b7   277.9 MB
md5:779b09d8edce29d8fcb9337b54deb752   269.0 MB
md5:24a6bf894959dfbc9eb5654ab4cdf585   1.4 MB
md5:9bca92b8343c614e9d44a528af4b72a5   1.3 GB
md5:149c53fae8320bac8d918c32dec8d15a   79.1 MB
md5:86566c5b47a63a5fcd035f61f232e634   271.8 MB
md5:45734726474ef932f36859c8e9d8a3d0   279.8 MB
md5:71ed12c3d55413088229fe5a305c9f16   264.6 MB
md5:687ccb932166fc19a6c092ae6cdab142   282.1 MB
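The MD5 checksums listed above can be used to confirm that a download completed intact. The sketch below uses only the Python standard library; the helper names md5_of_file and matches_checksum are hypothetical, and each downloaded file must be paired with its checksum as shown on the record page.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected


# Usage (the path is a placeholder for whichever file carries this checksum):
# ok = matches_checksum("downloaded_file", "md5:9bca92b8343c614e9d44a528af4b72a5")
```

A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.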

Additional details

Additional titles

Alternative title
A pangenomic perspective of the Lake Malawi cichlid radiation reveals extensive structural variation driven by transposable elements