Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements
Description
This repository is a data snapshot associated with the manuscript “Lake Malawi cichlid pangenome graph reveals extensive structural variation driven by transposable elements”, which is currently available as a preprint on bioRxiv.
Under an Access and Benefit Sharing agreement, these data are made available on an open access basis for research use only. Any person who wishes to use these data for any form of commercial purpose must first enter into a commercial licensing and benefit sharing arrangement with the Government of Malawi. For further information, contact the Access and Benefit-sharing National Focal Point (ABS NFP) for Malawi registered with CBD at https://www.cbd.int/information/nfp.shtml.
Data description
This repository provides the FASTA files of six new genome assemblies:
- troMau: Tropheops sp. ‘mauve’, PacBio Sequel II
- aulStu: Aulonocara stuartgranti, PacBio Sequel II
- rhaChi: Rhamphochromis sp. ‘chillingali’ (male), PacBio Sequel II
- otoArg: Otopharynx argyrosoma, R9 MinION
- copChr: Copadichromis chrysonotus, R9 MinION
- rhaChi2: Rhamphochromis sp. ‘chillingali’ (female), R9 MinION
Two previously published genomes from Ensembl v103 were also included in the pangenome graph: Astatotilapia calliptera (fAstCal1.2, GCF_900246225.1) and Maylandia zebra (M_zebra_UMD2a, GCA_000238955.4).
The following files are also included:
- malawi_haplochromines-graph.gfa: pangenome graph in GFA format constructed using the minigraph software package
- malawi_haplochromines-variants.xlsx: detected structural variants, as defined on the fAstCal1.2 reference coordinates
- malawi_haplochromines-genelists.xlsx: genes that overlap and do not overlap with structural variants
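As a quick sanity check after downloading the graph, the GFA file can be summarised without loading it fully into memory. The following is a minimal sketch, not part of the authors' pipeline. It assumes the file is uncompressed, tab-delimited GFA text, and that segment lines carry the `SR:i:` rank tag of the rGFA dialect written by minigraph (rank 0 marks reference segments); if the tag is absent, the rank counts simply stay empty. The function name `summarize_gfa` is hypothetical.

```python
from collections import Counter


def summarize_gfa(lines):
    """Summarise GFA text lines.

    Returns (record_counts, total_segment_bp, rank_counts), where
    rank_counts tallies the optional rGFA 'SR:i:' tag on segment lines.
    """
    record_counts = Counter()
    rank_counts = Counter()
    total_segment_bp = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        record_counts[fields[0]] += 1
        if fields[0] == "S" and len(fields) >= 3:
            sequence = fields[2]
            if sequence != "*":  # '*' means the sequence is not stored
                total_segment_bp += len(sequence)
            for tag in fields[3:]:
                if tag.startswith("SR:i:"):
                    rank_counts[int(tag[5:])] += 1
    return record_counts, total_segment_bp, rank_counts
```

To apply it to the downloaded graph, pass an open file handle, for example `summarize_gfa(open("malawi_haplochromines-graph.gfa"))`, adjusting the path to wherever the file was saved.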
Access information for raw reads
Raw reads used to generate the new assemblies are accessible on NCBI.
| Sample | BioProject | Genome | BioSample | Run ID(s) |
|---|---|---|---|---|
| troMau | PRJEB80840 | GCA_964274065.1 | SAMEA11293786 | ERR12954135 |
| aulStu | PRJEB80765 | GCA_964273965.1 | SAMEA115846654 | ERR13382500 |
| rhaChi | PRJEB80761 | GCA_964273455.1 | SAMEA115846655 | ERR13382499 |
| otoArg | PRJNA1144831 | GCA_046255105.1 | SAMN43044617 | SRR30633342 |
| copChr | PRJNA1144838 | - | SAMN43044710 | SRR30633337, SRR30633338 |
| rhaChi2 | PRJNA1144843 | - | SAMN43044956 | SRR30633436, SRR30633437, SRR30633438 |
Notes
Some of the assemblies are still being uploaded to NCBI, whose quality checks have flagged a few contigs:
- ctg00001557 in otoArg (mitochondria)
- ctg00005350 in copChr (BLAST hits to amphibian and fish E3 SUMO-protein ligase)
- ctg00002210 in rhaChi2 (“worm” contaminant)
It is very likely that these contigs will be removed from the final NCBI assemblies. However, none of these contigs are included in the pangenome graph, and therefore, the findings from the paper remain unaltered. A mapping between the Zenodo contigs and their NCBI counterparts will be provided at a later stage to facilitate coordinate conversions.
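Until that mapping is available, users who want their local copies to resemble the expected final NCBI assemblies could drop the flagged contigs themselves. The sketch below is one way to do this, under the assumption that each FASTA header begins with the contig name exactly as written in the list above (for example `>ctg00001557`); this has not been checked against the files. The names `FLAGGED_CONTIGS` and `filter_fasta` are hypothetical.

```python
# Contigs flagged by NCBI quality checks, as listed in the notes above.
FLAGGED_CONTIGS = {
    "otoArg": {"ctg00001557"},
    "copChr": {"ctg00005350"},
    "rhaChi2": {"ctg00002210"},
}


def filter_fasta(lines, excluded_ids):
    """Yield FASTA lines, skipping every record whose ID is excluded.

    The record ID is taken as the first whitespace-delimited token
    after '>' (an assumption about the header layout).
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            keep = record_id not in excluded_ids
        if keep:
            yield line
```

Because the function streams line by line, it can be applied to large assemblies by iterating over an open input file and writing the yielded lines to a new output file.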
Files
(3.0 GB)
| Name | MD5 checksum | Size |
|---|---|---|
| | 774fa3fc2105439d51be1dd4a17ae6b7 | 277.9 MB |
| | 779b09d8edce29d8fcb9337b54deb752 | 269.0 MB |
| | 24a6bf894959dfbc9eb5654ab4cdf585 | 1.4 MB |
| | 9bca92b8343c614e9d44a528af4b72a5 | 1.3 GB |
| | 149c53fae8320bac8d918c32dec8d15a | 79.1 MB |
| | 86566c5b47a63a5fcd035f61f232e634 | 271.8 MB |
| | 45734726474ef932f36859c8e9d8a3d0 | 279.8 MB |
| | 71ed12c3d55413088229fe5a305c9f16 | 264.6 MB |
| | 687ccb932166fc19a6c092ae6cdab142 | 282.1 MB |
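The md5 checksums listed above can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever name the file was saved under.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file against a checksum, with or without an 'md5:' prefix."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[4:]
    return md5_of_file(path) == expected
```

Reading in fixed-size chunks keeps memory use low, which matters for the larger files in this record.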
Additional details
Additional titles
- Alternative title: A pangenomic perspective of the Lake Malawi cichlid radiation reveals extensive structural variation driven by transposable elements