Published 2025 | Version v2
Dataset Open

Acanthaster whole genome datasets and scripts

Authors/Creators

Description

These whole-genome datasets were analyzed to assess the connectivity patterns of Crown-of-Thorns Seastars (COTS, Acanthaster cf. solaris) across the Pacific Ocean (Leiva et al., 2025, BMC Biology). Each dataset and script is described below:

  • Aca_206_ind_allSNPs_chr1_renamed.vcf.gz: contains genotype calls for all SNPs on the longest scaffold, for all 206 Acanthaster cf. solaris samples. It was produced by ANGSD and filtered with bcftools, and was used with verifyBamID to detect contaminated samples in the dataset.
  • Aca_198_ind_thin10kSNPs.beagle.gz: contains genotype likelihoods from 198 non-contaminated Acanthaster cf. solaris samples. It was produced by ANGSD, thinned with vcftools, and transformed to genotype likelihoods again by ANGSD. It was used to assess population connectivity, structure and diversity.
  • Aca_198_ind_think10kSNPs.recode.vcf: contains genotype calls from 198 non-contaminated Acanthaster cf. solaris samples. It was produced by ANGSD and thinned with vcftools. It was used for population structure analyses.
  • T_mod_ANGSD_Haplo_09filt.fasta: fasta file with haplotype calls from 198 non-contaminated Acanthaster cf. solaris samples, plus 2 COTS samples from the Gulf of California, 2 Acanthaster planci samples from the Indian Ocean, and 2 Acanthaster benziei samples from the Red Sea. It was produced by ANGSD using the -doHaploCall 2 flag, and then converted to fasta using a custom R script (see "reformating_ANGSD_fasta.R").
  • Aca_Hawaii_200scaffolds.vcf.gz: vcf file with genotype calls and genotype likelihoods from the first (longest) 200 scaffolds of the COTS samples from Hawai'i. File used as input for RAiSD.
  • Aca_French_Polynesia_200scaffolds.vcf.gz: vcf file with genotype calls and genotype likelihoods from the first (longest) 200 scaffolds of the COTS samples from French Polynesia. File used as input for RAiSD.
  • Aca_West_Pacific_200scaffolds.vcf.gz: vcf file with genotype calls and genotype likelihoods from the first (longest) 200 scaffolds of the COTS samples from the West Pacific. File used as input for RAiSD.
  • reformating_ANGSD_fasta.R: R script used to transform haplotype calls from ANGSD (-doHaploCall 2) into a fasta file to perform phylogenetic analyses with iqtree2.
  • plot_relatedness2_from_vcftools.R: R script used to plot a dendrogram and a heatmap from the relatedness estimates produced by vcftools --relatedness2.
  • pca_and_plot.R: R script used to perform and plot a PCA from the covariance matrix obtained with PCAngsd.
  • pairwise_FSTs.R: R script used to calculate and plot pairwise Fst distances among populations.
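
The haplotype-to-fasta step described for T_mod_ANGSD_Haplo_09filt.fasta can be illustrated with a minimal Python sketch. This is not the authors' reformating_ANGSD_fasta.R script; it only shows the general idea of turning one-base-per-site haplotype calls into one sequence per sample. It assumes a tab-separated table with a header row followed by rows of chr, pos, major, and then one base per individual (with N for missing data), which is how ANGSD haplotype-call output is usually laid out. The function name `haplo_to_fasta` and the sample names are hypothetical.

```python
import io


def haplo_to_fasta(lines, sample_names=None):
    """Convert haplotype-call rows into FASTA text.

    Assumed layout (not confirmed by the dataset description):
    a header row, then rows of chr, pos, major and one base per
    individual, all tab-separated, with 'N' marking missing calls.
    """
    rows = iter(lines)
    header = next(rows).rstrip().split("\t")
    n_ind = len(header) - 3  # the first three columns are chr, pos, major
    names = list(sample_names) if sample_names else header[3:]
    if len(names) != n_ind:
        raise ValueError("expected %d sample names, got %d" % (n_ind, len(names)))

    # One growing list of bases per individual.
    seqs = [[] for _ in range(n_ind)]
    for row in rows:
        fields = row.rstrip().split("\t")
        if len(fields) < 3 + n_ind:
            continue  # skip blank or truncated rows
        for i, base in enumerate(fields[3:3 + n_ind]):
            seqs[i].append(base)

    # Each site contributes one character to every sample's sequence.
    return "".join(
        ">%s\n%s\n" % (name, "".join(seq)) for name, seq in zip(names, seqs)
    )


# Tiny in-memory example with two individuals and two sites.
demo = (
    "chr\tpos\tmajor\tind0\tind1\n"
    "scaf1\t10\tA\tA\tG\n"
    "scaf1\t25\tC\tN\tC\n"
)
print(haplo_to_fasta(io.StringIO(demo), ["sampleA", "sampleB"]))
```

For a real compressed file, the same function could be fed a handle opened with `gzip.open(path, "rt")`. Any site filtering applied by the authors before building the alignment is not reproduced here.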

Files (39.8 GB)

Checksum Size
md5:4729fc86b209a7319bd05c154a968066 19.1 MB
md5:c75c3e262a12567f968c50bd4252e007 479.1 MB
md5:d8a806f08ba17da34183ddd162ce0be8 347.8 MB
md5:0d1dd88d6ac20549fc488098dc68420d 5.9 GB
md5:857e0e7bd5ee238e5f5feb3524c23e68 230.3 MB
md5:ed89a21d81a659813560d2b6d8857679 31.8 GB
md5:32a7582d271fbdd03fa6fa15227fc154 1.5 kB
md5:9ff03e1bbae3fc6d1b8de8b626fbc6da 1.7 kB
md5:8279e6c0e9ce14a8840b44cd60ec0985 470 Bytes
md5:228c3a5847d6f08eb16c97d2c009bec5 2.2 kB
md5:749d800fff57dc7fe23b6105d7547a7d 939.2 MB
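
The md5 checksums listed above can be used to confirm that a download completed without corruption, which matters for files of several gigabytes. The following is a minimal sketch using only the Python standard library. The helper names `md5_of_file` and `matches_listing` are hypothetical, and because this listing does not pair checksums with file names, the pairing must be taken from the repository page itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in 1 MiB chunks
    so that multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a local file against a checksum written as in the
    listing, e.g. 'md5:4729fc86b209a7319bd05c154a968066'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_listing("downloaded_file.vcf.gz", "md5:...")`, with the local path and the checksum copied from the matching row, returns `True` only when the local copy is byte-identical to the published file.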