Published June 14, 2021 | Version v5
Dataset Open

Data release: Whole-genome sequencing of Schistosoma mansoni reveals extensive diversity with limited selection despite mass drug administration

  • 1. Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, United Kingdom
  • 2. Big Data Institute, Nuffield Department of Medicine, University of Oxford, Oxford, OX3 7BN, United Kingdom
  • 3. Institute for Biodiversity, Animal Health, and Comparative Medicine, and Wellcome Centre for Integrative Parasitology, University of Glasgow, Glasgow, G12 8QQ, UK
  • 4. The Natural History Museum, Department of Life Sciences, Cromwell Road, London SW7 5BD, United Kingdom
  • 5. Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, CB10 1SA
  • 6. Institute of Parasitology, Faculty of Agricultural and Environmental Sciences, McGill University, Montreal, Quebec, Canada
  • 7. Vector Borne & Neglected Tropical Disease Control Division, Ministry of Health, Kampala, Uganda
  • 8. Department of Pathology and Pathogen Biology, Centre for Emerging, Endemic and Exotic Diseases, Royal Veterinary College, University of London, Herts, AL9 7TA, United Kingdom

Description

Source data used in the publication: Berger et al. (2021) - Provisional title: 'Whole-genome sequencing of Schistosoma mansoni reveals extensive diversity with limited selection despite mass drug administration'. These data were used to generate all figures in the publication. All files are organised and labelled specifically to run with the custom code that uses these data, which can be found at: http://doi.org/10.5281/zenodo.4975908.


File descriptions:

SOURCE DATA.zip - All source data for all figures. 

Figure 1b:

  • supplementary_data_9.txt - Metadata

Figure 2a&b:

  • 207_PCA.eigenvec - PCA eigenvectors
  • 207_PCA.eigenval - PCA eigenvalues
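
The eigenvalues can be converted into per-component proportions, for example to label PCA axes. This is a minimal sketch, assuming the .eigenval file holds one eigenvalue per line (the usual PLINK layout, not confirmed here). The function name and the example values are hypothetical, and the proportions are relative to the components listed in the file rather than to the total variance.

```python
from typing import Iterable, List


def variance_proportions(lines: Iterable[str]) -> List[float]:
    """Return each eigenvalue's share of the sum of the listed eigenvalues."""
    # Assumption: one numeric eigenvalue per non-empty line.
    values = [float(line.strip()) for line in lines if line.strip()]
    total = sum(values)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    return [value / total for value in values]


# Hypothetical values for illustration only, not taken from 207_PCA.eigenval.
example_lines = ["4.0\n", "3.0\n", "2.0\n", "1.0\n"]
proportions = variance_proportions(example_lines)
```

For the real file, pass an open file handle, e.g. `variance_proportions(open("207_PCA.eigenval"))`.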

Figure 2c:

  • autosomes.mdist - PLINK distance matrix used to build the neighbour joining phylogeny

Figure 2d:

  • all.pi.pixy.schools.txt - Nucleotide diversity results for each school subpopulation.

Figure 2e:

  • autosomes.dxy.5kb.schools.txt - Autosomal DXY results between school subpopulations. 
  • autosomes.fst.5kb.schools.txt - Autosomal FST results between school subpopulations.

Figure 2f:

  • admixture_all.txt - ADMIXTURE results for each sample and population sizes, column 1 represents number of populations (K), columns 3-8 represent admixture values for each population. 
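
The column layout described for admixture_all.txt could be read along these lines. This is a sketch under stated assumptions: fields are whitespace-separated, column 2 is a sample label (the description above does not say what it holds), and unused population columns are padded with a non-numeric token such as NA. The function name and the example row are hypothetical; a header row, if present, would need to be skipped first.

```python
from typing import List, Tuple


def parse_admixture_row(line: str) -> Tuple[int, str, List[float]]:
    """Split one row into (K, column-2 label, admixture values from columns 3-8)."""
    fields = line.split()
    k = int(fields[0])      # column 1: number of populations (K)
    label = fields[1]       # column 2: assumed to be a sample label
    values = []
    for field in fields[2:8]:  # columns 3-8: admixture values
        try:
            values.append(float(field))
        except ValueError:
            # Assumption: non-numeric padding (e.g. "NA") marks unused populations.
            continue
    return k, label, values


# Hypothetical row for illustration only, not taken from the dataset.
k, label, values = parse_admixture_row("3 sample_1 0.7 0.2 0.1 NA NA NA")
```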

Figure 3a, Supplementary figure 10a:

  • sfs.csv - Site frequency spectra (allelic proportions at each frequency bin) for each school. 

Figure 3b:

  • TD.all.txt - Tajima's D values calculated in 5 kb windows for each school subpopulation. 

Figure 4a, Supplementary figures 13-18: 

  • ALL.MAYUGE.IHS.ihs.out.100bins.norm.txt.zip  - Normalised iHS scores for the Mayuge district parasite populations (Selscan output).

Figure 4b, Supplementary figures 13-18: 

  • ALL.TORORO.IHS.ihs.out.100bins.norm.txt.zip - Normalised iHS scores for the Tororo district parasite populations (Selscan output).

Figure 4c, Supplementary figures 13-18: 

  • ALL.MAYUGEvsTORORO.xpehh.xpehh.out.norm.txt.zip - Normalised XP-EHH scores between Mayuge and Tororo parasite populations.

Figure 4d, Supplementary figures 13-18:

  • MAYUGE_TORORO_2000.windowed.weir.txt.zip - FST values calculated between Mayuge and Tororo populations in 2kb windows.  

Figure 4e, Supplementary figures 12a&c:

  • MAYUGE_PI.windowed.pi.zip - Nucleotide diversity values calculated in 2 kb windows for Mayuge populations. 
  • TORORO_PI.windowed.pi.zip - Nucleotide diversity values calculated in 2 kb windows for Kocoge populations (Tororo district).

Figure 5a:

  • all.pi.treat.fix.txt.zip - Nucleotide diversity results for each treatment subpopulation

Figure 5b:

  • autosomes.dxy.5kb.treatment.txt - Autosomal DXY results between clearance phenotype subpopulations. 
  • autosomes.fst.5kb.treatment.txt - Autosomal FST results between clearance phenotype subpopulations. 

Figure 5c:

  • fst.windows.2kb.treatment.txt.zip - FST values for comparisons between different treatment groups (Pre-treatment, post-treatment (good clearers), post-treatment (poor clearers))

Figure 5d: 

  • assoc_err_binary.txt.zip - Results of binary trait association between miracidia sampled from hosts with good clearance phenotypes (where treatment appeared to be highly effective) and miracidia isolated post-treatment from hosts with poor clearance phenotypes (where miracidia are potentially derived from parasites that survived treatment).

Figure 5e:

  • assoc_err_linear.txt.zip - Results of linear regression genome-wide association study with the ERR estimates for all 198 samples, using the mean of the posterior ERR estimates from Crellen et al. (2016) as a quantitative trait.

Supplementary figure 1:

  • median.coverage.txt - Normalised depth of read coverage (column 4) calculated in 25 kb windows (columns 2&3) across all samples for all chromosomes (column 1).
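
The four-column layout of median.coverage.txt lends itself to a per-chromosome summary. This is a minimal sketch, assuming whitespace-separated fields and that any header row has a non-numeric fourth field. The function name, chromosome names and depth values below are hypothetical.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable


def median_depth_by_chromosome(lines: Iterable[str]) -> Dict[str, float]:
    """Median of the normalised depth (column 4) per chromosome (column 1)."""
    depths = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or short lines
        try:
            depth = float(fields[3])
        except ValueError:
            continue  # assumption: a non-numeric value means a header row
        depths[fields[0]].append(depth)
    return {chrom: median(values) for chrom, values in depths.items()}


# Hypothetical 25 kb windows for illustration only.
example_lines = [
    "chr_A 0 25000 0.9",
    "chr_A 25000 50000 1.0",
    "chr_A 50000 75000 1.1",
    "chr_B 0 25000 0.4",
    "chr_B 25000 50000 0.6",
]
summary = median_depth_by_chromosome(example_lines)
```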

Supplementary figure 2a-f: 

  • cohort.genotyped.txt.zip - Variant quality site values (used to inform variant site retention or removal). 

Supplementary figure 2g:

  • hard_filtered.imiss.txt -  Per sample variant missingness (used to inform quality control).

Supplementary figure 2h:

  • hard_filtered_filtindv.lmiss.txt.zip - Per site missingness (used to inform quality control).

Supplementary figure 3a, 4a, 4b:

  • prunedData.eigenvec - PCA eigenvectors
  • prunedData.eigenval - PCA eigenvalues

Supplementary figure 3b:

  • pruned_data.mdist.csv - Distance matrix used as the basis for the neighbour joining phylogeny.

Supplementary figure 5:

  • cv_scores.txt - ADMIXTURE cross-validation error scores (column 2) for each number of populations, K (column 1).

Supplementary figure 6:

  • *_SMC_SE.csv - SMC++ results (from 25 subsampled replicates) for each school subpopulation and outgroup samples. 

Supplementary Figure 7:

  • smcpp.csv - SMC++ results for each school subpopulation and outgroup samples. 

Supplementary Figure 8a-d:

  • pi.per_host.txt.zip - Nucleotide diversity values for each host infrapopulation. 

Supplementary Figure 9:

  • sexing.csv - Inferred sex (based on differential read coverage over pseudoautosomal and Z-specific regions of the Z chromosome). 

Supplementary Figure 10b:

  • sfs_res.csv - Residuals for the SFS analysis in Figure 3a and Supplementary figure 10a.

Supplementary Figure 11:

  • MAYUGE_TAJIMA_D.Tajima.D.2kb.txt.zip - Tajima's D values calculated for the Mayuge population in 2kb windows. 
  • Tororo_TAJIMA_D.Tajima.D.2kb.txt.zip - Tajima's D values calculated for the Tororo population in 2kb windows. 

Supplementary Figures 13-18:

  • genes.bed - Coordinates of gene models (S. mansoni v7 annotation).
  • KOCOGE_SITE_PI.sites.pi.txt.zip - Per site nucleotide diversity values
  • MAYUGE_TORORO_sites.weir.fst.txt.zip - Per site FST values between Mayuge and Tororo populations. 
  • coverage_5kb.windows.txt.zip - Per sample depth of read coverage in 5 kb windows. Columns 4, 5 and 6 represent the median, mean and standard deviation of coverage for each 5 kb window (columns 2&3) along each chromosome (column 1). 
  • median.sample.coverage.txt -  Median chromosomal depth of read coverage for each sample. 

Supplementary Figure 19:

  • kocoge_median.ld.txt.zip - The decay of linkage disequilibrium with genomic distance between all sites within 50 kb for the Kocoge parasite samples. Chromosomes are shown in column 1, distance in column 2, median values in column 3. 
  • mayuge_median.ld.txt.zip - The decay of linkage disequilibrium with genomic distance between all sites within 50 kb for the Mayuge parasite samples. Chromosomes are shown in column 1, distance in column 2, median values in column 3. 
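
Many of the files are distributed as .zip archives, and the three-column linkage disequilibrium files can be read without unpacking them first. This is a sketch only, assuming each archive contains a single whitespace-separated text file and that any header row is non-numeric. The function name and the in-memory example archive are hypothetical.

```python
import io
import zipfile
from typing import List, Tuple


def read_ld_rows(zip_source) -> List[Tuple[str, float, float]]:
    """Read (chromosome, distance, median LD) rows from a zipped text file."""
    rows = []
    with zipfile.ZipFile(zip_source) as archive:
        # Assumption: the first member of the archive is the data file.
        member = archive.namelist()[0]
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                fields = line.split()
                if len(fields) < 3:
                    continue  # skip blank or short lines
                try:
                    rows.append((fields[0], float(fields[1]), float(fields[2])))
                except ValueError:
                    continue  # assumption: non-numeric fields mean a header row
    return rows


# Build a small in-memory archive with hypothetical values for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "example.ld.txt",
        "chrom distance median\nchr_A 100 0.8\nchr_A 200 0.5\n",
    )
buffer.seek(0)
demo_rows = read_ld_rows(buffer)
```

With a downloaded file, a path can be passed instead of the buffer, e.g. `read_ld_rows("mayuge_median.ld.txt.zip")`.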

Misc files:

schools.list - List of samples and schools where they were sampled. 

 

Files (1.4 GB)

md5 checksum - Size
md5:f5b01964f78588e1e32b32f6df0f1337 - 78 Bytes
md5:8c26ea1d5f574f65ed3b875c6663e410 - 26.2 kB
md5:5eb01665ef2890a744fe6f31454a7887 - 43.6 kB
md5:da98b770a8dd0d774610c391fbb01967 - 39.3 MB
md5:10b4b827d2dbec2ab61afe4017f7b1c6 - 110.4 MB
md5:5f98571e3a841be327439a5f3a58e3e4 - 19.0 MB
md5:a2cc5fd36e002929bf795ce6c0435cd1 - 4.8 MB
md5:2ac9cadab55a4d9e844d8399285c2264 - 35.8 MB
md5:28a1f486f96209fb6f997a0c9f49c23a - 2.1 MB
md5:9ca7fec82e2fc55c762161883fa3d4e1 - 2.2 MB
md5:3cb1bc6e0aa63bba70d7c4c0b2e43ac5 - 31.4 MB
md5:959b089fbe3d0e27e4bd315fb9c77385 - 18.4 MB
md5:2f58d60cd64abc143373ac6b355ffcfc - 23.5 MB
md5:42e6e495e2b7d88001680638a1d4e88d - 13.5 MB
md5:0724af1771cb04348a374147dc979a2e - 395.6 kB
md5:0dc45ceec9d40df7a3774ac65c8f3b73 - 137.5 kB
md5:cac83ff97b58b6ffe5854150b50e861d - 140.2 kB
md5:68003911f96e932afe5773629e977a3e - 140.3 kB
md5:add39a671c73ba1a877a59601faeec6b - 143.3 kB
md5:2b135364e6e3e7cd6ed897b08bfad016 - 333.0 MB
md5:329b38336bdf50ef7b016310a0fc5d8e - 15.3 MB
md5:39db301410a5d1ab001a556899dca6bb - 2.1 kB
md5:18796ab4f76fdc1f9031018e209cdb79 - 7.8 MB
md5:3f337f6509e6ec992ba6bfe75d2dbaae - 392.9 kB
md5:bdd3cb11faafe8c56db3a11be53d469d - 10.3 kB
md5:1bfae48540872343f9d2bd533a77e70c - 85.7 MB
md5:b472aa107e349848e8b5aa3e8f400f50 - 137.3 kB
md5:5949a4be701a86f61ee15786f17c0892 - 2.1 MB
md5:e14d233ed6f671024fae74081065311a - 20.8 MB
md5:09bbdc5871ff9b0e010be8b3fb8a7873 - 138.0 kB
md5:4b208af7be36bbc49274fa936231f997 - 10.1 MB
md5:9dd6cb2d4280bf04650173a64a53ce2e - 1.9 MB
md5:06cc740a15c0ee7b59add78731b59361 - 29.7 MB
md5:44df33afff0466de6a6afda0b58e7623 - 1.4 MB
md5:02899186e671af4733a62c62e4f9086b - 2.4 MB
md5:50bafb6694823e18b05bfb34d92888af - 34.4 MB
md5:2d836b2bb5205f137624625aee0e623e - 684.6 kB
md5:97d03895b8aabfd08e60cd59fbd3d232 - 6.3 kB
md5:dcd228400fa177dee032e2e6ff006aaf - 137.8 kB
md5:1a3c58066b177b28cca6b2b9143dc890 - 48.7 MB
md5:6bf4acadf68bf329e7bfabaf5a555337 - 373.5 kB
md5:d2a5e56ab3d16bb5fca343f22d787341 - 25.3 kB
md5:9e3defa70f8946e627e72d190f4c7e2c - 78 Bytes
md5:10fe0274dcdd18cd349921a76e2b8b80 - 5.0 kB
md5:dcf1375f083f3b4bfc4bd3cbc013fe7d - 141.4 kB
md5:5e4d0f8118b7ed900a8b321d81decc8d - 20.8 kB
md5:97765e86134a566fb65ae21d57f836fd - 11.2 kB
md5:0434072f7afc09867477fa48dc590517 - 7.4 kB
md5:0dc80c3dcb73ed80dde3b236aee7049a - 52.3 kB
md5:adde0ecc0341508332601fee14aec17a - 457.5 MB
md5:a17bc36ee818b782dc1a1a20d04157d2 - 2.4 kB
md5:311d83fbde047e3cb35f11ed94947811 - 11.3 MB
md5:b184f3b086690435e2797363558dad46 - 1.8 MB
md5:adaba83b02c90b15459859fb9b3e0a85 - 1.3 MB
md5:7d5172598184adc1f748f35616ae6438 - 143.4 kB
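
A downloaded file can be checked against the md5 values listed above. This is a minimal sketch using Python's standard hashlib module. The function names are hypothetical, and because the listing does not pair checksums with file names, the matching checksum has to be identified by the user.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hexadecimal md5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path: str, listed: str) -> bool:
    """Compare a file's md5 with a listed value such as 'md5:<32 hex digits>'."""
    expected = listed.strip().split(":", 1)[-1].lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_checksum("SOURCE DATA.zip", "md5:<value from the list>")` returns True only if the download is intact and the chosen checksum belongs to that file.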