Data from: Viral tagging reveals discrete populations in Synechococcus viral genome sequence space

doi:10.5061/dryad.gr3ks

Published April 16, 2015 | Version v1

Dataset Open

1. University of Arizona
2. Georgia Institute of Technology
3. University of Queensland

Microbes and their viruses drive myriad processes across ecosystems ranging from oceans and soils to bioreactors and humans. Despite this importance, microbial diversity is only now being mapped at scales relevant to nature, while the viral diversity associated with any particular host remains little researched. Here we quantify host-associated viral diversity using viral-tagged metagenomics, which links viruses to specific host cells for high-throughput screening and sequencing. In a single experiment, we screened 10^7 Pacific Ocean viruses against a single strain of Synechococcus and found that naturally occurring cyanophage genome sequence space is statistically clustered into discrete populations. These population-based, host-linked viral ecological data suggest that, for this single host and seawater sample alone, there are at least 26 double-stranded DNA viral populations with estimated relative abundances ranging from 0.06 to 18.2%. These populations include previously cultivated cyanophage and new viral types missed by decades of isolate-based studies. Nucleotide identities of homologous genes mostly varied by less than 1% within populations, even in hypervariable genome regions, and by 42–71% between populations, which provides benchmarks for viral metagenomics and genome-based viral species definitions. Together these findings showcase a new approach to viral ecology that quantitatively links objectively defined environmental viral populations, and their genomes, to their hosts.

Notes

RandomizationsX1500

To estimate the variability within a population from the available metagenomic data, random candidatus genomes (CGs) were generated as follows using a series of custom Perl scripts. First, we recruited reads to each CG, requiring at least 95% identity over at least 95% of the read length. Each read was assigned non-redundantly to a CG and aligned to it using default parameters in MUSCLE. For each CG population, we generated 100 random CG sequences from the metagenomic reads recruited to the consensus sequence, drawing the base at each position with a probability equal to its relative abundance in the underlying metagenomic sequence data. Here we show the result of 1500 randomizations.
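The per-position sampling step described above can be illustrated with a minimal Python sketch. This is not the authors' Perl code: the function names, the treatment of gaps and ambiguous bases (ignored), and the use of "N" for uncovered positions are all assumptions made for illustration. It expects reads that are already aligned to the same coordinates.

```python
import random

def base_frequencies(aligned_reads):
    """Count A/C/G/T per alignment column; gaps and other symbols are ignored (assumption)."""
    length = max(len(read) for read in aligned_reads)
    columns = [dict() for _ in range(length)]
    for read in aligned_reads:
        for i, base in enumerate(read.upper()):
            if base in "ACGT":
                columns[i][base] = columns[i].get(base, 0) + 1
    return columns

def random_genome(columns, rng):
    """Draw one base per column with probability equal to its relative abundance."""
    sequence = []
    for counts in columns:
        if not counts:
            sequence.append("N")  # no read covers this column (assumption)
            continue
        bases = sorted(counts)
        weights = [counts[b] for b in bases]
        sequence.append(rng.choices(bases, weights=weights, k=1)[0])
    return "".join(sequence)

def randomize(aligned_reads, n, seed=0):
    """Generate n random genome sequences from one set of aligned reads."""
    rng = random.Random(seed)
    columns = base_frequencies(aligned_reads)
    return [random_genome(columns, rng) for _ in range(n)]
```

Calling `randomize(reads, 100)` on the aligned reads of one population would yield 100 sequences that vary only at positions where the reads themselves disagree.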

ANI_2_PCA

Matrix of average nucleotide identity (ANI) values obtained from a comparison of each candidatus genome and the reference genome. This file is the input for the PCA shown as a figure in the manuscript.
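As a rough illustration of how a numeric matrix such as this one feeds into a PCA, the sketch below computes the first principal component with power iteration using only the Python standard library. It is an assumption-laden example, not the analysis used in the manuscript: the layout of ANI_2_PCA.txt is not described here, so the function takes an already parsed list of numeric rows, and `first_principal_component` is a hypothetical name. In practice a numerical library would normally be used instead.

```python
import math

def first_principal_component(rows, iterations=500):
    """Return (direction, scores) of the first principal component of rows.

    rows: list of equal-length lists of floats, one observation per row.
    """
    n, m = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix (m x m).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    # Power iteration converges to the dominant eigenvector of cov.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    # Project each centered observation onto the component.
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    return v, scores
```

The returned scores give each row's coordinate along the axis of greatest variance; further components would require deflating the covariance matrix, which is omitted here.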

Viral Tagged Metagenome 454

This file is identical to VT_MG.fna as it appears under CAM_P_0001068 in CAMERA.

VT_MG.fna

Community Metagenome

Identical to Comm_MG.fna under CAM_P_0001068.

Comm_MG.fna

GP23_Sequences

Gp23 sequences amplified from the isolates; these data are incorporated into Table 1.

DATA-FIGURES

Tabulated data for all the figures in the manuscript.

Rarefaction files

The zip folder includes the script and tables used to generate the rarefaction curves and richness index. The tables are structured as Read, Protein, Protein Cluster.

RAREFACTION.zip
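For readers unfamiliar with rarefaction, the idea can be sketched in Python as repeated random subsampling of reads followed by a count of the distinct protein clusters observed. This is not the script shipped in RAREFACTION.zip: the function name, the number of replicates, and the assumption that the Protein Cluster column has already been parsed into one identifier per read are all illustrative.

```python
import random

def rarefaction_curve(clusters, depths, replicates=10, seed=0):
    """Mean number of distinct clusters seen when subsampling reads.

    clusters: one protein-cluster identifier per read.
    depths: subsample sizes, each no larger than len(clusters).
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        total = 0
        for _ in range(replicates):
            # Sample reads without replacement and count distinct clusters.
            total += len(set(rng.sample(clusters, depth)))
        curve.append((depth, total / replicates))
    return curve
```

Plotting the mean richness against depth gives the rarefaction curve; a curve that flattens suggests that additional sequencing would reveal few new clusters.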

ConsensusCGs

Assembly and gene predictions (CDS and amino acid sequences) for the 26 candidatus genomes referred to in the manuscript.

VT_MG_IL

FASTQ sequencing data of the simplified metagenome after a viral tagging experiment.

Files

Files (3.6 GB)

Name	MD5	Size
ANI_2_PCA.txt	f85b8b8f9226a1b26df4dc9ddf39f0d3	351.3 kB
Comm_MG.fna	fa08df3d4c7b497acbc1c7d2fdc73125	53.8 MB
ConsensusCGs.zip	a769b4912cfc6016b59975eb95aa2c25	1.4 MB
DATA-FIGURES_Replace.xls	a9ac2730881f1f5beb5016d75c4e496c	196.1 kB
GP23_Sequences.txt	e4a18060ed8b5fdfc79df38c4b420b5a	13.5 kB
RandomizationsX1500.FNA	1fb11a47c02d2e37bd6de795ee47915f	136.7 MB
RAREFACTION.zip	bd63da8efa04c8a2056ad52ecf73a48b	3.5 MB
VT_MG.fna	39e5a1546d62148115d9ef5466b7fe0c	40.1 MB
VT_MG_IL.fastq	f45c3654b26adf245581f7be37288eaa	3.4 GB
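
The MD5 checksums listed with each file can be used to confirm that a download is complete and uncorrupted. The following Python sketch shows one way to do this with the standard `hashlib` module; the helper names `md5_of_file` and `verify` are illustrative, and the file is read in chunks so that the multi-gigabyte FASTQ file does not need to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_md5):
    """True if the file's MD5 digest matches the expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify("ANI_2_PCA.txt", "f85b8b8f9226a1b26df4dc9ddf39f0d3")` should return `True` for an intact copy of that file, assuming it was saved under that name in the working directory.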

Additional details

Is cited by: 10.1038/nature13459 (DOI)

	All versions	This version
Views	96	95
Downloads	31	31
Data volume	11.0 GB	11.0 GB
