Dataset Open Access

Expansion of the global RNA virome reveals diverse clades of bacteriophages

Uri Neri; Yuri I. Wolf; Simon Roux; Antonio Pedro Camargo; Benjamin D. Lee; Darius Kazlauskas; I. Min Chen; Natalia Ivanova; Lisa Zeigler Allen; David Paez-Espino; Donald A. Bryant; Devaki Bhaya; Mart Krupovic; Valerian V. Dolja; Nikos C. Kyrpides; Eugene V. Koonin; Uri Gophna; RNA Virus in Metatranscriptomes Consortium

This deposit is intended to contain the various data generated as part of the RNA Virus in MetaTranscriptomes project ("RVMT"). This initial version is released ahead of time, near the time of submission, in the hope of providing a long-lasting resource for the general scientific community. Note well: the authors listed in this initial version release are a partial list only. The RNA Virus in MetaTranscriptomes consortium is a project with over 90 researchers from various institutions (see below).

High-throughput RNA sequencing offers broad opportunities to explore the Earth RNA virome. Mining 5,150 diverse metatranscriptomes uncovered >2.5 million RNA virus contigs. Analysis of >330,000 RNA-dependent RNA polymerases (RdRPs) shows that this expansion corresponds to a 5-fold increase of the known RNA virus diversity. Gene content analysis revealed multiple protein domains previously not found in RNA viruses and implicated in virus-host interactions. Extended RdRP phylogeny supports the monophyly of the five established phyla and reveals two putative additional bacteriophage phyla and numerous putative additional classes and orders. The dramatically expanded phylum Lenarviricota, consisting of bacterial and related eukaryotic viruses, now accounts for a third of the RNA virome. Identification of CRISPR spacer matches and bacteriolytic proteins suggests that subsets of picobirnaviruses and partitiviruses, previously associated with eukaryotes, infect prokaryotic hosts.

The RNA Virus in MetaTranscriptomes consortium:
Adrienne B. Narrowe, Alexander J. Probst, Alexander Sczyrba, Annegret Kohler, Armand Séguin, Ashley Shade, Barbara J. Campbell, Björn D. Lindahl, Brandi Kiel Reese, Breanna M. Roque, Chris DeRito, Colin Averill, Daniel Cullen, David A. C. Beck, David A. Walsh, David M. Ward, Dongying Wu, Emiley Eloe-Fadrosh, Eoin L. Brodie, Erica B. Young, Erik A. Lilleskov, Federico J. Castillo, Francis M. Martin, Gary R. LeCleir, Graeme T. Attwood, Hinsby Cadillo-Quiroz, Holly M. Simon, Ian Hewson, Igor V. Grigoriev, James M. Tiedje, Janet K. Jansson, Janey Lee, Jean S. VanderGheynst, Jeff Dangl, Jeff S. Bowman, Jeffrey L. Blanchard, Jennifer L. Bowen, Jiangbing Xu, Jillian F. Banfield, Jody W Deming, Joel E. Kostka, John M. Gladden, Josephine Z Rapp, Joshua Sharpe, Katherine D. McMahon, Kathleen K. Treseder, Kay D. Bidle, Kelly C. Wrighton, Kimberlee Thamatrakoln, Klaus Nusslein, Laura K. Meredith, Lucia Ramirez, Marc Buee, Marcel Huntemann, Marina G. Kalyuzhnaya, Mark P Waldrop, Matthew B Sullivan, Matthew O. Schrenk, Matthias Hess, Michael A. Vega, Michelle A. O’Malley, Monica Medina, Naomi E. Gilbert, Nathalie Delherbe, Olivia U. Mason, Paul Dijkstra, Peter F. Chuckran, Petr Baldrian, Philippe Constant, Ramunas Stepanauskas, Rebecca A. Daly, Regina Lamendella, Robert J Gruninger, Robert M. McKay, Samuel Hylander, Sarah L. Lebeis, Sarah P Esser, Silvia G. Acinas, Steven S. Wilhelm, Steven W. Singer, Susannah S. Tringe, Tanja Woyke, TBK Reddy, Terrence H. Bell, Thomas Mock, Tim McAllister, Vera Thiel, Vincent J. Denef, Wen-Tso Liu, Willm Martens-Habbena, Xiao-Jun Allen Liu, Zachary S. Cooper, Zhong Wang. For the full list of authors and related information, please see the spreadsheet titled "Table S9 - Consortium coauthorship", available in this collection in the folder named "Tables".

Current release (V.4):
* Addition of NVPC_info.tsv, a master table connecting the NVPC HMMs to their putative functions.
* Update of the title (from that of the preprint to that of the published paper).
* Update of the paper's supplementary files (from those shared with the preprint to those shared with the published paper; no real differences apart from the "collapsing" of previously separate tables into multiple spreadsheets within the same Excel file. See the readme in the respective subfolder for more information).
* Addition of the graphical abstract from the published version.
* Addition of the "depermutated" and "original" RdRp sequences (77.5k set).
* Addition of CM13.xlsx, clan_membership3.v3.xlsx, and NeoCM3_full.xlsx to the Domain_annotation/misc/ folder (CAUTION: these are intermediate files from the process of assigning functional annotations to HMM profiles from various sources. This information was partially used when assigning labels/categories to NVPC profiles and to intermediate clusters of profiles ("clans")).
* Update of Riboviria-org/RiboV1.6_Info.tsv and Riboviria-org/RiboV1.6_Contigs.fasta (contig collection: correction of missing fields and minor miscellaneous fixes, e.g. two contigs with incorrect length).
* Update of the CDS predictions master table, Riboviria-org/AllORFsInfo.tsv (correction of some IDs and addition of some proteins from reference genomes of segmented viruses, i.e. CDS predicted for non-RdRp-carrying contigs).

Previous releases:
* V.3 - Addition of table S10, a copy of the project code (latest repository version from GitHub).
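
Because this release corrects contig lengths in the contig collection, users may want to recompute sequence lengths from the FASTA file after download. The following is a minimal sketch, not part of the project code. It assumes the FASTA file has been extracted to plain, uncompressed text, and the function name `fasta_lengths` is hypothetical. It makes no assumption about the column names in RiboV1.6_Info.tsv, so any comparison against that table is left to the reader.

```python
from typing import Dict, Iterable, Optional


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record ID (first token of the header line) to its sequence length.

    `lines` can be an open text file or any iterable of strings.
    """
    lengths: Dict[str, int] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            lengths[current] = 0
        elif current is not None:
            # Sequence data may be wrapped over several lines; sum them up.
            lengths[current] += len(line)
    return lengths


# Example usage (the local path is an assumption about where the file was extracted):
# with open("Riboviria-org/RiboV1.6_Contigs.fasta") as handle:
#     contig_lengths = fasta_lengths(handle)
```

The resulting dictionary can then be compared with whichever length field the info table provides.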
Files (9.4 GB)