Published January 15, 2024 | Version v1
Dataset | Open access

Discovering and exploring the hidden diversity of the human gut viruses using highly enriched virome samples.

  • 1. Department CIBIO - University of Trento, Italy
  • 2. Integrated Open Systems Unit, Okinawa Institute of Science and Technology (OIST), Okinawa, Japan
  • 3. Center Agriculture Food Environment (C3A), University of Trento, Italy
  • 4. Fondazione Edmund Mach, San Michele all'Adige, Trento, Italy
  • 5. Harvard Chan Microbiome in Public Health Center, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  • 6. Department of Twin Research & Genetic Epidemiology, King's College London, London, UK
  • 7. Center of Biostatistics, Epidemiology and Public Health, Department of Clinical and Biological Sciences, University of Turin, Turin, Italy
  • 8. The Systems Biology Institute (SBI), Tokyo, Japan
  • 9. School of Life Science and Technology, Tokyo Institute of Technology, Tokyo, Japan
  • 10. Metagen, Inc., Yamagata, Japan
  • 11. Metagen Therapeutics, Inc., Yamagata, Japan
  • 12. Digzyme, Inc., Tokyo, Japan
  • 13. Department of Experimental Oncology, IEO European Institute of Oncology IRCCS, Milan, Italy

Description

Viruses are a crucial component of the human microbiome. By leveraging enriched Viral-Like Particle (VLP) viromes together with metagenomic assembly and sequence clustering, we retrieved thousands of viral contigs from viromes and metagenomes.

This upload contains the public collection of more than 162,000 viral sequences we retrieved. These sequences are clustered into 3,944 Viral Sequence Clusters (VSCs), labelled as either known (kVSCs) or unknown (uVSCs), which are further grouped into 1,345 Viral Sequence Groups (VSGs).
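The three-level hierarchy (sequences, VSCs, VSGs) can be pictured as a simple nested data structure. The sketch below is a toy model only: the `VSC` class, the `summarize` helper and every ID in it are hypothetical and are not taken from the actual file schema.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VSC:
    """Hypothetical model of a Viral Sequence Cluster (known kVSC or unknown uVSC)."""
    vsc_id: str
    known: bool
    vsg_id: str      # the Viral Sequence Group this VSC belongs to
    members: tuple   # IDs of the sequences clustered into this VSC


def summarize(vscs):
    """Count sequences, kVSCs/uVSCs, and list the VSCs contained in each VSG."""
    n_sequences = sum(len(v.members) for v in vscs)
    status = Counter("kVSC" if v.known else "uVSC" for v in vscs)
    by_vsg = defaultdict(list)
    for v in vscs:
        by_vsg[v.vsg_id].append(v.vsc_id)
    return n_sequences, status, dict(by_vsg)


# Toy, hypothetical records (not from the dataset).
toy = [
    VSC("VSC_A", True, "VSG_1", ("seq1", "seq2", "seq3")),
    VSC("VSC_B", False, "VSG_1", ("seq4",)),
    VSC("VSC_C", False, "VSG_2", ("seq5", "seq6")),
]
n_seq, status, groups = summarize(toy)
print(n_seq, status["kVSC"], status["uVSC"], len(groups))  # prints: 6 1 2 2
```

Each sequence belongs to exactly one VSC and each VSC to exactly one VSG, so the collection shrinks at each level (162,876 sequences, 3,944 VSCs, 1,345 VSGs).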

Files

VSC5_rep_fnas_nr99_45k_metaphlanDB.fna.gz

The 45,872 representative sequences (dereplicated at 99% identity) included in the MetaPhlAn 4.1 module, in FASTA format.

VSCs_groups.csv

Metadata for the 45,872 representative sequences included in the MetaPhlAn 4.1 module.

VSC5_rep_fnas_full_47k.fna.gz

The non-dereplicated set of 47,820 representative sequences.
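The sequence files are gzip-compressed FASTA and can be streamed without decompressing them to disk. Below is a minimal standard-library reader, assuming ordinary (possibly multi-line) FASTA records with headers starting with `>`; the function name `read_fasta_gz` is illustrative, and dedicated parsers such as Biopython's `SeqIO` are an alternative.

```python
import gzip


def read_fasta_gz(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)  # emit the previous record
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequence lines may wrap
    if header is not None:
        yield header, "".join(chunks)  # emit the last record
```

For example, `sum(1 for _ in read_fasta_gz("VSC5_rep_fnas_nr99_45k_metaphlanDB.fna.gz"))` should return the number of representative sequences in that file, which can be compared with the 45,872 stated in its description.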
VSC5_complete_162k_labelled.fna.gz

The complete set of 162,876 sequences of potential viral origin extracted from metagenomes and viromes:

  • 5,651 Highly Enriched Virome Contigs (HEVC)
  • 126,894 contigs from the unbinned metagenomes of Pasolli et al.
  • 30,331 contigs from viromes
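As a consistency check, the three source counts add up to the 162,876 sequences of the complete set. The snippet below does that arithmetic and reports each source's share of the total.

```python
# Counts as listed in the description of VSC5_complete_162k_labelled.fna.gz.
sources = {
    "HEVC (highly enriched virome contigs)": 5_651,
    "unbinned metagenomes (Pasolli et al.)": 126_894,
    "viromes": 30_331,
}
total = sum(sources.values())
assert total == 162_876  # matches the stated size of the complete set

for name, n in sources.items():
    print(f"{name}: {n:,} ({n / total:.1%})")
```

The unbinned metagenomes contribute most of the sequences (about 77.9%), while the viromes and the HEVCs account for about 18.6% and 3.5%, respectively.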

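CRISPR_VSG-to-species.csv

CRISPR_VSG-to-SGBs.csv

The CRISPR-based host associations of each VSG. Each line is a match between a VSG and a host species (CRISPR_VSG-to-species.csv) or between a VSG and a host SGB, i.e. a species-level genome bin (CRISPR_VSG-to-SGBs.csv).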
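Since each line is one VSG-host match, a VSG with several hosts appears on several lines. The sketch below collects all hosts of each VSG using Python's `csv` module. The column layout is an assumption: the helper `hosts_per_vsg` and its `vsg_col`, `host_col` and `has_header` parameters are hypothetical and should be adjusted after inspecting the real files.

```python
import csv
import io
from collections import defaultdict


def hosts_per_vsg(handle, vsg_col=0, host_col=1, has_header=True):
    """Group host matches (species or SGBs) by VSG from a CSV stream.

    ASSUMPTION: column positions and the presence of a header row are guesses;
    check the actual files and adjust vsg_col, host_col and has_header.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row, if any
    hosts = defaultdict(set)
    for row in reader:
        if len(row) > max(vsg_col, host_col):  # ignore short or empty rows
            hosts[row[vsg_col]].add(row[host_col])
    return {vsg: sorted(h) for vsg, h in hosts.items()}


# Toy, hypothetical input; the real files may use different columns.
toy_csv = io.StringIO("vsg,host\nVSG_1,Host A\nVSG_1,Host B\nVSG_2,Host A\n")
print(hosts_per_vsg(toy_csv))  # prints: {'VSG_1': ['Host A', 'Host B'], 'VSG_2': ['Host A']}
```

With a real file, pass a handle opened as `open("CRISPR_VSG-to-species.csv", newline="")`.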

VSC_profiling_examples.zip

An archive containing a subsampled dataset for testing and tutorials.
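The archive contents are not itemized here, so a first step is to unpack it and list its members. A minimal sketch with Python's standard `zipfile` module; the helper name `unpack` is illustrative.

```python
import zipfile
from pathlib import Path


def unpack(archive, dest):
    """Extract a zip archive into dest and return the list of its member names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        # extractall is safe here only because the archive comes from a trusted source.
        zf.extractall(dest)
    return names
```

For example, `unpack("VSC_profiling_examples.zip", "VSC_profiling_examples")` extracts the tutorial data and returns the file names it contains.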

Supplementary Data

See the Supplementary Figures and Tables of the original bioRxiv preprint.

Citation

Discovering and exploring the hidden diversity of the human gut viruses using highly enriched virome samples - bioRxiv 2024

Moreno Zolfo, Andrea Silverj, Aitor Blanco-Míguez, Paolo Manghi, Omar Rota-Stabelli, Vitor Heidrich, Jordan Jensen, Sagun Maharjan, Eric Franzosa, Cristina Menni, Alessia Visconti, Federica Pinto, Matteo Ciciani, Curtis Huttenhower, Anna Cereseto, Francesco Asnicar, Hiroaki Kitano, Takuji Yamada, Nicola Segata.

Files (991.0 MB)

  • 1.5 MB, md5:ddea001e2ca01fd5bb5ecdbdfda37827
  • 1.6 MB, md5:0e3402a8843f840bf9c4f25230919045
  • 490.5 MB, md5:1be08b0db65b46ec81be288ff4379728
  • 233.1 MB, md5:541270a5447a0364597018a9a333b6d5
  • 228.3 MB, md5:6ea358709a5be98f3ab469db513634a3
  • 22.2 MB, md5:600ef9110af8dba679af620426470d45
  • 13.7 MB, md5:da90c22e3c561cb2a99749b48a36956d
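Each download is published with an MD5 checksum, so file integrity can be checked after downloading: a file is intact if its digest matches one of the listed checksums. A minimal sketch using Python's standard `hashlib` module; the helper names `md5sum` and `verify` are illustrative.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's MD5 matches the expected checksum.

    Accepts checksums written bare or with the "md5:" prefix used in the listing.
    """
    expected = expected.lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5sum(path) == expected
```

Reading in chunks keeps memory use constant, which matters for the larger files (up to 490.5 MB).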

Additional details

Related works

Is part of
Computational notebook: https://github.com/biobakery/MetaPhlAn/

Funding

  • European Commission: MetaPG – Culture-free strain-level population genomics to identify disappearing human-associated microbes in the westernized world (716575)
  • European Commission: microTOUCH – Transmission of the human microbiome and its impact on health (101045015)
  • European Commission: ONCOBIOME – Gut OncoMicrobiome Signatures (GOMS) associated with cancer incidence, prognosis and prediction of treatment response (825410)
  • European Commission: MASTER – Microbiome Applications for Sustainable food systems through Technologies and EnteRprise (818368)
  • European Commission: IHMCSA – International Human Microbiome Coordination and Support Action (964590)
  • National Cancer Institute of the National Institutes of Health (1U01CA230551)
  • Regione Lombardia: Premio Internazionale Lombardia e Ricerca 2019
  • Japan Science and Technology Agency: AIP Acceleration Research (JPMJCR19U3)
  • Japan Society for the Promotion of Science: KAKENHI (JP16H06279)