Published July 3, 2025 | Version v1
Dataset Open

Viro3D Dataset – Part 1: Metadata, ColabFold Predictions and Foldseek database

  • MRC-University of Glasgow Centre for Virus Research

Description

This repository contains Part 1 of the dataset associated with the manuscript “Viro3D: a comprehensive database of virus protein structure predictions” by Ulad Litvin, Spyros Lytras, Alexander Jack, David L. Robertson, Joseph Hughes, and Joe Grove.

The dataset includes:

  • Metadata: protein and species lists in CSV format (viro3d_metadata.tar.gz)
  • Relaxed ColabFold predictions in PDB format (colabfold_pdb.tar.gz)
  • ColabFold pLDDT and pTM confidence scores in JSON format (colabfold_json_scores.tar.gz)
  • ColabFold multiple sequence alignments in A3M format (colabfold_msa.tar.gz)
  • Viro3D Foldseek structural search database (foldseekViro3D.tar.gz)
  • Foldseek-derived structural protein clusters (viro3d_protein_clusters.tar.gz)
  • Foldseek-derived structural similarity network (viro3d_protein_network.tar.gz)
  • Foldseek-based annotation expansion of protein functions (viro3d_annotation_expansion.tar.gz)
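As an illustration of how the confidence scores might be consumed, the sketch below summarizes a single score file. It assumes each JSON file follows the usual ColabFold layout, with a per-residue list under "plddt" and a single value under "ptm". Those key names, the function name summarize_scores, and the example path are assumptions, not details confirmed by this record, so check them against an extracted file first.

```python
import json
import statistics
from pathlib import Path


def summarize_scores(json_path):
    """Summarize one ColabFold-style score file (hypothetical helper).

    Assumes the keys "plddt" (per-residue list) and "ptm" (single float);
    the layout of the files in colabfold_json_scores.tar.gz is not
    documented on this page.
    """
    data = json.loads(Path(json_path).read_text())
    plddt = data.get("plddt") or []   # assumed key, empty list if absent
    ptm = data.get("ptm")             # assumed key, None if absent
    mean_plddt = statistics.fmean(plddt) if plddt else None
    return {
        "n_residues": len(plddt),
        "mean_plddt": mean_plddt,
        "ptm": ptm,
    }
```

A call such as summarize_scores("scores/example_protein.json"), where the path is a placeholder for a file extracted from the archive, returns a small dictionary that can be collected into a table for filtering by confidence.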

Files

Total size: 47.9 GB

MD5 checksum                          Size
md5:7450435f3682ddb8890359f6071fd080  31.6 GB
md5:df16642fc9f2233f73fd76e41f9f6c48  7.8 GB
md5:f4b2e1a992cbea448d5aa75caced5646  7.9 GB
md5:ed46740700f169e5cfadd38573c9ff56  568.7 MB
md5:01f6407c563b345bd438ed10db16fb41  12.3 MB
md5:70e8e3b7fddd37170e14522574c72239  18.5 MB
md5:d49fed63243e3140a6859131d3080aaf  3.9 MB
md5:f43d0b8f63b6fce2057d30960ded14ff  5.9 MB
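
Because the largest archive is over 30 GB, it is worth checking a download against its listed MD5 value before extracting it. The following is a minimal sketch using only the Python standard library. The function names md5_of_file and matches_listed_md5 are illustrative, and the file is read in chunks so that it does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file against a listed value such as 'md5:7450...d080'."""
    # Drop an optional 'md5:' prefix and normalize case before comparing.
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

The listing above does not pair archive names with checksums, so take the pairing from the record's download page and then call, for example, matches_listed_md5(local_path, "md5:...") with the value shown for that archive.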