viraldb-sh curated plant virus and viroid database (v1.0)
Description
This dataset contains curated viral sequence databases generated using viraldb-sh, a lightweight pipeline for building harmonized viral databases from NCBI Virus and ViroidDB. The pipeline integrates sequence retrieval, taxonomy enrichment, sequence filtering, clustering using CD-HIT-EST, and policy-based representative sequence selection.
The database includes representative sequences clustered at three identity thresholds (100%, 99.5%, and 99%) together with summary tables describing cluster composition and associated metadata. These resources provide a non-redundant and traceable viral reference database suitable for downstream analyses such as viral detection, metagenomic classification, and phylogenetic studies.
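The three clustering thresholds can be illustrated with a minimal sketch that builds, but does not run, one cd-hit-est command per threshold. This is not the pipeline's actual invocation: the input name `filtered.fasta`, the output prefix, and the word size of 10 are assumptions chosen for illustration, and only the standard `-i`, `-o`, `-c`, and `-n` options are used.

```python
import shlex

# Identity thresholds named in the dataset description (100%, 99.5%, 99%).
THRESHOLDS = (1.000, 0.995, 0.990)


def cdhit_est_command(input_fasta, output_prefix, identity, word_size=10):
    """Build (but do not run) a cd-hit-est argument list.

    -i, -o, -c and -n are standard cd-hit-est options. The word size
    default and the file names are assumptions, not the settings
    actually used by viraldb-sh.
    """
    # This sketch only covers the high-identity range used by the dataset.
    if not 0.95 <= identity <= 1.0:
        raise ValueError("identity must be between 0.95 and 1.0 in this sketch")
    return [
        "cd-hit-est",
        "-i", input_fasta,
        "-o", f"{output_prefix}_c{identity:.3f}.fasta",
        "-c", f"{identity:.3f}",
        "-n", str(word_size),
    ]


if __name__ == "__main__":
    for threshold in THRESHOLDS:
        command = cdhit_est_command("filtered.fasta", "representatives", threshold)
        print(shlex.join(command))
```

Formatting the threshold to three decimals mirrors the `c1.000`, `c0.995`, and `c0.990` suffixes used in the file names of this dataset.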
All sequences retain associated taxonomic lineage information and clustering summaries to ensure transparency and reproducibility.
The database was generated using viraldb-sh v0.1 on 2026-03-05.
Contents
Representative sequence databases:
viraldb-sh_v1.0_representatives_c1.000.fasta.gz
viraldb-sh_v1.0_representatives_c0.995.fasta.gz
viraldb-sh_v1.0_representatives_c0.990.fasta.gz
Cluster summary tables:
viraldb-sh_v1.0_summary_c1.000.tsv
viraldb-sh_v1.0_summary_c0.995.tsv
viraldb-sh_v1.0_summary_c0.990.tsv
Additional documentation:
README_dataset.txt
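The files listed above can be read with the Python standard library alone. The sketch below assumes the files have been downloaded to the working directory and that the summary tables start with a header row; the column names are not documented here, so the code only reports them rather than assuming any. The helper names are hypothetical.

```python
import csv
import gzip


def representative_name(identity, version="v1.0"):
    """File name of a representative set, following the listing above."""
    return f"viraldb-sh_{version}_representatives_c{identity:.3f}.fasta.gz"


def read_fasta_gz(path):
    """Yield (header, sequence) pairs from a gzip-compressed FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    # Emit the final record.
    if header is not None:
        yield header, "".join(chunks)


def summary_columns(path):
    """Return the column names from the first row of a tab-separated table.

    Assumes the first row is a header, which is not confirmed by the record.
    """
    with open(path, newline="") as handle:
        return next(csv.reader(handle, delimiter="\t"))
```

For example, `sum(1 for _ in read_fasta_gz(representative_name(0.990)))` would count the representatives in the 99% set, and `summary_columns("viraldb-sh_v1.0_summary_c0.990.tsv")` would list the columns of the matching summary table.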
Files
Total size: 182.8 MB
md5:4b1dccca0e25a686a19a59b60a7c63b2 (182.8 MB)
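After downloading, the file can be checked against the MD5 checksum listed above. A minimal sketch follows; the record does not show the name of the deposited file, so the path is left as a parameter, and the function names are hypothetical.

```python
import hashlib

# Checksum as listed for the deposited file in this record.
EXPECTED_MD5 = "4b1dccca0e25a686a19a59b60a7c63b2"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals the expected value."""
    return md5_of_file(path) == expected.lower()
```

MD5 is used here only because it is the checksum the record provides; it detects corrupted downloads but is not a safeguard against deliberate tampering.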
Additional details
Dates
- Created
- 2026-03-05: viraldb-sh curated plant virus and viroid database
Software
- Repository URL
- https://github.com/robertobarrero/viraldb-sh
- Programming language
- Python, Shell
- Development Status
- Active
References
- Barrero R.A. viraldb-sh: A lightweight pipeline for building curated viral databases from NCBI Virus and ViroidDB.