Published December 17, 2023 | Version V1.0
Dataset | Open access

CryoVirusDB: An Annotated Dataset for AI-Based Virus Particle Identification in Cryo-EM Micrographs

  • 1. University of Missouri
  • 2. Brookhaven National Laboratory

Description

With advances in instrumentation, image processing algorithms, and computing power, single-particle cryo-electron microscopy (cryo-EM) has achieved atomic resolution in determining the 3D structures of viruses. These structures are essential for understanding viral biological function and for developing antiviral vaccines and treatments. Although artificial intelligence (AI) is effective in general image processing, its use for identifying and extracting virus particles from cryo-EM micrographs has been hindered by the lack of high-quality, manually labeled datasets. To fill this gap, we introduce CryoVirusDB, a labeled dataset containing the coordinates of expert-picked virus particles in cryo-EM micrographs. CryoVirusDB comprises 9,941 micrographs from 9 datasets representing 7 distinct nonenveloped viruses with icosahedral or pseudoicosahedral symmetry, along with the coordinates of 339,398 labeled virus particles. It can be used to train and test AI and machine learning (e.g., deep learning) methods that accurately identify virus particles in cryo-EM micrographs, a key step toward building atomic 3D structural models of viruses.

Instructions to download and use the dataset are openly available at: https://github.com/BioinfoMachineLearning/CryoVirusDB

Files

Files (22.4 MB)

  • md5:8d435cbb0186ecd25d68572bc5564a37 (4.4 MB)
  • md5:f0562804393fd8217a424c3a4f865418 (932.6 kB)
  • md5:1f26854f52e63318afd0515bcdeb3116 (479.6 kB)
  • md5:dde50f8369183005c4ae7e9a6be4f367 (2.0 MB)
  • md5:ca7dd2353d559ad3d5511fd6e3f66a37 (4.3 MB)
  • md5:f40df4cd97d028a6738fcddf24f4fc23 (2.7 MB)
  • md5:1fac55d66458f930057ec5bfb9d512b6 (4.6 MB)
  • md5:3021ce98e7dc2087366adcc795cb216a (1.7 MB)
  • md5:db8947fb784be6e196ff1c18d26b74ac (1.4 MB)
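After downloading, file integrity can be checked against the MD5 checksums listed above. The following is a minimal Python sketch, assuming the files have already been saved locally. Because this listing does not show the file names, the sketch only checks whether each file's digest matches one of the nine listed checksums rather than mapping each name to a specific value. The helper names `md5_of` and `verify` are hypothetical and not part of the CryoVirusDB tooling.

```python
import hashlib
from pathlib import Path

# MD5 checksums as listed on the record page (file names are not shown there).
EXPECTED_MD5 = {
    "8d435cbb0186ecd25d68572bc5564a37",
    "f0562804393fd8217a424c3a4f865418",
    "1f26854f52e63318afd0515bcdeb3116",
    "dde50f8369183005c4ae7e9a6be4f367",
    "ca7dd2353d559ad3d5511fd6e3f66a37",
    "f40df4cd97d028a6738fcddf24f4fc23",
    "1fac55d66458f930057ec5bfb9d512b6",
    "3021ce98e7dc2087366adcc795cb216a",
    "db8947fb784be6e196ff1c18d26b74ac",
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(paths):
    """Map each path to True if its MD5 is one of the listed checksums, else False."""
    return {str(p): md5_of(Path(p)) in EXPECTED_MD5 for p in paths}
```

For example, `verify(["downloads/some_file.zip"])` (a placeholder path) returns a dictionary such as `{"downloads/some_file.zip": True}` when the file is intact. MD5 is used here only because it is the checksum the record publishes; it detects accidental corruption, not deliberate tampering.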

Additional details

Funding

National Institutes of Health
Cryo-EM R01GM146340