Published December 22, 2025 | Version v1
Dataset Open

CryoFM EMDB Data Lists

  • 1. ByteDance Seed

Description

This dataset provides CSV lists containing structured metadata for cryo-electron microscopy (cryo-EM) map processing tasks used in the CryoFM research papers. The data lists are curated from entries in the Electron Microscopy Data Bank (EMDB) and organized into CSV files with detailed metadata for training, validation, and testing of deep learning models.

Dataset Contents

This repository contains CSV data lists for two main research projects:
 
1. CryoFM1 (ICLR 2025): CSV lists for cryo-EM map processing at two pixel sizes
  • cryofm1_1-5apix_dataset: High-resolution dataset (~1.5 Å/pixel)
  • cryofm1_3apix_dataset: Standard-resolution dataset (~3.0 Å/pixel)
2. CryoFM2: CSV lists for foundation model pre-training and fine-tuning
  • cryofm2_pretrain_dataset: Pre-training dataset with half-map pairs
  • cryofm2_emhancer_dataset: Enhancement dataset with half-map pairs and model-based LocScale maps
  • cryofm2_emready_dataset: EMReady dataset with deposited and simulated maps

CSV List Structure

CryoFM1 CSV lists contain: EMDB ID, relative path to map file, voxel dimensions (nz, ny, nx), and pixel size (apix). Maps are rescaled to the specified pixel size (1.5 or 3.0 Å/pixel).
CryoFM2 CSV lists contain: map paths, statistical features (mean, std, quantile_max_value), pixel size (apix), and EMDB/PDB IDs. All maps are resized to 1.5 Å/pixel.
Detailed schema descriptions are provided in `schema.md` files within each dataset directory.
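
As a rough illustration of how a CryoFM1-style list might be read, the sketch below parses rows with Python's standard `csv` module and derives the physical box size from the voxel dimensions and pixel size. The column names (`emdb_id`, `path`, `nz`, `ny`, `nx`, `apix`) and the sample row are assumptions made for this example; the actual header names are defined in each directory's `schema.md`.

```python
import csv
import io

# Hypothetical header and row for illustration only; check schema.md for the real names.
SAMPLE_CSV = """emdb_id,path,nz,ny,nx,apix
EMD-0000,maps/emd_0000.mrc,128,128,128,1.5
"""

def load_entries(handle):
    """Parse a CryoFM1-style list and add the physical box size in angstroms."""
    entries = []
    for row in csv.DictReader(handle):
        apix = float(row["apix"])
        shape = tuple(int(row[key]) for key in ("nz", "ny", "nx"))
        entries.append({
            "emdb_id": row["emdb_id"],
            "path": row["path"],
            "shape": shape,
            "apix": apix,
            # Box edge length along each axis = voxel count * pixel size.
            "box_angstrom": tuple(n * apix for n in shape),
        })
    return entries

entries = load_entries(io.StringIO(SAMPLE_CSV))
print(entries[0]["emdb_id"], entries[0]["box_angstrom"])
```

To read a real list, pass an open file handle instead of the in-memory sample and adjust the column names to match the schema.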

Note

This dataset contains CSV metadata lists only; the actual map files are not included. Map files should be downloaded from EMDB using the provided EMDB IDs.
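
One possible way to turn an EMDB ID from the lists into a download location is sketched below. The URL layout is an assumption based on the public EMDB archive and should be checked against current EMDB documentation; the helper name `emdb_map_url` is hypothetical, and the exact ID format used in the CSV files (for example `EMD-1234` versus a bare number) is not stated here, so the function accepts either.

```python
import re

# Assumed layout of the public EMDB archive; verify before relying on it.
EMDB_BASE = "https://ftp.ebi.ac.uk/pub/databases/emdb/structures"

def emdb_map_url(emdb_id):
    """Build the assumed URL of the primary deposited map for an EMDB entry."""
    match = re.search(r"(\d+)", str(emdb_id))
    if match is None:
        raise ValueError(f"no numeric accession found in {emdb_id!r}")
    # EMDB accessions have at least four digits; restore zeros lost if the ID was read as an integer.
    number = match.group(1).zfill(4)
    return f"{EMDB_BASE}/EMD-{number}/map/emd_{number}.map.gz"

print(emdb_map_url("EMD-1234"))
```

This sketch only covers the primary map of an entry. Downloaded maps are at their deposited pixel size, so they would still need to be rescaled to match the pixel sizes recorded in the lists.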

Files

cryoFM-emdb-lists.zip (358.3 kB)
md5:dede72b9ecd120b8f6cb5dd9b3a94b59

Additional details

Related works

Is supplement to
Conference paper: arXiv:2410.08631 (arXiv)
Preprint: 10.64898/2025.12.29.696802 (DOI)

Software

Repository URL
https://github.com/ByteDance-Seed/cryofm
Programming language
Python
Development Status
Active