Published June 6, 2025 | Version v1
Dataset Open

STED-FM dataset

Description

This is the self-supervised training dataset associated with the publication:

Bilodeau, A.*, Beaupré, F.*, Chabbert, J., Bellavance, J-M, Lessard, K., Deschênes, A., Bernatchez, R., De Koninck, P., Gagné, C., Lavoie-Cardinal, F. (2025) A Self-Supervised Foundation Model for Robust and Generalizable Representation Learning in STED Microscopy. bioRxiv.

The STED-FM dataset consists of 37 387 images of varying size, which were split into 224x224 crops. The resulting dataset contains 976 022 crops, all of which were used to pre-train STED-FM. The provided archives contain the crops.
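For readers who want to apply the same crop size to their own images, the following is a minimal sketch of tiling a 2D image into 224x224 crops. It is not the authors' cropping procedure: the description does not state whether crops overlap, how image borders are handled, or whether any crops were filtered out. The function name `split_into_crops`, the non-overlapping stride, and the choice to discard partial tiles at the edges are all assumptions made for illustration.

```python
import numpy as np

CROP_SIZE = 224  # crop edge length stated in the dataset description


def split_into_crops(image, crop_size=CROP_SIZE):
    """Tile a 2D image into non-overlapping crop_size x crop_size crops.

    Assumption: partial tiles at the right and bottom edges are discarded,
    and the stride equals the crop size (no overlap).
    """
    height, width = image.shape
    crops = []
    for top in range(0, height - crop_size + 1, crop_size):
        for left in range(0, width - crop_size + 1, crop_size):
            crops.append(image[top:top + crop_size, left:left + crop_size])
    return crops
```

An image smaller than 224 pixels along either axis yields no crops under these assumptions.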

A subset of 238 683 crops, each associated with one of 24 protein classes, is also provided.

The dataset is provided as tar files. We provide the images both already normalized (`STED-FM-dataset-crop.tar`) and as raw values (`STED-FM-dataset-crops-raw.tar`). All files in these archives are stored as npz files with the keys `image` and `metadata`. We also provide the raw crops as tif files (`STED-FM-dataset-crops-tiff-raw.tar`).
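As an illustration of the layout described above, here is a minimal sketch that reads npz crops directly from a tar archive without extracting it to disk. The helper name `iter_crops` is hypothetical, and the sketch assumes each archive member ending in `.npz` holds the `image` and `metadata` keys. The use of `allow_pickle=True` is also an assumption, made in case `metadata` is stored as a pickled Python object; only enable it for archives you trust.

```python
import io
import tarfile

import numpy as np


def iter_crops(tar_path):
    """Yield (member_name, image, metadata) for each .npz file in a tar archive."""
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Skip directories and any non-npz members.
            if not (member.isfile() and member.name.endswith(".npz")):
                continue
            raw = archive.extractfile(member).read()
            # Assumption: metadata may be a pickled object, hence allow_pickle.
            with np.load(io.BytesIO(raw), allow_pickle=True) as data:
                yield member.name, data["image"], data["metadata"]
```

A possible usage, assuming the raw archive has been downloaded to the working directory:

```python
for name, image, metadata in iter_crops("STED-FM-dataset-crops-raw.tar"):
    print(name, image.shape, image.dtype)
    break  # inspect only the first crop
```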

Files

STED-FM-dataset-crops.zip

Files (23.3 GB)

Checksum Size
md5:a9752cdf6591bcdb2ae4c9401f867084 19.0 GB
md5:642910b437f61e5181a6535b67863b3e 4.3 GB
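Because the archives are large, it can be worth checking a download against the MD5 values listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper name `md5_of_file` and the 1 MiB chunk size are illustrative choices, not part of the dataset.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can be compared with the part after `md5:` in the listing for the corresponding file.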
