SELMA3D 2026: Self-supervised learning for 3D light-sheet microscopy image segmentation
Authors/Creators
Description
In modern biological and biomedical research, high-resolution, 3D visualization of biological structures is essential for understanding spatial organization and structural relationships across scales, yet traditional imaging often fails to offer cellular resolution or preserve tissue integrity. Combining tissue clearing with light-sheet microscopy (LSM) enables high-contrast, ultra-high-resolution imaging of whole organs or even whole organisms [1–3], supporting studies in neuroscience, immunology, oncology, and cardiology [4–7]. A key bottleneck in translating these rich datasets into biological insight lies in automated image analysis, with segmentation serving as a fundamental step. Whole-organ LSM datasets may contain volumes on the order of 10000^3 voxels, making manual annotation infeasible. Deep learning-based segmentation models offer promising automation [8–10], but they remain task-specific, annotation-intensive, and limited in generalizability [11], highlighting the need for scalable solutions.
It is crucial to develop generalizable or foundation models capable of serving multiple LSM image segmentation tasks. Self-supervised learning (SSL) offers significant advantages in this regard, as it allows deep learning models to pretrain on large-scale, unannotated datasets, thereby learning robust and transferable representations of LSM image data. The pretrained model can then be fine-tuned on relatively small annotated datasets to address specific segmentation tasks [12], substantially reducing annotation burden while improving generalization across datasets and imaging conditions. Robust and scalable segmentation models can in turn directly facilitate a wide range of downstream biological and clinical workflows, including cell counting, axon tracing, vascular network reconstruction, spatial phenotyping, and quantitative morphometric analysis. Beyond segmentation, SSL-pretrained models can also be leveraged for other downstream tasks, such as artifact removal, image deblurring, anomaly detection, cell-type classification, and atlas-based mapping. While these applications are promising, the focus of this challenge is on segmentation.
Despite the availability of extensive LSM datasets encompassing diverse biological structures, SSL remains underexplored in the LSM domain. Notably, certain characteristics of LSM images, such as their high signal-to-noise ratio, make them particularly well-suited for SSL. The SELMA3D challenge represents a significant advancement in SSL research within the field of 3D LSM images. To the best of our knowledge, SELMA3D 2024 was the first attempt to systematically benchmark self-supervised learning approaches for LSM image segmentation. SELMA3D was successfully organized again at MICCAI in 2025, featuring progressively larger datasets, a refined challenge design, and enhanced participant support. However, the scale of the datasets, which reached several terabytes, posed a significant barrier, limiting participation and constraining timely method development.
In the 2026 edition, we will build upon the 2025 framework while lowering the entry barrier and advancing the research scope by restructuring SELMA3D into two complementary tasks. These tasks are designed to isolate two central research questions in SSL for 3D LSM: method development under tightly controlled data conditions, and performance scaling under open-data conditions.
Task 1: SSL for 3D LSM image segmentation under fixed training data conditions. To make participation substantially easier and to enable clean methodological comparisons, Task 1 will be run on a fixed, fully prepared dataset. Unlike the 2024 and 2025 editions, in which we provided large, original 3D LSM images as the unannotated dataset for SSL, this year we will instead offer preprocessed patches. These patches are cropped from the original 3D LSM volumes to a standardized patch size. To reduce data transfer overhead and the time and effort required for large-scale data downloading and preprocessing, the majority of non-informative background regions are filtered out. However, a representative subset, approximately 5% per biological structure, is intentionally retained to preserve realistic low-signal conditions and enable robustness evaluation. This strategy maintains practical feasibility while allowing participants to focus on optimizing their SSL approaches rather than on extensive data handling and preprocessing. To ensure that performance improvements can be attributed to SSL strategy and model design rather than data volume or curation, no additional data will be permitted beyond what is provided in the challenge package. This task is explicitly designed as a controlled benchmark to study SSL method development, including pretext objectives, masking/augmentation strategies, network architectures, and fine-tuning protocols, without confounding effects from heterogeneous external datasets or unequal compute/data access.
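The background-filtering step described for Task 1 can be illustrated with a minimal sketch. It is not the organizers' preprocessing pipeline: the patch records, the `fg_fraction` field, the 1% foreground threshold, and the reading of "approximately 5% per biological structure" as "keep 5% of each structure's background patches" are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def select_patches(patches, fg_threshold=0.01, keep_background=0.05, seed=0):
    """Keep informative patches plus a small background sample per structure.

    `patches` is a list of dicts with hypothetical keys 'id', 'structure',
    and 'fg_fraction' (fraction of voxels judged to contain signal).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    kept = []
    background = defaultdict(list)
    for patch in patches:
        if patch["fg_fraction"] >= fg_threshold:
            kept.append(patch)  # informative patch: always keep
        else:
            background[patch["structure"]].append(patch)
    # Retain a fixed share of background patches for each structure
    # (assumed interpretation of the 5% figure).
    for structure in sorted(background):
        group = background[structure]
        n_keep = round(keep_background * len(group))
        kept.extend(rng.sample(group, n_keep))
    return kept


# Toy metadata: 20 informative and 100 background vessel patches,
# 10 informative and 40 background nuclei patches.
patches = (
    [{"id": f"v{i}", "structure": "vessels", "fg_fraction": 0.2} for i in range(20)]
    + [{"id": f"vb{i}", "structure": "vessels", "fg_fraction": 0.0} for i in range(100)]
    + [{"id": f"n{i}", "structure": "nuclei", "fg_fraction": 0.3} for i in range(10)]
    + [{"id": f"nb{i}", "structure": "nuclei", "fg_fraction": 0.0} for i in range(40)]
)
selected = select_patches(patches)
```

Sampling within each structure, rather than over the pooled background, keeps low-signal examples available for every structure regardless of how many patches each one contributes.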
Task 2: SSL for 3D LSM image segmentation under open data conditions. In contrast, Task 2 is designed as an open-data challenge in which participants may leverage any amount of data, including the full set of original unannotated 3D LSM images we provide via links, as well as any additional public or private LSM data. Importantly, Task 2 will use the same held-out test set as Task 1, enabling a direct, apples-to-apples assessment of how data scaling and data diversity impact SSL performance when the evaluation target remains fixed. This design allows the community to quantify the practical gains from increasing pretraining data size and breadth, and to probe whether scaling laws observed in other domains translate to 3D microscopy.
In the 2026 edition, we will also strengthen the challenge’s scientific scope by expanding data diversity and difficulty. Beyond the growth in our existing collections, a key novelty is the inclusion of axon-focused LSM data from the BRAIN Initiative CONNECTS program (https://www.brain-connects.org/), which introduces fine-grained, elongated neuronal morphology and new sources of heterogeneity that are not well captured by previously included structures [17]. This axonal data complements the broader expansion of unannotated and annotated samples and provides an important new testbed for SSL generalization in settings where topology and continuity can be particularly challenging. In doing so, SELMA3D 2026 will also bring the connectomics and large-scale light-sheet microscopy communities together at MICCAI, bridging traditionally separate research ecosystems.
Moreover, beyond differences in connectivity among biological structures, this year's evaluation explicitly considers spatial density and crowding as an additional dimension of spatial complexity. For example, cell nuclei typically exhibit a dense pattern, whereas amyloid-beta (Aβ) plaques are relatively sparse; similarly, single axons are sparse structures, while brain vessels often form dense networks. Accordingly, the datasets are further stratified into four categories: isolated sparse, isolated dense, contiguous sparse, and contiguous dense structures. Model performance will be evaluated separately across these groups, providing deeper insight into where models generalize effectively and where challenges remain.
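A per-category evaluation of this kind might be sketched as follows. The metric is an assumption: the text does not name one, and the Dice coefficient is used here only as a common segmentation score. The assignment of the four example structures to categories is inferred from the examples above and is likewise hypothetical, as are all function and variable names.

```python
from collections import defaultdict
from statistics import mean

# Assumed mapping from example structures to the four strata.
CATEGORY = {
    "plaques": "isolated sparse",
    "nuclei": "isolated dense",
    "axons": "contiguous sparse",
    "vessels": "contiguous dense",
}


def dice(pred, target):
    """Dice coefficient between two flattened binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def stratified_scores(cases):
    """Average the per-case Dice score separately within each category."""
    by_category = defaultdict(list)
    for structure, pred, target in cases:
        by_category[CATEGORY[structure]].append(dice(pred, target))
    return {category: mean(scores) for category, scores in by_category.items()}


# Toy cases: (structure, predicted mask, reference mask), flattened to 1D.
cases = [
    ("nuclei", [1, 1, 0, 0], [1, 1, 0, 0]),
    ("nuclei", [1, 0, 0, 0], [1, 1, 0, 0]),
    ("axons", [0, 1, 1, 0], [0, 0, 1, 1]),
]
scores = stratified_scores(cases)
```

Reporting one score per category, instead of a single pooled average, prevents a well-segmented category with many cases from masking weak performance on another.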
Building on these improvements, we aim to optimize the challenge setting and establish a more advanced benchmark for self-supervised learning in 3D LSM image segmentation. We look forward to organizing the third edition of the SELMA3D challenge and welcoming submissions.
Files
313-SELMA3D_2026_Self-supervised_learning_for_3D_light-sheet_microscopy_2026-04-22T16-37-10.pdf (160.0 kB)
md5:8f6e4df5b3a5a98c3b0f675bf1c87ff1