Published April 2, 2024 | Version 1.0
Dataset | Open Access

FISBe: A real-world benchmark dataset for instance segmentation of long-range thin filamentous structures

  • 1. Max Delbrück Center for Molecular Medicine
  • 2. Howard Hughes Medical Institute - Janelia Research Campus
  • 3. German Cancer Research Center

Description

General

For more details and the most up-to-date information please consult our project page: https://kainmueller-lab.github.io/fisbe.

Summary

  • A new dataset for neuron instance segmentation in 3d multicolor light microscopy data of fruit fly brains
    • 30 completely labeled (segmented) images
    • 71 partly labeled images
    • altogether comprising ∼600 expert-labeled neuron instances (labeling a single neuron takes 30–60 min on average, while a difficult one can take up to 4 hours)
  • To the best of our knowledge, the first real-world benchmark dataset for instance segmentation of long thin filamentous objects
  • A set of metrics and a novel ranking score for respective meaningful method benchmarking
  • An evaluation of three baseline methods in terms of the above metrics and score

Abstract

Instance segmentation of neurons in volumetric light microscopy images of nervous systems enables groundbreaking research in neuroscience by facilitating joint functional and morphological analyses of neural circuits at cellular resolution. Yet said multi-neuron light microscopy data exhibits extremely challenging properties for the task of instance segmentation: Individual neurons have long-ranging, thin filamentous and widely branching morphologies, multiple neurons are tightly inter-weaved, and partial volume effects, uneven illumination and noise inherent to light microscopy severely impede local disentangling as well as long-range tracing of individual neurons. These properties reflect a current key challenge in machine learning research, namely to effectively capture long-range dependencies in the data. While respective methodological research is buzzing, to date methods are typically benchmarked on synthetic datasets. To address this gap, we release the FlyLight Instance Segmentation Benchmark (FISBe) dataset, the first publicly available multi-neuron light microscopy dataset with pixel-wise annotations. In addition, we define a set of instance segmentation metrics for benchmarking that we designed to be meaningful with regard to downstream analyses. Lastly, we provide three baselines to kick off a competition that we envision to both advance the field of machine learning regarding methodology for capturing long-range data dependencies, and facilitate scientific discovery in basic neuroscience.

Dataset documentation:

We provide detailed documentation of our dataset, following the Datasheets for Datasets questionnaire:

>> FISBe Datasheet

Our dataset originates from the FlyLight project, where the authors released a large image collection of nervous systems of ~74,000 flies, available for download under CC BY 4.0 license.

Files

  • fisbe_v1.0_{completely,partly}.zip
    • contains the image and ground truth segmentation data; there is one zarr file per sample, see below for more information on how to access zarr files.
  • fisbe_v1.0_mips.zip
    • maximum intensity projections of all samples, for convenience.
  • sample_list_per_split.txt
    • a simple list of all samples and the subset they are in, for convenience.
  • view_data.py
    • a simple python script to visualize samples, see below for more information on how to use it.
  • dim_neurons_val_and_test_sets.json
    • a list of instance ids per sample that are considered to be of low intensity/dim; can be used for extended evaluation.
  • Readme.md
    • general information
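As a sketch of how the per-file metadata might be consumed, the snippet below reads the kind of mapping `dim_neurons_val_and_test_sets.json` provides. The schema shown here (sample name mapped to a list of dim instance ids) is an assumption for illustration only; please consult the datasheet for the actual layout.

```python
import json

# Assumed schema: {"<sample name>": [<dim instance ids>], ...} -- this is a
# guess for illustration; consult the datasheet for the actual layout.
example = '{"R9F03-20181030_62_B5": [3, 7]}'
dim_ids = json.loads(example)

def is_dim(sample: str, instance_id: int) -> bool:
    """Check whether an instance is flagged as dim for a given sample."""
    return instance_id in dim_ids.get(sample, [])

print(is_dim("R9F03-20181030_62_B5", 3))  # True
```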

How to work with the image files

Each sample consists of a single 3d MCFO image of neurons of the fruit fly.
For each image, we provide a pixel-wise instance segmentation for all separable neurons.
Each sample is stored as a separate zarr file (zarr is a file storage format for chunked, compressed, N-dimensional arrays based on an open-source specification).
The image data ("raw") and the segmentation ("gt_instances") are stored as two arrays within a single zarr file.
The segmentation mask for each neuron is stored in a separate channel.
The order of dimensions is CZYX.
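To make the channel-first CZYX layout concrete, here is a minimal numpy sketch; the shapes below are invented for illustration and are not the dataset's actual dimensions. "raw" holds the color channels, while "gt_instances" holds one mask channel per neuron.

```python
import numpy as np

# Invented shapes, channel-first (CZYX) as described above:
raw = np.zeros((3, 100, 200, 200), dtype=np.uint16)  # 3 color channels
gt = np.zeros((5, 100, 200, 200), dtype=np.uint8)    # one channel per neuron

def neuron_mask(gt_instances: np.ndarray, idx: int) -> np.ndarray:
    """Return the binary mask of the neuron stored in channel `idx`."""
    return gt_instances[idx] > 0

assert neuron_mask(gt, 2).shape == (100, 200, 200)
```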

We recommend working in a virtual environment, e.g., using conda:

conda create -y -n flylight-env -c conda-forge python=3.9
conda activate flylight-env

How to open zarr files

  1. Install the python zarr package: 
    pip install zarr
  2. Open a zarr file with:

    import zarr
    raw = zarr.open(<path_to_zarr>, mode='r', path="volumes/raw")
    seg = zarr.open(<path_to_zarr>, mode='r', path="volumes/gt_instances")

    # optional:
    import numpy as np
    raw_np = np.array(raw)

Zarr arrays are read lazily on-demand.
Many functions that expect numpy arrays also work with zarr arrays.
Optionally, the arrays can also explicitly be converted to numpy arrays.

How to view zarr image files

We recommend using napari to view the image data.

  1. Install napari: 
    pip install "napari[all]"
  2. Save the following Python script: 

    import zarr, sys, napari

    raw = zarr.load(sys.argv[1], path="volumes/raw")
    gts = zarr.load(sys.argv[1], path="volumes/gt_instances")

    viewer = napari.Viewer(ndisplay=3)
    for idx, gt in enumerate(gts):
      viewer.add_labels(
        gt, rendering='translucent', blending='additive', name=f'gt_{idx}')
    viewer.add_image(raw[0], colormap="red", name='raw_r', blending='additive')
    viewer.add_image(raw[1], colormap="green",  name='raw_g', blending='additive')
    viewer.add_image(raw[2], colormap="blue",  name='raw_b', blending='additive')
    napari.run()

  3. Execute: 
    python view_data.py <path-to-file>/R9F03-20181030_62_B5.zarr

Metrics

  • S: Average of avF1 and C
  • avF1: Average F1 Score
  • C: Average ground truth coverage
  • clDice_TP: Average true positives clDice
  • FS: Number of false splits
  • FM: Number of false merges
  • tp: Relative number of true positives

For more information on our selected metrics and formal definitions please see our paper.
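The per-metric definitions are in the paper, but the combination of avF1 and C into the ranking score S is simple and can be written down directly:

```python
def ranking_score(av_f1: float, coverage: float) -> float:
    """Ranking score S: the mean of avF1 and average gt coverage C."""
    return 0.5 * (av_f1 + coverage)

print(ranking_score(0.75, 0.25))  # → 0.5
```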

Baseline

To showcase the FISBe dataset together with our selection of metrics, we provide evaluation results for three baseline methods, namely PatchPerPix (ppp), Flood Filling Networks (FFN), and a non-learnt, application-specific color clustering from Duan et al.
For detailed information on the methods and the quantitative results please see our paper.

License

The FlyLight Instance Segmentation Benchmark (FISBe) dataset is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Citation

If you use FISBe in your research, please use the following BibTeX entry: 

@misc{mais2024fisbe,
  title =        {FISBe: A real-world benchmark dataset for instance
                  segmentation of long-range thin filamentous structures},
  author =       {Lisa Mais and Peter Hirsch and Claire Managan and Ramya
                  Kandarpa and Josef Lorenz Rumberger and Annika Reinke and Lena
                  Maier-Hein and Gudrun Ihrke and Dagmar Kainmueller},
  year =         2024,
  eprint =       {2404.00130},
  archivePrefix ={arXiv},
  primaryClass = {cs.CV}
}

Acknowledgments

We thank Aljoscha Nern for providing unpublished MCFO images as well as Geoffrey W. Meissner and the entire FlyLight Project Team for valuable
discussions.
P.H., L.M. and D.K. were supported by the HHMI Janelia Visiting Scientist Program.
This work was co-funded by Helmholtz Imaging.

Changelog

There have been no changes to the dataset so far.
All future changes will be listed on the changelog page.

Contributing

If you would like to contribute, have encountered any issues or have any suggestions, please open an issue for the FISBe dataset in the accompanying github repository.

All contributions are welcome!


Additional details

Related works

Is derived from
Dataset: 10.7554/eLife.80660 (DOI)

Dates

Accepted
2024-02-26
Accepted at CVPR 2024
Available
2024-04-02
Made publicly available