Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions
Authors/Creators
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
Contributors
Data collectors:
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), Heidelberg, Germany
Description
This data set corresponds to the paper: Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions [1] (Experiment: Comprehensive reporting).
The key research questions corresponding to this data set were:
RQ1: What is the role of challenges for the field of biomedical image analysis (e.g. how many challenges have been conducted to date, in which fields, for which algorithm categories, and based on which modalities)?
RQ2: What is common practice related to challenge design (e.g. choice of metric(s) and ranking methods, number of training/test images, annotation practice etc.)? Are there common standards?
RQ3: Does common practice related to challenge reporting allow for reproducibility and adequate interpretation of results?
To address these research questions, we aimed to capture all biomedical image analysis challenges that have been conducted up to 2016. To acquire the data, we analyzed the websites hosting/representing biomedical image analysis challenges, namely grand-challenge.org, dreamchallenges.org and kaggle.com as well as websites of main conferences in the field of biomedical image analysis, namely Medical Image Computing and Computer Assisted Intervention (MICCAI), International Symposium on Biomedical Imaging (ISBI), International Society for Optics and Photonics (SPIE) Medical Imaging, Cross Language Evaluation Forum (CLEF), International Conference on Pattern Recognition (ICPR), The American Association of Physicists in Medicine (AAPM), the Single Molecule Localization Microscopy Symposium (SMLMS) and the BioImage Informatics Conference (BII). This yielded a list of 150 challenges with 549 tasks.
Next, a tool for instantiating the challenge parameter list introduced in [1] was used by some of the authors (engineers and a medical student) to formalize all challenges that met our inclusion criteria, as follows: (1) Initially, each challenge was independently formalized by two different observers. (2) The formalization results were automatically compared; in ambiguous cases, i.e. when the observers could not agree on the instantiation of a parameter, a third observer was consulted and a decision was made. When refinements to the parameter list were made, the process was repeated for the missing values. Based on the formalized challenge data set, a descriptive statistical analysis was performed to characterize common practice related to challenge design and reporting.
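The automatic comparison step can be illustrated with a minimal Python sketch. This is not the authors' actual tooling, and all parameter names and values below are hypothetical; it simply shows how two observers' instantiations of the parameter list for one task could be compared to flag the parameters that would be passed to a third observer for adjudication.

```python
# Minimal sketch (not the authors' actual tool): compare two observers'
# instantiations of the challenge parameter list and flag every parameter
# that needs adjudication by a third observer.
# All parameter names and values below are hypothetical.

from typing import Dict, List, Tuple

Instantiation = Dict[str, str]  # parameter name -> value recorded by one observer


def find_disagreements(obs_a: Instantiation, obs_b: Instantiation) -> List[Tuple[str, str, str]]:
    """Return (parameter, value_a, value_b) for every parameter on which the
    observers disagree or that only one of them instantiated."""
    conflicts = []
    for param in sorted(set(obs_a) | set(obs_b)):
        value_a = obs_a.get(param, "<missing>")
        value_b = obs_b.get(param, "<missing>")
        if value_a != value_b:
            conflicts.append((param, value_a, value_b))
    return conflicts


if __name__ == "__main__":
    # Hypothetical instantiations for a single task
    observer_a = {"metric": "Dice similarity coefficient",
                  "n_training_cases": "30",
                  "ranking_method": "metric-based aggregation"}
    observer_b = {"metric": "Dice similarity coefficient",
                  "n_training_cases": "35"}

    for param, a, b in find_disagreements(observer_a, observer_b):
        print(f"Adjudicate '{param}': observer A = {a!r}, observer B = {b!r}")
```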
[1] Maier-Hein, L., Eisenmann, M., Reinke, A., Onogur, S., Stankovic, M., Scholz, P., Arbel, T., Bogunovic, H., Bradley, A. P., Carass, A., Feldmann, C., Frangi, A. F., Full, P. M., van Ginneken, B., Hanbury, A., Honauer, K., Kozubek, M., Landman, B. A., März, K., Maier, O., Maier-Hein, K., Menze, B. H., Müller, H., Neher, P. F., Niessen, W., Rajpoot, N., Sharp, G. C., Sirinukunwattana, K., Speidel, S., Stock, C., Stoyanov, D., Aziz Taha, A., van der Sommen, F., Wang, C.-W., Weber, M.-A., Zheng, G., Jannin, P., Kopp-Schneider, A.: Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions. arXiv preprint arXiv:1806.02051 (2018).
Files (2.8 MB)

| Size | MD5 checksum |
|---|---|
| 2.8 MB | 193fb461cb6a4f44a0141680dab990ed |
Additional details
Related works
- Is supplement to: https://arxiv.org/abs/1806.02051v1 (URL)
References
- Maier-Hein, L., et al.: Is the winner really the best? A critical analysis of common research practice in biomedical image analysis competitions. arXiv preprint arXiv:1806.02051 (2018).