Stefanie Speidel; Lena Maier-Hein; Danail Stoyanov; Hassan Al Hajj; Gwenolé Quellec; Pierre-Henri Conze; Mathieu Lamard; Béatrice Cochener; Imanol Luengo; Abdolrahim Kadkhodamohammadi; Arnaud Huaulmé; Duygu Sarikaya; Kevin Le Mut; Pierre Jannin; Kanako Harada; Aneeq Zia; Kiran Bhattacharyya; Xi Liu; Ziheng Wang; Anthony Jarc
This is the challenge design document for the "Endoscopic Vision Challenge", accepted for MICCAI 2020.
Minimally invasive surgery using cameras to observe the internal anatomy is the preferred approach to many surgical procedures. Furthermore, other surgical disciplines rely on microscopic images. As a result, endoscopic and microscopic image processing as well as surgical vision are evolving as techniques needed to facilitate computer-assisted interventions (CAI). Algorithms that have been reported for such images include 3D surface reconstruction, salient feature motion tracking, instrument detection and activity recognition. However, common datasets for consistent evaluation and benchmarking of algorithms against each other are still missing. As a vision CAI challenge at MICCAI, our aim is to provide a formal framework for evaluating the current state of the art, gather researchers in the field and provide high-quality data with protocols for validating endoscopic vision algorithms.
Sub-Challenge 1: CATARACTS - Surgical Workflow Analysis
Surgical microscopes or endoscopes are commonly used to observe the anatomy of the organs in surgeries. Analysis of the video signals from these tools is evolving as a technique needed to empower computer-assisted interventions (CAI). A fundamental building block of such capabilities is the ability to automatically understand what the surgeon is doing throughout the surgery. In other words, recognizing the surgical activities being performed by the surgeon and segmenting videos into semantic labels that differentiate and localize tissue types and different instruments can be deemed essential steps toward CAI. The main motivation for these tasks is to design efficient solutions for surgical workflow analysis, with potential applications in post-operative analysis of the surgical intervention, surgical training and real-time decision support. Our application domain is cataract surgery. As a challenge, our aim is to provide a formal framework for evaluating new and current state-of-the-art methods and gather researchers in the field of surgical workflow analysis.
Analyzing the surgical workflow is a prerequisite for many applications in computer-assisted interventions (CAI), such as real-time decision support, surgeon skill evaluation and report generation. To do so, one crucial step is to recognize the activities being performed by the surgeon throughout the surgery. Visual features have proven effective for such tasks in recent years, so a dataset of cataract surgery videos is used for this task. We have defined twenty surgical activities for cataract procedures. This task consists of identifying the activity at time t using solely visual information from the cataract videos. In particular, it focuses on online workflow analysis of cataract surgery, where the algorithm estimates the surgical phase at time t without seeing any future information.
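The online constraint (a label for time t may depend only on frames up to t) can be illustrated with a minimal sketch. It assumes that some per-frame classifier has already produced one integer activity label per frame; the function name `online_majority_vote`, the window size and the tie-breaking rule are illustrative assumptions, not part of the challenge protocol.

```python
from collections import Counter, deque
from typing import Iterable, Iterator


def online_majority_vote(frame_labels: Iterable[int], window: int = 5) -> Iterator[int]:
    """Yield one smoothed label per frame, using only the current and past frames.

    `frame_labels` stands for the raw output of a hypothetical per-frame
    classifier. No future frame is ever consulted, so the output at time t
    is fixed as soon as frame t has been seen.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    history = deque(maxlen=window)  # keeps only the last `window` labels
    for label in frame_labels:
        history.append(label)
        counts = Counter(history)
        best = max(counts.values())
        # Break ties in favour of the most recent label among the tied ones.
        for candidate in reversed(history):
            if counts[candidate] == best:
                yield candidate
                break


raw = [0, 0, 3, 0, 1, 1, 1]  # made-up labels with one spurious "3"
smoothed = list(online_majority_vote(raw, window=3))
print(smoothed)  # [0, 0, 0, 0, 1, 1, 1]
```

Because the function is a generator that consumes frames one at a time, running it on a prefix of the video gives exactly the prefix of the full output, which is the property that distinguishes online from offline analysis.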
Sub-Challenge 2: CATARACTS - Semantic Segmentation
Video processing and understanding can be used to empower computer-assisted interventions (CAI) as well as the development of detailed post-operative analysis of the surgical intervention. A fundamental building block of such capabilities is the ability to understand and segment video frames into semantic labels that differentiate and localize tissue types and different instruments. Deep learning has advanced semantic segmentation techniques dramatically in recent years. Different papers have proposed and studied deep learning models for the task of segmenting color images into body organs and instruments. These studies, however, were performed on different datasets and at different levels of granularity, such as instrument vs. background, instrument category vs. background and instrument category vs. body organs. In this challenge, we create a fine-grained annotated dataset in which all anatomical structures and instruments are labelled, to allow for a standard evaluation of models using the same data at different granularities. We introduce a high-quality dataset for semantic segmentation in cataract surgery. We generated this dataset from the CATARACTS challenge dataset, which is publicly available. To the best of our knowledge, this dataset has the highest-quality annotation in surgical data to date.
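One way to read "the same data at different granularities" is that a single fine-grained mask can be collapsed into coarser label sets by a lookup table. The sketch below assumes this reading. The class IDs and names are hypothetical placeholders and are not the label set of the challenge dataset.

```python
from typing import Dict, List

Mask = List[List[int]]  # one integer class ID per pixel, row by row

# Hypothetical fine-grained classes, for illustration only.
FINE_CLASSES = {0: "cornea", 1: "iris", 2: "pupil", 3: "forceps", 4: "knife"}

# Coarser granularity: instrument (1) vs. background (0).
INSTRUMENT_VS_BACKGROUND: Dict[int, int] = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}

# Intermediate granularity: each instrument category keeps its own ID,
# while all anatomy is merged into background (0).
INSTRUMENT_CATEGORY_VS_BACKGROUND: Dict[int, int] = {0: 0, 1: 0, 2: 0, 3: 1, 4: 2}


def remap_mask(mask: Mask, mapping: Dict[int, int]) -> Mask:
    """Return a new mask with every fine-grained ID replaced by its coarse ID.

    Raises KeyError if the mask contains an ID that the mapping does not cover.
    """
    return [[mapping[pixel] for pixel in row] for row in mask]


fine_mask = [
    [0, 1, 3],
    [2, 4, 4],
]
print(remap_mask(fine_mask, INSTRUMENT_VS_BACKGROUND))           # [[0, 0, 1], [0, 1, 1]]
print(remap_mask(fine_mask, INSTRUMENT_CATEGORY_VS_BACKGROUND))  # [[0, 0, 1], [0, 2, 2]]
```

With this arrangement only the fine-grained annotation has to be stored, and each coarser evaluation setting is derived from it by a different mapping.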
Sub-Challenge 3: MIcro-Surgical Anastomose Workflow recognition on training sessions
Automatic and online recognition of surgical workflows is mandatory to bring computer-assisted surgery (CAS) applications inside the operating room. Depending on the type of surgery, different modalities can be used for workflow recognition. When adding multiple sensors is not possible, the information available for manual surgery is generally restricted to video. In robotic-assisted surgery, kinematic information is also available. Multimodal data is expected to make automatic recognition easier.
The “MIcro-Surgical Anastomose Workflow recognition” (MISAW) sub-challenge provides a unique dataset for online automatic recognition of surgical workflow using both kinematic and stereoscopic video information on a micro-anastomosis training task. Participants are challenged to recognize the surgical workflow online at different granularity levels (phases, steps, and activities) by taking advantage of both available modalities. Participants can submit results for the recognition of one or several granularity levels. In the case of several granularities, participants are encouraged (but not required) to submit the result of a multi-granularity workflow recognition, i.e., recognizing the different granularity levels with a single model.
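As a rough illustration of what a multi-granularity result could look like, the sketch below stores one phase, step and activity label per frame and reports a simple frame-wise agreement rate per level. Both the record layout and the agreement rate are assumptions made for this example (the official submission format and metrics are not described here), and the label strings are invented.

```python
from typing import Dict, List, NamedTuple


class FrameLabel(NamedTuple):
    """Labels for one frame at the three granularity levels."""
    phase: str
    step: str
    activity: str


def per_level_agreement(predicted: List[FrameLabel],
                        reference: List[FrameLabel]) -> Dict[str, float]:
    """Fraction of frames on which prediction and reference agree, per level."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("at least one frame is required")
    n = len(reference)
    return {
        level: sum(getattr(p, level) == getattr(r, level)
                   for p, r in zip(predicted, reference)) / n
        for level in FrameLabel._fields
    }


# Invented labels for four frames.
reference = [
    FrameLabel("suturing", "needle_insertion", "hold_needle"),
    FrameLabel("suturing", "needle_insertion", "insert_needle"),
    FrameLabel("suturing", "knot_tying", "pull_wire"),
    FrameLabel("suturing", "knot_tying", "pull_wire"),
]
predicted = [
    FrameLabel("suturing", "needle_insertion", "hold_needle"),
    FrameLabel("suturing", "needle_insertion", "hold_needle"),
    FrameLabel("suturing", "needle_insertion", "insert_needle"),
    FrameLabel("suturing", "knot_tying", "pull_wire"),
]
print(per_level_agreement(predicted, reference))
# {'phase': 1.0, 'step': 0.75, 'activity': 0.5}
```

A single model that outputs all three fields for each frame would fill the whole record at once, whereas single-granularity submissions would populate only one field.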
Sub-Challenge 4: SurgVisDom - Surgical visual domain adaptation: from virtual reality to real, clinical environments
Surgical data science is revolutionizing minimally invasive surgery. By developing algorithms to be context-aware, exciting applications to augment surgeons are becoming possible. However, there are many sensitivities around the surgical data (or health data more generally) needed to develop context-aware models. This challenge seeks to explore the potential for visual domain adaptation in surgery to overcome data privacy concerns. In particular, we propose to use videos from virtual reality simulations of clinical-like tasks to develop algorithms that recognize activities, and then to test these algorithms on videos of the same tasks in a clinical setting (i.e., porcine model).