ODIN2026 Challenge: Multimodal Text Report Generation for Oral and Dental Image Analysis

Bolelli, Federico; Ben-Hamadou, Achraf; Lumetti, Luca; Pujades Rocamora, Sergi; van Nistelrooij, Niels; Marchesini, Kevin; Cremonini, Francesca; Di Bartolomeo, Mattia; Fix, Lucas; Morelli, Nicola; Rekik, Ahmed; Neifar, Nour; Abida, Ons; Smaoui, Oussama; Xi, Ton; Vinayahalingam, Shankeeth; Lombardo, Luca; Anesi, Alexandre; Grana, Costantino

doi:10.5281/zenodo.19727377

Published April 24, 2026 | Version v1

Resource type: Other | Access: Open

1. Università degli Studi di Modena e Reggio Emilia
2. Digital Research Centre of Sfax
3. University of Modena and Reggio Emilia
4. Inria
5. Univ. Grenoble Alpes
6. CNRS
7. Grenoble INP
8. LJK
9. Radboud University Medical Center
10. University of Ferrara
11. Biotech Dental Group

Three-dimensional imaging is now routine in dentistry and maxillofacial surgery, where cone-beam computed tomography (CBCT) and intraoral scanning (IOS) support diagnosis, treatment planning, and follow-up across workflows such as implantology, tooth removal, and orthodontics. CBCT captures internal dental and craniofacial anatomy (e.g., bone quality/quantity and proximity to critical structures), while IOS provides highly accurate surface geometry of crowns and gingiva. Yet despite the increasing availability of rich 3D (and complementary 2D) data, producing structured, high-quality clinical reports remains largely manual: it is time-consuming for clinicians and prone to inter-observer variability, which creates a clear bottleneck for scalable, consistent clinical decision support.

From a technical perspective, the community has made substantial progress on foundational subtasks such as 3D segmentation, landmarking, and multimodal registration, including methods that come close to full automation in constrained settings. Yet these advances have not translated into standardized, end-to-end benchmarks that directly evaluate systems transforming multimodal oral imaging into clinically meaningful textual descriptions. Compared with the growing body of work on 2D image-to-report generation, multimodal 3D-to-text reporting is still underexplored, largely because of the added complexity of learning and summarizing 3D spatial relationships across volumes, meshes, and photographs.

ODIN 2026 addresses this gap by introducing a dedicated benchmark for multimodality-to-text report generation, building on the ToothFairy and 3DTeethLand/Seg series. The challenge targets two clinically intertwined scenarios: (1) maxillofacial and surgical planning report generation from CBCT (ToothFairy4), and (2) orthodontic report generation from IOS meshes and intraoral photographs (Bite2Text). By design, both tasks use multi-center training data and a hidden test set from an independent center to rigorously assess robustness to domain shift, an essential requirement for real-world deployment.

The envisioned impact is twofold. Biomedically, ODIN 2026 aims to accelerate and standardize documentation by enabling automatic draft reports that improve consistency across centers, reduce clinician workload, and support less experienced practitioners by systematically surfacing critical findings and risks. Technically, the challenge will catalyze research in multimodal 3D-to-text learning by benchmarking models that must jointly reason over high-dimensional volumes, surface meshes, 2D photographs, and clinical language, while producing template-constrained outputs suitable for clinical use. By releasing training resources and evaluation protocols and combining automatic text metrics with expert review, ODIN 2026 is designed to drive clinically grounded innovation at the intersection of dental imaging, multimodal learning, and language generation.

Files (164.6 kB)

Name	Size
293-ODIN2026_Challenges_-_Multimodal_Text_Report_Generation_for_2026-04-22T16-36-59.pdf (md5:e9a80625577bc7574d9c523fed5a59d0)	164.6 kB
