DMIS Lab at MedHopQA-2025: Ensemble Multi-Retrieval Methodologies with Reasoning Language Model Decision

Jung, Jongmyung; Hwang, Hyeongsoon; Park, Yein; Song, Minju; Yoon, Jaehoon; Hwang, Hyeon; Lee, Sanghoon; Sohn, Jiwoong; Kang, Jaewoo

doi:10.5281/zenodo.16875789

Published August 14, 2025 | Version v1

Conference proceeding, open access

1. Department of Computer Science and Engineering, Korea University
2. College of Medicine, Hanyang University
3. AIGEN Sciences
4. Department of Biosystems Science and Engineering, ETH Zurich

Abstract

Robust and trustworthy biomedical question answering (QA) remains a critical challenge for large language models (LLMs), especially in complex domains such as rare diseases, where information is fragmented across multiple sources. The MedHopQA 2025 benchmark introduces 10,000 multi-step reasoning questions curated from Wikipedia, requiring systems to extract, connect, and synthesize biomedical knowledge across interlinked documents. In this work, we present a retrieval-augmented generation (RAG) and decision-making framework that integrates diverse retrieval strategies, including Query2Doc-based, Rationale-based, and Web-augmented retrieval, and employs a dedicated decision-maker model to select or directly generate the most accurate and well-reasoned answers. Our system not only leverages evidence from both Wikipedia and the web but also explicitly evaluates and compares candidate answers to ensure answer reliability and reasoning transparency. Experiments on the test set demonstrate that our ensemble approach achieves state-of-the-art performance, highlighting the importance of hybrid retrieval and robust decision-making in advancing biomedical multi-step QA.
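
The select-or-generate flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Candidate`, `normalize`, and `decide` are hypothetical, and a simple majority vote stands in for the reasoning language model that acts as decision maker in the paper. Each candidate is assumed to come from one retrieval strategy (for example Query2Doc-based, Rationale-based, or Web-augmented), and the optional `fallback` callable is an assumed hook for generating an answer directly when the candidates disagree.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One candidate answer produced by a single retrieval strategy (hypothetical type)."""
    strategy: str                                       # e.g. "query2doc", "rationale", "web"
    answer: str                                         # candidate answer text
    evidence: List[str] = field(default_factory=list)   # supporting passages, if any


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(answer.lower().split())


def decide(
    question: str,
    candidates: List[Candidate],
    fallback: Optional[Callable[[str], str]] = None,
) -> str:
    """Stand-in decision maker (majority vote, NOT a reasoning LLM).

    Selects an answer that at least two strategies agree on. If no such
    answer exists, calls `fallback` to generate one directly; without a
    fallback, returns the first non-empty candidate (or "" if none).
    """
    votes = Counter(normalize(c.answer) for c in candidates if c.answer.strip())
    if votes:
        best, count = votes.most_common(1)[0]
        if count >= 2 or fallback is None:
            # Return the original surface form of the winning answer.
            for c in candidates:
                if normalize(c.answer) == best:
                    return c.answer
    return fallback(question) if fallback is not None else ""
```

In a fuller system, `decide` would be replaced by a prompt to a reasoning model that receives the question, each candidate, and its evidence; the vote here only shows where selection and direct generation fit in the control flow.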

This article is part of the Proceedings of the BioCreative IX Challenge and Workshop (BC9): Large Language Models for Clinical and Biomedical NLP at the International Joint Conference on Artificial Intelligence (IJCAI).

Resource type: Conference proceeding

Publisher: Zenodo

Conference: International Joint Conference on Artificial Intelligence (IJCAI), Montreal, Canada, 2025

License: Creative Commons Attribution 4.0 International

The Creative Commons Attribution license allows re-distribution and re-use of a licensed work on the condition that the creator is appropriately credited.

Technical metadata

Created: August 14, 2025
Modified: August 14, 2025
