There is a newer version of the record available.

Published May 5, 2025 | Version v5
Dataset Open

Norwegian Medical Question Answering Dataset - NorMedQA

  • 1. Simula Research Laboratory

Contributors

Contact person:

  • 1. Simula Research Laboratory

Description

This benchmark dataset consists of 1401 medical question-and-answer pairs, primarily in Norwegian (Bokmål and Nynorsk), designed for evaluating Large Language Models (LLMs). The content originates from publicly available sources containing medical exam questions and has been cleaned and preprocessed. The dataset is structured in JSON format. Each record contains the source document name, the question number (where available), the question text, the reference answer text, and, for multiple-choice questions, the text of the incorrect answers. It is suitable for use within evaluation frameworks such as lm-evaluation-harness (a GitHub repository with a config and code example is available at https://github.com/kelkalot/normedqa) to assess model capabilities in medical knowledge retrieval and reasoning specific to the Norwegian context.
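The record does not list the exact JSON key names, so a first step before wiring the file into an evaluation framework is to inspect its schema. The following is a minimal sketch, assuming the file has been downloaded locally under its published name and that its top level is a JSON array of record objects; both are assumptions, and the helper names `parse_records` and `field_counts` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def parse_records(text):
    """Parse JSON text and return the records as a list of dicts."""
    data = json.loads(text)
    # Assumption: the top level is a JSON array of record objects.
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records at the top level")
    return data


def field_counts(records):
    """Count how many records carry each key, to discover the actual schema."""
    counts = Counter()
    for record in records:
        counts.update(record.keys())
    return dict(counts)


if __name__ == "__main__":
    # Assumption: the dataset file sits in the current working directory.
    path = Path("norwegian_medical_qa_v2.json")
    if path.exists():
        records = parse_records(path.read_text(encoding="utf-8"))
        print("records:", len(records))
        for key, count in sorted(field_counts(records).items()):
            print(f"{key}: present in {count} records")
```

Keys that appear in fewer records than the total are likely the optional ones, such as the question number or the incorrect answers for multiple-choice items.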

Files

norwegian_medical_qa_v2.json

Files (1.1 MB)

Name: norwegian_medical_qa_v2.json
Size: 1.1 MB
md5:3bac1a4f73270db2c701c37d8c2315d0
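The MD5 checksum listed for the file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module; the local file path is an assumption, and the helper names `read_chunks` and `md5_of_chunks` are hypothetical.

```python
import hashlib
from pathlib import Path

# Checksum as listed on the record page for norwegian_medical_qa_v2.json.
EXPECTED_MD5 = "3bac1a4f73270db2c701c37d8c2315d0"


def read_chunks(path, chunk_size=1 << 20):
    """Yield the file's bytes in chunks so large files are not read at once."""
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            yield chunk


def md5_of_chunks(chunks):
    """Return the hexadecimal MD5 digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumption: the dataset file sits in the current working directory.
    path = Path("norwegian_medical_qa_v2.json")
    if path.exists():
        actual = md5_of_chunks(read_chunks(path))
        print("checksum OK" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
```

A mismatch usually indicates a truncated download or a different version of the file than the one this record describes.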