Norwegian Medical Question Answering Dataset - NorMedQA
Description
This benchmark dataset consists of 1401 medical question-and-answer pairs, primarily in Norwegian (Bokmål and Nynorsk), designed for evaluating Large Language Models (LLMs). The content originates from publicly available sources containing medical exam questions and has undergone cleaning and preprocessing. The dataset is structured in JSON format. Each record contains the source document name, the question number (where available), the question text, the reference answer text, and, for multiple-choice questions, the text of the wrong answers. It is suitable for use within evaluation frameworks such as lm-evaluation-harness (GitHub repository with configuration and example code: https://github.com/kelkalot/normedqa) to assess model capabilities in medical knowledge retrieval and reasoning specific to the Norwegian context.
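As a rough illustration of the record structure described above, the following Python sketch loads the file and separates multiple-choice records from open-ended ones. The key name `wrong_answers` and the assumption that the file is a top-level JSON list are hypothetical and not confirmed by this description; inspect the JSON file or the GitHub repository for the actual schema before relying on it.

```python
import json
from pathlib import Path

# Hypothetical key name: the real key in the JSON file may differ.
WRONG_KEY = "wrong_answers"


def load_records(path):
    """Load the dataset, assuming a top-level JSON list of record objects."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of records")
    return data


def split_by_type(records, wrong_key=WRONG_KEY):
    """Split records into multiple-choice (non-empty wrong answers) and open-ended."""
    multiple_choice = [r for r in records if r.get(wrong_key)]
    open_ended = [r for r in records if not r.get(wrong_key)]
    return multiple_choice, open_ended
```

A possible use would be `mc, open_ended = split_by_type(load_records("norwegian_medical_qa_v2.json"))`, after which the two groups can be scored with different metrics.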
Files
Name | Size | Checksum |
---|---|---|
norwegian_medical_qa_v2.json | 1.1 MB | md5:3bac1a4f73270db2c701c37d8c2315d0 |
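The MD5 checksum listed above can be used to confirm that a downloaded copy is intact. A minimal sketch using only the Python standard library follows; the function names are illustrative, and the local file path is an assumption about where the file was saved.

```python
import hashlib

# Checksum as shown in the file listing (without the "md5:" prefix).
EXPECTED_MD5 = "3bac1a4f73270db2c701c37d8c2315d0"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("norwegian_medical_qa_v2.json")` should return `True` for an unmodified download saved under that name in the working directory.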