TechManualQA-700: A Validated Synthetic Question-Answering Benchmark for Technical Manuals
- 1. Graz University of Technology
- 2. Know-Center GmbH
Description
This archive contains the full dataset and validation artifacts associated with the ECIR 2026 paper: "A Semi-Automated Pipeline for Synthetic Generation of Grounded, Structurally-Aware QA Datasets for Technical Manuals".
The primary resource is the TechManualQA-700 benchmark, a question-answering dataset containing 700 validated pairs from 20 diverse technical manuals (e.g., home appliances, cars, power tools). The dataset is designed to evaluate Large Language Model (LLM) performance on technical documents and is balanced across three primary question types: General (for factual recall), Procedural (for multi-step guidance), and Unanswerable (for testing hallucination resistance).
Methodology Overview:
The dataset was created using a semi-automated pipeline. A long-context LLM (Gemini 2.5 Pro) generated initial candidates from full-text manuals. These candidates were then refined through an automated filtering cascade that included semantic deduplication, RAGAS grounding checks, and a GPT-4.1 model acting as a quality judge. Two human annotators validated the final selection, reaching an average Cohen's kappa of κ = 0.82.
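For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two annotators who label the same items. It is not taken from the project's analysis code; the function name `cohens_kappa` and the example labels are illustrative assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels.

    Illustrative helper (hypothetical name), not the pipeline's own code.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equally long")

    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels for four items, purely for demonstration.
annotator_a = ["ok", "ok", "bad", "ok"]
annotator_b = ["ok", "bad", "bad", "ok"]
kappa = cohens_kappa(annotator_a, annotator_b)  # observed 0.75, chance 0.5
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is lower than the simple fraction of matching labels.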
Contents of this Archive:
- TechManualQA_700.jsonl: The complete benchmark dataset containing 700 question-answering pairs.
- human_annotation/: A directory containing the raw, filled-out Excel files from our two annotators (A and B) for both the general and procedural audit tasks. This data is provided for full reproducibility of our inter-rater reliability analysis.
- README.md: A file explaining the schema of the main JSONL dataset file.
- LICENSE: The dataset is provided under the CC-BY 4.0 license.
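As a starting point for working with the JSONL file, the sketch below reads one JSON object per line and tallies records by a chosen field. The helper names `load_jsonl` and `count_by` are illustrative, and the field name `question_type` used in the note afterwards is an assumption; the actual schema is documented in README.md.

```python
import json
from collections import Counter


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def count_by(records, key):
    """Count records per value of `key`; records without the key fall under None."""
    return Counter(record.get(key) for record in records)
```

Assuming the archive has been extracted to the working directory, `count_by(load_jsonl("TechManualQA_700.jsonl"), "question_type")` would give a per-type tally if the records carry a field by that (hypothetical) name.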
The source code for the generation and analysis pipeline is available on GitHub at https://github.com/tduricic/techmanualqa.
Files
| Name | Size | Checksum |
|---|---|---|
| TechManualQA_700.zip | 167.4 kB | md5:cbd7984bd8506777d3d805c7acff4e4f |
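The MD5 checksum listed for TechManualQA_700.zip can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib`; the function names and the local file path are assumptions.

```python
import hashlib

# Checksum as listed on this record for TechManualQA_700.zip.
EXPECTED_MD5 = "cbd7984bd8506777d3d805c7acff4e4f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """True if the file at `path` matches the expected MD5 checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_download("TechManualQA_700.zip")` on the downloaded archive (path assumed) should return `True` when the file matches the listed checksum.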
Additional details
Software
- Repository URL: https://github.com/tduricic/techmanualqa
- Programming language: Python
- Development Status: Active