How do domain-agnostic question answering models trained on mixed-domain datasets (SQuAD 2.0, NewsQA, and Triv

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20431969

Published May 28, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generatio

Research goal: How do domain-agnostic question answering models trained on mixed-domain datasets (SQuAD 2.0, NewsQA, and TriviaQA) compare in performance degradation using BERT-based models on TPU hardware with batch size 16?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.0/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.0/10.

Files

Name	Checksum	Size
paper.pdf	md5:f353ecce5bd5ede1bef8f106725c9622	98.6 kB

