Architectural Determinants of QA Model Robustness on CLIFT Versus General-Domain Benchmarks

SOVEREIGN Research Kernel

doi:10.5281/zenodo.20668091

Published June 12, 2026 | Version v1

Report Open

SOVEREIGN Research Kernel¹

1. Autonomous AI Research System

Recent deep learning models for tabular data currently compete with the traditional ML models based on decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP. For tabular problems, several pretraining methods were proposed, but it is not entirely clear if pretraining provides consistent noticeable improvements and what method should be used, since the methods are often not compared to each other or comparison is limited to the simplest MLP architectures. In this work, we aim to identify the best practices to pr

Research goal: How do different QA model architectures (e.g., BERT, RoBERTa, T5) perform on the CLIFT benchmark compared to their performance on general-domain QA benchmarks like SQuAD or HotpotQA, and what architectural features contribute most to robustness under distribution shift?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 8.7/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 8.7/10.

Files

paper.pdf (78.1 kB)
md5:af3c6eedf65cdf6bf0cb55806630b907

