ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models

Wang, Jason Z

doi:10.5281/zenodo.20514339

Published June 2, 2026 | Version v1

Preprint | Open access


This preprint introduces ERRORQUAKE, a benchmark and analysis framework for measuring not only whether large language models make
errors, but how severe those errors are. The study evaluates 21 open-weight LLMs on 10,000 queries across 8 domains and 5
difficulty tiers using a continuous 9-level error severity scale.

ERRORQUAKE treats error severity as a heavy-tailed distribution. Borrowing from the Gutenberg-Richter law in seismology, it summarizes each model's severity distribution with a tail index that characterizes how often the model produces high-severity failures. The paper reports that models with similar aggregate error rates can differ substantially in the severity profile of their mistakes. Some pairs of models with matched accuracy even have disjoint confidence intervals for this severity distribution index.
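The description does not state how the tail index is estimated. In seismology, the Gutenberg-Richter law is written log10 N(>= m) = a - b*m, and the b-value is commonly estimated with Aki's maximum-likelihood formula, b = log10(e) / (mean(m) - m_c), where m_c is the lower threshold.

The following is a minimal sketch of that estimator. It assumes severity scores at or above a threshold can be treated like earthquake magnitudes. The function name `gr_b_value` and the parameters `m_c` and `bin_width` are hypothetical and are not taken from ERRORQUAKE. Under this reading, a smaller b means a heavier tail, that is, relatively more high-severity errors.

```python
import math
from typing import Sequence


def gr_b_value(severities: Sequence[float], m_c: float, bin_width: float = 0.0) -> float:
    """Aki maximum-likelihood b-value for scores at or above threshold m_c.

    Hypothetical helper: assumes severities above m_c follow a
    Gutenberg-Richter (exponential) law. If scores are recorded in
    discrete bins of width `bin_width`, Utsu's correction lowers the
    threshold by half a bin.
    """
    tail = [s for s in severities if s >= m_c]  # keep only the tail
    if len(tail) < 2:
        raise ValueError("need at least two scores at or above m_c")
    mean_excess = sum(tail) / len(tail) - (m_c - bin_width / 2)
    if mean_excess <= 0:
        raise ValueError("scores do not exceed the threshold")
    return math.log10(math.e) / mean_excess


if __name__ == "__main__":
    # Hypothetical severities for two models with the same number of errors.
    model_a = [1.5, 2.0, 1.5, 2.0]  # errors cluster just above threshold
    model_b = [1.0, 1.0, 1.0, 5.0]  # one error far out in the tail
    print(gr_b_value(model_b, m_c=1.0) < gr_b_value(model_a, m_c=1.0))  # True
```

With identical error counts, the model whose errors sit further above the threshold receives the smaller b. This is the kind of difference a scalar error rate cannot show.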

The work contributes a severity-aware evaluation paradigm for open-weight LLMs, distributional evidence that scalar accuracy can
obscure important differences in model risk, and robustness analyses including bootstrap confidence intervals, sensitivity checks,
and human-audit validation.
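The description mentions bootstrap confidence intervals and matched-accuracy pairs with disjoint intervals, but not the exact procedure. The sketch below is a generic percentile bootstrap, not the paper's method. It reuses an Aki-style b-value as a stand-in for the tail index. It resamples one model's tail scores with replacement, recomputes the index on each resample, and then checks whether two models' intervals overlap. The names `bootstrap_b_ci` and `cis_disjoint`, and the defaults `n_boot=2000` and `alpha=0.05`, are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def _b_value(tail: Sequence[float], m_c: float, bin_width: float) -> float:
    # Aki maximum-likelihood b-value on scores already filtered to >= m_c.
    excess = sum(tail) / len(tail) - (m_c - bin_width / 2)
    return math.log10(math.e) / excess if excess > 0 else math.inf


def bootstrap_b_ci(severities: Sequence[float], m_c: float, bin_width: float = 0.0,
                   n_boot: int = 2000, alpha: float = 0.05,
                   seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) interval for the b-value (hypothetical helper)."""
    rng = random.Random(seed)
    tail = [s for s in severities if s >= m_c]
    if len(tail) < 2:
        raise ValueError("need at least two scores at or above m_c")
    stats: List[float] = []
    for _ in range(n_boot):
        sample = rng.choices(tail, k=len(tail))  # resample with replacement
        b = _b_value(sample, m_c, bin_width)
        if math.isfinite(b):  # skip degenerate resamples sitting exactly at m_c
            stats.append(b)
    if not stats:
        raise ValueError("all bootstrap resamples were degenerate")
    stats.sort()
    last = len(stats) - 1
    lo = stats[math.floor(alpha / 2 * last)]
    hi = stats[math.ceil((1 - alpha / 2) * last)]
    return lo, hi


def cis_disjoint(ci_a: Tuple[float, float], ci_b: Tuple[float, float]) -> bool:
    """True when the two intervals do not overlap."""
    return ci_a[1] < ci_b[0] or ci_b[1] < ci_a[0]


if __name__ == "__main__":
    # Hypothetical scores: same error count, very different tails.
    near = [1.1, 1.2] * 100
    far = [5.0, 6.0] * 100
    ci_near = bootstrap_b_ci(near, m_c=1.0)
    ci_far = bootstrap_b_ci(far, m_c=1.0)
    print(ci_near, ci_far, cis_disjoint(ci_near, ci_far))  # last value: True
```

A fixed `seed` makes the intervals reproducible across runs. Disjoint percentile intervals are a conservative, informal signal of a difference, not a formal hypothesis test.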

Files

errorquake.pdf (853.5 kB), md5:b5055b44e44a54588fd19956dd8cbb47

Additional details

Programming language: Python

