Published April 8, 2026 | Version v1
Publication Open

VRS: Void Rescue System for Regression in Transformers

  • 1. Independent

Description

We study regression-based token prediction in transformers, where the model predicts continuous embedding targets rather than vocabulary logits. Decoding therefore becomes geometric retrieval in embedding space, and a latent can contain useful token information while still decoding poorly if it falls in an ambiguous or weakly aligned region, which we interpret as a void. To address this failure mode, we introduce the Void Rescue System (VRS), a lightweight auxiliary decoder that maps raw regression latents to more token-aligned vectors before final decoding. Across corrected runs, VRS consistently improves both validation bits-per-byte (BPB) and nearest-neighbor top-1 accuracy (nn acc) over raw decoding. In the three corrected 10-minute Parameter Golf runs with a 0.5M Rescuer and 17.58M total parameters, the rescued system reaches 1.8667 BPB on average and 50.51% peak nn acc, with a stable BPB crossover at step 3600. Against three separately trained 10-minute regression-only baselines, which reach 2.0941–2.1301 BPB and 50.05–50.18% peak nn acc, the short-run VRS system remains clearly better under the same budget. A 1M Rescuer reaches 1.8630 BPB and 50.62% peak nn acc, crossing at step 3400, but exceeds the artifact limit. At larger scale, rescued decoding reaches 1.8480 BPB and 51.04% peak nn acc in the 41-minute base run, 1.7570 BPB and 53.35% in the 2x one-hour run, and 1.7733 BPB and 55.34% in the 4x one-hour run; the 4x system still gains 0.6411 BPB over raw decoding and crosses stably by step 1600. These results are consistent with the hypothesis that regression latents can be informative before they are directly decodable, and that a very small decoder can improve decoding from such latents.
Our aim is not to benchmark regression against standard classification models or argue that one paradigm is globally better; rather, we use regression as an experimental setting for studying transformer behavior and for testing whether a small decoder can correct regression-specific decoding errors.
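
The idea of decoding as geometric retrieval, and of a small decoder realigning latents before retrieval, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the synthetic embedding table, the rotation used to produce misaligned latents, the cosine-similarity metric, and the linear least-squares `rescue` map are hypothetical stand-ins, and none of them is the paper's Rescuer architecture, training procedure, or data. The sketch only shows that latents can carry full token information and still retrieve poorly until a small map realigns them.

```python
import numpy as np

def nn_decode(latents, embeddings):
    """Return, for each latent, the index of the most cosine-similar token embedding."""
    lat = latents / np.linalg.norm(latents, axis=1, keepdims=True)
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.argmax(lat @ emb.T, axis=1)

def nn_accuracy(latents, embeddings, targets):
    """Nearest-neighbor top-1 accuracy of latents against target token ids."""
    return float(np.mean(nn_decode(latents, embeddings) == targets))

def rescue(latents, M):
    """Hypothetical rescuer: a single linear map (an assumption, not the paper's model)."""
    return latents @ M

rng = np.random.default_rng(0)
vocab_size, dim, n_samples = 50, 16, 500

# Synthetic token embedding table and random target token ids.
embeddings = rng.normal(size=(vocab_size, dim))
targets = rng.integers(0, vocab_size, size=n_samples)

# Toy "raw" latents: the target embeddings under a fixed random rotation.
# They contain all token information but are misaligned with the table.
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
raw_latents = embeddings[targets] @ rotation

# Fit the linear rescuer on the first 400 samples, evaluate on the rest.
train, held_out = slice(0, 400), slice(400, n_samples)
M, *_ = np.linalg.lstsq(raw_latents[train], embeddings[targets[train]], rcond=None)

raw_acc = nn_accuracy(raw_latents[held_out], embeddings, targets[held_out])
rescued_acc = nn_accuracy(rescue(raw_latents[held_out], M), embeddings, targets[held_out])
print(f"raw nn acc: {raw_acc:.2f}, rescued nn acc: {rescued_acc:.2f}")
```

In this toy setting the misalignment is exactly linear, so a linear map undoes it completely; real regression latents would presumably need a learned, possibly nonlinear decoder, and the gains reported above are far smaller than in this idealized case.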

Files

vrs_void_rescue_system_for_regression_transformers.pdf

Files (1.7 MB)

Additional details