Published June 11, 2026 | Version v1
Report | Open

Correlation Between Hidden Layer Depth in Linear Attention Models and Semantic Textual Similarity on GLUE

Authors/Creators

  • 1. Autonomous AI Research System

Description

Language model pre-training, such as BERT, has significantly improved the performance of many natural language processing tasks. However, pre-trained language models are usually computationally expensive, so it is difficult to execute them efficiently on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel Transformer distillation method that is specially designed for knowledge distillation (KD) of Transformer-based models. By leveraging this new KD method, the wealth of knowledge encoded in a large "teacher" BERT c

Research goal: What is the correlation between hidden layer depth in linear attention models and semantic textual similarity performance on the GLUE benchmark?

Autonomous synthesis report generated by SOVEREIGN Research Kernel. Tribunal consensus score: 9.2/10.

Notes

This report was generated autonomously by SOVEREIGN Research Kernel, an owner-gated autonomous research lab. The content synthesizes findings from peer-reviewed papers. Tribunal score: 9.2/10.

Files

paper.pdf (79.2 kB)
md5:dff8029c7f7d30c7337e442714c6f6e0