Dissecting BERT Layers: FFN Dual Role, Separability-Guided Layer Skip, and Interpretable Classification via Charge-Flow Learning
Description
We present a layer-level analysis framework for BERT across five GLUE tasks. Using RX (River XAI), a charge-flow-based interpretable learning framework, we replace BERT’s classifier with an interpretable network of 2–16 nodes and identify removable layers through separability analysis. Our key contributions are: (1) a separability-guided layer-skip method validated by actual BERT forward-pass experiments on all five tasks; (2) a quantitative decomposition of the FFN’s dual role (92% structural, i.e., norm normalization, vs. 8% classification-relevant), which explains why removing the FFN causes model collapse while individual layers appear “harmful” to classification; and (3) an error analysis revealing that 60–93% of misclassifications are high-confidence errors (margin > 0.3), indicating that BERT’s CLS representation itself is the bottleneck. RX is one application of a broader proprietary learning framework developed at River Lab; method specifics are subject to intellectual property protection.
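To make the high-confidence error statistic in contribution (3) concrete, the following is a minimal sketch of how such a fraction could be computed. It assumes that "margin" means the gap between the two largest predicted class probabilities; the description does not define the margin, so this definition, the function name `high_confidence_error_rate`, and the toy data are all illustrative assumptions and not the authors' procedure.

```python
import numpy as np

def high_confidence_error_rate(probs, labels, threshold=0.3):
    """Fraction of misclassified examples whose margin exceeds `threshold`.

    ASSUMPTION: margin = (largest class probability) - (second largest).
    The source only states "margin > 0.3" without defining the margin.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    preds = probs.argmax(axis=1)
    wrong = preds != labels
    if not wrong.any():
        return 0.0  # no errors, so no high-confidence errors
    # Sort each row ascending and keep the two largest probabilities.
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return float((margin[wrong] > threshold).mean())

# Toy data (hypothetical, not from the paper): 4 examples, 2 classes.
probs = np.array([[0.90, 0.10],   # predicted 0, true 1: confident error
                  [0.55, 0.45],   # predicted 0, true 1: low-margin error
                  [0.20, 0.80],   # predicted 1, true 1: correct
                  [0.70, 0.30]])  # predicted 0, true 0: correct
labels = np.array([1, 1, 1, 0])
rate = high_confidence_error_rate(probs, labels)
print(rate)
```

Under this reading, a high value means that most errors are ones the classifier is confident about, which is consistent with the claim that the representation, rather than the decision boundary, limits accuracy.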
Files
river_7 (2).pdf (295.4 kB, md5:53de7e3755af7d2e4edbba3d4237fb6d)