Round 41 (Appendix L): add per-block drift diagnostic reinforcing cos-vs-acc hypothesis

Extracted from existing round 38 JSON data without running new compute. The drift field (||W_final - W_init||_F / ||W_init||_F) is produced by cifar_resmlp.py's feature_drift() and was already saved but not used in the paper.

Key finding: CB+penalty has LARGER block updates than SB+penalty (per-block w2 drift 19.3x vs 14.3x; embed drift 44.6x vs 7.1x) yet 9.3 pp LOWER accuracy. This rules out 'CB just has smaller updates' as an alternative explanation for the cos-vs-acc dissociation.

Added 2 sentences to Appendix L paragraph 2 noting this supporting evidence for the 'angular agreement does not certify functional forward-state content' mechanism hypothesis in §4.

Main content still 9 pages exactly within E&D budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 12:22:58 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 12:22:58 -0500
commit: c201cb31018b35bf88482f7dc768b8f7a057703b (patch)
tree: d12e4640b4d0abef34c73f0f667f8a0eb026f794
parent: 35be969067396306c19a3caac2d887bcde48c5d0 (diff)
2 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index 0414d31..0626715 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
diff --git a/paper/main.tex b/paper/main.tex
index faa22df..b71d023 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -485,7 +485,7 @@ DFA+pen mean (3 seeds) & $0.363 \pm 0.001$ & $4.0\times 10^4$ & $9.0\times 10^{-
 \end{tabular}
 \end{table}
 
-The penalty rescue effect on State Bridge is much larger than on DFA: $+24$ percentage points for State Bridge versus $+5.5$ percentage points for DFA on the same architecture and intervention. SB+penalty is the first audited non-BP method whose trained deep blocks substantively beat the architecture-matched random-block baseline. We treat this as evidence that Mode~2 (low intrinsic credit-direction quality) has method-dependent severity within the audited fixed-feedback family once Mode~1 is alleviated, rather than being a uniform property of all fixed-feedback local-credit objectives. Importantly, State Bridge's deep cosine $+0.322$ is approximately twice DFA's $+0.155$ on the same intervention, but neither approaches the BP reference value of $\approx +1.0$, so this is a within-class gradation in credit-direction quality, not a claim that bridge constructions ``solve'' Mode~2. Under the same intervention Credit Bridge reaches a three-seed test accuracy of $0.360 \pm 0.003$, a three-seed deep mean cosine of $+0.679 \pm 0.008$, and a three-seed deep mean $\rho$ of $+0.464 \pm 0.025$, with $\|h_L\|\approx 5680 \pm 178$ and $\|g_L\|\approx 1.9\times 10^{-5}$ well above the diagnostic floor. Credit Bridge therefore has an even higher deep cosine than State Bridge (about $4\times$ the DFA value and roughly $2\times$ the State Bridge value), but reaches the same final accuracy as DFA+penalty and $9.3$ percentage points below State Bridge+penalty. This is a clean dissociation: within the audited fixed-feedback family under the same rescue, deep cosine and deep $\rho$ differ by more than a factor of four across methods without tracking final accuracy in the same direction, so alignment to the BP gradient is a necessary but not sufficient diagnostic of usable credit for depth. That cross-method dissociation is a direct reason the protocol in Section~\ref{sec:protocol} keeps final accuracy, layerwise credit quality, and the depth-utilization baseline as three separate reporting axes rather than collapsing them into a single headline.
+The penalty rescue effect on State Bridge is much larger than on DFA: $+24$ percentage points for State Bridge versus $+5.5$ percentage points for DFA on the same architecture and intervention. SB+penalty is the first audited non-BP method whose trained deep blocks substantively beat the architecture-matched random-block baseline. We treat this as evidence that Mode~2 (low intrinsic credit-direction quality) has method-dependent severity within the audited fixed-feedback family once Mode~1 is alleviated, rather than being a uniform property of all fixed-feedback local-credit objectives. Importantly, State Bridge's deep cosine $+0.322$ is approximately twice DFA's $+0.155$ on the same intervention, but neither approaches the BP reference value of $\approx +1.0$, so this is a within-class gradation in credit-direction quality, not a claim that bridge constructions ``solve'' Mode~2. The drift diagnostic reinforces this reading rather than contradicting it: per-block $w_2$ relative displacement after $30$ epochs is $14.3\times$ for SB+penalty and $19.3\times$ for CB+penalty (a $35\%$ gap), and the embedding layer's relative drift is $7.1\times$ for SB versus $44.6\times$ for CB (a $6\times$ gap), so CB's per-block updates are not silenced under penalty and are in fact larger in magnitude than SB's, yet CB's final accuracy is $9.3$ percentage points lower. The larger-but-less-useful parameter updates in CB are consistent with the mechanism hypothesis that angular agreement with the BP gradient does not by itself certify the functional forward-state content of the update. Under the same intervention Credit Bridge reaches a three-seed test accuracy of $0.360 \pm 0.003$, a three-seed deep mean cosine of $+0.679 \pm 0.008$, and a three-seed deep mean $\rho$ of $+0.464 \pm 0.025$, with $\|h_L\|\approx 5680 \pm 178$ and $\|g_L\|\approx 1.9\times 10^{-5}$ well above the diagnostic floor. Credit Bridge therefore has an even higher deep cosine than State Bridge (about $4\times$ the DFA value and roughly $2\times$ the State Bridge value), but reaches the same final accuracy as DFA+penalty and $9.3$ percentage points below State Bridge+penalty. This is a clean dissociation: within the audited fixed-feedback family under the same rescue, deep cosine and deep $\rho$ differ by more than a factor of four across methods without tracking final accuracy in the same direction, so alignment to the BP gradient is a necessary but not sufficient diagnostic of usable credit for depth. That cross-method dissociation is a direct reason the protocol in Section~\ref{sec:protocol} keeps final accuracy, layerwise credit quality, and the depth-utilization baseline as three separate reporting axes rather than collapsing them into a single headline.
 
 \section{Reproducibility}
 \label{app:reproducibility}
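
The drift quantity named in the commit message, ||W_final - W_init||_F / ||W_init||_F, can be illustrated with a minimal pure-Python sketch. This is not the feature_drift() implementation from cifar_resmlp.py, which is not shown in this commit; the helper names frobenius_norm and relative_drift and the toy matrices are hypothetical. The last lines only recompute the ratios of the drift values quoted in the patch.

```python
import math
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def frobenius_norm(w: Matrix) -> float:
    """Square root of the sum of squared entries of a matrix."""
    return math.sqrt(sum(x * x for row in w for x in row))


def relative_drift(w_init: Matrix, w_final: Matrix) -> float:
    """Return ||W_final - W_init||_F / ||W_init||_F (hypothetical helper)."""
    same_rows = len(w_init) == len(w_final)
    if not same_rows or any(len(a) != len(b) for a, b in zip(w_init, w_final)):
        raise ValueError("w_init and w_final must have the same shape")
    denom = frobenius_norm(w_init)
    if denom == 0.0:
        raise ValueError("relative drift is undefined for a zero initial matrix")
    diff = [[b - a for a, b in zip(ra, rb)] for ra, rb in zip(w_init, w_final)]
    return frobenius_norm(diff) / denom


# Toy matrices (made up for illustration): ||w0||_F = 5 and ||w1 - w0||_F = 5.
w0 = [[3.0, 0.0], [0.0, 4.0]]
w1 = [[3.0, 5.0], [0.0, 4.0]]
toy_drift = relative_drift(w0, w1)

# Ratios of the drift values quoted in the patch (CB+penalty over SB+penalty).
block_ratio = 19.3 / 14.3  # per-block w2 drift
embed_ratio = 44.6 / 7.1   # embedding drift
print(f"toy drift={toy_drift:.2f}, block ratio={block_ratio:.2f}, embed ratio={embed_ratio:.2f}")
```

A drift above 1 means the update displaced the weights by more than their initial norm, which is the sense in which the quoted 14.3x and 19.3x values are large.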