diff options
| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 12:53:00 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 12:53:00 -0500 |
| commit | 3919f5a7db4abb699af150107bb1a2c587c14cfd (patch) | |
| tree | 8e150567bde28ba316a077b829fa36e9137a60c0 | |
| parent | c201cb31018b35bf88482f7dc768b8f7a057703b (diff) | |
Round 41 (Appendix L): add nudging-test functional evidence for cos-vs-acc dissociation
The nudging test values were already in the round 38 JSON under
diag['nudging']['0.01'] but never used. Extracted and added to Appendix L:
SB+penalty: deep nudge delta = -1.78e-3 (per-layer, eta=0.01)
CB+penalty: deep nudge delta = -0.45e-3 (per-layer, eta=0.01)
A single step of size eta=0.01 in each method's credit direction decreases
the test loss by 1.78e-3 (SB) vs 0.45e-3 (CB) — a 4x gap in functional
loss decrease that EXACTLY INVERTS the 4x deep-cosine gap between the
methods. This is the direct functional measurement for the
'angular agreement is not sufficient' claim.
Combined with the drift diagnostic (larger CB updates), the cos-vs-acc
mechanism hypothesis now has THREE independent pieces of support:
1. Test accuracy (headline: CB same as DFA, SB higher)
2. Parameter drift (CB larger updates than SB)
3. Nudging functional loss decrease (CB 4x smaller than SB)
Zero new compute — all from existing round 38 JSON data.
Main content still 9 pages exactly within E&D budget.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| -rw-r--r-- | paper/main.pdf | bin | 483268 -> 484316 bytes | |||
| -rw-r--r-- | paper/main.tex | 2 |
2 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.pdf b/paper/main.pdf Binary files differindex 0626715..814a77b 100644 --- a/paper/main.pdf +++ b/paper/main.pdf diff --git a/paper/main.tex b/paper/main.tex index b71d023..2854cca 100644 --- a/paper/main.tex +++ b/paper/main.tex @@ -485,7 +485,7 @@ DFA+pen mean (3 seeds) & $0.363 \pm 0.001$ & $4.0\times 10^4$ & $9.0\times 10^{- \end{tabular} \end{table} -The penalty rescue effect on State Bridge is much larger than on DFA: $+24$ percentage points for State Bridge versus $+5.5$ percentage points for DFA on the same architecture and intervention. SB+penalty is the first audited non-BP method whose trained deep blocks substantively beat the architecture-matched random-block baseline. We treat this as evidence that Mode~2 (low intrinsic credit-direction quality) has method-dependent severity within the audited fixed-feedback family once Mode~1 is alleviated, rather than being a uniform property of all fixed-feedback local-credit objectives. Importantly, State Bridge's deep cosine $+0.322$ is approximately twice DFA's $+0.155$ on the same intervention, but neither approaches the BP reference value of $\approx +1.0$, so this is a within-class gradation in credit-direction quality, not a claim that bridge constructions ``solve'' Mode~2. The drift diagnostic reinforces this reading rather than contradicting it: per-block $w_2$ relative displacement after $30$ epochs is $14.3\times$ for SB+penalty and $19.3\times$ for CB+penalty (a $35\%$ gap), and the embedding layer's relative drift is $7.1\times$ for SB versus $44.6\times$ for CB (a $6\times$ gap), so CB's per-block updates are not silenced under penalty and are in fact larger in magnitude than SB's, yet CB's final accuracy is $9.3$ percentage points lower. The larger-but-less-useful parameter updates in CB are consistent with the mechanism hypothesis that angular agreement with the BP gradient does not by itself certify the functional forward-state content of the update. Under the same intervention Credit Bridge reaches a three-seed test accuracy of $0.360 \pm 0.003$, a three-seed deep mean cosine of $+0.679 \pm 0.008$, and a three-seed deep mean $\rho$ of $+0.464 \pm 0.025$, with $\|h_L\|\approx 5680 \pm 178$ and $\|g_L\|\approx 1.9\times 10^{-5}$ well above the diagnostic floor. Credit Bridge therefore has an even higher deep cosine than State Bridge (about $4\times$ the DFA value and roughly $2\times$ the State Bridge value), but reaches the same final accuracy as DFA+penalty and $9.3$ percentage points below State Bridge+penalty. This is a clean dissociation: within the audited fixed-feedback family under the same rescue, deep cosine and deep $\rho$ differ by more than a factor of four across methods without tracking final accuracy in the same direction, so alignment to the BP gradient is a necessary but not sufficient diagnostic of usable credit for depth. That cross-method dissociation is a direct reason the protocol in Section~\ref{sec:protocol} keeps final accuracy, layerwise credit quality, and the depth-utilization baseline as three separate reporting axes rather than collapsing them into a single headline. +The penalty rescue effect on State Bridge is much larger than on DFA: $+24$ percentage points for State Bridge versus $+5.5$ percentage points for DFA on the same architecture and intervention. SB+penalty is the first audited non-BP method whose trained deep blocks substantively beat the architecture-matched random-block baseline. We treat this as evidence that Mode~2 (low intrinsic credit-direction quality) has method-dependent severity within the audited fixed-feedback family once Mode~1 is alleviated, rather than being a uniform property of all fixed-feedback local-credit objectives. Importantly, State Bridge's deep cosine $+0.322$ is approximately twice DFA's $+0.155$ on the same intervention, but neither approaches the BP reference value of $\approx +1.0$, so this is a within-class gradation in credit-direction quality, not a claim that bridge constructions ``solve'' Mode~2. The drift diagnostic reinforces this reading rather than contradicting it: per-block $w_2$ relative displacement after $30$ epochs is $14.3\times$ for SB+penalty and $19.3\times$ for CB+penalty (a $35\%$ gap), and the embedding layer's relative drift is $7.1\times$ for SB versus $44.6\times$ for CB (a $6\times$ gap), so CB's per-block updates are not silenced under penalty and are in fact larger in magnitude than SB's, yet CB's final accuracy is $9.3$ percentage points lower. The larger-but-less-useful parameter updates in CB are consistent with the mechanism hypothesis that angular agreement with the BP gradient does not by itself certify the functional forward-state content of the update. The nudging test at the same checkpoints provides the direct functional measurement: taking a small step of size $\eta{=}0.01$ in the direction of each method's per-layer credit $a_l$ decreases the test loss by $-1.78\times 10^{-3}$ on average over the deep blocks for SB+penalty and by only $-0.45\times 10^{-3}$ for CB+penalty, a $4\times$ gap in functional loss decrease that inverts the $4\times$ deep-cosine gap between the two methods. At the same per-layer credit direction, a step in SB's direction moves the loss about four times more than a step in CB's direction, even though CB's direction is more aligned with the BP gradient in angle. Under the same intervention Credit Bridge reaches a three-seed test accuracy of $0.360 \pm 0.003$, a three-seed deep mean cosine of $+0.679 \pm 0.008$, and a three-seed deep mean $\rho$ of $+0.464 \pm 0.025$, with $\|h_L\|\approx 5680 \pm 178$ and $\|g_L\|\approx 1.9\times 10^{-5}$ well above the diagnostic floor. Credit Bridge therefore has an even higher deep cosine than State Bridge (about $4\times$ the DFA value and roughly $2\times$ the State Bridge value), but reaches the same final accuracy as DFA+penalty and $9.3$ percentage points below State Bridge+penalty. This is a clean dissociation: within the audited fixed-feedback family under the same rescue, deep cosine and deep $\rho$ differ by more than a factor of four across methods without tracking final accuracy in the same direction, so alignment to the BP gradient is a necessary but not sufficient diagnostic of usable credit for depth. That cross-method dissociation is a direct reason the protocol in Section~\ref{sec:protocol} keeps final accuracy, layerwise credit quality, and the depth-utilization baseline as three separate reporting axes rather than collapsing them into a single headline. \section{Reproducibility} \label{app:reproducibility} |
