| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 08:52:08 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 08:52:08 -0500 |
| commit | 0dc89449af7307516372fe2b0463c11484a695e1 (patch) | |
| tree | 73ef9f2bc29151ca5408b6564ce65f2d527bb9ce | |
| parent | 6016484e9f16e220660ed6e028f4417a26cd3fee (diff) | |
Appendix G + J: 3-seed depth scan L=4 validation + SB/CB random_targets 100ep convergence
- Appendix G: depth scan L=4 now 3-seed (s42, s123, s456). 3-seed DFA layer-0 cos
+0.412±0.011, deep cos -0.0004±0.0008, CB deep cos +0.039±0.010. Indistinguishable
from single-seed row shown in table.
- Appendix J: full 100-epoch random_targets trajectory for SB and CB:
SB: ||h_L||=3.6e5, ||g_L||=4e-8 (at floor), acc 0.100 (chance)
CB: ||h_L||=1.38e8, ||g_L||=0 (collapsed), acc 0.085 (chance)
Both reach Mode 1 (a)+(b) at 100ep, consistent with DFA's 1.67e8 / 8e-12.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| -rw-r--r-- | paper/main.pdf | bin | 474382 -> 475396 bytes | |||
| -rw-r--r-- | paper/main.tex | 4 |
2 files changed, 2 insertions, 2 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
Binary files differ
index 244824c..834f5da 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
diff --git a/paper/main.tex b/paper/main.tex
index e58a1f3..af04885 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -382,7 +382,7 @@ $12$ & Credit Bridge & $0.239$ & $+0.208$ & $+0.016$ & $+0.000$ \\
 \end{tabular}
 \end{table}
-The layerwise pattern is essentially depth-invariant. DFA's layer-0 cosine stays in $[+0.39,+0.40]$ across all five depths, while its mean deep cosine sits within $[-0.005,+0.000]$ and its deep $\rho$ collapses to numerical zero in every condition. Credit Bridge shows a slightly milder version of the same shape, with a small positive deep cosine that does not improve as depth shrinks. BP, by contrast, maintains a deep cosine of $+0.94$ even at $L{=}12$, so the BP reference is still measurably non-degenerate where DFA and Credit Bridge are flat. This rules out the explanation that DFA's deep blocks are merely too far from the loss to receive useful credit: making the network shallower does not reach the deep blocks any better. The failure is structural to the credit signal rather than an artifact of depth.
+The layerwise pattern is essentially depth-invariant. DFA's layer-0 cosine stays in $[+0.39,+0.40]$ across all five depths, while its mean deep cosine sits within $[-0.005,+0.000]$ and its deep $\rho$ collapses to numerical zero in every condition. Credit Bridge shows a slightly milder version of the same shape, with a small positive deep cosine that does not improve as depth shrinks. BP, by contrast, maintains a deep cosine of $+0.94$ even at $L{=}12$, so the BP reference is still measurably non-degenerate where DFA and Credit Bridge are flat. The $L{=}4$ row, which matches the main audit's architecture, has also been replicated across three seeds (42, 123, 456): 3-seed DFA layer-0 cosine is $+0.412 \pm 0.011$, 3-seed DFA deep cosine is $-0.0004 \pm 0.0008$, and 3-seed CB deep cosine is $+0.039 \pm 0.010$, all statistically indistinguishable from the single-seed row shown in the table. This rules out the explanation that DFA's deep blocks are merely too far from the loss to receive useful credit: making the network shallower does not reach the deep blocks any better. The failure is structural to the credit signal rather than an artifact of depth.
 \section{No-Residual Ablation: Skip Path Is Not the Proximate Trigger}
 \label{app:no_residual}
@@ -456,7 +456,7 @@ Credit Bridge & $19{,}974$ & $3.2\times 10^{-6}$ & $0.092$ \\
 \end{tabular}
 \end{table}
-The cross-method version of the test rules out the explanation that the random-target growth is specific to DFA's particular feedback projection. State Bridge and Credit Bridge use bridge constructions with target normalization and stop-gradients, so any residual-stream growth they exhibit cannot be attributed to a simple absence of normalization. Their $\|g_L\|$ values at three epochs are still well above the $10^{-7}$ floor used by diagnostic~(b), so the gradient collapse part of Mode~1 does not yet appear at this horizon for SB/CB; the activation-growth part of Mode~1 is already present. We treat this as evidence that the local-credit growth incentive is not unique to DFA but is shared by the audited family of fixed-feedback methods.
+The cross-method version of the test rules out the explanation that the random-target growth is specific to DFA's particular feedback projection. State Bridge and Credit Bridge use bridge constructions with target normalization and stop-gradients, so any residual-stream growth they exhibit cannot be attributed to a simple absence of normalization. Their $\|g_L\|$ values at three epochs are still well above the $10^{-7}$ floor used by diagnostic~(b), so the gradient collapse part of Mode~1 does not yet appear at this horizon for SB/CB; the activation-growth part of Mode~1 is already present. At the full $100$-epoch trajectory of the same random-target protocol, both SB and CB also reach the (b) floor: SB converges to $\|h_L\|\approx 3.6\times 10^5$ and $\|g_L\|\approx 4\times 10^{-8}$, and CB converges to $\|h_L\|\approx 1.38\times 10^8$ and $\|g_L\|\approx 0$ (below the numerical clamp), with test accuracies $0.100$ and $0.085$ respectively, consistent with DFA's $1.67\times 10^8$ and $8.0\times 10^{-12}$ at the same horizon. We treat this as evidence that the local-credit growth incentive is not unique to DFA but is shared by the audited family of fixed-feedback methods.
 The cleanest negative control for the random-target assay is Equilibrium Propagation, which trains the same backbone with a contrastive nudged-vs-free local energy objective rather than a fixed feedback projection. We re-ran EP on the same ResMLP-d256 with i.i.d.\ random class targets, seed 42, identical hyperparameters: EP's $\|h_L\|$ stays at about $586$ at five epochs of training and converges to about $2{,}085$ over the full $100$-epoch trajectory, which is roughly $25\times$ smaller than DFA's $14{,}510$ at three epochs and is in the same range as vanilla EP's bounded trajectory on real labels ($\sim\!5\times 10^3$). At convergence, the random-target EP run reaches headline accuracy $0.081$, headline $\Gamma{=}{-}0.0003$, and headline $\rho{=}{-}0.006$, all consistent with chance-level performance and a non-degenerate measurement regime. The random-target assay therefore separates the audited fixed-feedback methods (DFA/SB/CB) from EP cleanly: fixed-feedback objectives without an explicit scale-control term exhibit data-agnostic activation growth on this architecture, while EP's energy-based local objective does not.
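The diagnostic (b) comparison and the EP scale ratio quoted in the second hunk can be sanity-checked with a few lines of Python. This is a minimal sketch, not code from the repository: the names `G_FLOOR`, `g_norms`, and `below_floor` are hypothetical, the strict `<` comparison is an assumption about how the floor is applied, and the 3-epoch Credit Bridge value assumes that the middle column of the table row shown in the hunk header is $\|g_L\|$. All numbers are copied from the text of the diff.

```python
# Floor used by diagnostic (b), as quoted in the appendix text.
G_FLOOR = 1e-7

# (method, epochs) -> ||g_L|| under the random-target protocol.
# The ("CB", 3) entry assumes the middle table column is ||g_L||.
g_norms = {
    ("CB", 3): 3.2e-6,
    ("SB", 100): 4e-8,
    ("CB", 100): 0.0,
    ("DFA", 100): 8.0e-12,
}


def below_floor(g_norm: float, floor: float = G_FLOOR) -> bool:
    """Return True if the gradient norm is under the diagnostic (b) floor.

    The strict inequality is an assumption, not taken from the paper.
    """
    return g_norm < floor


# Which runs satisfy the gradient-collapse half of Mode 1?
collapsed = {key: below_floor(g) for key, g in g_norms.items()}

# Ratio of DFA's ||h_L|| at three epochs to EP's ||h_L|| at five epochs,
# the comparison behind the "roughly 25x smaller" statement.
ep_ratio = 14510 / 586

if __name__ == "__main__":
    for (method, epochs), flag in collapsed.items():
        print(f"{method} @ {epochs} ep: below floor = {flag}")
    print(f"DFA/EP ||h_L|| ratio = {ep_ratio:.1f}")
```

Under these assumptions the three 100-epoch runs fall below the floor while the 3-epoch Credit Bridge run does not, which matches the wording of the updated paragraph.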
