Fix bibtex citation key: refinetti2023align -> refinetti2023aligning (matches bibitem)

author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 06:24:28 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 06:24:28 -0500
commit: afc2821acceb11d50b74d68584b1bf8378adc9c7 (patch)
tree: b70475285b7d04c0b67a204e0c5d41edc921b752
parent: b5e572feacc1b37a47ec2622e69d70a0a1cc3b24 (diff)
2 files changed, 2 insertions, 2 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index 1c94c36..bd881ce 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
diff --git a/paper/main.tex b/paper/main.tex
index 2d2d3fe..d58e53a 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -85,7 +85,7 @@ The matched same-backbone causal control for diagnostic~(b) is removing terminal
 \section{Failure Mode 2: Low Intrinsic Credit-Direction Quality}
 \label{sec:mode2}
 
-The second failure mode appears even in the meaningful-measurement regime. At the earliest vanilla DFA checkpoints on ResMLP, the hidden backpropagated gradient at the first deep block remains above the numerical floor: at epoch 1, $\|g_2\|$ is $6.7\times 10^{-7}$, $6.5\times 10^{-7}$, and $3.9\times 10^{-7}$ across the three seeds, all above the $10^{-7}$ threshold used to distinguish measurable from collapsed gradients. Yet the corresponding deep-layer cosine values are already essentially null: across layers $1$--$4$, all seed-level measurements at epoch 1 lie in $[-0.04,+0.02]$, with a three-seed mean of $-0.008 \pm 0.013$, and by epoch 2 the deep mean is still only $-0.018 \pm 0.018$ (Table~\ref{tab:mode_validation}). This is the observational pattern predicted by low credit-direction quality rather than mere disappearance of signal: the gradient is still present enough to measure, but the directions delivered to the deep network carry little agreement with backpropagation, consistent with prior concerns that alternative feedback rules can fail by supplying poor credit assignments even before full collapse \citep{bartunov2018assessing,moskovitz2018feedback,crafton2019backpropagation,refinetti2023align}. This rules out the simplest objection that the deep-layer null result is merely a byproduct of collapse.
+The second failure mode appears even in the meaningful-measurement regime. At the earliest vanilla DFA checkpoints on ResMLP, the hidden backpropagated gradient at the first deep block remains above the numerical floor: at epoch 1, $\|g_2\|$ is $6.7\times 10^{-7}$, $6.5\times 10^{-7}$, and $3.9\times 10^{-7}$ across the three seeds, all above the $10^{-7}$ threshold used to distinguish measurable from collapsed gradients. Yet the corresponding deep-layer cosine values are already essentially null: across layers $1$--$4$, all seed-level measurements at epoch 1 lie in $[-0.04,+0.02]$, with a three-seed mean of $-0.008 \pm 0.013$, and by epoch 2 the deep mean is still only $-0.018 \pm 0.018$ (Table~\ref{tab:mode_validation}). This is the observational pattern predicted by low credit-direction quality rather than mere disappearance of signal: the gradient is still present enough to measure, but the directions delivered to the deep network carry little agreement with backpropagation, consistent with prior concerns that alternative feedback rules can fail by supplying poor credit assignments even before full collapse \citep{bartunov2018assessing,moskovitz2018feedback,crafton2019backpropagation,refinetti2023aligning}. This rules out the simplest objection that the deep-layer null result is merely a byproduct of collapse.
 
 A second metric with different numerical failure modes tells the same story. Cosine measures directional agreement with the BP gradient, whereas perturbation correlation $\rho$ measures whether the proposed update predicts the correct sign and relative magnitude of loss change under actual perturbations; their failure modes are therefore different, especially with respect to normalization and small-denominator effects. In our controls, $\rho$ behaves as expected, with a Taylor-ceiling positive control near $+0.997$ and a random-vector negative control near $+0.006$ (Figure~\ref{fig:penalty_rescue}, Table~\ref{tab:mode_validation}). On vanilla DFA, deep $\rho$ is likewise null: for the early checkpoints where the gradients remain measurable, the deep average is $-0.003 \pm 0.005$ across seeds and epochs, and in a floor-level checkpoint it is $+0.002$, again indistinguishable from noise. The agreement between cosine and $\rho$ therefore rules out the interpretation that the null deep result is an artifact of cosine's $\varepsilon$-clamp or vector normalization. The deep blocks are not just hard to measure; they are receiving weakly useful directions.
 
@@ -113,7 +113,7 @@ Fresh-$B$ null control & $\overline{\cos}_{deep}{=}+0.002{\pm}0.022$ ($n{=}20$ d
 \end{tabular}
 \end{table}
 
-Once the reference vector is meaningful again, the deep layers no longer sit exactly at null. At $\lambda{=}10^{-2}$, penalized DFA reaches a three-seed deep-layer mean cosine of $+0.155 \pm 0.025$ and deep perturbation correlation of $+0.080 \pm 0.011$, whereas vanilla DFA is essentially zero on both metrics in the deep blocks, consistent with prior concerns that alternative feedback can fail by supplying poor credit directions even before full collapse \citep{bartunov2018assessing,moskovitz2018feedback,crafton2019backpropagation,refinetti2023align}. The null calibration rules out the interpretation that this recovered signal is merely measurement noise: on the same penalized checkpoint, replacing the training-time feedback matrices with 20 fresh random $B_l$ draws gives a deep cosine of only $+0.002 \pm 0.022$, with per-layer standard deviations of $0.013$--$0.023$, all within noise of zero (Table~\ref{tab:mode_validation}). The $\lambda$ sweep sharpens the dissociation further: at $\lambda{=}10^{-4}$, Mode~1 is already alleviated, with $\|h_L\|{=}2.4\times 10^4$ and $\|g_L\|{=}6.3\times 10^{-7}$, but deep cosine remains $-0.022$, while at $\lambda{=}10^{-2}$ it rises to $+0.165$ and deep $\rho$ to $+0.091$ (Figure~\ref{fig:penalty_rescue}). The improvement is real, but it is only partial.
+Once the reference vector is meaningful again, the deep layers no longer sit exactly at null. At $\lambda{=}10^{-2}$, penalized DFA reaches a three-seed deep-layer mean cosine of $+0.155 \pm 0.025$ and deep perturbation correlation of $+0.080 \pm 0.011$, whereas vanilla DFA is essentially zero on both metrics in the deep blocks, consistent with prior concerns that alternative feedback can fail by supplying poor credit directions even before full collapse \citep{bartunov2018assessing,moskovitz2018feedback,crafton2019backpropagation,refinetti2023aligning}. The null calibration rules out the interpretation that this recovered signal is merely measurement noise: on the same penalized checkpoint, replacing the training-time feedback matrices with 20 fresh random $B_l$ draws gives a deep cosine of only $+0.002 \pm 0.022$, with per-layer standard deviations of $0.013$--$0.023$, all within noise of zero (Table~\ref{tab:mode_validation}). The $\lambda$ sweep sharpens the dissociation further: at $\lambda{=}10^{-4}$, Mode~1 is already alleviated, with $\|h_L\|{=}2.4\times 10^4$ and $\|g_L\|{=}6.3\times 10^{-7}$, but deep cosine remains $-0.022$, while at $\lambda{=}10^{-2}$ it rises to $+0.165$ and deep $\rho$ to $+0.091$ (Figure~\ref{fig:penalty_rescue}). The improvement is real, but it is only partial.
 
 A rescue intervention is only informative if its direct cost is controlled. The relevant control is BP trained under the same penalty: BP falls from $0.609 \pm 0.004$ without the penalty to $0.530$ with $\lambda{=}10^{-2}$, so the penalty has a direct cost of about $8$ percentage points even when credit assignment is correct, whereas DFA moves in the opposite direction, from $0.308 \pm 0.014$ to $0.363 \pm 0.001$ under the same intervention (Figure~\ref{fig:penalty_rescue}). Relative to the frozen-blocks baseline of $0.349$, BP+penalty still retains a margin of $+18.1$ points, while DFA+penalty retains only $+1.4$ points. The remaining gap, $0.530 - 0.363 = 17$ points, is therefore a lower bound on the part of DFA's deficit that is not explained by simple penalty-induced capacity loss alone, though not a clean isolation because BP uses an end-to-end loss whereas DFA uses block-local losses. The residual gap after that control is what keeps Mode~2 substantively alive.
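
Note on the penalty-control arithmetic in the last context paragraph: the sketch below simply recomputes the quoted differences from the accuracies stated in that paragraph. It is an illustrative check, not code from the repository; the constant names and the `points` helper are hypothetical. It suggests the "17 points" residual gap is the rounded form of 16.7 percentage points.

```python
# Accuracies quoted in the paragraph above, as fractions in [0, 1].
# All names here are illustrative; none come from the paper's code.
BP_PLAIN = 0.609         # BP without the penalty
BP_PENALTY = 0.530       # BP with lambda = 1e-2
DFA_PLAIN = 0.308        # vanilla DFA
DFA_PENALTY = 0.363      # DFA with lambda = 1e-2
FROZEN_BASELINE = 0.349  # frozen-blocks baseline


def points(delta):
    """Convert a difference of accuracy fractions to percentage points."""
    return round(100 * delta, 1)


penalty_cost_bp = points(BP_PLAIN - BP_PENALTY)     # direct cost of the penalty under BP
penalty_gain_dfa = points(DFA_PENALTY - DFA_PLAIN)  # DFA moves the other way
bp_margin = points(BP_PENALTY - FROZEN_BASELINE)    # BP+penalty over frozen blocks
dfa_margin = points(DFA_PENALTY - FROZEN_BASELINE)  # DFA+penalty over frozen blocks
residual_gap = points(BP_PENALTY - DFA_PENALTY)     # gap left after the control

print(penalty_cost_bp, penalty_gain_dfa, bp_margin, dfa_margin, residual_gap)
```

The residual gap also equals the difference between the two margins over the frozen-blocks baseline, which is why the paragraph can read it as the part of DFA's deficit not explained by the penalty's direct cost.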