summaryrefslogtreecommitdiff
path: root/paper
diff options
context:
space:
mode:
authorYurenHao0426 <Blackhao0426@gmail.com>2026-04-08 21:21:22 -0500
committerYurenHao0426 <Blackhao0426@gmail.com>2026-04-08 21:21:22 -0500
commit4bbe7f0e7b9985f790b528f639bde39717a8f379 (patch)
treee1053f360b912ef08591735fa7ba77f398e6c73e /paper
parent5995929511404ba3e0b8b4f1bfef69dbf291c7a9 (diff)
paper v2.37.1: abstract mentions nudging + training-loss confirmation
Earlier (during the page-budget-constrained polish loop) I tried to add the nudging-test mention to the abstract but had to revert because it pushed §7 onto p10. With page budget relaxed, re-attempting the update. Old abstract sentence about Mode 2 dissociation: "...while Credit Bridge attains much higher deep BP cosine than DFA at the same final accuracy, a dissociation that motivates reporting layerwise credit quality jointly with a depth-utilization baseline." New abstract sentence: "...while Credit Bridge attains roughly 4× DFA's deep BP cosine yet matches DFA's accuracy—a dissociation that single-step nudging and integrated training-loss decrease both confirm against the reverse cosine ordering, and that motivates reporting layerwise credit quality jointly with a depth-utilization baseline." This now references the v2.33 functional triangulation in the abstract, matching the §4 main-text framing. A reader of just the abstract now sees the strongest form of the cos-vs-acc dissociation: it's not just "CB has higher cos but same acc" (which could be a noisy single measurement) but "three independent functional metrics rank the methods opposite to deep cosine". Page count: 20 (unchanged). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'paper')
-rw-r--r--paper/main.pdfbin537138 -> 537258 bytes
-rw-r--r--paper/main.tex2
2 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index 30f1465..631684a 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
Binary files differ
diff --git a/paper/main.tex b/paper/main.tex
index a63d0b5..039ce2a 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -27,7 +27,7 @@
\maketitle
\begin{abstract}
-Modern feedback-alignment evaluation on deep residual networks is still summarized by a deceptively simple pair: headline accuracy and headline cosine alignment $\Gamma$ to the backpropagation gradient. We show that this pair can silently fail in two distinct ways on standard CIFAR-10 pre-LayerNorm ResMLP and ViT-Mini settings: first, \emph{measurement degeneracy}, where residual-stream growth drives hidden-layer BP gradients to the numerical floor and makes $\Gamma$ uninterpretable; and second, \emph{low intrinsic credit-direction quality}, where random-feedback credit remains essentially unaligned with BP on the deep blocks even when the reference gradient is still meaningful. The headline result is that the field-standard reporting pair walks back none of the methods we audit, whereas a four-diagnostic protocol walks back the three degenerate methods and passes the two trustworthy controls. Intervention with a per-block scale-control penalty further reveals method-dependent severity within the audited fixed-feedback family: State Bridge then exceeds the architecture-matched frozen-blocks baseline by about $10$ percentage points, while Credit Bridge attains much higher deep BP cosine than DFA at the same final accuracy, a dissociation that motivates reporting layerwise credit quality jointly with a depth-utilization baseline. Our contribution is an evaluation methodology paper for the NeurIPS 2026 Evaluations \& Datasets track: we provide the protocol, the calibration logic for its thresholds, a reference implementation, a five-method audit, and validation through temporal replay, cross-architecture checks, intervention-based disambiguation, and a documented catalog of pipeline pitfalls, in the spirit of critical evaluation analyses such as \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}.
+Modern feedback-alignment evaluation on deep residual networks is still summarized by a deceptively simple pair: headline accuracy and headline cosine alignment $\Gamma$ to the backpropagation gradient. We show that this pair can silently fail in two distinct ways on standard CIFAR-10 pre-LayerNorm ResMLP and ViT-Mini settings: first, \emph{measurement degeneracy}, where residual-stream growth drives hidden-layer BP gradients to the numerical floor and makes $\Gamma$ uninterpretable; and second, \emph{low intrinsic credit-direction quality}, where random-feedback credit remains essentially unaligned with BP on the deep blocks even when the reference gradient is still meaningful. The headline result is that the field-standard reporting pair walks back none of the methods we audit, whereas a four-diagnostic protocol walks back the three degenerate methods and passes the two trustworthy controls. Intervention with a per-block scale-control penalty further reveals method-dependent severity within the audited fixed-feedback family: State Bridge then exceeds the architecture-matched frozen-blocks baseline by about $10$ percentage points, while Credit Bridge attains roughly $4\times$ DFA's deep BP cosine yet matches DFA's accuracy---a dissociation that single-step nudging and integrated training-loss decrease both confirm against the reverse cosine ordering, and that motivates reporting layerwise credit quality jointly with a depth-utilization baseline. Our contribution is an evaluation methodology paper for the NeurIPS 2026 Evaluations \& Datasets track: we provide the protocol, the calibration logic for its thresholds, a reference implementation, a five-method audit, and validation through temporal replay, cross-architecture checks, intervention-based disambiguation, and a documented catalog of pipeline pitfalls, in the spirit of critical evaluation analyses such as \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}.
\end{abstract}
\section{Introduction}