From 4bbe7f0e7b9985f790b528f639bde39717a8f379 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 8 Apr 2026 21:21:22 -0500
Subject: paper v2.37.1: abstract mentions nudging + training-loss confirmation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Earlier (during the page-budget-constrained polish loop) I tried to add
the nudging-test mention to the abstract but had to revert because it
pushed §7 onto p10. With page budget relaxed, re-attempting the update.

Old abstract sentence about Mode 2 dissociation:

"...while Credit Bridge attains much higher deep BP cosine than DFA at
the same final accuracy, a dissociation that motivates reporting
layerwise credit quality jointly with a depth-utilization baseline."

New abstract sentence:

"...while Credit Bridge attains roughly 4× DFA's deep BP cosine yet
matches DFA's accuracy—a dissociation that single-step nudging and
integrated training-loss decrease both confirm against the reverse
cosine ordering, and that motivates reporting layerwise credit quality
jointly with a depth-utilization baseline."

This now references the v2.33 functional triangulation in the abstract,
matching the §4 main-text framing. A reader of just the abstract now
sees the strongest form of the cos-vs-acc dissociation: it's not just
"CB has higher cos but same acc" (which could be a noisy single
measurement) but "three independent functional metrics rank the methods
opposite to deep cosine".

Page count: 20 (unchanged).

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 paper/main.pdf | Bin 537138 -> 537258 bytes
 paper/main.tex |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/main.pdf b/paper/main.pdf
index 30f1465..631684a 100644
Binary files a/paper/main.pdf and b/paper/main.pdf differ
diff --git a/paper/main.tex b/paper/main.tex
index a63d0b5..039ce2a 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -27,7 +27,7 @@
 \maketitle
 
 \begin{abstract}
-Modern feedback-alignment evaluation on deep residual networks is still summarized by a deceptively simple pair: headline accuracy and headline cosine alignment $\Gamma$ to the backpropagation gradient. We show that this pair can silently fail in two distinct ways on standard CIFAR-10 pre-LayerNorm ResMLP and ViT-Mini settings: first, \emph{measurement degeneracy}, where residual-stream growth drives hidden-layer BP gradients to the numerical floor and makes $\Gamma$ uninterpretable; and second, \emph{low intrinsic credit-direction quality}, where random-feedback credit remains essentially unaligned with BP on the deep blocks even when the reference gradient is still meaningful. The headline result is that the field-standard reporting pair walks back none of the methods we audit, whereas a four-diagnostic protocol walks back the three degenerate methods and passes the two trustworthy controls. Intervention with a per-block scale-control penalty further reveals method-dependent severity within the audited fixed-feedback family: State Bridge then exceeds the architecture-matched frozen-blocks baseline by about $10$ percentage points, while Credit Bridge attains much higher deep BP cosine than DFA at the same final accuracy, a dissociation that motivates reporting layerwise credit quality jointly with a depth-utilization baseline. Our contribution is an evaluation methodology paper for the NeurIPS 2026 Evaluations \& Datasets track: we provide the protocol, the calibration logic for its thresholds, a reference implementation, a five-method audit, and validation through temporal replay, cross-architecture checks, intervention-based disambiguation, and a documented catalog of pipeline pitfalls, in the spirit of critical evaluation analyses such as \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}.
+Modern feedback-alignment evaluation on deep residual networks is still summarized by a deceptively simple pair: headline accuracy and headline cosine alignment $\Gamma$ to the backpropagation gradient. We show that this pair can silently fail in two distinct ways on standard CIFAR-10 pre-LayerNorm ResMLP and ViT-Mini settings: first, \emph{measurement degeneracy}, where residual-stream growth drives hidden-layer BP gradients to the numerical floor and makes $\Gamma$ uninterpretable; and second, \emph{low intrinsic credit-direction quality}, where random-feedback credit remains essentially unaligned with BP on the deep blocks even when the reference gradient is still meaningful. The headline result is that the field-standard reporting pair walks back none of the methods we audit, whereas a four-diagnostic protocol walks back the three degenerate methods and passes the two trustworthy controls. Intervention with a per-block scale-control penalty further reveals method-dependent severity within the audited fixed-feedback family: State Bridge then exceeds the architecture-matched frozen-blocks baseline by about $10$ percentage points, while Credit Bridge attains roughly $4\times$ DFA's deep BP cosine yet matches DFA's accuracy---a dissociation that single-step nudging and integrated training-loss decrease both confirm against the reverse cosine ordering, and that motivates reporting layerwise credit quality jointly with a depth-utilization baseline. Our contribution is an evaluation methodology paper for the NeurIPS 2026 Evaluations \& Datasets track: we provide the protocol, the calibration logic for its thresholds, a reference implementation, a five-method audit, and validation through temporal replay, cross-architecture checks, intervention-based disambiguation, and a documented catalog of pipeline pitfalls, in the spirit of critical evaluation analyses such as \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}.
 \end{abstract}
 
 \section{Introduction}
-- 
cgit v1.2.3