From 4bbe7f0e7b9985f790b528f639bde39717a8f379 Mon Sep 17 00:00:00 2001
From: YurenHao0426 <Blackhao0426@gmail.com>
Date: Wed, 8 Apr 2026 21:21:22 -0500
Subject: paper v2.37.1: abstract mentions nudging + training-loss confirmation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Earlier (during the page-budget-constrained polish loop) I tried to add
the nudging-test mention to the abstract but had to revert it because it
pushed §7 onto p10. Now that the page budget is relaxed, this commit
re-attempts the update.

Old abstract sentence about Mode 2 dissociation:
  "...while Credit Bridge attains much higher deep BP cosine than DFA
  at the same final accuracy, a dissociation that motivates reporting
  layerwise credit quality jointly with a depth-utilization baseline."

New abstract sentence:
  "...while Credit Bridge attains roughly 4× DFA's deep BP cosine yet
  matches DFA's accuracy—a dissociation that single-step nudging and
  integrated training-loss decrease both confirm against the reverse
  cosine ordering, and that motivates reporting layerwise credit quality
  jointly with a depth-utilization baseline."

The abstract now references the v2.33 functional triangulation,
matching the §4 main-text framing. A reader of the abstract alone now
sees the strongest form of the cos-vs-acc dissociation: not just
"CB has higher cos but same acc" (which could be a single noisy
measurement) but "three independent functional metrics rank the
methods opposite to deep cosine".

Page count: 20 (unchanged).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 paper/main.pdf | Bin 537138 -> 537258 bytes
 paper/main.tex |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/paper/main.pdf b/paper/main.pdf
index 30f1465..631684a 100644
Binary files a/paper/main.pdf and b/paper/main.pdf differ
diff --git a/paper/main.tex b/paper/main.tex
index a63d0b5..039ce2a 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -27,7 +27,7 @@
 \maketitle
 
 \begin{abstract}
-Modern feedback-alignment evaluation on deep residual networks is still summarized by a deceptively simple pair: headline accuracy and headline cosine alignment $\Gamma$ to the backpropagation gradient. We show that this pair can silently fail in two distinct ways on standard CIFAR-10 pre-LayerNorm ResMLP and ViT-Mini settings: first, \emph{measurement degeneracy}, where residual-stream growth drives hidden-layer BP gradients to the numerical floor and makes $\Gamma$ uninterpretable; and second, \emph{low intrinsic credit-direction quality}, where random-feedback credit remains essentially unaligned with BP on the deep blocks even when the reference gradient is still meaningful. The headline result is that the field-standard reporting pair walks back none of the methods we audit, whereas a four-diagnostic protocol walks back the three degenerate methods and passes the two trustworthy controls. Intervention with a per-block scale-control penalty further reveals method-dependent severity within the audited fixed-feedback family: State Bridge then exceeds the architecture-matched frozen-blocks baseline by about $10$ percentage points, while Credit Bridge attains much higher deep BP cosine than DFA at the same final accuracy, a dissociation that motivates reporting layerwise credit quality jointly with a depth-utilization baseline. Our contribution is an evaluation methodology paper for the NeurIPS 2026 Evaluations \& Datasets track: we provide the protocol, the calibration logic for its thresholds, a reference implementation, a five-method audit, and validation through temporal replay, cross-architecture checks, intervention-based disambiguation, and a documented catalog of pipeline pitfalls, in the spirit of critical evaluation analyses such as \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}.
+Modern feedback-alignment evaluation on deep residual networks is still summarized by a deceptively simple pair: headline accuracy and headline cosine alignment $\Gamma$ to the backpropagation gradient. We show that this pair can silently fail in two distinct ways on standard CIFAR-10 pre-LayerNorm ResMLP and ViT-Mini settings: first, \emph{measurement degeneracy}, where residual-stream growth drives hidden-layer BP gradients to the numerical floor and makes $\Gamma$ uninterpretable; and second, \emph{low intrinsic credit-direction quality}, where random-feedback credit remains essentially unaligned with BP on the deep blocks even when the reference gradient is still meaningful. The headline result is that the field-standard reporting pair walks back none of the methods we audit, whereas a four-diagnostic protocol walks back the three degenerate methods and passes the two trustworthy controls. Intervention with a per-block scale-control penalty further reveals method-dependent severity within the audited fixed-feedback family: State Bridge then exceeds the architecture-matched frozen-blocks baseline by about $10$ percentage points, while Credit Bridge attains roughly $4\times$ DFA's deep BP cosine yet matches DFA's accuracy---a dissociation that single-step nudging and integrated training-loss decrease both confirm against the reverse cosine ordering, and that motivates reporting layerwise credit quality jointly with a depth-utilization baseline. Our contribution is an evaluation methodology paper for the NeurIPS 2026 Evaluations \& Datasets track: we provide the protocol, the calibration logic for its thresholds, a reference implementation, a five-method audit, and validation through temporal replay, cross-architecture checks, intervention-based disambiguation, and a documented catalog of pipeline pitfalls, in the spirit of critical evaluation analyses such as \citet{jordan2020evaluating,obray2022evaluation,paleka2026pitfalls}.
 \end{abstract}
 
 \section{Introduction}