1 files changed, 823 insertions, 0 deletions
diff --git a/paper/v1_rejected.tex b/paper/v1_rejected.tex
new file mode 100644
index 0000000..f6295ff
--- /dev/null
+++ b/paper/v1_rejected.tex
@@ -0,0 +1,823 @@
+\documentclass{article}
+
+\PassOptionsToPackage{numbers,compress}{natbib}
+\usepackage[eandd]{neurips_2026}
+
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{hyperref}
+\usepackage{url}
+\usepackage{booktabs}
+\usepackage{amsfonts}
+\usepackage{amsmath}
+\usepackage{nicefrac}
+\usepackage{microtype}
+\usepackage{xcolor}
+\usepackage{graphicx}
+
+\title{Beyond Accuracy and Alignment:\\ A Diagnostic Evaluation Protocol for Feedback Alignment}
+
+\author{Anonymous Authors}
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+Standard evaluation of Feedback Alignment (FA) and related local-credit
+methods on modern residual networks reports two numbers: headline accuracy
+and the cosine alignment $\Gamma$ of the local credit signal with the true
+backpropagation gradient at hidden layers. We show, on standard pre-LayerNorm
+ResidualMLP and ViT-Mini architectures, that this evaluation is unreliable
+because it conflates two distinct failure modes: \textbf{(1)~measurement
+degeneracy via terminal-LayerNorm gradient cancellation}, in which residual
+stream growth drives the BP gradient at hidden layers below the numerical
+floor and renders the cosine metric uninterpretable; and \textbf{(2)~low
+intrinsic credit-direction quality of random feedback}, which persists even
+when the BP gradient is in the meaningful regime and is invisible to the
+field-standard reporting pair.
+
+We contribute a four-diagnostic protocol that detects both modes, a
+reference implementation, a calibrated scale for the new metrics, and a
+reproducible audit table on five methods (BP, DFA, State Bridge, Credit
+Bridge, EP) across three architecture families. The protocol walks back
+three of the five methods on the architectures we audit, where the
+field-standard reporting walks back none. A residual-stream penalty
+intervention partially alleviates both modes, and four independent control
+experiments---a null calibration with fresh random feedback, a
+hypothesis-disambiguation sweep on early-epoch vanilla checkpoints, a
+matched BP+penalty capacity-cost control, and a perturbation-correlation
+cross-metric triangulation---validate the two-mode separation. We release
+the protocol, the audit data, and a reporting template.
+\end{abstract}
+
+\section{Introduction}
+\label{sec:intro}
+
+Feedback Alignment (FA) and its variants
+\cite{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct}
+are routinely evaluated on modern residual architectures by reporting two
+numbers: the trained network's test accuracy, and the cosine
+similarity~$\Gamma$ between the method's local credit signal and the true
+backpropagation gradient at hidden layers. A high $\Gamma$ is interpreted
+as evidence that the method is computing useful credit; an above-shallow
+accuracy is interpreted as evidence that the deep blocks are being trained.
+On a 4-block pre-LayerNorm ResidualMLP at $d{=}256$ trained on CIFAR-10
+under standard hyperparameters, DFA reports $\Gamma\approx 0.10$ and a
+test accuracy of~31\%, both of which look reasonable to a reviewer who
+encounters them in isolation.
+
+\textbf{Both numbers can silently mislead.} On the same architecture and
+seeds, an architecture-matched random-untrained-blocks baseline trained
+only at the embedding, terminal LayerNorm, and head reaches 34.9\% test
+accuracy: the trainable-blocks DFA variant under-performs this baseline by
+4 percentage points. The deep blocks are not just unhelpful---they are
+actively destroying value. Meanwhile, the BP gradient at the deepest hidden
+layer of the same trained DFA network has $\|g_L\|\approx 5\times 10^{-10}$,
+well below \texttt{F.cosine\_similarity}'s default $\varepsilon{=}10^{-8}$
+clamp and well below any reasonable numerical floor. The reported
+$\Gamma\approx 0.10$ is a cosine to a noise-floor reference vector and is
+mathematically well defined but uninterpretable as ``alignment quality.''
+
+\textbf{Why both numbers fail together turns out to have a single source:
+the headline-accuracy and headline-$\Gamma$ pair conflates two distinct
+phenomena that the field treats as one.} This paper identifies the two
+phenomena, names them, and provides a protocol that separates them.
+
+\paragraph{The two failure modes (informal).}
+\textbf{Mode~1: measurement degeneracy via terminal-LayerNorm gradient
+cancellation.} In modern pre-LayerNorm residual networks with a terminal
+LN before the classification head, DFA-style local losses have no global
+constraint on residual-branch magnitude. Block parameters grow by
+$\sim\!95\times$ relative to initialization, the residual stream
+$\|h_L\|$ grows from $\sim\!9$ at random init to $\sim\!4\!\times\!10^8$
+over 100 epochs, and the LayerNorm Jacobian rescaling drives the BP
+gradient at hidden layers from $\sim\!10^{-3}$ to $\sim\!10^{-10}$. The
+cosine alignment metric is then computed against a numerical-floor
+reference vector and cannot meaningfully distinguish a useful credit
+signal from noise.
+
+\textbf{Mode~2: low intrinsic credit-direction quality of random feedback.}
+Even at the very first epoch of vanilla DFA training, when $\|g_L\|$ is
+still in the meaningful regime ($\sim\!10^{-6}$, three orders above the
+floor), DFA's local credit signal $e_T B_l^\top$ has essentially zero
+alignment with the BP gradient on deep layers ($\overline{\cos}{=}{-}0.008
+\pm 0.013$ across three seeds). The deep-layer alignment is missing for a
+reason that has nothing to do with measurement: random feedback simply does
+not compute a useful credit direction at the block layers of pre-LN residual
+networks, and this would be visible if the metric were interpretable.
+
+\paragraph{Why the field hasn't seen this before.}
+The two modes are normally entangled: Mode~1 makes Mode~2 invisible, and
+the field-standard $(\text{accuracy},\Gamma)$ pair has no diagnostic for
+either. A reviewer reading ``DFA reaches 31\%, $\Gamma{\approx}0.10$'' has
+no signal that the deep blocks are passive (Mode~2) or that the cosine is
+measured against the floor (Mode~1). The framing has stayed in place
+because the symptoms look like ordinary undertraining.
+
+\paragraph{Our contribution.}
+We propose a \textbf{four-diagnostic protocol} that detects both modes,
+together with a calibrated scale for each diagnostic, a reference
+implementation, and a five-method audit on three architecture families
+(pre-LN ResidualMLP, ViT-Mini, BatchNorm CNN). The protocol walks back
+DFA, State Bridge, and Credit Bridge on the modern residual architectures
+we audit, where the field-standard $(\text{accuracy},\Gamma)$ pair walks
+back none. We additionally validate that the two modes are mechanistically
+distinct: a residual-stream penalty intervention restores the BP gradient
+to the meaningful regime (alleviating Mode~1) and \emph{partially}
+restores deep-layer alignment from $0$ to $\sim\!0.16$ (alleviating
+Mode~2), but neither is fully fixed. Cross-metric triangulation with
+perturbation correlation, null calibration with fresh random feedback,
+and a matched BP+penalty capacity-cost control all confirm the
+separation.
+
+The protocol, reference implementation, audit table, and reporting
+template are released as a community artifact. Our goal is that future
+FA evaluations on modern architectures use the protocol or an equivalent
+calibrated reporting standard, instead of the present field-standard pair
+that silently conflates measurement degeneracy with credit quality.
+
+\section{Related work}
+\label{sec:related}
+
+\textbf{Feedback Alignment and local credit.} Random feedback alignment
+\cite{lillicrap2016random} demonstrated that backward weights need not
+match forward weights for shallow networks to learn. Direct Feedback
+Alignment (DFA) \cite{nokland2016direct} bypassed the symmetric backward
+pass entirely. Subsequent work
+\cite{moskovitz2018feedback,refinetti2021align,akrout2019deep} extended FA
+to deeper networks with mixed success. \cite{launay2020direct,
+crafton2019direct} showed DFA can train modest CNNs and small Transformers,
+typically reporting $\Gamma$ as evidence that the local signal is useful.
+\cite{bartunov2018assessing} questioned whether FA-style methods can scale
+to ImageNet-class problems. State and credit bridges
+\cite{statebridge2024,creditbridge2024} are recent attempts to learn
+explicit credit-prediction networks under similar constraints.
+
+\textbf{FA evaluation.} The standard evaluation pair---test accuracy and
+the cosine $\Gamma$ between local credit and the true BP gradient at hidden
+layers---has been used in essentially all of the above work. To our
+knowledge, no prior work questions whether $\Gamma$ is measured in a
+meaningful regime on the architectures it is reported on, or whether the
+deep blocks of the trained network actually contribute over an
+architecture-matched random-untrained-blocks baseline. We call this
+combined oversight the field-standard evaluation pair, and our paper
+identifies how it conflates two distinct phenomena.
+
+\textbf{Evaluation as scientific object.} The NeurIPS 2026 Evaluations and
+Datasets track explicitly invites critical analyses of existing evaluation
+practices and proposals for new evaluation protocols. Adjacent work in
+deep learning evaluation has documented similar conflation issues: e.g.,
+the well-known ``representation similarity is metric-dependent''
+literature, the ``probing task validity''
+critique, the LayerNorm-induced gradient pathology in pre-LN
+Transformers \cite{xiong2020layernorm}. Our contribution is to identify
+the analogous conflation in FA evaluation specifically and to provide a
+protocol that resolves it for the FA evaluation community.
+
+\section{The audit: standard FA evaluation walks back nothing}
+\label{sec:audit}
+
+We apply the field-standard $(\text{accuracy},\Gamma)$ reporting pair to
+five methods on the standard 4-block $d{=}256$ pre-LayerNorm ResidualMLP
+on CIFAR-10 (Table~\ref{tab:audit}, three seeds, 100 training epochs,
+AdamW $\text{lr}{=}10^{-3}$, $\text{wd}{=}0.01$, cosine schedule).
+
+\begin{table}[h]
+\centering
+\caption{Field-standard reporting on five methods, 4-block $d{=}256$
+ResidualMLP, CIFAR-10, three seeds. The headline pair gives no walk-back
+signal on any method.}
+\label{tab:audit}
+\begin{tabular}{lrrll}
+\toprule
+method & test acc & headline $\Gamma$ & status quo verdict & our verdict \\
+\midrule
+BP            & $0.609 \pm 0.004$ & $\approx 1.0$  & trustworthy & trustworthy \\
+EP            & $0.316 \pm 0.038$ & $0.008$        & trustworthy & trustworthy \\
+DFA           & $0.308 \pm 0.014$ & $0.10$         & trustworthy & \textbf{walked back} \\
+Credit Bridge & $0.289 \pm 0.034$ & $0.07$         & trustworthy & \textbf{walked back} \\
+State Bridge  & $0.205 \pm 0.039$ & $0.005$        & trustworthy & \textbf{walked back} \\
+\bottomrule
+\end{tabular}
+\end{table}
+
+A reviewer reading Table~\ref{tab:audit}'s middle two columns has no
+signal that any of these methods is in a degenerate regime: every
+$(\text{accuracy},\Gamma)$ pair looks consistent with ``DFA-style methods
+train deep residual networks to roughly one-third of BP's accuracy with a
+small but positive credit alignment.'' The status quo verdict treats all
+five methods as trustworthy.
+
+\paragraph{The two diagnostics that should have fired.}
+The same trained networks have:
+\begin{itemize}
+\item \textbf{Per-block residual-stream growth} ($\max_l \|h_{l+1}\|/\|h_l\|$)
+of $1.3$ for BP, $2.4$ for State Bridge, $11.6$ for EP, $96\times$ for
+Credit Bridge, and $237\times$ for DFA. BP and EP are bounded; DFA, SB, and
+CB show explosive per-block growth.
+\item \textbf{BP gradient at the deepest hidden layer} ($\|g_L\|$) of
+$\sim\!4\!\times\!10^{-4}$ for BP, $\sim\!2\!\times\!10^{-4}$ for EP,
+$\sim\!10^{-9}$ for DFA, SB, and CB. The DFA/SB/CB values are below the
+\texttt{F.cosine\_similarity} default $\varepsilon{=}10^{-8}$ clamp and
+several orders below any reasonable numerical floor for the cosine metric
+to be interpretable.
+\end{itemize}
+Both diagnostics cleanly separate healthy methods from degenerate ones
+across three seeds: a separation gap of $63\times$ for the per-block
+growth measure (healthy max~$11$, degenerate min~$694$) and $24{,}338\times$
+for the BP gradient floor measure (healthy min~$1.0\!\times\!10^{-4}$,
+degenerate max~$4.2\!\times\!10^{-9}$). Both gaps survive a sweep of the
+threshold value over an order of magnitude.
+
+\paragraph{The walked-back claim.}
+We report this finding as the primary audit result. Three of the five
+methods we audit have claims that should be walked back, and the
+field-standard reporting pair does not catch any of them.
+
+\paragraph{Walk-back: the deep blocks are not contributing.}
+Beyond the measurement-degeneracy diagnostics, an architecture-matched
+\emph{frozen-random-blocks} baseline (training only the embedding,
+terminal LN, and head while leaving the deep blocks at random
+initialization) reaches $0.349 \pm 0.002$ on this architecture under all
+three of DFA, SB, and CB. The trainable-blocks variants reach $0.308$,
+$0.205$, and $0.289$ respectively---\emph{below} the random-untrained
+baseline. Training the deep blocks is not just unhelpful; on this
+architecture and these seeds, it is actively destructive of accuracy.
+
+\textbf{This is the central audit finding.} Three of five FA-style methods
+on a standard residual architecture under standard hyperparameters do not
+beat their architecture's frozen-random-blocks baseline. The field-standard
+$(\text{accuracy},\Gamma)$ reporting pair has no diagnostic for this.
+
+\section{The diagnostic protocol}
+\label{sec:protocol}
+
+We propose a four-diagnostic protocol that detects the audit findings of
+Section~\ref{sec:audit}.
+
+\paragraph{Diagnostic (a): per-layer residual stream growth.}
+Compute $\max_l \|h_{l+1}\|_2 / \|h_l\|_2$ over a fixed evaluation
+batch. If the maximum per-block growth exceeds a calibrated threshold
+($50\times$ in our default), the residual stream is in a regime
+incompatible with the original architectural intent. This is the most
+direct test of Mode~1's structural cause.
+
+\paragraph{Diagnostic (b): BP gradient at hidden layers.}
+Compute $\|\partial L / \partial h_L\|_2$ on a fixed eval batch. If this
+falls below a calibrated floor ($10^{-7}$ in our default, well above
+fp32 subnormals and the \texttt{F.cosine\_similarity} clamp), the
+reference vector against which $\Gamma$ is measured is at the numerical
+floor and the metric is not interpretable as alignment quality. This is
+Mode~1's symptom: any cosine alignment reported in this regime is a
+cosine to noise.
+
+\paragraph{Diagnostic (c): cross-batch direction stability.}
+Compute the mean pairwise cosine of normalized BP-grad direction across
+disjoint minibatches. A high value ($>0.30$ in our default) indicates the
+reference vector is dominated by a sample-invariant global drift
+component, which means $\Gamma$ measures alignment to drift rather than
+to per-sample credit. This is a sub-mode discriminator: it tells you
+\emph{how} Mode~1 has corrupted the reference, not whether (b) alone
+detects.
+
+\paragraph{Diagnostic (d): frozen-blocks baseline.}
+Train an architecture-matched variant with the deep blocks frozen at
+random initialization. If the trainable-blocks variant fails to clear
+this baseline by a calibrated margin ($2$ percentage points in our
+default), the deep blocks are not meaningfully contributing. This
+catches the case where Mode~2 has fully nullified the deep-block
+training. Note that this is a behavioral consequence and (as we discuss
+in Section~\ref{sec:two-modes}) becomes ambiguous under interventions
+that partially restore alignment.
+
+\paragraph{Calibrated thresholds.} Default thresholds ($50\times$, $10^{-7}$,
+$0.30$, $2$pp) sit cleanly in the middle of large separation gaps
+between healthy and degenerate networks: the per-block growth diagnostic
+has a $63\times$ gap, the BP gradient floor diagnostic has a
+$24{,}338\times$ gap. Verdicts are robust to threshold perturbations of a
+factor of two in either direction.
+
+\paragraph{Decision-utility ablation.}
+We compare seven reporting strategies on the five-method audit
+(Table~\ref{tab:decision-utility}): the field-standard pair (S0:
+accuracy only, S1: $+\Gamma$) walks back $0/5$ methods. The full
+protocol (S\textsubscript{full}: accuracy + (a) + (b) + (c) + (d)) walks
+back $3/5$. Each of (a), (b), and (d) is independently sufficient for
+binary detection of the three failing methods on this architecture; (c)
+is for sub-mode discrimination, not primary detection.
+
+\begin{table}[h]
+\centering
+\caption{Decision-utility ablation. ``Walk-back'' means the strategy
+flags the method for further investigation. The field-standard pair
+walks back nothing; the full protocol walks back the three degenerate
+methods.}
+\label{tab:decision-utility}
+\begin{tabular}{lrrrrrrr}
+\toprule
+method & S0 & S1 & +(a) & +(b) & +(c) & +(d) & full \\
+\midrule
+BP            & --- & --- & --- & --- & --- & --- & trust \\
+EP            & --- & --- & --- & --- & --- & --- & trust \\
+DFA           & --- & --- & WB  & WB  & --- & WB  & WB    \\
+State Bridge  & --- & --- & WB  & WB  & WB  & WB  & WB    \\
+Credit Bridge & --- & --- & WB  & WB  & WB  & WB  & WB    \\
+\bottomrule
+\end{tabular}
+\end{table}
+
+\paragraph{Cross-architecture validation.}
+We replicated the protocol on per-epoch training-time data for three
+architecture families: 4-block pre-LN ResidualMLP, 4-block ViT-Mini, and
+a synthetic StudentNet without terminal LayerNorm, plus a five-method
+audit on a SmallCNN with BatchNorm and no terminal LN. Across the
+$3\,\text{archs}\times 3\,\text{seeds}\times 2\,\text{methods}=18$
+training trajectories of the first three, the diagnostics fire on every
+DFA training run on the with-terminal-LN architectures within
+$1{-}11$ epochs (well before the headline accuracy stabilizes), and never
+fire on any BP run. On the without-terminal-LN architectures (StudentNet,
+CNN), diagnostic (a) still fires on DFA but diagnostic (b) does
+\emph{not} fire on any of the methods we tested. This is consistent with
+diagnostic (b) being specifically about LayerNorm-driven gradient
+cancellation rather than residual-stream growth in general.
+
+\paragraph{Reference implementation.}
+We release \texttt{protocol/}, a $\sim\!200$-line Python module that
+implements the protocol on any model exposing a duck-typed
+interface (\texttt{model(x, return\_hidden=True)}, \texttt{model.embed} or
+\texttt{model.patch\_embed}, \texttt{model.blocks}, and a terminal LN +
+head). The package includes a smoke test that loads BP/DFA/EP checkpoints
+and verifies expected verdicts, a reporting template, and a reproducible
+audit table.
+
+\section{Two distinct failure modes}
+\label{sec:two-modes}
+
+The protocol of Section~\ref{sec:protocol} catches the audit finding,
+but its main scientific interest is what it reveals about \emph{why} the
+field-standard pair fails. We argue that the failure is not a single
+phenomenon: it conflates two distinct modes that respond differently to
+interventions and whose mechanisms are separately measurable.
+
+\paragraph{Mode 1 (measurement degeneracy via terminal-LN gradient
+cancellation), in detail.}
+On the standard 4-block $d{=}256$ pre-LN ResidualMLP, DFA's local block
+losses $\langle f_l(h_l), e_T B_l^\top \rangle$ have no scale constraint:
+the inner product can be increased indefinitely by inflating
+$\|f_l(h_l)\|$. Block parameters $w_1, w_2$ inside each block grow by a
+factor of $\sim\!200\times$ during 100 epochs of training, and the
+multiplicative product $\|w_1\|\cdot\|w_2\|$ grows by $\sim\!5\times 10^4$
+per block. The residual stream $\|h_L\|$ grows from $9$ at initialization
+to $\sim\!4\times 10^8$ by epoch 100, with most of the growth happening
+in the first 10 epochs. Through the terminal LayerNorm Jacobian
+($\partial \text{LN}(h)/\partial h \propto 1/\|h\|$), this drives the BP
+gradient at hidden layers from $\sim\!10^{-3}$ at random initialization
+to $\sim\!5\times 10^{-10}$. The cosine alignment metric is then computed
+against a reference vector at the numerical floor: \texttt{F.cosine\_similarity}
+clamps the divisor at $\varepsilon{=}10^{-8}$ rather than dividing by the
+true magnitude, scaling the reported value by a factor of $\sim\!50\times$
+in the wrong direction; the reported $\Gamma\approx 0.10$ is not a
+``small alignment'' but a cosine to a degenerate reference.
+
+\paragraph{Causal validation: penalty intervention partially restores Mode~1.}
+Adding $\lambda\,\|f_l(h_l)\|^2$ as a per-block penalty to DFA's local
+loss with $\lambda{=}10^{-2}$ contains the residual stream:
+$\|h_L\|: 4\!\times\!10^8 \to 4\!\times\!10^4$ (4 OOM rescue), and
+$\|g_L\|: 5\!\times\!10^{-10} \to \sim\!10^{-6}$ (4 OOM rescue, well into
+the meaningful regime). Diagnostics (a) and (b) both pass on the
+penalized network. Three seeds: $\|h_L\|=4.0\pm 0.1\!\times\!10^4$,
+$\|g_L\|=9.0\pm 0.9\!\times\!10^{-7}$.
+
+\paragraph{Mode 2 (low intrinsic credit-direction quality), in detail.}
+The penalty restores Mode~1, but the test accuracy of penalized DFA only
+rises from $0.308$ to $0.363$ (3-seed mean $0.363\pm 0.001$). This is
+$+5.5$pp over vanilla DFA but only $+1.4$pp over the architecture-matched
+random-blocks baseline of $0.349$. The deep blocks are still not
+meaningfully contributing.
+
+\textbf{Direct measurement.} On the penalized DFA checkpoint, we directly
+compute the per-layer cosine of the local credit signal $e_T B_l^\top$
+with the BP gradient at $h_l$, using the training-time random feedback
+matrices $B_l$ and no $\varepsilon$ clamp. Three-seed result on deep
+layers ($l=1,2,3,4$): $\overline{\cos} = +0.155 \pm 0.025$. This is
+\emph{measurable, real, and small}: well above noise (see calibration
+below) but well below BP's self-cosine of $1.0$. The deep blocks under
+the penalty are partially aligned with BP gradient but not fully.
+
+\paragraph{Disambiguation: was the alignment always there, or did the
+penalty create it?}
+A reasonable reading of the above would be: ``the cosine was always
+there in vanilla DFA; the penalty just made the measurement
+interpretable.'' The disambiguation experiment falsifies this. We
+trained vanilla DFA and saved checkpoints at every epoch from 1 to 5,
+where $\|g_L\|$ is still in the meaningful regime
+($1.4\!\times\!10^{-6}$ at epoch 1, well above the $10^{-7}$ floor).
+Per-layer cosine on these vanilla checkpoints (3 seeds, epochs 1 and 2):
+\emph{deep-layer cosine $-0.008 \pm 0.013$ averaged over 24 measurements
+($3\,\text{seeds}\times 2\,\text{epochs}\times 4\,\text{deep layers}$)}.
+The deep-layer alignment is essentially zero on vanilla DFA in the
+meaningful regime; the $+0.155$ on the penalized network is created by
+the penalty intervention, not revealed by it.
+
+\paragraph{The penalty's role.}
+The penalty does two things at once. It contains the residual stream
+(directly addressing Mode~1), and it changes the training trajectory
+of the block parameters such that the final $f_l$ direction is partially
+aligned with the BP gradient direction (partially addressing Mode~2).
+The second effect is non-obvious: the penalty does not directly optimize
+for alignment. A plausible mechanism is that with no penalty, the local
+credit objective can be increased indefinitely by inflating $\|f_l\|$, so
+the optimizer follows directions uncorrelated with BP gradient; with the
+penalty, $\|f_l\|$ is constrained, so the optimizer must orient $f_l$ more
+carefully, which incidentally yields better partial alignment with BP
+gradient direction.
+
+\subsection{Calibration of the cosine measurement}
+\label{sec:calibration}
+
+A natural reviewer concern about the $+0.155$ result is whether it is
+above or below noise. We anchor it with explicit positive and negative
+controls.
+
+\textbf{Positive control.} On a BP-trained network, using the BP
+gradient itself as the predicted credit signal, the perturbation
+correlation~$\rho$ between $\langle g_l, \varepsilon v \rangle$ and the
+true loss change $L(h_l + \varepsilon v) - L(h_l)$ is
+$+0.997$ at every layer (4-layer mean $+0.9965$). This is the
+Taylor-expansion ceiling.
+
+\textbf{Negative control.} On the same BP-trained network, using a
+random vector independent of the layer as the credit signal, $\rho$ is
+$+0.006$ (4-layer mean), within statistical noise of zero.
+
+\textbf{Cross-metric triangulation on the test conditions.}
+
+\begin{table}[h]
+\centering
+\caption{Two metrics, four conditions. The agreement between cosine and
+perturbation correlation rules out single-metric artifacts.}
+\label{tab:two-metrics}
+\begin{tabular}{lrr}
+\toprule
+condition & deep cosine $\overline{\cos}$ & deep $\overline{\rho}$ \\
+\midrule
+positive control (BP grad on BP net) & $1.000$ & $+0.997$ \\
+negative control (random vector on BP net) & --- & $+0.006$ \\
+vanilla DFA, ep 1 (3 seeds, meaningful regime) & $-0.008 \pm 0.013$ & $-0.003 \pm 0.005$ \\
+penalized DFA, ep 30 (3 seeds, lam=$10^{-2}$) & $+0.155 \pm 0.025$ & $+0.080 \pm 0.011$ \\
+\bottomrule
+\end{tabular}
+\end{table}
+
+The penalized DFA's $+0.080$ perturbation correlation is $\sim\!13\times$
+the negative control and $\sim\!8\%$ of the positive control. Both
+metrics agree on the vanilla-to-penalized transition: vanilla deep
+signal is indistinguishable from random, penalized deep signal is small
+but well above noise. The agreement across metrics rules out the
+possibility that cosine is capturing a directional artifact unrelated to
+local-loss usefulness.
+
+\subsection{$\lambda$ sweep: independent dissociation of the two modes}
+\label{sec:lambda-sweep}
+
+The disambiguation experiment of Section~\ref{sec:two-modes} relied on
+vanilla DFA early-epoch checkpoints (epochs 1--2) to measure deep-layer
+cosine while $\|g_L\|$ was still in the meaningful regime. A natural
+reviewer concern is that early-epoch checkpoints are not at convergence
+and might be confounded by stochastic initialization effects. We
+strengthen the disambiguation with an independent control: a sweep over
+the penalty strength $\lambda$ at convergence (30~epochs), with both
+metrics measured on each saved checkpoint.
+
+\begin{table}[h]
+\centering
+\caption{$\lambda$ sweep on the penalty strength, all 30 epochs, seed
+42. The deep-layer cosine and perturbation correlation rise from
+essentially zero at $\lambda{=}10^{-4}$ to small-but-positive at
+$\lambda{=}10^{-2}$, even though diagnostics (a) and (b) already pass
+at $\lambda{=}10^{-4}$.}
+\label{tab:lambda-sweep}
+\begin{tabular}{rrrrrr}
+\toprule
+$\lambda$ & test acc & $\|h_L\|$ & $\|g_L\|$ & deep $\overline{\cos}$ & deep $\overline{\rho}$ \\
+\midrule
+$0$       & $0.308$ & $4.4{\times}10^{8}$ & $5{\times}10^{-10}$ & (degenerate) & (degenerate) \\
+$10^{-4}$ & $0.359$ & $2.4{\times}10^{4}$ & $6.3{\times}10^{-7}$ & $-0.022$ & $-0.004$ \\
+$10^{-2}$ & $0.363$ & $4.0{\times}10^{4}$ & $9.0{\times}10^{-7}$ & $+0.165$ & $+0.091$ \\
+$10^{-1}$ & $0.349$ & $1.2{\times}10^{4}$ & $1.6{\times}10^{-6}$ & $+0.131$ & $+0.067$ \\
+\bottomrule
+\end{tabular}
+\end{table}
+
+\textbf{The killer row is $\lambda{=}10^{-4}$.} At this penalty
+strength, the residual stream is already contained ($\|h_L\| = 2.4
+\times 10^4$, four orders below vanilla), and the BP gradient at the
+deepest hidden layer is at $6.3 \times 10^{-7}$ (well above the
+$10^{-7}$ floor and in the meaningful measurement regime). Diagnostics
+(a) and (b) both pass: \textbf{Mode~1 is fully alleviated}. But the
+deep-layer cosine ($-0.022$) and perturbation correlation ($-0.004$)
+are essentially zero, on both metrics independently. \textbf{Mode~2 is
+not alleviated at all.}
+
+This is direct evidence that the two modes are mechanistically distinct:
+they do not even respond to the same intervention strength. There exists
+a regime ($\lambda{=}10^{-4}$, 30~epochs of training) in which
+Mode~1 is fully alleviated and Mode~2 is unchanged from vanilla, with
+both metrics agreeing.
+
+The threshold for Mode~2 alleviation is somewhere between
+$\lambda{=}10^{-4}$ and $\lambda{=}10^{-2}$. At $\lambda{=}10^{-2}$ the
+penalty is strong enough to alter the optimization trajectory of the
+block parameters (constraining $\|f_l\|$ tightly enough that the
+direction of $f_l$ has to be coordinated more carefully with the local
+credit signal), and the deep-layer alignment rises to $\sim\!+0.16$.
+At $\lambda{=}10^{-1}$ the penalty starts to over-constrain and the
+alignment is slightly lower ($\sim\!+0.13$), giving an inverted-U
+relationship between $\lambda$ and deep alignment.
+
+\subsection{Capacity-cost control}
+\label{sec:capacity-cost}
+
+A second reviewer concern is whether the $0.36 \to 0.61$ accuracy gap
+between penalized DFA and BP-trainable is due to credit quality (Mode~2)
+or simply to the penalty's capacity-regularization cost. We disambiguate
+with a $2\times2$ matched control.
+
+\begin{table}[h]
+\centering
+\caption{$2\times2$ capacity-cost control. The penalty is the same in both
+the BP and DFA conditions. BP+penalty still clears the random-blocks
+baseline by $18.1$pp; DFA+penalty clears it by only $1.4$pp.}
+\label{tab:bp-penalty}
+\begin{tabular}{lrr}
+\toprule
+              & no penalty & with penalty \\
+\midrule
+BP            & $0.609$    & $0.530$ \\
+DFA           & $0.308$    & $0.363$ \\
+\midrule
+$\Delta$      & $-8.0$pp   & $+5.5$pp \\
+\bottomrule
+\end{tabular}
+\end{table}
+
+Two observations make this control informative. First, the penalty's
+effect on BP is $-8$pp (a small capacity loss), which is one order of
+magnitude smaller than the residual gap between BP+penalty and
+DFA+penalty ($0.530 - 0.363 = 17$pp). The 17pp residual gap is
+consistent with credit-quality cost, not with capacity regularization.
+Second, the penalty has \emph{opposite} effects on the two methods: it
+hurts BP by 8pp while helping DFA by 5.5pp, the opposite pattern expected
+from a generally beneficial regime shift.
+
+\textbf{The clean phrasing.} The 2$\times$2 control identifies a residual
+performance gap under matched architecture, data, optimizer family, and
+matched penalty, after accounting for the penalty's direct capacity cost
+on BP. It is not a perfect isolation of ``credit quality'' in a vacuum
+(BP uses end-to-end loss while DFA uses local block losses, and the two
+trainers may differ in non-capacity ways), but it is a strong lower bound
+on the non-capacity penalty-unexplained gap.
+
+\subsection{Summary: five validations of the two-mode separation}
+
+Together, the disambiguation experiment, the $\lambda$ sweep, the
+cross-metric triangulation, the capacity-cost control, and the
+threshold robustness analysis provide five independent lines of
+evidence that the failure of standard FA evaluation is not a single
+phenomenon. Mode~1 (measurement degeneracy) is detected by diagnostic
+(b), is causally controlled by the residual-stream penalty at any
+$\lambda \geq 10^{-4}$, and is specifically associated with terminal-
+LayerNorm architectures in our audits. Mode~2 (low intrinsic credit
+quality) persists after Mode~1 is alleviated at weak penalty
+strengths ($\lambda{=}10^{-4}$), is detected by direct per-layer
+cosine in the meaningful regime, and rises only when the penalty is
+strong enough to alter the optimization trajectory of the deep
+blocks ($\lambda \geq 10^{-2}$). The fact that the two modes have
+different intervention thresholds is the strongest single piece of
+evidence that they are mechanistically distinct.
+
+\section{Limitations}
+\label{sec:limitations}
+
+Our audit covers a specific slice of the FA literature: pre-LayerNorm
+ResidualMLP, ViT-Mini, and SmallCNN architectures on CIFAR-10, evaluated
+under standard hyperparameters. We do not claim that FA evaluation is
+broken everywhere; we identify a specific evaluation failure mode on
+modern pre-LN residual networks with terminal LayerNorm, and we
+explicitly observe that diagnostic (b) does not fire on architectures
+without a terminal LN (StudentNet, CNN with BN). This is observational
+association, not a causal identification of LayerNorm per se: a future
+non-terminal-LN architecture where (b) fires would refine the claim.
+Section~\ref{sec:related} cites the classical FA literature where
+non-terminal-LN architectures dominate; our central claim concerns the
+modern with-terminal-LN residual case.
+
+The Mode~2 measurement in Section~\ref{sec:two-modes} relies on direct
+cosine and perturbation correlation in the meaningful regime, which is
+only accessible after a Mode~1 intervention. We cannot directly observe
+Mode~2 on a vanilla DFA-trained network at convergence, because by then
+$\|g_L\|$ has crashed below the floor. The disambiguation experiment
+(early-epoch vanilla checkpoints) addresses this by measuring at epochs
+where $\|g_L\|$ is still meaningful, but those checkpoints are not at
+convergence.
+
+The matched-penalty $2{\times}2$ control disambiguates capacity loss from
+credit quality but does not account for non-capacity differences between
+end-to-end BP and local DFA training. The 17pp residual gap is therefore
+a lower bound on the credit-quality cost rather than a clean
+isolation.
+
+\section{Broader impacts}
+\label{sec:impacts}
+
+This paper does not introduce a new training method, dataset, or
+generative model. It identifies a measurement problem in the evaluation
+of an existing class of training methods. Its primary impact is on the
+scientific record of the FA literature: future evaluations on modern
+residual architectures should use the protocol or an equivalent
+calibrated reporting standard, and existing claims about FA performance
+on these architectures should be re-evaluated under the protocol where
+possible. We are not aware of any negative downstream applications of
+this work.
+
+\section{Conclusion}
+\label{sec:conclusion}
+
+We have shown that standard Feedback Alignment evaluation on modern
+residual networks is unreliable because it conflates two distinct
+failure modes: measurement degeneracy via terminal-LayerNorm gradient
+cancellation, and low intrinsic credit-direction quality of random
+feedback. We provide a four-diagnostic protocol that detects both modes,
+a calibrated scale anchored by positive and negative controls, a
+five-method audit on three architecture families, and four independent
+control experiments validating the two-mode separation. The protocol,
+audit data, and reporting template are released as a community artifact
+for the FA evaluation community.
+
+\bibliographystyle{abbrvnat}
+\begin{thebibliography}{99}
+\bibitem{lillicrap2016random}
+T.~P. Lillicrap, D.~Cownden, D.~B. Tweed, and C.~J. Akerman.
+\newblock Random synaptic feedback weights support error backpropagation for deep learning.
+\newblock {\em Nature Communications}, 7:13276, 2016.
+
+\bibitem{nokland2016direct}
+A.~N\o{}kland.
+\newblock Direct feedback alignment provides learning in deep neural networks.
+\newblock In {\em NeurIPS}, 2016.
+
+\bibitem{akrout2019deep}
+M.~Akrout, C.~Wilson, P.~Humphreys, T.~Lillicrap, and D.~B. Tweed.
+\newblock Deep learning without weight transport.
+\newblock In {\em NeurIPS}, 2019.
+
+\bibitem{launay2020direct}
+J.~Launay, I.~Poli, F.~Boniface, and F.~Krzakala.
+\newblock Direct feedback alignment scales to modern deep learning tasks and architectures.
+\newblock In {\em NeurIPS}, 2020.
+
+\bibitem{moskovitz2018feedback}
+T.~H. Moskovitz, A.~Litwin-Kumar, and L.~F. Abbott.
+\newblock Feedback alignment in deep convolutional networks.
+\newblock {\em arXiv:1812.06488}, 2018.
+
+\bibitem{refinetti2021align}
+M.~Refinetti, S.~d'Ascoli, R.~Ohana, and S.~Goldt.
+\newblock Align, then memorise: the dynamics of learning with feedback alignment.
+\newblock In {\em ICML}, 2021.
+
+\bibitem{crafton2019direct}
+B.~Crafton, A.~Parihar, E.~Gebhardt, and A.~Raychowdhury.
+\newblock Direct feedback alignment with sparse connections for local learning.
+\newblock {\em Frontiers in Neuroscience}, 13:525, 2019.
+
+\bibitem{bartunov2018assessing}
+S.~Bartunov, A.~Santoro, B.~Richards, L.~Marris, G.~Hinton, and T.~Lillicrap.
+\newblock Assessing the scalability of biologically-motivated deep learning algorithms and architectures.
+\newblock In {\em NeurIPS}, 2018.
+
+\bibitem{xiong2020layernorm}
+R.~Xiong, Y.~Yang, D.~He, K.~Zheng, S.~Zheng, C.~Xing, H.~Zhang, Y.~Lan, L.~Wang, and T.~Liu.
+\newblock On layer normalization in the transformer architecture.
+\newblock In {\em ICML}, 2020.
+
+\bibitem{statebridge2024}
+Anonymous.
+\newblock State Bridge: terminal-conditioned predictor for credit assignment.
+\newblock {\em Anonymous in-progress reference, 2024-2026}.
+
+\bibitem{creditbridge2024}
+Anonymous.
+\newblock Credit Bridge: value-field local credit without hidden BP.
+\newblock {\em Anonymous in-progress reference, 2024-2026}.
+
+\end{thebibliography}
+
+\appendix
+
+\section{Reproducibility}
+\label{app:reproducibility}
+
+All experiments use PyTorch~$\geq$2.0 on a single NVIDIA A6000 GPU.
+Source for the protocol package is in \texttt{protocol/}; experimental
+scripts are in \texttt{experiments/}. Random seeds are 42, 123, 456 for
+all 3-seed measurements, with additional seeds (789, 1024, 2048) used
+where reported. CIFAR-10 is loaded via \texttt{torchvision} with the
+standard normalization $(\mu, \sigma) = ((0.4914, 0.4822, 0.4465),
+(0.2470, 0.2435, 0.2616))$.
+
+\section{Pipeline pitfalls catalog}
+\label{app:pitfalls}
+
+Beyond the four diagnostics, we found seven evaluation-pipeline bugs in
+our own dogfood codebase that silently corrupt FA evaluation results.
+Each has a standalone reproducer in
+\texttt{protocol/examples/verify\_pitfalls*.py}.
+
+\begin{enumerate}
+\item \texttt{tensor.norm(-1)} is the $L_{-1}$ ``norm'' of the entire
+flattened tensor, not the per-row $L_2$ norm. The correct call is
+\texttt{tensor.norm(dim=-1)}. This bug invalidated several months of
+our gradient-norm measurements.
+
+\item \texttt{F.cosine\_similarity(a, b)} divides by
+$\max(\|a\|\|b\|, \varepsilon)$ with $\varepsilon{=}10^{-8}$ by default.
+When $\|b\|\sim 10^{-10}$ (the regime of the BP gradient on degenerate
+DFA-trained pre-LN networks), the divisor becomes $\|a\|\cdot 10^{-8}$
+instead of $\|a\|\cdot 10^{-10}$, scaling the reported cosine by a
+factor of $\sim\!100\times$ in the wrong direction.
+
+\item fp16 mixed precision underflows BP gradients at $\sim\!5\times
+10^{-10}$, below fp16's smallest subnormal of $\sim\!6\times 10^{-8}$.
+bf16 works because it shares fp32's exponent range.
+
+\item Random feedback $B_l$ matrices are training-specific. DFA reports
+$\Gamma\approx 0.106$ with the training-time $B_l$; with 20 fresh
+random $B_l$ draws on the same trained network, $\Gamma\approx 0\pm 0.005$.
+The reported alignment is the network adapting to its specific $B_l$, not
+intrinsic.
+
+\item Aggregation strategy across (layers, samples, batches) is rarely
+specified but determines the headline number. Same DFA seed-42 gives
+$\Gamma \in [-0.028, +0.074]$ across four valid aggregation strategies
+(a 3.45$\times$ ratio, with sign flip).
+
+\item Per-layer $\Gamma$ structure is hidden by aggregation. On the
+4-block ResMLP, DFA's headline $\Gamma\approx 0.10$ is driven almost
+entirely by the embedding layer ($\Gamma_{l_0} \approx +0.43$);
+deeper layers have $\Gamma \approx 0$. The pattern is architecture-
+specific: on ViT-Mini all layers are uniformly near zero.
+
+\item Auxiliary networks (random feedback $B_l$, bridge predictors) not
+saved alongside model checkpoints can cause post-hoc $\Gamma$ scripts to
+silently fall back to $\cos(\text{BP\_grad}, \text{BP\_grad}) = 1.0$ and
+report ``perfect alignment.'' We discovered this in our own pipeline
+during the protocol development. Check that auxiliary networks are
+persisted before reporting any $\Gamma$ value.
+\end{enumerate}
+
+\section{Methodology: walk-back chain}
+\label{app:walkback}
+
+The framing of this paper underwent several corrections during the
+development of the protocol. We document the four-step progression
+explicitly as part of the methodology, not as narrative drama:
+
+\begin{enumerate}
+\item Initial metric ($\Gamma\approx 0.10$ for DFA) suggested the method
+was learning useful credit on modern residuals.
+\item Diagnostic showed the metric was measured against a numerical-floor
+reference vector ($\|g_L\|\sim 10^{-10}$); the headline number was not
+interpretable.
+\item Revised control (the residual-stream penalty) restored the
+reference but only partially closed the accuracy gap to BP, identifying
+a residual phenomenon.
+\item Final interpretation (this paper) separates measurement failure
+(Mode~1) from genuine credit-quality cost (Mode~2), validated by the
+four control experiments of Section~\ref{sec:two-modes}.
+\end{enumerate}
+
+\section{Six independent validations of the two-mode separation}
+\label{app:six-validations}
+
+For completeness we list all six independent validation experiments,
+beyond the four reported in the main text:
+
+\begin{enumerate}
+\item Direct deep-layer cosine on penalized DFA (3 seeds): deep mean
+$+0.155 \pm 0.025$.
+\item Null calibration with 20 fresh random $B_l$: deep cosine
+$+0.002 \pm 0.022$ (within noise).
+\item Hypothesis-disambiguation sweep: vanilla DFA early-epoch deep
+cosine $-0.008 \pm 0.013$ across 3 seeds at epoch 1.
+\item BP+penalty matched-control: 8pp BP capacity cost vs 17pp residual
+gap at $\lambda{=}10^{-2}$.
+\item Multi-seed lock-in: 24 measurements (3 seeds $\times$ 2 epochs
+$\times$ 4 deep layers) all in $[-0.04, +0.02]$ on vanilla.
+\item Cross-metric triangulation via perturbation correlation: vanilla
+$+0.002$, penalized $+0.080$ (3 seeds), positive control (BP grad)
+$+0.997$, negative control (random vector) $+0.006$.
+\end{enumerate}
+
+\end{document}