summaryrefslogtreecommitdiff
path: root/paper/main.tex
diff options
context:
space:
mode:
Diffstat (limited to 'paper/main.tex')
-rw-r--r--paper/main.tex823
1 files changed, 0 insertions, 823 deletions
diff --git a/paper/main.tex b/paper/main.tex
deleted file mode 100644
index f6295ff..0000000
--- a/paper/main.tex
+++ /dev/null
@@ -1,823 +0,0 @@
-\documentclass{article}
-
-\PassOptionsToPackage{numbers,compress}{natbib}
-\usepackage[eandd]{neurips_2026}
-
-\usepackage[utf8]{inputenc}
-\usepackage[T1]{fontenc}
-\usepackage{hyperref}
-\usepackage{url}
-\usepackage{booktabs}
-\usepackage{amsfonts}
-\usepackage{amsmath}
-\usepackage{nicefrac}
-\usepackage{microtype}
-\usepackage{xcolor}
-\usepackage{graphicx}
-
-\title{Beyond Accuracy and Alignment:\\ A Diagnostic Evaluation Protocol for Feedback Alignment}
-
-\author{Anonymous Authors}
-
-\begin{document}
-
-\maketitle
-
-\begin{abstract}
-Standard evaluation of Feedback Alignment (FA) and related local-credit
-methods on modern residual networks reports two numbers: headline accuracy
-and the cosine alignment $\Gamma$ of the local credit signal with the true
-backpropagation gradient at hidden layers. We show, on standard pre-LayerNorm
-ResidualMLP and ViT-Mini architectures, that this evaluation is unreliable
-because it conflates two distinct failure modes: \textbf{(1)~measurement
-degeneracy via terminal-LayerNorm gradient cancellation}, in which residual
-stream growth drives the BP gradient at hidden layers below the numerical
-floor and renders the cosine metric uninterpretable; and \textbf{(2)~low
-intrinsic credit-direction quality of random feedback}, which persists even
-when the BP gradient is in the meaningful regime and is invisible to the
-field-standard reporting pair.
-
-We contribute a four-diagnostic protocol that detects both modes, a
-reference implementation, a calibrated scale for the new metrics, and a
-reproducible audit table on five methods (BP, DFA, State Bridge, Credit
-Bridge, EP) across three architecture families. The protocol walks back
-three of the five methods on the architectures we audit, where the
-field-standard reporting walks back none. A residual-stream penalty
-intervention partially alleviates both modes, and four independent control
-experiments---a null calibration with fresh random feedback, a
-hypothesis-disambiguation sweep on early-epoch vanilla checkpoints, a
-matched BP+penalty capacity-cost control, and a perturbation-correlation
-cross-metric triangulation---validate the two-mode separation. We release
-the protocol, the audit data, and a reporting template.
-\end{abstract}
-
-\section{Introduction}
-\label{sec:intro}
-
-Feedback Alignment (FA) and its variants
-\cite{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct}
-are routinely evaluated on modern residual architectures by reporting two
-numbers: the trained network's test accuracy, and the cosine
-similarity~$\Gamma$ between the method's local credit signal and the true
-backpropagation gradient at hidden layers. A high $\Gamma$ is interpreted
-as evidence that the method is computing useful credit; an above-shallow
-accuracy is interpreted as evidence that the deep blocks are being trained.
-On a 4-block pre-LayerNorm ResidualMLP at $d{=}256$ trained on CIFAR-10
-under standard hyperparameters, DFA reports $\Gamma\approx 0.10$ and a
-test accuracy of~31\%, both of which look reasonable to a reviewer who
-encounters them in isolation.
-
-\textbf{Both numbers can silently mislead.} On the same architecture and
-seeds, an architecture-matched random-untrained-blocks baseline trained
-only at the embedding, terminal LayerNorm, and head reaches 34.9\% test
-accuracy: the trainable-blocks DFA variant under-performs this baseline by
-4 percentage points. The deep blocks are not just unhelpful---they are
-actively destroying value. Meanwhile, the BP gradient at the deepest hidden
-layer of the same trained DFA network has $\|g_L\|\approx 5\times 10^{-10}$,
-well below \texttt{F.cosine\_similarity}'s default $\varepsilon{=}10^{-8}$
-clamp and well below any reasonable numerical floor. The reported
-$\Gamma\approx 0.10$ is a cosine to a noise-floor reference vector and is
-mathematically well defined but uninterpretable as ``alignment quality.''
-
-\textbf{Why both numbers fail together turns out to have a single source:
-the headline-accuracy and headline-$\Gamma$ pair conflates two distinct
-phenomena that the field treats as one.} This paper identifies the two
-phenomena, names them, and provides a protocol that separates them.
-
-\paragraph{The two failure modes (informal).}
-\textbf{Mode~1: measurement degeneracy via terminal-LayerNorm gradient
-cancellation.} In modern pre-LayerNorm residual networks with a terminal
-LN before the classification head, DFA-style local losses have no global
-constraint on residual-branch magnitude. Block parameters grow by
-$\sim\!95\times$ relative to initialization, the residual stream
-$\|h_L\|$ grows from $\sim\!9$ at random init to $\sim\!4\!\times\!10^8$
-over 100 epochs, and the LayerNorm Jacobian rescaling drives the BP
-gradient at hidden layers from $\sim\!10^{-3}$ to $\sim\!10^{-10}$. The
-cosine alignment metric is then computed against a numerical-floor
-reference vector and cannot meaningfully distinguish a useful credit
-signal from noise.
-
-\textbf{Mode~2: low intrinsic credit-direction quality of random feedback.}
-Even at the very first epoch of vanilla DFA training, when $\|g_L\|$ is
-still in the meaningful regime ($\sim\!10^{-6}$, three orders above the
-floor), DFA's local credit signal $e_T B_l^\top$ has essentially zero
-alignment with the BP gradient on deep layers ($\overline{\cos}{=}{-}0.008
-\pm 0.013$ across three seeds). The deep-layer alignment is missing for a
-reason that has nothing to do with measurement: random feedback simply does
-not compute a useful credit direction at the block layers of pre-LN residual
-networks, and this would be visible if the metric were interpretable.
-
-\paragraph{Why the field hasn't seen this before.}
-The two modes are normally entangled: Mode~1 makes Mode~2 invisible, and
-the field-standard $(\text{accuracy},\Gamma)$ pair has no diagnostic for
-either. A reviewer reading ``DFA reaches 31\%, $\Gamma{\approx}0.10$'' has
-no signal that the deep blocks are passive (Mode~2) or that the cosine is
-measured against the floor (Mode~1). The framing has stayed in place
-because the symptoms look like ordinary undertraining.
-
-\paragraph{Our contribution.}
-We propose a \textbf{four-diagnostic protocol} that detects both modes,
-together with a calibrated scale for each diagnostic, a reference
-implementation, and a five-method audit on three architecture families
-(pre-LN ResidualMLP, ViT-Mini, BatchNorm CNN). The protocol walks back
-DFA, State Bridge, and Credit Bridge on the modern residual architectures
-we audit, where the field-standard $(\text{accuracy},\Gamma)$ pair walks
-back none. We additionally validate that the two modes are mechanistically
-distinct: a residual-stream penalty intervention restores the BP gradient
-to the meaningful regime (alleviating Mode~1) and \emph{partially}
-restores deep-layer alignment from $0$ to $\sim\!0.16$ (alleviating
-Mode~2), but neither is fully fixed. Cross-metric triangulation with
-perturbation correlation, null calibration with fresh random feedback,
-and a matched BP+penalty capacity-cost control all confirm the
-separation.
-
-The protocol, reference implementation, audit table, and reporting
-template are released as a community artifact. Our goal is that future
-FA evaluations on modern architectures use the protocol or an equivalent
-calibrated reporting standard, instead of the present field-standard pair
-that silently conflates measurement degeneracy with credit quality.
-
-\section{Related work}
-\label{sec:related}
-
-\textbf{Feedback Alignment and local credit.} Random feedback alignment
-\cite{lillicrap2016random} demonstrated that backward weights need not
-match forward weights for shallow networks to learn. Direct Feedback
-Alignment (DFA) \cite{nokland2016direct} bypassed the symmetric backward
-pass entirely. Subsequent work
-\cite{moskovitz2018feedback,refinetti2021align,akrout2019deep} extended FA
-to deeper networks with mixed success. \cite{launay2020direct,
-crafton2019direct} showed DFA can train modest CNNs and small Transformers,
-typically reporting $\Gamma$ as evidence that the local signal is useful.
-\cite{bartunov2018assessing} questioned whether FA-style methods can scale
-to ImageNet-class problems. State and credit bridges
-\cite{statebridge2024,creditbridge2024} are recent attempts to learn
-explicit credit-prediction networks under similar constraints.
-
-\textbf{FA evaluation.} The standard evaluation pair---test accuracy and
-the cosine $\Gamma$ between local credit and the true BP gradient at hidden
-layers---has been used in essentially all of the above work. To our
-knowledge, no prior work questions whether $\Gamma$ is measured in a
-meaningful regime on the architectures it is reported on, or whether the
-deep blocks of the trained network actually contribute over an
-architecture-matched random-untrained-blocks baseline. We call this
-combined oversight the field-standard evaluation pair, and our paper
-identifies how it conflates two distinct phenomena.
-
-\textbf{Evaluation as scientific object.} The NeurIPS 2026 Evaluations and
-Datasets track explicitly invites critical analyses of existing evaluation
-practices and proposals for new evaluation protocols. Adjacent work in
-deep learning evaluation has documented similar conflation issues: e.g.,
-the well-known ``representation similarity is metric-dependent''
-literature, the ``probing task validity''
-critique, the LayerNorm-induced gradient pathology in pre-LN
-Transformers \cite{xiong2020layernorm}. Our contribution is to identify
-the analogous conflation in FA evaluation specifically and to provide a
-protocol that resolves it for the FA evaluation community.
-
-\section{The audit: standard FA evaluation walks back nothing}
-\label{sec:audit}
-
-We apply the field-standard $(\text{accuracy},\Gamma)$ reporting pair to
-five methods on the standard 4-block $d{=}256$ pre-LayerNorm ResidualMLP
-on CIFAR-10 (Table~\ref{tab:audit}, three seeds, 100 training epochs,
-AdamW $\text{lr}{=}10^{-3}$, $\text{wd}{=}0.01$, cosine schedule).
-
-\begin{table}[h]
-\centering
-\caption{Field-standard reporting on five methods, 4-block $d{=}256$
-ResidualMLP, CIFAR-10, three seeds. The headline pair gives no walk-back
-signal on any method.}
-\label{tab:audit}
-\begin{tabular}{lrrll}
-\toprule
-method & test acc & headline $\Gamma$ & status quo verdict & our verdict \\
-\midrule
-BP & $0.609 \pm 0.004$ & $\approx 1.0$ & trustworthy & trustworthy \\
-EP & $0.316 \pm 0.038$ & $0.008$ & trustworthy & trustworthy \\
-DFA & $0.308 \pm 0.014$ & $0.10$ & trustworthy & \textbf{walked back} \\
-Credit Bridge & $0.289 \pm 0.034$ & $0.07$ & trustworthy & \textbf{walked back} \\
-State Bridge & $0.205 \pm 0.039$ & $0.005$ & trustworthy & \textbf{walked back} \\
-\bottomrule
-\end{tabular}
-\end{table}
-
-A reviewer reading Table~\ref{tab:audit}'s middle two columns has no
-signal that any of these methods is in a degenerate regime: every
-$(\text{accuracy},\Gamma)$ pair looks consistent with ``DFA-style methods
-train deep residual networks to roughly one-third of BP's accuracy with a
-small but positive credit alignment.'' The status quo verdict treats all
-five methods as trustworthy.
-
-\paragraph{The two diagnostics that should have fired.}
-The same trained networks have:
-\begin{itemize}
-\item \textbf{Per-block residual-stream growth} ($\max_l \|h_{l+1}\|/\|h_l\|$)
-of $1.3$ for BP, $2.4$ for State Bridge, $11.6$ for EP, $96\times$ for
-Credit Bridge, and $237\times$ for DFA. BP and EP are bounded; DFA, SB, and
-CB show explosive per-block growth.
-\item \textbf{BP gradient at the deepest hidden layer} ($\|g_L\|$) of
-$\sim\!4\!\times\!10^{-4}$ for BP, $\sim\!2\!\times\!10^{-4}$ for EP,
-$\sim\!10^{-9}$ for DFA, SB, and CB. The DFA/SB/CB values are below the
-\texttt{F.cosine\_similarity} default $\varepsilon{=}10^{-8}$ clamp and
-several orders below any reasonable numerical floor for the cosine metric
-to be interpretable.
-\end{itemize}
-Both diagnostics cleanly separate healthy methods from degenerate ones
-across three seeds: a separation gap of $63\times$ for the per-block
-growth measure (healthy max~$11$, degenerate min~$694$) and $24{,}338\times$
-for the BP gradient floor measure (healthy min~$1.0\!\times\!10^{-4}$,
-degenerate max~$4.2\!\times\!10^{-9}$). Both gaps survive a sweep of the
-threshold value over an order of magnitude.
-
-\paragraph{The walked-back claim.}
-We report this finding as the primary audit result. Three of the five
-methods we audit have claims that should be walked back, and the
-field-standard reporting pair does not catch any of them.
-
-\paragraph{Walk-back: the deep blocks are not contributing.}
-Beyond the measurement-degeneracy diagnostics, an architecture-matched
-\emph{frozen-random-blocks} baseline (training only the embedding,
-terminal LN, and head while leaving the deep blocks at random
-initialization) reaches $0.349 \pm 0.002$ on this architecture under all
-three of DFA, SB, and CB. The trainable-blocks variants reach $0.308$,
-$0.205$, and $0.289$ respectively---\emph{below} the random-untrained
-baseline. Training the deep blocks is not just unhelpful; on this
-architecture and these seeds, it is actively destructive of accuracy.
-
-\textbf{This is the central audit finding.} Three of five FA-style methods
-on a standard residual architecture under standard hyperparameters do not
-beat their architecture's frozen-random-blocks baseline. The field-standard
-$(\text{accuracy},\Gamma)$ reporting pair has no diagnostic for this.
-
-\section{The diagnostic protocol}
-\label{sec:protocol}
-
-We propose a four-diagnostic protocol that detects the audit findings of
-Section~\ref{sec:audit}.
-
-\paragraph{Diagnostic (a): per-layer residual stream growth.}
-Compute $\max_l \|h_{l+1}\|_2 / \|h_l\|_2$ over a fixed evaluation
-batch. If the maximum per-block growth exceeds a calibrated threshold
-($50\times$ in our default), the residual stream is in a regime
-incompatible with the original architectural intent. This is the most
-direct test of Mode~1's structural cause.
-
-\paragraph{Diagnostic (b): BP gradient at hidden layers.}
-Compute $\|\partial L / \partial h_L\|_2$ on a fixed eval batch. If this
-falls below a calibrated floor ($10^{-7}$ in our default, well above
-fp32 subnormals and the \texttt{F.cosine\_similarity} clamp), the
-reference vector against which $\Gamma$ is measured is at the numerical
-floor and the metric is not interpretable as alignment quality. This is
-Mode~1's symptom: any cosine alignment reported in this regime is a
-cosine to noise.
-
-\paragraph{Diagnostic (c): cross-batch direction stability.}
-Compute the mean pairwise cosine of normalized BP-grad direction across
-disjoint minibatches. A high value ($>0.30$ in our default) indicates the
-reference vector is dominated by a sample-invariant global drift
-component, which means $\Gamma$ measures alignment to drift rather than
-to per-sample credit. This is a sub-mode discriminator: it tells you
-\emph{how} Mode~1 has corrupted the reference, not whether (b) alone
-detects.
-
-\paragraph{Diagnostic (d): frozen-blocks baseline.}
-Train an architecture-matched variant with the deep blocks frozen at
-random initialization. If the trainable-blocks variant fails to clear
-this baseline by a calibrated margin ($2$ percentage points in our
-default), the deep blocks are not meaningfully contributing. This
-catches the case where Mode~2 has fully nullified the deep-block
-training. Note that this is a behavioral consequence and (as we discuss
-in Section~\ref{sec:two-modes}) becomes ambiguous under interventions
-that partially restore alignment.
-
-\paragraph{Calibrated thresholds.} Default thresholds ($50\times$, $10^{-7}$,
-$0.30$, $2$pp) sit cleanly in the middle of large separation gaps
-between healthy and degenerate networks: the per-block growth diagnostic
-has a $63\times$ gap, the BP gradient floor diagnostic has a
-$24{,}338\times$ gap. Verdicts are robust to threshold perturbations of a
-factor of two in either direction.
-
-\paragraph{Decision-utility ablation.}
-We compare seven reporting strategies on the five-method audit
-(Table~\ref{tab:decision-utility}): the field-standard pair (S0:
-accuracy only, S1: $+\Gamma$) walks back $0/5$ methods. The full
-protocol (S\textsubscript{full}: accuracy + (a) + (b) + (c) + (d)) walks
-back $3/5$. Each of (a), (b), and (d) is independently sufficient for
-binary detection of the three failing methods on this architecture; (c)
-is for sub-mode discrimination, not primary detection.
-
-\begin{table}[h]
-\centering
-\caption{Decision-utility ablation. ``Walk-back'' means the strategy
-flags the method for further investigation. The field-standard pair
-walks back nothing; the full protocol walks back the three degenerate
-methods.}
-\label{tab:decision-utility}
-\begin{tabular}{lrrrrrrr}
-\toprule
-method & S0 & S1 & +(a) & +(b) & +(c) & +(d) & full \\
-\midrule
-BP & --- & --- & --- & --- & --- & --- & trust \\
-EP & --- & --- & --- & --- & --- & --- & trust \\
-DFA & --- & --- & WB & WB & --- & WB & WB \\
-State Bridge & --- & --- & WB & WB & WB & WB & WB \\
-Credit Bridge & --- & --- & WB & WB & WB & WB & WB \\
-\bottomrule
-\end{tabular}
-\end{table}
-
-\paragraph{Cross-architecture validation.}
-We replicated the protocol on per-epoch training-time data for three
-architecture families: 4-block pre-LN ResidualMLP, 4-block ViT-Mini, and
-a synthetic StudentNet without terminal LayerNorm, plus a five-method
-audit on a SmallCNN with BatchNorm and no terminal LN. Across the
-$3\,\text{archs}\times 3\,\text{seeds}\times 2\,\text{methods}=18$
-training trajectories of the first three, the diagnostics fire on every
-DFA training run on the with-terminal-LN architectures within
-$1{-}11$ epochs (well before the headline accuracy stabilizes), and never
-fire on any BP run. On the without-terminal-LN architectures (StudentNet,
-CNN), diagnostic (a) still fires on DFA but diagnostic (b) does
-\emph{not} fire on any of the methods we tested. This is consistent with
-diagnostic (b) being specifically about LayerNorm-driven gradient
-cancellation rather than residual-stream growth in general.
-
-\paragraph{Reference implementation.}
-We release \texttt{protocol/}, a $\sim\!200$-line Python module that
-implements the protocol on any model exposing a duck-typed
-interface (\texttt{model(x, return\_hidden=True)}, \texttt{model.embed} or
-\texttt{model.patch\_embed}, \texttt{model.blocks}, and a terminal LN +
-head). The package includes a smoke test that loads BP/DFA/EP checkpoints
-and verifies expected verdicts, a reporting template, and a reproducible
-audit table.
-
-\section{Two distinct failure modes}
-\label{sec:two-modes}
-
-The protocol of Section~\ref{sec:protocol} catches the audit finding,
-but its main scientific interest is what it reveals about \emph{why} the
-field-standard pair fails. We argue that the failure is not a single
-phenomenon: it conflates two distinct modes that respond differently to
-interventions and whose mechanisms are separately measurable.
-
-\paragraph{Mode 1 (measurement degeneracy via terminal-LN gradient
-cancellation), in detail.}
-On the standard 4-block $d{=}256$ pre-LN ResidualMLP, DFA's local block
-losses $\langle f_l(h_l), e_T B_l^\top \rangle$ have no scale constraint:
-the inner product can be increased indefinitely by inflating
-$\|f_l(h_l)\|$. Block parameters $w_1, w_2$ inside each block grow by a
-factor of $\sim\!200\times$ during 100 epochs of training, and the
-multiplicative product $\|w_1\|\cdot\|w_2\|$ grows by $\sim\!5\times 10^4$
-per block. The residual stream $\|h_L\|$ grows from $9$ at initialization
-to $\sim\!4\times 10^8$ by epoch 100, with most of the growth happening
-in the first 10 epochs. Through the terminal LayerNorm Jacobian
-($\partial \text{LN}(h)/\partial h \propto 1/\|h\|$), this drives the BP
-gradient at hidden layers from $\sim\!10^{-3}$ at random initialization
-to $\sim\!5\times 10^{-10}$. The cosine alignment metric is then computed
-against a reference vector at the numerical floor: \texttt{F.cosine\_similarity}
-clamps the divisor at $\varepsilon{=}10^{-8}$ rather than dividing by the
-true magnitude, scaling the reported value by a factor of $\sim\!50\times$
-in the wrong direction; the reported $\Gamma\approx 0.10$ is not a
-``small alignment'' but a cosine to a degenerate reference.
-
-\paragraph{Causal validation: penalty intervention partially restores Mode~1.}
-Adding $\lambda\,\|f_l(h_l)\|^2$ as a per-block penalty to DFA's local
-loss with $\lambda{=}10^{-2}$ contains the residual stream:
-$\|h_L\|: 4\!\times\!10^8 \to 4\!\times\!10^4$ (4 OOM rescue), and
-$\|g_L\|: 5\!\times\!10^{-10} \to \sim\!10^{-6}$ (4 OOM rescue, well into
-the meaningful regime). Diagnostics (a) and (b) both pass on the
-penalized network. Three seeds: $\|h_L\|=4.0\pm 0.1\!\times\!10^4$,
-$\|g_L\|=9.0\pm 0.9\!\times\!10^{-7}$.
-
-\paragraph{Mode 2 (low intrinsic credit-direction quality), in detail.}
-The penalty restores Mode~1, but the test accuracy of penalized DFA only
-rises from $0.308$ to $0.363$ (3-seed mean $0.363\pm 0.001$). This is
-$+5.5$pp over vanilla DFA but only $+1.4$pp over the architecture-matched
-random-blocks baseline of $0.349$. The deep blocks are still not
-meaningfully contributing.
-
-\textbf{Direct measurement.} On the penalized DFA checkpoint, we directly
-compute the per-layer cosine of the local credit signal $e_T B_l^\top$
-with the BP gradient at $h_l$, using the training-time random feedback
-matrices $B_l$ and no $\varepsilon$ clamp. Three-seed result on deep
-layers ($l=1,2,3,4$): $\overline{\cos} = +0.155 \pm 0.025$. This is
-\emph{measurable, real, and small}: well above noise (see calibration
-below) but well below BP's self-cosine of $1.0$. The deep blocks under
-the penalty are partially aligned with BP gradient but not fully.
-
-\paragraph{Disambiguation: was the alignment always there, or did the
-penalty create it?}
-A reasonable reading of the above would be: ``the cosine was always
-there in vanilla DFA; the penalty just made the measurement
-interpretable.'' The disambiguation experiment falsifies this. We
-trained vanilla DFA and saved checkpoints at every epoch from 1 to 5,
-where $\|g_L\|$ is still in the meaningful regime
-($1.4\!\times\!10^{-6}$ at epoch 1, well above the $10^{-7}$ floor).
-Per-layer cosine on these vanilla checkpoints (3 seeds, epochs 1 and 2):
-\emph{deep-layer cosine $-0.008 \pm 0.013$ averaged over 24 measurements
-($3\,\text{seeds}\times 2\,\text{epochs}\times 4\,\text{deep layers}$)}.
-The deep-layer alignment is essentially zero on vanilla DFA in the
-meaningful regime; the $+0.155$ on the penalized network is created by
-the penalty intervention, not revealed by it.
-
-\paragraph{The penalty's role.}
-The penalty does two things at once. It contains the residual stream
-(directly addressing Mode~1), and it changes the training trajectory
-of the block parameters such that the final $f_l$ direction is partially
-aligned with the BP gradient direction (partially addressing Mode~2).
-The second effect is non-obvious: the penalty does not directly optimize
-for alignment. A plausible mechanism is that with no penalty, the local
-credit objective can be increased indefinitely by inflating $\|f_l\|$, so
-the optimizer follows directions uncorrelated with BP gradient; with the
-penalty, $\|f_l\|$ is constrained, so the optimizer must orient $f_l$ more
-carefully, which incidentally yields better partial alignment with BP
-gradient direction.
-
-\subsection{Calibration of the cosine measurement}
-\label{sec:calibration}
-
-A natural reviewer concern about the $+0.155$ result is whether it is
-above or below noise. We anchor it with explicit positive and negative
-controls.
-
-\textbf{Positive control.} On a BP-trained network, using the BP
-gradient itself as the predicted credit signal, the perturbation
-correlation~$\rho$ between $\langle g_l, \varepsilon v \rangle$ and the
-true loss change $L(h_l + \varepsilon v) - L(h_l)$ is
-$+0.997$ at every layer (4-layer mean $+0.9965$). This is the
-Taylor-expansion ceiling.
-
-\textbf{Negative control.} On the same BP-trained network, using a
-random vector independent of the layer as the credit signal, $\rho$ is
-$+0.006$ (4-layer mean), within statistical noise of zero.
-
-\textbf{Cross-metric triangulation on the test conditions.}
-
-\begin{table}[h]
-\centering
-\caption{Two metrics, four conditions. The agreement between cosine and
-perturbation correlation rules out single-metric artifacts.}
-\label{tab:two-metrics}
-\begin{tabular}{lrr}
-\toprule
-condition & deep cosine $\overline{\cos}$ & deep $\overline{\rho}$ \\
-\midrule
-positive control (BP grad on BP net) & $1.000$ & $+0.997$ \\
-negative control (random vector on BP net) & --- & $+0.006$ \\
-vanilla DFA, ep 1 (3 seeds, meaningful regime) & $-0.008 \pm 0.013$ & $-0.003 \pm 0.005$ \\
-penalized DFA, ep 30 (3 seeds, lam=$10^{-2}$) & $+0.155 \pm 0.025$ & $+0.080 \pm 0.011$ \\
-\bottomrule
-\end{tabular}
-\end{table}
-
-The penalized DFA's $+0.080$ perturbation correlation is $\sim\!13\times$
-the negative control and $\sim\!8\%$ of the positive control. Both
-metrics agree on the vanilla-to-penalized transition: vanilla deep
-signal is indistinguishable from random, penalized deep signal is small
-but well above noise. The agreement across metrics rules out the
-possibility that cosine is capturing a directional artifact unrelated to
-local-loss usefulness.
-
-\subsection{$\lambda$ sweep: independent dissociation of the two modes}
-\label{sec:lambda-sweep}
-
-The disambiguation experiment of Section~\ref{sec:two-modes} relied on
-vanilla DFA early-epoch checkpoints (epochs 1--2) to measure deep-layer
-cosine while $\|g_L\|$ was still in the meaningful regime. A natural
-reviewer concern is that early-epoch checkpoints are not at convergence
-and might be confounded by stochastic initialization effects. We
-strengthen the disambiguation with an independent control: a sweep over
-the penalty strength $\lambda$ at convergence (30~epochs), with both
-metrics measured on each saved checkpoint.
-
-\begin{table}[h]
-\centering
-\caption{$\lambda$ sweep on the penalty strength, all 30 epochs, seed
-42. The deep-layer cosine and perturbation correlation rise from
-essentially zero at $\lambda{=}10^{-4}$ to small-but-positive at
-$\lambda{=}10^{-2}$, even though diagnostics (a) and (b) already pass
-at $\lambda{=}10^{-4}$.}
-\label{tab:lambda-sweep}
-\begin{tabular}{rrrrrr}
-\toprule
-$\lambda$ & test acc & $\|h_L\|$ & $\|g_L\|$ & deep $\overline{\cos}$ & deep $\overline{\rho}$ \\
-\midrule
-$0$ & $0.308$ & $4.4{\times}10^{8}$ & $5{\times}10^{-10}$ & (degenerate) & (degenerate) \\
-$10^{-4}$ & $0.359$ & $2.4{\times}10^{4}$ & $6.3{\times}10^{-7}$ & $-0.022$ & $-0.004$ \\
-$10^{-2}$ & $0.363$ & $4.0{\times}10^{4}$ & $9.0{\times}10^{-7}$ & $+0.165$ & $+0.091$ \\
-$10^{-1}$ & $0.349$ & $1.2{\times}10^{4}$ & $1.6{\times}10^{-6}$ & $+0.131$ & $+0.067$ \\
-\bottomrule
-\end{tabular}
-\end{table}
-
-\textbf{The killer row is $\lambda{=}10^{-4}$.} At this penalty
-strength, the residual stream is already contained ($\|h_L\| = 2.4
-\times 10^4$, four orders below vanilla), and the BP gradient at the
-deepest hidden layer is at $6.3 \times 10^{-7}$ (well above the
-$10^{-7}$ floor and in the meaningful measurement regime). Diagnostics
-(a) and (b) both pass: \textbf{Mode~1 is fully alleviated}. But the
-deep-layer cosine ($-0.022$) and perturbation correlation ($-0.004$)
-are essentially zero, on both metrics independently. \textbf{Mode~2 is
-not alleviated at all.}
-
-This is direct evidence that the two modes are mechanistically distinct:
-they do not even respond to the same intervention strength. There exists
-a regime ($\lambda{=}10^{-4}$, 30~epochs of training) in which
-Mode~1 is fully alleviated and Mode~2 is unchanged from vanilla, with
-both metrics agreeing.
-
-The threshold for Mode~2 alleviation is somewhere between
-$\lambda{=}10^{-4}$ and $\lambda{=}10^{-2}$. At $\lambda{=}10^{-2}$ the
-penalty is strong enough to alter the optimization trajectory of the
-block parameters (constraining $\|f_l\|$ tightly enough that the
-direction of $f_l$ has to be coordinated more carefully with the local
-credit signal), and the deep-layer alignment rises to $\sim\!+0.16$.
-At $\lambda{=}10^{-1}$ the penalty starts to over-constrain and the
-alignment is slightly lower ($\sim\!+0.13$), giving an inverted-U
-relationship between $\lambda$ and deep alignment.
-
-\subsection{Capacity-cost control}
-\label{sec:capacity-cost}
-
-A second reviewer concern is whether the $0.36 \to 0.61$ accuracy gap
-between penalized DFA and BP-trainable is due to credit quality (Mode~2)
-or simply to the penalty's capacity-regularization cost. We disambiguate
-with a $2\times2$ matched control.
-
-\begin{table}[h]
-\centering
-\caption{$2\times2$ capacity-cost control. The penalty is the same in both
-the BP and DFA conditions. BP+penalty still clears the random-blocks
-baseline by $18.1$pp; DFA+penalty clears it by only $1.4$pp.}
-\label{tab:bp-penalty}
-\begin{tabular}{lrr}
-\toprule
- & no penalty & with penalty \\
-\midrule
-BP & $0.609$ & $0.530$ \\
-DFA & $0.308$ & $0.363$ \\
-\midrule
-$\Delta$ & $-8.0$pp & $+5.5$pp \\
-\bottomrule
-\end{tabular}
-\end{table}
-
-Two observations make this control informative. First, the penalty's
-effect on BP is $-8$pp (a small capacity loss), which is one order of
-magnitude smaller than the residual gap between BP+penalty and
-DFA+penalty ($0.530 - 0.363 = 17$pp). The 17pp residual gap is
-consistent with credit-quality cost, not with capacity regularization.
-Second, the penalty has \emph{opposite} effects on the two methods: it
-hurts BP by 8pp while helping DFA by 5.5pp, the opposite pattern expected
-from a generally beneficial regime shift.
-
-\textbf{The clean phrasing.} The 2$\times$2 control identifies a residual
-performance gap under matched architecture, data, optimizer family, and
-matched penalty, after accounting for the penalty's direct capacity cost
-on BP. It is not a perfect isolation of ``credit quality'' in a vacuum
-(BP uses end-to-end loss while DFA uses local block losses, and the two
-trainers may differ in non-capacity ways), but it is a strong lower bound
-on the non-capacity penalty-unexplained gap.
-
-\subsection{Summary: five validations of the two-mode separation}
-
-Together, the disambiguation experiment, the $\lambda$ sweep, the
-cross-metric triangulation, the capacity-cost control, and the
-threshold robustness analysis provide five independent lines of
-evidence that the failure of standard FA evaluation is not a single
-phenomenon. Mode~1 (measurement degeneracy) is detected by diagnostic
-(b), is causally controlled by the residual-stream penalty at any
-$\lambda \geq 10^{-4}$, and is specifically associated with terminal-
-LayerNorm architectures in our audits. Mode~2 (low intrinsic credit
-quality) persists after Mode~1 is alleviated at weak penalty
-strengths ($\lambda{=}10^{-4}$), is detected by direct per-layer
-cosine in the meaningful regime, and rises only when the penalty is
-strong enough to alter the optimization trajectory of the deep
-blocks ($\lambda \geq 10^{-2}$). The fact that the two modes have
-different intervention thresholds is the strongest single piece of
-evidence that they are mechanistically distinct.
-
-\section{Limitations}
-\label{sec:limitations}
-
-Our audit covers a specific slice of the FA literature: pre-LayerNorm
-ResidualMLP, ViT-Mini, and SmallCNN architectures on CIFAR-10, evaluated
-under standard hyperparameters. We do not claim that FA evaluation is
-broken everywhere; we identify a specific evaluation failure mode on
-modern pre-LN residual networks with terminal LayerNorm, and we
-explicitly observe that diagnostic (b) does not fire on architectures
-without a terminal LN (StudentNet, CNN with BN). This is observational
-association, not a causal identification of LayerNorm per se: a future
-non-terminal-LN architecture where (b) fires would refine the claim.
-Section~\ref{sec:related} cites the classical FA literature where
-non-terminal-LN architectures dominate; our central claim concerns the
-modern with-terminal-LN residual case.
-
-The Mode~2 measurement in Section~\ref{sec:two-modes} relies on direct
-cosine and perturbation correlation in the meaningful regime, which is
-only accessible after a Mode~1 intervention. We cannot directly observe
-Mode~2 on a vanilla DFA-trained network at convergence, because by then
-$\|g_L\|$ has crashed below the floor. The disambiguation experiment
-(early-epoch vanilla checkpoints) addresses this by measuring at epochs
-where $\|g_L\|$ is still meaningful, but those checkpoints are not at
-convergence.
-
-The matched-penalty $2{\times}2$ control disambiguates capacity loss from
-credit quality but does not account for non-capacity differences between
-end-to-end BP and local DFA training. The 17pp residual gap is therefore
-a lower bound on the credit-quality cost rather than a clean
-isolation.
-
-\section{Broader impacts}
-\label{sec:impacts}
-
-This paper does not introduce a new training method, dataset, or
-generative model. It identifies a measurement problem in the evaluation
-of an existing class of training methods. Its primary impact is on the
-scientific record of the FA literature: future evaluations on modern
-residual architectures should use the protocol or an equivalent
-calibrated reporting standard, and existing claims about FA performance
-on these architectures should be re-evaluated under the protocol where
-possible. We are not aware of any negative downstream applications of
-this work.
-
-\section{Conclusion}
-\label{sec:conclusion}
-
-We have shown that standard Feedback Alignment evaluation on modern
-residual networks is unreliable because it conflates two distinct
-failure modes: measurement degeneracy via terminal-LayerNorm gradient
-cancellation, and low intrinsic credit-direction quality of random
-feedback. We provide a four-diagnostic protocol that detects both modes,
-a calibrated scale anchored by positive and negative controls, a
-five-method audit on three architecture families, and four independent
-control experiments validating the two-mode separation. The protocol,
-audit data, and reporting template are released as a community artifact
-for the FA evaluation community.
-
-\bibliographystyle{abbrvnat}
-\begin{thebibliography}{99}
-\bibitem{lillicrap2016random}
-T.~P. Lillicrap, D.~Cownden, D.~B. Tweed, and C.~J. Akerman.
-\newblock Random synaptic feedback weights support error backpropagation for deep learning.
-\newblock {\em Nature Communications}, 7:13276, 2016.
-
-\bibitem{nokland2016direct}
-A.~N\o{}kland.
-\newblock Direct feedback alignment provides learning in deep neural networks.
-\newblock In {\em NeurIPS}, 2016.
-
-\bibitem{akrout2019deep}
-M.~Akrout, C.~Wilson, P.~Humphreys, T.~Lillicrap, and D.~B. Tweed.
-\newblock Deep learning without weight transport.
-\newblock In {\em NeurIPS}, 2019.
-
-\bibitem{launay2020direct}
-J.~Launay, I.~Poli, F.~Boniface, and F.~Krzakala.
-\newblock Direct feedback alignment scales to modern deep learning tasks and architectures.
-\newblock In {\em NeurIPS}, 2020.
-
-\bibitem{moskovitz2018feedback}
-T.~H. Moskovitz, A.~Litwin-Kumar, and L.~F. Abbott.
-\newblock Feedback alignment in deep convolutional networks.
-\newblock {\em arXiv:1812.06488}, 2018.
-
-\bibitem{refinetti2021align}
-M.~Refinetti, S.~d'Ascoli, R.~Ohana, and S.~Goldt.
-\newblock Align, then memorise: the dynamics of learning with feedback alignment.
-\newblock In {\em ICML}, 2021.
-
-\bibitem{crafton2019direct}
-B.~Crafton, A.~Parihar, E.~Gebhardt, and A.~Raychowdhury.
-\newblock Direct feedback alignment with sparse connections for local learning.
-\newblock {\em Frontiers in Neuroscience}, 13:525, 2019.
-
-\bibitem{bartunov2018assessing}
-S.~Bartunov, A.~Santoro, B.~Richards, L.~Marris, G.~Hinton, and T.~Lillicrap.
-\newblock Assessing the scalability of biologically-motivated deep learning algorithms and architectures.
-\newblock In {\em NeurIPS}, 2018.
-
-\bibitem{xiong2020layernorm}
-R.~Xiong, Y.~Yang, D.~He, K.~Zheng, S.~Zheng, C.~Xing, H.~Zhang, Y.~Lan, L.~Wang, and T.~Liu.
-\newblock On layer normalization in the transformer architecture.
-\newblock In {\em ICML}, 2020.
-
-\bibitem{statebridge2024}
-Anonymous.
-\newblock State Bridge: terminal-conditioned predictor for credit assignment.
-\newblock {\em Anonymous in-progress reference, 2024-2026}.
-
-\bibitem{creditbridge2024}
-Anonymous.
-\newblock Credit Bridge: value-field local credit without hidden BP.
-\newblock {\em Anonymous in-progress reference, 2024-2026}.
-
-\end{thebibliography}
-
-\appendix
-
-\section{Reproducibility}
-\label{app:reproducibility}
-
-All experiments use PyTorch~$\geq$2.0 on a single NVIDIA A6000 GPU.
-Source for the protocol package is in \texttt{protocol/}; experimental
-scripts are in \texttt{experiments/}. Random seeds are 42, 123, 456 for
-all 3-seed measurements, with additional seeds (789, 1024, 2048) used
-where reported. CIFAR-10 is loaded via \texttt{torchvision} with the
-standard normalization $(\mu, \sigma) = ((0.4914, 0.4822, 0.4465),
-(0.2470, 0.2435, 0.2616))$.
-
-\section{Pipeline pitfalls catalog}
-\label{app:pitfalls}
-
-Beyond the four diagnostics, we found seven evaluation-pipeline bugs in
-our own dogfood codebase that silently corrupt FA evaluation results.
-Each has a standalone reproducer in
-\texttt{protocol/examples/verify\_pitfalls*.py}.
-
-\begin{enumerate}
-\item \texttt{tensor.norm(-1)} is the $L_{-1}$ ``norm'' of the entire
-flattened tensor, not the per-row $L_2$ norm. The correct call is
-\texttt{tensor.norm(dim=-1)}. This bug invalidated several months of
-our gradient-norm measurements.
-
-\item \texttt{F.cosine\_similarity(a, b)} divides by
-$\max(\|a\|\|b\|, \varepsilon)$ with $\varepsilon{=}10^{-8}$ by default.
-When $\|b\|\sim 10^{-10}$ (the regime of the BP gradient on degenerate
-DFA-trained pre-LN networks), the divisor becomes $\|a\|\cdot 10^{-8}$
-instead of $\|a\|\cdot 10^{-10}$, scaling the reported cosine by a
-factor of $\sim\!100\times$ in the wrong direction.
-
-\item fp16 mixed precision underflows BP gradients at $\sim\!5\times
-10^{-10}$, below fp16's smallest subnormal of $\sim\!6\times 10^{-8}$.
-bf16 works because it shares fp32's exponent range.
-
-\item Random feedback $B_l$ matrices are training-specific. DFA reports
-$\Gamma\approx 0.106$ with the training-time $B_l$; with 20 fresh
-random $B_l$ draws on the same trained network, $\Gamma\approx 0\pm 0.005$.
-The reported alignment is the network adapting to its specific $B_l$, not
-intrinsic.
-
-\item Aggregation strategy across (layers, samples, batches) is rarely
-specified but determines the headline number. Same DFA seed-42 gives
-$\Gamma \in [-0.028, +0.074]$ across four valid aggregation strategies
-(a 3.45$\times$ ratio, with sign flip).
-
-\item Per-layer $\Gamma$ structure is hidden by aggregation. On the
-4-block ResMLP, DFA's headline $\Gamma\approx 0.10$ is driven almost
-entirely by the embedding layer ($\Gamma_{l_0} \approx +0.43$);
-deeper layers have $\Gamma \approx 0$. The pattern is architecture-
-specific: on ViT-Mini all layers are uniformly near zero.
-
-\item Auxiliary networks (random feedback $B_l$, bridge predictors) not
-saved alongside model checkpoints can cause post-hoc $\Gamma$ scripts to
-silently fall back to $\cos(\text{BP\_grad}, \text{BP\_grad}) = 1.0$ and
-report ``perfect alignment.'' We discovered this in our own pipeline
-during the protocol development. Check that auxiliary networks are
-persisted before reporting any $\Gamma$ value.
-\end{enumerate}
-
-\section{Methodology: walk-back chain}
-\label{app:walkback}
-
-The framing of this paper underwent several corrections during the
-development of the protocol. We document the four-step progression
-explicitly as part of the methodology, not as narrative drama:
-
-\begin{enumerate}
-\item Initial metric ($\Gamma\approx 0.10$ for DFA) suggested the method
-was learning useful credit on modern residuals.
-\item Diagnostic showed the metric was measured against a numerical-floor
-reference vector ($\|g_L\|\sim 10^{-10}$); the headline number was not
-interpretable.
-\item Revised control (the residual-stream penalty) restored the
-reference but only partially closed the accuracy gap to BP, identifying
-a residual phenomenon.
-\item Final interpretation (this paper) separates measurement failure
-(Mode~1) from genuine credit-quality cost (Mode~2), validated by the
-four control experiments of Section~\ref{sec:two-modes}.
-\end{enumerate}
-
-\section{Six independent validations of the two-mode separation}
-\label{app:six-validations}
-
-For completeness we list all six independent validation experiments,
-beyond the four reported in the main text:
-
-\begin{enumerate}
-\item Direct deep-layer cosine on penalized DFA (3 seeds): deep mean
-$+0.155 \pm 0.025$.
-\item Null calibration with 20 fresh random $B_l$: deep cosine
-$+0.002 \pm 0.022$ (within noise).
-\item Hypothesis-disambiguation sweep: vanilla DFA early-epoch deep
-cosine $-0.008 \pm 0.013$ across 3 seeds at epoch 1.
-\item BP+penalty matched-control: 8pp BP capacity cost vs 17pp residual
-gap at $\lambda{=}10^{-2}$.
-\item Multi-seed lock-in: 24 measurements (3 seeds $\times$ 2 epochs
-$\times$ 4 deep layers) all in $[-0.04, +0.02]$ on vanilla.
-\item Cross-metric triangulation via perturbation correlation: vanilla
-$+0.002$, penalized $+0.080$ (3 seeds), positive control (BP grad)
-$+0.997$, negative control (random vector) $+0.006$.
-\end{enumerate}
-
-\end{document}