\documentclass{article}

\PassOptionsToPackage{numbers,compress}{natbib}
\usepackage[eandd]{neurips_2026}

\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{hyperref}
\usepackage{url}
\usepackage{booktabs}
\usepackage{amsfonts}
\usepackage{amsmath}
\usepackage{nicefrac}
\usepackage{microtype}
\usepackage{xcolor}
\usepackage{graphicx}

\title{Beyond Accuracy and Alignment:\\ A Diagnostic Evaluation Protocol for Feedback Alignment}

\author{Anonymous Authors}

\begin{document}

\maketitle

\begin{abstract}
Standard evaluation of Feedback Alignment (FA) and related local-credit
methods on modern residual networks reports two numbers: headline accuracy
and the cosine alignment $\Gamma$ of the local credit signal with the true
backpropagation gradient at hidden layers. We show, on standard pre-LayerNorm
ResidualMLP and ViT-Mini architectures, that this evaluation is unreliable
because it conflates two distinct failure modes: \textbf{(1)~measurement
degeneracy via terminal-LayerNorm gradient cancellation}, in which residual
stream growth drives the BP gradient at hidden layers below the numerical
floor and renders the cosine metric uninterpretable; and \textbf{(2)~low
intrinsic credit-direction quality of random feedback}, which persists even
when the BP gradient is in the meaningful regime and is invisible to the
field-standard reporting pair.

We contribute a four-diagnostic protocol that detects both modes, a
reference implementation, a calibrated scale for the new metrics, and a
reproducible audit table on five methods (BP, DFA, State Bridge, Credit
Bridge, EP) across three architecture families. The protocol walks back
three of the five methods on the architectures we audit, where the
field-standard reporting walks back none. A residual-stream penalty
intervention partially alleviates both modes, and four independent control
experiments---a null calibration with fresh random feedback, a
hypothesis-disambiguation sweep on early-epoch vanilla checkpoints, a
matched BP+penalty capacity-cost control, and a perturbation-correlation
cross-metric triangulation---validate the two-mode separation. We release
the protocol, the audit data, and a reporting template.
\end{abstract}

\section{Introduction}
\label{sec:intro}

Feedback Alignment (FA) and its variants
\cite{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct}
are routinely evaluated on modern residual architectures by reporting two
numbers: the trained network's test accuracy, and the cosine
similarity~$\Gamma$ between the method's local credit signal and the true
backpropagation gradient at hidden layers. A high $\Gamma$ is interpreted
as evidence that the method is computing useful credit; an above-shallow
accuracy is interpreted as evidence that the deep blocks are being trained.
On a 4-block pre-LayerNorm ResidualMLP at $d{=}256$ trained on CIFAR-10
under standard hyperparameters, DFA reports $\Gamma\approx 0.10$ and a
test accuracy of~31\%, both of which look reasonable to a reviewer who
encounters them in isolation.

\textbf{Both numbers can silently mislead.} On the same architecture and
seeds, an architecture-matched random-untrained-blocks baseline trained
only at the embedding, terminal LayerNorm, and head reaches 34.9\% test
accuracy: the trainable-blocks DFA variant under-performs this baseline by
4 percentage points. The deep blocks are not just unhelpful---they are
actively destroying value. Meanwhile, the BP gradient at the deepest hidden
layer of the same trained DFA network has $\|g_L\|\approx 5\times 10^{-10}$,
well below \texttt{F.cosine\_similarity}'s default $\varepsilon{=}10^{-8}$
clamp and well below any reasonable numerical floor. The reported
$\Gamma\approx 0.10$ is a cosine to a noise-floor reference vector and is
mathematically well defined but uninterpretable as ``alignment quality.''

\textbf{Both numbers fail for a common reason: the headline
$(\text{accuracy},\Gamma)$ pair conflates two distinct phenomena that the
field treats as one.} This paper identifies and names the two phenomena and
provides a protocol that separates them.

\paragraph{The two failure modes (informal).}
\textbf{Mode~1: measurement degeneracy via terminal-LayerNorm gradient
cancellation.} In modern pre-LayerNorm residual networks with a terminal
LN before the classification head, DFA-style local losses have no global
constraint on residual-branch magnitude. Block parameters grow by
$\sim\!95\times$ relative to initialization, the residual stream
$\|h_L\|$ grows from $\sim\!9$ at random init to $\sim\!4\!\times\!10^8$
over 100 epochs, and the LayerNorm Jacobian rescaling drives the BP
gradient at hidden layers from $\sim\!10^{-3}$ to $\sim\!10^{-10}$. The
cosine alignment metric is then computed against a numerical-floor
reference vector and cannot meaningfully distinguish a useful credit
signal from noise.

\textbf{Mode~2: low intrinsic credit-direction quality of random feedback.}
Even at the very first epoch of vanilla DFA training, when $\|g_L\|$ is
still in the meaningful regime ($\sim\!10^{-6}$, about one order of magnitude
above the $10^{-7}$ floor we adopt in Section~\ref{sec:protocol}), DFA's
local credit signal $e_T B_l^\top$ has essentially zero
alignment with the BP gradient on deep layers ($\overline{\cos}{=}{-}0.008
\pm 0.013$ across three seeds). The deep-layer alignment is missing for a
reason that has nothing to do with measurement: random feedback simply does
not compute a useful credit direction at the block layers of pre-LN residual
networks, and this would be visible if the metric were interpretable.

\paragraph{Why the field hasn't seen this before.}
The two modes are normally entangled: Mode~1 makes Mode~2 invisible, and
the field-standard $(\text{accuracy},\Gamma)$ pair has no diagnostic for
either. A reviewer reading ``DFA reaches 31\%, $\Gamma{\approx}0.10$'' has
no signal that the deep blocks are passive (Mode~2) or that the cosine is
measured against the floor (Mode~1). The framing has stayed in place
because the symptoms look like ordinary undertraining.

\paragraph{Our contribution.}
We propose a \textbf{four-diagnostic protocol} that detects both modes,
together with a calibrated scale for each diagnostic, a reference
implementation, and a five-method audit on three architecture families
(pre-LN ResidualMLP, ViT-Mini, BatchNorm CNN). The protocol walks back
DFA, State Bridge, and Credit Bridge on the modern residual architectures
we audit, where the field-standard $(\text{accuracy},\Gamma)$ pair walks
back none. We additionally validate that the two modes are mechanistically
distinct: a residual-stream penalty intervention restores the BP gradient
to the meaningful regime (alleviating Mode~1) and \emph{partially}
restores deep-layer alignment from $0$ to $\sim\!0.16$ (alleviating
Mode~2), but neither is fully fixed. Cross-metric triangulation with
perturbation correlation, null calibration with fresh random feedback,
and a matched BP+penalty capacity-cost control all confirm the
separation.

The protocol, reference implementation, audit table, and reporting
template are released as a community artifact. Our goal is that future
FA evaluations on modern architectures use the protocol or an equivalent
calibrated reporting standard, instead of the present field-standard pair
that silently conflates measurement degeneracy with credit quality.

\section{Related work}
\label{sec:related}

\textbf{Feedback Alignment and local credit.} Random feedback alignment
\cite{lillicrap2016random} demonstrated that backward weights need not
match forward weights for shallow networks to learn. Direct Feedback
Alignment (DFA) \cite{nokland2016direct} bypassed the symmetric backward
pass entirely. Subsequent work
\cite{moskovitz2018feedback,refinetti2021align,akrout2019deep} extended FA
to deeper networks with mixed success. Other studies
\cite{launay2020direct,crafton2019direct} showed that DFA can train modest
CNNs and small Transformers, typically reporting $\Gamma$ as evidence that
the local signal is useful. Bartunov et al.~\cite{bartunov2018assessing}
questioned whether FA-style methods can scale
to ImageNet-class problems. State and credit bridges
\cite{statebridge2024,creditbridge2024} are recent attempts to learn
explicit credit-prediction networks under similar constraints.

\textbf{FA evaluation.} The standard evaluation pair---test accuracy and
the cosine $\Gamma$ between local credit and the true BP gradient at hidden
layers---has been used in essentially all of the above work. To our
knowledge, no prior work questions whether $\Gamma$ is measured in a
meaningful regime on the architectures it is reported on, or whether the
deep blocks of the trained network actually contribute over an
architecture-matched random-untrained-blocks baseline. We refer to the
$(\text{accuracy},\Gamma)$ report as the field-standard evaluation pair, and
this paper shows how it conflates two distinct phenomena.

\textbf{Evaluation as scientific object.} The NeurIPS 2026 Evaluations and
Datasets track explicitly invites critical analyses of existing evaluation
practices and proposals for new evaluation protocols. Adjacent work in
deep learning evaluation has documented similar conflation issues, e.g.,
the ``representation similarity is metric-dependent'' literature, the
``probing task validity'' critique, and the LayerNorm-induced gradient
pathology in pre-LN Transformers \cite{xiong2020layernorm}. Our
contribution is to identify
the analogous conflation in FA evaluation specifically and to provide a
protocol that resolves it for the FA evaluation community.

\section{The audit: standard FA evaluation walks back nothing}
\label{sec:audit}

We apply the field-standard $(\text{accuracy},\Gamma)$ reporting pair to
five methods on the standard 4-block $d{=}256$ pre-LayerNorm ResidualMLP
on CIFAR-10 (Table~\ref{tab:audit}, three seeds, 100 training epochs,
AdamW $\text{lr}{=}10^{-3}$, $\text{wd}{=}0.01$, cosine schedule).

\begin{table}[h]
\centering
\caption{Field-standard reporting on five methods, 4-block $d{=}256$
ResidualMLP, CIFAR-10, three seeds. The headline pair gives no walk-back
signal on any method.}
\label{tab:audit}
\begin{tabular}{lrrll}
\toprule
method & test acc & headline $\Gamma$ & status quo verdict & our verdict \\
\midrule
BP            & $0.609 \pm 0.004$ & $\approx 1.0$  & trustworthy & trustworthy \\
EP            & $0.316 \pm 0.038$ & $0.008$        & trustworthy & trustworthy \\
DFA           & $0.308 \pm 0.014$ & $0.10$         & trustworthy & \textbf{walked back} \\
Credit Bridge & $0.289 \pm 0.034$ & $0.07$         & trustworthy & \textbf{walked back} \\
State Bridge  & $0.205 \pm 0.039$ & $0.005$        & trustworthy & \textbf{walked back} \\
\bottomrule
\end{tabular}
\end{table}

A reviewer reading the test-accuracy and headline-$\Gamma$ columns of
Table~\ref{tab:audit} has no signal that any of these methods is in a
degenerate regime: every $(\text{accuracy},\Gamma)$ pair looks consistent
with ``DFA-style methods train deep residual networks to roughly one-third
to one-half of BP's accuracy with a small but positive credit alignment.''
The status quo verdict treats all five methods as trustworthy.

\paragraph{The two diagnostics that should have fired.}
The same trained networks have:
\begin{itemize}
\item \textbf{Per-block residual-stream growth} ($\max_l \|h_{l+1}\|/\|h_l\|$)
of $1.3$ for BP, $2.4$ for State Bridge, $11.6$ for EP, $96\times$ for
Credit Bridge, and $237\times$ for DFA. BP and EP are bounded; DFA, SB, and
CB show explosive per-block growth.
\item \textbf{BP gradient at the deepest hidden layer} ($\|g_L\|$) of
$\sim\!4\!\times\!10^{-4}$ for BP, $\sim\!2\!\times\!10^{-4}$ for EP,
$\sim\!10^{-9}$ for DFA, SB, and CB. The DFA/SB/CB values are below the
\texttt{F.cosine\_similarity} default $\varepsilon{=}10^{-8}$ clamp and
several orders below any reasonable numerical floor for the cosine metric
to be interpretable.
\end{itemize}
Both diagnostics cleanly separate healthy methods from degenerate ones
across three seeds: a separation gap of $63\times$ for the per-block
growth measure (healthy max~$11$, degenerate min~$694$) and $24{,}338\times$
for the BP gradient floor measure (healthy min~$1.0\!\times\!10^{-4}$,
degenerate max~$4.2\!\times\!10^{-9}$). Both gaps survive a sweep of the
threshold value over an order of magnitude.

\paragraph{The walked-back claim.}
We report this finding as the primary audit result. Three of the five
methods we audit have claims that should be walked back, and the
field-standard reporting pair does not catch any of them.

\paragraph{Walk-back: the deep blocks are not contributing.}
Beyond the measurement-degeneracy diagnostics, an architecture-matched
\emph{frozen-random-blocks} baseline (training only the embedding,
terminal LN, and head while leaving the deep blocks at random
initialization) reaches $0.349 \pm 0.002$ on this architecture under all
three of DFA, SB, and CB. The trainable-blocks variants reach $0.308$,
$0.205$, and $0.289$ respectively---\emph{below} the random-untrained
baseline. Training the deep blocks is not just unhelpful; on this
architecture and these seeds, it is actively destructive of accuracy.

\textbf{This is the central audit finding.} Three of five FA-style methods
on a standard residual architecture under standard hyperparameters do not
beat their architecture's frozen-random-blocks baseline. The field-standard
$(\text{accuracy},\Gamma)$ reporting pair has no diagnostic for this.

\section{The diagnostic protocol}
\label{sec:protocol}

We propose a four-diagnostic protocol that detects the audit findings of
Section~\ref{sec:audit}.

\paragraph{Diagnostic (a): per-layer residual stream growth.}
Compute $\max_l \|h_{l+1}\|_2 / \|h_l\|_2$ over a fixed evaluation
batch. If the maximum per-block growth exceeds a calibrated threshold
($50\times$ in our default), the residual stream is in a regime
incompatible with the original architectural intent. This is the most
direct test of Mode~1's structural cause.

\paragraph{Diagnostic (b): BP gradient at hidden layers.}
Compute $\|\partial L / \partial h_L\|_2$ on a fixed eval batch. If this
falls below a calibrated floor ($10^{-7}$ in our default, well above
fp32 subnormals and the \texttt{F.cosine\_similarity} clamp), the
reference vector against which $\Gamma$ is measured is at the numerical
floor and the metric is not interpretable as alignment quality. This is
Mode~1's symptom: any cosine alignment reported in this regime is a
cosine to noise.

\paragraph{Diagnostic (c): cross-batch direction stability.}
Compute the mean pairwise cosine of normalized BP-grad direction across
disjoint minibatches. A high value ($>0.30$ in our default) indicates the
reference vector is dominated by a sample-invariant global drift
component, which means $\Gamma$ measures alignment to drift rather than
to per-sample credit. This is a sub-mode discriminator: it indicates
\emph{how} Mode~1 has corrupted the reference, not \emph{whether} it has;
detection itself is the job of diagnostic~(b).
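
One possible formalization of diagnostic~(c) is sketched here. The number of
minibatches $K$ and the symbols $g^{(k)}$, $u$, $\xi^{(k)}$, and $S$ are
illustrative notation assumed for this sketch, not part of the released
protocol. Let $g^{(k)}$ be the flattened BP gradient at the deepest hidden
layer on the $k$-th of $K$ disjoint minibatches. The stability score is the
mean pairwise cosine
\[
S \;=\; \frac{2}{K(K-1)} \sum_{1 \le i < j \le K}
\frac{\langle g^{(i)}, g^{(j)} \rangle}{\|g^{(i)}\|_2\,\|g^{(j)}\|_2},
\]
and the diagnostic fires when $S > 0.30$. In the idealized case
$g^{(k)} = u + \xi^{(k)}$, with a shared drift $u$ and batch-specific parts
$\xi^{(k)}$ of equal norm $\|\xi\|_2$ that are orthogonal to $u$ and to each
other, every pairwise inner product equals $\|u\|_2^2$ and the score reduces
to
\[
S \;=\; \frac{\|u\|_2^2}{\|u\|_2^2 + \|\xi\|_2^2}.
\]
Under these assumptions the default threshold corresponds to the drift
component carrying more than $30\%$ of the squared gradient norm.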

\paragraph{Diagnostic (d): frozen-blocks baseline.}
Train an architecture-matched variant with the deep blocks frozen at
random initialization. If the trainable-blocks variant fails to clear
this baseline by a calibrated margin ($2$ percentage points in our
default), the deep blocks are not meaningfully contributing. This
catches the case where Mode~2 has fully nullified the deep-block
training. Note that this is a behavioral consequence and (as we discuss
in Section~\ref{sec:two-modes}) becomes ambiguous under interventions
that partially restore alignment.

\paragraph{Calibrated thresholds.} Default thresholds ($50\times$, $10^{-7}$,
$0.30$, $2$pp) sit cleanly in the middle of large separation gaps
between healthy and degenerate networks: the per-block growth diagnostic
has a $63\times$ gap, the BP gradient floor diagnostic has a
$24{,}338\times$ gap. Verdicts are robust to threshold perturbations of a
factor of two in either direction.

\paragraph{Decision-utility ablation.}
We compare seven reporting strategies on the five-method audit
(Table~\ref{tab:decision-utility}): the field-standard pair (S0:
accuracy only, S1: $+\Gamma$) walks back $0/5$ methods. The full
protocol (S\textsubscript{full}: accuracy + (a) + (b) + (c) + (d)) walks
back $3/5$. Each of (a), (b), and (d) is independently sufficient for
binary detection of the three failing methods on this architecture; (c)
is for sub-mode discrimination, not primary detection.

\begin{table}[h]
\centering
\caption{Decision-utility ablation. ``Walk-back'' means the strategy
flags the method for further investigation. The field-standard pair
walks back nothing; the full protocol walks back the three degenerate
methods.}
\label{tab:decision-utility}
\begin{tabular}{lrrrrrrr}
\toprule
method & S0 & S1 & +(a) & +(b) & +(c) & +(d) & full \\
\midrule
BP            & --- & --- & --- & --- & --- & --- & trust \\
EP            & --- & --- & --- & --- & --- & --- & trust \\
DFA           & --- & --- & WB  & WB  & --- & WB  & WB    \\
State Bridge  & --- & --- & WB  & WB  & WB  & WB  & WB    \\
Credit Bridge & --- & --- & WB  & WB  & WB  & WB  & WB    \\
\bottomrule
\end{tabular}
\end{table}
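
One reading of the decision rule implied by
Table~\ref{tab:decision-utility} can be written out as follows. The indicator
names $F_a,\dots,F_d$ and the symbols $\mathrm{acc}_{\text{train}}$,
$\mathrm{acc}_{\text{frozen}}$, and $\bar c_{\text{batch}}$ (the mean
pairwise cosine of BP-gradient directions across disjoint minibatches) are
assumptions introduced for illustration; the thresholds are the defaults
stated in the text, and $[\cdot]$ denotes the Iverson bracket.
\begin{align*}
F_a &= \Bigl[\max_l \|h_{l+1}\|_2 / \|h_l\|_2 > 50\Bigr], &
F_b &= \Bigl[\|\partial L/\partial h_L\|_2 < 10^{-7}\Bigr], \\
F_c &= \bigl[\bar c_{\text{batch}} > 0.30\bigr], &
F_d &= \bigl[\mathrm{acc}_{\text{train}} - \mathrm{acc}_{\text{frozen}} < 0.02\bigr].
\end{align*}
Consistent with the table, a method is walked back when
$F_a \lor F_b \lor F_d$ holds, while $F_c$ is used only to characterize the
sub-mode. As a worked instance of $F_d$, vanilla DFA gives
$0.308 - 0.349 = -0.041 < 0.02$, so the indicator fires.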

\paragraph{Cross-architecture validation.}
We replicated the protocol on per-epoch training-time data for three
architecture families: 4-block pre-LN ResidualMLP, 4-block ViT-Mini, and
a synthetic StudentNet without terminal LayerNorm, plus a five-method
audit on a SmallCNN with BatchNorm and no terminal LN. Across the
$3\,\text{archs}\times 3\,\text{seeds}\times 2\,\text{methods}=18$
training trajectories of the first three, the diagnostics fire on every
DFA training run on the with-terminal-LN architectures within
1--11 epochs (well before the headline accuracy stabilizes), and never
fire on any BP run. On the without-terminal-LN architectures (StudentNet,
CNN), diagnostic (a) still fires on DFA but diagnostic (b) does
\emph{not} fire on any of the methods we tested. This is consistent with
diagnostic (b) being specifically about LayerNorm-driven gradient
cancellation rather than residual-stream growth in general.

\paragraph{Reference implementation.}
We release \texttt{protocol/}, a $\sim\!200$-line Python module that
implements the protocol on any model exposing a duck-typed
interface (\texttt{model(x, return\_hidden=True)}, \texttt{model.embed} or
\texttt{model.patch\_embed}, \texttt{model.blocks}, and a terminal LN +
head). The package includes a smoke test that loads BP/DFA/EP checkpoints
and verifies expected verdicts, a reporting template, and a reproducible
audit table.

\section{Two distinct failure modes}
\label{sec:two-modes}

The protocol of Section~\ref{sec:protocol} catches the audit finding,
but its main scientific interest is what it reveals about \emph{why} the
field-standard pair fails. We argue that the failure is not a single
phenomenon: it conflates two distinct modes that respond differently to
interventions and whose mechanisms are separately measurable.

\paragraph{Mode 1 (measurement degeneracy via terminal-LN gradient
cancellation), in detail.}
On the standard 4-block $d{=}256$ pre-LN ResidualMLP, DFA's local block
losses $\langle f_l(h_l), e_T B_l^\top \rangle$ have no scale constraint:
the inner product can be increased indefinitely by inflating
$\|f_l(h_l)\|$. Block parameters $w_1, w_2$ inside each block grow by a
factor of $\sim\!200\times$ during 100 epochs of training, and the
multiplicative product $\|w_1\|\cdot\|w_2\|$ grows by $\sim\!5\times 10^4$
per block. The residual stream $\|h_L\|$ grows from $9$ at initialization
to $\sim\!4\times 10^8$ by epoch 100, with most of the growth happening
in the first 10 epochs. Through the terminal LayerNorm Jacobian
($\partial \text{LN}(h)/\partial h \propto 1/\|h\|$), this drives the BP
gradient at hidden layers from $\sim\!10^{-3}$ at random initialization
to $\sim\!5\times 10^{-10}$. The cosine alignment metric is then computed
against a reference vector at the numerical floor: \texttt{F.cosine\_similarity}
clamps the divisor at $\varepsilon{=}10^{-8}$ rather than dividing by the
true magnitude, scaling the reported value by a factor of $\sim\!50\times$
in the wrong direction; the reported $\Gamma\approx 0.10$ is not a
``small alignment'' but a cosine to a degenerate reference.
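
The $1/\|h\|$ scaling can be made explicit with the standard LayerNorm
Jacobian. The sketch below assumes a LayerNorm with gain $\gamma$ and bias
$\beta$ over $d$ features and ignores the small stabilizing constant inside
the square root; the symbols $\mu$, $\sigma$, and $\hat h$ are introduced
here for illustration. With
\[
\mathrm{LN}(h) = \gamma \odot \hat h + \beta, \qquad
\hat h = \frac{h - \mu\mathbf{1}}{\sigma}, \qquad
\mu = \tfrac{1}{d}\,\mathbf{1}^\top h, \qquad
\sigma^2 = \tfrac{1}{d}\,\|h - \mu\mathbf{1}\|_2^2,
\]
the Jacobian is
\[
\frac{\partial\,\mathrm{LN}(h)}{\partial h}
= \frac{1}{\sigma}\,\mathrm{diag}(\gamma)
\Bigl(I - \tfrac{1}{d}\,\mathbf{1}\mathbf{1}^\top
        - \tfrac{1}{d}\,\hat h\,\hat h^\top\Bigr).
\]
The bracketed matrix is an orthogonal projector, so the spectral norm of the
Jacobian is at most $\max_i |\gamma_i| / \sigma$. Because $\sigma$ scales
linearly with $h$, rescaling $h \mapsto c\,h$ for $c > 0$ leaves
$\mathrm{LN}(h)$ unchanged and divides the Jacobian by $c$: any gradient
flowing back through the terminal LN shrinks in proportion to the growth of
the residual stream.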

\paragraph{Causal validation: penalty intervention alleviates Mode~1.}
Adding $\lambda\,\|f_l(h_l)\|^2$ as a per-block penalty to DFA's local
loss with $\lambda{=}10^{-2}$ contains the residual stream:
$\|h_L\|$ falls from $4\!\times\!10^8$ to $4\!\times\!10^4$ (four orders of
magnitude), and $\|g_L\|$ rises from $5\!\times\!10^{-10}$ to
$\sim\!10^{-6}$ (more than three orders of magnitude, well into the
meaningful regime). Diagnostics (a) and (b) both pass on the penalized
network. Across three seeds, $\|h_L\|=(4.0\pm 0.1)\times 10^4$ and
$\|g_L\|=(9.0\pm 0.9)\times 10^{-7}$.

\paragraph{Mode 2 (low intrinsic credit-direction quality), in detail.}
The penalty alleviates Mode~1, but the test accuracy of penalized DFA only
rises from $0.308$ to $0.363$ (3-seed mean $0.363\pm 0.001$). This is
$+5.5$pp over vanilla DFA but only $+1.4$pp over the architecture-matched
random-blocks baseline of $0.349$. The deep blocks are still not
meaningfully contributing.

\textbf{Direct measurement.} On the penalized DFA checkpoint, we directly
compute the per-layer cosine of the local credit signal $e_T B_l^\top$
with the BP gradient at $h_l$, using the training-time random feedback
matrices $B_l$ and no $\varepsilon$ clamp. Three-seed result on deep
layers ($l=1,2,3,4$): $\overline{\cos} = +0.155 \pm 0.025$. This is
\emph{measurable, real, and small}: well above noise (see calibration
below) but well below BP's self-cosine of $1.0$. Under the penalty, the
deep-layer credit signal is partially, but far from fully, aligned with the
BP gradient.

\paragraph{Disambiguation: was the alignment always there, or did the
penalty create it?}
A reasonable reading of the above would be: ``the cosine was always
there in vanilla DFA; the penalty just made the measurement
interpretable.'' The disambiguation experiment falsifies this. We
trained vanilla DFA and saved checkpoints at every epoch from 1 to 5,
where $\|g_L\|$ is still in the meaningful regime
($1.4\!\times\!10^{-6}$ at epoch 1, well above the $10^{-7}$ floor).
Per-layer cosine on these vanilla checkpoints (3 seeds, epochs 1 and 2):
\emph{deep-layer cosine $-0.008 \pm 0.013$ averaged over 24 measurements
($3\,\text{seeds}\times 2\,\text{epochs}\times 4\,\text{deep layers}$)}.
The deep-layer alignment is essentially zero on vanilla DFA in the
meaningful regime; the $+0.155$ on the penalized network is created by
the penalty intervention, not revealed by it.

\paragraph{The penalty's role.}
The penalty does two things at once. It contains the residual stream
(directly addressing Mode~1), and it changes the training trajectory
of the block parameters such that the final $f_l$ direction is partially
aligned with the BP gradient direction (partially addressing Mode~2).
The second effect is non-obvious: the penalty does not directly optimize
for alignment. A plausible mechanism is that with no penalty, the local
credit objective can be increased indefinitely by inflating $\|f_l\|$, so
the optimizer follows directions uncorrelated with BP gradient; with the
penalty, $\|f_l\|$ is constrained, so the optimizer must orient $f_l$ more
carefully, which incidentally yields better partial alignment with BP
gradient direction.
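
A minimal worked version of this argument, under assumptions that go beyond
the text: we take the block to maximize
$\langle f_l, c_l\rangle - \lambda\|f_l\|_2^2$ with
$c_l = e_T B_l^\top$ (the sign convention is assumed), treat the block output
$f_l$ as freely adjustable, and write $f_l = s\,u$ with scale $s \ge 0$ and
unit direction $u$. With $a = \langle u, c_l\rangle$, the objective becomes
\[
J(s, u) = s\,a - \lambda s^2 .
\]
For $\lambda = 0$ and any $a > 0$, $J$ grows without bound in $s$, so scale
inflation alone increases the objective regardless of how well $u$ is
oriented. For $\lambda > 0$ the optimal scale is finite,
\[
s^\star = \frac{a}{2\lambda}, \qquad
J(s^\star, u) = \frac{a^2}{4\lambda}
= \frac{\|c_l\|_2^2 \cos^2(u, c_l)}{4\lambda},
\]
so once the scale is optimized the only remaining way to improve $J$ is to
rotate $u$ toward $c_l$. This idealization accounts for tighter coordination
with the local credit signal; it does not by itself establish alignment with
the BP gradient.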

\subsection{Calibration of the cosine measurement}
\label{sec:calibration}

A natural reviewer concern about the $+0.155$ result is whether it is
above or below noise. We anchor it with explicit positive and negative
controls.

\textbf{Positive control.} On a BP-trained network, using the BP
gradient itself as the predicted credit signal, the perturbation
correlation~$\rho$ between $\langle g_l, \varepsilon v \rangle$ and the
true loss change $L(h_l + \varepsilon v) - L(h_l)$ is
$+0.997$ at every layer (4-layer mean $+0.9965$). This is the
Taylor-expansion ceiling.

\textbf{Negative control.} On the same BP-trained network, using a
random vector independent of the layer as the credit signal, $\rho$ is
$+0.006$ (4-layer mean), within statistical noise of zero.
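
The reason the positive control approaches $1$ follows from a Taylor
expansion, sketched here with a generic credit signal $c_l$ and Hessian
$H_l$ (both symbols are introduced for illustration):
\[
L(h_l + \varepsilon v) - L(h_l)
= \varepsilon\,\langle \nabla_{h_l} L,\, v\rangle
+ \tfrac{1}{2}\,\varepsilon^2\, v^\top H_l\, v
+ O(\varepsilon^3).
\]
When $c_l = g_l = \nabla_{h_l} L$, the predicted change
$\langle c_l, \varepsilon v\rangle$ matches the true change up to the
second-order term, so $\rho \to 1$ as $\varepsilon \to 0$. If, as an
additional assumption not stated in the text, the directions $v$ are drawn
isotropically (e.g., $v \sim \mathcal{N}(0, I)$), then to first order
\[
\rho \;\approx\;
\frac{\langle c_l, g_l\rangle}{\|c_l\|_2\,\|g_l\|_2}
= \cos(c_l, g_l),
\]
so $\rho$ and the cosine are expected to move together, although they need
not coincide at finite $\varepsilon$ or for non-isotropic perturbations.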

\textbf{Cross-metric triangulation on the test conditions.}

\begin{table}[h]
\centering
\caption{Two metrics, four conditions. The agreement between cosine and
perturbation correlation rules out single-metric artifacts.}
\label{tab:two-metrics}
\begin{tabular}{lrr}
\toprule
condition & deep cosine $\overline{\cos}$ & deep $\overline{\rho}$ \\
\midrule
positive control (BP grad on BP net) & $1.000$ & $+0.997$ \\
negative control (random vector on BP net) & --- & $+0.006$ \\
vanilla DFA, ep 1 (3 seeds, meaningful regime) & $-0.008 \pm 0.013$ & $-0.003 \pm 0.005$ \\
penalized DFA, ep 30 (3 seeds, $\lambda{=}10^{-2}$) & $+0.155 \pm 0.025$ & $+0.080 \pm 0.011$ \\
\bottomrule
\end{tabular}
\end{table}

The penalized DFA's $+0.080$ perturbation correlation is $\sim\!13\times$
the negative control and $\sim\!8\%$ of the positive control. Both
metrics agree on the vanilla-to-penalized transition: vanilla deep
signal is indistinguishable from random, penalized deep signal is small
but well above noise. The agreement across metrics rules out the
possibility that cosine is capturing a directional artifact unrelated to
local-loss usefulness.

\subsection{$\lambda$ sweep: independent dissociation of the two modes}
\label{sec:lambda-sweep}

The disambiguation experiment of Section~\ref{sec:two-modes} relied on
vanilla DFA early-epoch checkpoints (epochs 1--2) to measure deep-layer
cosine while $\|g_L\|$ was still in the meaningful regime. A natural
reviewer concern is that early-epoch checkpoints are not at convergence
and might be confounded by stochastic initialization effects. We
strengthen the disambiguation with an independent control: a sweep over
the penalty strength $\lambda$ at convergence (30~epochs), with both
metrics measured on each saved checkpoint.

\begin{table}[h]
\centering
\caption{$\lambda$ sweep on the penalty strength, all 30 epochs, seed
42. The deep-layer cosine and perturbation correlation rise from
essentially zero at $\lambda{=}10^{-4}$ to small-but-positive at
$\lambda{=}10^{-2}$, even though diagnostics (a) and (b) already pass
at $\lambda{=}10^{-4}$.}
\label{tab:lambda-sweep}
\begin{tabular}{rrrrrr}
\toprule
$\lambda$ & test acc & $\|h_L\|$ & $\|g_L\|$ & deep $\overline{\cos}$ & deep $\overline{\rho}$ \\
\midrule
$0$       & $0.308$ & $4.4{\times}10^{8}$ & $5{\times}10^{-10}$ & (degenerate) & (degenerate) \\
$10^{-4}$ & $0.359$ & $2.4{\times}10^{4}$ & $6.3{\times}10^{-7}$ & $-0.022$ & $-0.004$ \\
$10^{-2}$ & $0.363$ & $4.0{\times}10^{4}$ & $9.0{\times}10^{-7}$ & $+0.165$ & $+0.091$ \\
$10^{-1}$ & $0.349$ & $1.2{\times}10^{4}$ & $1.6{\times}10^{-6}$ & $+0.131$ & $+0.067$ \\
\bottomrule
\end{tabular}
\end{table}

\textbf{The decisive row is $\lambda{=}10^{-4}$.} At this penalty
strength, the residual stream is already contained ($\|h_L\| = 2.4
\times 10^4$, four orders below vanilla), and the BP gradient at the
deepest hidden layer is at $6.3 \times 10^{-7}$ (well above the
$10^{-7}$ floor and in the meaningful measurement regime). Diagnostics
(a) and (b) both pass: \textbf{Mode~1 is fully alleviated}. But the
deep-layer cosine ($-0.022$) and perturbation correlation ($-0.004$)
are essentially zero, on both metrics independently. \textbf{Mode~2 is
not alleviated at all.}

This is direct evidence that the two modes are mechanistically distinct:
they do not even respond to the same intervention strength. There exists
a regime ($\lambda{=}10^{-4}$, 30~epochs of training) in which
Mode~1 is fully alleviated and Mode~2 is unchanged from vanilla, with
both metrics agreeing.

The threshold for Mode~2 alleviation is somewhere between
$\lambda{=}10^{-4}$ and $\lambda{=}10^{-2}$. At $\lambda{=}10^{-2}$ the
penalty is strong enough to alter the optimization trajectory of the
block parameters (constraining $\|f_l\|$ tightly enough that the
direction of $f_l$ has to be coordinated more carefully with the local
credit signal), and the deep-layer alignment rises to $\sim\!+0.16$.
At $\lambda{=}10^{-1}$ the penalty starts to over-constrain and the
alignment is slightly lower ($\sim\!+0.13$), giving an inverted-U
relationship between $\lambda$ and deep alignment.

\subsection{Capacity-cost control}
\label{sec:capacity-cost}

A second reviewer concern is whether the $0.36 \to 0.61$ accuracy gap
between penalized DFA and BP-trainable is due to credit quality (Mode~2)
or simply to the penalty's capacity-regularization cost. We disambiguate
with a $2\times2$ matched control.

\begin{table}[h]
\centering
\caption{$2\times2$ capacity-cost control (accuracy). The penalty is the
same in both the BP and DFA conditions; $\Delta$ is the change in accuracy
from adding the penalty. BP+penalty still clears the random-blocks
baseline by $18.1$pp; DFA+penalty clears it by only $1.4$pp.}
\label{tab:bp-penalty}
\begin{tabular}{lrrr}
\toprule
              & no penalty & with penalty & $\Delta$ (penalty) \\
\midrule
BP            & $0.609$    & $0.530$      & $-8.0$pp \\
DFA           & $0.308$    & $0.363$      & $+5.5$pp \\
\bottomrule
\end{tabular}
\end{table}

Two observations make this control informative. First, the penalty's
effect on BP is $-8$pp (a modest capacity loss), roughly half the size of
the residual gap between BP+penalty and DFA+penalty
($0.530 - 0.363 \approx 17$pp). Because both arms of that comparison carry
the same penalty, the 17pp residual gap is consistent with a
credit-quality cost, not with capacity regularization.
Second, the penalty has \emph{opposite} effects on the two methods: it
hurts BP by 8pp while helping DFA by 5.5pp, which is not the pattern
expected from a generally beneficial regime shift.
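
For reference, the quantities in this comparison follow directly from the
entries of Table~\ref{tab:bp-penalty}. The symbols
$\Delta_{\mathrm{BP}}$, $\Delta_{\mathrm{DFA}}$, and $G_{\mathrm{res}}$
are introduced only for this worked check. The random-blocks baseline
$a_{\mathrm{rb}}$ is not listed in the table; the value below is inferred
from the two margins quoted in the caption and should be read as an
assumption.
\[
\begin{array}{rcl}
\Delta_{\mathrm{BP}}  &=& 0.530 - 0.609 = -0.079 \quad (\approx -8\,\mathrm{pp}),\\
\Delta_{\mathrm{DFA}} &=& 0.363 - 0.308 = +0.055 \quad (+5.5\,\mathrm{pp}),\\
G_{\mathrm{res}}      &=& 0.530 - 0.363 = 0.167 \quad (\approx 17\,\mathrm{pp}),\\
a_{\mathrm{rb}}       &=& 0.530 - 0.181 = 0.363 - 0.014 = 0.349.
\end{array}
\]
The rounded table entries give $-7.9$pp for the BP effect; the reported
$-8.0$pp presumably comes from unrounded accuracies.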

\textbf{The clean phrasing.} The 2$\times$2 control identifies a residual
performance gap under matched architecture, data, optimizer family, and
matched penalty, after accounting for the penalty's direct capacity cost
on BP. It is not a perfect isolation of ``credit quality''
(BP uses an end-to-end loss while DFA uses local block losses, and the two
trainers may differ in other non-capacity ways), but it is a strong lower
bound on the part of the gap that the penalty's capacity cost does not
explain.

\subsection{Summary: five validations of the two-mode separation}

Together, the disambiguation experiment, the $\lambda$ sweep, the
cross-metric triangulation, the capacity-cost control, and the
threshold robustness analysis provide five independent lines of
evidence that the failure of standard FA evaluation is not a single
phenomenon. Mode~1 (measurement degeneracy) is detected by diagnostic
(b), is causally controlled by the residual-stream penalty at every
tested $\lambda \geq 10^{-4}$, and is specifically associated with
terminal-LayerNorm architectures in our audits. Mode~2 (low intrinsic
credit quality) persists after Mode~1 is alleviated at weak penalty
strengths ($\lambda{=}10^{-4}$), is detected by direct per-layer
cosine in the meaningful regime, and rises only when the penalty is
strong enough to alter the optimization trajectory of the deep
blocks ($\lambda \geq 10^{-2}$). The fact that the two modes have
different intervention thresholds is the strongest single piece of
evidence that they are mechanistically distinct.

\section{Limitations}
\label{sec:limitations}

Our audit covers a specific slice of the FA literature: pre-LayerNorm
ResidualMLP, ViT-Mini, and SmallCNN architectures on CIFAR-10, evaluated
under standard hyperparameters. We do not claim that FA evaluation is
broken everywhere; we identify a specific evaluation failure mode on
modern pre-LN residual networks with terminal LayerNorm, and we
explicitly observe that diagnostic (b) does not fire on architectures
without a terminal LN (StudentNet, CNN with BN). This is an observational
association, not a causal identification of LayerNorm per se: a future
architecture without a terminal LN on which (b) fires would refine the
claim. Section~\ref{sec:related} cites the classical FA literature, where
architectures without a terminal LN dominate; our central claim concerns
the modern residual case with a terminal LN.

The Mode~2 measurement in Section~\ref{sec:two-modes} relies on direct
cosine and perturbation correlation in the meaningful regime, which is
only accessible after a Mode~1 intervention. We cannot directly observe
Mode~2 on a vanilla DFA-trained network at convergence, because by then
$\|g_L\|$ has collapsed below the numerical floor. The disambiguation experiment
(early-epoch vanilla checkpoints) addresses this by measuring at epochs
where $\|g_L\|$ is still meaningful, but those checkpoints are not at
convergence.

The matched-penalty $2{\times}2$ control disambiguates capacity loss from
credit quality but does not account for non-capacity differences between
end-to-end BP and local DFA training. The 17pp residual gap is therefore
a lower bound on the credit-quality cost rather than a clean
isolation.

\section{Broader impacts}
\label{sec:impacts}

This paper does not introduce a new training method, dataset, or
generative model. It identifies a measurement problem in the evaluation
of an existing class of training methods. Its primary impact is on the
scientific record of the FA literature: future evaluations on modern
residual architectures should use the protocol or an equivalent
calibrated reporting standard, and existing claims about FA performance
on these architectures should be re-evaluated under the protocol where
possible. We are not aware of any negative downstream applications of
this work.

\section{Conclusion}
\label{sec:conclusion}

We have shown that standard Feedback Alignment evaluation on modern
residual networks is unreliable because it conflates two distinct
failure modes: measurement degeneracy via terminal-LayerNorm gradient
cancellation, and low intrinsic credit-direction quality of random
feedback. We provide a four-diagnostic protocol that detects both modes,
a calibrated scale anchored by positive and negative controls, a
five-method audit on three architecture families, and independent
control experiments validating the two-mode separation. The protocol,
audit data, and reporting template are released as an artifact for the
FA evaluation community.

\bibliographystyle{abbrvnat}
\begin{thebibliography}{99}
\bibitem{lillicrap2016random}
T.~P. Lillicrap, D.~Cownden, D.~B. Tweed, and C.~J. Akerman.
\newblock Random synaptic feedback weights support error backpropagation for deep learning.
\newblock {\em Nature Communications}, 7:13276, 2016.

\bibitem{nokland2016direct}
A.~N\o{}kland.
\newblock Direct feedback alignment provides learning in deep neural networks.
\newblock In {\em NeurIPS}, 2016.

\bibitem{akrout2019deep}
M.~Akrout, C.~Wilson, P.~Humphreys, T.~Lillicrap, and D.~B. Tweed.
\newblock Deep learning without weight transport.
\newblock In {\em NeurIPS}, 2019.

\bibitem{launay2020direct}
J.~Launay, I.~Poli, F.~Boniface, and F.~Krzakala.
\newblock Direct feedback alignment scales to modern deep learning tasks and architectures.
\newblock In {\em NeurIPS}, 2020.

\bibitem{moskovitz2018feedback}
T.~H. Moskovitz, A.~Litwin-Kumar, and L.~F. Abbott.
\newblock Feedback alignment in deep convolutional networks.
\newblock {\em arXiv:1812.06488}, 2018.

\bibitem{refinetti2021align}
M.~Refinetti, S.~d'Ascoli, R.~Ohana, and S.~Goldt.
\newblock Align, then memorise: the dynamics of learning with feedback alignment.
\newblock In {\em ICML}, 2021.

\bibitem{crafton2019direct}
B.~Crafton, A.~Parihar, E.~Gebhardt, and A.~Raychowdhury.
\newblock Direct feedback alignment with sparse connections for local learning.
\newblock {\em Frontiers in Neuroscience}, 13:525, 2019.

\bibitem{bartunov2018assessing}
S.~Bartunov, A.~Santoro, B.~Richards, L.~Marris, G.~Hinton, and T.~Lillicrap.
\newblock Assessing the scalability of biologically-motivated deep learning algorithms and architectures.
\newblock In {\em NeurIPS}, 2018.

\bibitem{xiong2020layernorm}
R.~Xiong, Y.~Yang, D.~He, K.~Zheng, S.~Zheng, C.~Xing, H.~Zhang, Y.~Lan, L.~Wang, and T.-Y.~Liu.
\newblock On layer normalization in the transformer architecture.
\newblock In {\em ICML}, 2020.

\bibitem{statebridge2024}
Anonymous.
\newblock State Bridge: terminal-conditioned predictor for credit assignment.
\newblock {\em Anonymous in-progress reference, 2024--2026}.

\bibitem{creditbridge2024}
Anonymous.
\newblock Credit Bridge: value-field local credit without hidden BP.
\newblock {\em Anonymous in-progress reference, 2024--2026}.

\end{thebibliography}

\appendix

\section{Reproducibility}
\label{app:reproducibility}

All experiments use PyTorch~$\geq$2.0 on a single NVIDIA A6000 GPU.
Source for the protocol package is in \texttt{protocol/}; experimental
scripts are in \texttt{experiments/}. Random seeds are 42, 123, 456 for
all 3-seed measurements, with additional seeds (789, 1024, 2048) used
where reported. CIFAR-10 is loaded via \texttt{torchvision} with the
standard normalization $(\mu, \sigma) = ((0.4914, 0.4822, 0.4465),
(0.2470, 0.2435, 0.2616))$.
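
As a concrete reading of these constants, the normalization acts per
channel $c$ on pixel intensities $x_{c,i,j} \in [0,1]$. The sketch below
assumes the usual \texttt{ToTensor} followed by \texttt{Normalize} pair
from \texttt{torchvision}; the mid-gray pixel value $0.5$ is an
illustrative choice, not a value from our data.
\[
\tilde{x}_{c,i,j} = \frac{x_{c,i,j} - \mu_c}{\sigma_c},
\qquad
\mbox{e.g. red channel: } \frac{0.5 - 0.4914}{0.2470} \approx 0.035 .
\]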

\section{Pipeline pitfalls catalog}
\label{app:pitfalls}

Beyond the four diagnostics, we found seven evaluation-pipeline bugs in
our own dogfood codebase that silently corrupt FA evaluation results.
Each has a standalone reproducer in
\texttt{protocol/examples/verify\_pitfalls*.py}.

\begin{enumerate}
\item \texttt{tensor.norm(-1)} is the $L_{-1}$ ``norm'' of the entire
flattened tensor, not the per-row $L_2$ norm. The correct call is
\texttt{tensor.norm(dim=-1)}. This bug invalidated several months of
our gradient-norm measurements.

\item \texttt{F.cosine\_similarity(a, b)} clamps each norm from below by
$\varepsilon{=}10^{-8}$ by default, dividing by
$\max(\|a\|, \varepsilon)\cdot\max(\|b\|, \varepsilon)$.
When $\|b\|\sim 10^{-10}$ (the regime of the BP gradient on degenerate
DFA-trained pre-LN networks), the divisor becomes $\|a\|\cdot 10^{-8}$
instead of $\|a\|\cdot 10^{-10}$, shrinking the reported cosine by a
factor of $\sim\!100$.

\item fp16 mixed precision underflows BP gradients of magnitude
$\sim\!5\times 10^{-10}$, which lie below fp16's smallest subnormal of
$\sim\!6\times 10^{-8}$ and are flushed to zero. bf16 avoids this because
it shares fp32's exponent range.

\item Random feedback $B_l$ matrices are training-specific. DFA reports
$\Gamma\approx 0.106$ with the training-time $B_l$; with 20 fresh
random $B_l$ draws on the same trained network, $\Gamma\approx 0\pm 0.005$.
The reported alignment is specific to the training-time $B_l$ that the
network adapted to, rather than intrinsic to the trained network.

\item The aggregation strategy across (layers, samples, batches) is rarely
specified but determines the headline number. The same DFA run (seed~42)
gives $\Gamma \in [-0.028, +0.074]$ across four valid aggregation
strategies (a 3.45$\times$ ratio, with sign flip).

\item Per-layer $\Gamma$ structure is hidden by aggregation. On the
4-block ResMLP, DFA's headline $\Gamma\approx 0.10$ is driven almost
entirely by the embedding layer ($\Gamma_{l_0} \approx +0.43$);
deeper layers have $\Gamma \approx 0$. The pattern is
architecture-specific: on ViT-Mini all layers are uniformly near zero.

\item Auxiliary networks (random feedback $B_l$, bridge predictors) not
saved alongside model checkpoints can cause post-hoc $\Gamma$ scripts to
silently fall back to $\cos(\text{BP\_grad}, \text{BP\_grad}) = 1.0$ and
report ``perfect alignment.'' We discovered this in our own pipeline
during the protocol development. Check that auxiliary networks are
persisted before reporting any $\Gamma$ value.
\end{enumerate}
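
Items 1--3 can be checked by hand. The worked equations below are a
sketch with illustrative values: the matrix $X$, the true cosine $c$, and
the assumption $\|a\| \geq \varepsilon$ are chosen for the example and
are not taken from our experiments. For item~2 we assume the per-norm
clamp implied by the divisor stated there.

For item~1, take
$X = \left(\begin{array}{cc} 3 & 4 \\ 6 & 8 \end{array}\right)$. The
$p{=}-1$ ``norm'' over all entries and the per-row $L_2$ norms are
\[
\|X\|_{-1} = \Bigl(\frac{1}{3} + \frac{1}{4} + \frac{1}{6} + \frac{1}{8}\Bigr)^{-1}
= \Bigl(\frac{7}{8}\Bigr)^{-1} = \frac{8}{7} \approx 1.14,
\qquad
\bigl(\|X_{1,:}\|_2,\ \|X_{2,:}\|_2\bigr) = (5,\ 10).
\]
For item~2, write $a \cdot b = c\,\|a\|\,\|b\|$ with $\|b\| = 10^{-10}$
and $\varepsilon = 10^{-8}$:
\[
\frac{a \cdot b}{\max(\|a\|, \varepsilon)\,\max(\|b\|, \varepsilon)}
= \frac{c\,\|a\|\,\|b\|}{\|a\|\,\varepsilon}
= c\,\frac{10^{-10}}{10^{-8}} = 10^{-2}\,c .
\]
For item~3, fp16's smallest positive subnormal is
$2^{-24} \approx 5.96\times 10^{-8}$, and round-to-nearest maps any
magnitude below $2^{-25} \approx 2.98\times 10^{-8}$ to zero, so
\[
5\times 10^{-10} < 2^{-25}
\quad\Longrightarrow\quad
\mathrm{fp16}\bigl(5\times 10^{-10}\bigr) = 0,
\]
whereas bf16, with the same 8-bit exponent as fp32, has smallest positive
normal value $2^{-126} \approx 1.18\times 10^{-38}$ and represents such
gradients at reduced mantissa precision.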

\section{Methodology: walk-back chain}
\label{app:walkback}

The framing of this paper underwent several corrections during the
development of the protocol. We document the four-step progression
explicitly as part of the methodology, not as narrative drama:

\begin{enumerate}
\item Initial metric ($\Gamma\approx 0.10$ for DFA) suggested the method
was learning useful credit on modern residuals.
\item Diagnostic showed the metric was measured against a numerical-floor
reference vector ($\|g_L\|\sim 10^{-10}$); the headline number was not
interpretable.
\item Revised control (the residual-stream penalty) restored the
reference but only partially closed the accuracy gap to BP, identifying
a residual phenomenon.
\item Final interpretation (this paper) separates measurement failure
(Mode~1) from genuine credit-quality cost (Mode~2), validated by the
control experiments of Section~\ref{sec:two-modes}.
\end{enumerate}

\section{Six independent validations of the two-mode separation}
\label{app:six-validations}

For completeness we list six independent validation experiments,
several of which are also reported in the main text:

\begin{enumerate}
\item Direct deep-layer cosine on penalized DFA (3 seeds): deep mean
$+0.155 \pm 0.025$.
\item Null calibration with 20 fresh random $B_l$: deep cosine
$+0.002 \pm 0.022$ (within noise).
\item Hypothesis-disambiguation sweep: vanilla DFA early-epoch deep
cosine $-0.008 \pm 0.013$ across 3 seeds at epoch 1.
\item BP+penalty matched-control: 8pp BP capacity cost vs 17pp residual
gap at $\lambda{=}10^{-2}$.
\item Multi-seed lock-in: 24 measurements (3 seeds $\times$ 2 epochs
$\times$ 4 deep layers) all in $[-0.04, +0.02]$ on vanilla.
\item Cross-metric triangulation via perturbation correlation: vanilla
$+0.002$, penalized $+0.080$ (3 seeds), positive control (BP grad)
$+0.997$, negative control (random vector) $+0.006$.
\end{enumerate}

\end{document}