paper v2.29: add Scellier & Bengio 2017 EP citation

§1 ¶1 referenced "equilibrium propagation" without a bibitem despite EP being the trustworthy non-BP control throughout the paper. Added the canonical Scellier & Bengio 2017 Frontiers in Computational Neuroscience reference and cited it where EP is first named in the FA-first intro.

Main content stays at 9 pages (§7 closes mid-p9, refs start p10); 0 overfull boxes; 18 pages total.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 16:54:40 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 16:54:40 -0500
commit: 4b731f824e4a4ee8606aa472a9e5adc4021991b8 (patch)
tree: 38587bb607299560d2d95a50cbb1eacf4253ea6f
parent: 8fda0e0042d04343e0f4e6cdc04ba5e927f69064 (diff)
1 files changed, 7 insertions, 1 deletions
diff --git a/paper/main.tex b/paper/main.tex
index edf3717..bf9b73c 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -33,7 +33,7 @@ Modern feedback-alignment evaluation on deep residual networks is still summariz
 \section{Introduction}
 \label{sec:intro}
 
-Backpropagation (BP) is the de facto training method for deep neural networks, but its requirement that each feedback connection carry a weight identical to the corresponding forward connection -- the weight-transport problem -- has long been considered biologically implausible \citep{lillicrap2016random,bartunov2018assessing}. \emph{Feedback alignment} (FA) \citep{lillicrap2016random} side-steps weight transport by delivering per-layer credit through fixed random feedback matrices, and its direct variant (DFA) \citep{nokland2016direct} projects the output error to every hidden layer through an independent random matrix; parallel lines include target propagation \citep{lee2015difference} and equilibrium propagation. These rules are studied both as biologically-plausible alternatives to BP and as scalable, asynchronous training schemes, with recent work scaling DFA to transformer-scale architectures on language, recommendation, and view-synthesis tasks \citep{launay2020direct,akrout2019deep}. Evaluation in this line of work has converged on a two-number summary: final task accuracy, and an aggregate cosine alignment $\Gamma$ between the method's per-layer credit and the BP gradient on the trained network \citep{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct,bartunov2018assessing}.
+Backpropagation (BP) is the de facto training method for deep neural networks, but its requirement that each feedback connection carry a weight identical to the corresponding forward connection -- the weight-transport problem -- has long been considered biologically implausible \citep{lillicrap2016random,bartunov2018assessing}. \emph{Feedback alignment} (FA) \citep{lillicrap2016random} side-steps weight transport by delivering per-layer credit through fixed random feedback matrices, and its direct variant (DFA) \citep{nokland2016direct} projects the output error to every hidden layer through an independent random matrix; parallel lines include target propagation \citep{lee2015difference} and equilibrium propagation \citep{scellier2017equilibrium}. These rules are studied both as biologically-plausible alternatives to BP and as scalable, asynchronous training schemes, with recent work scaling DFA to transformer-scale architectures on language, recommendation, and view-synthesis tasks \citep{launay2020direct,akrout2019deep}. Evaluation in this line of work has converged on a two-number summary: final task accuracy, and an aggregate cosine alignment $\Gamma$ between the method's per-layer credit and the BP gradient on the trained network \citep{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct,bartunov2018assessing}.
 
 On the audited 4-block $d{=}256$ ResMLP, however, Table~\ref{tab:main_audit} already shows that this accuracy-plus-$\Gamma$ pair is not a validity check: DFA reaches only $0.306 \pm 0.006$ test accuracy, below the architecture-matched frozen-blocks baseline of $0.349 \pm 0.002$, while still looking superficially comparable to other non-BP methods. Figure~\ref{fig:audit_hero} further shows that the apparent cosine evidence is concentrated at the shallowest block, with DFA at seed 42 reaching about $+0.42$ at layer 0 but approximately $-0.03$ to $0$ on layers 1--4, so the aggregate obscures where credit direction is and is not present. At the same time, the deepest BP reference norm is only about $5 \times 10^{-10}$ for DFA, State Bridge, and Credit Bridge, below the $10^{-8}$ clamp used by \texttt{F.cosine\_similarity}, whereas BP remains around $4 \times 10^{-4}$, so the reported deep cosine is partly computed against a numerical-floor reference rather than an informative gradient direction (Figure~\ref{fig:audit_hero}; Table~\ref{tab:main_audit}). Those numbers can be useful, but only if the measurement regime itself is valid.
 
@@ -234,6 +234,12 @@ Sergey Bartunov, Adam Santoro, Blake~A. Richards, Luke Marris, Geoffrey~E.
   algorithms and architectures.
 \newblock In {\em Advances in Neural Information Processing Systems}, 2018.
 
+\bibitem[Scellier and Bengio(2017)]{scellier2017equilibrium}
+Benjamin Scellier and Yoshua Bengio.
+\newblock Equilibrium propagation: bridging the gap between energy-based models
+  and backpropagation.
+\newblock {\em Frontiers in Computational Neuroscience}, 11:24, 2017.
+
 \bibitem[Moskovitz et~al.(2018)Moskovitz, Litwin-Kumar, and
   Abbott]{moskovitz2018feedback}
 Theodore~H. Moskovitz, Ashok Litwin-Kumar, and L.~F. Abbott.
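
Note on the intro context in the first hunk: the paragraph about the $10^{-8}$ clamp in \texttt{F.cosine\_similarity} can be illustrated with a small sketch. This is not code from the paper. It is a plain-Python stand-in that assumes the per-norm clamp described in recent PyTorch documentation (older releases may clamp the product of the norms instead), and the vectors are hypothetical; only the norms $4 \times 10^{-4}$ and $5 \times 10^{-10}$ are taken from the text.

```python
import math

EPS = 1e-8  # default eps of torch.nn.functional.cosine_similarity


def clamped_cosine(x, y, eps=EPS):
    """Cosine similarity with each norm clamped from below by eps.

    Assumption: this mirrors the per-norm clamp described in recent
    PyTorch docs; it is a stand-in, not the library implementation.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (max(norm_x, eps) * max(norm_y, eps))


# Hypothetical unit-norm credit vector pointing along (0.6, 0.8).
credit = [0.6, 0.8]

# Reference gradients with exactly the same direction but different norms:
# about 4e-4 (the BP scale quoted in the intro) and about 5e-10
# (the deepest-block reference norm quoted for DFA and the bridges).
healthy_ref = [4e-4 * v for v in credit]
floor_ref = [5e-10 * v for v in credit]

cos_healthy = clamped_cosine(credit, healthy_ref)
cos_floor = clamped_cosine(credit, floor_ref)

print(f"reference norm 4e-4:  cosine = {cos_healthy:.3f}")
print(f"reference norm 5e-10: cosine = {cos_floor:.3f}")
```

Under this assumed clamp, a perfectly aligned pair is reported at roughly $5 \times 10^{-10} / 10^{-8} = 0.05$ once the reference norm drops below the clamp, which is one way a near-zero deep-layer cosine can arise from the numerical floor rather than from the credit direction.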