| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 16:54:40 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 16:54:40 -0500 |
| commit | 4b731f824e4a4ee8606aa472a9e5adc4021991b8 (patch) | |
| tree | 38587bb607299560d2d95a50cbb1eacf4253ea6f | |
| parent | 8fda0e0042d04343e0f4e6cdc04ba5e927f69064 (diff) | |
paper v2.29: add Scellier & Bengio 2017 EP citation
§1 ¶1 referenced "equilibrium propagation" without a bibitem despite EP
being the trustworthy non-BP control throughout the paper. Added the
canonical Scellier & Bengio 2017 Frontiers in Computational Neuroscience
reference and cited it where EP is first named in the FA-first intro.
Main content stays at 9 pages (§7 closes mid-p9, refs start p10);
0 overfull boxes; 18 pages total.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| -rw-r--r-- | paper/main.tex | 8 |
1 file changed, 7 insertions, 1 deletion
```diff
diff --git a/paper/main.tex b/paper/main.tex
index edf3717..bf9b73c 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -33,7 +33,7 @@ Modern feedback-alignment evaluation on deep residual networks is still summariz

 \section{Introduction}
 \label{sec:intro}
-Backpropagation (BP) is the de facto training method for deep neural networks, but its requirement that each feedback connection carry a weight identical to the corresponding forward connection -- the weight-transport problem -- has long been considered biologically implausible \citep{lillicrap2016random,bartunov2018assessing}. \emph{Feedback alignment} (FA) \citep{lillicrap2016random} side-steps weight transport by delivering per-layer credit through fixed random feedback matrices, and its direct variant (DFA) \citep{nokland2016direct} projects the output error to every hidden layer through an independent random matrix; parallel lines include target propagation \citep{lee2015difference} and equilibrium propagation. These rules are studied both as biologically-plausible alternatives to BP and as scalable, asynchronous training schemes, with recent work scaling DFA to transformer-scale architectures on language, recommendation, and view-synthesis tasks \citep{launay2020direct,akrout2019deep}. Evaluation in this line of work has converged on a two-number summary: final task accuracy, and an aggregate cosine alignment $\Gamma$ between the method's per-layer credit and the BP gradient on the trained network \citep{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct,bartunov2018assessing}.
+Backpropagation (BP) is the de facto training method for deep neural networks, but its requirement that each feedback connection carry a weight identical to the corresponding forward connection -- the weight-transport problem -- has long been considered biologically implausible \citep{lillicrap2016random,bartunov2018assessing}. \emph{Feedback alignment} (FA) \citep{lillicrap2016random} side-steps weight transport by delivering per-layer credit through fixed random feedback matrices, and its direct variant (DFA) \citep{nokland2016direct} projects the output error to every hidden layer through an independent random matrix; parallel lines include target propagation \citep{lee2015difference} and equilibrium propagation \citep{scellier2017equilibrium}. These rules are studied both as biologically-plausible alternatives to BP and as scalable, asynchronous training schemes, with recent work scaling DFA to transformer-scale architectures on language, recommendation, and view-synthesis tasks \citep{launay2020direct,akrout2019deep}. Evaluation in this line of work has converged on a two-number summary: final task accuracy, and an aggregate cosine alignment $\Gamma$ between the method's per-layer credit and the BP gradient on the trained network \citep{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct,bartunov2018assessing}.
 On the audited 4-block $d{=}256$ ResMLP, however, Table~\ref{tab:main_audit} already shows that this accuracy-plus-$\Gamma$ pair is not a validity check: DFA reaches only $0.306 \pm 0.006$ test accuracy, below the architecture-matched frozen-blocks baseline of $0.349 \pm 0.002$, while still looking superficially comparable to other non-BP methods. Figure~\ref{fig:audit_hero} further shows that the apparent cosine evidence is concentrated at the shallowest block, with DFA at seed 42 reaching about $+0.42$ at layer 0 but approximately $-0.03$ to $0$ on layers 1--4, so the aggregate obscures where credit direction is and is not present.
 At the same time, the deepest BP reference norm is only about $5 \times 10^{-10}$ for DFA, State Bridge, and Credit Bridge, below the $10^{-8}$ clamp used by \texttt{F.cosine\_similarity}, whereas BP remains around $4 \times 10^{-4}$, so the reported deep cosine is partly computed against a numerical-floor reference rather than an informative gradient direction (Figure~\ref{fig:audit_hero}; Table~\ref{tab:main_audit}). Those numbers can be useful, but only if the measurement regime itself is valid.

@@ -234,6 +234,12 @@ Sergey Bartunov, Adam Santoro, Blake~A. Richards, Luke Marris, Geoffrey~E.
 algorithms and architectures.
 \newblock In {\em Advances in Neural Information Processing Systems}, 2018.

+\bibitem[Scellier and Bengio(2017)]{scellier2017equilibrium}
+Benjamin Scellier and Yoshua Bengio.
+\newblock Equilibrium propagation: bridging the gap between energy-based models
+ and backpropagation.
+\newblock {\em Frontiers in Computational Neuroscience}, 11:24, 2017.
+
 \bibitem[Moskovitz et~al.(2018)Moskovitz, Litwin-Kumar, and Abbott]{moskovitz2018feedback}
 Theodore~H. Moskovitz, Ashok Litwin-Kumar, and L.~F. Abbott.
```
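The intro's context lines argue that a deep cosine measured against a BP reference norm of about 5e-10 is partly a clamp artifact. The arithmetic can be checked with a minimal plain-Python sketch. It assumes the per-norm clamp that current PyTorch documentation gives for `F.cosine_similarity`, cos = x·y / (max(‖x‖, ε) · max(‖y‖, ε)) with default ε = 1e-8. It is not the paper's measurement code. The helpers `l2_norm`, `clamped_cosine` and `rescale` and the two-dimensional vectors are hypothetical illustrations, not audit data.

```python
import math

# Assumption: F.cosine_similarity clamps each norm from below at eps (default 1e-8),
# as in current PyTorch docs. Pure-Python stand-in, not the paper's code.
EPS = 1e-8


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(a * a for a in v))


def clamped_cosine(x, y, eps=EPS):
    """Cosine similarity with each vector's norm clamped from below at eps."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (max(l2_norm(x), eps) * max(l2_norm(y), eps))


def rescale(v, norm):
    """Hypothetical helper: return v rescaled to the requested L2 norm."""
    n = l2_norm(v)
    return [a * norm / n for a in v]


credit = [0.6, 0.8]     # hypothetical per-layer credit with norm 1
direction = [0.6, 0.8]  # BP gradient direction, perfectly aligned with the credit

# Reference norm ~4e-4 (BP-like): the clamp is inactive, so the cosine is ~1.
cos_healthy = clamped_cosine(credit, rescale(direction, 4e-4))

# Reference norm 5e-10 < eps: the denominator uses eps instead of the true norm,
# so even perfect alignment scores only 5e-10 / 1e-8 = 0.05.
cos_floor = clamped_cosine(credit, rescale(direction, 5e-10))

print(f"healthy reference: {cos_healthy:.3f}")  # -> healthy reference: 1.000
print(f"floor reference:   {cos_floor:.3f}")    # -> floor reference:   0.050
```

More generally, under this clamp and with a credit vector whose own norm is at least ε, |cos| cannot exceed ‖ref‖/ε. A near-zero deep cosine at a 5e-10 reference norm is therefore not, by itself, evidence of misalignment.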
