diff options
| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 11:53:26 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 11:53:26 -0500 |
| commit | 35be969067396306c19a3caac2d887bcde48c5d0 (patch) | |
| tree | c9d17ffb4209ae754160a74dd92a217b221283e2 | |
| parent | ee5ad5a24917784d30ad71679f074e47362458a5 (diff) | |
Polish: spell out 'Equilibrium Propagation (EP)' on first discussion in §2
The paper uses 'EP' throughout but never spelled it out in §1 or §2. Added
first-mention spell-out with a brief 1-line description ('a contrastive
energy-based alternative to BP that updates weights from the difference
between a free-phase and a nudged-phase hidden trajectory') so the reader
has context before EP is used as a key internal comparison.
Main content still 9 pages.
| -rw-r--r-- | paper/main.pdf | bin | 482200 -> 482361 bytes | |||
| -rw-r--r-- | paper/main.tex | 2 |
2 files changed, 1 insertions, 1 deletions
diff --git a/paper/main.pdf b/paper/main.pdf Binary files differindex 6866d79..0414d31 100644 --- a/paper/main.pdf +++ b/paper/main.pdf diff --git a/paper/main.tex b/paper/main.tex index 6de5fc7..faa22df 100644 --- a/paper/main.tex +++ b/paper/main.tex @@ -61,7 +61,7 @@ Credit Bridge & $0.289 \pm 0.026$ & $0.07$ & trustworthy & walked back \\ By the field's usual criteria, the non-BP methods appear to train to nontrivial accuracy and report nonzero alignment. In Table~\ref{tab:main_audit}, DFA reaches $0.306 \pm 0.006$ test accuracy with headline $\Gamma{=}0.10$, State Bridge reaches $0.205 \pm 0.032$ with $\Gamma{=}0.005$, and Credit Bridge reaches $0.289 \pm 0.026$ with $\Gamma{=}0.07$; none of these rows looks like an obvious invalidation if one is reading the usual pair of final accuracy and aggregate alignment in the style of prior FA reporting \citep{lillicrap2016random,nokland2016direct,akrout2019deep,launay2020direct}. Even the absolute scale does not itself force a walk-back, because all three methods are plainly above chance and all three report positive headline alignment rather than a visibly broken or undefined quantity. That reading is exactly what the rest of the paper overturns. -Low accuracy by itself is not the pathology. EP is the key internal comparison in Table~\ref{tab:main_audit} and Figure~\ref{fig:audit_hero}: it achieves only $0.316 \pm 0.030$ accuracy and a very small headline $\Gamma{=}0.008$, yet its per-block growth is only $11.6\times$, its deepest BP reference norm remains around $1.3\times 10^{-4}$ rather than collapsing to the numerical floor, and its cross-batch direction-stability score is $0.02$ rather than the much higher drift-dominated values seen for DFA-family methods. At the same time, EP is not a positive result for depth usage in the stronger sense, because its trainable-model accuracy is still $3.3$ percentage points below the frozen-blocks baseline of $0.349 \pm 0.002$. The distinction matters because it separates underperformance from invalid evaluation. +Low accuracy by itself is not the pathology. Equilibrium Propagation (EP), a contrastive energy-based alternative to BP that updates weights from the difference between a free-phase and a nudged-phase hidden trajectory, is the key internal comparison in Table~\ref{tab:main_audit} and Figure~\ref{fig:audit_hero}: it achieves only $0.316 \pm 0.030$ accuracy and a very small headline $\Gamma{=}0.008$, yet its per-block growth is only $11.6\times$, its deepest BP reference norm remains around $1.3\times 10^{-4}$ rather than collapsing to the numerical floor, and its cross-batch direction-stability score is $0.02$ rather than the much higher drift-dominated values seen for DFA-family methods. At the same time, EP is not a positive result for depth usage in the stronger sense, because its trainable-model accuracy is still $3.3$ percentage points below the frozen-blocks baseline of $0.349 \pm 0.002$. The distinction matters because it separates underperformance from invalid evaluation. When we compare each method to a frozen-blocks baseline matched to the same architecture, the headline interpretation changes immediately. The frozen-blocks model, which trains only the embedding, LayerNorm, and head while holding the residual blocks fixed, reaches $0.349 \pm 0.002$ across the same three seeds; against that baseline, BP is higher by $26.6$ points, but DFA is lower by $4.3$ points, State Bridge by $14.4$ points, Credit Bridge by $6.0$ points, and even EP by $3.3$ points. Figure~\ref{fig:audit_hero} shows that this accuracy comparison lines up with the diagnostic split: DFA, State Bridge, and Credit Bridge also combine extreme per-block growth ($237\times$, $12000\times$, and $96\times$), deepest-layer BP norms around $10^{-9}$, and high cross-batch instability ($0.16$, $0.53$, and $0.37$), so their deep blocks are at best passengers and in practice often harmful. This establishes the audit question the rest of the paper must answer: why do the standard signals fail so badly? |
