Initial commit: ept — backprop-free equilibrium transformer (EP)

Code (ep_run/), organized docs (docs/{method,campaign,hardware,outreach,paper}), analysis scripts (scripts/), ONBOARDING.md entry point. Large data/checkpoints git-ignored (share separately). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014FAPDWQ49M5Ye3NpTndTpn
author: Yuren Hao <yurenh2@illinois.edu> 2026-07-03 05:56:50 -0500
committer: Yuren Hao <yurenh2@illinois.edu> 2026-07-03 05:56:50 -0500
commit: b83947778e2c776f757a07d4719b7ce961d7ed55 (patch)
tree: b9cc01d7adda691d9156d9d04f4fb2f644674e96 /refs/scurria_2602.03670v2.txt
1 files changed, 3311 insertions, 0 deletions
diff --git a/refs/scurria_2602.03670v2.txt b/refs/scurria_2602.03670v2.txt
new file mode 100644
index 0000000..d9eba6a
--- /dev/null
+++ b/refs/scurria_2602.03670v2.txt
@@ -0,0 +1,3311 @@
+Equilibrium Propagation for Non-Conservative Systems
+
+Antonino Emanuele Scurria 1 Dimitri Vanden Abeele 1 Bortolo Matteo Mognetti 2 Serge Massar 1
+
+arXiv:2602.03670v2 [cs.LG] 1 Jun 2026
+
+Abstract
+
+from inference, the transmission of nonlocal error signals,
+and synchronous layer-wise computations with explicit gradient storage. These constraints have no clear analog in
+physical systems, making backpropagation challenging to
+implement in neuromorphic or analog hardware. Consequently, understanding how credit assignment can instead
+emerge from intrinsic system dynamics, through local interactions and continuous relaxation, is a central question in
+neuroscience and machine learning.
+
+Equilibrium Propagation (EP) is a physicsinspired learning algorithm that uses stationary
+states of a dynamical system both for inference
+and learning. In its original formulation it is
+limited to conservative systems, i.e. to dynamics which derive from an energy function. Given
+their applications, it is important to extend EP
+to non-conservative systems, i.e. systems with
+non-reciprocal interactions. Previous attempts to
+generalize EP to such systems failed to compute
+the exact gradient of the cost function. Here we
+propose a framework that extends EP to arbitrary
+non-conservative systems, including feedforward
+networks. We keep the key property of equilibrium propagation, namely the use of stationary
+states both for inference and learning. However,
+we modify the dynamics in the learning phase by
+a term proportional to the non-reciprocal part of
+the interaction so as to obtain the exact gradient
+of the cost function. This algorithm can also be
+derived using a variational formulation that generates the learning dynamics through an energy
+function defined over an augmented state space.
+Numerical experiments show that this algorithm
+achieves better performance and learns faster than
+previous proposals.
+
+Equilibrium Propagation (EP) (Scellier &Bengio, 2017)
+represents one of the most promising advances in this direction. It formulates supervised learning as a contrast between
+two stationary states of a dynamical system: a ‘free’ phase
+where the system evolves autonomously, and a ‘nudged’
+phase where outputs are weakly pushed toward their targets.
+The local change in neural states between these phases recovers the exact gradient of the cost function with respect to
+parameters. This enables spatially local learning exploiting
+the continuous relaxation of the system without a distinct
+backward circuit or explicit weight transport.
+Since its introduction, several works have sought to improve the practicality and biological realism of EP. Algorithmic adaptations include enforcing temporal locality to
+avoid state storage (Ernoult et al., 2020; Falk et al., 2025),
+deriving agnostic updates for black-box energies (Scellier et al., 2022), and substituting nudging with clamping
+(Stern et al., 2021). Theoretically, the framework has been
+extended to stochastic systems (Scellier &Bengio, 2017;
+Massar &Mognetti, 2025) and Lagrangian dynamics for
+time-varying inputs (Massar, 2025; Pourcel et al., 2025;
+Berneman &Hexner, 2025). In parallel, simulations have
+explored suitable substrates, ranging from spiking (Martin et al., 2021; O’Connor et al., 2019) and resistive networks (Kendall et al., 2020) to coupled oscillators (Wang
+et al., 2024; Rageau &Grollier, 2025), as well as quantum
+systems (Wanjura &Marquardt, 2025; Massar &Mognetti,
+2025; Scellier, 2024). Experimental realizations have been
+demonstrated in memristor crossbars (Yi et al., 2023), selfadjusting electrical circuits (Dillavou et al., 2022; 2024),
+elastic networks (Altman et al., 2024), and classical Ising
+models trained on quantum annealers (Laydevant et al.,
+2024).
+
+1. Introduction
+Standard neural network optimization relies on error backpropagation, an algorithm whose computational mechanism
+is difficult to reconcile with biological (Crick, 1989) and
+physical implementations (Indiveri &Liu, 2015). Specifically, backpropagation requires a backward pass distinct
+1
+Laboratoire d’Information Quantique (LIQ) CP224, Université
+libre de Bruxelles (ULB), Av. F. D. Roosevelt 50, 1050 Bruxelles,
+Belgium 2 Interdisciplinary Center for Nonlinear Phenomena and
+Complex Systems CP231, Université libre de Bruxelles (ULB), Av.
+F. D. Roosevelt 50, 1050 Bruxelles, Belgium. Correspondence to:
+Antonino Emanuele Scurria <antonino.scurria@ulb.be>.
+
+Proceedings of the 43 rd International Conference on Machine
+Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026
+by the author(s).
+
+Despite these recent developments and the theoretical el-
+
+1
+
+Equilibrium Propagation for Non-Conservative Systems
+
+egance of EP, its standard formulation remains restricted
+to conservative systems. In these systems, dynamics are
+derived from an energy function, which inherently enforces
+symmetry (e.g., symmetric synaptic connections Jij = Jji )
+through the action-reaction principle. This constraint precludes the use of EP in a broad class of models characterized
+by non-conservative forces. This includes the feedforward
+architectures dominant in modern AI, biological circuits,
+as well as physical systems that reach stationary states far
+from thermodynamic equilibrium, such as nonlinear optical
+systems driven by external lasers (Cin et al., 2025), optoelectronic systems (Kalinin et al., 2025), exciton-polariton
+condensates (Sajnok &Matuszewski, 2025), active metamaterials (Brandenbourger et al., 2019) and active colloids
+(Bishop et al., 2023; Osat &Golestanian, 2023) (see (Bowick
+et al., 2022) for a review).
+
+a framework where the original dynamics serve for inference, while a new augmented dynamic is used to compute
+gradients of the cost Eq. (2). In this augmented phase, the
+output neurons are nudged towards their targets (as in standard EP), while a local corrective term – proportional to the
+antisymmetric part of the Jacobian at the free equilibrium
+∂
+JF (x0 , θ, u) = ∂x
+F (x0 , θ, u) – is added to the forces. The
+exact gradients of the cost with respect to parameters are
+then obtained by contrasting stationary states of the augmented system.
+Second, we introduce Dyadic EP, a ‘variational’ approach
+to learning in non-conservative systems. This method involves doubling the number of variables in the system’s
+state space and subsequently introducing a new energy function in this extended space. This approach takes advantage
+of the extended space to execute the positive and negative
+nudging phases in parallel, recovering the same computational cost as AsymEP. Derived from first principles, this
+approach is inspired by established methods for mapping
+dissipative dynamical systems onto conservative ones by
+doubling the degrees of freedom (Bateman, 1931; Galley,
+2013; Aykroyd et al., 2025). A more comprehensive study
+of the theoretical framework and its application to feedforward networks can be found in (Scurria, 2026). Our method
+is related to the Dual Propagation algorithm (Høier et al.,
+2023; Høier &Zach, 2023; 2024) and constitutes an independent, first-principles generalization of Dyadic Learning
+(Nest &Høier; Høier et al., 2024)—previously limited to
+Hopfield networks—to arbitrary force fields.
+
+Formally, we consider a dynamical system governed by
+a non-reciprocal force field F (x, θ, u), which relaxes to a
+stationary configuration x0 satisfying:
+F (x0 , θ, u) = 0,
+
+(1)
+
+where x represents the state variables, θ the learnable parameters and u the static input. Our goal, given a target y(u), is
+to compute the gradient of the cost function C(x0 , y) at this
+equilibrium,
+dC 0
+(x , y),
+(2)
+dθ
+and update θ to minimize the cost.
+Previous attempts to extend EP to non-conservative dynamics include the Vector Field (VF) algorithm (Scellier et al.,
+2018). However, as noted by the authors, this method provides an unbiased gradient of the cost Eq. (2) only in the
+conservative case. To mitigate this, (Laborieux &Zenke,
+2024) proposed adding a penalty to keep the Jacobian close
+to symmetry, essentially forcing the system to be as conservative as possible. Alternative methods related to VF,
+which similarly do not compute the exact gradient, were
+proposed in (Farinha et al., 2020; Costa &Santos, 2025) and
+for specific systems in simulation (Cin et al., 2025; Sajnok
+&Matuszewski, 2025).
+
+Third, we validate our framework on MNIST (LeCun, 1998),
+Fashion-MNIST, and CIFAR-10. In continuous Hopfield
+networks initialized with symmetric connection matrices,
+AsymEP achieves better accuracy and learns faster than
+EP and VF. Additionally, when we constrain the network
+to have a strong degree of structural asymmetry, in which
+case EP is inapplicable, AsymEP outperforms VF. Finally,
+when we restrict connections to a feedforward structure, our
+algorithm effectively trains all parameters; in contrast, VF
+is limited to training the last layer, acting essentially as an
+Extreme Learning Machine (Huang et al., 2006; Wang et al.,
+2022) with poor performance.
+
+Conversely, generalizations of backpropagation can handle
+non-reciprocal forces and compute the exact gradient of
+the cost Eq. (2) but inherit the same challenges in physical
+implementations. For instance, Backpropagation Through
+Time (Werbos, 1990) unfolds the network in time to apply standard backpropagation, Recurrent Backpropagation
+(Almeida, 1990; Pineda, 1987) avoids this memory requirement but still requires a specific circuit to propagate errors,
+and the continuous Adjoint Method (Chen et al., 2018) additionally requires integrating the dynamics backward in time
+which is not physically possible for a dissipative system.
+
+In summary, this theoretical work proposes two generalizations of EP beyond conservative systems to arbitrary differentiable dynamics that compute in their stationary states.
+
+2. Equilibrium Propagation Overview
+2.1. Conservative Systems
+We first review standard Equilibrium Propagation (EP)
+(Scellier &Bengio, 2017). We consider a network described
+by an energy function E(x, θ, u), such that the force field is
+
+In this paper, we first propose Asymmetric EP (AsymEP),
+2
+
+Equilibrium Propagation for Non-Conservative Systems
+
+derived from the potential E:
+FE (x, θ, u) = −
+
+∂
+E(x, θ, u).
+∂x
+
+stationary point, i.e., that Eq. (7) holds. Second, EP implic∂
+itly assumes that the Jacobian JE (x0 , u) = ∂x
+FE (x0 , u) is
+invertible. In this work, we assume this condition holds and
+will not state it explicitly hereafter. Third, for simplicity,
+we omit the dependency on the input u and target y in the
+following equations.
+
+(3)
+
+0
+The objective is to compute the total gradient dC
+dθ (x , y) of a
+(quadratic) cost function C(x, y) evaluated at the minimum
+energy configuration of the system. This free equilibrium
+denoted x0 (which depend implicitly in θ and u), satisfies
+the stationarity condition:
+
+∂
+− E(x0 , θ, u) = 0.
+∂x
+
+2.2. Vector Field
+The Vector Field (VF) algorithm, introduced in (Scellier
+et al., 2018), is an early attempt to adapt EP to nonreciprocal forces. This method relies on the observation
+that, for conservative systems, linearizing the right-hand
+side of Eq. (9) around the equilibrium point x0 yields
+
+(4)
+
+To compute gradients, we introduce the augmented energy
+functional:
+ET (x, θ, β, u, y) = E(x, θ, u) + βC(x, y),
+
+1
+β→0 2β
+
+(5)
+
+lim
+
+where β is a scalar nudging parameter. The stationary configuration of this augmented system is obtained by integrating
+the dynamics
+∂ET (x, θ, β, u)
+dx
+=−
+,
+dt
+∂x
+
+where FE = −∂x E(x, θ) is the conservative force. It is
+therefore tempting to use the right-hand side of Eq. (10) for
+parameter updates of non-conservative systems, for which
+no energy function E exists.
+
+(6)
+
+until the energy minimum is reached. This new fixed point
+xβ , called nudged equilibrium, satisfies:
+∂E(xβ , θ, u)
+∂C(xβ , y)
++β
+= 0.
+∂x
+∂x
+
+The VF algorithm adopts precisely this approach. It uses
+the nudged counterpart of Eq. (7),
+
+(7)
+
+F (xβ , θ) − β
+
+The training procedure, as improved in (Laborieux et al.,
+2021), uses two nudged phases with factors ±β (with
+β ̸= 0). Starting from x0 , the system relaxes to two
+nearby perturbed equilibria, x+β and x−β . The displacement x+β − x−β is then used to compute the parameter
+update in the learning rule:
+
+
+1 ∂E(xβ , θ, u) ∂E(x−β , θ, u)
+−
+, (8)
+∆θ = −ϵ
+2β
+∂θ
+∂θ
+
+∂C β
+(x ) = 0,
+∂x
+
+(11)
+
+in conjunction with the learning rule Eq. (10):
+
+∆θ = ϵ
+
+∂F 0
+(x , θ)
+∂θ
+
+⊤  β
+
+x − x−β
+.
+2β
+
+(12)
+
+However, as noted in (Scellier et al., 2018), Eq. (12) does
+0
+not align with the true gradient dC
+dθ (x ) and is exact only if
+the force is conservative. To see this, let JF (x, θ) denote
+the Jacobian of the vector field F (x, θ) (in components
+i (x,θ)
+(JF (x, θ))ij = ∂F∂x
+). Differentiating the equilibrium
+j
+0
+condition F (x , θ) = 0 with respect to θ gives
+
+where ϵ > 0 is the learning rate. The theoretical foundation
+of EP is the result that, in the limβ→0 of Eq. (8), we get:
+dC(x0 , y)
+d ∂E(xβ , θ, u)
+=
+,
+dθ
+dβ
+∂θ
+
+
+∂E(xβ , θ) ∂E(x−β , θ)
+−
+∂θ
+∂θ
+
+⊤  β
+ (10)
+x − x−β
+∂FE 0
+(x , θ)
+= lim −
+,
+β→0
+∂θ
+2β
+
+
+
+(9)
+
+JF (x0 , θ)
+
+see Appendix D.1. The error of the above method is O(β 2 ).
+This error can be further reduced using holomorphic equilibrium propagation (Laborieux &Zenke, 2022).
+
+dx0
+∂F 0
++
+(x , θ) = 0.
+dθ
+∂θ
+
+(13)
+
+Consequently, the exact gradient of the cost is
+
+Thus, EP recovers the exact gradient of the cost function
+using only local computations. In this manner, learning
+implements gradient descent without an explicit backward
+pass, and credit assignment is realized through the system’s
+intrinsic relaxation dynamics.
+
+⊤
+
+dx0 ∂C 0
+dC 0
+(x ) =
+(x )
+dθ
+dθ ∂x
+
+⊤ 
+
+−1 ∂C 0
+∂F 0
+⊤ 0
+=−
+(x , θ)
+JF (x , θ)
+(x ) .
+∂θ
+∂x
+|
+{z
+}|
+{z
+}
+
+Three remarks can be made at this point. First, EP does not
+require the system to be at an energy minimum, but only at a
+
+pre-synaptic
+
+post-synaptic
+
+(14)
+3
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Algorithm 1 Asymmetric EP (AsymEP)
+
+The terms ’pre-synaptic’ and ’post-synaptic’ in Eq. (14)
+are used by analogy with neuronal transmission: the presynaptic factor captures the local influence of θ on the force
+F , while the post-synaptic factor is the sensitivity of the
+cost to state perturbations.
+
+1: Inputs: Force field F (x, θ), cost function C(x), nudg-
+
+ing parameter β, learning rate ϵ.
+2: repeat
+3:
+1. Free Phase: Evolve to stationary state
+4:
+Evolve the system dynamics
+5:
+
+If instead we differentiate the nudged equilibrium condition
+in Eq. (11) with respect to β and evaluate at β = 0, we
+obtain
+β
+
+JF (x0 , θ)
+
+dx
+dβ
+
+−
+β=0
+
+∂C 0
+(x ) = 0,
+∂x
+
+dx
+= F (x, θ),
+dt
+
+(15)
+
+6:
+7:
+8:
+9:
+
+which gives
+−1 ∂C 0
+dxβ
+= JF (x0 , θ)
+(x , y).
+dβ β=0
+∂x
+
+(17)
+
+until convergence to the stationary state x0 .
+2. Jacobian Decomposition
+Compute the Jacobian at equilibrium:
+∂F 0
+(x , θ),
+(18)
+∂x
+and decompose it in its antisymmetric part:
+JF (x0 , θ) =
+
+(16)
+10:
+11:
+
+The right-hand side of Eq. (16) represents the effective postsynaptic term used by the VF algorithm (Eq. 12). Comparing this with the exact post-synaptic term derived in Eq. (14),
+we see that they coincide only if JF = JF⊤ , i.e., only if the
+system is conservative.
+
+AJ (x0 , θ) = 12 (JF (x0 , θ) − JF (x0 , θ)⊤ ). (19)
+12:
+13:
+14:
+
+Now, let SJ (x0 , θ) and AJ (x0 , θ) denote the symmetric
+and antisymmetric parts of the Jacobian at the free (unnudged) equilibrium, respectively. Then, we show in Appendix A that the gradient error increases with the spectral
+−1
+radius of SJ (x0 , θ)
+AJ (x0 , θ). Consequently, large
+antisymmetric contributions degrade the gradient estimation, confirming empirical observations in the Appendix of
+(Ernoult et al., 2020). In fact, in the pathological limit where
+the Jacobian would be purely antisymmetric SJ (x0 , θ) = 0,
+the update of VF gives the negative of the true gradient,
+maximizing the cost rather than minimizing it.
+
+15:
+16:
+17:
+18:
+
+3. Nudged Phase: Augmented Dynamics
+Integrate the dynamics twice starting from x0
+dx
+∂C
+= F (x, θ) − β
+(x) − 2AJ (x0 , θ) (x − x0 ),
+dt
+∂x
+(20)
+until convergence to two new stationary states
+x±β
+A .
+4. Parameter Update
+Update the parameters according to:
+
+∆θ = ϵ
+
+3. Asymmetric EP
+
+⊤
+∂F 0
+(x , θ)
+∂θ
+
+xβA − x−β
+A
+2β
+
+!
+.
+
+(21)
+
+19: until convergence of θ
+20: Output: Optimized parameters θ.
+
+Here, we introduce Asymmetric EP (AsymEP), see Algorithm 1, which removes the gradient estimate error inherent
+to VF by adding a local correction term to the augmented
+inference dynamics. The new nudged equilibrium xβA satisfies:
+∂C β
+F (xβA , θ) − β
+(x ) − 2AJ (x0 , θ) (xβA − x0 ) = 0, (22)
+∂x A
+
+where JFA (x, θ) is the Jacobian of the modified dynamical
+system Eq. (20). At the equilibrium point x0 , JFA is equal
+to the transpose of the original Jacobian:
+JFA (x0 , θ)
+
+As in VF, we then obtain two perturbed states x±β
+A for opposite nudging ±β and apply the contrastive learning rule
+of Eq. (12).
+
+=
+
+JF (x0 , θ) − 2AJ (x0 , θ)
+
+=
+
+SJ (x0 , θ) − AJ (x0 , θ)
+
+=
+
+JF⊤ (x0 , θ).
+
+(24)
+
+We now show that AsymEP gives rise to the correct learning
+rule, i.e. that right-hand side of Eq. (21) is proportional to
+0
+the gradient of the cost function dC
+dθ (x ) at the equilibrium
+0
+point x (Eq. 14). To this end, note that the same reasoning
+leading to Eq. (16) leads to
+
+where we have used the decomposition Eq. (44) of the original Jacobian J into its symmetric and antisymmetric components. Therefore, the left hand side of Eq. (23) is equal to
+the true post-synaptic term
+
+−1 ∂C 0
+dxβA
+= JFA (x0 , θ)
+(x ).
+dβ β=0
+∂x
+
+−1 ∂C 0
+dxβA
+= JF⊤ (x0 , θ)
+(x ),
+dβ β=0
+∂x
+
+(23)
+4
+
+(25)
+
+Equilibrium Propagation for Non-Conservative Systems
+
+until a stationary point (z β , z ′β ) is reached. Upon convergence, we follow the standard EP paradigm in using the
+difference z β − z ′β to compute the post-synaptic term. Un′
+′
+der the change of variables m = z+z
+2 and d = z − z , we
+prove in Appendix D that m follows the original dynamics
+F (ensuring valid inference), while d relaxes to a "physical"
+error signal proportional to the cost gradient.
+
+which, using Eq. (14), proves the result. Additionally, although implied by the equality with the true gradient, we
+explicitly demonstrate the equivalence of the gradient estimates obtained by AsymEP and Backpropagation Through
+Time in Appendix B following (Ernoult et al., 2019).
+Note that the corrective term −2AJ (x0 , θ)(x − x0 ) in
+Eq. (20) is spatially local: AJ vanishes for unconnected
+neurons, and (x − x0 ) is available at the synapse given the
+memory mechanism already required by Eq. (12). This
+correction can create backward connections (Section 5.3).
+However, in physical realizations, both feedforward and
+feedback connections must be physically present, though
+feedback may be deactivated during inference.
+
+It is important to notice that while Dyadic EP introduces a
+distinct formulation, it remains consistent with the general
+theoretical setting of EP and matches the computational
+cost of AsymEP. Note also that we start the evolution of
+the free phase (β = 0) with the identical initial condition
+for z and z ′ , (i.e., d = 0). This guarantees that integrating Eq. (32) leads to a symmetric stationary point where
+z 0 = z ′0 . Finally, we underline that the modified variational update rule in Eq. (34) is equivalent to the standard
+symmetric EP update rule in Eq. (8) (see Appendix D).
+
+4. Dyadic EP
+We now introduce Dyadic EP (Algorithm 2), a variational
+algorithm that computes the exact cost gradient in the limit
+of infinitesimal nudging. It maps the original n-variable
+dynamics F (x, θ) onto a 2n-variable system (z, z ′ ) defined
+by an energy H(z, z ′ , θ) and cost D(z, z ′ ). We show in
+Appendix E that AsymEP can be seen as the first-order
+projection of Dyadic EP onto the original n-dimensional
+state space.
+
+Now, to make this concrete, consider a continuous Hopfield
+network (see also Eq. (35)) with an asymmetric connection
+matrix J. After some calculations (see Appendix F), the
+augmented energy of the system can be re-expressed as:
+1
+1
+HT = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) − ρ(z)⊤ Aρ(z ′ )
+2
+2
+1
+β
+2
+′ 2
++ (∥z∥ − ∥z ∥ ) + (C(z, y) + C(z ′ , y)) ,
+2
+2
+(29)
+where S and A are the symmetric and antisymmetric parts
+of J, respectively and ρ is an element-wise non-linearity.
+An interesting analogy can be drawn with standard learning
+rules in discrete Hopfield networks (Hopfield, 1982). For
+a sequence of binary memories {ξ 1 , . . . , ξ m } where ξ µ ∈
+corresponds to the standard autoassociative
+{−1, 1}n , S P
+Hebbian rule µ ξ µ (ξ µ )⊤ , creating stable attractors at each
+pattern. In contrast, A corresponds to the heteroassociative
+rule (e.g., a cycle between ξ µ and ξ ν given by ξ ν (ξ µ )⊤ −
+ξ µ (ξ ν )⊤ ), encoding transitions between patterns.
+
+The new system is defined by the energy H and cost function
+D, given in terms of F and C by:
+
+
+z + z′
+H(z, z ′ , θ) = −(z − z ′ )⊤ F
+,θ ,
+2
+
+
+′
+z+z
+,
+(26)
+D(z, z ′ ) = C
+2
+where z, z ′ ∈ Rn . In order to learn, we introduce the augmented energy
+HT (z, z ′ , θ, β) = H(z, z ′ , θ) + βD(z, z ′ ).
+
+(27)
+
+The equilibrium configuration corresponds to a saddle point
+of HT , where z minimizes and z ′ maximizes the energy.
+This poses no issue for EP, which requires only that the
+joint state (z, z ′ ) reaches a stationary state. Although this
+min-maximization can be interpreted as z evolving forward
+and z ′ backward in time, in practice they evolve forward
+simultaneously, as we integrate the coupled equations:
+
+
+dz
+∂HT
+z + z′
+=−
+=F
+,θ
+dt
+2
+∂z
+⊤
+
+
+z − z′
+∂F
+β ∂C z + z ′
++
+−
+,
+2
+∂z z+z′
+2 ∂z
+2
+2
+
+
+dz ′
+∂HT
+z + z′
+=+
+=F
+,θ
+′
+dt
+2
+∂z
+
+
+
+′ ⊤
+z−z
+∂F
+β ∂C z + z ′
+−
++
+,
+2
+∂z ′ z+z′
+2 ∂z ′
+2
+2
+(28)
+
+For this specific energy, the update rule given by Eq. (34)
+can be re-expressed as:
+
+⊤
+1
+∆J ∝ −
+ρ(z ′β ) − ρ(z β ) ρ(z ′β ) + ρ(z β ) . (30)
+2β
+In the limit β → 0, this gives:
+!
+β
+d
+∆J ∝
+⊙ ρ′ (m)ρ(m)⊤ .
+β
+
+(31)
+
+matching the learning rule in (Pineda, 1987), with
+β
+limβ→0 dβ being the error signal.
+
+5. Numerical Experiments
+In this section, we numerically validate AsymEP (Algorithm 1). The neuronal dynamics follows the one introduced
+5
+
+Equilibrium Propagation for Non-Conservative Systems
+
+where ∥ · ∥F denotes the Frobenius norm. Note that this
+metric does not capture the asymmetry of the Jacobian,
+which depends on the state x.
+
+Algorithm 2 Dyadic EP
+1: Inputs: Force field F (x, θ), cost function C(x, y),
+
+nudging parameter β, learning rate ϵ
+2: repeat
+3:
+1. Free Phase: Evolve to stationary state
+4:
+Evolve the system dynamics, starting from identi-
+
+For numerical experiments, we restricted the network to a
+layered architecture with a single hidden layer to facilitate
+comparison with prior work. Accordingly, J in contains
+only input-to-hidden connections, while J dyn is block offdiagonal, encoding bidirectional interactions between the
+hidden and output layers. Both J in and J dyn are trained.
+
+cal initial conditions z(0) = z ′ (0) = z0 ,
+5:
+
+6:
+7:
+8:
+
+dz
+∂H
+=−
+,
+dt
+∂z
+
+11:
+12:
+13:
+
+(32)
+
+We first use MNIST (LeCun, 1998) (60k train, 10k test)
+followed by Fashion-MNIST to validate AsymEP, and then
+we further validate AsymEP and Dyadic EP by comparing
+them to Backpropagation on a convolutional feedforward,
+with CIFAR-10. Inputs are normalized using min-max to
+[−1, 1] and targets are one-hot encoded in {−1, 1}. All
+hyperparameters are detailed in Appendix G, along with
+additional details and numerical results.
+
+until stationary states z 0 , z ′0 are reached.
+2. Nudged Equilibrium
+Evolve the system dynamics, starting from the
+solution of the free phase z 0 = z ′0 :
+
+9:
+
+10:
+
+dz ′
+∂H
+=+ ′,
+dt
+∂z
+
+dz
+∂HT
+=−
+,
+dt
+∂z
+
+dz ′
+∂HT
+,
+=+
+dt
+∂z ′
+
+(33)
+
+until two nudged stationary states z β , z ′β are
+reached.
+3. Parameter Update
+Update the parameters according to:
+
+
+1 ∂H(z β , z ′β , θ)
+∆θ = −ϵ
+(34)
+β
+∂θ
+
+5.1. Symmetric Initialization
+We start by comparing AsymEP with standard EP and
+VF. All algorithms are initialized with an identical symmetric matrix J dyn . EP maintains this symmetry throughout training, while VF and AsymEP induce asymmetry in
+J dyn . Since EP and VF already achieve strong performance
+on MNIST, the purpose of this experiment is to validate
+AsymEP and compare it against EP and VF rather than
+outperform the state of the art.
+
+14: until convergence of θ
+15: Output: Optimized parameters θ.
+
+in (Scellier &Bengio, 2017), and is generalized to allow
+for non-reciprocal forces as in (Scellier et al., 2018). For
+clarity, we express the forces in a form that explicitly separates the contributions of the external input and the recurrent
+interactions:
+
+F (x) = ρ′ (x) ⊙ J in u + J dyn ρ(x) − x,
+(35)
+
+Figure 1 compares the three algorithms as a function of
+hidden-layer dimension after 1 and 20 epochs. AsymEP
+consistently outperforms the baselines, suggesting it learns
+faster and better.
+Figure 2 studies the evolution of the asymmetry ratio rstr .
+The results are reported for 50 hidden neurons. As expected,
+EP preserves the initial weight symmetry. In contrast, VF
+and AsymEP induce non-trivial evolution of rstr following
+two distinct patterns, resulting in three distinct network
+configurations. A complementary figure is available in Appendix G.1.
+
+where u ∈ RNin denotes the input and x ∈ RNdyn the neuronal state, comprising both hidden and output units. The
+matrices J in ∈ RNdyn ×Nin and J dyn ∈ RNdyn ×Ndyn define the
+input and recurrent connectivity, respectively. The activation
+function ρ(·) is taken to be the hyperbolic tangent, applied
+element-wise.
+
+5.2. Fixed Asymmetry Ratio
+
+If J dyn is symmetric, we can define the energy:
+E(x) =
+
+While the previous section focused on networks compatible
+with all three algorithms (EP, VF, AsymEP), we now turn
+to architectures with strong structural asymmetry. In this
+regime, EP is inapplicable by construction, and, as we show,
+VF performs poorly, contrary to AsymEP which remains
+effective.
+
+1
+1
+∥x∥2 − ρ(x)⊤ J dyn ρ(x) − ρ(x)⊤ J in u, (36)
+2
+2
+
+which is identical to that of (Scellier &Bengio, 2017), provided that the input neurons are activated as ρ(u).
+Equation (35) naturally motivates a quantitative measure of
+structural asymmetry rstr , defined as:
+
+To this end, we consider a class of networks where the
+asymmetry ratio rstr defined in Eq. (37) is kept fixed. Let S̃
+and Ã be arbitrary symmetric and antisymmetric matrices
+in RNdyn ×Ndyn respectively. We enforce a fixed rstr via the
+
+⊤
+
+rstr =
+
+∥(J dyn − J dyn )/2∥F
+,
+∥J dyn ∥F
+
+(37)
+6
+
+Equilibrium Propagation for Non-Conservative Systems
+
+where γ ∈ R is a learnable global scale.
+Using VF and AsymEP, we train a layered network with one
+hidden layer of 50 neurons (in which case S̃ and Ã are block
+off-diagonal) for different values of rstr to investigate the
+impact of structural asymmetry. We compare two training
+regimes: training only the input weights J in (and the scale
+γ), versus training all parameters including J dyn . The first
+regime trains only the external forces from the input ρ′ (x) ⊙
+J in u (which correspond to a symmetric contribution in the
+Jacobian) applied to our non-conservative system, while
+the second additionally trains J dyn and therefore the nonsymmetric part of the Jacobian directly.
+
+(a) Results after one epoch.
+
+Figure 3 summarizes the results. We find that AsymEP
+maintains robust performance across all asymmetry levels
+(e.g., achieving an accuracy of 93.8 ± 0.4% at rstr = 0 and
+94.9 ± 0.2% at rstr = 0.875 when training all parameters)
+and can even learn when the recurrent connection matrix
+J dyn is completely antisymmetric (rstr = 1). Additionally,
+training all parameters shows significant improvement over
+training only J in .
+In contrast, VF performs well at low asymmetry ratios
+but degrades as asymmetry increases, eventually dropping
+to chance levels (e.g., accuracies of 5 ± 3% and 8 ± 4%
+at rstr = 1 for input-only and all-parameter training, respectively). When only J in is trained, VF accuracy collapses around rstr ≈ 0.5, whereas training all parameters
+delays this collapse until rstr ≈ 0.8. Our analysis in Appendix G.2.1 reveals that VF adjusts the dynamics such that
+the asymmetry of the Jacobian’s off-diagonal terms remains
+strictly lower than the structural asymmetry ratio. The training appears to adjust the neuronal state such that neurons
+connected by strongly asymmetric weights have low activation. As shown in Appendix G.2.1, AsymEP learns faster
+than VF across all levels of asymmetry.
+
+(b) Results after 20 epochs.
+Figure 1. Comparison of algorithm performance on MNIST using
+a layered architecture with one hidden layer and symmetric initialization. Squares denote AsymEP, circles EP, and triangles VF.
+Test accuracy (averaged over 10 runs) is shown after one epoch
+(Fig. 1a) and 20 epochs (Fig. 1b).
+
+Finally, Appendix G.3 opens with a brief theoretical discussion of the stability of these non-conservative dynamics,
+followed by simulations on all-to-all topologies with constrained rstr and input projections J in . Even in this worstcase setting, AsymEP reduces oscillations and improves
+stability.
+5.3. Feedforward Architectures
+Figure 2. Evolution of the asymmetry ratio rstr (defined in Eq. (37))
+during training on MNIST for AsymEP, EP and VF, initialized
+from a symmetric configuration. The models use 50 hidden neurons.
+
+We now consider a purely feedforward architecture. Here
+VF trains only the last layer: with no backward connections,
+the output nudging signal cannot reach earlier layers, so for
+every layer but the last the nudged stationary states coincide
+with the free states, giving zero weight updates. As only
+the output layer is trained, the system essentially becomes
+an Extreme Learning Machine (Huang et al., 2006; Wang
+et al., 2022). In contrast, AsymEP introduces a correction
+that generates effective backward connections, allowing the
+
+following parameterization of the recurrent parameters:
+"q
+#
+S̃
+Ã
+dyn
+2
+J =γ
+1 − rstr
++ rstr
+,
+(38)
+∥S̃∥F
+∥Ã∥F
+7
+
+Equilibrium Propagation for Non-Conservative Systems
+
+tivity structures inspired by (Millidge et al., 2023), while
+keeping the number of trainable parameters fixed.
+Experiments are conducted on Fashion-MNIST using a twohidden-layer network with hidden dimensions 500 and 200.
+Network states are denoted (x0 , x1 , x2 , x3 ), where x0 is
+the input and x3 = xL the output. Forward and backward
+connections are denoted by Wk and Bk , respectively, with
+W1 = J in .
+We consider three classes of dynamics. First, the Continuous
+Hopfield (CH) dynamics introduced previously:
+
+
+dxk
+= −xk +ρ′ (xk )⊙ Wk ρ(xk−1 )+(1−δk,L )Bk ρ(xk+1 ) .
+dt
+(40)
+Second, Predictive Coding (PC) dynamics, defined through
+the prediction errors ek = xk − Wk ρ(xk−1 ), whose fixed
+point ek = 0 corresponds to a standard feedforward network:
+
+Figure 3. Impact of the structural asymmetry ratio rstr on accuracy
+(top) and standard deviation over 10 runs (bottom) on MNIST.
+We compare VF (orange) and AsymEP (blue) under two training
+regimes: training only J in (dashed) or all parameters (solid).
+
+dxk
+= −ek + (1 − δk,L ) (ρ′ (xk ) ⊙ (Bk ek+1 )) .
+dt
+
+nudging signal to influence all layers. We make this explicit
+for a network with one hidden layer.
+
+(41)
+
+Third, a standard dynamics chosen for direct comparison
+with backpropagation:
+
+Let the state x be partitioned in hidden h and output o
+layers. The recurrent connection matrix is then J dyn =
+
+0
+0
+. The forces of the system are:
+Wh→o 0
+ β
+
+
+Fh = ρ′ (h) ⊙ J in u + λ(Wh→o )⊤ (o − o0 ) − h
+
+
+
+
+
+ β
+0
+Fo = ρ′ (o) ⊙ Wh→o ρ(h) − λWh→o (h − h )
+(39)
+
+
+
+∂C
+
+
+−o
++ λβ
+∂o
+
+dxk
+= −xk + Wk ρ(xk−1 ) + (1 − δk,L )Bk ρ(xk+1 ). (42)
+dt
+For each dynamics, we examine three connectivity scenarios.
+⊤
+• In the asymmetric case (Bk ̸= Wk+1
+), the backward
+weights Bk are randomly initialized and kept fixed
+while only the forward weights are trained, ensuring a
+fair comparison (i.e., identical number of parameters);
+in PC, the learning rule for Bk is zero when only inputs
+are clamped.
+
+where λ is 0 during the free inference and 1 during the
+nudged phase (Eq. 20). The force on the hidden layer Fhβ
+now depends on the output layer through the term ρ′ (h) ⊙
+⊤
+(Wh→o ) (o − o0 ), enabling the nudge (the term β ∂C
+∂o ) to
+influence the hidden layer. This implicitly assumes that the
+hardware implementation supports the physical activation
+of these backward connections.
+
+⊤
+• In the symmetric / conservative case (Bk = Wk+1
+), the
+CH and PC dynamics derive from an energy functional,
+while the standard dynamics remains non-conservative
+due to its non-symmetric Jacobian.
+
+• In the feedforward case (Bk = 0), the PC and standard dynamics coincide; for the standard dynamics, the
+AsymEP learning rule mirrors backpropagation, with
+1
+∆xβk = 2β
+(xβ − x−β ) acting as the propagated error
+signal.
+
+We validate this using a single hidden layer of only 20 neurons on MNIST. After training, VF saturates with 64.3 ±
+2.0% accuracy, whereas AsymEP reaches 92.7 ± 0.5% accuracy. We expect this discrepancy to increase with network
+depth, since this increases the number of layers unable to
+learn under VF. A figure with the accuracy during training
+can be found in Appendix G.4.2.
+
+Table 1 shows that AsymEP consistently outperforms VF
+in both asymmetric and feedforward settings, in final accuracy, learning speed, and stability. After a single epoch
+it already provides on average a 15% accuracy gain with
+an order-of-magnitude reduction in variance. Remarkably,
+AsymEP with asymmetric connectivity also surpasses EP
+on symmetric networks despite training only the forward
+
+5.4. Advantages of Non-Conservative Dynamics
+AsymEP is not tied to a specific neural dynamics. To further
+assess the benefits of training non-conservative dynamics
+using AsymEP, we compare several dynamics and connec8
+
+Equilibrium Propagation for Non-Conservative Systems
+
+6. Discussion and Conclusion
+
+weights, suggesting that relaxing symmetry constraints may
+improve expressivity. Supplementary results are provided
+in Appendix G.5.
+
+In this work, we extended Equilibrium Propagation (EP)
+to non-conservative systems that reach stationary states by
+deriving two mathematically equivalent algorithms that recover the exact gradient of the cost function in the limit of
+infinitesimal nudging.
+
+Table 1. Test accuracy on Fashion-MNIST (%) at Epoch 50 (mean
+± std 10 runs). BP on a standard feedforward architecture using
+MSE and SGD achieve 87.37 ± 0.29%.
+EP
+
+AsymEP
+
+Asym
+86.78 ± 0.14
+Feedfor
+86.05 ± 0.12
+Sym
+84.30 ± 0.13
+Asym
+86.20 ± 0.17
+PC
+Sym
+84.78 ± 0.14
+Asym
+82.91 ± 0.48
+Standard
+Feedfor
+86.25 ± 0.16
+CH
+
+The first approach, Asymmetric EP, preserves the original
+inference dynamics. It introduces a corrective force during
+the nudged phase that remains spatially local, as the antisymmetric Jacobian is null for unconnected neurons and the
+perturbation from equilibrium is available at the synapse
+level. Unlike standard methods like Recurrent Backpropagation (Almeida, 1990; Pineda, 1987), this avoids explicit
+digital weight transposition. However, a physical mechanism to obtain the local corrective force at the synapse
+level remains a subject for future work. We also note that
+AsymEP shares the temporal non-locality of standard EP.
+
+VF
+85.20 ± 0.12
+77.76 ± 0.37
+80.71 ± 6.17
+75.52 ± 1.69
+78.58 ± 0.28
+
+Finally, to investigate how AsymEP scales with depth, we
+trained deeper fully connected networks with two and three
+hidden layers of 500 neurons on Fashion-MNIST, reaching
+86.41 ± 0.22% and 87.8 ± 0.15% test accuracy respectively.
+
+The second approach, Dyadic EP, doubles the state space
+to map non-reciprocal dynamics onto an energy landscape—conceptually reminiscent of multi-compartment cortical neurons, where apical dendrites integrate feedback
+(analogous to z − z ′ ) separately from basal feedforward
+input (analogous to z + z ′ ) (Guerguiev et al., 2017). Additionally, this expanded space also enables the positive and
+negative nudging phases to run in parallel. This offers a
+pathway to implement a version of EP that is local in time,
+but would require a doubling of the degrees of freedom
+on the physical hardware. More fundamentally, the energy
+defined on the extended state shows that the tools and theoretical guarantees obtained for EP should also apply to
+the case of non-reciprocal forces, and that the variational
+principle behind EP is universal in the sense that it can be
+applied to all networks which operate in a stationary state.
+
+5.5. Feedforward Training on CIFAR-10: BP vs. Dyadic
+EP vs. AsymEP
+To test whether our framework scales beyond shallow networks, we conclude with a deep, purely feedforward CNN
+architecture trained on CIFAR-10. We compare backpropagation (BP), VF, AsymEP and Dyadic EP in a controlled
+setting where the gradient estimator is the only difference
+between runs: all methods share the same configuration,
+with the BP gradient replaced by the contrast of stationary
+states for the EP-based methods (see App. G.6 for details).
+Each configuration is trained for 40 epochs over 5 seeds.
+
+Furthermore, Dyadic EP is not restricted to the EP community and could suggest a more physically plausible alternative to the stationary-state Adjoint Method (for fixed
+inputs) (Chen et al., 2018): by solving the forward and adjoint equations simultaneously via relaxation, it circumvents
+a separate backward-in-time pass.
+
+Table 2 reports the final test accuracy. Both of our algorithms scale to this regime, closely tracking the BP baseline
+throughout training and matching its final accuracy: a paired
+t-test finds no significant difference between Dyadic EP and
+BP (p = 0.75), and only a sub-percent gap for AsymEP.
+In contrast, VF makes slight initial progress (peaking near
+30%) before collapsing to chance level (10%). Additional
+details can be found in Appendix G.6
+
+Finally, our experiments on MNIST, Fashion-MNIST, and
+CIFAR-10 confirm that AsymEP and Dyadic EP consistently outperform EP and VF, and notably enables effective
+training of feedforward networks.
+Our work thus opens new avenues for learning in neuromorphic hardware, dissipative physical systems, and neural
+architectures where asymmetry is intrinsic rather than incidental.
+
+Table 2. Test accuracy on CIFAR-10 (%) at epoch 40 (mean ± std
+over 5 seeds).
+Method
+
+Test Acc. (%)
+
+Backpropagation
+Dyadic EP
+AsymEP
+VF
+
+90.66 ± 0.25
+90.69 ± 0.14
+89.74 ± 0.14
+10.00 ± 0.00
+
+9
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Impact Statement
+
+References
+
+This paper presents results that advance the field of machine
+learning. There are many potential societal consequences
+of our work, none of which we feel must be specifically
+highlighted here.
+
+Almeida, L. B. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In
+Artificial neural networks: concept learning, pp. 102–111.
+1990.
+
+Acknowledgments
+
+Altman, L. E., Stern, M., Liu, A. J., and Durian, D. J. Experimental demonstration of coupled learning in elastic
+networks. Physical Review Applied, 22(2):024053, 2024.
+
+AES is fully funded by the Horizon Europe Marie
+Skłodowska-Curie Doctoral Network ’Postdigital Plus’
+(Grant 101169118). DVA acknowledges the support of
+the French Community of Belgium through a FRIA fellowship. SM acknowledges financial support by the Fonds de la
+Recherche Scientifique–FNRS, Belgium under EOS Project
+No. 40007536. Computational resources have been provided by the Consortium des Équipements de Calcul Intensif
+(CÉCI), funded by the Fonds de la Recherche Scientifique
+de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and
+by the Walloon Region.
+
+Aykroyd, C., Bourgoin, A., and Poncin-Lafitte, C. L. Hamiltonian treatment of non-conservative systems. arXiv
+preprint arXiv:2507.18658, 2025.
+Bateman, H. On dissipative systems and related variational
+principles. Physical Review, 38(4):815, 1931.
+Berneman, M. and Hexner, D. Equilibrium propagation for
+dissipative dynamics. Advanced Intelligent Systems, pp.
+e202501310, 2025.
+Bishop, K. J., Biswal, S. L., and Bharti, B. Active colloids
+as models, materials, and machines. Annual Review of
+Chemical and Biomolecular Engineering, 14(1):1–30,
+2023.
+
+“ἁρμονίη ἀφανὴς φανερῆς κρείττων”
+
+Bowick, M. J., Fakhri, N., Marchetti, M. C., and Ramaswamy, S. Symmetry, thermodynamics, and topology
+in active matter. Physical Review X, 12(1):010501, 2022.
+Brandenbourger, M., Locsin, X., Lerner, E., and Coulais, C.
+Non-reciprocal robotic metamaterials. Nature communications, 10(1):4608, 2019.
+Cesa-Bianchi, N. and Lugosi, G. Prediction, learning, and
+games. Cambridge university press, 2006.
+Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud,
+D. K. Neural ordinary differential equations. Advances
+in neural information processing systems, 31, 2018.
+Cin, N. D., Marquardt, F., and Wanjura, C. C. Training
+nonlinear optical neural networks with scattering backpropagation. arXiv preprint arXiv:2508.11750, 2025.
+Costa, P. and Santos, P. A. Directed equilibrium propagation
+revisited. Mathematics, 13(11), 2025. ISSN 2227-7390.
+Crick, F. The recent excitement about neural networks.
+Nature, 337, 1989.
+Dillavou, S., Stern, M., Liu, A. J., and Durian, D. J. Demonstration of decentralized physics-driven learning. Physical Review Applied, 18(1):014040, 2022.
+Dillavou, S., Beyer, B. D., Stern, M., Liu, A. J., Miskin,
+M. Z., and Durian, D. J. Machine learning without a processor: Emergent learning in a nonlinear analog network.
+Proceedings of the National Academy of Sciences, 121
+(28):e2319718121, 2024.
+10
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scellier, B. Updates of equilibrium prop match gradients
+of backprop through time in an rnn with static input.
+Advances in neural information processing systems, 32,
+2019.
+
+Indiveri, G. and Liu, S.-C. Memory and information processing in neuromorphic systems. Proceedings of the
+IEEE, 103(8):1379–1397, 2015.
+
+Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scellier, B. Equilibrium propagation with continual weight
+updates. arXiv preprint arXiv:2005.04168, 2020.
+
+Kalinin, K. P., Gladrow, J., Chu, J., Clegg, J. H., Cletheroe,
+D., Kelly, D. J., Rahmani, B., Brennan, G., Canakci, B.,
+Falck, F., et al. Analog optical computer for ai inference
+and combinatorial optimization. Nature, 645(8080):354–
+361, 2025.
+
+Falk, M. J., Strupp, A. T., Scellier, B., and Murugan,
+A. Temporal contrastive learning through implicit nonequilibrium memory. Nature Communications, (16),
+2025.
+
+Kendall, J., Pantone, R., Manickavasagam, K., Bengio,
+Y., and Scellier, B. Training end-to-end analog neural
+networks with equilibrium propagation. arXiv preprint
+arXiv:2006.01981, 2020.
+
+Farinha, M. T., Pequito, S., Santos, P. A., and Figueiredo,
+M. A. T. Equilibrium propagation for complete directed
+neural networks. In Proceedings of the 28th European
+Symposium on Artificial Neural Networks, Computational
+Intelligence and Machine Learning (ESANN 2020), 2020.
+
+Laborieux, A. and Zenke, F. Holomorphic equilibrium
+propagation computes exact gradients through finite size
+oscillations. Advances in Neural Information Processing
+Systems, 35:12950–12963, 2022.
+Laborieux, A. and Zenke, F. Improving equilibrium propagation without weight symmetry through jacobian homeostasis. In Proceedings of the International Conference on Learning Representations (ICLR) 2024, Virtual
+(ICLR), May 2024.
+
+Galley, C. R. Classical mechanics of nonconservative systems. Physical review letters, 110(17):174301, 2013.
+Guerguiev, J., Lillicrap, T. P., and Richards, B. A. Towards
+deep learning with segregated dendrites. elife, 6:e22901,
+2017.
+
+Laborieux, A., Ernoult, M., Scellier, B., Bengio, Y., Grollier,
+J., and Querlioz, D. Scaling equilibrium propagation to
+deep convnets by drastically reducing its gradient estimator bias. Frontiers in neuroscience, 15:633674, 2021.
+
+Høier, R. and Zach, C. A lagrangian perspective on dual
+propagation. In Proceedings of the First Workshop on Machine Learning with New Compute Paradigms at NeurIPS
+2023, New Orleans, LA, USA, Dec 2023.
+
+Laydevant, J., Marković, D., and Grollier, J. Training an
+ising machine with equilibrium propagation. Nature Communications, 15(1):3671, 2024.
+
+Høier, R. and Zach, C. Two tales of single-phase contrastive
+hebbian learning. In Salakhutdinov, R., Kolter, Z., Heller,
+K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F.
+(eds.), Proceedings of the 41st International Conference
+on Machine Learning, volume 235 of Proceedings of
+Machine Learning Research, pp. 18470–18488. PMLR,
+21–27 Jul 2024.
+
+LeCun, Y. The mnist database of handwritten digits.
+http://yann. lecun. com/exdb/mnist/, 1998.
+Martin, E., Ernoult, M., Laydevant, J., Li, S., Querlioz, D.,
+Petrisor, T., and Grollier, J. Eqspike: spike-driven equilibrium propagation for neuromorphic implementations.
+Iscience, 24(3), 2021.
+
+Høier, R., Staudt, D., and Zach, C. Dual propagation: accelerating contrastive hebbian learning with dyadic neurons.
+In Proceedings of the 40th International Conference on
+Machine Learning, ICML’23. JMLR.org, 2023.
+
+Massar, S. Equilibrium propagation for learning in lagrangian dynamical systems. Physical Review E, 112
+(3):035304, 2025.
+
+Høier, R., Kalinin, K., Ernoult, M., and Zach, C. Dyadic
+learning in recurrent and feedforward models. In NeurIPS
+2024 Workshop Machine Learning with new Compute
+Paradigms, 2024.
+
+Massar, S. and Mognetti, B. M. Equilibrium propagation:
+the quantum and the thermal cases. Quantum Studies:
+Mathematics and Foundations, 12(1):6, 2025.
+
+Hopfield, J. J. Neural networks and physical systems with
+emergent collective computational abilities. Proceedings
+of the national academy of sciences, 79(8):2554–2558,
+1982.
+
+Millidge, B., Song, Y., Salvatori, T., Lukasiewicz, T., and
+Bogacz, R. Backpropagation at the infinitesimal inference limit of energy-based models: Unifying predictive
+coding, equilibrium propagation, and contrastive hebbian
+learning. In International Conference on Learning Representations (ICLR), 2023.
+
+Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. Extreme learning
+machine: theory and applications. Neurocomputing, 70
+(1-3):489–501, 2006.
+11
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Nest, T. and Høier, R. Dyadic learning in asymmetric
+convnets. In New Frontiers in Associative MemoriesWorkshop at ICLR 2026.
+
+Wang, Q., Wanjura, C. C., and Marquardt, F. Training
+coupled phase oscillators as a neuromorphic platform
+using equilibrium propagation. Neuromorphic Computing
+and Engineering, 4(3):034014, 2024.
+
+Osat, S. and Golestanian, R. Non-reciprocal multifarious
+self-organization. Nature Nanotechnology, 18(1):79–85,
+2023.
+
+Wanjura, C. C. and Marquardt, F. Quantum equilibrium
+propagation for efficient training of quantum systems
+based on onsager reciprocity. Nature Communications,
+16(1):6595, 2025.
+
+O’Connor, P., Gavves, E., and Welling, M. Training a spiking neural network with equilibrium propagation. In The
+22nd international conference on artificial intelligence
+and statistics, pp. 1516–1523. PMLR, 2019.
+
+Werbos, P. J. Backpropagation through time: what it does
+and how to do it. Proceedings of the IEEE, 78(10):1550–
+1560, 1990.
+
+Pineda, F. Generalization of back propagation to recurrent
+and higher order neural networks. In Neural information
+processing systems, 1987.
+
+Yi, S.-i., Kendall, J. D., Williams, R. S., and Kumar, S.
+Activity-difference training of deep neural networks using
+memristor crossbars. Nature Electronics, 6(1):45–51,
+2023.
+
+Pourcel, G., Basu, D., Ernoult, M., and Gilra, A. Lagrangianbased equilibrium propagation: generalisation to arbitrary boundary conditions & equivalence with hamiltonian echo learning. arXiv preprint arXiv:2506.06248,
+2025.
+Rageau, T. and Grollier, J. Training and synchronizing
+oscillator networks with equilibrium propagation. Neuromorphic Computing and Engineering, 2025.
+Sajnok, K. and Matuszewski, M. Near-equilibrium propagation training in nonlinear wave systems. arXiv preprint
+arXiv:2510.16084, 2025.
+Scellier, B. Quantum equilibrium propagation: Gradientdescent training of quantum systems. arXiv preprint
+arXiv:2406.00879, 2024.
+Scellier, B. and Bengio, Y. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in computational neuroscience, 11:24,
+2017.
+Scellier, B., Goyal, A., Binas, J., Mesnard, T., and Bengio,
+Y. Generalization of equilibrium propagation to vector
+field dynamics. arXiv preprint arXiv:1808.04873, 2018.
+Scellier, B., Mishra, S., Bengio, Y., and Ollivier, Y. Agnostic
+physics-driven deep learning. arXiv:2205.15021v1, 2022.
+Scurria, A. E. A physical theory of backpropagation: Exact
+gradients from the least-action principle. 2026.
+Stern, M., Hexner, D., Rocks, J. W., and Liu, A. J. Supervised learning in physical networks: From machine
+learning to learning machines. Physical Review X, 11(2):
+021045, 2021.
+Wang, J., Lu, S., Wang, S.-H., and Zhang, Y.-D. A review
+on extreme learning machine. Multimedia Tools and
+Applications, 81(29):41611–41660, 2022.
+12
+
+Equilibrium Propagation for Non-Conservative Systems
+
+A. Gradient Estimation Error in VF
+
+where s denotes the dynamical state of the system. This
+symmetry is the linchpin of the equivalence proof, as the
+gradient expressions derived for BPTT and standard EP
+differ precisely by a transpose operation applied to ∂F
+∂s .
+
+In this appendix, we quantify the gradient estimation error
+introduced by VF in the limit where the Jacobian asymmetry
+is small.
+
+This observation aligns with our analysis in the main text:
+VF fails in non-conservative systems due to the missing
+transpose in the post-synaptic term (see Eq. (16)). Following
+the derivation in Ernoult et al. (2019) (viz., Appendix A, Eqs.
+(31–33)), the recursive relations for the gradients in BPTT
+are given by:
+
+Comparing the post-synaptic update terms in Eqs. (12) and
+(14) gives the following error in the gradient of the cost:
+⊤
+∂F 0
+(x , θ)
+∂θ
+
+−1
+−1  ∂C 0
+(x , y), (43)
+× JF (x0 , θ)
+− JF⊤ (x0 , θ)
+∂x
+
+
+Error = −
+
+∇BPTT
+(0) =
+s
+
+To quantify this error, we decompose the Jacobian JF (x, θ)
+into its symmetric part SJ (x, θ) and antisymmetric part
+
+SJ (x, θ) = 12 JF (x, θ) + JF⊤ (x, θ) ,
+(44)
+
+AJ (x, θ) = 12 JF (x, θ) − JF⊤ (x, θ) .
+
+(JF )
+
+=
+
+(JF⊤ )−1 =
+
+∞
+X
+
+∇BPTT
+(t) =
+s
+∇BPTT
+(t) =
+θ
+
+(−1)
+
+n=0
+∞
+X
+
+(SJ−1 AJ )n
+
+SJ−1 ,
+
+
+
+
+∂F
+(x, s⋆ , θ)
+∂s
+
+⊤
+
+∂F
+(x, s⋆ , θ)
+∂θ
+
+⊤
+
+∇BPTT
+(t − 1),
+s
+
+(50)
+
+∇BPTT
+(t − 1),
+s
+
+(51)
+
+where θ represents the optimization parameters, ℓ is the
+cost function, s⋆ is the free equilibrium state (satisfying
+F (s⋆ ) = 0), y is the target, and x is the input. The index t
+denotes the unrolled time steps, initialized at s(0) = s⋆ .
+
+!
+n
+
+(49)
+
+and for all t = 1, . . . , K,
+
+Assuming the asymmetry AJ (x, θ) is small, we can make
+a series expansion in SJ−1 AJ (omitting the dependencies
+for clarity). Applying the Neumann expansion for small
+∥SJ−1 AJ ∥ gives
+−1
+
+∂ℓ
+(s⋆ , y),
+∂s
+
+(45)
+
+In contrast, the gradients computed by VF follow the recursion (viz., Ernoult et al. (2019), Appendix A, Eqs. (24–26)):
+
+!
+(SJ−1 AJ )n
+
+SJ−1 .
+
+(46)
+
+∆EP
+s (0) = −
+
+n=0
+
+∂ℓ
+(s⋆ , y),
+∂s
+
+(52)
+
+and for all t ≥ 0,
+
+Subtracting the two series and assuming convergence, we
+finally obtain
+!
+∞
+X
+2n+1
+−1
+−1
+⊤ −1
+(JF ) − (JF ) = −2
+SJ AJ
+SJ−1 .
+
+∂F
+(x, s⋆ , θ) ∆EP
+s (t),
+∂s
+⊤
+
+∂F
+(x,
+s
+,
+θ)
+∆EP
+∆EP
+(t
++
+1)
+=
+⋆
+s (t).
+θ
+∂θ
+
+∆EP
+s (t + 1) =
+
+n=0
+
+(47)
+
+(53)
+(54)
+
+Comparing these two sets of equations confirms that the only
+difference are Eqs. (50) and (53), specifically the transpose
+of the Jacobian ∂F
+∂s (ignoring the global sign difference in
+Eqs. (49) and (52)).
+
+B. Equivalence between AsymEP and BPTT
+In this appendix, we sketch the equivalence between the
+gradient estimate computed by AsymEP and Backpropagation Through Time (BPTT) (Werbos, 1990) for a Recurrent
+Neural Network with fixed inputs. Our derivation relies on
+the proof provided by Ernoult et al. (2019), which established that standard (conservative) EP computes gradients
+identical to those of BPTT. To facilitate direct comparison,
+we adopt their notation for this section.
+
+In AsymEP, we modify the dynamics by adding a correction
+term dependent on the antisymmetric part of the Jacobian.
+Denoting the force of this augmented system by F A , the
+Jacobian at the free equilibrium satisfies:
+
+The proof provided by Ernoult et al. (2019) relies on the
+assumption that the vector field F (i.e., transition function)
+is derived from a scalar potential function, which implies
+that
+
+⊤
+∂F
+∂F
+=
+,
+(48)
+∂s
+∂s
+
+By substituting this corrected Jacobian into the recursive
+relations, AsymEP recovers the exact transpose required
+by BPTT. Consequently, our method extends the equivalence between EP and BPTT to the general case of nonconservative force.
+
+∂F A
+(x, s⋆ , θ) =
+∂s
+
+13
+
+
+
+∂F
+(x, s⋆ , θ)
+∂s
+
+⊤
+.
+
+(55)
+
+Equilibrium Propagation for Non-Conservative Systems
+
+C. Out-of-Equilibrium Mechanics
+
+C.3. Symmetry Breaking as Credit Assignment
+
+Here we sketch the physical picture behind the doubledenergy construction of Eq. (26). The full derivation from
+Hamilton’s least-action principle, together with its connection to the Bateman–Galley formalism for non-conservative
+classical mechanics (Bateman, 1931; Galley, 2013; Aykroyd
+et al., 2025), can be found in (Scurria, 2026).
+
+On the diagonal manifold z = z ′ the doubled system enjoys
+a gauge symmetry: the auxiliary variable z ′ is redundant and
+the difference d is identically zero. Credit assignment is implemented by deliberately breaking this symmetry through
+the task cost. Adding βD(z, z ′ ) = β C(m) to H exerts
+opposite forces on z and z ′ and drives them apart, so that
+the difference d ceases to be redundant and begins to carry
+information about the loss landscape.
+
+C.1. The Helmholtz Obstruction
+The natural physical route to a variational principle for a
+dynamical system ẋ = F (x, θ) is to seek a scalar potential
+E such that F = −∂x E. The classical Helmholtz integrability condition states that such an E exists if and only if the
+Jacobian JF is symmetric everywhere. Whenever the interactions are non-reciprocal — as in feedforward networks,
+active matter, or driven optical systems — JF acquires
+a non-zero antisymmetric part and the Helmholtz condition fails identically. No scalar potential on the original
+n-dimensional state space can then generate the dynamics,
+and the “energy minimisation” route at the heart of standard
+EP is blocked at the structural level. The obstruction is not
+a matter of computational convenience: it reflects the fact
+that the rotational component of F carries information that
+no scalar function of x alone can record.
+
+D. Proofs for Dyadic EP
+We now demonstrate that Dyadic EP correctly trains the
+parameters θ of the original force field F (x, θ), giving the
+0
+)
+in the limit of infinitesimal nudging.
+exact gradient dC(x̄
+dθ
+D.1. Proof of EP
+First, recall that standard EP does not strictly require the
+system to settle at an energy minimum; it requires only that
+the system reaches a stationary state (a fixed point of the
+dynamics). Indeed, using the notation of Section 2.1, EP
+relies on the key identity:
+d2
+d2
+ET (xβ , θ) =
+ET (xβ , θ).
+dθdβ
+dβdθ
+
+C.2. Variational Reconstruction on a Doubled Space
+
+Expanding the total derivative with respect to β gives:
+
+Applying the Bateman–Galley formalism circumvents this
+obstruction by enlarging the configuration space. The single
+state x ∈ Rn is replaced by a conjugate pair (z, z ′ ) ∈ R2n ,
+and the rotational component of F — which has no scalar
+generator on the original n-dimensional space — is absorbed into a bilinear coupling between z and z ′ on the
+doubled space, where it does admit a variational description. The physical motion is recovered on the diagonal
+submanifold z = z ′ (the so called ’physical limit’), while
+the off-diagonal direction d = z − z ′ supplies the additional
+degree of freedom needed to encode non-reciprocity.
+
+d
+ET (xβ , θ) =
+dβ
+
+
+
+
+z + z′
+,θ ,
+2
+
+
+
+∂ET (xβ , θ)
+∂x
+
+⊤
+
+dxβ
+∂ET (xβ , θ)
++
+dβ
+∂β
+
+= C(xβ ).
+
+(58)
+
+Where the first term vanishes because the system is at a
+∂
+ET (xβ , θ) = 0; this holds even if
+stationary state, i.e., ∂x
+the system is not at a minimum of ET . Similarly, for the
+derivative with respect to θ:
+d
+∂ET (xβ , θ)
+ET (xβ , θ) =
+,
+dθ
+∂θ
+
+Specializing this reconstruction to the overdamped (firstorder) regime relevant to relaxational neural dynamics yields
+the bilinear energy
+H(z, z ′ , θ) = −(z − z ′ )⊤ F
+
+(57)
+
+(59)
+
+where we additionally assume that the cost function does
+not depend explicitly on the parameters θ. Substituting these
+results into Eq. (57) in the limit of infinitesimal nudging
+(β → 0) recovers the fundamental relation given by Eq. (9).
+
+(56)
+D.2. Proof of Dyadic EP
+
+which is precisely Eq. (26). The symmetric midpoint m =
+(z + z ′ )/2 plays the role of the physical coordinate of the
+doubled system, while d is the auxiliary direction along
+which non-reciprocity is stored. On the submanifold z = z ′
+the coupling proportional to (z − z ′ ) vanishes identically
+and both states evolve under the original field F , so the
+doubling leaves the on-shell physics unchanged. We refer
+the reader to (Scurria, 2026) for the full construction.
+
+We analyze now the stationary states of Dyadic EP by introducing the change of variables:
+m=
+
+z + z′
+,
+2
+
+d = z − z′.
+
+(60)
+
+In these coordinates, the augmented energy HT becomes
+HT (m, d, θ, β) = −d⊤ F (m, θ) + βC(m)
+14
+
+(61)
+
+Equilibrium Propagation for Non-Conservative Systems
+
+and the dynamics in Eq. (28) can be rewritten as:
+∂HT
+dm
+=−
+= F (m, θ),
+dt
+∂d
+dd
+∂HT
+∂
+=−
+= dT JF (m, θ) − β
+C(m).
+dt
+∂m
+∂m
+
+In Dyadic EP, we instead employ the single-phase update:
+
+
+1 ∂H(z β , z ′β , θ)
+∆θ ∝ −
+(70)
+β
+∂θ
+
+(62)
+(63)
+
+This choice avoids the overhead of evolving two coupled
+equations in the extended space, which would be computationally equivalent to evolving four equations in the original
+space (two for +β and two for −β). Using Eq. (70), we
+evolve only one coupled equation for +β in the extended
+space; this corresponds to two equations in the original
+space, thereby achieving the same computational complexity as AsymEP. Furthermore, this single-phase formulation
+suggests a pathway toward making the update local in time,
+provided appropriate hardware is used to implement the
+augmented phase.
+
+β
+
+The stationary states (mβ , d ) are the solutions to:
+F (mβ , θ) = 0,
+∂
+βT
+d JF (mβ , θ) − β
+C(mβ ) = 0.
+∂m
+
+(64)
+(65)
+
+This leads to the following observations:
+1) The stationary state of m is independent of β and coincides with the stationary state of the original system:
+z β + z ′β
+= mβ = m0 = x0 .
+2
+
+Mathematically, these two approaches yield the same gradient estimate because the equations for dβ are linear. Explicitly we have :
+
+
+∂H(z β , z ′β , θ)
+∂F z β + z ′β
+= −(z β − z ′β )⊤
+,θ
+∂θ
+∂θ
+2
+
+⊤
+−1
+∂F 0 
+= −β
+z ,θ
+JF⊤ (z 0 , θ)
+∂θ
+
+
+∂C 0
+×
+(z ) ,
+(71)
+∂x
+
+(66)
+
+2) The Jacobian of the extended system defined in Eq. (26)
+is invertible, provided JF is invertible. This is most evident
+from Eq. (63).
+3) The stationary state value of d is given by:
+
+
+−1 ∂C 0
+β
+d = β JF⊤ (m0 , θ)
+(x )
+∂x
+
+(67)
+
+where we have used Eqs. (66) and (67). Inspection of
+Eq. (71) confirms that, up to corrections of order β 2 , we
+obtain exactly the same gradient as in AsymEP.
+
+0
+
+In particular, when β = 0, we have d = 0, which implies
+that the free stationary states coincide: z 0 = z ′0 .
+
+E. AsymEP versus Dyadic EP
+
+4) The cost at the stationary state of the extended system
+is equal to the cost at the stationary state of the original
+system:
+D(m0 ) = C(x0 ).
+(68)
+
+In this appendix, we demonstrate that Asymmetric Equilibrium Propagation (AsymEP) emerges naturally as the firstorder projection of the 2N -dimensional Dyadic Equilibrium
+Propagation onto a single N -dimensional state space. We
+then formalize the physical trade-offs between the two architectures.
+
+Consequently, the gradients of the cost with respect to the
+parameters are identical.
+Since both the original and extended systems, given respectively in Eq. (28) and Eq. (1-2), share the same cost at their
+respective stationary states, and because the Jacobians of
+both models are invertible, applying EP update rule to the
+extended system give the correct gradient estimate for the
+parameters θ of the original system.
+
+E.1. AsymEP as the Linear Projection of Dyadic EP
+As established in Appendix D.2, transforming the 2N dimensional extended space (z, z ′ ) into the mean state
+′
+m = z+z
+and the difference state d = z − z ′ exactly
+2
+decouples the stationary dynamics. Because the stationary
+state of m is the free state of the original system (mβ = x0 ),
+the cost function drives the difference variable to a stationary
+β
+state d satisfying:
+
+The final step of the proof is to establish the equivalence
+between the standard parameter update rule in Eq. (8) and
+the modified rule used by Dyadic EP in Eq. (34). Indeed, if
+we were to apply the standard update rule in the extended
+space, the update would be:
+1
+∆θ ∝ −
+2β
+
+
+
+β
+
+′β
+
+−β
+
+∂H(z , z , θ) ∂H(z , z
+−
+∂θ
+∂θ
+
+′−β
+
+, θ)
+
+β
+
+JF⊤ (x0 , θ)d = β
+
+
+.
+
+∂C 0
+(x )
+∂x
+
+(72)
+
+To recover this exact error signal in an N -dimensional space,
+we postulate a modified dynamical system FA (x) compris-
+
+(69)
+15
+
+Equilibrium Propagation for Non-Conservative Systems
+
+F. Derivation of the Hopfield-like Energy
+
+ing the standard EP dynamics and a spatial correction Γ(x):
+FA (x) = F (x) − β
+
+∂C
+(x) + Γ(x)
+∂x
+
+In this section, we derive the explicit energy functional for
+the Continuous Asymmetric Hopfield dynamics defined in
+Eq. (35). The force field is given by:
+
+(73)
+
+Let ∆x = xβA − x0 denote the displacement from the
+free equilibrium. Expanding the stationarity condition
+FA (xβA ) = 0 to first order around x0 yields:
+JF (x0 , θ)∆x − β
+
+∂C 0
+(x ) + Γ(xβA ) ≈ 0
+∂x
+
+F (x) = ρ′ (x) ⊙ (Jρ(x)) − x.
+
+We omit external inputs J in for brevity, as they appear symmetrically in the Jacobian. The variational Hamiltonian is
+defined as:
+
+
+
+
+z + z′
+z + z′
+H(z, z ′ ) = −(z − z ′ )⊤ F
++ βC
+.
+2
+2
+(79)
+
+(74)
+
+To ensure the first-order displacement matches the Dyadic
+β
+EP error signal (i.e., ∆x ≈ d ), we substitute Eq. (72) into
+the expansion:
+
+Γ(xβA ) = JF⊤ (x0 , θ) − JF (x0 , θ) ∆x
+
+(75)
+
+= −2AJ (x0 , θ)(xβA − x0 )
+
+(76)
+
+To analyze this expression, we introduce the midpoint m =
+z+z ′
+and the difference d = z − z ′ . Since the separation
+2
+between z and z ′ is induced solely by the nudging parameter
+β, the difference scales as ∥d∥ ∼ O(β). We therefore
+neglect terms of order O(∥d∥3 ) (i.e., or equivalently O(β 3 ))
+as they do not contribute to the gradient of the cost.
+
+This uniquely recovers the AsymEP augmented dynamics.
+Finally, to eliminate the O(β 2 ) error, AsymEP evaluates the
+centered difference of two opposite nudges:
+0
+x±β
+A =x ±β
+
+dxA
++ O(β 2 )
+dβ β=0
+
+(78)
+
+The activation at the midpoint can be approximated as:
+ρ(m) =
+
+(77)
+
+ρ(z) + ρ(z ′ )
++ O(∥d∥2 ).
+2
+
+(80)
+
+Similarly, the difference in activations is:
+
+Subtracting these states cancels the O(β 2 ) error, yielding
+β
++β
+−β
+1
+3
+2 (xA − xA ) = d + O(β ), successfully recovering the
+exact post-synaptic update term.
+
+ρ(z) − ρ(z ′ ) = ρ′ (m) ⊙ d + O(∥d∥3 ).
+
+(81)
+
+Inverting this relation, we express the state difference as:
+E.2. Physical Trade-offs and the Extended Space
+
+z − z ′ = (ρ(z) − ρ(z ′ )) ⊙ ρ′ (m) + O(∥d∥3 ).
+
+We can view AsymEP and Dyadic EP as a space-time tradeoff of the same underlying physical optimization problem.
+
+(82)
+
+We substitute these expansions into the interaction term
+of the Hamiltonian, Hint = −(z − z ′ )⊤ (ρ′ (m) ⊙ Jρ(m)).
+Applying the identity a⊤ (b ⊙ c) = (a ⊙ b)⊤ c, we obtain:
+
+AsymEP preserves the original N -dimensional state space
+of the network at the cost of temporal non-locality. The system must evolve sequentially, requiring physical memory
+not only to store the free equilibrium x0 for the asymmetric correction, but also to store the successive stationary
+states required to evaluate the contrastive gradient update.
+AsymEP thus serves as the direct, spatially minimal extension of EP.
+
+⊤
+
+Hint = − ((z − z ′ ) ⊙ ρ′ (m)) Jρ(m)
+
+
+ρ(z) + ρ(z ′ )
+≈ −(ρ(z) − ρ(z ′ ))⊤ J
+.
+2
+
+(83)
+
+Expanding the product gives:
+
+Dyadic EP provide a learning signal that is local in both
+space (where z − z ′ encodes the gradient) and time (allowing the nudged phases to execute in parallel) at the cost
+of doubling the state space. In particular, capturing nonconservative forces in this extended space requires a specific bilinear coupling, rather than a trivial superposition
+of uncoupled subsystems. It can be seen as a blueprint for
+future neuromorphic hardware.
+
+Hint = −
+
+1h
+ρ(z)⊤ Jρ(z) + ρ(z)⊤ Jρ(z ′ )
+2
+i
+
+− ρ(z ′ )⊤ Jρ(z) − ρ(z ′ )⊤ Jρ(z ′ ) .
+
+(84)
+
+We decompose the connectivity matrix J into its symmetric
+part S and antisymmetric part A. The first and last terms
+simplify to ρ(z)⊤ Sρ(z). The cross terms satisfy:
+
+Ultimately, the reduction of Dyadic EP to AsymEP via the
+variables m and d proves the universality of EP’s variational
+principle.
+
+ρ(z)⊤ Jρ(z ′ ) − ρ(z ′ )⊤ Jρ(z) = ρ(z)⊤ (J − J ⊤ )ρ(z ′ )
+= ρ(z)⊤ (2A)ρ(z ′ ).
+16
+
+(85)
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Thus, the interaction term reduces to:
+1
+1
+Hint = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ )
+2
+2
+− ρ(z)⊤ Aρ(z ′ ) + O(∥d∥3 ).
+
+The input parameters are then updated using the standard
+learning rule (21). In particular, the presynaptic term associated with the input weights is given by,
+∂Fi
+= δik ρ′ (xi )ul .
+in
+∂Jkl
+
+(86)
+
+Finally, for the nudging term, we expand the cost function
+around the midpoint:
+C(m) =
+
+1
+(C(z) + C(z ′ )) + O(∥d∥2 ).
+2
+
+The presynaptic terms associated with the dynamical paramdyn
+eters Jij
+depend on the experiment.
+G.1. Symmetric Initialization
+
+(87)
+
+G.1.1. L EARNING RULES
+
+When multiplying by β, the remainder term becomes β ·
+O(∥d∥2 ). Since ∥d∥ ∼ O(β), this remainder is of order
+O(β 3 ) and can be consistently discarded alongside the thirdorder terms from the interaction expansion.
+
+For clarity, we write the learning rules for VF and AsymEP.
+For the input weights, using (93), we have:
+i
+1 h +β
+in
+′ 0
+∆Jik
+∝
+(xi − x−β
+(94)
+i )ρ (xi )uk ,
+2β
+
+Combining all these components, the final Hamiltonian is:
+
+while for the recurrent weight, we get:
+i
+1 h +β
+dyn
+0
+′ 0
+(xi − x−β
+)ρ(x
+)
+.
+∆Jij
+)ρ
+(x
+∝
+i
+j
+i
+2β
+
+1
+1
+H(z, z ′ ) = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ )
+2
+2
+1
+⊤
+′
+− ρ(z) Aρ(z ) + (∥z∥2 − ∥z ′ ∥2 )
+2
+β
+′
++ (C(z) + C(z )).
+(88)
+2
+
+in
+∆Jik
+∝
+
+(89)
+(90)
+
+To complement Fig. 2, we report the evolution of the accuracy of the three methods in Fig. 4. We consider a layered
+network with 50 hidden neurons. While this capacity is
+insufficient for state-of-the-art performance, it amplifies the
+difference in accuracy between models to aid visualization.
+Models are trained for 20 epochs starting from a symmetric
+configuration, the natural setting for both VF and EP. With
+this initialization, AsymEP consistently outperforms the
+other methods and learns faster by exploiting the additional
+degrees of freedom of the asymmetric network.
+
+As in the main text, the neuronal dynamics are governed by
+the vector field:
+
+
+X dyn
+Fi = ρ′ (xi ) 
+Jij ρ(xj ) + bi (u) − xi ,
+(91)
+j
+
+G.2. Fixed Asymmetry Ratio
+
+where the input-dependent bias bi (u) is precomputed for
+each MNIST input u as:
+Jilin ul .
+
+(96)
+
+G.1.2. S UPPLEMENTARY N UMERICAL R ESULTS
+
+G. Experimental Details
+
+bi (u) =
+
+ i
+1 h +β
+ρ(xi ) − ρ(x−β
+)
+uk ,
+i
+2β
+
+and for the recurrent weights:
+i
+1 h +β
+dyn
+−β
+−β
+∆Jij
+∝
+ρ(xi )ρ(x+β
+j ) − ρ(xi )ρ(xj ) . (97)
+2β
+
+This system recovers the original continuous Hopfield dynamics when z = z ′ (assuming β = 0).
+
+X
+
+(95)
+
+For EP, we have:
+
+The saddle-point dynamics, given by Eq. 32, generated by
+this Hamiltonian are:
+β ∂C
+dz
+= ρ′ (z) ⊙ (Sρ(z) + Aρ(z ′ )) − z −
+,
+dt
+2 ∂z
+′
+dz
+β ∂C
+= ρ′ (z ′ ) ⊙ (Sρ(z ′ ) + Aρ(z)) − z ′ +
+.
+dt
+2 ∂z ′
+
+(93)
+
+This section details the implementation for the fixed asymmetry ratio experiments presented in Section 5.2, followed
+by complementary numerical results regarding learning
+speed and induced Jacobian asymmetry.
+
+(92)
+
+l∈in
+
+This term projects the input space into the recurrent subspace. The bias yields a diagonal contribution to the Jacobian JF = ∂F
+∂x , and therefore does not contribute to the
+antisymmetric correction used in the augmented dynamics
+Eq. (20) of AsymEP.
+
+G.2.1. L EARNING RULES
+Parametrization and notation. To enforce a fixed asymmetry ratio, we explicitly parameterize the independent elements of Eq. (38). We introduce two parameter vectors θS
+17
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Parameter
+
+Sym. Init. / Feedforward
+sec. 5.1 & 5.3
+
+Fixed rstr
+sec. 5.2
+
+Fixed rstr & rin
+app. G.3
+
+0.05
+0.01
+0.5
+0.5
+20
+10
+40 / 20
+64
+n.a.
+784 - n.a. -10
+tanh
+s ∼ U (−1, 1)
+θ ∼ N (0, N1 )
+10
+
+0.05
+0.01
+0.3
+0.5
+30
+10
+30
+√64
+60
+784-50-10
+tanh
+s ∼ U(−1, 1)
+θ ∼ N (0, N1 )
+10
+
+0.0125
+0.0025
+0.3
+0.5
+40
+10
+40
+√64
+60
+all-to-all, 500 hid
+tanh
+s ∼ U(−1, 1)
+θ ∼ N (0, N1 )
+10
+
+Learning Rate (Input-Hidden)
+Learning Rate (Hidden-Output)
+Time Step (Dynamics Integration)
+Nudging Parameter (β)
+Free-phase Steps (nfree )
+Nudged-phase Steps (nnudge )
+Number of Epochs
+Batch Size
+Scaling Parameter γ
+Structure
+Activation function ρ
+Initial Recurrent State s
+Initial Parameters θ
+Number of Runs (training + inference)
+
+Table 3. Trained Model Hyperparameters on MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N (µ, σ 2 )
+is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference compared to
+the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics.
+
+elements of S̃, the full matrices are constructed as:
+S
+S̃ij = δij ξi + (1 − δij )θk(max(i,j),min(i,j))
+,
+A
+Ãij = ϵij θk(max(i,j),min(i,j))
+,
+
+(99)
+(100)
+
+where ϵij is the Levi-Civita symbol. The dynamical parameters are then given by:
+dyn
+Jij
+= γ(cS S̃ij + cA Ãij ),
+
+with normalization coefficients
+p
+2
+1 − rstr
+cS =
+,
+FS
+
+cA =
+
+(101)
+
+rstr
+,
+FA
+
+defined in terms of the Frobenius norms:
+v
+uN
+M
+uX
+X
+2
+F =t
+ξ2 + 2
+θS ,
+
+Figure 4. Evolution of the mean accuracy and standard deviation
+(over 10 runs) during training on MNIST for AsymEP, EP, and VF.
+Models use 50 hidden neurons.
+
+S
+
+i
+
+i=1
+
+k
+
+(102)
+
+(103)
+
+k=1
+
+v
+u M
+u X
+2
+θkA .
+FA = t2
+
+(104)
+
+k=1
+
+and θA of size M = Ndyn (Ndyn − 1)/2, which encode the
+off-diagonal elements of the symmetric and antisymmetric
+components S̃ and Ã, respectively. The correspondence
+between matrix and vector indices is given by:
+
+k(i, j) =
+
+(i − 1)(i − 2)
++ j,
+2
+
+Presynaptic computation. The dependence of the normalization coefficients on the parameters introduces additional regularization terms in the learning rule compared
+to the parameterization of (Scellier &Bengio, 2017). The
+gradients of the normalization coefficients are:
+
+(1 ≤ j < i ≤ Ndyn )
+(98)
+
+∂cS
+θkS
+=
+−2c
+S
+2,
+∂θkS
+(FS )
+
+where the condition j < i selects the strictly lower triangular
+elements. Introducing an additional vector ξ for the diagonal
+
+∂cA
+θA
+= −2cA k 2 .
+A
+∂θk
+(FA )
+18
+
+∂cS
+ξm
+= −cS
+2,
+∂ξm
+(FS )
+
+(105)
+(106)
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Parameter
+
+Comparison Dyn.
+sec. 5.4
+
+2 hidden layers
+sec. 5.4
+
+3 hidden layers
+sec. 5.4
+
+Learning Rate (Input-Hidden)
+Learning Rate (Hidden-Hidden)
+Learning Rate (Hidden-Output)
+Time Step (Dynamics Integration)
+Nudging Parameter (β)
+Free-phase Steps (nfree )
+Nudged-phase Steps (nnudge )
+Number of Epochs
+Batch Size
+Layer Structure
+Activation function ρ
+Initial Recurrent State s
+Initial Parameters θ
+Number of Runs (training + inference)
+
+0.0016
+0.0016
+0.0016
+0.4
+0.3
+40
+20
+50
+64
+784-500-200-10
+tanh
+s ∼ U(−1, 1)
+θ ∼ N (0, N1 )
+10
+
+0.0013
+0.0013
+0.0013
+0.3
+0.5
+40
+20
+40
+64
+784-500-500-10
+tanh
+s ∼ U(−1, 1)
+θ ∼ N (0, N1 )
+10
+
+0.6
+0.6
+0.6
+0.0075
+0.20
+60
+30
+40
+64
+784-500-500-500-10
+tanh
+s ∼ U (−1, 1)
+θ ∼ N (0, N1 )
+10
+
+Table 4. Trained Model Hyperparameters on Fashion-MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and
+N (µ, σ 2 ) is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference
+compared to the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics.
+
+Combining these with the derivatives of the matrices S̃ and
+Ã, we have:
+∂ S̃ij
+= δip δjq + δiq δjp ,
+∂θkS
+
+∂ S̃ij
+= δij δkj
+∂ξk
+
+∂ Ãij
+= δip δjq − δiq δjp ,
+∂θkA
+
+(where p > q):
+
+N
+θkA X
+∂Fi
+′
+=
+γc
+ρ
+(x
+)
+−2
+Ãij ρ(xj )
+A
+i
+(FA )2 j=1
+∂θkA
+
+(107)
+
+
++ δip ρ(xq ) − δiq ρ(xp ) .
+
+(108)
+
+(111)
+
+where k corresponds to the index pair (p, q) with p > q, as
+defined in Eq. (98). The full presynaptic terms are then:
+
+Initialization. To ensure the stability of the system, we
+initialize our parameters suchhthat the
+i variance of dynamdyn
+ical parameters scales as Var Jij ∝ 1/Ndyn . This is a
+conservative choice for the layered architectures used in our
+dyn
+experiments, where many entries of Jij
+are zero.
+
+• For the diagonal parameters ξm :
+
+N
+∂Fi
+ξm X
+S̃ij ρ(xj )
+= γcS ρ′ (xi ) −
+∂ξm
+(FS )2 j=1
+
++ δim ρ(xm ) .
+
+In practice, we initialize the parameter vectors θS , θA , and
+ξ with identical variances σ 2 . For large Ndyn , the expected
+Frobenius norms approximate to E[FS,A ] ≈ Ndyn σ. Consequently, the normalization coefficients become:
+
+(109)
+
+p
+
+• For the off-diagonal symmetric parameters θkS (where
+p > q):
+
+cS ≈
+
+
+N
+∂Fi
+θkS X
+′
+=
+γc
+ρ
+(x
+)
+−2
+S̃ij ρ(xj )
+S
+i
+(FS )2 j=1
+∂θkS
+
+2
+1 − rstr
+,
+Ndyn σ
+
+cA ≈
+
+rstr
+.
+Ndyn σ
+
+(112)
+
+Since the symmetric and antisymmetric components are statistically independent, the variance of the weights is derived
+as follows:
+
+
++ δip ρ(xq ) + δiq ρ(xp ) .
+
+• Diagonal elements (i = j):
+
+(110)
+h
+i
+2
+1 − rstr
+Var Jiidyn = γ 2 c2S σ 2 ≈ γ 2
+.
+2
+Ndyn
+
+• For the off-diagonal antisymmetric parameters θkA
+19
+
+(113)
+
+Equilibrium Propagation for Non-Conservative Systems
+
+• Off-diagonal elements (i ̸= j):
+i
+h
+
+γ2
+dyn
+= γ 2 c2S + c2A σ 2 ≈ 2 ,
+Var Jij
+Ndyn
+
+a zero-cost baseline (perfect prediction) during learning.
+Specifically, for each method and value of rstr , we calculate the cumulative loss by summing the batch-averaged
+costs of the first 5 epochs (out of 30, to avoid saturation
+effects), and reporting the mean and standard deviation over
+10 independent training runs. Mathematically, for each run:
+
+(114)
+
+i
+h
+dyn
+∝ 1/Ndyn , we set:
+To satisfy Var Jij
+γ=
+
+p
+Ndyn
+
+(115)
+Cumul. Loss =
+
+Note that by random matrix theory, diagonal elements do
+not affect stability in the large Ndyn limit.
+
+(116)
+
+Ãij = ϵij θk(max(i,j),min(i,j)) .
+
+(117)
+
+NX
+batches
+
+
+X
+
+
+epoch=1 k=1
+
+(x0 ,u)∈Bk
+
+
+C(x0 , u) 
+,
+|Bk |
+
+(120)
+where Bk represents the k-th batch, and |Bk | denotes the
+number of examples in the batch. The parameters are updated after each batch step; consequently, the free equilibrium x0 is inferred using the updated parameters and the
+current example u.
+
+Potential Simplification. Although the parameterization
+above is fully general, a simpler construction is possible
+by removing self-connections (ξ = 0) and enforcing identical parameterization for the symmetric and antisymmetric
+components, i.e., θS = θA = θ. The matrix elements then
+become:
+S̃ij = (1 − δij )θk(max(i,j),min(i,j)) ,
+
+5
+X
+
+In this case, the Frobenius norms are equal (FS = FA ), and
+we can omit the explicit normalization:
+q
+dyn
+2 S̃ + r Ã .
+Jij
+= 1 − rstr
+(118)
+ij
+str ij
+For a parameter θk corresponding to indices (p, q) with
+p > q, the presynaptic term is given by:
+q
+
+∂Fi
+2 +r
+1 − rstr
+= ρ′ (xi )
+str δip ρ(xq )
+∂θk
+q
+
+ (119)
+2 −r
+1 − rstr
++
+δ
+ρ(x
+)
+str
+iq
+p .
+While this parameterization works in simulations and keeps
+the number of parameters constant for all rstr , it constrains
+the asymmetry to be “homogeneous”, by which we mean
+that the asymmetry ratio is identical for every pair of neurons; hence, the network cannot learn to be symmetric in one
+region and antisymmetric in another. Therefore, we choose
+to explore the more general case of (38) in our experiments.
+
+Figure 5. Cumulative loss as defined by (120) over the first 5
+epochs of training, for different asymmetry ratios rstr . We compare
+VF (orange) and AsymEP (blue), under two training regimes:
+training only J in (dashed) or all parameters (solid).
+
+In Fig 5, we observe that learning slows down for both algorithms when rstr ≳ 0.6. This behavior likely results from
+the increased difficulty of reaching a stationary state as the
+dynamics become strongly asymmetric. With a fixed number of inference steps, incomplete convergence degrades the
+accuracy of the gradient estimates, thereby slowing down
+the learning. Fig 5 shows that while VF can eventually
+achieve competitive accuracy, it is consistently slower than
+AsymEP as soon as asymmetry is introduced.
+
+G.2.2. S UPPLEMENTARY N UMERICAL R ESULTS
+To complement the results of Fig 3, we analyze the training
+efficiency as a function of the asymmetry ratio rstr and investigate the robustness of VF by monitoring the Jacobian
+asymmetry.
+Training efficiency. We first study the training efficiency
+of the two algorithms as a function of the asymmetry ratio rstr . Inspired by the related concept in (Cesa-Bianchi
+&Lugosi, 2006), we define the cumulative loss as the accumulated difference between the free equilibrium cost and
+
+Jacobian asymmetry. We next examine how the structural asymmetry rstr is reflected in the Jacobian of the dy20
+
+Equilibrium Propagation for Non-Conservative Systems
+
+namics (35), given by:
+∂Fi
+dyn ′
+= (1 − δij )ρ′ (xi )Jij
+ρ (xj )
+∂sj
+h
+i
++ δij ρ′ (xi )(Jiidyn ρ′ (xi )) + ρ′′ (xi )bi − 1 .
+(121)
+In our layered architecture, the self-connections are zero
+(Jiidyn = 0). For the following analysis, we neglect all diagonal terms in the Jacobian (including external inputs and
+potential), since they do not contribute to the antisymmetric
+correction (20) and thus to the discrepancy between the performance of VF and AsymEP. Consequently, we define the
+following asymmetry ratio based solely on the off-diagonal
+Jacobian JF,off :
+(122)
+
+Figure 6. Asymmetry ratio of the Jacobian rjac defined in equation
+(122) after training for different asymmetry ratios rstr . We compare
+VF (orange) and AsymEP (blue), under two training regimes:
+training only J in (dashed) or all parameters (solid).
+
+The results are presented in Fig 6. For each trained model
+and ratio rstr , we compute rjac averaged over the stationary
+states of the first batch (64 images) across 10 independent
+runs. We observe that when structural asymmetry is strong
+and all parameters are trained, VF partially compensates for
+the asymmetry by adjusting the neuronal states. This can be
+understood by rewriting the ratio as:
+
+
+dyn
+dyn ⊤
+ρ′ (xi ) Jij
+− (Jji
+) ρ′ (xj )
+F
+rjac =
+.
+(123)
+dyn ′
+ρ′ (xi )Jij
+ρ (xj )
+
+Consequently, local stability requires the largest real eigenvalue of the effective weight matrix to be strictly less than 1.
+Assuming weights are initialized independently with variance σ 2 , Girko’s circular law dictates that the eigenvalues
+of√an asymmetric matrix uniformly populate a disk of radius
+σ n in the complex plane. In contrast, imposing symmetry
+forces the eigenvalues
+√ onto the real line, broadening the
+spectral radius to 2σ n according to Wigner’s semicircle
+law. As a result, asymmetric networks can stably accommodate larger variance in the weight initializations than their
+symmetric counterparts.
+
+Compared to the structural asymmetry ratio in Eq. (37),
+a value of rjac < rstr indicates that the neuronal states effectively dampen the structural asymmetry, rendering the
+dynamics more symmetric. This symmetrization of the Jacobian appears without imposing an additional symmetrization penalty and could be enhanced using the method of
+(Laborieux &Zenke, 2022). This mechanism likely explains
+the superior performance of ‘All (VF)’ compared to ‘J in
+(VF)’ in Fig 3, as the former is able to use the additional
+degrees of freedom to reduce the effective asymmetry at
+high rstr .
+
+Asymmetry nevertheless introduces imaginary eigenvalues
+and, consequently, damped oscillations. To study this effect
+experimentally in a controlled setting, we constrain the input
+projections J in . In the experiments of the main text, fixing
+the structural asymmetry ratio rstr still allowed AsymEP
+to reduce oscillations by aligning and increasing the input
+projections J in , thereby adding stabilizing diagonal contributions to the Jacobian. To isolate the network’s ability to
+suppress oscillations independently of the magnitude of the
+input drive, we further constrain the relative scale of J in and
+J dyn by imposing
+
+rjac =
+
+⊤
+∥JF,off − JF,off
+∥F
+,
+∥JF,off ∥F
+
+F
+
+rin =
+
+G.3. Stability analysis with Fixed Asymmetry Ratio &
+Constrained Inputs Projection
+
+∥J in ∥F
+∥J in ∥F
+=
+,
+∥J dyn ∥F
+γ
+
+(124)
+
+where ∥J dyn ∥F = γ following Eq. (101). Defining unscaled
+input projections J˜in , we set
+
+A complete stability analysis of the non-conservative dynamics trainable with AsymEP is beyond the scope of this
+work. Nevertheless, for the class of continuous Hopfield
+networks considered here, standard arguments from random
+matrix theory suggest that asymmetry inherently improves
+asymptotic stability.
+
+J in = rin γ
+
+J˜in
+∥J˜in ∥F
+
+(125)
+
+G.3.1. L EARNING RULES
+
+In the dynamics defined by Eq. (91), the linear leak term
+−xi shifts the spectrum of the system’s Jacobian by −1.
+
+Reusing the notations of the previous section, we write
+Jilin = γcin J˜ilin with the normalization cin = rin /Fin , where
+21
+
+Equilibrium Propagation for Non-Conservative Systems
+
+Fin = ∥J˜in ∥F . Applying the chain rule yields:
+"
+#
+in X
+∂Fi
+J˜kl
+′
+in
+˜
+= γcin ρ (xi ) δik ul − 2
+J um .
+in
+Fin m im
+∂ J˜kl
+
+(126)
+
+And for γ we have:
+1
+∂Fi
+= (Fi + xi ).
+∂γ
+γ
+
+(127)
+
+G.3.2. S UPPLEMENTARY N UMERICAL R ESULTS
+Figure 7. Comparison of AsymEP and VF on a feedforward network. Test accuracy on MNIST is shown as a function of training
+epochs for a single-hidden-layer network with 20 neurons. Curves
+report the mean and standard deviation over 10 runs. Best accuracies are 92.7% ± 0.5% (AsymEP) and 64.3% ± 2.0% (VF).
+
+Table 5 reports a worst-case control experiment where the
+structural asymmetry is fixed at rstr = 0.7 while the input
+scale ratio rin is varied. The experiment uses an all-to-all
+architecture on MNIST (excluding direct input-to-output
+connections). The output variance during extended inference (steps 30-50) confirms that the system successfully
+learns to suppress oscillations even when rin is severely restricted. Any small residual oscillations can be mitigated by
+time-averaging over the inference steps.
+
+G.5. Advantages of Non-Conservatives Dynamics
+In Section 5.4, we compare three (non-)conservative dynamics under varying constraints. To further evaluate learning
+speed, Table 6 reports network performance after a single epoch. These results confirm our earlier observation:
+AsymEP learns faster than VF.
+
+Finally, rin can be interpreted as a measure of the external
+signal magnitude relative to the internal recurrent dynamics.
+These results indicate that the system remains capable of
+learning and stabilizing even under a low external input
+drive. Even when the input projection ∥J in ∥F is 100 times
+smaller than the recurrent connections ∥J dyn ∥F , the network
+still achieves 36.34 ± 6.25% accuracy, which is well above
+chance level (∼ 10%).
+
+G.6. Feedforward CIFAR-10 Experiments
+This appendix details the architecture and hyperparameters
+of the deep feedforward experiments comparing backpropagation (BP), VF, AsymEP and Dyadic EP on CIFAR-10
+(see subsection 5.5)
+
+G.4. Feedforward Network
+Architecture. We use a nine-layer convolutional network
+(denoted CNN9). The first eight layers are convolutional
+with 3 × 3 kernels and zero-padding; spatial downsampling
+is performed by strided convolutions (stride 2 on layers 2, 4,
+6, 8 and stride 1 otherwise), so no pooling is used. The channel widths follow the sequence 3 → 64 → 64 → 128 →
+128 → 256 → 256 → 512 → 512, reducing the spatial
+resolution from 32 × 32 to 2 × 2. The last layer is a fully
+connected readout mapping the 512 × 2 × 2 feature map
+to the 10 class logits. All hidden units use a ReLU nonlinearity.
+p Weights are initialized with the Kaiming scheme
+(σ = 2/fan-in) and biases at zero.
+
+G.4.1. L EARNING RULES
+For clarity, we write the learning rules for VF and AsymEP
+in a feedforward architecture with one hidden layer using
+the notation of Section 5.3. For the input weights connecting
+to the hidden layer, we get the usual formula:
+in
+∆Jik
+∝
+
+i
+1 h +β
+−β
+0
+(hi − hi )ρ′ (hi )uk ,
+2β
+
+(128)
+
+while for the feedforward weights connecting the hidden to
+the output layer, we get:
+∆(Wh→o )ji ∝
+
+i
+1 h +β
+0
+′ 0
+(oj − o−β
+j )ρ (oj )ρ(hi ) .
+2β
+
+Training setup. All methods are trained for 40 epochs
+with batch size 64 and repeated over 5 seeds. Inputs are
+normalized per channel and augmented with random 32 ×
+32 crops (padding 4), random horizontal flips and Cutout
+(one 8 × 8 patch). Parameters are updated with SGD with
+momentum 0.9, weight decay 5 × 10−4 and gradient-norm
+clipping at 1, under a cosine learning-rate schedule decaying
+from 3.5 × 10−2 to 2 × 10−4 . Test accuracy is reported
+on an exponential moving average of the weights (decay
+
+(129)
+
+Note that EP is not applicable in this case.
+G.4.2. S UPPLEMENTARY N UMERICAL R ESULTS
+In addition to the final accuracy reported in Sec. 5.3, we
+show in Fig. 7 the evolution of the accuracy over 20 epochs
+for AsymEP and VF.
+22
+
+Equilibrium Propagation for Non-Conservative Systems
+Table 5. Output variance and final test accuracy on MNIST (%) across different values of rin with rstr = 0.7. (mean ± std over 10 runs)
+(500 hiddens, all-to-all, no input-output).
+
+rin
+0.01
+0.10
+0.50
+1.00
+
+Output variance
+Untrained
+Epoch 80
+(3.38 ± 0.90) × 10−4
+(2.33 ± 0.48) × 10−4
+(1.34 ± 0.32) × 10−5
+(6.27 ± 1.24) × 10−7
+
+(5.22 ± 2.34) × 10−5
+(1.39 ± 0.17) × 10−4
+(1.06 ± 0.25) × 10−6
+(1.75 ± 0.50) × 10−8
+
+Table 6. Test accuracy on Fashion-MNIST (%) at Epoch 1 (mean
+± std 10 runs). The table compares three classes of network
+dynamics: Continuous Hopfield (CH), Predictive Coding (PC),
+and Standard dynamics. Each is evaluated under three connec⊤
+tivity structures: Asymmetric (Asym, Bk ̸= Wk+1
+), Symmet⊤
+ric/conservative (Sym, Bk = Wk+1 ), and Feedforward (Feedfor,
+Bk = 0).
+EP
+
+AsymEP
+
+Asym
+74.91 ± 0.45
+Feedfor
+74.36 ± 0.29
+Sym
+74.57 ± 0.43
+Asym
+77.83 ± 0.47
+PC
+Sym
+76.23 ± 0.39
+Asym
+76.87 ± 0.51
+Standard
+Feedfor
+77.92 ± 0.51
+CH
+
+VF
+48.98 ± 4.09
+48.84 ± 3.46
+57.75 ± 3.37
+61.50 ± 4.06
+63.98 ± 0.73
+
+0.9995). The targets are smoothed (ε = 0.1), which for
+the EP methods amounts to nudging toward the smoothed
+one-hot vector.
+Relaxation hyperparameters. The four methods differ
+only in the gradient estimator: BP uses automatic differentiation, while the EP-based methods contrast stationary
+states of the corresponding relaxation dynamics. VF uses
+an integration step η = 1.0, nudging β = 0.1, and at most
+K = 1000 relaxation steps with an early-stopping threshold
+of 9 × 10−6 on the mean state update. Dyadic EP uses
+the same settings except for a nudging strength β = 0.1.
+AsymEP uses a smaller step η = 0.5, nudging β = 0.1,
+and up to K = 250 relaxation steps with a threshold of
+1 × 10−4 .
+
+23
+
+Test Acc. (%)
+Epoch 80
+36.34 ± 6.25
+90.54 ± 0.19
+94.96 ± 0.10
+96.30 ± 0.09
+
+
+\ No newline at end of file
author	Yuren Hao <yurenh2@illinois.edu>	2026-07-03 05:56:50 -0500
committer	Yuren Hao <yurenh2@illinois.edu>	2026-07-03 05:56:50 -0500
commit	b83947778e2c776f757a07d4719b7ce961d7ed55 (patch)
tree	b9cc01d7adda691d9156d9d04f4fb2f644674e96 /refs/scurria_2602.03670v2.txt