summaryrefslogtreecommitdiff
path: root/ep_run/scurria_nonconservative.txt
diff options
context:
space:
mode:
authorYuren Hao <yurenh2@illinois.edu>2026-07-03 05:56:50 -0500
committerYuren Hao <yurenh2@illinois.edu>2026-07-03 05:56:50 -0500
commitb83947778e2c776f757a07d4719b7ce961d7ed55 (patch)
treeb9cc01d7adda691d9156d9d04f4fb2f644674e96 /ep_run/scurria_nonconservative.txt
Initial commit: ept — backprop-free equilibrium transformer (EP)
Code (ep_run/), organized docs (docs/{method,campaign,hardware,outreach,paper}), analysis scripts (scripts/), ONBOARDING.md entry point. Large data/checkpoints git-ignored (share separately). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014FAPDWQ49M5Ye3NpTndTpn
Diffstat (limited to 'ep_run/scurria_nonconservative.txt')
-rw-r--r--ep_run/scurria_nonconservative.txt1708
1 files changed, 1708 insertions, 0 deletions
diff --git a/ep_run/scurria_nonconservative.txt b/ep_run/scurria_nonconservative.txt
new file mode 100644
index 0000000..fa6d620
--- /dev/null
+++ b/ep_run/scurria_nonconservative.txt
@@ -0,0 +1,1708 @@
+ Equilibrium Propagation for Non-Conservative Systems
+
+
+ Antonino Emanuele Scurria 1 Dimitri Vanden Abeele 1 Bortolo Matteo Mognetti 2 Serge Massar 1
+
+
+ Abstract from inference, the transmission of nonlocal error signals,
+ and synchronous layer-wise computations with explicit gra-
+ Equilibrium Propagation (EP) is a physics-
+ dient storage. These constraints have no clear analog in
+ inspired learning algorithm that uses stationary
+ physical systems, making backpropagation challenging to
+ states of a dynamical system both for inference
+ implement in neuromorphic or analog hardware. Conse-
+arXiv:2602.03670v2 [cs.LG] 1 Jun 2026
+
+
+
+
+ and learning. In its original formulation it is
+ quently, understanding how credit assignment can instead
+ limited to conservative systems, i.e. to dynam-
+ emerge from intrinsic system dynamics, through local inter-
+ ics which derive from an energy function. Given
+ actions and continuous relaxation, is a central question in
+ their applications, it is important to extend EP
+ neuroscience and machine learning.
+ to non-conservative systems, i.e. systems with
+ non-reciprocal interactions. Previous attempts to Equilibrium Propagation (EP) (Scellier &Bengio, 2017)
+ generalize EP to such systems failed to compute represents one of the most promising advances in this direc-
+ the exact gradient of the cost function. Here we tion. It formulates supervised learning as a contrast between
+ propose a framework that extends EP to arbitrary two stationary states of a dynamical system: a ‘free’ phase
+ non-conservative systems, including feedforward where the system evolves autonomously, and a ‘nudged’
+ networks. We keep the key property of equilib- phase where outputs are weakly pushed toward their targets.
+ rium propagation, namely the use of stationary The local change in neural states between these phases re-
+ states both for inference and learning. However, covers the exact gradient of the cost function with respect to
+ we modify the dynamics in the learning phase by parameters. This enables spatially local learning exploiting
+ a term proportional to the non-reciprocal part of the continuous relaxation of the system without a distinct
+ the interaction so as to obtain the exact gradient backward circuit or explicit weight transport.
+ of the cost function. This algorithm can also be
+ Since its introduction, several works have sought to im-
+ derived using a variational formulation that gen-
+ prove the practicality and biological realism of EP. Algo-
+ erates the learning dynamics through an energy
+ rithmic adaptations include enforcing temporal locality to
+ function defined over an augmented state space.
+ avoid state storage (Ernoult et al., 2020; Falk et al., 2025),
+ Numerical experiments show that this algorithm
+ deriving agnostic updates for black-box energies (Scel-
+ achieves better performance and learns faster than
+ lier et al., 2022), and substituting nudging with clamping
+ previous proposals.
+ (Stern et al., 2021). Theoretically, the framework has been
+ extended to stochastic systems (Scellier &Bengio, 2017;
+ Massar &Mognetti, 2025) and Lagrangian dynamics for
+ 1. Introduction time-varying inputs (Massar, 2025; Pourcel et al., 2025;
+ Standard neural network optimization relies on error back- Berneman &Hexner, 2025). In parallel, simulations have
+ propagation, an algorithm whose computational mechanism explored suitable substrates, ranging from spiking (Mar-
+ is difficult to reconcile with biological (Crick, 1989) and tin et al., 2021; O’Connor et al., 2019) and resistive net-
+ physical implementations (Indiveri &Liu, 2015). Specif- works (Kendall et al., 2020) to coupled oscillators (Wang
+ ically, backpropagation requires a backward pass distinct et al., 2024; Rageau &Grollier, 2025), as well as quantum
+ systems (Wanjura &Marquardt, 2025; Massar &Mognetti,
+ 1
+ Laboratoire d’Information Quantique (LIQ) CP224, Université 2025; Scellier, 2024). Experimental realizations have been
+ libre de Bruxelles (ULB), Av. F. D. Roosevelt 50, 1050 Bruxelles, demonstrated in memristor crossbars (Yi et al., 2023), self-
+ Belgium 2 Interdisciplinary Center for Nonlinear Phenomena and
+ Complex Systems CP231, Université libre de Bruxelles (ULB), Av. adjusting electrical circuits (Dillavou et al., 2022; 2024),
+ F. D. Roosevelt 50, 1050 Bruxelles, Belgium. Correspondence to: elastic networks (Altman et al., 2024), and classical Ising
+ Antonino Emanuele Scurria <antonino.scurria@ulb.be>. models trained on quantum annealers (Laydevant et al.,
+ 2024).
+ Proceedings of the 43 rd International Conference on Machine
+ Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 Despite these recent developments and the theoretical el-
+ by the author(s).
+
+ 1
+ Equilibrium Propagation for Non-Conservative Systems
+
+egance of EP, its standard formulation remains restricted a framework where the original dynamics serve for infer-
+to conservative systems. In these systems, dynamics are ence, while a new augmented dynamic is used to compute
+derived from an energy function, which inherently enforces gradients of the cost Eq. (2). In this augmented phase, the
+symmetry (e.g., symmetric synaptic connections Jij = Jji ) output neurons are nudged towards their targets (as in stan-
+through the action-reaction principle. This constraint pre- dard EP), while a local corrective term – proportional to the
+cludes the use of EP in a broad class of models characterized antisymmetric part of the Jacobian at the free equilibrium
+ ∂
+by non-conservative forces. This includes the feedforward JF (x0 , θ, u) = ∂x F (x0 , θ, u) – is added to the forces. The
+architectures dominant in modern AI, biological circuits, exact gradients of the cost with respect to parameters are
+as well as physical systems that reach stationary states far then obtained by contrasting stationary states of the aug-
+from thermodynamic equilibrium, such as nonlinear optical mented system.
+systems driven by external lasers (Cin et al., 2025), opto-
+ Second, we introduce Dyadic EP, a ‘variational’ approach
+electronic systems (Kalinin et al., 2025), exciton-polariton
+ to learning in non-conservative systems. This method in-
+condensates (Sajnok &Matuszewski, 2025), active meta-
+ volves doubling the number of variables in the system’s
+materials (Brandenbourger et al., 2019) and active colloids
+ state space and subsequently introducing a new energy func-
+(Bishop et al., 2023; Osat &Golestanian, 2023) (see (Bowick
+ tion in this extended space. This approach takes advantage
+et al., 2022) for a review).
+ of the extended space to execute the positive and negative
+Formally, we consider a dynamical system governed by nudging phases in parallel, recovering the same computa-
+a non-reciprocal force field F (x, θ, u), which relaxes to a tional cost as AsymEP. Derived from first principles, this
+stationary configuration x0 satisfying: approach is inspired by established methods for mapping
+ dissipative dynamical systems onto conservative ones by
+ F (x0 , θ, u) = 0, (1) doubling the degrees of freedom (Bateman, 1931; Galley,
+where x represents the state variables, θ the learnable param- 2013; Aykroyd et al., 2025). A more comprehensive study
+eters and u the static input. Our goal, given a target y(u), is of the theoretical framework and its application to feedfor-
+to compute the gradient of the cost function C(x0 , y) at this ward networks can be found in (Scurria, 2026). Our method
+equilibrium, is related to the Dual Propagation algorithm (Høier et al.,
+ dC 0 2023; Høier &Zach, 2023; 2024) and constitutes an inde-
+ (x , y), (2) pendent, first-principles generalization of Dyadic Learning
+ dθ
+and update θ to minimize the cost. (Nest &Høier; Høier et al., 2024)—previously limited to
+ Hopfield networks—to arbitrary force fields.
+Previous attempts to extend EP to non-conservative dynam-
+ics include the Vector Field (VF) algorithm (Scellier et al., Third, we validate our framework on MNIST (LeCun, 1998),
+2018). However, as noted by the authors, this method pro- Fashion-MNIST, and CIFAR-10. In continuous Hopfield
+vides an unbiased gradient of the cost Eq. (2) only in the networks initialized with symmetric connection matrices,
+conservative case. To mitigate this, (Laborieux &Zenke, AsymEP achieves better accuracy and learns faster than
+2024) proposed adding a penalty to keep the Jacobian close EP and VF. Additionally, when we constrain the network
+to symmetry, essentially forcing the system to be as con- to have a strong degree of structural asymmetry, in which
+servative as possible. Alternative methods related to VF, case EP is inapplicable, AsymEP outperforms VF. Finally,
+which similarly do not compute the exact gradient, were when we restrict connections to a feedforward structure, our
+proposed in (Farinha et al., 2020; Costa &Santos, 2025) and algorithm effectively trains all parameters; in contrast, VF
+for specific systems in simulation (Cin et al., 2025; Sajnok is limited to training the last layer, acting essentially as an
+&Matuszewski, 2025). Extreme Learning Machine (Huang et al., 2006; Wang et al.,
+ 2022) with poor performance.
+Conversely, generalizations of backpropagation can handle
+non-reciprocal forces and compute the exact gradient of In summary, this theoretical work proposes two generaliza-
+the cost Eq. (2) but inherit the same challenges in physical tions of EP beyond conservative systems to arbitrary differ-
+implementations. For instance, Backpropagation Through entiable dynamics that compute in their stationary states.
+Time (Werbos, 1990) unfolds the network in time to ap-
+ply standard backpropagation, Recurrent Backpropagation 2. Equilibrium Propagation Overview
+(Almeida, 1990; Pineda, 1987) avoids this memory require-
+ment but still requires a specific circuit to propagate errors, 2.1. Conservative Systems
+and the continuous Adjoint Method (Chen et al., 2018) addi- We first review standard Equilibrium Propagation (EP)
+tionally requires integrating the dynamics backward in time (Scellier &Bengio, 2017). We consider a network described
+which is not physically possible for a dissipative system. by an energy function E(x, θ, u), such that the force field is
+In this paper, we first propose Asymmetric EP (AsymEP),
+
+ 2
+ Equilibrium Propagation for Non-Conservative Systems
+
+derived from the potential E: stationary point, i.e., that Eq. (7) holds. Second, EP implic-
+ ∂
+ itly assumes that the Jacobian JE (x0 , u) = ∂x FE (x0 , u) is
+ ∂
+ FE (x, θ, u) = − E(x, θ, u). (3) invertible. In this work, we assume this condition holds and
+ ∂x will not state it explicitly hereafter. Third, for simplicity,
+The objective is to compute the total gradient dC 0 we omit the dependency on the input u and target y in the
+ dθ (x , y) of a
+(quadratic) cost function C(x, y) evaluated at the minimum following equations.
+energy configuration of the system. This free equilibrium
+denoted x0 (which depend implicitly in θ and u), satisfies 2.2. Vector Field
+the stationarity condition:
+ The Vector Field (VF) algorithm, introduced in (Scellier
+ ∂ et al., 2018), is an early attempt to adapt EP to non-
+ − E(x0 , θ, u) = 0. (4) reciprocal forces. This method relies on the observation
+ ∂x
+ that, for conservative systems, linearizing the right-hand
+To compute gradients, we introduce the augmented energy side of Eq. (9) around the equilibrium point x0 yields
+functional:
+ ∂E(xβ , θ) ∂E(x−β , θ)
+  
+ ET (x, θ, β, u, y) = E(x, θ, u) + βC(x, y), (5) 1
+ lim −
+ β→0 2β ∂θ ∂θ
+where β is a scalar nudging parameter. The stationary config- ⊤  β  (10)
+ x − x−β
+ 
+ ∂FE 0
+uration of this augmented system is obtained by integrating = lim − (x , θ) ,
+ β→0 ∂θ 2β
+the dynamics
+ dx ∂ET (x, θ, β, u) where FE = −∂x E(x, θ) is the conservative force. It is
+ =− , (6) therefore tempting to use the right-hand side of Eq. (10) for
+ dt ∂x
+ parameter updates of non-conservative systems, for which
+until the energy minimum is reached. This new fixed point no energy function E exists.
+xβ , called nudged equilibrium, satisfies:
+ The VF algorithm adopts precisely this approach. It uses
+ ∂E(xβ , θ, u) ∂C(xβ , y) the nudged counterpart of Eq. (7),
+ +β = 0. (7)
+ ∂x ∂x
+ ∂C β
+ F (xβ , θ) − β (x ) = 0, (11)
+The training procedure, as improved in (Laborieux et al., ∂x
+2021), uses two nudged phases with factors ±β (with
+β ̸= 0). Starting from x0 , the system relaxes to two in conjunction with the learning rule Eq. (10):
+nearby perturbed equilibria, x+β and x−β . The displace- ⊤  β
+ x − x−β
+  
+ment x+β − x−β is then used to compute the parameter ∂F 0
+ ∆θ = ϵ (x , θ) . (12)
+update in the learning rule: ∂θ 2β
+
+ 1 ∂E(xβ , θ, u) ∂E(x−β , θ, u)
+  
+ ∆θ = −ϵ − , (8) However, as noted in (Scellier et al., 2018), Eq. (12) does
+ 2β ∂θ ∂θ not align with the true gradient dC 0
+ dθ (x ) and is exact only if
+where ϵ > 0 is the learning rate. The theoretical foundation the force is conservative. To see this, let JF (x, θ) denote
+of EP is the result that, in the limβ→0 of Eq. (8), we get: the Jacobian of the vector field F (x, θ) (in components
+ (JF (x, θ))ij = ∂F∂xi (x,θ)
+ j
+ ). Differentiating the equilibrium
+ dC(x0 , y) d ∂E(xβ , θ, u) 0
+ condition F (x , θ) = 0 with respect to θ gives
+ = , (9)
+ dθ dβ ∂θ
+ dx0 ∂F 0
+see Appendix D.1. The error of the above method is O(β 2 ). JF (x0 , θ) + (x , θ) = 0. (13)
+This error can be further reduced using holomorphic equi- dθ ∂θ
+librium propagation (Laborieux &Zenke, 2022). Consequently, the exact gradient of the cost is
+Thus, EP recovers the exact gradient of the cost function ⊤
+using only local computations. In this manner, learning dC 0 dx0 ∂C 0
+ (x ) = (x )
+implements gradient descent without an explicit backward dθ dθ ∂x
+ ⊤ 
+pass, and credit assignment is realized through the system’s
+  
+ ∂F 0 ⊤ 0
+ −1 ∂C 0
+intrinsic relaxation dynamics. =− (x , θ) JF (x , θ) (x ) .
+ ∂θ ∂x
+ | {z }| {z }
+Three remarks can be made at this point. First, EP does not pre-synaptic post-synaptic
+require the system to be at an energy minimum, but only at a (14)
+
+ 3
+ Equilibrium Propagation for Non-Conservative Systems
+
+The terms ’pre-synaptic’ and ’post-synaptic’ in Eq. (14) Algorithm 1 Asymmetric EP (AsymEP)
+are used by analogy with neuronal transmission: the pre- 1: Inputs: Force field F (x, θ), cost function C(x), nudg-
+synaptic factor captures the local influence of θ on the force ing parameter β, learning rate ϵ.
+F , while the post-synaptic factor is the sensitivity of the 2: repeat
+cost to state perturbations. 3: 1. Free Phase: Evolve to stationary state
+If instead we differentiate the nudged equilibrium condition 4: Evolve the system dynamics
+in Eq. (11) with respect to β and evaluate at β = 0, we 5:
+ dx
+obtain = F (x, θ), (17)
+ β
+ dt
+ dx ∂C 0
+ JF (x0 , θ) − (x ) = 0, (15) 6: until convergence to the stationary state x0 .
+ dβ β=0 ∂x 7: 2. Jacobian Decomposition
+which gives 8: Compute the Jacobian at equilibrium:
+ 9:
+ dxβ −1 ∂C 0 ∂F 0
+ = JF (x0 , θ) (x , y). (16) JF (x0 , θ) = (x , θ), (18)
+ dβ β=0 ∂x ∂x
+The right-hand side of Eq. (16) represents the effective post- 10: and decompose it in its antisymmetric part:
+ 11:
+synaptic term used by the VF algorithm (Eq. 12). Compar-
+ing this with the exact post-synaptic term derived in Eq. (14),
+ AJ (x0 , θ) = 12 (JF (x0 , θ) − JF (x0 , θ)⊤ ). (19)
+we see that they coincide only if JF = JF⊤ , i.e., only if the
+system is conservative. 12: 3. Nudged Phase: Augmented Dynamics
+Now, let SJ (x0 , θ) and AJ (x0 , θ) denote the symmetric 13: Integrate the dynamics twice starting from x0
+and antisymmetric parts of the Jacobian at the free (un- 14:
+nudged) equilibrium, respectively. Then, we show in Ap- dx ∂C
+pendix A that the gradient error increases with the spectral = F (x, θ) − β (x) − 2AJ (x0 , θ) (x − x0 ),
+ −1 dt ∂x
+radius of SJ (x0 , θ) AJ (x0 , θ). Consequently, large (20)
+antisymmetric contributions degrade the gradient estima- 15: until convergence to two new stationary states
+tion, confirming empirical observations in the Appendix of x±β
+ A .
+(Ernoult et al., 2020). In fact, in the pathological limit where 16: 4. Parameter Update
+the Jacobian would be purely antisymmetric SJ (x0 , θ) = 0, 17: Update the parameters according to:
+the update of VF gives the negative of the true gradient, 18:
+maximizing the cost rather than minimizing it. ⊤ !
+ xβA − x−β
+ 
+ ∂F 0 A
+ ∆θ = ϵ (x , θ) . (21)
+ ∂θ 2β
+3. Asymmetric EP
+Here, we introduce Asymmetric EP (AsymEP), see Algo- 19: until convergence of θ
+rithm 1, which removes the gradient estimate error inherent 20: Output: Optimized parameters θ.
+to VF by adding a local correction term to the augmented
+inference dynamics. The new nudged equilibrium xβA satis-
+fies: where JFA (x, θ) is the Jacobian of the modified dynamical
+ ∂C β system Eq. (20). At the equilibrium point x0 , JFA is equal
+ F (xβA , θ) − β (x ) − 2AJ (x0 , θ) (xβA − x0 ) = 0, (22) to the transpose of the original Jacobian:
+ ∂x A
+As in VF, we then obtain two perturbed states x±β
+ A for op-
+ JFA (x0 , θ) = JF (x0 , θ) − 2AJ (x0 , θ)
+posite nudging ±β and apply the contrastive learning rule = SJ (x0 , θ) − AJ (x0 , θ)
+of Eq. (12). = JF⊤ (x0 , θ). (24)
+We now show that AsymEP gives rise to the correct learning
+rule, i.e. that right-hand side of Eq. (21) is proportional to where we have used the decomposition Eq. (44) of the orig-
+the gradient of the cost function dC 0 inal Jacobian J into its symmetric and antisymmetric com-
+ dθ (x ) at the equilibrium
+ 0
+point x (Eq. 14). To this end, note that the same reasoning ponents. Therefore, the left hand side of Eq. (23) is equal to
+leading to Eq. (16) leads to the true post-synaptic term
+
+ dxβA −1 ∂C 0 dxβA −1 ∂C 0
+ = JFA (x0 , θ) (x ). (23) = JF⊤ (x0 , θ) (x ), (25)
+ dβ β=0 ∂x dβ β=0 ∂x
+
+ 4
+ Equilibrium Propagation for Non-Conservative Systems
+
+which, using Eq. (14), proves the result. Additionally, al- until a stationary point (z β , z ′β ) is reached. Upon conver-
+though implied by the equality with the true gradient, we gence, we follow the standard EP paradigm in using the
+explicitly demonstrate the equivalence of the gradient esti- difference z β − z ′β to compute the post-synaptic term. Un-
+ ′
+ ′
+mates obtained by AsymEP and Backpropagation Through der the change of variables m = z+z 2 and d = z − z , we
+Time in Appendix B following (Ernoult et al., 2019). prove in Appendix D that m follows the original dynamics
+ F (ensuring valid inference), while d relaxes to a "physical"
+Note that the corrective term −2AJ (x0 , θ)(x − x0 ) in
+ error signal proportional to the cost gradient.
+Eq. (20) is spatially local: AJ vanishes for unconnected
+neurons, and (x − x0 ) is available at the synapse given the It is important to notice that while Dyadic EP introduces a
+memory mechanism already required by Eq. (12). This distinct formulation, it remains consistent with the general
+correction can create backward connections (Section 5.3). theoretical setting of EP and matches the computational
+However, in physical realizations, both feedforward and cost of AsymEP. Note also that we start the evolution of
+feedback connections must be physically present, though the free phase (β = 0) with the identical initial condition
+feedback may be deactivated during inference. for z and z ′ , (i.e., d = 0). This guarantees that integrat-
+ ing Eq. (32) leads to a symmetric stationary point where
+4. Dyadic EP z 0 = z ′0 . Finally, we underline that the modified varia-
+ tional update rule in Eq. (34) is equivalent to the standard
+We now introduce Dyadic EP (Algorithm 2), a variational symmetric EP update rule in Eq. (8) (see Appendix D).
+algorithm that computes the exact cost gradient in the limit
+ Now, to make this concrete, consider a continuous Hopfield
+of infinitesimal nudging. It maps the original n-variable
+ network (see also Eq. (35)) with an asymmetric connection
+dynamics F (x, θ) onto a 2n-variable system (z, z ′ ) defined
+ matrix J. After some calculations (see Appendix F), the
+by an energy H(z, z ′ , θ) and cost D(z, z ′ ). We show in
+ augmented energy of the system can be re-expressed as:
+Appendix E that AsymEP can be seen as the first-order
+projection of Dyadic EP onto the original n-dimensional 1 1
+ HT = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) − ρ(z)⊤ Aρ(z ′ )
+state space. 2 2
+ 1 β
+The new system is defined by the energy H and cost function + (∥z∥ − ∥z ∥ ) + (C(z, y) + C(z ′ , y)) ,
+ 2 ′ 2
+ 2 2
+D, given in terms of F and C by: (29)
+ 
+ z + z′
+  where S and A are the symmetric and antisymmetric parts
+ H(z, z ′ , θ) = −(z − z ′ )⊤ F ,θ , of J, respectively and ρ is an element-wise non-linearity.
+ 2
+  ′
+  An interesting analogy can be drawn with standard learning
+ z+z
+ D(z, z ′ ) = C , (26) rules in discrete Hopfield networks (Hopfield, 1982). For
+ 2 a sequence of binary memories {ξ 1 , . . . , ξ m } where ξ µ ∈
+where z, z ′ ∈ Rn . In order to learn, we introduce the aug- {−1, 1}n , S P corresponds to the standard autoassociative
+mented energy Hebbian rule µ ξ µ (ξ µ )⊤ , creating stable attractors at each
+ pattern. In contrast, A corresponds to the heteroassociative
+ HT (z, z ′ , θ, β) = H(z, z ′ , θ) + βD(z, z ′ ). (27)
+ rule (e.g., a cycle between ξ µ and ξ ν given by ξ ν (ξ µ )⊤ −
+The equilibrium configuration corresponds to a saddle point ξ µ (ξ ν )⊤ ), encoding transitions between patterns.
+of HT , where z minimizes and z ′ maximizes the energy.
+ For this specific energy, the update rule given by Eq. (34)
+This poses no issue for EP, which requires only that the
+ can be re-expressed as:
+joint state (z, z ′ ) reaches a stationary state. Although this
+min-maximization can be interpreted as z evolving forward 1 ⊤
+ ρ(z ′β ) − ρ(z β ) ρ(z ′β ) + ρ(z β ) . (30)
+ 
+ ∆J ∝ −
+and z ′ backward in time, in practice they evolve forward 2β
+simultaneously, as we integrate the coupled equations: In the limit β → 0, this gives:
+ z + z′
+  
+ dz ∂HT β
+ !
+ =− =F ,θ d
+ dt ∆J ∝ ⊙ ρ′ (m)ρ(m)⊤ . (31)
+ ∂z ⊤ 2
+ β
+ z − z′ β ∂C z + z ′
+  
+ ∂F
+ + − ,
+ 2 ∂z z+z′ 2 ∂z 2
+ 2 matching the learning rule in (Pineda, 1987), with
+ β
+
+ dz ′ ∂HT
+ 
+ z + z′
+  limβ→0 dβ being the error signal.
+ =+ ′
+ =F ,θ
+ dt ∂z 2
+ ′ ⊤
+ β ∂C z + z ′
+   
+ z−z ∂F 5. Numerical Experiments
+ − + ,
+ 2 ∂z ′ z+z′ 2 ∂z ′ 2
+ 2 In this section, we numerically validate AsymEP (Algo-
+ (28) rithm 1). The neuronal dynamics follows the one introduced
+
+ 5
+ Equilibrium Propagation for Non-Conservative Systems
+
+Algorithm 2 Dyadic EP where ∥ · ∥F denotes the Frobenius norm. Note that this
+ 1: Inputs: Force field F (x, θ), cost function C(x, y), metric does not capture the asymmetry of the Jacobian,
+ nudging parameter β, learning rate ϵ which depends on the state x.
+ 2: repeat For numerical experiments, we restricted the network to a
+ 3: 1. Free Phase: Evolve to stationary state layered architecture with a single hidden layer to facilitate
+ 4: Evolve the system dynamics, starting from identi- comparison with prior work. Accordingly, J in contains
+ cal initial conditions z(0) = z ′ (0) = z0 , only input-to-hidden connections, while J dyn is block off-
+ 5: diagonal, encoding bidirectional interactions between the
+ dz ∂H dz ′ ∂H
+ =− , =+ ′, (32) hidden and output layers. Both J in and J dyn are trained.
+ dt ∂z dt ∂z
+ We first use MNIST (LeCun, 1998) (60k train, 10k test)
+ 6: until stationary states z 0 , z ′0 are reached.
+ followed by Fashion-MNIST to validate AsymEP, and then
+ 7: 2. Nudged Equilibrium
+ we further validate AsymEP and Dyadic EP by comparing
+ 8: Evolve the system dynamics, starting from the
+ them to Backpropagation on a convolutional feedforward,
+ solution of the free phase z 0 = z ′0 :
+ with CIFAR-10. Inputs are normalized using min-max to
+ 9:
+ dz ∂HT dz ′ ∂HT [−1, 1] and targets are one-hot encoded in {−1, 1}. All
+ =− , =+ , (33) hyperparameters are detailed in Appendix G, along with
+ dt ∂z dt ∂z ′
+ additional details and numerical results.
+10: until two nudged stationary states z β , z ′β are
+ reached.
+ 5.1. Symmetric Initialization
+11: 3. Parameter Update
+12: Update the parameters according to: We start by comparing AsymEP with standard EP and
+13: VF. All algorithms are initialized with an identical sym-
+ 1 ∂H(z β , z ′β , θ)
+  
+ ∆θ = −ϵ (34) metric matrix J dyn . EP maintains this symmetry through-
+ β ∂θ out training, while VF and AsymEP induce asymmetry in
+14: until convergence of θ J dyn . Since EP and VF already achieve strong performance
+15: Output: Optimized parameters θ. on MNIST, the purpose of this experiment is to validate
+ AsymEP and compare it against EP and VF rather than
+ outperform the state of the art.
+in (Scellier &Bengio, 2017), and is generalized to allow Figure 1 compares the three algorithms as a function of
+for non-reciprocal forces as in (Scellier et al., 2018). For hidden-layer dimension after 1 and 20 epochs. AsymEP
+clarity, we express the forces in a form that explicitly sepa- consistently outperforms the baselines, suggesting it learns
+rates the contributions of the external input and the recurrent faster and better.
+interactions:
+ Figure 2 studies the evolution of the asymmetry ratio rstr .
+ F (x) = ρ′ (x) ⊙ J in u + J dyn ρ(x) − x,
+ 
+ (35) The results are reported for 50 hidden neurons. As expected,
+ EP preserves the initial weight symmetry. In contrast, VF
+where u ∈ RNin denotes the input and x ∈ RNdyn the neu- and AsymEP induce non-trivial evolution of rstr following
+ronal state, comprising both hidden and output units. The two distinct patterns, resulting in three distinct network
+matrices J in ∈ RNdyn ×Nin and J dyn ∈ RNdyn ×Ndyn define the configurations. A complementary figure is available in Ap-
+input and recurrent connectivity, respectively. The activation pendix G.1.
+function ρ(·) is taken to be the hyperbolic tangent, applied
+element-wise. 5.2. Fixed Asymmetry Ratio
+If J dyn is symmetric, we can define the energy: While the previous section focused on networks compatible
+ 1 1 with all three algorithms (EP, VF, AsymEP), we now turn
+ E(x) = ∥x∥2 − ρ(x)⊤ J dyn ρ(x) − ρ(x)⊤ J in u, (36) to architectures with strong structural asymmetry. In this
+ 2 2
+ regime, EP is inapplicable by construction, and, as we show,
+which is identical to that of (Scellier &Bengio, 2017), pro- VF performs poorly, contrary to AsymEP which remains
+vided that the input neurons are activated as ρ(u). effective.
+Equation (35) naturally motivates a quantitative measure of To this end, we consider a class of networks where the
+structural asymmetry rstr , defined as: asymmetry ratio rstr defined in Eq. (37) is kept fixed. Let S̃
+ ⊤ and à be arbitrary symmetric and antisymmetric matrices
+ ∥(J dyn − J dyn )/2∥F in RNdyn ×Ndyn respectively. We enforce a fixed rstr via the
+ rstr = , (37)
+ ∥J dyn ∥F
+
+ 6
+ Equilibrium Propagation for Non-Conservative Systems
+
+ where γ ∈ R is a learnable global scale.
+ Using VF and AsymEP, we train a layered network with one
+ hidden layer of 50 neurons (in which case S̃ and à are block
+ off-diagonal) for different values of rstr to investigate the
+ impact of structural asymmetry. We compare two training
+ regimes: training only the input weights J in (and the scale
+ γ), versus training all parameters including J dyn . The first
+ regime trains only the external forces from the input ρ′ (x) ⊙
+ J in u (which correspond to a symmetric contribution in the
+ Jacobian) applied to our non-conservative system, while
+ the second additionally trains J dyn and therefore the non-
+ (a) Results after one epoch. symmetric part of the Jacobian directly.
+ Figure 3 summarizes the results. We find that AsymEP
+ maintains robust performance across all asymmetry levels
+ (e.g., achieving an accuracy of 93.8 ± 0.4% at rstr = 0 and
+ 94.9 ± 0.2% at rstr = 0.875 when training all parameters)
+ and can even learn when the recurrent connection matrix
+ J dyn is completely antisymmetric (rstr = 1). Additionally,
+ training all parameters shows significant improvement over
+ training only J in .
+ In contrast, VF performs well at low asymmetry ratios
+ but degrades as asymmetry increases, eventually dropping
+ to chance levels (e.g., accuracies of 5 ± 3% and 8 ± 4%
+ (b) Results after 20 epochs. at rstr = 1 for input-only and all-parameter training, re-
+Figure 1. Comparison of algorithm performance on MNIST using spectively). When only J in is trained, VF accuracy col-
+a layered architecture with one hidden layer and symmetric ini- lapses around rstr ≈ 0.5, whereas training all parameters
+tialization. Squares denote AsymEP, circles EP, and triangles VF. delays this collapse until rstr ≈ 0.8. Our analysis in Ap-
+Test accuracy (averaged over 10 runs) is shown after one epoch
+ pendix G.2.1 reveals that VF adjusts the dynamics such that
+(Fig. 1a) and 20 epochs (Fig. 1b).
+ the asymmetry of the Jacobian’s off-diagonal terms remains
+ strictly lower than the structural asymmetry ratio. The train-
+ ing appears to adjust the neuronal state such that neurons
+ connected by strongly asymmetric weights have low activa-
+ tion. As shown in Appendix G.2.1, AsymEP learns faster
+ than VF across all levels of asymmetry.
+ Finally, Appendix G.3 opens with a brief theoretical dis-
+ cussion of the stability of these non-conservative dynamics,
+ followed by simulations on all-to-all topologies with con-
+ strained rstr and input projections J in . Even in this worst-
+ case setting, AsymEP reduces oscillations and improves
+ stability.
+
+ 5.3. Feedforward Architectures
+Figure 2. Evolution of the asymmetry ratio rstr (defined in Eq. (37)) We now consider a purely feedforward architecture. Here
+during training on MNIST for AsymEP, EP and VF, initialized
+from a symmetric configuration. The models use 50 hidden neu- VF trains only the last layer: with no backward connections,
+rons. the output nudging signal cannot reach earlier layers, so for
+ every layer but the last the nudged stationary states coincide
+ with the free states, giving zero weight updates. As only
+following parameterization of the recurrent parameters: the output layer is trained, the system essentially becomes
+ "q # an Extreme Learning Machine (Huang et al., 2006; Wang
+ dyn 2 S̃ Ã et al., 2022). In contrast, AsymEP introduces a correction
+ J =γ 1 − rstr + rstr , (38) that generates effective backward connections, allowing the
+ ∥S̃∥F ∥Ã∥F
+
+ 7
+ Equilibrium Propagation for Non-Conservative Systems
+
+ tivity structures inspired by (Millidge et al., 2023), while
+ keeping the number of trainable parameters fixed.
+ Experiments are conducted on Fashion-MNIST using a two-
+ hidden-layer network with hidden dimensions 500 and 200.
+ Network states are denoted (x0 , x1 , x2 , x3 ), where x0 is
+ the input and x3 = xL the output. Forward and backward
+ connections are denoted by Wk and Bk , respectively, with
+ W1 = J in .
+ We consider three classes of dynamics. First, the Continuous
+ Hopfield (CH) dynamics introduced previously:
+ dxk  
+ = −xk +ρ′ (xk )⊙ Wk ρ(xk−1 )+(1−δk,L )Bk ρ(xk+1 ) .
+ dt
+ (40)
+ Second, Predictive Coding (PC) dynamics, defined through
+ the prediction errors ek = xk − Wk ρ(xk−1 ), whose fixed
+Figure 3. Impact of the structural asymmetry ratio rstr on accuracy
+(top) and standard deviation over 10 runs (bottom) on MNIST. point ek = 0 corresponds to a standard feedforward net-
+We compare VF (orange) and AsymEP (blue) under two training work:
+regimes: training only J in (dashed) or all parameters (solid).
+ dxk
+ = −ek + (1 − δk,L ) (ρ′ (xk ) ⊙ (Bk ek+1 )) . (41)
+ dt
+nudging signal to influence all layers. We make this explicit
+ Third, a standard dynamics chosen for direct comparison
+for a network with one hidden layer.
+ with backpropagation:
+Let the state x be partitioned in hidden h and output o
+ dxk
+layers. The recurrent connection matrix is then J dyn = = −xk + Wk ρ(xk−1 ) + (1 − δk,L )Bk ρ(xk+1 ). (42)
+ dt
+
+ 0 0
+ . The forces of the system are:
+ Wh→o 0 For each dynamics, we examine three connectivity scenar-
+  β ios.
+ Fh = ρ′ (h) ⊙ J in u + λ(Wh→o )⊤ (o − o0 ) − h
+ 
+ 
+ 
+   
+ 0
+ 
+ ⊤
+ Fo = ρ′ (o) ⊙ Wh→o ρ(h) − λWh→o (h − h ) • In the asymmetric case (Bk ̸= Wk+1
+  β
+ ), the backward
+ (39)
+  weights Bk are randomly initialized and kept fixed
+ 
+  ∂C while only the forward weights are trained, ensuring a
+ + λβ −o
+ 
+ 
+ ∂o fair comparison (i.e., identical number of parameters);
+where λ is 0 during the free inference and 1 during the in PC, the learning rule for Bk is zero when only inputs
+nudged phase (Eq. 20). The force on the hidden layer Fhβ are clamped.
+now depends on the output layer through the term ρ′ (h) ⊙ ⊤
+ ⊤ • In the symmetric / conservative case (Bk = Wk+1 ), the
+(Wh→o ) (o − o0 ), enabling the nudge (the term β ∂C ∂o ) to CH and PC dynamics derive from an energy functional,
+influence the hidden layer. This implicitly assumes that the while the standard dynamics remains non-conservative
+hardware implementation supports the physical activation due to its non-symmetric Jacobian.
+of these backward connections.
+ • In the feedforward case (Bk = 0), the PC and stan-
+We validate this using a single hidden layer of only 20 neu-
+ dard dynamics coincide; for the standard dynamics, the
+rons on MNIST. After training, VF saturates with 64.3 ±
+ AsymEP learning rule mirrors backpropagation, with
+2.0% accuracy, whereas AsymEP reaches 92.7 ± 0.5% ac-
+ ∆xβk = 2β1
+ (xβ − x−β ) acting as the propagated error
+curacy. We expect this discrepancy to increase with network
+ signal.
+depth, since this increases the number of layers unable to
+learn under VF. A figure with the accuracy during training
+can be found in Appendix G.4.2. Table 1 shows that AsymEP consistently outperforms VF
+ in both asymmetric and feedforward settings, in final ac-
+ curacy, learning speed, and stability. After a single epoch
+5.4. Advantages of Non-Conservative Dynamics
+ it already provides on average a 15% accuracy gain with
+AsymEP is not tied to a specific neural dynamics. To further an order-of-magnitude reduction in variance. Remarkably,
+assess the benefits of training non-conservative dynamics AsymEP with asymmetric connectivity also surpasses EP
+using AsymEP, we compare several dynamics and connec- on symmetric networks despite training only the forward
+
+ 8
+ Equilibrium Propagation for Non-Conservative Systems
+
+weights, suggesting that relaxing symmetry constraints may 6. Discussion and Conclusion
+improve expressivity. Supplementary results are provided
+in Appendix G.5. In this work, we extended Equilibrium Propagation (EP)
+ to non-conservative systems that reach stationary states by
+ deriving two mathematically equivalent algorithms that re-
+ cover the exact gradient of the cost function in the limit of
+Table 1. Test accuracy on Fashion-MNIST (%) at Epoch 50 (mean
+± std 10 runs). BP on a standard feedforward architecture using
+ infinitesimal nudging.
+MSE and SGD achieve 87.37 ± 0.29%. The first approach, Asymmetric EP, preserves the original
+ inference dynamics. It introduces a corrective force during
+ EP AsymEP VF
+ the nudged phase that remains spatially local, as the anti-
+ Asym - 86.78 ± 0.14 85.20 ± 0.12 symmetric Jacobian is null for unconnected neurons and the
+ CH Feedfor - 86.05 ± 0.12 77.76 ± 0.37
+ Sym 84.30 ± 0.13 - -
+ perturbation from equilibrium is available at the synapse
+ Asym - 86.20 ± 0.17 80.71 ± 6.17 level. Unlike standard methods like Recurrent Backpropa-
+ PC gation (Almeida, 1990; Pineda, 1987), this avoids explicit
+ Sym 84.78 ± 0.14 - -
+ Asym - 82.91 ± 0.48 75.52 ± 1.69 digital weight transposition. However, a physical mech-
+ Standard
+ Feedfor - 86.25 ± 0.16 78.58 ± 0.28 anism to obtain the local corrective force at the synapse
+ level remains a subject for future work. We also note that
+ AsymEP shares the temporal non-locality of standard EP.
+Finally, to investigate how AsymEP scales with depth, we The second approach, Dyadic EP, doubles the state space
+trained deeper fully connected networks with two and three to map non-reciprocal dynamics onto an energy land-
+hidden layers of 500 neurons on Fashion-MNIST, reaching scape—conceptually reminiscent of multi-compartment cor-
+86.41 ± 0.22% and 87.8 ± 0.15% test accuracy respectively. tical neurons, where apical dendrites integrate feedback
+ (analogous to z − z ′ ) separately from basal feedforward
+5.5. Feedforward Training on CIFAR-10: BP vs. Dyadic input (analogous to z + z ′ ) (Guerguiev et al., 2017). Addi-
+ EP vs. AsymEP tionally, this expanded space also enables the positive and
+ negative nudging phases to run in parallel. This offers a
+To test whether our framework scales beyond shallow net- pathway to implement a version of EP that is local in time,
+works, we conclude with a deep, purely feedforward CNN but would require a doubling of the degrees of freedom
+architecture trained on CIFAR-10. We compare backprop- on the physical hardware. More fundamentally, the energy
+agation (BP), VF, AsymEP and Dyadic EP in a controlled defined on the extended state shows that the tools and the-
+setting where the gradient estimator is the only difference oretical guarantees obtained for EP should also apply to
+between runs: all methods share the same configuration, the case of non-reciprocal forces, and that the variational
+with the BP gradient replaced by the contrast of stationary principle behind EP is universal in the sense that it can be
+states for the EP-based methods (see App. G.6 for details). applied to all networks which operate in a stationary state.
+Each configuration is trained for 40 epochs over 5 seeds.
+ Furthermore, Dyadic EP is not restricted to the EP com-
+Table 2 reports the final test accuracy. Both of our algo- munity and could suggest a more physically plausible al-
+rithms scale to this regime, closely tracking the BP baseline ternative to the stationary-state Adjoint Method (for fixed
+throughout training and matching its final accuracy: a paired inputs) (Chen et al., 2018): by solving the forward and ad-
+t-test finds no significant difference between Dyadic EP and joint equations simultaneously via relaxation, it circumvents
+BP (p = 0.75), and only a sub-percent gap for AsymEP. a separate backward-in-time pass.
+In contrast, VF makes slight initial progress (peaking near
+30%) before collapsing to chance level (10%). Additional Finally, our experiments on MNIST, Fashion-MNIST, and
+details can be found in Appendix G.6 CIFAR-10 confirm that AsymEP and Dyadic EP consis-
+ tently outperform EP and VF, and notably enables effective
+ training of feedforward networks.
+ Our work thus opens new avenues for learning in neuro-
+Table 2. Test accuracy on CIFAR-10 (%) at epoch 40 (mean ± std
+over 5 seeds). morphic hardware, dissipative physical systems, and neural
+ architectures where asymmetry is intrinsic rather than inci-
+ Method Test Acc. (%) dental.
+ Backpropagation 90.66 ± 0.25
+ Dyadic EP 90.69 ± 0.14
+ AsymEP 89.74 ± 0.14
+ VF 10.00 ± 0.00
+
+
+ 9
+ Equilibrium Propagation for Non-Conservative Systems
+
+Impact Statement References
+This paper presents results that advance the field of machine Almeida, L. B. A learning rule for asynchronous percep-
+learning. There are many potential societal consequences trons with feedback in a combinatorial environment. In
+of our work, none of which we feel must be specifically Artificial neural networks: concept learning, pp. 102–111.
+highlighted here. 1990.
+ Altman, L. E., Stern, M., Liu, A. J., and Durian, D. J. Ex-
+Acknowledgments perimental demonstration of coupled learning in elastic
+ networks. Physical Review Applied, 22(2):024053, 2024.
+AES is fully funded by the Horizon Europe Marie
+Skłodowska-Curie Doctoral Network ’Postdigital Plus’ Aykroyd, C., Bourgoin, A., and Poncin-Lafitte, C. L. Hamil-
+(Grant 101169118). DVA acknowledges the support of tonian treatment of non-conservative systems. arXiv
+the French Community of Belgium through a FRIA fellow- preprint arXiv:2507.18658, 2025.
+ship. SM acknowledges financial support by the Fonds de la
+ Bateman, H. On dissipative systems and related variational
+Recherche Scientifique–FNRS, Belgium under EOS Project
+ principles. Physical Review, 38(4):815, 1931.
+No. 40007536. Computational resources have been pro-
+vided by the Consortium des Équipements de Calcul Intensif Berneman, M. and Hexner, D. Equilibrium propagation for
+(CÉCI), funded by the Fonds de la Recherche Scientifique dissipative dynamics. Advanced Intelligent Systems, pp.
+de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and e202501310, 2025.
+by the Walloon Region.
+ Bishop, K. J., Biswal, S. L., and Bharti, B. Active colloids
+ as models, materials, and machines. Annual Review of
+ Chemical and Biomolecular Engineering, 14(1):1–30,
+ “ἁρμονίη ἀφανὴς φανερῆς κρείττων”
+ 2023.
+ Bowick, M. J., Fakhri, N., Marchetti, M. C., and Ra-
+ maswamy, S. Symmetry, thermodynamics, and topology
+ in active matter. Physical Review X, 12(1):010501, 2022.
+ Brandenbourger, M., Locsin, X., Lerner, E., and Coulais, C.
+ Non-reciprocal robotic metamaterials. Nature communi-
+ cations, 10(1):4608, 2019.
+ Cesa-Bianchi, N. and Lugosi, G. Prediction, learning, and
+ games. Cambridge university press, 2006.
+ Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud,
+ D. K. Neural ordinary differential equations. Advances
+ in neural information processing systems, 31, 2018.
+ Cin, N. D., Marquardt, F., and Wanjura, C. C. Training
+ nonlinear optical neural networks with scattering back-
+ propagation. arXiv preprint arXiv:2508.11750, 2025.
+ Costa, P. and Santos, P. A. Directed equilibrium propagation
+ revisited. Mathematics, 13(11), 2025. ISSN 2227-7390.
+ Crick, F. The recent excitement about neural networks.
+ Nature, 337, 1989.
+ Dillavou, S., Stern, M., Liu, A. J., and Durian, D. J. Demon-
+ stration of decentralized physics-driven learning. Physi-
+ cal Review Applied, 18(1):014040, 2022.
+ Dillavou, S., Beyer, B. D., Stern, M., Liu, A. J., Miskin,
+ M. Z., and Durian, D. J. Machine learning without a pro-
+ cessor: Emergent learning in a nonlinear analog network.
+ Proceedings of the National Academy of Sciences, 121
+ (28):e2319718121, 2024.
+
+ 10
+ Equilibrium Propagation for Non-Conservative Systems
+
+Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scel- Indiveri, G. and Liu, S.-C. Memory and information pro-
+ lier, B. Updates of equilibrium prop match gradients cessing in neuromorphic systems. Proceedings of the
+ of backprop through time in an rnn with static input. IEEE, 103(8):1379–1397, 2015.
+ Advances in neural information processing systems, 32,
+ 2019. Kalinin, K. P., Gladrow, J., Chu, J., Clegg, J. H., Cletheroe,
+ D., Kelly, D. J., Rahmani, B., Brennan, G., Canakci, B.,
+Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scel- Falck, F., et al. Analog optical computer for ai inference
+ lier, B. Equilibrium propagation with continual weight and combinatorial optimization. Nature, 645(8080):354–
+ updates. arXiv preprint arXiv:2005.04168, 2020. 361, 2025.
+Falk, M. J., Strupp, A. T., Scellier, B., and Murugan, Kendall, J., Pantone, R., Manickavasagam, K., Bengio,
+ A. Temporal contrastive learning through implicit non- Y., and Scellier, B. Training end-to-end analog neural
+ equilibrium memory. Nature Communications, (16), networks with equilibrium propagation. arXiv preprint
+ 2025. arXiv:2006.01981, 2020.
+Farinha, M. T., Pequito, S., Santos, P. A., and Figueiredo,
+ Laborieux, A. and Zenke, F. Holomorphic equilibrium
+ M. A. T. Equilibrium propagation for complete directed
+ propagation computes exact gradients through finite size
+ neural networks. In Proceedings of the 28th European
+ oscillations. Advances in Neural Information Processing
+ Symposium on Artificial Neural Networks, Computational
+ Systems, 35:12950–12963, 2022.
+ Intelligence and Machine Learning (ESANN 2020), 2020.
+Galley, C. R. Classical mechanics of nonconservative sys- Laborieux, A. and Zenke, F. Improving equilibrium propa-
+ tems. Physical review letters, 110(17):174301, 2013. gation without weight symmetry through jacobian home-
+ ostasis. In Proceedings of the International Confer-
+Guerguiev, J., Lillicrap, T. P., and Richards, B. A. Towards ence on Learning Representations (ICLR) 2024, Virtual
+ deep learning with segregated dendrites. elife, 6:e22901, (ICLR), May 2024.
+ 2017.
+ Laborieux, A., Ernoult, M., Scellier, B., Bengio, Y., Grollier,
+Høier, R. and Zach, C. A lagrangian perspective on dual J., and Querlioz, D. Scaling equilibrium propagation to
+ propagation. In Proceedings of the First Workshop on Ma- deep convnets by drastically reducing its gradient estima-
+ chine Learning with New Compute Paradigms at NeurIPS tor bias. Frontiers in neuroscience, 15:633674, 2021.
+ 2023, New Orleans, LA, USA, Dec 2023.
+ Laydevant, J., Marković, D., and Grollier, J. Training an
+Høier, R. and Zach, C. Two tales of single-phase contrastive
+ ising machine with equilibrium propagation. Nature Com-
+ hebbian learning. In Salakhutdinov, R., Kolter, Z., Heller,
+ munications, 15(1):3671, 2024.
+ K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F.
+ (eds.), Proceedings of the 41st International Conference LeCun, Y. The mnist database of handwritten digits.
+ on Machine Learning, volume 235 of Proceedings of http://yann. lecun. com/exdb/mnist/, 1998.
+ Machine Learning Research, pp. 18470–18488. PMLR,
+ 21–27 Jul 2024. Martin, E., Ernoult, M., Laydevant, J., Li, S., Querlioz, D.,
+ Petrisor, T., and Grollier, J. Eqspike: spike-driven equi-
+Høier, R., Staudt, D., and Zach, C. Dual propagation: accel-
+ librium propagation for neuromorphic implementations.
+ erating contrastive hebbian learning with dyadic neurons.
+ Iscience, 24(3), 2021.
+ In Proceedings of the 40th International Conference on
+ Machine Learning, ICML’23. JMLR.org, 2023. Massar, S. Equilibrium propagation for learning in la-
+Høier, R., Kalinin, K., Ernoult, M., and Zach, C. Dyadic grangian dynamical systems. Physical Review E, 112
+ learning in recurrent and feedforward models. In NeurIPS (3):035304, 2025.
+ 2024 Workshop Machine Learning with new Compute Massar, S. and Mognetti, B. M. Equilibrium propagation:
+ Paradigms, 2024. the quantum and the thermal cases. Quantum Studies:
+Hopfield, J. J. Neural networks and physical systems with Mathematics and Foundations, 12(1):6, 2025.
+ emergent collective computational abilities. Proceedings
+ Millidge, B., Song, Y., Salvatori, T., Lukasiewicz, T., and
+ of the national academy of sciences, 79(8):2554–2558,
+ Bogacz, R. Backpropagation at the infinitesimal infer-
+ 1982.
+ ence limit of energy-based models: Unifying predictive
+Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. Extreme learning coding, equilibrium propagation, and contrastive hebbian
+ machine: theory and applications. Neurocomputing, 70 learning. In International Conference on Learning Rep-
+ (1-3):489–501, 2006. resentations (ICLR), 2023.
+
+ 11
+ Equilibrium Propagation for Non-Conservative Systems
+
+Nest, T. and Høier, R. Dyadic learning in asymmetric Wang, Q., Wanjura, C. C., and Marquardt, F. Training
+ convnets. In New Frontiers in Associative Memories- coupled phase oscillators as a neuromorphic platform
+ Workshop at ICLR 2026. using equilibrium propagation. Neuromorphic Computing
+ and Engineering, 4(3):034014, 2024.
+Osat, S. and Golestanian, R. Non-reciprocal multifarious
+ self-organization. Nature Nanotechnology, 18(1):79–85, Wanjura, C. C. and Marquardt, F. Quantum equilibrium
+ 2023. propagation for efficient training of quantum systems
+ based on onsager reciprocity. Nature Communications,
+O’Connor, P., Gavves, E., and Welling, M. Training a spik- 16(1):6595, 2025.
+ ing neural network with equilibrium propagation. In The
+ 22nd international conference on artificial intelligence Werbos, P. J. Backpropagation through time: what it does
+ and statistics, pp. 1516–1523. PMLR, 2019. and how to do it. Proceedings of the IEEE, 78(10):1550–
+ 1560, 1990.
+Pineda, F. Generalization of back propagation to recurrent
+ and higher order neural networks. In Neural information Yi, S.-i., Kendall, J. D., Williams, R. S., and Kumar, S.
+ processing systems, 1987. Activity-difference training of deep neural networks using
+ memristor crossbars. Nature Electronics, 6(1):45–51,
+Pourcel, G., Basu, D., Ernoult, M., and Gilra, A. Lagrangian- 2023.
+ based equilibrium propagation: generalisation to arbi-
+ trary boundary conditions & equivalence with hamilto-
+ nian echo learning. arXiv preprint arXiv:2506.06248,
+ 2025.
+
+Rageau, T. and Grollier, J. Training and synchronizing
+ oscillator networks with equilibrium propagation. Neuro-
+ morphic Computing and Engineering, 2025.
+
+Sajnok, K. and Matuszewski, M. Near-equilibrium propaga-
+ tion training in nonlinear wave systems. arXiv preprint
+ arXiv:2510.16084, 2025.
+
+Scellier, B. Quantum equilibrium propagation: Gradient-
+ descent training of quantum systems. arXiv preprint
+ arXiv:2406.00879, 2024.
+
+Scellier, B. and Bengio, Y. Equilibrium propagation: Bridg-
+ ing the gap between energy-based models and backprop-
+ agation. Frontiers in computational neuroscience, 11:24,
+ 2017.
+
+Scellier, B., Goyal, A., Binas, J., Mesnard, T., and Bengio,
+ Y. Generalization of equilibrium propagation to vector
+ field dynamics. arXiv preprint arXiv:1808.04873, 2018.
+
+Scellier, B., Mishra, S., Bengio, Y., and Ollivier, Y. Agnostic
+ physics-driven deep learning. arXiv:2205.15021v1, 2022.
+
+Scurria, A. E. A physical theory of backpropagation: Exact
+ gradients from the least-action principle. 2026.
+
+Stern, M., Hexner, D., Rocks, J. W., and Liu, A. J. Su-
+ pervised learning in physical networks: From machine
+ learning to learning machines. Physical Review X, 11(2):
+ 021045, 2021.
+
+Wang, J., Lu, S., Wang, S.-H., and Zhang, Y.-D. A review
+ on extreme learning machine. Multimedia Tools and
+ Applications, 81(29):41611–41660, 2022.
+
+ 12
+ Equilibrium Propagation for Non-Conservative Systems
+
+A. Gradient Estimation Error in VF where s denotes the dynamical state of the system. This
+ symmetry is the linchpin of the equivalence proof, as the
+In this appendix, we quantify the gradient estimation error gradient expressions derived for BPTT and standard EP
+introduced by VF in the limit where the Jacobian asymmetry differ precisely by a transpose operation applied to ∂F
+ ∂s .
+is small.
+ This observation aligns with our analysis in the main text:
+Comparing the post-synaptic update terms in Eqs. (12) and VF fails in non-conservative systems due to the missing
+(14) gives the following error in the gradient of the cost: transpose in the post-synaptic term (see Eq. (16)). Following
+  ⊤ the derivation in Ernoult et al. (2019) (viz., Appendix A, Eqs.
+ ∂F 0 (31–33)), the recursive relations for the gradients in BPTT
+ Error = − (x , θ)
+ ∂θ are given by:
+  −1 −1  ∂C 0
+× JF (x0 , θ) − JF⊤ (x0 , θ) (x , y), (43) ∂ℓ
+ ∂x ∇BPTT
+ s (0) = (s⋆ , y), (49)
+ ∂s
+To quantify this error, we decompose the Jacobian JF (x, θ) and for all t = 1, . . . , K,
+into its symmetric part SJ (x, θ) and antisymmetric part
+  ⊤
+ ∂F
+ SJ (x, θ) = 12 JF (x, θ) + JF⊤ (x, θ) , ∇BPTT ∇BPTT (t − 1),
+ 
+ s (t) = (x, s⋆ , θ) s (50)
+ (44) ∂s
+ AJ (x, θ) = 12 JF (x, θ) − JF⊤ (x, θ) .
+ 
+  ⊤
+ ∂F
+ ∇BPTT
+ θ (t) = (x, s⋆ , θ) ∇BPTT
+ s (t − 1), (51)
+Assuming the asymmetry AJ (x, θ) is small, we can make ∂θ
+a series expansion in SJ−1 AJ (omitting the dependencies where θ represents the optimization parameters, ℓ is the
+for clarity). Applying the Neumann expansion for small cost function, s⋆ is the free equilibrium state (satisfying
+∥SJ−1 AJ ∥ gives F (s⋆ ) = 0), y is the target, and x is the input. The index t
+ ∞
+ ! denotes the unrolled time steps, initialized at s(0) = s⋆ .
+ X
+ (JF ) −1
+ = (−1) n
+ (SJ−1 AJ )n SJ−1 , (45) In contrast, the gradients computed by VF follow the recur-
+ n=0 sion (viz., Ernoult et al. (2019), Appendix A, Eqs. (24–26)):
+ ∞
+ !
+ X
+ (JF⊤ )−1 = (SJ−1 AJ )n SJ−1 . (46) ∂ℓ
+ ∆EP
+ s (0) = − (s⋆ , y), (52)
+ n=0 ∂s
+Subtracting the two series and assuming convergence, we and for all t ≥ 0,
+finally obtain
+ ∂F
+ ∆EP
+ s (t + 1) = (x, s⋆ , θ) ∆EPs (t), (53)
+ ∞ ∂s
+ !
+ X
+ −1
+ 2n+1
+ −1 ⊤ −1
+ (JF ) − (JF ) = −2 SJ AJ SJ−1 . 
+ ∂F
+ ⊤
+ n=0 ∆EP
+ θ (t + 1) = (x, s ⋆ , θ) ∆EP
+ s (t). (54)
+ (47) ∂θ
+ Comparing these two sets of equations confirms that the only
+B. Equivalence between AsymEP and BPTT difference are Eqs. (50) and (53), specifically the transpose
+ of the Jacobian ∂F
+ ∂s (ignoring the global sign difference in
+In this appendix, we sketch the equivalence between the Eqs. (49) and (52)).
+gradient estimate computed by AsymEP and Backpropaga-
+tion Through Time (BPTT) (Werbos, 1990) for a Recurrent In AsymEP, we modify the dynamics by adding a correction
+Neural Network with fixed inputs. Our derivation relies on term dependent on the antisymmetric part of the Jacobian.
+the proof provided by Ernoult et al. (2019), which estab- Denoting the force of this augmented system by F A , the
+lished that standard (conservative) EP computes gradients Jacobian at the free equilibrium satisfies:
+identical to those of BPTT. To facilitate direct comparison, ⊤
+ ∂F A
+ 
+we adopt their notation for this section. ∂F
+ (x, s⋆ , θ) = (x, s⋆ , θ) . (55)
+ ∂s ∂s
+The proof provided by Ernoult et al. (2019) relies on the
+assumption that the vector field F (i.e., transition function) By substituting this corrected Jacobian into the recursive
+is derived from a scalar potential function, which implies relations, AsymEP recovers the exact transpose required
+that by BPTT. Consequently, our method extends the equiva-
+  ⊤
+ ∂F ∂F lence between EP and BPTT to the general case of non-
+ = , (48) conservative force.
+ ∂s ∂s
+
+ 13
+ Equilibrium Propagation for Non-Conservative Systems
+
+C. Out-of-Equilibrium Mechanics C.3. Symmetry Breaking as Credit Assignment
+Here we sketch the physical picture behind the doubled- On the diagonal manifold z = z ′ the doubled system enjoys
+energy construction of Eq. (26). The full derivation from a gauge symmetry: the auxiliary variable z ′ is redundant and
+Hamilton’s least-action principle, together with its connec- the difference d is identically zero. Credit assignment is im-
+tion to the Bateman–Galley formalism for non-conservative plemented by deliberately breaking this symmetry through
+classical mechanics (Bateman, 1931; Galley, 2013; Aykroyd the task cost. Adding βD(z, z ′ ) = β C(m) to H exerts
+et al., 2025), can be found in (Scurria, 2026). opposite forces on z and z ′ and drives them apart, so that
+ the difference d ceases to be redundant and begins to carry
+C.1. The Helmholtz Obstruction information about the loss landscape.
+
+The natural physical route to a variational principle for a
+dynamical system ẋ = F (x, θ) is to seek a scalar potential D. Proofs for Dyadic EP
+E such that F = −∂x E. The classical Helmholtz integra- We now demonstrate that Dyadic EP correctly trains the
+bility condition states that such an E exists if and only if the parameters θ of the original force field F (x, θ), giving the
+Jacobian JF is symmetric everywhere. Whenever the inter- 0
+ exact gradient dC(x̄
+ dθ
+ )
+ in the limit of infinitesimal nudging.
+actions are non-reciprocal — as in feedforward networks,
+active matter, or driven optical systems — JF acquires
+a non-zero antisymmetric part and the Helmholtz condi- D.1. Proof of EP
+tion fails identically. No scalar potential on the original First, recall that standard EP does not strictly require the
+n-dimensional state space can then generate the dynamics, system to settle at an energy minimum; it requires only that
+and the “energy minimisation” route at the heart of standard the system reaches a stationary state (a fixed point of the
+EP is blocked at the structural level. The obstruction is not dynamics). Indeed, using the notation of Section 2.1, EP
+a matter of computational convenience: it reflects the fact relies on the key identity:
+that the rotational component of F carries information that
+no scalar function of x alone can record. d2 d2
+ ET (xβ , θ) = ET (xβ , θ). (57)
+ dθdβ dβdθ
+C.2. Variational Reconstruction on a Doubled Space Expanding the total derivative with respect to β gives:
+Applying the Bateman–Galley formalism circumvents this ⊤
+ ∂ET (xβ , θ) dxβ ∂ET (xβ , θ)
+ 
+obstruction by enlarging the configuration space. The single d
+ ET (xβ , θ) = +
+state x ∈ Rn is replaced by a conjugate pair (z, z ′ ) ∈ R2n , dβ ∂x dβ ∂β
+and the rotational component of F — which has no scalar = C(xβ ). (58)
+generator on the original n-dimensional space — is ab-
+sorbed into a bilinear coupling between z and z ′ on the Where the first term vanishes because the system is at a
+ ∂
+doubled space, where it does admit a variational descrip- stationary state, i.e., ∂x ET (xβ , θ) = 0; this holds even if
+tion. The physical motion is recovered on the diagonal the system is not at a minimum of ET . Similarly, for the
+submanifold z = z ′ (the so called ’physical limit’), while derivative with respect to θ:
+the off-diagonal direction d = z − z ′ supplies the additional
+ d ∂ET (xβ , θ)
+degree of freedom needed to encode non-reciprocity. ET (xβ , θ) = , (59)
+ dθ ∂θ
+Specializing this reconstruction to the overdamped (first-
+ where we additionally assume that the cost function does
+order) regime relevant to relaxational neural dynamics yields
+ not depend explicitly on the parameters θ. Substituting these
+the bilinear energy
+ results into Eq. (57) in the limit of infinitesimal nudging
+ (β → 0) recovers the fundamental relation given by Eq. (9).
+ z + z′
+  
+ H(z, z ′ , θ) = −(z − z ′ )⊤ F ,θ , (56)
+ 2
+ D.2. Proof of Dyadic EP
+which is precisely Eq. (26). The symmetric midpoint m = We analyze now the stationary states of Dyadic EP by intro-
+(z + z ′ )/2 plays the role of the physical coordinate of the ducing the change of variables:
+doubled system, while d is the auxiliary direction along
+which non-reciprocity is stored. On the submanifold z = z ′ z + z′
+ m= , d = z − z′. (60)
+the coupling proportional to (z − z ′ ) vanishes identically 2
+and both states evolve under the original field F , so the In these coordinates, the augmented energy HT becomes
+doubling leaves the on-shell physics unchanged. We refer
+the reader to (Scurria, 2026) for the full construction. HT (m, d, θ, β) = −d⊤ F (m, θ) + βC(m) (61)
+
+ 14
+ Equilibrium Propagation for Non-Conservative Systems
+
+and the dynamics in Eq. (28) can be rewritten as: In Dyadic EP, we instead employ the single-phase update:
+
+ 1 ∂H(z β , z ′β , θ)
+  
+ dm ∂HT
+ =− = F (m, θ), (62) ∆θ ∝ − (70)
+ dt ∂d β ∂θ
+ dd ∂HT ∂
+ =− = dT JF (m, θ) − β C(m). (63) This choice avoids the overhead of evolving two coupled
+ dt ∂m ∂m
+ equations in the extended space, which would be computa-
+ β
+The stationary states (mβ , d ) are the solutions to: tionally equivalent to evolving four equations in the original
+ space (two for +β and two for −β). Using Eq. (70), we
+ F (mβ , θ) = 0, (64) evolve only one coupled equation for +β in the extended
+ space; this corresponds to two equations in the original
+ βT ∂
+ d JF (mβ , θ) − β C(mβ ) = 0. (65) space, thereby achieving the same computational complex-
+ ∂m ity as AsymEP. Furthermore, this single-phase formulation
+ suggests a pathway toward making the update local in time,
+This leads to the following observations: provided appropriate hardware is used to implement the
+1) The stationary state of m is independent of β and coin- augmented phase.
+cides with the stationary state of the original system: Mathematically, these two approaches yield the same gradi-
+ ent estimate because the equations for dβ are linear. Explic-
+ z β + z ′β
+ = mβ = m0 = x0 . (66) itly we have :
+ 2
+ ∂H(z β , z ′β , θ) ∂F z β + z ′β
+  
+ = −(z β − z ′β )⊤ ,θ
+2) The Jacobian of the extended system defined in Eq. (26) ∂θ ∂θ 2
+is invertible, provided JF is invertible. This is most evident  ⊤
+ ∂F 0  −1
+from Eq. (63). = −β z ,θ JF⊤ (z 0 , θ)
+ ∂θ
+3) The stationary state value of d is given by: 
+ ∂C 0
+ 
+ × (z ) , (71)
+ β
+ 
+ −1 ∂C 0
+  ∂x
+ d = β JF⊤ (m0 , θ) (x ) (67)
+ ∂x where we have used Eqs. (66) and (67). Inspection of
+ Eq. (71) confirms that, up to corrections of order β 2 , we
+ 0
+In particular, when β = 0, we have d = 0, which implies obtain exactly the same gradient as in AsymEP.
+that the free stationary states coincide: z 0 = z ′0 .
+4) The cost at the stationary state of the extended system E. AsymEP versus Dyadic EP
+is equal to the cost at the stationary state of the original
+ In this appendix, we demonstrate that Asymmetric Equilib-
+system:
+ rium Propagation (AsymEP) emerges naturally as the first-
+ D(m0 ) = C(x0 ). (68)
+ order projection of the 2N -dimensional Dyadic Equilibrium
+Consequently, the gradients of the cost with respect to the Propagation onto a single N -dimensional state space. We
+parameters are identical. then formalize the physical trade-offs between the two ar-
+ chitectures.
+Since both the original and extended systems, given respec-
+tively in Eq. (28) and Eq. (1-2), share the same cost at their
+ E.1. AsymEP as the Linear Projection of Dyadic EP
+respective stationary states, and because the Jacobians of
+both models are invertible, applying EP update rule to the As established in Appendix D.2, transforming the 2N -
+extended system give the correct gradient estimate for the dimensional extended space (z, z ′ ) into the mean state
+ ′
+parameters θ of the original system. m = z+z 2 and the difference state d = z − z ′ exactly
+The final step of the proof is to establish the equivalence decouples the stationary dynamics. Because the stationary
+between the standard parameter update rule in Eq. (8) and state of m is the free state of the original system (mβ = x0 ),
+the modified rule used by Dyadic EP in Eq. (34). Indeed, if the cost function drives the difference variable to a stationary
+ β
+we were to apply the standard update rule in the extended state d satisfying:
+space, the update would be: β ∂C 0
+ JF⊤ (x0 , θ)d = β (x ) (72)
+ 1
+  β ′β
+ ∂H(z , z , θ) ∂H(z , z −β ′−β
+ , θ)
+  ∂x
+ ∆θ ∝ − − .
+ 2β ∂θ ∂θ To recover this exact error signal in an N -dimensional space,
+ (69) we postulate a modified dynamical system FA (x) compris-
+
+ 15
+ Equilibrium Propagation for Non-Conservative Systems
+
+ing the standard EP dynamics and a spatial correction Γ(x): F. Derivation of the Hopfield-like Energy
+ ∂C In this section, we derive the explicit energy functional for
+ FA (x) = F (x) − β (x) + Γ(x) (73) the Continuous Asymmetric Hopfield dynamics defined in
+ ∂x
+ Eq. (35). The force field is given by:
+Let ∆x = xβA − x0 denote the displacement from the
+free equilibrium. Expanding the stationarity condition F (x) = ρ′ (x) ⊙ (Jρ(x)) − x. (78)
+FA (xβA ) = 0 to first order around x0 yields:
+ We omit external inputs J in for brevity, as they appear sym-
+ ∂C 0 metrically in the Jacobian. The variational Hamiltonian is
+ JF (x0 , θ)∆x − β (x ) + Γ(xβA ) ≈ 0 (74) defined as:
+ ∂x
+ z + z′ z + z′
+    
+To ensure the first-order displacement matches the Dyadic H(z, z ′ ) = −(z − z ′ )⊤ F + βC .
+ β 2 2
+EP error signal (i.e., ∆x ≈ d ), we substitute Eq. (72) into
+ (79)
+the expansion:
+ To analyze this expression, we introduce the midpoint m =
+ z+z ′
+ Γ(xβA ) = JF⊤ (x0 , θ) − JF (x0 , θ) ∆x and the difference d = z − z ′ . Since the separation
+ 
+ (75) 2
+ between z and z ′ is induced solely by the nudging parameter
+ = −2AJ (x0 , θ)(xβA − x0 ) (76)
+ β, the difference scales as ∥d∥ ∼ O(β). We therefore
+This uniquely recovers the AsymEP augmented dynamics. neglect terms of order O(∥d∥3 ) (i.e., or equivalently O(β 3 ))
+Finally, to eliminate the O(β 2 ) error, AsymEP evaluates the as they do not contribute to the gradient of the cost.
+centered difference of two opposite nudges: The activation at the midpoint can be approximated as:
+
+ dxA ρ(z) + ρ(z ′ )
+ x±β 0
+ A =x ±β + O(β 2 ) (77) ρ(m) = + O(∥d∥2 ). (80)
+ dβ β=0 2
+ Similarly, the difference in activations is:
+Subtracting these states cancels the O(β 2 ) error, yielding
+1 +β −β β 3
+2 (xA − xA ) = d + O(β ), successfully recovering the ρ(z) − ρ(z ′ ) = ρ′ (m) ⊙ d + O(∥d∥3 ). (81)
+exact post-synaptic update term.
+ Inverting this relation, we express the state difference as:
+E.2. Physical Trade-offs and the Extended Space
+ z − z ′ = (ρ(z) − ρ(z ′ )) ⊙ ρ′ (m) + O(∥d∥3 ). (82)
+We can view AsymEP and Dyadic EP as a space-time trade-
+off of the same underlying physical optimization problem.
+ We substitute these expansions into the interaction term
+AsymEP preserves the original N -dimensional state space of the Hamiltonian, Hint = −(z − z ′ )⊤ (ρ′ (m) ⊙ Jρ(m)).
+of the network at the cost of temporal non-locality. The sys- Applying the identity a⊤ (b ⊙ c) = (a ⊙ b)⊤ c, we obtain:
+tem must evolve sequentially, requiring physical memory
+ ⊤
+not only to store the free equilibrium x0 for the asymmet- Hint = − ((z − z ′ ) ⊙ ρ′ (m)) Jρ(m)
+ric correction, but also to store the successive stationary 
+ ρ(z) + ρ(z ′ )
+ 
+states required to evaluate the contrastive gradient update. ≈ −(ρ(z) − ρ(z ′ ))⊤ J . (83)
+ 2
+AsymEP thus serves as the direct, spatially minimal exten-
+sion of EP. Expanding the product gives:
+Dyadic EP provide a learning signal that is local in both
+ 1h
+space (where z − z ′ encodes the gradient) and time (allow- Hint = − ρ(z)⊤ Jρ(z) + ρ(z)⊤ Jρ(z ′ )
+ 2
+ing the nudged phases to execute in parallel) at the cost i
+of doubling the state space. In particular, capturing non- − ρ(z ′ )⊤ Jρ(z) − ρ(z ′ )⊤ Jρ(z ′ ) . (84)
+conservative forces in this extended space requires a spe-
+cific bilinear coupling, rather than a trivial superposition We decompose the connectivity matrix J into its symmetric
+of uncoupled subsystems. It can be seen as a blueprint for part S and antisymmetric part A. The first and last terms
+future neuromorphic hardware. simplify to ρ(z)⊤ Sρ(z). The cross terms satisfy:
+Ultimately, the reduction of Dyadic EP to AsymEP via the ρ(z)⊤ Jρ(z ′ ) − ρ(z ′ )⊤ Jρ(z) = ρ(z)⊤ (J − J ⊤ )ρ(z ′ )
+variables m and d proves the universality of EP’s variational
+principle. = ρ(z)⊤ (2A)ρ(z ′ ). (85)
+
+ 16
+ Equilibrium Propagation for Non-Conservative Systems
+
+Thus, the interaction term reduces to: The input parameters are then updated using the standard
+ learning rule (21). In particular, the presynaptic term associ-
+ 1 1
+ Hint = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) ated with the input weights is given by,
+ 2 2
+ − ρ(z)⊤ Aρ(z ′ ) + O(∥d∥3 ). (86) ∂Fi
+ in
+ = δik ρ′ (xi )ul . (93)
+ ∂Jkl
+Finally, for the nudging term, we expand the cost function The presynaptic terms associated with the dynamical param-
+ dyn
+around the midpoint: eters Jij depend on the experiment.
+
+ 1
+ C(m) = (C(z) + C(z ′ )) + O(∥d∥2 ). (87) G.1. Symmetric Initialization
+ 2
+ G.1.1. L EARNING RULES
+When multiplying by β, the remainder term becomes β ·
+O(∥d∥2 ). Since ∥d∥ ∼ O(β), this remainder is of order For clarity, we write the learning rules for VF and AsymEP.
+O(β 3 ) and can be consistently discarded alongside the third- For the input weights, using (93), we have:
+order terms from the interaction expansion. 1 h +β i
+ in
+ ∆Jik ∝ (xi − x−β ′ 0
+ i )ρ (xi )uk , (94)
+Combining all these components, the final Hamiltonian is: 2β
+
+ 1 1 while for the recurrent weight, we get:
+ H(z, z ′ ) = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ )
+ 2 2 1 h +β i
+ 1 ∆Jijdyn
+ ∝ (xi − x−β i )ρ′ 0
+ (xi )ρ(x 0
+ j ) . (95)
+ − ρ(z) Aρ(z ) + (∥z∥2 − ∥z ′ ∥2 )
+ ⊤ ′ 2β
+ 2
+ β ′
+ For EP, we have:
+ + (C(z) + C(z )). (88)
+ 2 1 h +β  i
+ in
+ ∆Jik ∝ ρ(xi ) − ρ(x−β
+ i ) uk , (96)
+The saddle-point dynamics, given by Eq. 32, generated by 2β
+this Hamiltonian are: and for the recurrent weights:
+ dz β ∂C 1 h +β
+ = ρ′ (z) ⊙ (Sρ(z) + Aρ(z ′ )) − z −
+ i
+ , dyn −β −β
+ dt 2 ∂z
+ (89) ∆Jij ∝ ρ(xi )ρ(x+βj ) − ρ(xi )ρ(xj ) . (97)
+ 2β
+ ′
+ dz β ∂C
+ = ρ′ (z ′ ) ⊙ (Sρ(z ′ ) + Aρ(z)) − z ′ + . (90)
+ dt 2 ∂z ′ G.1.2. S UPPLEMENTARY N UMERICAL R ESULTS
+This system recovers the original continuous Hopfield dy- To complement Fig. 2, we report the evolution of the accu-
+namics when z = z ′ (assuming β = 0). racy of the three methods in Fig. 4. We consider a layered
+ network with 50 hidden neurons. While this capacity is
+G. Experimental Details insufficient for state-of-the-art performance, it amplifies the
+ difference in accuracy between models to aid visualization.
+As in the main text, the neuronal dynamics are governed by Models are trained for 20 epochs starting from a symmetric
+the vector field: configuration, the natural setting for both VF and EP. With
+   this initialization, AsymEP consistently outperforms the
+ X dyn other methods and learns faster by exploiting the additional
+ Fi = ρ′ (xi )  Jij ρ(xj ) + bi (u) − xi , (91)
+ degrees of freedom of the asymmetric network.
+ j
+
+
+where the input-dependent bias bi (u) is precomputed for G.2. Fixed Asymmetry Ratio
+each MNIST input u as: This section details the implementation for the fixed asym-
+ X metry ratio experiments presented in Section 5.2, followed
+ bi (u) = Jilin ul . (92) by complementary numerical results regarding learning
+ l∈in speed and induced Jacobian asymmetry.
+This term projects the input space into the recurrent sub-
+ G.2.1. L EARNING RULES
+space. The bias yields a diagonal contribution to the Jaco-
+bian JF = ∂F ∂x , and therefore does not contribute to the Parametrization and notation. To enforce a fixed asym-
+antisymmetric correction used in the augmented dynamics metry ratio, we explicitly parameterize the independent ele-
+Eq. (20) of AsymEP. ments of Eq. (38). We introduce two parameter vectors θS
+
+ 17
+ Equilibrium Propagation for Non-Conservative Systems
+
+ Parameter Sym. Init. / Feedforward Fixed rstr Fixed rstr & rin
+ sec. 5.1 & 5.3 sec. 5.2 app. G.3
+ Learning Rate (Input-Hidden) 0.05 0.05 0.0125
+ Learning Rate (Hidden-Output) 0.01 0.01 0.0025
+ Time Step (Dynamics Integration) 0.5 0.3 0.3
+ Nudging Parameter (β) 0.5 0.5 0.5
+ Free-phase Steps (nfree ) 20 30 40
+ Nudged-phase Steps (nnudge ) 10 10 10
+ Number of Epochs 40 / 20 30 40
+ Batch Size 64 √64 √64
+ Scaling Parameter γ n.a. 60 60
+ Structure 784 - n.a. -10 784-50-10 all-to-all, 500 hid
+ Activation function ρ tanh tanh tanh
+ Initial Recurrent State s s ∼ U (−1, 1) s ∼ U(−1, 1) s ∼ U(−1, 1)
+ Initial Parameters θ θ ∼ N (0, N1 ) θ ∼ N (0, N1 ) θ ∼ N (0, N1 )
+ Number of Runs (training + inference) 10 10 10
+Table 3. Trained Model Hyperparameters on MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N (µ, σ 2 )
+is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference compared to
+the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics.
+
+
+
+ elements of S̃, the full matrices are constructed as:
+ S
+ S̃ij = δij ξi + (1 − δij )θk(max(i,j),min(i,j)) , (99)
+ A
+ Ãij = ϵij θk(max(i,j),min(i,j)) , (100)
+ where ϵij is the Levi-Civita symbol. The dynamical param-
+ eters are then given by:
+ dyn
+ Jij = γ(cS S̃ij + cA Ãij ), (101)
+ with normalization coefficients
+ p
+ 2
+ 1 − rstr rstr
+ cS = , cA = , (102)
+ FS FA
+ defined in terms of the Frobenius norms:
+ v
+ uN M
+ uX X 2
+Figure 4. Evolution of the mean accuracy and standard deviation F =t
+ S ξ2 + 2i θS , k (103)
+(over 10 runs) during training on MNIST for AsymEP, EP, and VF. i=1 k=1
+Models use 50 hidden neurons. v
+ u M
+ u X 2
+ FA = t2 θkA . (104)
+ k=1
+and θA of size M = Ndyn (Ndyn − 1)/2, which encode the
+off-diagonal elements of the symmetric and antisymmetric Presynaptic computation. The dependence of the nor-
+components S̃ and Ã, respectively. The correspondence malization coefficients on the parameters introduces addi-
+between matrix and vector indices is given by: tional regularization terms in the learning rule compared
+ to the parameterization of (Scellier &Bengio, 2017). The
+ (i − 1)(i − 2) gradients of the normalization coefficients are:
+ k(i, j) = + j, (1 ≤ j < i ≤ Ndyn )
+ 2 ∂cS θkS ∂cS ξm
+ (98) = −2cS 2, = −cS 2, (105)
+ ∂θkS (FS ) ∂ξm (FS )
+ ∂cA θA
+where the condition j < i selects the strictly lower triangular A
+ = −2cA k 2 . (106)
+elements. Introducing an additional vector ξ for the diagonal ∂θk (FA )
+
+ 18
+ Equilibrium Propagation for Non-Conservative Systems
+
+ Parameter Comparison Dyn. 2 hidden layers 3 hidden layers
+ sec. 5.4 sec. 5.4 sec. 5.4
+ Learning Rate (Input-Hidden) 0.0016 0.0013 0.6
+ Learning Rate (Hidden-Hidden) 0.0016 0.0013 0.6
+ Learning Rate (Hidden-Output) 0.0016 0.0013 0.6
+ Time Step (Dynamics Integration) 0.4 0.3 0.0075
+ Nudging Parameter (β) 0.3 0.5 0.20
+ Free-phase Steps (nfree ) 40 40 60
+ Nudged-phase Steps (nnudge ) 20 20 30
+ Number of Epochs 50 40 40
+ Batch Size 64 64 64
+ Layer Structure 784-500-200-10 784-500-500-10 784-500-500-500-10
+ Activation function ρ tanh tanh tanh
+ Initial Recurrent State s s ∼ U(−1, 1) s ∼ U(−1, 1) s ∼ U (−1, 1)
+ Initial Parameters θ θ ∼ N (0, N1 ) θ ∼ N (0, N1 ) θ ∼ N (0, N1 )
+ Number of Runs (training + inference) 10 10 10
+Table 4. Trained Model Hyperparameters on Fashion-MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and
+N (µ, σ 2 ) is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference
+compared to the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics.
+
+
+
+Combining these with the derivatives of the matrices S̃ and (where p > q):
+Ã, we have:
+ N
+ θkA X
+ 
+ ∂Fi ′
+ ∂ S̃ij ∂ S̃ij = γc A ρ (xi ) −2 Ãij ρ(xj )
+ = δip δjq + δiq δjp , = δij δkj (107) ∂θkA (FA )2 j=1
+ ∂θkS ∂ξk 
+ ∂ Ãij + δip ρ(xq ) − δiq ρ(xp ) .
+ = δip δjq − δiq δjp , (108)
+ ∂θkA (111)
+where k corresponds to the index pair (p, q) with p > q, as
+defined in Eq. (98). The full presynaptic terms are then: Initialization. To ensure the stability of the system, we
+ initialize our parameters suchhthat the
+ i variance of dynam-
+ dyn
+ • For the diagonal parameters ξm : ical parameters scales as Var Jij ∝ 1/Ndyn . This is a
+ conservative choice for the layered architectures used in our
+ dyn
+ ∂Fi
+ 
+ ξm X
+ N experiments, where many entries of Jij are zero.
+ = γcS ρ′ (xi ) − S̃ij ρ(xj )
+ ∂ξm (FS )2 j=1 In practice, we initialize the parameter vectors θS , θA , and
+ (109)
+  ξ with identical variances σ 2 . For large Ndyn , the expected
+ + δim ρ(xm ) . Frobenius norms approximate to E[FS,A ] ≈ Ndyn σ. Conse-
+ quently, the normalization coefficients become:
+ p
+ 2
+ • For the off-diagonal symmetric parameters θkS (where 1 − rstr rstr
+ cS ≈ , cA ≈ . (112)
+ p > q): Ndyn σ Ndyn σ
+
+ N Since the symmetric and antisymmetric components are sta-
+ θkS X
+ 
+ ∂Fi ′
+ = γc S ρ (x i ) −2 S̃ij ρ(xj ) tistically independent, the variance of the weights is derived
+ ∂θkS (FS )2 j=1
+  as follows:
+ + δip ρ(xq ) + δiq ρ(xp ) .
+ • Diagonal elements (i = j):
+ (110)
+ 2
+ h i 1 − rstr
+ Var Jiidyn = γ 2 c2S σ 2 ≈ γ 2 2 . (113)
+ • For the off-diagonal antisymmetric parameters θkA Ndyn
+
+ 19
+ Equilibrium Propagation for Non-Conservative Systems
+
+ • Off-diagonal elements (i ̸= j): a zero-cost baseline (perfect prediction) during learning.
+ Specifically, for each method and value of rstr , we calcu-
+ h
+ dyn
+ i γ2 late the cumulative loss by summing the batch-averaged
+ = γ 2 c2S + c2A σ 2 ≈ 2 ,
+ 
+ Var Jij (114)
+ Ndyn costs of the first 5 epochs (out of 30, to avoid saturation
+ h i effects), and reporting the mean and standard deviation over
+ dyn
+To satisfy Var Jij ∝ 1/Ndyn , we set: 10 independent training runs. Mathematically, for each run:
+
+ p
+ γ= Ndyn (115)  
+ 5 NX
+ X batches X C(x0 , u) 
+Note that by random matrix theory, diagonal elements do Cumul. Loss =  ,
+ |Bk |
+not affect stability in the large Ndyn limit. epoch=1 k=1 (x0 ,u)∈Bk
+ (120)
+Potential Simplification. Although the parameterization where Bk represents the k-th batch, and |Bk | denotes the
+above is fully general, a simpler construction is possible number of examples in the batch. The parameters are up-
+by removing self-connections (ξ = 0) and enforcing identi- dated after each batch step; consequently, the free equilib-
+cal parameterization for the symmetric and antisymmetric rium x0 is inferred using the updated parameters and the
+components, i.e., θS = θA = θ. The matrix elements then current example u.
+become:
+
+ S̃ij = (1 − δij )θk(max(i,j),min(i,j)) , (116)
+ Ãij = ϵij θk(max(i,j),min(i,j)) . (117)
+
+In this case, the Frobenius norms are equal (FS = FA ), and
+we can omit the explicit normalization:
+ q
+ dyn 2 S̃ + r à .
+ Jij = 1 − rstr ij str ij (118)
+
+For a parameter θk corresponding to indices (p, q) with
+p > q, the presynaptic term is given by:
+ q 
+ ∂Fi
+ = ρ′ (xi ) 2 +r
+ 1 − rstr str δip ρ(xq )
+ ∂θk
+ q   (119)
+ + 2 −r
+ 1 − rstr δ ρ(x
+ str iq p .
+ )
+
+While this parameterization works in simulations and keeps Figure 5. Cumulative loss as defined by (120) over the first 5
+the number of parameters constant for all rstr , it constrains epochs of training, for different asymmetry ratios rstr . We compare
+the asymmetry to be “homogeneous”, by which we mean VF (orange) and AsymEP (blue), under two training regimes:
+that the asymmetry ratio is identical for every pair of neu- training only J in (dashed) or all parameters (solid).
+rons; hence, the network cannot learn to be symmetric in one
+region and antisymmetric in another. Therefore, we choose
+to explore the more general case of (38) in our experiments. In Fig 5, we observe that learning slows down for both al-
+ gorithms when rstr ≳ 0.6. This behavior likely results from
+G.2.2. S UPPLEMENTARY N UMERICAL R ESULTS the increased difficulty of reaching a stationary state as the
+ dynamics become strongly asymmetric. With a fixed num-
+To complement the results of Fig 3, we analyze the training
+ ber of inference steps, incomplete convergence degrades the
+efficiency as a function of the asymmetry ratio rstr and in-
+ accuracy of the gradient estimates, thereby slowing down
+vestigate the robustness of VF by monitoring the Jacobian
+ the learning. Fig 5 shows that while VF can eventually
+asymmetry.
+ achieve competitive accuracy, it is consistently slower than
+ AsymEP as soon as asymmetry is introduced.
+Training efficiency. We first study the training efficiency
+of the two algorithms as a function of the asymmetry ra-
+tio rstr . Inspired by the related concept in (Cesa-Bianchi
+&Lugosi, 2006), we define the cumulative loss as the accu- Jacobian asymmetry. We next examine how the struc-
+mulated difference between the free equilibrium cost and tural asymmetry rstr is reflected in the Jacobian of the dy-
+
+ 20
+ Equilibrium Propagation for Non-Conservative Systems
+
+namics (35), given by:
+ ∂Fi dyn ′
+ = (1 − δij )ρ′ (xi )Jij ρ (xj )
+ ∂sj
+ h i
+ + δij ρ′ (xi )(Jiidyn ρ′ (xi )) + ρ′′ (xi )bi − 1 .
+ (121)
+
+In our layered architecture, the self-connections are zero
+(Jiidyn = 0). For the following analysis, we neglect all diag-
+onal terms in the Jacobian (including external inputs and
+potential), since they do not contribute to the antisymmetric
+correction (20) and thus to the discrepancy between the per-
+formance of VF and AsymEP. Consequently, we define the
+following asymmetry ratio based solely on the off-diagonal
+Jacobian JF,off : Figure 6. Asymmetry ratio of the Jacobian rjac defined in equation
+ ⊤
+ (122) after training for different asymmetry ratios rstr . We compare
+ ∥JF,off − JF,off ∥F VF (orange) and AsymEP (blue), under two training regimes:
+ rjac = , (122) training only J in (dashed) or all parameters (solid).
+ ∥JF,off ∥F
+
+The results are presented in Fig 6. For each trained model Consequently, local stability requires the largest real eigen-
+and ratio rstr , we compute rjac averaged over the stationary value of the effective weight matrix to be strictly less than 1.
+states of the first batch (64 images) across 10 independent Assuming weights are initialized independently with vari-
+runs. We observe that when structural asymmetry is strong ance σ 2 , Girko’s circular law dictates that the eigenvalues
+and all parameters are trained, VF partially compensates for of√an asymmetric matrix uniformly populate a disk of radius
+the asymmetry by adjusting the neuronal states. This can be σ n in the complex plane. In contrast, imposing symmetry
+understood by rewriting the ratio as: forces the eigenvalues √ onto the real line, broadening the
+   spectral radius to 2σ n according to Wigner’s semicircle
+ dyn dyn ⊤
+ ρ′ (xi ) Jij − (Jji ) ρ′ (xj ) law. As a result, asymmetric networks can stably accommo-
+ F
+ rjac = . (123) date larger variance in the weight initializations than their
+ dyn ′
+ ρ′ (xi )Jij ρ (xj ) symmetric counterparts.
+ F
+
+Compared to the structural asymmetry ratio in Eq. (37), Asymmetry nevertheless introduces imaginary eigenvalues
+a value of rjac < rstr indicates that the neuronal states ef- and, consequently, damped oscillations. To study this effect
+fectively dampen the structural asymmetry, rendering the experimentally in a controlled setting, we constrain the input
+dynamics more symmetric. This symmetrization of the Ja- projections J in . In the experiments of the main text, fixing
+cobian appears without imposing an additional symmetriza- the structural asymmetry ratio rstr still allowed AsymEP
+tion penalty and could be enhanced using the method of to reduce oscillations by aligning and increasing the input
+(Laborieux &Zenke, 2022). This mechanism likely explains projections J in , thereby adding stabilizing diagonal contri-
+the superior performance of ‘All (VF)’ compared to ‘J in butions to the Jacobian. To isolate the network’s ability to
+(VF)’ in Fig 3, as the former is able to use the additional suppress oscillations independently of the magnitude of the
+degrees of freedom to reduce the effective asymmetry at input drive, we further constrain the relative scale of J in and
+high rstr . J dyn by imposing
+ ∥J in ∥F ∥J in ∥F
+G.3. Stability analysis with Fixed Asymmetry Ratio & rin = = , (124)
+ ∥J dyn ∥F γ
+ Constrained Inputs Projection
+ where ∥J dyn ∥F = γ following Eq. (101). Defining unscaled
+A complete stability analysis of the non-conservative dy- input projections J˜in , we set
+namics trainable with AsymEP is beyond the scope of this
+work. Nevertheless, for the class of continuous Hopfield J˜in
+ J in = rin γ (125)
+networks considered here, standard arguments from random ∥J˜in ∥F
+matrix theory suggest that asymmetry inherently improves
+asymptotic stability. G.3.1. L EARNING RULES
+In the dynamics defined by Eq. (91), the linear leak term Reusing the notations of the previous section, we write
+−xi shifts the spectrum of the system’s Jacobian by −1. Jilin = γcin J˜ilin with the normalization cin = rin /Fin , where
+
+ 21
+ Equilibrium Propagation for Non-Conservative Systems
+
+Fin = ∥J˜in ∥F . Applying the chain rule yields:
+ " #
+ ∂Fi ′ J˜kl
+ in X
+ ˜in
+ = γcin ρ (xi ) δik ul − 2 J um . (126)
+ ∂ J˜kl
+ in Fin m im
+
+And for γ we have:
+
+ ∂Fi 1
+ = (Fi + xi ). (127)
+ ∂γ γ
+
+G.3.2. S UPPLEMENTARY N UMERICAL R ESULTS
+ Figure 7. Comparison of AsymEP and VF on a feedforward net-
+Table 5 reports a worst-case control experiment where the
+ work. Test accuracy on MNIST is shown as a function of training
+structural asymmetry is fixed at rstr = 0.7 while the input epochs for a single-hidden-layer network with 20 neurons. Curves
+scale ratio rin is varied. The experiment uses an all-to-all report the mean and standard deviation over 10 runs. Best accura-
+architecture on MNIST (excluding direct input-to-output cies are 92.7% ± 0.5% (AsymEP) and 64.3% ± 2.0% (VF).
+connections). The output variance during extended infer-
+ence (steps 30-50) confirms that the system successfully
+learns to suppress oscillations even when rin is severely re- G.5. Advantages of Non-Conservatives Dynamics
+stricted. Any small residual oscillations can be mitigated by In Section 5.4, we compare three (non-)conservative dynam-
+time-averaging over the inference steps. ics under varying constraints. To further evaluate learning
+Finally, rin can be interpreted as a measure of the external speed, Table 6 reports network performance after a sin-
+signal magnitude relative to the internal recurrent dynamics. gle epoch. These results confirm our earlier observation:
+These results indicate that the system remains capable of AsymEP learns faster than VF.
+learning and stabilizing even under a low external input
+drive. Even when the input projection ∥J in ∥F is 100 times G.6. Feedforward CIFAR-10 Experiments
+smaller than the recurrent connections ∥J dyn ∥F , the network This appendix details the architecture and hyperparameters
+still achieves 36.34 ± 6.25% accuracy, which is well above of the deep feedforward experiments comparing backprop-
+chance level (∼ 10%). agation (BP), VF, AsymEP and Dyadic EP on CIFAR-10
+ (see subsection 5.5)
+G.4. Feedforward Network
+G.4.1. L EARNING RULES Architecture. We use a nine-layer convolutional network
+ (denoted CNN9). The first eight layers are convolutional
+For clarity, we write the learning rules for VF and AsymEP with 3 × 3 kernels and zero-padding; spatial downsampling
+in a feedforward architecture with one hidden layer using is performed by strided convolutions (stride 2 on layers 2, 4,
+the notation of Section 5.3. For the input weights connecting 6, 8 and stride 1 otherwise), so no pooling is used. The chan-
+to the hidden layer, we get the usual formula: nel widths follow the sequence 3 → 64 → 64 → 128 →
+ 128 → 256 → 256 → 512 → 512, reducing the spatial
+ 1 h +β −β 0
+ i
+ resolution from 32 × 32 to 2 × 2. The last layer is a fully
+ in
+ ∆Jik ∝ (hi − hi )ρ′ (hi )uk , (128)
+ 2β connected readout mapping the 512 × 2 × 2 feature map
+ to the 10 class logits. All hidden units use a ReLU non-
+while for the feedforward weights connecting the hidden to linearity.
+the output layer, we get: p Weights are initialized with the Kaiming scheme
+ (σ = 2/fan-in) and biases at zero.
+ 1 h +β 0
+ i
+ ∆(Wh→o )ji ∝ (oj − o−β ′ 0
+ j )ρ (oj )ρ(hi ) . (129) Training setup. All methods are trained for 40 epochs
+ 2β with batch size 64 and repeated over 5 seeds. Inputs are
+ normalized per channel and augmented with random 32 ×
+Note that EP is not applicable in this case.
+ 32 crops (padding 4), random horizontal flips and Cutout
+ (one 8 × 8 patch). Parameters are updated with SGD with
+G.4.2. S UPPLEMENTARY N UMERICAL R ESULTS
+ momentum 0.9, weight decay 5 × 10−4 and gradient-norm
+In addition to the final accuracy reported in Sec. 5.3, we clipping at 1, under a cosine learning-rate schedule decaying
+show in Fig. 7 the evolution of the accuracy over 20 epochs from 3.5 × 10−2 to 2 × 10−4 . Test accuracy is reported
+for AsymEP and VF. on an exponential moving average of the weights (decay
+
+ 22
+ Equilibrium Propagation for Non-Conservative Systems
+
+Table 5. Output variance and final test accuracy on MNIST (%) across different values of rin with rstr = 0.7. (mean ± std over 10 runs)
+(500 hiddens, all-to-all, no input-output).
+
+ Output variance Test Acc. (%)
+ rin Untrained Epoch 80 Epoch 80
+ 0.01 (3.38 ± 0.90) × 10−4 (5.22 ± 2.34) × 10−5 36.34 ± 6.25
+ 0.10 (2.33 ± 0.48) × 10−4 (1.39 ± 0.17) × 10−4 90.54 ± 0.19
+ 0.50 (1.34 ± 0.32) × 10−5 (1.06 ± 0.25) × 10−6 94.96 ± 0.10
+ 1.00 (6.27 ± 1.24) × 10−7 (1.75 ± 0.50) × 10−8 96.30 ± 0.09
+
+
+Table 6. Test accuracy on Fashion-MNIST (%) at Epoch 1 (mean
+± std 10 runs). The table compares three classes of network
+dynamics: Continuous Hopfield (CH), Predictive Coding (PC),
+and Standard dynamics. Each is evaluated under three connec-
+ ⊤
+tivity structures: Asymmetric (Asym, Bk ̸= Wk+1 ), Symmet-
+ ⊤
+ric/conservative (Sym, Bk = Wk+1 ), and Feedforward (Feedfor,
+Bk = 0).
+
+ EP AsymEP VF
+ Asym - 74.91 ± 0.45 48.98 ± 4.09
+ CH Feedfor - 74.36 ± 0.29 48.84 ± 3.46
+ Sym 74.57 ± 0.43 - -
+ Asym - 77.83 ± 0.47 57.75 ± 3.37
+ PC
+ Sym 76.23 ± 0.39 - -
+ Asym - 76.87 ± 0.51 61.50 ± 4.06
+ Standard
+ Feedfor - 77.92 ± 0.51 63.98 ± 0.73
+
+
+
+0.9995). The targets are smoothed (ε = 0.1), which for
+the EP methods amounts to nudging toward the smoothed
+one-hot vector.
+
+Relaxation hyperparameters. The four methods differ
+only in the gradient estimator: BP uses automatic differ-
+entiation, while the EP-based methods contrast stationary
+states of the corresponding relaxation dynamics. VF uses
+an integration step η = 1.0, nudging β = 0.1, and at most
+K = 1000 relaxation steps with an early-stopping threshold
+of 9 × 10−6 on the mean state update. Dyadic EP uses
+the same settings except for a nudging strength β = 0.1.
+AsymEP uses a smaller step η = 0.5, nudging β = 0.1,
+and up to K = 250 relaxation steps with a threshold of
+1 × 10−4 .
+
+
+
+
+ 23
+ \ No newline at end of file