Equilibrium Propagation for Non-Conservative Systems Antonino Emanuele Scurria 1 Dimitri Vanden Abeele 1 Bortolo Matteo Mognetti 2 Serge Massar 1 Abstract from inference, the transmission of nonlocal error signals, and synchronous layer-wise computations with explicit gra- Equilibrium Propagation (EP) is a physics- dient storage. These constraints have no clear analog in inspired learning algorithm that uses stationary physical systems, making backpropagation challenging to states of a dynamical system both for inference implement in neuromorphic or analog hardware. Conse- arXiv:2602.03670v2 [cs.LG] 1 Jun 2026 and learning. In its original formulation it is quently, understanding how credit assignment can instead limited to conservative systems, i.e. to dynam- emerge from intrinsic system dynamics, through local inter- ics which derive from an energy function. Given actions and continuous relaxation, is a central question in their applications, it is important to extend EP neuroscience and machine learning. to non-conservative systems, i.e. systems with non-reciprocal interactions. Previous attempts to Equilibrium Propagation (EP) (Scellier &Bengio, 2017) generalize EP to such systems failed to compute represents one of the most promising advances in this direc- the exact gradient of the cost function. Here we tion. It formulates supervised learning as a contrast between propose a framework that extends EP to arbitrary two stationary states of a dynamical system: a ‘free’ phase non-conservative systems, including feedforward where the system evolves autonomously, and a ‘nudged’ networks. We keep the key property of equilib- phase where outputs are weakly pushed toward their targets. rium propagation, namely the use of stationary The local change in neural states between these phases re- states both for inference and learning. However, covers the exact gradient of the cost function with respect to we modify the dynamics in the learning phase by parameters. This enables spatially local learning exploiting a term proportional to the non-reciprocal part of the continuous relaxation of the system without a distinct the interaction so as to obtain the exact gradient backward circuit or explicit weight transport. of the cost function. This algorithm can also be Since its introduction, several works have sought to im- derived using a variational formulation that gen- prove the practicality and biological realism of EP. Algo- erates the learning dynamics through an energy rithmic adaptations include enforcing temporal locality to function defined over an augmented state space. avoid state storage (Ernoult et al., 2020; Falk et al., 2025), Numerical experiments show that this algorithm deriving agnostic updates for black-box energies (Scel- achieves better performance and learns faster than lier et al., 2022), and substituting nudging with clamping previous proposals. (Stern et al., 2021). Theoretically, the framework has been extended to stochastic systems (Scellier &Bengio, 2017; Massar &Mognetti, 2025) and Lagrangian dynamics for 1. Introduction time-varying inputs (Massar, 2025; Pourcel et al., 2025; Standard neural network optimization relies on error back- Berneman &Hexner, 2025). In parallel, simulations have propagation, an algorithm whose computational mechanism explored suitable substrates, ranging from spiking (Mar- is difficult to reconcile with biological (Crick, 1989) and tin et al., 2021; O’Connor et al., 2019) and resistive net- physical implementations (Indiveri &Liu, 2015). Specif- works (Kendall et al., 2020) to coupled oscillators (Wang ically, backpropagation requires a backward pass distinct et al., 2024; Rageau &Grollier, 2025), as well as quantum systems (Wanjura &Marquardt, 2025; Massar &Mognetti, 1 Laboratoire d’Information Quantique (LIQ) CP224, Université 2025; Scellier, 2024). Experimental realizations have been libre de Bruxelles (ULB), Av. F. D. Roosevelt 50, 1050 Bruxelles, demonstrated in memristor crossbars (Yi et al., 2023), self- Belgium 2 Interdisciplinary Center for Nonlinear Phenomena and Complex Systems CP231, Université libre de Bruxelles (ULB), Av. adjusting electrical circuits (Dillavou et al., 2022; 2024), F. D. Roosevelt 50, 1050 Bruxelles, Belgium. Correspondence to: elastic networks (Altman et al., 2024), and classical Ising Antonino Emanuele Scurria . models trained on quantum annealers (Laydevant et al., 2024). Proceedings of the 43 rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 Despite these recent developments and the theoretical el- by the author(s). 1 Equilibrium Propagation for Non-Conservative Systems egance of EP, its standard formulation remains restricted a framework where the original dynamics serve for infer- to conservative systems. In these systems, dynamics are ence, while a new augmented dynamic is used to compute derived from an energy function, which inherently enforces gradients of the cost Eq. (2). In this augmented phase, the symmetry (e.g., symmetric synaptic connections Jij = Jji ) output neurons are nudged towards their targets (as in stan- through the action-reaction principle. This constraint pre- dard EP), while a local corrective term – proportional to the cludes the use of EP in a broad class of models characterized antisymmetric part of the Jacobian at the free equilibrium ∂ by non-conservative forces. This includes the feedforward JF (x0 , θ, u) = ∂x F (x0 , θ, u) – is added to the forces. The architectures dominant in modern AI, biological circuits, exact gradients of the cost with respect to parameters are as well as physical systems that reach stationary states far then obtained by contrasting stationary states of the aug- from thermodynamic equilibrium, such as nonlinear optical mented system. systems driven by external lasers (Cin et al., 2025), opto- Second, we introduce Dyadic EP, a ‘variational’ approach electronic systems (Kalinin et al., 2025), exciton-polariton to learning in non-conservative systems. This method in- condensates (Sajnok &Matuszewski, 2025), active meta- volves doubling the number of variables in the system’s materials (Brandenbourger et al., 2019) and active colloids state space and subsequently introducing a new energy func- (Bishop et al., 2023; Osat &Golestanian, 2023) (see (Bowick tion in this extended space. This approach takes advantage et al., 2022) for a review). of the extended space to execute the positive and negative Formally, we consider a dynamical system governed by nudging phases in parallel, recovering the same computa- a non-reciprocal force field F (x, θ, u), which relaxes to a tional cost as AsymEP. Derived from first principles, this stationary configuration x0 satisfying: approach is inspired by established methods for mapping dissipative dynamical systems onto conservative ones by F (x0 , θ, u) = 0, (1) doubling the degrees of freedom (Bateman, 1931; Galley, where x represents the state variables, θ the learnable param- 2013; Aykroyd et al., 2025). A more comprehensive study eters and u the static input. Our goal, given a target y(u), is of the theoretical framework and its application to feedfor- to compute the gradient of the cost function C(x0 , y) at this ward networks can be found in (Scurria, 2026). Our method equilibrium, is related to the Dual Propagation algorithm (Høier et al., dC 0 2023; Høier &Zach, 2023; 2024) and constitutes an inde- (x , y), (2) pendent, first-principles generalization of Dyadic Learning dθ and update θ to minimize the cost. (Nest &Høier; Høier et al., 2024)—previously limited to Hopfield networks—to arbitrary force fields. Previous attempts to extend EP to non-conservative dynam- ics include the Vector Field (VF) algorithm (Scellier et al., Third, we validate our framework on MNIST (LeCun, 1998), 2018). However, as noted by the authors, this method pro- Fashion-MNIST, and CIFAR-10. In continuous Hopfield vides an unbiased gradient of the cost Eq. (2) only in the networks initialized with symmetric connection matrices, conservative case. To mitigate this, (Laborieux &Zenke, AsymEP achieves better accuracy and learns faster than 2024) proposed adding a penalty to keep the Jacobian close EP and VF. Additionally, when we constrain the network to symmetry, essentially forcing the system to be as con- to have a strong degree of structural asymmetry, in which servative as possible. Alternative methods related to VF, case EP is inapplicable, AsymEP outperforms VF. Finally, which similarly do not compute the exact gradient, were when we restrict connections to a feedforward structure, our proposed in (Farinha et al., 2020; Costa &Santos, 2025) and algorithm effectively trains all parameters; in contrast, VF for specific systems in simulation (Cin et al., 2025; Sajnok is limited to training the last layer, acting essentially as an &Matuszewski, 2025). Extreme Learning Machine (Huang et al., 2006; Wang et al., 2022) with poor performance. Conversely, generalizations of backpropagation can handle non-reciprocal forces and compute the exact gradient of In summary, this theoretical work proposes two generaliza- the cost Eq. (2) but inherit the same challenges in physical tions of EP beyond conservative systems to arbitrary differ- implementations. For instance, Backpropagation Through entiable dynamics that compute in their stationary states. Time (Werbos, 1990) unfolds the network in time to ap- ply standard backpropagation, Recurrent Backpropagation 2. Equilibrium Propagation Overview (Almeida, 1990; Pineda, 1987) avoids this memory require- ment but still requires a specific circuit to propagate errors, 2.1. Conservative Systems and the continuous Adjoint Method (Chen et al., 2018) addi- We first review standard Equilibrium Propagation (EP) tionally requires integrating the dynamics backward in time (Scellier &Bengio, 2017). We consider a network described which is not physically possible for a dissipative system. by an energy function E(x, θ, u), such that the force field is In this paper, we first propose Asymmetric EP (AsymEP), 2 Equilibrium Propagation for Non-Conservative Systems derived from the potential E: stationary point, i.e., that Eq. (7) holds. Second, EP implic- ∂ itly assumes that the Jacobian JE (x0 , u) = ∂x FE (x0 , u) is ∂ FE (x, θ, u) = − E(x, θ, u). (3) invertible. In this work, we assume this condition holds and ∂x will not state it explicitly hereafter. Third, for simplicity, The objective is to compute the total gradient dC 0 we omit the dependency on the input u and target y in the dθ (x , y) of a (quadratic) cost function C(x, y) evaluated at the minimum following equations. energy configuration of the system. This free equilibrium denoted x0 (which depend implicitly in θ and u), satisfies 2.2. Vector Field the stationarity condition: The Vector Field (VF) algorithm, introduced in (Scellier ∂ et al., 2018), is an early attempt to adapt EP to non- − E(x0 , θ, u) = 0. (4) reciprocal forces. This method relies on the observation ∂x that, for conservative systems, linearizing the right-hand To compute gradients, we introduce the augmented energy side of Eq. (9) around the equilibrium point x0 yields functional: ∂E(xβ , θ) ∂E(x−β , θ)   ET (x, θ, β, u, y) = E(x, θ, u) + βC(x, y), (5) 1 lim − β→0 2β ∂θ ∂θ where β is a scalar nudging parameter. The stationary config- ⊤  β  (10) x − x−β  ∂FE 0 uration of this augmented system is obtained by integrating = lim − (x , θ) , β→0 ∂θ 2β the dynamics dx ∂ET (x, θ, β, u) where FE = −∂x E(x, θ) is the conservative force. It is =− , (6) therefore tempting to use the right-hand side of Eq. (10) for dt ∂x parameter updates of non-conservative systems, for which until the energy minimum is reached. This new fixed point no energy function E exists. xβ , called nudged equilibrium, satisfies: The VF algorithm adopts precisely this approach. It uses ∂E(xβ , θ, u) ∂C(xβ , y) the nudged counterpart of Eq. (7), +β = 0. (7) ∂x ∂x ∂C β F (xβ , θ) − β (x ) = 0, (11) The training procedure, as improved in (Laborieux et al., ∂x 2021), uses two nudged phases with factors ±β (with β ̸= 0). Starting from x0 , the system relaxes to two in conjunction with the learning rule Eq. (10): nearby perturbed equilibria, x+β and x−β . The displace- ⊤  β x − x−β   ment x+β − x−β is then used to compute the parameter ∂F 0 ∆θ = ϵ (x , θ) . (12) update in the learning rule: ∂θ 2β 1 ∂E(xβ , θ, u) ∂E(x−β , θ, u)   ∆θ = −ϵ − , (8) However, as noted in (Scellier et al., 2018), Eq. (12) does 2β ∂θ ∂θ not align with the true gradient dC 0 dθ (x ) and is exact only if where ϵ > 0 is the learning rate. The theoretical foundation the force is conservative. To see this, let JF (x, θ) denote of EP is the result that, in the limβ→0 of Eq. (8), we get: the Jacobian of the vector field F (x, θ) (in components (JF (x, θ))ij = ∂F∂xi (x,θ) j ). Differentiating the equilibrium dC(x0 , y) d ∂E(xβ , θ, u) 0 condition F (x , θ) = 0 with respect to θ gives = , (9) dθ dβ ∂θ dx0 ∂F 0 see Appendix D.1. The error of the above method is O(β 2 ). JF (x0 , θ) + (x , θ) = 0. (13) This error can be further reduced using holomorphic equi- dθ ∂θ librium propagation (Laborieux &Zenke, 2022). Consequently, the exact gradient of the cost is Thus, EP recovers the exact gradient of the cost function ⊤ using only local computations. In this manner, learning dC 0 dx0 ∂C 0 (x ) = (x ) implements gradient descent without an explicit backward dθ dθ ∂x ⊤  pass, and credit assignment is realized through the system’s   ∂F 0 ⊤ 0 −1 ∂C 0 intrinsic relaxation dynamics. =− (x , θ) JF (x , θ) (x ) . ∂θ ∂x | {z }| {z } Three remarks can be made at this point. First, EP does not pre-synaptic post-synaptic require the system to be at an energy minimum, but only at a (14) 3 Equilibrium Propagation for Non-Conservative Systems The terms ’pre-synaptic’ and ’post-synaptic’ in Eq. (14) Algorithm 1 Asymmetric EP (AsymEP) are used by analogy with neuronal transmission: the pre- 1: Inputs: Force field F (x, θ), cost function C(x), nudg- synaptic factor captures the local influence of θ on the force ing parameter β, learning rate ϵ. F , while the post-synaptic factor is the sensitivity of the 2: repeat cost to state perturbations. 3: 1. Free Phase: Evolve to stationary state If instead we differentiate the nudged equilibrium condition 4: Evolve the system dynamics in Eq. (11) with respect to β and evaluate at β = 0, we 5: dx obtain = F (x, θ), (17) β dt dx ∂C 0 JF (x0 , θ) − (x ) = 0, (15) 6: until convergence to the stationary state x0 . dβ β=0 ∂x 7: 2. Jacobian Decomposition which gives 8: Compute the Jacobian at equilibrium: 9: dxβ −1 ∂C 0 ∂F 0 = JF (x0 , θ) (x , y). (16) JF (x0 , θ) = (x , θ), (18) dβ β=0 ∂x ∂x The right-hand side of Eq. (16) represents the effective post- 10: and decompose it in its antisymmetric part: 11: synaptic term used by the VF algorithm (Eq. 12). Compar- ing this with the exact post-synaptic term derived in Eq. (14), AJ (x0 , θ) = 12 (JF (x0 , θ) − JF (x0 , θ)⊤ ). (19) we see that they coincide only if JF = JF⊤ , i.e., only if the system is conservative. 12: 3. Nudged Phase: Augmented Dynamics Now, let SJ (x0 , θ) and AJ (x0 , θ) denote the symmetric 13: Integrate the dynamics twice starting from x0 and antisymmetric parts of the Jacobian at the free (un- 14: nudged) equilibrium, respectively. Then, we show in Ap- dx ∂C pendix A that the gradient error increases with the spectral = F (x, θ) − β (x) − 2AJ (x0 , θ) (x − x0 ), −1 dt ∂x radius of SJ (x0 , θ) AJ (x0 , θ). Consequently, large (20) antisymmetric contributions degrade the gradient estima- 15: until convergence to two new stationary states tion, confirming empirical observations in the Appendix of x±β A . (Ernoult et al., 2020). In fact, in the pathological limit where 16: 4. Parameter Update the Jacobian would be purely antisymmetric SJ (x0 , θ) = 0, 17: Update the parameters according to: the update of VF gives the negative of the true gradient, 18: maximizing the cost rather than minimizing it. ⊤ ! xβA − x−β  ∂F 0 A ∆θ = ϵ (x , θ) . (21) ∂θ 2β 3. Asymmetric EP Here, we introduce Asymmetric EP (AsymEP), see Algo- 19: until convergence of θ rithm 1, which removes the gradient estimate error inherent 20: Output: Optimized parameters θ. to VF by adding a local correction term to the augmented inference dynamics. The new nudged equilibrium xβA satis- fies: where JFA (x, θ) is the Jacobian of the modified dynamical ∂C β system Eq. (20). At the equilibrium point x0 , JFA is equal F (xβA , θ) − β (x ) − 2AJ (x0 , θ) (xβA − x0 ) = 0, (22) to the transpose of the original Jacobian: ∂x A As in VF, we then obtain two perturbed states x±β A for op- JFA (x0 , θ) = JF (x0 , θ) − 2AJ (x0 , θ) posite nudging ±β and apply the contrastive learning rule = SJ (x0 , θ) − AJ (x0 , θ) of Eq. (12). = JF⊤ (x0 , θ). (24) We now show that AsymEP gives rise to the correct learning rule, i.e. that right-hand side of Eq. (21) is proportional to where we have used the decomposition Eq. (44) of the orig- the gradient of the cost function dC 0 inal Jacobian J into its symmetric and antisymmetric com- dθ (x ) at the equilibrium 0 point x (Eq. 14). To this end, note that the same reasoning ponents. Therefore, the left hand side of Eq. (23) is equal to leading to Eq. (16) leads to the true post-synaptic term dxβA −1 ∂C 0 dxβA −1 ∂C 0 = JFA (x0 , θ) (x ). (23) = JF⊤ (x0 , θ) (x ), (25) dβ β=0 ∂x dβ β=0 ∂x 4 Equilibrium Propagation for Non-Conservative Systems which, using Eq. (14), proves the result. Additionally, al- until a stationary point (z β , z ′β ) is reached. Upon conver- though implied by the equality with the true gradient, we gence, we follow the standard EP paradigm in using the explicitly demonstrate the equivalence of the gradient esti- difference z β − z ′β to compute the post-synaptic term. Un- ′ ′ mates obtained by AsymEP and Backpropagation Through der the change of variables m = z+z 2 and d = z − z , we Time in Appendix B following (Ernoult et al., 2019). prove in Appendix D that m follows the original dynamics F (ensuring valid inference), while d relaxes to a "physical" Note that the corrective term −2AJ (x0 , θ)(x − x0 ) in error signal proportional to the cost gradient. Eq. (20) is spatially local: AJ vanishes for unconnected neurons, and (x − x0 ) is available at the synapse given the It is important to notice that while Dyadic EP introduces a memory mechanism already required by Eq. (12). This distinct formulation, it remains consistent with the general correction can create backward connections (Section 5.3). theoretical setting of EP and matches the computational However, in physical realizations, both feedforward and cost of AsymEP. Note also that we start the evolution of feedback connections must be physically present, though the free phase (β = 0) with the identical initial condition feedback may be deactivated during inference. for z and z ′ , (i.e., d = 0). This guarantees that integrat- ing Eq. (32) leads to a symmetric stationary point where 4. Dyadic EP z 0 = z ′0 . Finally, we underline that the modified varia- tional update rule in Eq. (34) is equivalent to the standard We now introduce Dyadic EP (Algorithm 2), a variational symmetric EP update rule in Eq. (8) (see Appendix D). algorithm that computes the exact cost gradient in the limit Now, to make this concrete, consider a continuous Hopfield of infinitesimal nudging. It maps the original n-variable network (see also Eq. (35)) with an asymmetric connection dynamics F (x, θ) onto a 2n-variable system (z, z ′ ) defined matrix J. After some calculations (see Appendix F), the by an energy H(z, z ′ , θ) and cost D(z, z ′ ). We show in augmented energy of the system can be re-expressed as: Appendix E that AsymEP can be seen as the first-order projection of Dyadic EP onto the original n-dimensional 1 1 HT = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) − ρ(z)⊤ Aρ(z ′ ) state space. 2 2 1 β The new system is defined by the energy H and cost function + (∥z∥ − ∥z ∥ ) + (C(z, y) + C(z ′ , y)) , 2 ′ 2 2 2 D, given in terms of F and C by: (29)  z + z′  where S and A are the symmetric and antisymmetric parts H(z, z ′ , θ) = −(z − z ′ )⊤ F ,θ , of J, respectively and ρ is an element-wise non-linearity. 2  ′  An interesting analogy can be drawn with standard learning z+z D(z, z ′ ) = C , (26) rules in discrete Hopfield networks (Hopfield, 1982). For 2 a sequence of binary memories {ξ 1 , . . . , ξ m } where ξ µ ∈ where z, z ′ ∈ Rn . In order to learn, we introduce the aug- {−1, 1}n , S P corresponds to the standard autoassociative mented energy Hebbian rule µ ξ µ (ξ µ )⊤ , creating stable attractors at each pattern. In contrast, A corresponds to the heteroassociative HT (z, z ′ , θ, β) = H(z, z ′ , θ) + βD(z, z ′ ). (27) rule (e.g., a cycle between ξ µ and ξ ν given by ξ ν (ξ µ )⊤ − The equilibrium configuration corresponds to a saddle point ξ µ (ξ ν )⊤ ), encoding transitions between patterns. of HT , where z minimizes and z ′ maximizes the energy. For this specific energy, the update rule given by Eq. (34) This poses no issue for EP, which requires only that the can be re-expressed as: joint state (z, z ′ ) reaches a stationary state. Although this min-maximization can be interpreted as z evolving forward 1 ⊤ ρ(z ′β ) − ρ(z β ) ρ(z ′β ) + ρ(z β ) . (30)  ∆J ∝ − and z ′ backward in time, in practice they evolve forward 2β simultaneously, as we integrate the coupled equations: In the limit β → 0, this gives: z + z′   dz ∂HT β ! =− =F ,θ d dt ∆J ∝ ⊙ ρ′ (m)ρ(m)⊤ . (31) ∂z ⊤ 2 β z − z′ β ∂C z + z ′   ∂F + − , 2 ∂z z+z′ 2 ∂z 2 2 matching the learning rule in (Pineda, 1987), with β dz ′ ∂HT  z + z′  limβ→0 dβ being the error signal. =+ ′ =F ,θ dt ∂z 2 ′ ⊤ β ∂C z + z ′    z−z ∂F 5. Numerical Experiments − + , 2 ∂z ′ z+z′ 2 ∂z ′ 2 2 In this section, we numerically validate AsymEP (Algo- (28) rithm 1). The neuronal dynamics follows the one introduced 5 Equilibrium Propagation for Non-Conservative Systems Algorithm 2 Dyadic EP where ∥ · ∥F denotes the Frobenius norm. Note that this 1: Inputs: Force field F (x, θ), cost function C(x, y), metric does not capture the asymmetry of the Jacobian, nudging parameter β, learning rate ϵ which depends on the state x. 2: repeat For numerical experiments, we restricted the network to a 3: 1. Free Phase: Evolve to stationary state layered architecture with a single hidden layer to facilitate 4: Evolve the system dynamics, starting from identi- comparison with prior work. Accordingly, J in contains cal initial conditions z(0) = z ′ (0) = z0 , only input-to-hidden connections, while J dyn is block off- 5: diagonal, encoding bidirectional interactions between the dz ∂H dz ′ ∂H =− , =+ ′, (32) hidden and output layers. Both J in and J dyn are trained. dt ∂z dt ∂z We first use MNIST (LeCun, 1998) (60k train, 10k test) 6: until stationary states z 0 , z ′0 are reached. followed by Fashion-MNIST to validate AsymEP, and then 7: 2. Nudged Equilibrium we further validate AsymEP and Dyadic EP by comparing 8: Evolve the system dynamics, starting from the them to Backpropagation on a convolutional feedforward, solution of the free phase z 0 = z ′0 : with CIFAR-10. Inputs are normalized using min-max to 9: dz ∂HT dz ′ ∂HT [−1, 1] and targets are one-hot encoded in {−1, 1}. All =− , =+ , (33) hyperparameters are detailed in Appendix G, along with dt ∂z dt ∂z ′ additional details and numerical results. 10: until two nudged stationary states z β , z ′β are reached. 5.1. Symmetric Initialization 11: 3. Parameter Update 12: Update the parameters according to: We start by comparing AsymEP with standard EP and 13: VF. All algorithms are initialized with an identical sym- 1 ∂H(z β , z ′β , θ)   ∆θ = −ϵ (34) metric matrix J dyn . EP maintains this symmetry through- β ∂θ out training, while VF and AsymEP induce asymmetry in 14: until convergence of θ J dyn . Since EP and VF already achieve strong performance 15: Output: Optimized parameters θ. on MNIST, the purpose of this experiment is to validate AsymEP and compare it against EP and VF rather than outperform the state of the art. in (Scellier &Bengio, 2017), and is generalized to allow Figure 1 compares the three algorithms as a function of for non-reciprocal forces as in (Scellier et al., 2018). For hidden-layer dimension after 1 and 20 epochs. AsymEP clarity, we express the forces in a form that explicitly sepa- consistently outperforms the baselines, suggesting it learns rates the contributions of the external input and the recurrent faster and better. interactions: Figure 2 studies the evolution of the asymmetry ratio rstr . F (x) = ρ′ (x) ⊙ J in u + J dyn ρ(x) − x,  (35) The results are reported for 50 hidden neurons. As expected, EP preserves the initial weight symmetry. In contrast, VF where u ∈ RNin denotes the input and x ∈ RNdyn the neu- and AsymEP induce non-trivial evolution of rstr following ronal state, comprising both hidden and output units. The two distinct patterns, resulting in three distinct network matrices J in ∈ RNdyn ×Nin and J dyn ∈ RNdyn ×Ndyn define the configurations. A complementary figure is available in Ap- input and recurrent connectivity, respectively. The activation pendix G.1. function ρ(·) is taken to be the hyperbolic tangent, applied element-wise. 5.2. Fixed Asymmetry Ratio If J dyn is symmetric, we can define the energy: While the previous section focused on networks compatible 1 1 with all three algorithms (EP, VF, AsymEP), we now turn E(x) = ∥x∥2 − ρ(x)⊤ J dyn ρ(x) − ρ(x)⊤ J in u, (36) to architectures with strong structural asymmetry. In this 2 2 regime, EP is inapplicable by construction, and, as we show, which is identical to that of (Scellier &Bengio, 2017), pro- VF performs poorly, contrary to AsymEP which remains vided that the input neurons are activated as ρ(u). effective. Equation (35) naturally motivates a quantitative measure of To this end, we consider a class of networks where the structural asymmetry rstr , defined as: asymmetry ratio rstr defined in Eq. (37) is kept fixed. Let S̃ ⊤ and à be arbitrary symmetric and antisymmetric matrices ∥(J dyn − J dyn )/2∥F in RNdyn ×Ndyn respectively. We enforce a fixed rstr via the rstr = , (37) ∥J dyn ∥F 6 Equilibrium Propagation for Non-Conservative Systems where γ ∈ R is a learnable global scale. Using VF and AsymEP, we train a layered network with one hidden layer of 50 neurons (in which case S̃ and à are block off-diagonal) for different values of rstr to investigate the impact of structural asymmetry. We compare two training regimes: training only the input weights J in (and the scale γ), versus training all parameters including J dyn . The first regime trains only the external forces from the input ρ′ (x) ⊙ J in u (which correspond to a symmetric contribution in the Jacobian) applied to our non-conservative system, while the second additionally trains J dyn and therefore the non- (a) Results after one epoch. symmetric part of the Jacobian directly. Figure 3 summarizes the results. We find that AsymEP maintains robust performance across all asymmetry levels (e.g., achieving an accuracy of 93.8 ± 0.4% at rstr = 0 and 94.9 ± 0.2% at rstr = 0.875 when training all parameters) and can even learn when the recurrent connection matrix J dyn is completely antisymmetric (rstr = 1). Additionally, training all parameters shows significant improvement over training only J in . In contrast, VF performs well at low asymmetry ratios but degrades as asymmetry increases, eventually dropping to chance levels (e.g., accuracies of 5 ± 3% and 8 ± 4% (b) Results after 20 epochs. at rstr = 1 for input-only and all-parameter training, re- Figure 1. Comparison of algorithm performance on MNIST using spectively). When only J in is trained, VF accuracy col- a layered architecture with one hidden layer and symmetric ini- lapses around rstr ≈ 0.5, whereas training all parameters tialization. Squares denote AsymEP, circles EP, and triangles VF. delays this collapse until rstr ≈ 0.8. Our analysis in Ap- Test accuracy (averaged over 10 runs) is shown after one epoch pendix G.2.1 reveals that VF adjusts the dynamics such that (Fig. 1a) and 20 epochs (Fig. 1b). the asymmetry of the Jacobian’s off-diagonal terms remains strictly lower than the structural asymmetry ratio. The train- ing appears to adjust the neuronal state such that neurons connected by strongly asymmetric weights have low activa- tion. As shown in Appendix G.2.1, AsymEP learns faster than VF across all levels of asymmetry. Finally, Appendix G.3 opens with a brief theoretical dis- cussion of the stability of these non-conservative dynamics, followed by simulations on all-to-all topologies with con- strained rstr and input projections J in . Even in this worst- case setting, AsymEP reduces oscillations and improves stability. 5.3. Feedforward Architectures Figure 2. Evolution of the asymmetry ratio rstr (defined in Eq. (37)) We now consider a purely feedforward architecture. Here during training on MNIST for AsymEP, EP and VF, initialized from a symmetric configuration. The models use 50 hidden neu- VF trains only the last layer: with no backward connections, rons. the output nudging signal cannot reach earlier layers, so for every layer but the last the nudged stationary states coincide with the free states, giving zero weight updates. As only following parameterization of the recurrent parameters: the output layer is trained, the system essentially becomes "q # an Extreme Learning Machine (Huang et al., 2006; Wang dyn 2 S̃ à et al., 2022). In contrast, AsymEP introduces a correction J =γ 1 − rstr + rstr , (38) that generates effective backward connections, allowing the ∥S̃∥F ∥Ã∥F 7 Equilibrium Propagation for Non-Conservative Systems tivity structures inspired by (Millidge et al., 2023), while keeping the number of trainable parameters fixed. Experiments are conducted on Fashion-MNIST using a two- hidden-layer network with hidden dimensions 500 and 200. Network states are denoted (x0 , x1 , x2 , x3 ), where x0 is the input and x3 = xL the output. Forward and backward connections are denoted by Wk and Bk , respectively, with W1 = J in . We consider three classes of dynamics. First, the Continuous Hopfield (CH) dynamics introduced previously: dxk   = −xk +ρ′ (xk )⊙ Wk ρ(xk−1 )+(1−δk,L )Bk ρ(xk+1 ) . dt (40) Second, Predictive Coding (PC) dynamics, defined through the prediction errors ek = xk − Wk ρ(xk−1 ), whose fixed Figure 3. Impact of the structural asymmetry ratio rstr on accuracy (top) and standard deviation over 10 runs (bottom) on MNIST. point ek = 0 corresponds to a standard feedforward net- We compare VF (orange) and AsymEP (blue) under two training work: regimes: training only J in (dashed) or all parameters (solid). dxk = −ek + (1 − δk,L ) (ρ′ (xk ) ⊙ (Bk ek+1 )) . (41) dt nudging signal to influence all layers. We make this explicit Third, a standard dynamics chosen for direct comparison for a network with one hidden layer. with backpropagation: Let the state x be partitioned in hidden h and output o dxk layers. The recurrent connection matrix is then J dyn = = −xk + Wk ρ(xk−1 ) + (1 − δk,L )Bk ρ(xk+1 ). (42) dt  0 0 . The forces of the system are: Wh→o 0 For each dynamics, we examine three connectivity scenar-  β ios. Fh = ρ′ (h) ⊙ J in u + λ(Wh→o )⊤ (o − o0 ) − h       0  ⊤ Fo = ρ′ (o) ⊙ Wh→o ρ(h) − λWh→o (h − h ) • In the asymmetric case (Bk ̸= Wk+1  β ), the backward (39)  weights Bk are randomly initialized and kept fixed   ∂C while only the forward weights are trained, ensuring a + λβ −o   ∂o fair comparison (i.e., identical number of parameters); where λ is 0 during the free inference and 1 during the in PC, the learning rule for Bk is zero when only inputs nudged phase (Eq. 20). The force on the hidden layer Fhβ are clamped. now depends on the output layer through the term ρ′ (h) ⊙ ⊤ ⊤ • In the symmetric / conservative case (Bk = Wk+1 ), the (Wh→o ) (o − o0 ), enabling the nudge (the term β ∂C ∂o ) to CH and PC dynamics derive from an energy functional, influence the hidden layer. This implicitly assumes that the while the standard dynamics remains non-conservative hardware implementation supports the physical activation due to its non-symmetric Jacobian. of these backward connections. • In the feedforward case (Bk = 0), the PC and stan- We validate this using a single hidden layer of only 20 neu- dard dynamics coincide; for the standard dynamics, the rons on MNIST. After training, VF saturates with 64.3 ± AsymEP learning rule mirrors backpropagation, with 2.0% accuracy, whereas AsymEP reaches 92.7 ± 0.5% ac- ∆xβk = 2β1 (xβ − x−β ) acting as the propagated error curacy. We expect this discrepancy to increase with network signal. depth, since this increases the number of layers unable to learn under VF. A figure with the accuracy during training can be found in Appendix G.4.2. Table 1 shows that AsymEP consistently outperforms VF in both asymmetric and feedforward settings, in final ac- curacy, learning speed, and stability. After a single epoch 5.4. Advantages of Non-Conservative Dynamics it already provides on average a 15% accuracy gain with AsymEP is not tied to a specific neural dynamics. To further an order-of-magnitude reduction in variance. Remarkably, assess the benefits of training non-conservative dynamics AsymEP with asymmetric connectivity also surpasses EP using AsymEP, we compare several dynamics and connec- on symmetric networks despite training only the forward 8 Equilibrium Propagation for Non-Conservative Systems weights, suggesting that relaxing symmetry constraints may 6. Discussion and Conclusion improve expressivity. Supplementary results are provided in Appendix G.5. In this work, we extended Equilibrium Propagation (EP) to non-conservative systems that reach stationary states by deriving two mathematically equivalent algorithms that re- cover the exact gradient of the cost function in the limit of Table 1. Test accuracy on Fashion-MNIST (%) at Epoch 50 (mean ± std 10 runs). BP on a standard feedforward architecture using infinitesimal nudging. MSE and SGD achieve 87.37 ± 0.29%. The first approach, Asymmetric EP, preserves the original inference dynamics. It introduces a corrective force during EP AsymEP VF the nudged phase that remains spatially local, as the anti- Asym - 86.78 ± 0.14 85.20 ± 0.12 symmetric Jacobian is null for unconnected neurons and the CH Feedfor - 86.05 ± 0.12 77.76 ± 0.37 Sym 84.30 ± 0.13 - - perturbation from equilibrium is available at the synapse Asym - 86.20 ± 0.17 80.71 ± 6.17 level. Unlike standard methods like Recurrent Backpropa- PC gation (Almeida, 1990; Pineda, 1987), this avoids explicit Sym 84.78 ± 0.14 - - Asym - 82.91 ± 0.48 75.52 ± 1.69 digital weight transposition. However, a physical mech- Standard Feedfor - 86.25 ± 0.16 78.58 ± 0.28 anism to obtain the local corrective force at the synapse level remains a subject for future work. We also note that AsymEP shares the temporal non-locality of standard EP. Finally, to investigate how AsymEP scales with depth, we The second approach, Dyadic EP, doubles the state space trained deeper fully connected networks with two and three to map non-reciprocal dynamics onto an energy land- hidden layers of 500 neurons on Fashion-MNIST, reaching scape—conceptually reminiscent of multi-compartment cor- 86.41 ± 0.22% and 87.8 ± 0.15% test accuracy respectively. tical neurons, where apical dendrites integrate feedback (analogous to z − z ′ ) separately from basal feedforward 5.5. Feedforward Training on CIFAR-10: BP vs. Dyadic input (analogous to z + z ′ ) (Guerguiev et al., 2017). Addi- EP vs. AsymEP tionally, this expanded space also enables the positive and negative nudging phases to run in parallel. This offers a To test whether our framework scales beyond shallow net- pathway to implement a version of EP that is local in time, works, we conclude with a deep, purely feedforward CNN but would require a doubling of the degrees of freedom architecture trained on CIFAR-10. We compare backprop- on the physical hardware. More fundamentally, the energy agation (BP), VF, AsymEP and Dyadic EP in a controlled defined on the extended state shows that the tools and the- setting where the gradient estimator is the only difference oretical guarantees obtained for EP should also apply to between runs: all methods share the same configuration, the case of non-reciprocal forces, and that the variational with the BP gradient replaced by the contrast of stationary principle behind EP is universal in the sense that it can be states for the EP-based methods (see App. G.6 for details). applied to all networks which operate in a stationary state. Each configuration is trained for 40 epochs over 5 seeds. Furthermore, Dyadic EP is not restricted to the EP com- Table 2 reports the final test accuracy. Both of our algo- munity and could suggest a more physically plausible al- rithms scale to this regime, closely tracking the BP baseline ternative to the stationary-state Adjoint Method (for fixed throughout training and matching its final accuracy: a paired inputs) (Chen et al., 2018): by solving the forward and ad- t-test finds no significant difference between Dyadic EP and joint equations simultaneously via relaxation, it circumvents BP (p = 0.75), and only a sub-percent gap for AsymEP. a separate backward-in-time pass. In contrast, VF makes slight initial progress (peaking near 30%) before collapsing to chance level (10%). Additional Finally, our experiments on MNIST, Fashion-MNIST, and details can be found in Appendix G.6 CIFAR-10 confirm that AsymEP and Dyadic EP consis- tently outperform EP and VF, and notably enables effective training of feedforward networks. Our work thus opens new avenues for learning in neuro- Table 2. Test accuracy on CIFAR-10 (%) at epoch 40 (mean ± std over 5 seeds). morphic hardware, dissipative physical systems, and neural architectures where asymmetry is intrinsic rather than inci- Method Test Acc. (%) dental. Backpropagation 90.66 ± 0.25 Dyadic EP 90.69 ± 0.14 AsymEP 89.74 ± 0.14 VF 10.00 ± 0.00 9 Equilibrium Propagation for Non-Conservative Systems Impact Statement References This paper presents results that advance the field of machine Almeida, L. B. A learning rule for asynchronous percep- learning. There are many potential societal consequences trons with feedback in a combinatorial environment. In of our work, none of which we feel must be specifically Artificial neural networks: concept learning, pp. 102–111. highlighted here. 1990. Altman, L. E., Stern, M., Liu, A. J., and Durian, D. J. Ex- Acknowledgments perimental demonstration of coupled learning in elastic networks. Physical Review Applied, 22(2):024053, 2024. AES is fully funded by the Horizon Europe Marie Skłodowska-Curie Doctoral Network ’Postdigital Plus’ Aykroyd, C., Bourgoin, A., and Poncin-Lafitte, C. L. Hamil- (Grant 101169118). DVA acknowledges the support of tonian treatment of non-conservative systems. arXiv the French Community of Belgium through a FRIA fellow- preprint arXiv:2507.18658, 2025. ship. SM acknowledges financial support by the Fonds de la Bateman, H. On dissipative systems and related variational Recherche Scientifique–FNRS, Belgium under EOS Project principles. Physical Review, 38(4):815, 1931. No. 40007536. Computational resources have been pro- vided by the Consortium des Équipements de Calcul Intensif Berneman, M. and Hexner, D. Equilibrium propagation for (CÉCI), funded by the Fonds de la Recherche Scientifique dissipative dynamics. Advanced Intelligent Systems, pp. de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and e202501310, 2025. by the Walloon Region. Bishop, K. J., Biswal, S. L., and Bharti, B. Active colloids as models, materials, and machines. Annual Review of Chemical and Biomolecular Engineering, 14(1):1–30, “ἁρμονίη ἀφανὴς φανερῆς κρείττων” 2023. Bowick, M. J., Fakhri, N., Marchetti, M. C., and Ra- maswamy, S. Symmetry, thermodynamics, and topology in active matter. Physical Review X, 12(1):010501, 2022. Brandenbourger, M., Locsin, X., Lerner, E., and Coulais, C. Non-reciprocal robotic metamaterials. Nature communi- cations, 10(1):4608, 2019. Cesa-Bianchi, N. and Lugosi, G. Prediction, learning, and games. Cambridge university press, 2006. Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018. Cin, N. D., Marquardt, F., and Wanjura, C. C. Training nonlinear optical neural networks with scattering back- propagation. arXiv preprint arXiv:2508.11750, 2025. Costa, P. and Santos, P. A. Directed equilibrium propagation revisited. Mathematics, 13(11), 2025. ISSN 2227-7390. Crick, F. The recent excitement about neural networks. Nature, 337, 1989. Dillavou, S., Stern, M., Liu, A. J., and Durian, D. J. Demon- stration of decentralized physics-driven learning. Physi- cal Review Applied, 18(1):014040, 2022. Dillavou, S., Beyer, B. D., Stern, M., Liu, A. J., Miskin, M. Z., and Durian, D. J. Machine learning without a pro- cessor: Emergent learning in a nonlinear analog network. Proceedings of the National Academy of Sciences, 121 (28):e2319718121, 2024. 10 Equilibrium Propagation for Non-Conservative Systems Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scel- Indiveri, G. and Liu, S.-C. Memory and information pro- lier, B. Updates of equilibrium prop match gradients cessing in neuromorphic systems. Proceedings of the of backprop through time in an rnn with static input. IEEE, 103(8):1379–1397, 2015. Advances in neural information processing systems, 32, 2019. Kalinin, K. P., Gladrow, J., Chu, J., Clegg, J. H., Cletheroe, D., Kelly, D. J., Rahmani, B., Brennan, G., Canakci, B., Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scel- Falck, F., et al. Analog optical computer for ai inference lier, B. Equilibrium propagation with continual weight and combinatorial optimization. Nature, 645(8080):354– updates. arXiv preprint arXiv:2005.04168, 2020. 361, 2025. Falk, M. J., Strupp, A. T., Scellier, B., and Murugan, Kendall, J., Pantone, R., Manickavasagam, K., Bengio, A. Temporal contrastive learning through implicit non- Y., and Scellier, B. Training end-to-end analog neural equilibrium memory. Nature Communications, (16), networks with equilibrium propagation. arXiv preprint 2025. arXiv:2006.01981, 2020. Farinha, M. T., Pequito, S., Santos, P. A., and Figueiredo, Laborieux, A. and Zenke, F. Holomorphic equilibrium M. A. T. Equilibrium propagation for complete directed propagation computes exact gradients through finite size neural networks. In Proceedings of the 28th European oscillations. Advances in Neural Information Processing Symposium on Artificial Neural Networks, Computational Systems, 35:12950–12963, 2022. Intelligence and Machine Learning (ESANN 2020), 2020. Galley, C. R. Classical mechanics of nonconservative sys- Laborieux, A. and Zenke, F. Improving equilibrium propa- tems. Physical review letters, 110(17):174301, 2013. gation without weight symmetry through jacobian home- ostasis. In Proceedings of the International Confer- Guerguiev, J., Lillicrap, T. P., and Richards, B. A. Towards ence on Learning Representations (ICLR) 2024, Virtual deep learning with segregated dendrites. elife, 6:e22901, (ICLR), May 2024. 2017. Laborieux, A., Ernoult, M., Scellier, B., Bengio, Y., Grollier, Høier, R. and Zach, C. A lagrangian perspective on dual J., and Querlioz, D. Scaling equilibrium propagation to propagation. In Proceedings of the First Workshop on Ma- deep convnets by drastically reducing its gradient estima- chine Learning with New Compute Paradigms at NeurIPS tor bias. Frontiers in neuroscience, 15:633674, 2021. 2023, New Orleans, LA, USA, Dec 2023. Laydevant, J., Marković, D., and Grollier, J. Training an Høier, R. and Zach, C. Two tales of single-phase contrastive ising machine with equilibrium propagation. Nature Com- hebbian learning. In Salakhutdinov, R., Kolter, Z., Heller, munications, 15(1):3671, 2024. K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference LeCun, Y. The mnist database of handwritten digits. on Machine Learning, volume 235 of Proceedings of http://yann. lecun. com/exdb/mnist/, 1998. Machine Learning Research, pp. 18470–18488. PMLR, 21–27 Jul 2024. Martin, E., Ernoult, M., Laydevant, J., Li, S., Querlioz, D., Petrisor, T., and Grollier, J. Eqspike: spike-driven equi- Høier, R., Staudt, D., and Zach, C. Dual propagation: accel- librium propagation for neuromorphic implementations. erating contrastive hebbian learning with dyadic neurons. Iscience, 24(3), 2021. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023. Massar, S. Equilibrium propagation for learning in la- Høier, R., Kalinin, K., Ernoult, M., and Zach, C. Dyadic grangian dynamical systems. Physical Review E, 112 learning in recurrent and feedforward models. In NeurIPS (3):035304, 2025. 2024 Workshop Machine Learning with new Compute Massar, S. and Mognetti, B. M. Equilibrium propagation: Paradigms, 2024. the quantum and the thermal cases. Quantum Studies: Hopfield, J. J. Neural networks and physical systems with Mathematics and Foundations, 12(1):6, 2025. emergent collective computational abilities. Proceedings Millidge, B., Song, Y., Salvatori, T., Lukasiewicz, T., and of the national academy of sciences, 79(8):2554–2558, Bogacz, R. Backpropagation at the infinitesimal infer- 1982. ence limit of energy-based models: Unifying predictive Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. Extreme learning coding, equilibrium propagation, and contrastive hebbian machine: theory and applications. Neurocomputing, 70 learning. In International Conference on Learning Rep- (1-3):489–501, 2006. resentations (ICLR), 2023. 11 Equilibrium Propagation for Non-Conservative Systems Nest, T. and Høier, R. Dyadic learning in asymmetric Wang, Q., Wanjura, C. C., and Marquardt, F. Training convnets. In New Frontiers in Associative Memories- coupled phase oscillators as a neuromorphic platform Workshop at ICLR 2026. using equilibrium propagation. Neuromorphic Computing and Engineering, 4(3):034014, 2024. Osat, S. and Golestanian, R. Non-reciprocal multifarious self-organization. Nature Nanotechnology, 18(1):79–85, Wanjura, C. C. and Marquardt, F. Quantum equilibrium 2023. propagation for efficient training of quantum systems based on onsager reciprocity. Nature Communications, O’Connor, P., Gavves, E., and Welling, M. Training a spik- 16(1):6595, 2025. ing neural network with equilibrium propagation. In The 22nd international conference on artificial intelligence Werbos, P. J. Backpropagation through time: what it does and statistics, pp. 1516–1523. PMLR, 2019. and how to do it. Proceedings of the IEEE, 78(10):1550– 1560, 1990. Pineda, F. Generalization of back propagation to recurrent and higher order neural networks. In Neural information Yi, S.-i., Kendall, J. D., Williams, R. S., and Kumar, S. processing systems, 1987. Activity-difference training of deep neural networks using memristor crossbars. Nature Electronics, 6(1):45–51, Pourcel, G., Basu, D., Ernoult, M., and Gilra, A. Lagrangian- 2023. based equilibrium propagation: generalisation to arbi- trary boundary conditions & equivalence with hamilto- nian echo learning. arXiv preprint arXiv:2506.06248, 2025. Rageau, T. and Grollier, J. Training and synchronizing oscillator networks with equilibrium propagation. Neuro- morphic Computing and Engineering, 2025. Sajnok, K. and Matuszewski, M. Near-equilibrium propaga- tion training in nonlinear wave systems. arXiv preprint arXiv:2510.16084, 2025. Scellier, B. Quantum equilibrium propagation: Gradient- descent training of quantum systems. arXiv preprint arXiv:2406.00879, 2024. Scellier, B. and Bengio, Y. Equilibrium propagation: Bridg- ing the gap between energy-based models and backprop- agation. Frontiers in computational neuroscience, 11:24, 2017. Scellier, B., Goyal, A., Binas, J., Mesnard, T., and Bengio, Y. Generalization of equilibrium propagation to vector field dynamics. arXiv preprint arXiv:1808.04873, 2018. Scellier, B., Mishra, S., Bengio, Y., and Ollivier, Y. Agnostic physics-driven deep learning. arXiv:2205.15021v1, 2022. Scurria, A. E. A physical theory of backpropagation: Exact gradients from the least-action principle. 2026. Stern, M., Hexner, D., Rocks, J. W., and Liu, A. J. Su- pervised learning in physical networks: From machine learning to learning machines. Physical Review X, 11(2): 021045, 2021. Wang, J., Lu, S., Wang, S.-H., and Zhang, Y.-D. A review on extreme learning machine. Multimedia Tools and Applications, 81(29):41611–41660, 2022. 12 Equilibrium Propagation for Non-Conservative Systems A. Gradient Estimation Error in VF where s denotes the dynamical state of the system. This symmetry is the linchpin of the equivalence proof, as the In this appendix, we quantify the gradient estimation error gradient expressions derived for BPTT and standard EP introduced by VF in the limit where the Jacobian asymmetry differ precisely by a transpose operation applied to ∂F ∂s . is small. This observation aligns with our analysis in the main text: Comparing the post-synaptic update terms in Eqs. (12) and VF fails in non-conservative systems due to the missing (14) gives the following error in the gradient of the cost: transpose in the post-synaptic term (see Eq. (16)). Following  ⊤ the derivation in Ernoult et al. (2019) (viz., Appendix A, Eqs. ∂F 0 (31–33)), the recursive relations for the gradients in BPTT Error = − (x , θ) ∂θ are given by:  −1 −1  ∂C 0 × JF (x0 , θ) − JF⊤ (x0 , θ) (x , y), (43) ∂ℓ ∂x ∇BPTT s (0) = (s⋆ , y), (49) ∂s To quantify this error, we decompose the Jacobian JF (x, θ) and for all t = 1, . . . , K, into its symmetric part SJ (x, θ) and antisymmetric part  ⊤ ∂F SJ (x, θ) = 12 JF (x, θ) + JF⊤ (x, θ) , ∇BPTT ∇BPTT (t − 1),  s (t) = (x, s⋆ , θ) s (50) (44) ∂s AJ (x, θ) = 12 JF (x, θ) − JF⊤ (x, θ) .   ⊤ ∂F ∇BPTT θ (t) = (x, s⋆ , θ) ∇BPTT s (t − 1), (51) Assuming the asymmetry AJ (x, θ) is small, we can make ∂θ a series expansion in SJ−1 AJ (omitting the dependencies where θ represents the optimization parameters, ℓ is the for clarity). Applying the Neumann expansion for small cost function, s⋆ is the free equilibrium state (satisfying ∥SJ−1 AJ ∥ gives F (s⋆ ) = 0), y is the target, and x is the input. The index t ∞ ! denotes the unrolled time steps, initialized at s(0) = s⋆ . X (JF ) −1 = (−1) n (SJ−1 AJ )n SJ−1 , (45) In contrast, the gradients computed by VF follow the recur- n=0 sion (viz., Ernoult et al. (2019), Appendix A, Eqs. (24–26)): ∞ ! X (JF⊤ )−1 = (SJ−1 AJ )n SJ−1 . (46) ∂ℓ ∆EP s (0) = − (s⋆ , y), (52) n=0 ∂s Subtracting the two series and assuming convergence, we and for all t ≥ 0, finally obtain ∂F ∆EP s (t + 1) = (x, s⋆ , θ) ∆EPs (t), (53) ∞ ∂s ! X −1 2n+1 −1 ⊤ −1 (JF ) − (JF ) = −2 SJ AJ SJ−1 .  ∂F ⊤ n=0 ∆EP θ (t + 1) = (x, s ⋆ , θ) ∆EP s (t). (54) (47) ∂θ Comparing these two sets of equations confirms that the only B. Equivalence between AsymEP and BPTT difference are Eqs. (50) and (53), specifically the transpose of the Jacobian ∂F ∂s (ignoring the global sign difference in In this appendix, we sketch the equivalence between the Eqs. (49) and (52)). gradient estimate computed by AsymEP and Backpropaga- tion Through Time (BPTT) (Werbos, 1990) for a Recurrent In AsymEP, we modify the dynamics by adding a correction Neural Network with fixed inputs. Our derivation relies on term dependent on the antisymmetric part of the Jacobian. the proof provided by Ernoult et al. (2019), which estab- Denoting the force of this augmented system by F A , the lished that standard (conservative) EP computes gradients Jacobian at the free equilibrium satisfies: identical to those of BPTT. To facilitate direct comparison, ⊤ ∂F A  we adopt their notation for this section. ∂F (x, s⋆ , θ) = (x, s⋆ , θ) . (55) ∂s ∂s The proof provided by Ernoult et al. (2019) relies on the assumption that the vector field F (i.e., transition function) By substituting this corrected Jacobian into the recursive is derived from a scalar potential function, which implies relations, AsymEP recovers the exact transpose required that by BPTT. Consequently, our method extends the equiva-  ⊤ ∂F ∂F lence between EP and BPTT to the general case of non- = , (48) conservative force. ∂s ∂s 13 Equilibrium Propagation for Non-Conservative Systems C. Out-of-Equilibrium Mechanics C.3. Symmetry Breaking as Credit Assignment Here we sketch the physical picture behind the doubled- On the diagonal manifold z = z ′ the doubled system enjoys energy construction of Eq. (26). The full derivation from a gauge symmetry: the auxiliary variable z ′ is redundant and Hamilton’s least-action principle, together with its connec- the difference d is identically zero. Credit assignment is im- tion to the Bateman–Galley formalism for non-conservative plemented by deliberately breaking this symmetry through classical mechanics (Bateman, 1931; Galley, 2013; Aykroyd the task cost. Adding βD(z, z ′ ) = β C(m) to H exerts et al., 2025), can be found in (Scurria, 2026). opposite forces on z and z ′ and drives them apart, so that the difference d ceases to be redundant and begins to carry C.1. The Helmholtz Obstruction information about the loss landscape. The natural physical route to a variational principle for a dynamical system ẋ = F (x, θ) is to seek a scalar potential D. Proofs for Dyadic EP E such that F = −∂x E. The classical Helmholtz integra- We now demonstrate that Dyadic EP correctly trains the bility condition states that such an E exists if and only if the parameters θ of the original force field F (x, θ), giving the Jacobian JF is symmetric everywhere. Whenever the inter- 0 exact gradient dC(x̄ dθ ) in the limit of infinitesimal nudging. actions are non-reciprocal — as in feedforward networks, active matter, or driven optical systems — JF acquires a non-zero antisymmetric part and the Helmholtz condi- D.1. Proof of EP tion fails identically. No scalar potential on the original First, recall that standard EP does not strictly require the n-dimensional state space can then generate the dynamics, system to settle at an energy minimum; it requires only that and the “energy minimisation” route at the heart of standard the system reaches a stationary state (a fixed point of the EP is blocked at the structural level. The obstruction is not dynamics). Indeed, using the notation of Section 2.1, EP a matter of computational convenience: it reflects the fact relies on the key identity: that the rotational component of F carries information that no scalar function of x alone can record. d2 d2 ET (xβ , θ) = ET (xβ , θ). (57) dθdβ dβdθ C.2. Variational Reconstruction on a Doubled Space Expanding the total derivative with respect to β gives: Applying the Bateman–Galley formalism circumvents this ⊤ ∂ET (xβ , θ) dxβ ∂ET (xβ , θ)  obstruction by enlarging the configuration space. The single d ET (xβ , θ) = + state x ∈ Rn is replaced by a conjugate pair (z, z ′ ) ∈ R2n , dβ ∂x dβ ∂β and the rotational component of F — which has no scalar = C(xβ ). (58) generator on the original n-dimensional space — is ab- sorbed into a bilinear coupling between z and z ′ on the Where the first term vanishes because the system is at a ∂ doubled space, where it does admit a variational descrip- stationary state, i.e., ∂x ET (xβ , θ) = 0; this holds even if tion. The physical motion is recovered on the diagonal the system is not at a minimum of ET . Similarly, for the submanifold z = z ′ (the so called ’physical limit’), while derivative with respect to θ: the off-diagonal direction d = z − z ′ supplies the additional d ∂ET (xβ , θ) degree of freedom needed to encode non-reciprocity. ET (xβ , θ) = , (59) dθ ∂θ Specializing this reconstruction to the overdamped (first- where we additionally assume that the cost function does order) regime relevant to relaxational neural dynamics yields not depend explicitly on the parameters θ. Substituting these the bilinear energy results into Eq. (57) in the limit of infinitesimal nudging (β → 0) recovers the fundamental relation given by Eq. (9). z + z′   H(z, z ′ , θ) = −(z − z ′ )⊤ F ,θ , (56) 2 D.2. Proof of Dyadic EP which is precisely Eq. (26). The symmetric midpoint m = We analyze now the stationary states of Dyadic EP by intro- (z + z ′ )/2 plays the role of the physical coordinate of the ducing the change of variables: doubled system, while d is the auxiliary direction along which non-reciprocity is stored. On the submanifold z = z ′ z + z′ m= , d = z − z′. (60) the coupling proportional to (z − z ′ ) vanishes identically 2 and both states evolve under the original field F , so the In these coordinates, the augmented energy HT becomes doubling leaves the on-shell physics unchanged. We refer the reader to (Scurria, 2026) for the full construction. HT (m, d, θ, β) = −d⊤ F (m, θ) + βC(m) (61) 14 Equilibrium Propagation for Non-Conservative Systems and the dynamics in Eq. (28) can be rewritten as: In Dyadic EP, we instead employ the single-phase update: 1 ∂H(z β , z ′β , θ)   dm ∂HT =− = F (m, θ), (62) ∆θ ∝ − (70) dt ∂d β ∂θ dd ∂HT ∂ =− = dT JF (m, θ) − β C(m). (63) This choice avoids the overhead of evolving two coupled dt ∂m ∂m equations in the extended space, which would be computa- β The stationary states (mβ , d ) are the solutions to: tionally equivalent to evolving four equations in the original space (two for +β and two for −β). Using Eq. (70), we F (mβ , θ) = 0, (64) evolve only one coupled equation for +β in the extended space; this corresponds to two equations in the original βT ∂ d JF (mβ , θ) − β C(mβ ) = 0. (65) space, thereby achieving the same computational complex- ∂m ity as AsymEP. Furthermore, this single-phase formulation suggests a pathway toward making the update local in time, This leads to the following observations: provided appropriate hardware is used to implement the 1) The stationary state of m is independent of β and coin- augmented phase. cides with the stationary state of the original system: Mathematically, these two approaches yield the same gradi- ent estimate because the equations for dβ are linear. Explic- z β + z ′β = mβ = m0 = x0 . (66) itly we have : 2 ∂H(z β , z ′β , θ) ∂F z β + z ′β   = −(z β − z ′β )⊤ ,θ 2) The Jacobian of the extended system defined in Eq. (26) ∂θ ∂θ 2 is invertible, provided JF is invertible. This is most evident  ⊤ ∂F 0  −1 from Eq. (63). = −β z ,θ JF⊤ (z 0 , θ) ∂θ 3) The stationary state value of d is given by:  ∂C 0  × (z ) , (71) β  −1 ∂C 0  ∂x d = β JF⊤ (m0 , θ) (x ) (67) ∂x where we have used Eqs. (66) and (67). Inspection of Eq. (71) confirms that, up to corrections of order β 2 , we 0 In particular, when β = 0, we have d = 0, which implies obtain exactly the same gradient as in AsymEP. that the free stationary states coincide: z 0 = z ′0 . 4) The cost at the stationary state of the extended system E. AsymEP versus Dyadic EP is equal to the cost at the stationary state of the original In this appendix, we demonstrate that Asymmetric Equilib- system: rium Propagation (AsymEP) emerges naturally as the first- D(m0 ) = C(x0 ). (68) order projection of the 2N -dimensional Dyadic Equilibrium Consequently, the gradients of the cost with respect to the Propagation onto a single N -dimensional state space. We parameters are identical. then formalize the physical trade-offs between the two ar- chitectures. Since both the original and extended systems, given respec- tively in Eq. (28) and Eq. (1-2), share the same cost at their E.1. AsymEP as the Linear Projection of Dyadic EP respective stationary states, and because the Jacobians of both models are invertible, applying EP update rule to the As established in Appendix D.2, transforming the 2N - extended system give the correct gradient estimate for the dimensional extended space (z, z ′ ) into the mean state ′ parameters θ of the original system. m = z+z 2 and the difference state d = z − z ′ exactly The final step of the proof is to establish the equivalence decouples the stationary dynamics. Because the stationary between the standard parameter update rule in Eq. (8) and state of m is the free state of the original system (mβ = x0 ), the modified rule used by Dyadic EP in Eq. (34). Indeed, if the cost function drives the difference variable to a stationary β we were to apply the standard update rule in the extended state d satisfying: space, the update would be: β ∂C 0 JF⊤ (x0 , θ)d = β (x ) (72) 1  β ′β ∂H(z , z , θ) ∂H(z , z −β ′−β , θ)  ∂x ∆θ ∝ − − . 2β ∂θ ∂θ To recover this exact error signal in an N -dimensional space, (69) we postulate a modified dynamical system FA (x) compris- 15 Equilibrium Propagation for Non-Conservative Systems ing the standard EP dynamics and a spatial correction Γ(x): F. Derivation of the Hopfield-like Energy ∂C In this section, we derive the explicit energy functional for FA (x) = F (x) − β (x) + Γ(x) (73) the Continuous Asymmetric Hopfield dynamics defined in ∂x Eq. (35). The force field is given by: Let ∆x = xβA − x0 denote the displacement from the free equilibrium. Expanding the stationarity condition F (x) = ρ′ (x) ⊙ (Jρ(x)) − x. (78) FA (xβA ) = 0 to first order around x0 yields: We omit external inputs J in for brevity, as they appear sym- ∂C 0 metrically in the Jacobian. The variational Hamiltonian is JF (x0 , θ)∆x − β (x ) + Γ(xβA ) ≈ 0 (74) defined as: ∂x z + z′ z + z′     To ensure the first-order displacement matches the Dyadic H(z, z ′ ) = −(z − z ′ )⊤ F + βC . β 2 2 EP error signal (i.e., ∆x ≈ d ), we substitute Eq. (72) into (79) the expansion: To analyze this expression, we introduce the midpoint m = z+z ′ Γ(xβA ) = JF⊤ (x0 , θ) − JF (x0 , θ) ∆x and the difference d = z − z ′ . Since the separation  (75) 2 between z and z ′ is induced solely by the nudging parameter = −2AJ (x0 , θ)(xβA − x0 ) (76) β, the difference scales as ∥d∥ ∼ O(β). We therefore This uniquely recovers the AsymEP augmented dynamics. neglect terms of order O(∥d∥3 ) (i.e., or equivalently O(β 3 )) Finally, to eliminate the O(β 2 ) error, AsymEP evaluates the as they do not contribute to the gradient of the cost. centered difference of two opposite nudges: The activation at the midpoint can be approximated as: dxA ρ(z) + ρ(z ′ ) x±β 0 A =x ±β + O(β 2 ) (77) ρ(m) = + O(∥d∥2 ). (80) dβ β=0 2 Similarly, the difference in activations is: Subtracting these states cancels the O(β 2 ) error, yielding 1 +β −β β 3 2 (xA − xA ) = d + O(β ), successfully recovering the ρ(z) − ρ(z ′ ) = ρ′ (m) ⊙ d + O(∥d∥3 ). (81) exact post-synaptic update term. Inverting this relation, we express the state difference as: E.2. Physical Trade-offs and the Extended Space z − z ′ = (ρ(z) − ρ(z ′ )) ⊙ ρ′ (m) + O(∥d∥3 ). (82) We can view AsymEP and Dyadic EP as a space-time trade- off of the same underlying physical optimization problem. We substitute these expansions into the interaction term AsymEP preserves the original N -dimensional state space of the Hamiltonian, Hint = −(z − z ′ )⊤ (ρ′ (m) ⊙ Jρ(m)). of the network at the cost of temporal non-locality. The sys- Applying the identity a⊤ (b ⊙ c) = (a ⊙ b)⊤ c, we obtain: tem must evolve sequentially, requiring physical memory ⊤ not only to store the free equilibrium x0 for the asymmet- Hint = − ((z − z ′ ) ⊙ ρ′ (m)) Jρ(m) ric correction, but also to store the successive stationary  ρ(z) + ρ(z ′ )  states required to evaluate the contrastive gradient update. ≈ −(ρ(z) − ρ(z ′ ))⊤ J . (83) 2 AsymEP thus serves as the direct, spatially minimal exten- sion of EP. Expanding the product gives: Dyadic EP provide a learning signal that is local in both 1h space (where z − z ′ encodes the gradient) and time (allow- Hint = − ρ(z)⊤ Jρ(z) + ρ(z)⊤ Jρ(z ′ ) 2 ing the nudged phases to execute in parallel) at the cost i of doubling the state space. In particular, capturing non- − ρ(z ′ )⊤ Jρ(z) − ρ(z ′ )⊤ Jρ(z ′ ) . (84) conservative forces in this extended space requires a spe- cific bilinear coupling, rather than a trivial superposition We decompose the connectivity matrix J into its symmetric of uncoupled subsystems. It can be seen as a blueprint for part S and antisymmetric part A. The first and last terms future neuromorphic hardware. simplify to ρ(z)⊤ Sρ(z). The cross terms satisfy: Ultimately, the reduction of Dyadic EP to AsymEP via the ρ(z)⊤ Jρ(z ′ ) − ρ(z ′ )⊤ Jρ(z) = ρ(z)⊤ (J − J ⊤ )ρ(z ′ ) variables m and d proves the universality of EP’s variational principle. = ρ(z)⊤ (2A)ρ(z ′ ). (85) 16 Equilibrium Propagation for Non-Conservative Systems Thus, the interaction term reduces to: The input parameters are then updated using the standard learning rule (21). In particular, the presynaptic term associ- 1 1 Hint = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) ated with the input weights is given by, 2 2 − ρ(z)⊤ Aρ(z ′ ) + O(∥d∥3 ). (86) ∂Fi in = δik ρ′ (xi )ul . (93) ∂Jkl Finally, for the nudging term, we expand the cost function The presynaptic terms associated with the dynamical param- dyn around the midpoint: eters Jij depend on the experiment. 1 C(m) = (C(z) + C(z ′ )) + O(∥d∥2 ). (87) G.1. Symmetric Initialization 2 G.1.1. L EARNING RULES When multiplying by β, the remainder term becomes β · O(∥d∥2 ). Since ∥d∥ ∼ O(β), this remainder is of order For clarity, we write the learning rules for VF and AsymEP. O(β 3 ) and can be consistently discarded alongside the third- For the input weights, using (93), we have: order terms from the interaction expansion. 1 h +β i in ∆Jik ∝ (xi − x−β ′ 0 i )ρ (xi )uk , (94) Combining all these components, the final Hamiltonian is: 2β 1 1 while for the recurrent weight, we get: H(z, z ′ ) = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) 2 2 1 h +β i 1 ∆Jijdyn ∝ (xi − x−β i )ρ′ 0 (xi )ρ(x 0 j ) . (95) − ρ(z) Aρ(z ) + (∥z∥2 − ∥z ′ ∥2 ) ⊤ ′ 2β 2 β ′ For EP, we have: + (C(z) + C(z )). (88) 2 1 h +β  i in ∆Jik ∝ ρ(xi ) − ρ(x−β i ) uk , (96) The saddle-point dynamics, given by Eq. 32, generated by 2β this Hamiltonian are: and for the recurrent weights: dz β ∂C 1 h +β = ρ′ (z) ⊙ (Sρ(z) + Aρ(z ′ )) − z − i , dyn −β −β dt 2 ∂z (89) ∆Jij ∝ ρ(xi )ρ(x+βj ) − ρ(xi )ρ(xj ) . (97) 2β ′ dz β ∂C = ρ′ (z ′ ) ⊙ (Sρ(z ′ ) + Aρ(z)) − z ′ + . (90) dt 2 ∂z ′ G.1.2. S UPPLEMENTARY N UMERICAL R ESULTS This system recovers the original continuous Hopfield dy- To complement Fig. 2, we report the evolution of the accu- namics when z = z ′ (assuming β = 0). racy of the three methods in Fig. 4. We consider a layered network with 50 hidden neurons. While this capacity is G. Experimental Details insufficient for state-of-the-art performance, it amplifies the difference in accuracy between models to aid visualization. As in the main text, the neuronal dynamics are governed by Models are trained for 20 epochs starting from a symmetric the vector field: configuration, the natural setting for both VF and EP. With   this initialization, AsymEP consistently outperforms the X dyn other methods and learns faster by exploiting the additional Fi = ρ′ (xi )  Jij ρ(xj ) + bi (u) − xi , (91) degrees of freedom of the asymmetric network. j where the input-dependent bias bi (u) is precomputed for G.2. Fixed Asymmetry Ratio each MNIST input u as: This section details the implementation for the fixed asym- X metry ratio experiments presented in Section 5.2, followed bi (u) = Jilin ul . (92) by complementary numerical results regarding learning l∈in speed and induced Jacobian asymmetry. This term projects the input space into the recurrent sub- G.2.1. L EARNING RULES space. The bias yields a diagonal contribution to the Jaco- bian JF = ∂F ∂x , and therefore does not contribute to the Parametrization and notation. To enforce a fixed asym- antisymmetric correction used in the augmented dynamics metry ratio, we explicitly parameterize the independent ele- Eq. (20) of AsymEP. ments of Eq. (38). We introduce two parameter vectors θS 17 Equilibrium Propagation for Non-Conservative Systems Parameter Sym. Init. / Feedforward Fixed rstr Fixed rstr & rin sec. 5.1 & 5.3 sec. 5.2 app. G.3 Learning Rate (Input-Hidden) 0.05 0.05 0.0125 Learning Rate (Hidden-Output) 0.01 0.01 0.0025 Time Step (Dynamics Integration) 0.5 0.3 0.3 Nudging Parameter (β) 0.5 0.5 0.5 Free-phase Steps (nfree ) 20 30 40 Nudged-phase Steps (nnudge ) 10 10 10 Number of Epochs 40 / 20 30 40 Batch Size 64 √64 √64 Scaling Parameter γ n.a. 60 60 Structure 784 - n.a. -10 784-50-10 all-to-all, 500 hid Activation function ρ tanh tanh tanh Initial Recurrent State s s ∼ U (−1, 1) s ∼ U(−1, 1) s ∼ U(−1, 1) Initial Parameters θ θ ∼ N (0, N1 ) θ ∼ N (0, N1 ) θ ∼ N (0, N1 ) Number of Runs (training + inference) 10 10 10 Table 3. Trained Model Hyperparameters on MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N (µ, σ 2 ) is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference compared to the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics. elements of S̃, the full matrices are constructed as: S S̃ij = δij ξi + (1 − δij )θk(max(i,j),min(i,j)) , (99) A Ãij = ϵij θk(max(i,j),min(i,j)) , (100) where ϵij is the Levi-Civita symbol. The dynamical param- eters are then given by: dyn Jij = γ(cS S̃ij + cA Ãij ), (101) with normalization coefficients p 2 1 − rstr rstr cS = , cA = , (102) FS FA defined in terms of the Frobenius norms: v uN M uX X 2 Figure 4. Evolution of the mean accuracy and standard deviation F =t S ξ2 + 2i θS , k (103) (over 10 runs) during training on MNIST for AsymEP, EP, and VF. i=1 k=1 Models use 50 hidden neurons. v u M u X 2 FA = t2 θkA . (104) k=1 and θA of size M = Ndyn (Ndyn − 1)/2, which encode the off-diagonal elements of the symmetric and antisymmetric Presynaptic computation. The dependence of the nor- components S̃ and Ã, respectively. The correspondence malization coefficients on the parameters introduces addi- between matrix and vector indices is given by: tional regularization terms in the learning rule compared to the parameterization of (Scellier &Bengio, 2017). The (i − 1)(i − 2) gradients of the normalization coefficients are: k(i, j) = + j, (1 ≤ j < i ≤ Ndyn ) 2 ∂cS θkS ∂cS ξm (98) = −2cS 2, = −cS 2, (105) ∂θkS (FS ) ∂ξm (FS ) ∂cA θA where the condition j < i selects the strictly lower triangular A = −2cA k 2 . (106) elements. Introducing an additional vector ξ for the diagonal ∂θk (FA ) 18 Equilibrium Propagation for Non-Conservative Systems Parameter Comparison Dyn. 2 hidden layers 3 hidden layers sec. 5.4 sec. 5.4 sec. 5.4 Learning Rate (Input-Hidden) 0.0016 0.0013 0.6 Learning Rate (Hidden-Hidden) 0.0016 0.0013 0.6 Learning Rate (Hidden-Output) 0.0016 0.0013 0.6 Time Step (Dynamics Integration) 0.4 0.3 0.0075 Nudging Parameter (β) 0.3 0.5 0.20 Free-phase Steps (nfree ) 40 40 60 Nudged-phase Steps (nnudge ) 20 20 30 Number of Epochs 50 40 40 Batch Size 64 64 64 Layer Structure 784-500-200-10 784-500-500-10 784-500-500-500-10 Activation function ρ tanh tanh tanh Initial Recurrent State s s ∼ U(−1, 1) s ∼ U(−1, 1) s ∼ U (−1, 1) Initial Parameters θ θ ∼ N (0, N1 ) θ ∼ N (0, N1 ) θ ∼ N (0, N1 ) Number of Runs (training + inference) 10 10 10 Table 4. Trained Model Hyperparameters on Fashion-MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N (µ, σ 2 ) is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference compared to the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics. Combining these with the derivatives of the matrices S̃ and (where p > q): Ã, we have: N θkA X  ∂Fi ′ ∂ S̃ij ∂ S̃ij = γc A ρ (xi ) −2 Ãij ρ(xj ) = δip δjq + δiq δjp , = δij δkj (107) ∂θkA (FA )2 j=1 ∂θkS ∂ξk  ∂ Ãij + δip ρ(xq ) − δiq ρ(xp ) . = δip δjq − δiq δjp , (108) ∂θkA (111) where k corresponds to the index pair (p, q) with p > q, as defined in Eq. (98). The full presynaptic terms are then: Initialization. To ensure the stability of the system, we initialize our parameters suchhthat the i variance of dynam- dyn • For the diagonal parameters ξm : ical parameters scales as Var Jij ∝ 1/Ndyn . This is a conservative choice for the layered architectures used in our dyn ∂Fi  ξm X N experiments, where many entries of Jij are zero. = γcS ρ′ (xi ) − S̃ij ρ(xj ) ∂ξm (FS )2 j=1 In practice, we initialize the parameter vectors θS , θA , and (109)  ξ with identical variances σ 2 . For large Ndyn , the expected + δim ρ(xm ) . Frobenius norms approximate to E[FS,A ] ≈ Ndyn σ. Conse- quently, the normalization coefficients become: p 2 • For the off-diagonal symmetric parameters θkS (where 1 − rstr rstr cS ≈ , cA ≈ . (112) p > q): Ndyn σ Ndyn σ N Since the symmetric and antisymmetric components are sta- θkS X  ∂Fi ′ = γc S ρ (x i ) −2 S̃ij ρ(xj ) tistically independent, the variance of the weights is derived ∂θkS (FS )2 j=1  as follows: + δip ρ(xq ) + δiq ρ(xp ) . • Diagonal elements (i = j): (110) 2 h i 1 − rstr Var Jiidyn = γ 2 c2S σ 2 ≈ γ 2 2 . (113) • For the off-diagonal antisymmetric parameters θkA Ndyn 19 Equilibrium Propagation for Non-Conservative Systems • Off-diagonal elements (i ̸= j): a zero-cost baseline (perfect prediction) during learning. Specifically, for each method and value of rstr , we calcu- h dyn i γ2 late the cumulative loss by summing the batch-averaged = γ 2 c2S + c2A σ 2 ≈ 2 ,  Var Jij (114) Ndyn costs of the first 5 epochs (out of 30, to avoid saturation h i effects), and reporting the mean and standard deviation over dyn To satisfy Var Jij ∝ 1/Ndyn , we set: 10 independent training runs. Mathematically, for each run: p γ= Ndyn (115)   5 NX X batches X C(x0 , u)  Note that by random matrix theory, diagonal elements do Cumul. Loss =  , |Bk | not affect stability in the large Ndyn limit. epoch=1 k=1 (x0 ,u)∈Bk (120) Potential Simplification. Although the parameterization where Bk represents the k-th batch, and |Bk | denotes the above is fully general, a simpler construction is possible number of examples in the batch. The parameters are up- by removing self-connections (ξ = 0) and enforcing identi- dated after each batch step; consequently, the free equilib- cal parameterization for the symmetric and antisymmetric rium x0 is inferred using the updated parameters and the components, i.e., θS = θA = θ. The matrix elements then current example u. become: S̃ij = (1 − δij )θk(max(i,j),min(i,j)) , (116) Ãij = ϵij θk(max(i,j),min(i,j)) . (117) In this case, the Frobenius norms are equal (FS = FA ), and we can omit the explicit normalization: q dyn 2 S̃ + r à . Jij = 1 − rstr ij str ij (118) For a parameter θk corresponding to indices (p, q) with p > q, the presynaptic term is given by: q  ∂Fi = ρ′ (xi ) 2 +r 1 − rstr str δip ρ(xq ) ∂θk q   (119) + 2 −r 1 − rstr δ ρ(x str iq p . ) While this parameterization works in simulations and keeps Figure 5. Cumulative loss as defined by (120) over the first 5 the number of parameters constant for all rstr , it constrains epochs of training, for different asymmetry ratios rstr . We compare the asymmetry to be “homogeneous”, by which we mean VF (orange) and AsymEP (blue), under two training regimes: that the asymmetry ratio is identical for every pair of neu- training only J in (dashed) or all parameters (solid). rons; hence, the network cannot learn to be symmetric in one region and antisymmetric in another. Therefore, we choose to explore the more general case of (38) in our experiments. In Fig 5, we observe that learning slows down for both al- gorithms when rstr ≳ 0.6. This behavior likely results from G.2.2. S UPPLEMENTARY N UMERICAL R ESULTS the increased difficulty of reaching a stationary state as the dynamics become strongly asymmetric. With a fixed num- To complement the results of Fig 3, we analyze the training ber of inference steps, incomplete convergence degrades the efficiency as a function of the asymmetry ratio rstr and in- accuracy of the gradient estimates, thereby slowing down vestigate the robustness of VF by monitoring the Jacobian the learning. Fig 5 shows that while VF can eventually asymmetry. achieve competitive accuracy, it is consistently slower than AsymEP as soon as asymmetry is introduced. Training efficiency. We first study the training efficiency of the two algorithms as a function of the asymmetry ra- tio rstr . Inspired by the related concept in (Cesa-Bianchi &Lugosi, 2006), we define the cumulative loss as the accu- Jacobian asymmetry. We next examine how the struc- mulated difference between the free equilibrium cost and tural asymmetry rstr is reflected in the Jacobian of the dy- 20 Equilibrium Propagation for Non-Conservative Systems namics (35), given by: ∂Fi dyn ′ = (1 − δij )ρ′ (xi )Jij ρ (xj ) ∂sj h i + δij ρ′ (xi )(Jiidyn ρ′ (xi )) + ρ′′ (xi )bi − 1 . (121) In our layered architecture, the self-connections are zero (Jiidyn = 0). For the following analysis, we neglect all diag- onal terms in the Jacobian (including external inputs and potential), since they do not contribute to the antisymmetric correction (20) and thus to the discrepancy between the per- formance of VF and AsymEP. Consequently, we define the following asymmetry ratio based solely on the off-diagonal Jacobian JF,off : Figure 6. Asymmetry ratio of the Jacobian rjac defined in equation ⊤ (122) after training for different asymmetry ratios rstr . We compare ∥JF,off − JF,off ∥F VF (orange) and AsymEP (blue), under two training regimes: rjac = , (122) training only J in (dashed) or all parameters (solid). ∥JF,off ∥F The results are presented in Fig 6. For each trained model Consequently, local stability requires the largest real eigen- and ratio rstr , we compute rjac averaged over the stationary value of the effective weight matrix to be strictly less than 1. states of the first batch (64 images) across 10 independent Assuming weights are initialized independently with vari- runs. We observe that when structural asymmetry is strong ance σ 2 , Girko’s circular law dictates that the eigenvalues and all parameters are trained, VF partially compensates for of√an asymmetric matrix uniformly populate a disk of radius the asymmetry by adjusting the neuronal states. This can be σ n in the complex plane. In contrast, imposing symmetry understood by rewriting the ratio as: forces the eigenvalues √ onto the real line, broadening the   spectral radius to 2σ n according to Wigner’s semicircle dyn dyn ⊤ ρ′ (xi ) Jij − (Jji ) ρ′ (xj ) law. As a result, asymmetric networks can stably accommo- F rjac = . (123) date larger variance in the weight initializations than their dyn ′ ρ′ (xi )Jij ρ (xj ) symmetric counterparts. F Compared to the structural asymmetry ratio in Eq. (37), Asymmetry nevertheless introduces imaginary eigenvalues a value of rjac < rstr indicates that the neuronal states ef- and, consequently, damped oscillations. To study this effect fectively dampen the structural asymmetry, rendering the experimentally in a controlled setting, we constrain the input dynamics more symmetric. This symmetrization of the Ja- projections J in . In the experiments of the main text, fixing cobian appears without imposing an additional symmetriza- the structural asymmetry ratio rstr still allowed AsymEP tion penalty and could be enhanced using the method of to reduce oscillations by aligning and increasing the input (Laborieux &Zenke, 2022). This mechanism likely explains projections J in , thereby adding stabilizing diagonal contri- the superior performance of ‘All (VF)’ compared to ‘J in butions to the Jacobian. To isolate the network’s ability to (VF)’ in Fig 3, as the former is able to use the additional suppress oscillations independently of the magnitude of the degrees of freedom to reduce the effective asymmetry at input drive, we further constrain the relative scale of J in and high rstr . J dyn by imposing ∥J in ∥F ∥J in ∥F G.3. Stability analysis with Fixed Asymmetry Ratio & rin = = , (124) ∥J dyn ∥F γ Constrained Inputs Projection where ∥J dyn ∥F = γ following Eq. (101). Defining unscaled A complete stability analysis of the non-conservative dy- input projections J˜in , we set namics trainable with AsymEP is beyond the scope of this work. Nevertheless, for the class of continuous Hopfield J˜in J in = rin γ (125) networks considered here, standard arguments from random ∥J˜in ∥F matrix theory suggest that asymmetry inherently improves asymptotic stability. G.3.1. L EARNING RULES In the dynamics defined by Eq. (91), the linear leak term Reusing the notations of the previous section, we write −xi shifts the spectrum of the system’s Jacobian by −1. Jilin = γcin J˜ilin with the normalization cin = rin /Fin , where 21 Equilibrium Propagation for Non-Conservative Systems Fin = ∥J˜in ∥F . Applying the chain rule yields: " # ∂Fi ′ J˜kl in X ˜in = γcin ρ (xi ) δik ul − 2 J um . (126) ∂ J˜kl in Fin m im And for γ we have: ∂Fi 1 = (Fi + xi ). (127) ∂γ γ G.3.2. S UPPLEMENTARY N UMERICAL R ESULTS Figure 7. Comparison of AsymEP and VF on a feedforward net- Table 5 reports a worst-case control experiment where the work. Test accuracy on MNIST is shown as a function of training structural asymmetry is fixed at rstr = 0.7 while the input epochs for a single-hidden-layer network with 20 neurons. Curves scale ratio rin is varied. The experiment uses an all-to-all report the mean and standard deviation over 10 runs. Best accura- architecture on MNIST (excluding direct input-to-output cies are 92.7% ± 0.5% (AsymEP) and 64.3% ± 2.0% (VF). connections). The output variance during extended infer- ence (steps 30-50) confirms that the system successfully learns to suppress oscillations even when rin is severely re- G.5. Advantages of Non-Conservatives Dynamics stricted. Any small residual oscillations can be mitigated by In Section 5.4, we compare three (non-)conservative dynam- time-averaging over the inference steps. ics under varying constraints. To further evaluate learning Finally, rin can be interpreted as a measure of the external speed, Table 6 reports network performance after a sin- signal magnitude relative to the internal recurrent dynamics. gle epoch. These results confirm our earlier observation: These results indicate that the system remains capable of AsymEP learns faster than VF. learning and stabilizing even under a low external input drive. Even when the input projection ∥J in ∥F is 100 times G.6. Feedforward CIFAR-10 Experiments smaller than the recurrent connections ∥J dyn ∥F , the network This appendix details the architecture and hyperparameters still achieves 36.34 ± 6.25% accuracy, which is well above of the deep feedforward experiments comparing backprop- chance level (∼ 10%). agation (BP), VF, AsymEP and Dyadic EP on CIFAR-10 (see subsection 5.5) G.4. Feedforward Network G.4.1. L EARNING RULES Architecture. We use a nine-layer convolutional network (denoted CNN9). The first eight layers are convolutional For clarity, we write the learning rules for VF and AsymEP with 3 × 3 kernels and zero-padding; spatial downsampling in a feedforward architecture with one hidden layer using is performed by strided convolutions (stride 2 on layers 2, 4, the notation of Section 5.3. For the input weights connecting 6, 8 and stride 1 otherwise), so no pooling is used. The chan- to the hidden layer, we get the usual formula: nel widths follow the sequence 3 → 64 → 64 → 128 → 128 → 256 → 256 → 512 → 512, reducing the spatial 1 h +β −β 0 i resolution from 32 × 32 to 2 × 2. The last layer is a fully in ∆Jik ∝ (hi − hi )ρ′ (hi )uk , (128) 2β connected readout mapping the 512 × 2 × 2 feature map to the 10 class logits. All hidden units use a ReLU non- while for the feedforward weights connecting the hidden to linearity. the output layer, we get: p Weights are initialized with the Kaiming scheme (σ = 2/fan-in) and biases at zero. 1 h +β 0 i ∆(Wh→o )ji ∝ (oj − o−β ′ 0 j )ρ (oj )ρ(hi ) . (129) Training setup. All methods are trained for 40 epochs 2β with batch size 64 and repeated over 5 seeds. Inputs are normalized per channel and augmented with random 32 × Note that EP is not applicable in this case. 32 crops (padding 4), random horizontal flips and Cutout (one 8 × 8 patch). Parameters are updated with SGD with G.4.2. S UPPLEMENTARY N UMERICAL R ESULTS momentum 0.9, weight decay 5 × 10−4 and gradient-norm In addition to the final accuracy reported in Sec. 5.3, we clipping at 1, under a cosine learning-rate schedule decaying show in Fig. 7 the evolution of the accuracy over 20 epochs from 3.5 × 10−2 to 2 × 10−4 . Test accuracy is reported for AsymEP and VF. on an exponential moving average of the weights (decay 22 Equilibrium Propagation for Non-Conservative Systems Table 5. Output variance and final test accuracy on MNIST (%) across different values of rin with rstr = 0.7. (mean ± std over 10 runs) (500 hiddens, all-to-all, no input-output). Output variance Test Acc. (%) rin Untrained Epoch 80 Epoch 80 0.01 (3.38 ± 0.90) × 10−4 (5.22 ± 2.34) × 10−5 36.34 ± 6.25 0.10 (2.33 ± 0.48) × 10−4 (1.39 ± 0.17) × 10−4 90.54 ± 0.19 0.50 (1.34 ± 0.32) × 10−5 (1.06 ± 0.25) × 10−6 94.96 ± 0.10 1.00 (6.27 ± 1.24) × 10−7 (1.75 ± 0.50) × 10−8 96.30 ± 0.09 Table 6. Test accuracy on Fashion-MNIST (%) at Epoch 1 (mean ± std 10 runs). The table compares three classes of network dynamics: Continuous Hopfield (CH), Predictive Coding (PC), and Standard dynamics. Each is evaluated under three connec- ⊤ tivity structures: Asymmetric (Asym, Bk ̸= Wk+1 ), Symmet- ⊤ ric/conservative (Sym, Bk = Wk+1 ), and Feedforward (Feedfor, Bk = 0). EP AsymEP VF Asym - 74.91 ± 0.45 48.98 ± 4.09 CH Feedfor - 74.36 ± 0.29 48.84 ± 3.46 Sym 74.57 ± 0.43 - - Asym - 77.83 ± 0.47 57.75 ± 3.37 PC Sym 76.23 ± 0.39 - - Asym - 76.87 ± 0.51 61.50 ± 4.06 Standard Feedfor - 77.92 ± 0.51 63.98 ± 0.73 0.9995). The targets are smoothed (ε = 0.1), which for the EP methods amounts to nudging toward the smoothed one-hot vector. Relaxation hyperparameters. The four methods differ only in the gradient estimator: BP uses automatic differ- entiation, while the EP-based methods contrast stationary states of the corresponding relaxation dynamics. VF uses an integration step η = 1.0, nudging β = 0.1, and at most K = 1000 relaxation steps with an early-stopping threshold of 9 × 10−6 on the mean state update. Dyadic EP uses the same settings except for a nudging strength β = 0.1. AsymEP uses a smaller step η = 0.5, nudging β = 0.1, and up to K = 250 relaxation steps with a threshold of 1 × 10−4 . 23