diff options
Diffstat (limited to 'refs/scurria_2602.03670v2.txt')
| -rw-r--r-- | refs/scurria_2602.03670v2.txt | 3311 |
1 files changed, 3311 insertions, 0 deletions
diff --git a/refs/scurria_2602.03670v2.txt b/refs/scurria_2602.03670v2.txt new file mode 100644 index 0000000..d9eba6a --- /dev/null +++ b/refs/scurria_2602.03670v2.txt @@ -0,0 +1,3311 @@ +Equilibrium Propagation for Non-Conservative Systems + +Antonino Emanuele Scurria 1 Dimitri Vanden Abeele 1 Bortolo Matteo Mognetti 2 Serge Massar 1 + +arXiv:2602.03670v2 [cs.LG] 1 Jun 2026 + +Abstract + +from inference, the transmission of nonlocal error signals, +and synchronous layer-wise computations with explicit gradient storage. These constraints have no clear analog in +physical systems, making backpropagation challenging to +implement in neuromorphic or analog hardware. Consequently, understanding how credit assignment can instead +emerge from intrinsic system dynamics, through local interactions and continuous relaxation, is a central question in +neuroscience and machine learning. + +Equilibrium Propagation (EP) is a physicsinspired learning algorithm that uses stationary +states of a dynamical system both for inference +and learning. In its original formulation it is +limited to conservative systems, i.e. to dynamics which derive from an energy function. Given +their applications, it is important to extend EP +to non-conservative systems, i.e. systems with +non-reciprocal interactions. Previous attempts to +generalize EP to such systems failed to compute +the exact gradient of the cost function. Here we +propose a framework that extends EP to arbitrary +non-conservative systems, including feedforward +networks. We keep the key property of equilibrium propagation, namely the use of stationary +states both for inference and learning. However, +we modify the dynamics in the learning phase by +a term proportional to the non-reciprocal part of +the interaction so as to obtain the exact gradient +of the cost function. This algorithm can also be +derived using a variational formulation that generates the learning dynamics through an energy +function defined over an augmented state space. +Numerical experiments show that this algorithm +achieves better performance and learns faster than +previous proposals. + +Equilibrium Propagation (EP) (Scellier &Bengio, 2017) +represents one of the most promising advances in this direction. It formulates supervised learning as a contrast between +two stationary states of a dynamical system: a ‘free’ phase +where the system evolves autonomously, and a ‘nudged’ +phase where outputs are weakly pushed toward their targets. +The local change in neural states between these phases recovers the exact gradient of the cost function with respect to +parameters. This enables spatially local learning exploiting +the continuous relaxation of the system without a distinct +backward circuit or explicit weight transport. +Since its introduction, several works have sought to improve the practicality and biological realism of EP. Algorithmic adaptations include enforcing temporal locality to +avoid state storage (Ernoult et al., 2020; Falk et al., 2025), +deriving agnostic updates for black-box energies (Scellier et al., 2022), and substituting nudging with clamping +(Stern et al., 2021). Theoretically, the framework has been +extended to stochastic systems (Scellier &Bengio, 2017; +Massar &Mognetti, 2025) and Lagrangian dynamics for +time-varying inputs (Massar, 2025; Pourcel et al., 2025; +Berneman &Hexner, 2025). In parallel, simulations have +explored suitable substrates, ranging from spiking (Martin et al., 2021; O’Connor et al., 2019) and resistive networks (Kendall et al., 2020) to coupled oscillators (Wang +et al., 2024; Rageau &Grollier, 2025), as well as quantum +systems (Wanjura &Marquardt, 2025; Massar &Mognetti, +2025; Scellier, 2024). Experimental realizations have been +demonstrated in memristor crossbars (Yi et al., 2023), selfadjusting electrical circuits (Dillavou et al., 2022; 2024), +elastic networks (Altman et al., 2024), and classical Ising +models trained on quantum annealers (Laydevant et al., +2024). + +1. Introduction +Standard neural network optimization relies on error backpropagation, an algorithm whose computational mechanism +is difficult to reconcile with biological (Crick, 1989) and +physical implementations (Indiveri &Liu, 2015). Specifically, backpropagation requires a backward pass distinct +1 +Laboratoire d’Information Quantique (LIQ) CP224, Université +libre de Bruxelles (ULB), Av. F. D. Roosevelt 50, 1050 Bruxelles, +Belgium 2 Interdisciplinary Center for Nonlinear Phenomena and +Complex Systems CP231, Université libre de Bruxelles (ULB), Av. +F. D. Roosevelt 50, 1050 Bruxelles, Belgium. Correspondence to: +Antonino Emanuele Scurria <antonino.scurria@ulb.be>. + +Proceedings of the 43 rd International Conference on Machine +Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 +by the author(s). + +Despite these recent developments and the theoretical el- + +1 + +Equilibrium Propagation for Non-Conservative Systems + +egance of EP, its standard formulation remains restricted +to conservative systems. In these systems, dynamics are +derived from an energy function, which inherently enforces +symmetry (e.g., symmetric synaptic connections Jij = Jji ) +through the action-reaction principle. This constraint precludes the use of EP in a broad class of models characterized +by non-conservative forces. This includes the feedforward +architectures dominant in modern AI, biological circuits, +as well as physical systems that reach stationary states far +from thermodynamic equilibrium, such as nonlinear optical +systems driven by external lasers (Cin et al., 2025), optoelectronic systems (Kalinin et al., 2025), exciton-polariton +condensates (Sajnok &Matuszewski, 2025), active metamaterials (Brandenbourger et al., 2019) and active colloids +(Bishop et al., 2023; Osat &Golestanian, 2023) (see (Bowick +et al., 2022) for a review). + +a framework where the original dynamics serve for inference, while a new augmented dynamic is used to compute +gradients of the cost Eq. (2). In this augmented phase, the +output neurons are nudged towards their targets (as in standard EP), while a local corrective term – proportional to the +antisymmetric part of the Jacobian at the free equilibrium +∂ +JF (x0 , θ, u) = ∂x +F (x0 , θ, u) – is added to the forces. The +exact gradients of the cost with respect to parameters are +then obtained by contrasting stationary states of the augmented system. +Second, we introduce Dyadic EP, a ‘variational’ approach +to learning in non-conservative systems. This method involves doubling the number of variables in the system’s +state space and subsequently introducing a new energy function in this extended space. This approach takes advantage +of the extended space to execute the positive and negative +nudging phases in parallel, recovering the same computational cost as AsymEP. Derived from first principles, this +approach is inspired by established methods for mapping +dissipative dynamical systems onto conservative ones by +doubling the degrees of freedom (Bateman, 1931; Galley, +2013; Aykroyd et al., 2025). A more comprehensive study +of the theoretical framework and its application to feedforward networks can be found in (Scurria, 2026). Our method +is related to the Dual Propagation algorithm (Høier et al., +2023; Høier &Zach, 2023; 2024) and constitutes an independent, first-principles generalization of Dyadic Learning +(Nest &Høier; Høier et al., 2024)—previously limited to +Hopfield networks—to arbitrary force fields. + +Formally, we consider a dynamical system governed by +a non-reciprocal force field F (x, θ, u), which relaxes to a +stationary configuration x0 satisfying: +F (x0 , θ, u) = 0, + +(1) + +where x represents the state variables, θ the learnable parameters and u the static input. Our goal, given a target y(u), is +to compute the gradient of the cost function C(x0 , y) at this +equilibrium, +dC 0 +(x , y), +(2) +dθ +and update θ to minimize the cost. +Previous attempts to extend EP to non-conservative dynamics include the Vector Field (VF) algorithm (Scellier et al., +2018). However, as noted by the authors, this method provides an unbiased gradient of the cost Eq. (2) only in the +conservative case. To mitigate this, (Laborieux &Zenke, +2024) proposed adding a penalty to keep the Jacobian close +to symmetry, essentially forcing the system to be as conservative as possible. Alternative methods related to VF, +which similarly do not compute the exact gradient, were +proposed in (Farinha et al., 2020; Costa &Santos, 2025) and +for specific systems in simulation (Cin et al., 2025; Sajnok +&Matuszewski, 2025). + +Third, we validate our framework on MNIST (LeCun, 1998), +Fashion-MNIST, and CIFAR-10. In continuous Hopfield +networks initialized with symmetric connection matrices, +AsymEP achieves better accuracy and learns faster than +EP and VF. Additionally, when we constrain the network +to have a strong degree of structural asymmetry, in which +case EP is inapplicable, AsymEP outperforms VF. Finally, +when we restrict connections to a feedforward structure, our +algorithm effectively trains all parameters; in contrast, VF +is limited to training the last layer, acting essentially as an +Extreme Learning Machine (Huang et al., 2006; Wang et al., +2022) with poor performance. + +Conversely, generalizations of backpropagation can handle +non-reciprocal forces and compute the exact gradient of +the cost Eq. (2) but inherit the same challenges in physical +implementations. For instance, Backpropagation Through +Time (Werbos, 1990) unfolds the network in time to apply standard backpropagation, Recurrent Backpropagation +(Almeida, 1990; Pineda, 1987) avoids this memory requirement but still requires a specific circuit to propagate errors, +and the continuous Adjoint Method (Chen et al., 2018) additionally requires integrating the dynamics backward in time +which is not physically possible for a dissipative system. + +In summary, this theoretical work proposes two generalizations of EP beyond conservative systems to arbitrary differentiable dynamics that compute in their stationary states. + +2. Equilibrium Propagation Overview +2.1. Conservative Systems +We first review standard Equilibrium Propagation (EP) +(Scellier &Bengio, 2017). We consider a network described +by an energy function E(x, θ, u), such that the force field is + +In this paper, we first propose Asymmetric EP (AsymEP), +2 + +Equilibrium Propagation for Non-Conservative Systems + +derived from the potential E: +FE (x, θ, u) = − + +∂ +E(x, θ, u). +∂x + +stationary point, i.e., that Eq. (7) holds. Second, EP implic∂ +itly assumes that the Jacobian JE (x0 , u) = ∂x +FE (x0 , u) is +invertible. In this work, we assume this condition holds and +will not state it explicitly hereafter. Third, for simplicity, +we omit the dependency on the input u and target y in the +following equations. + +(3) + +0 +The objective is to compute the total gradient dC +dθ (x , y) of a +(quadratic) cost function C(x, y) evaluated at the minimum +energy configuration of the system. This free equilibrium +denoted x0 (which depend implicitly in θ and u), satisfies +the stationarity condition: + +∂ +− E(x0 , θ, u) = 0. +∂x + +2.2. Vector Field +The Vector Field (VF) algorithm, introduced in (Scellier +et al., 2018), is an early attempt to adapt EP to nonreciprocal forces. This method relies on the observation +that, for conservative systems, linearizing the right-hand +side of Eq. (9) around the equilibrium point x0 yields + +(4) + +To compute gradients, we introduce the augmented energy +functional: +ET (x, θ, β, u, y) = E(x, θ, u) + βC(x, y), + +1 +β→0 2β + +(5) + +lim + +where β is a scalar nudging parameter. The stationary configuration of this augmented system is obtained by integrating +the dynamics +∂ET (x, θ, β, u) +dx +=− +, +dt +∂x + +where FE = −∂x E(x, θ) is the conservative force. It is +therefore tempting to use the right-hand side of Eq. (10) for +parameter updates of non-conservative systems, for which +no energy function E exists. + +(6) + +until the energy minimum is reached. This new fixed point +xβ , called nudged equilibrium, satisfies: +∂E(xβ , θ, u) +∂C(xβ , y) ++β += 0. +∂x +∂x + +The VF algorithm adopts precisely this approach. It uses +the nudged counterpart of Eq. (7), + +(7) + +F (xβ , θ) − β + +The training procedure, as improved in (Laborieux et al., +2021), uses two nudged phases with factors ±β (with +β ̸= 0). Starting from x0 , the system relaxes to two +nearby perturbed equilibria, x+β and x−β . The displacement x+β − x−β is then used to compute the parameter +update in the learning rule: + + +1 ∂E(xβ , θ, u) ∂E(x−β , θ, u) +− +, (8) +∆θ = −ϵ +2β +∂θ +∂θ + +∂C β +(x ) = 0, +∂x + +(11) + +in conjunction with the learning rule Eq. (10): + +∆θ = ϵ + +∂F 0 +(x , θ) +∂θ + +⊤ β + +x − x−β +. +2β + +(12) + +However, as noted in (Scellier et al., 2018), Eq. (12) does +0 +not align with the true gradient dC +dθ (x ) and is exact only if +the force is conservative. To see this, let JF (x, θ) denote +the Jacobian of the vector field F (x, θ) (in components +i (x,θ) +(JF (x, θ))ij = ∂F∂x +). Differentiating the equilibrium +j +0 +condition F (x , θ) = 0 with respect to θ gives + +where ϵ > 0 is the learning rate. The theoretical foundation +of EP is the result that, in the limβ→0 of Eq. (8), we get: +dC(x0 , y) +d ∂E(xβ , θ, u) += +, +dθ +dβ +∂θ + + +∂E(xβ , θ) ∂E(x−β , θ) +− +∂θ +∂θ + +⊤ β + (10) +x − x−β +∂FE 0 +(x , θ) += lim − +, +β→0 +∂θ +2β + + + +(9) + +JF (x0 , θ) + +see Appendix D.1. The error of the above method is O(β 2 ). +This error can be further reduced using holomorphic equilibrium propagation (Laborieux &Zenke, 2022). + +dx0 +∂F 0 ++ +(x , θ) = 0. +dθ +∂θ + +(13) + +Consequently, the exact gradient of the cost is + +Thus, EP recovers the exact gradient of the cost function +using only local computations. In this manner, learning +implements gradient descent without an explicit backward +pass, and credit assignment is realized through the system’s +intrinsic relaxation dynamics. + +⊤ + +dx0 ∂C 0 +dC 0 +(x ) = +(x ) +dθ +dθ ∂x + +⊤ + +−1 ∂C 0 +∂F 0 +⊤ 0 +=− +(x , θ) +JF (x , θ) +(x ) . +∂θ +∂x +| +{z +}| +{z +} + +Three remarks can be made at this point. First, EP does not +require the system to be at an energy minimum, but only at a + +pre-synaptic + +post-synaptic + +(14) +3 + +Equilibrium Propagation for Non-Conservative Systems + +Algorithm 1 Asymmetric EP (AsymEP) + +The terms ’pre-synaptic’ and ’post-synaptic’ in Eq. (14) +are used by analogy with neuronal transmission: the presynaptic factor captures the local influence of θ on the force +F , while the post-synaptic factor is the sensitivity of the +cost to state perturbations. + +1: Inputs: Force field F (x, θ), cost function C(x), nudg- + +ing parameter β, learning rate ϵ. +2: repeat +3: +1. Free Phase: Evolve to stationary state +4: +Evolve the system dynamics +5: + +If instead we differentiate the nudged equilibrium condition +in Eq. (11) with respect to β and evaluate at β = 0, we +obtain +β + +JF (x0 , θ) + +dx +dβ + +− +β=0 + +∂C 0 +(x ) = 0, +∂x + +dx += F (x, θ), +dt + +(15) + +6: +7: +8: +9: + +which gives +−1 ∂C 0 +dxβ += JF (x0 , θ) +(x , y). +dβ β=0 +∂x + +(17) + +until convergence to the stationary state x0 . +2. Jacobian Decomposition +Compute the Jacobian at equilibrium: +∂F 0 +(x , θ), +(18) +∂x +and decompose it in its antisymmetric part: +JF (x0 , θ) = + +(16) +10: +11: + +The right-hand side of Eq. (16) represents the effective postsynaptic term used by the VF algorithm (Eq. 12). Comparing this with the exact post-synaptic term derived in Eq. (14), +we see that they coincide only if JF = JF⊤ , i.e., only if the +system is conservative. + +AJ (x0 , θ) = 12 (JF (x0 , θ) − JF (x0 , θ)⊤ ). (19) +12: +13: +14: + +Now, let SJ (x0 , θ) and AJ (x0 , θ) denote the symmetric +and antisymmetric parts of the Jacobian at the free (unnudged) equilibrium, respectively. Then, we show in Appendix A that the gradient error increases with the spectral +−1 +radius of SJ (x0 , θ) +AJ (x0 , θ). Consequently, large +antisymmetric contributions degrade the gradient estimation, confirming empirical observations in the Appendix of +(Ernoult et al., 2020). In fact, in the pathological limit where +the Jacobian would be purely antisymmetric SJ (x0 , θ) = 0, +the update of VF gives the negative of the true gradient, +maximizing the cost rather than minimizing it. + +15: +16: +17: +18: + +3. Nudged Phase: Augmented Dynamics +Integrate the dynamics twice starting from x0 +dx +∂C += F (x, θ) − β +(x) − 2AJ (x0 , θ) (x − x0 ), +dt +∂x +(20) +until convergence to two new stationary states +x±β +A . +4. Parameter Update +Update the parameters according to: + +∆θ = ϵ + +3. Asymmetric EP + +⊤ +∂F 0 +(x , θ) +∂θ + +xβA − x−β +A +2β + +! +. + +(21) + +19: until convergence of θ +20: Output: Optimized parameters θ. + +Here, we introduce Asymmetric EP (AsymEP), see Algorithm 1, which removes the gradient estimate error inherent +to VF by adding a local correction term to the augmented +inference dynamics. The new nudged equilibrium xβA satisfies: +∂C β +F (xβA , θ) − β +(x ) − 2AJ (x0 , θ) (xβA − x0 ) = 0, (22) +∂x A + +where JFA (x, θ) is the Jacobian of the modified dynamical +system Eq. (20). At the equilibrium point x0 , JFA is equal +to the transpose of the original Jacobian: +JFA (x0 , θ) + +As in VF, we then obtain two perturbed states x±β +A for opposite nudging ±β and apply the contrastive learning rule +of Eq. (12). + += + +JF (x0 , θ) − 2AJ (x0 , θ) + += + +SJ (x0 , θ) − AJ (x0 , θ) + += + +JF⊤ (x0 , θ). + +(24) + +We now show that AsymEP gives rise to the correct learning +rule, i.e. that right-hand side of Eq. (21) is proportional to +0 +the gradient of the cost function dC +dθ (x ) at the equilibrium +0 +point x (Eq. 14). To this end, note that the same reasoning +leading to Eq. (16) leads to + +where we have used the decomposition Eq. (44) of the original Jacobian J into its symmetric and antisymmetric components. Therefore, the left hand side of Eq. (23) is equal to +the true post-synaptic term + +−1 ∂C 0 +dxβA += JFA (x0 , θ) +(x ). +dβ β=0 +∂x + +−1 ∂C 0 +dxβA += JF⊤ (x0 , θ) +(x ), +dβ β=0 +∂x + +(23) +4 + +(25) + +Equilibrium Propagation for Non-Conservative Systems + +until a stationary point (z β , z ′β ) is reached. Upon convergence, we follow the standard EP paradigm in using the +difference z β − z ′β to compute the post-synaptic term. Un′ +′ +der the change of variables m = z+z +2 and d = z − z , we +prove in Appendix D that m follows the original dynamics +F (ensuring valid inference), while d relaxes to a "physical" +error signal proportional to the cost gradient. + +which, using Eq. (14), proves the result. Additionally, although implied by the equality with the true gradient, we +explicitly demonstrate the equivalence of the gradient estimates obtained by AsymEP and Backpropagation Through +Time in Appendix B following (Ernoult et al., 2019). +Note that the corrective term −2AJ (x0 , θ)(x − x0 ) in +Eq. (20) is spatially local: AJ vanishes for unconnected +neurons, and (x − x0 ) is available at the synapse given the +memory mechanism already required by Eq. (12). This +correction can create backward connections (Section 5.3). +However, in physical realizations, both feedforward and +feedback connections must be physically present, though +feedback may be deactivated during inference. + +It is important to notice that while Dyadic EP introduces a +distinct formulation, it remains consistent with the general +theoretical setting of EP and matches the computational +cost of AsymEP. Note also that we start the evolution of +the free phase (β = 0) with the identical initial condition +for z and z ′ , (i.e., d = 0). This guarantees that integrating Eq. (32) leads to a symmetric stationary point where +z 0 = z ′0 . Finally, we underline that the modified variational update rule in Eq. (34) is equivalent to the standard +symmetric EP update rule in Eq. (8) (see Appendix D). + +4. Dyadic EP +We now introduce Dyadic EP (Algorithm 2), a variational +algorithm that computes the exact cost gradient in the limit +of infinitesimal nudging. It maps the original n-variable +dynamics F (x, θ) onto a 2n-variable system (z, z ′ ) defined +by an energy H(z, z ′ , θ) and cost D(z, z ′ ). We show in +Appendix E that AsymEP can be seen as the first-order +projection of Dyadic EP onto the original n-dimensional +state space. + +Now, to make this concrete, consider a continuous Hopfield +network (see also Eq. (35)) with an asymmetric connection +matrix J. After some calculations (see Appendix F), the +augmented energy of the system can be re-expressed as: +1 +1 +HT = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) − ρ(z)⊤ Aρ(z ′ ) +2 +2 +1 +β +2 +′ 2 ++ (∥z∥ − ∥z ∥ ) + (C(z, y) + C(z ′ , y)) , +2 +2 +(29) +where S and A are the symmetric and antisymmetric parts +of J, respectively and ρ is an element-wise non-linearity. +An interesting analogy can be drawn with standard learning +rules in discrete Hopfield networks (Hopfield, 1982). For +a sequence of binary memories {ξ 1 , . . . , ξ m } where ξ µ ∈ +corresponds to the standard autoassociative +{−1, 1}n , S P +Hebbian rule µ ξ µ (ξ µ )⊤ , creating stable attractors at each +pattern. In contrast, A corresponds to the heteroassociative +rule (e.g., a cycle between ξ µ and ξ ν given by ξ ν (ξ µ )⊤ − +ξ µ (ξ ν )⊤ ), encoding transitions between patterns. + +The new system is defined by the energy H and cost function +D, given in terms of F and C by: + + +z + z′ +H(z, z ′ , θ) = −(z − z ′ )⊤ F +,θ , +2 + + +′ +z+z +, +(26) +D(z, z ′ ) = C +2 +where z, z ′ ∈ Rn . In order to learn, we introduce the augmented energy +HT (z, z ′ , θ, β) = H(z, z ′ , θ) + βD(z, z ′ ). + +(27) + +The equilibrium configuration corresponds to a saddle point +of HT , where z minimizes and z ′ maximizes the energy. +This poses no issue for EP, which requires only that the +joint state (z, z ′ ) reaches a stationary state. Although this +min-maximization can be interpreted as z evolving forward +and z ′ backward in time, in practice they evolve forward +simultaneously, as we integrate the coupled equations: + + +dz +∂HT +z + z′ +=− +=F +,θ +dt +2 +∂z +⊤ + + +z − z′ +∂F +β ∂C z + z ′ ++ +− +, +2 +∂z z+z′ +2 ∂z +2 +2 + + +dz ′ +∂HT +z + z′ +=+ +=F +,θ +′ +dt +2 +∂z + + + +′ ⊤ +z−z +∂F +β ∂C z + z ′ +− ++ +, +2 +∂z ′ z+z′ +2 ∂z ′ +2 +2 +(28) + +For this specific energy, the update rule given by Eq. (34) +can be re-expressed as: + +⊤ +1 +∆J ∝ − +ρ(z ′β ) − ρ(z β ) ρ(z ′β ) + ρ(z β ) . (30) +2β +In the limit β → 0, this gives: +! +β +d +∆J ∝ +⊙ ρ′ (m)ρ(m)⊤ . +β + +(31) + +matching the learning rule in (Pineda, 1987), with +β +limβ→0 dβ being the error signal. + +5. Numerical Experiments +In this section, we numerically validate AsymEP (Algorithm 1). The neuronal dynamics follows the one introduced +5 + +Equilibrium Propagation for Non-Conservative Systems + +where ∥ · ∥F denotes the Frobenius norm. Note that this +metric does not capture the asymmetry of the Jacobian, +which depends on the state x. + +Algorithm 2 Dyadic EP +1: Inputs: Force field F (x, θ), cost function C(x, y), + +nudging parameter β, learning rate ϵ +2: repeat +3: +1. Free Phase: Evolve to stationary state +4: +Evolve the system dynamics, starting from identi- + +For numerical experiments, we restricted the network to a +layered architecture with a single hidden layer to facilitate +comparison with prior work. Accordingly, J in contains +only input-to-hidden connections, while J dyn is block offdiagonal, encoding bidirectional interactions between the +hidden and output layers. Both J in and J dyn are trained. + +cal initial conditions z(0) = z ′ (0) = z0 , +5: + +6: +7: +8: + +dz +∂H +=− +, +dt +∂z + +11: +12: +13: + +(32) + +We first use MNIST (LeCun, 1998) (60k train, 10k test) +followed by Fashion-MNIST to validate AsymEP, and then +we further validate AsymEP and Dyadic EP by comparing +them to Backpropagation on a convolutional feedforward, +with CIFAR-10. Inputs are normalized using min-max to +[−1, 1] and targets are one-hot encoded in {−1, 1}. All +hyperparameters are detailed in Appendix G, along with +additional details and numerical results. + +until stationary states z 0 , z ′0 are reached. +2. Nudged Equilibrium +Evolve the system dynamics, starting from the +solution of the free phase z 0 = z ′0 : + +9: + +10: + +dz ′ +∂H +=+ ′, +dt +∂z + +dz +∂HT +=− +, +dt +∂z + +dz ′ +∂HT +, +=+ +dt +∂z ′ + +(33) + +until two nudged stationary states z β , z ′β are +reached. +3. Parameter Update +Update the parameters according to: + + +1 ∂H(z β , z ′β , θ) +∆θ = −ϵ +(34) +β +∂θ + +5.1. Symmetric Initialization +We start by comparing AsymEP with standard EP and +VF. All algorithms are initialized with an identical symmetric matrix J dyn . EP maintains this symmetry throughout training, while VF and AsymEP induce asymmetry in +J dyn . Since EP and VF already achieve strong performance +on MNIST, the purpose of this experiment is to validate +AsymEP and compare it against EP and VF rather than +outperform the state of the art. + +14: until convergence of θ +15: Output: Optimized parameters θ. + +in (Scellier &Bengio, 2017), and is generalized to allow +for non-reciprocal forces as in (Scellier et al., 2018). For +clarity, we express the forces in a form that explicitly separates the contributions of the external input and the recurrent +interactions: + +F (x) = ρ′ (x) ⊙ J in u + J dyn ρ(x) − x, +(35) + +Figure 1 compares the three algorithms as a function of +hidden-layer dimension after 1 and 20 epochs. AsymEP +consistently outperforms the baselines, suggesting it learns +faster and better. +Figure 2 studies the evolution of the asymmetry ratio rstr . +The results are reported for 50 hidden neurons. As expected, +EP preserves the initial weight symmetry. In contrast, VF +and AsymEP induce non-trivial evolution of rstr following +two distinct patterns, resulting in three distinct network +configurations. A complementary figure is available in Appendix G.1. + +where u ∈ RNin denotes the input and x ∈ RNdyn the neuronal state, comprising both hidden and output units. The +matrices J in ∈ RNdyn ×Nin and J dyn ∈ RNdyn ×Ndyn define the +input and recurrent connectivity, respectively. The activation +function ρ(·) is taken to be the hyperbolic tangent, applied +element-wise. + +5.2. Fixed Asymmetry Ratio + +If J dyn is symmetric, we can define the energy: +E(x) = + +While the previous section focused on networks compatible +with all three algorithms (EP, VF, AsymEP), we now turn +to architectures with strong structural asymmetry. In this +regime, EP is inapplicable by construction, and, as we show, +VF performs poorly, contrary to AsymEP which remains +effective. + +1 +1 +∥x∥2 − ρ(x)⊤ J dyn ρ(x) − ρ(x)⊤ J in u, (36) +2 +2 + +which is identical to that of (Scellier &Bengio, 2017), provided that the input neurons are activated as ρ(u). +Equation (35) naturally motivates a quantitative measure of +structural asymmetry rstr , defined as: + +To this end, we consider a class of networks where the +asymmetry ratio rstr defined in Eq. (37) is kept fixed. Let S̃ +and à be arbitrary symmetric and antisymmetric matrices +in RNdyn ×Ndyn respectively. We enforce a fixed rstr via the + +⊤ + +rstr = + +∥(J dyn − J dyn )/2∥F +, +∥J dyn ∥F + +(37) +6 + +Equilibrium Propagation for Non-Conservative Systems + +where γ ∈ R is a learnable global scale. +Using VF and AsymEP, we train a layered network with one +hidden layer of 50 neurons (in which case S̃ and à are block +off-diagonal) for different values of rstr to investigate the +impact of structural asymmetry. We compare two training +regimes: training only the input weights J in (and the scale +γ), versus training all parameters including J dyn . The first +regime trains only the external forces from the input ρ′ (x) ⊙ +J in u (which correspond to a symmetric contribution in the +Jacobian) applied to our non-conservative system, while +the second additionally trains J dyn and therefore the nonsymmetric part of the Jacobian directly. + +(a) Results after one epoch. + +Figure 3 summarizes the results. We find that AsymEP +maintains robust performance across all asymmetry levels +(e.g., achieving an accuracy of 93.8 ± 0.4% at rstr = 0 and +94.9 ± 0.2% at rstr = 0.875 when training all parameters) +and can even learn when the recurrent connection matrix +J dyn is completely antisymmetric (rstr = 1). Additionally, +training all parameters shows significant improvement over +training only J in . +In contrast, VF performs well at low asymmetry ratios +but degrades as asymmetry increases, eventually dropping +to chance levels (e.g., accuracies of 5 ± 3% and 8 ± 4% +at rstr = 1 for input-only and all-parameter training, respectively). When only J in is trained, VF accuracy collapses around rstr ≈ 0.5, whereas training all parameters +delays this collapse until rstr ≈ 0.8. Our analysis in Appendix G.2.1 reveals that VF adjusts the dynamics such that +the asymmetry of the Jacobian’s off-diagonal terms remains +strictly lower than the structural asymmetry ratio. The training appears to adjust the neuronal state such that neurons +connected by strongly asymmetric weights have low activation. As shown in Appendix G.2.1, AsymEP learns faster +than VF across all levels of asymmetry. + +(b) Results after 20 epochs. +Figure 1. Comparison of algorithm performance on MNIST using +a layered architecture with one hidden layer and symmetric initialization. Squares denote AsymEP, circles EP, and triangles VF. +Test accuracy (averaged over 10 runs) is shown after one epoch +(Fig. 1a) and 20 epochs (Fig. 1b). + +Finally, Appendix G.3 opens with a brief theoretical discussion of the stability of these non-conservative dynamics, +followed by simulations on all-to-all topologies with constrained rstr and input projections J in . Even in this worstcase setting, AsymEP reduces oscillations and improves +stability. +5.3. Feedforward Architectures +Figure 2. Evolution of the asymmetry ratio rstr (defined in Eq. (37)) +during training on MNIST for AsymEP, EP and VF, initialized +from a symmetric configuration. The models use 50 hidden neurons. + +We now consider a purely feedforward architecture. Here +VF trains only the last layer: with no backward connections, +the output nudging signal cannot reach earlier layers, so for +every layer but the last the nudged stationary states coincide +with the free states, giving zero weight updates. As only +the output layer is trained, the system essentially becomes +an Extreme Learning Machine (Huang et al., 2006; Wang +et al., 2022). In contrast, AsymEP introduces a correction +that generates effective backward connections, allowing the + +following parameterization of the recurrent parameters: +"q +# +S̃ +à +dyn +2 +J =γ +1 − rstr ++ rstr +, +(38) +∥S̃∥F +∥Ã∥F +7 + +Equilibrium Propagation for Non-Conservative Systems + +tivity structures inspired by (Millidge et al., 2023), while +keeping the number of trainable parameters fixed. +Experiments are conducted on Fashion-MNIST using a twohidden-layer network with hidden dimensions 500 and 200. +Network states are denoted (x0 , x1 , x2 , x3 ), where x0 is +the input and x3 = xL the output. Forward and backward +connections are denoted by Wk and Bk , respectively, with +W1 = J in . +We consider three classes of dynamics. First, the Continuous +Hopfield (CH) dynamics introduced previously: + + +dxk += −xk +ρ′ (xk )⊙ Wk ρ(xk−1 )+(1−δk,L )Bk ρ(xk+1 ) . +dt +(40) +Second, Predictive Coding (PC) dynamics, defined through +the prediction errors ek = xk − Wk ρ(xk−1 ), whose fixed +point ek = 0 corresponds to a standard feedforward network: + +Figure 3. Impact of the structural asymmetry ratio rstr on accuracy +(top) and standard deviation over 10 runs (bottom) on MNIST. +We compare VF (orange) and AsymEP (blue) under two training +regimes: training only J in (dashed) or all parameters (solid). + +dxk += −ek + (1 − δk,L ) (ρ′ (xk ) ⊙ (Bk ek+1 )) . +dt + +nudging signal to influence all layers. We make this explicit +for a network with one hidden layer. + +(41) + +Third, a standard dynamics chosen for direct comparison +with backpropagation: + +Let the state x be partitioned in hidden h and output o +layers. The recurrent connection matrix is then J dyn = + +0 +0 +. The forces of the system are: +Wh→o 0 + β + + +Fh = ρ′ (h) ⊙ J in u + λ(Wh→o )⊤ (o − o0 ) − h + + + + + + β +0 +Fo = ρ′ (o) ⊙ Wh→o ρ(h) − λWh→o (h − h ) +(39) + + + +∂C + + +−o ++ λβ +∂o + +dxk += −xk + Wk ρ(xk−1 ) + (1 − δk,L )Bk ρ(xk+1 ). (42) +dt +For each dynamics, we examine three connectivity scenarios. +⊤ +• In the asymmetric case (Bk ̸= Wk+1 +), the backward +weights Bk are randomly initialized and kept fixed +while only the forward weights are trained, ensuring a +fair comparison (i.e., identical number of parameters); +in PC, the learning rule for Bk is zero when only inputs +are clamped. + +where λ is 0 during the free inference and 1 during the +nudged phase (Eq. 20). The force on the hidden layer Fhβ +now depends on the output layer through the term ρ′ (h) ⊙ +⊤ +(Wh→o ) (o − o0 ), enabling the nudge (the term β ∂C +∂o ) to +influence the hidden layer. This implicitly assumes that the +hardware implementation supports the physical activation +of these backward connections. + +⊤ +• In the symmetric / conservative case (Bk = Wk+1 +), the +CH and PC dynamics derive from an energy functional, +while the standard dynamics remains non-conservative +due to its non-symmetric Jacobian. + +• In the feedforward case (Bk = 0), the PC and standard dynamics coincide; for the standard dynamics, the +AsymEP learning rule mirrors backpropagation, with +1 +∆xβk = 2β +(xβ − x−β ) acting as the propagated error +signal. + +We validate this using a single hidden layer of only 20 neurons on MNIST. After training, VF saturates with 64.3 ± +2.0% accuracy, whereas AsymEP reaches 92.7 ± 0.5% accuracy. We expect this discrepancy to increase with network +depth, since this increases the number of layers unable to +learn under VF. A figure with the accuracy during training +can be found in Appendix G.4.2. + +Table 1 shows that AsymEP consistently outperforms VF +in both asymmetric and feedforward settings, in final accuracy, learning speed, and stability. After a single epoch +it already provides on average a 15% accuracy gain with +an order-of-magnitude reduction in variance. Remarkably, +AsymEP with asymmetric connectivity also surpasses EP +on symmetric networks despite training only the forward + +5.4. Advantages of Non-Conservative Dynamics +AsymEP is not tied to a specific neural dynamics. To further +assess the benefits of training non-conservative dynamics +using AsymEP, we compare several dynamics and connec8 + +Equilibrium Propagation for Non-Conservative Systems + +6. Discussion and Conclusion + +weights, suggesting that relaxing symmetry constraints may +improve expressivity. Supplementary results are provided +in Appendix G.5. + +In this work, we extended Equilibrium Propagation (EP) +to non-conservative systems that reach stationary states by +deriving two mathematically equivalent algorithms that recover the exact gradient of the cost function in the limit of +infinitesimal nudging. + +Table 1. Test accuracy on Fashion-MNIST (%) at Epoch 50 (mean +± std 10 runs). BP on a standard feedforward architecture using +MSE and SGD achieve 87.37 ± 0.29%. +EP + +AsymEP + +Asym +86.78 ± 0.14 +Feedfor +86.05 ± 0.12 +Sym +84.30 ± 0.13 +Asym +86.20 ± 0.17 +PC +Sym +84.78 ± 0.14 +Asym +82.91 ± 0.48 +Standard +Feedfor +86.25 ± 0.16 +CH + +The first approach, Asymmetric EP, preserves the original +inference dynamics. It introduces a corrective force during +the nudged phase that remains spatially local, as the antisymmetric Jacobian is null for unconnected neurons and the +perturbation from equilibrium is available at the synapse +level. Unlike standard methods like Recurrent Backpropagation (Almeida, 1990; Pineda, 1987), this avoids explicit +digital weight transposition. However, a physical mechanism to obtain the local corrective force at the synapse +level remains a subject for future work. We also note that +AsymEP shares the temporal non-locality of standard EP. + +VF +85.20 ± 0.12 +77.76 ± 0.37 +80.71 ± 6.17 +75.52 ± 1.69 +78.58 ± 0.28 + +Finally, to investigate how AsymEP scales with depth, we +trained deeper fully connected networks with two and three +hidden layers of 500 neurons on Fashion-MNIST, reaching +86.41 ± 0.22% and 87.8 ± 0.15% test accuracy respectively. + +The second approach, Dyadic EP, doubles the state space +to map non-reciprocal dynamics onto an energy landscape—conceptually reminiscent of multi-compartment cortical neurons, where apical dendrites integrate feedback +(analogous to z − z ′ ) separately from basal feedforward +input (analogous to z + z ′ ) (Guerguiev et al., 2017). Additionally, this expanded space also enables the positive and +negative nudging phases to run in parallel. This offers a +pathway to implement a version of EP that is local in time, +but would require a doubling of the degrees of freedom +on the physical hardware. More fundamentally, the energy +defined on the extended state shows that the tools and theoretical guarantees obtained for EP should also apply to +the case of non-reciprocal forces, and that the variational +principle behind EP is universal in the sense that it can be +applied to all networks which operate in a stationary state. + +5.5. Feedforward Training on CIFAR-10: BP vs. Dyadic +EP vs. AsymEP +To test whether our framework scales beyond shallow networks, we conclude with a deep, purely feedforward CNN +architecture trained on CIFAR-10. We compare backpropagation (BP), VF, AsymEP and Dyadic EP in a controlled +setting where the gradient estimator is the only difference +between runs: all methods share the same configuration, +with the BP gradient replaced by the contrast of stationary +states for the EP-based methods (see App. G.6 for details). +Each configuration is trained for 40 epochs over 5 seeds. + +Furthermore, Dyadic EP is not restricted to the EP community and could suggest a more physically plausible alternative to the stationary-state Adjoint Method (for fixed +inputs) (Chen et al., 2018): by solving the forward and adjoint equations simultaneously via relaxation, it circumvents +a separate backward-in-time pass. + +Table 2 reports the final test accuracy. Both of our algorithms scale to this regime, closely tracking the BP baseline +throughout training and matching its final accuracy: a paired +t-test finds no significant difference between Dyadic EP and +BP (p = 0.75), and only a sub-percent gap for AsymEP. +In contrast, VF makes slight initial progress (peaking near +30%) before collapsing to chance level (10%). Additional +details can be found in Appendix G.6 + +Finally, our experiments on MNIST, Fashion-MNIST, and +CIFAR-10 confirm that AsymEP and Dyadic EP consistently outperform EP and VF, and notably enables effective +training of feedforward networks. +Our work thus opens new avenues for learning in neuromorphic hardware, dissipative physical systems, and neural +architectures where asymmetry is intrinsic rather than incidental. + +Table 2. Test accuracy on CIFAR-10 (%) at epoch 40 (mean ± std +over 5 seeds). +Method + +Test Acc. (%) + +Backpropagation +Dyadic EP +AsymEP +VF + +90.66 ± 0.25 +90.69 ± 0.14 +89.74 ± 0.14 +10.00 ± 0.00 + +9 + +Equilibrium Propagation for Non-Conservative Systems + +Impact Statement + +References + +This paper presents results that advance the field of machine +learning. There are many potential societal consequences +of our work, none of which we feel must be specifically +highlighted here. + +Almeida, L. B. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In +Artificial neural networks: concept learning, pp. 102–111. +1990. + +Acknowledgments + +Altman, L. E., Stern, M., Liu, A. J., and Durian, D. J. Experimental demonstration of coupled learning in elastic +networks. Physical Review Applied, 22(2):024053, 2024. + +AES is fully funded by the Horizon Europe Marie +Skłodowska-Curie Doctoral Network ’Postdigital Plus’ +(Grant 101169118). DVA acknowledges the support of +the French Community of Belgium through a FRIA fellowship. SM acknowledges financial support by the Fonds de la +Recherche Scientifique–FNRS, Belgium under EOS Project +No. 40007536. Computational resources have been provided by the Consortium des Équipements de Calcul Intensif +(CÉCI), funded by the Fonds de la Recherche Scientifique +de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and +by the Walloon Region. + +Aykroyd, C., Bourgoin, A., and Poncin-Lafitte, C. L. Hamiltonian treatment of non-conservative systems. arXiv +preprint arXiv:2507.18658, 2025. +Bateman, H. On dissipative systems and related variational +principles. Physical Review, 38(4):815, 1931. +Berneman, M. and Hexner, D. Equilibrium propagation for +dissipative dynamics. Advanced Intelligent Systems, pp. +e202501310, 2025. +Bishop, K. J., Biswal, S. L., and Bharti, B. Active colloids +as models, materials, and machines. Annual Review of +Chemical and Biomolecular Engineering, 14(1):1–30, +2023. + +“ἁρμονίη ἀφανὴς φανερῆς κρείττων” + +Bowick, M. J., Fakhri, N., Marchetti, M. C., and Ramaswamy, S. Symmetry, thermodynamics, and topology +in active matter. Physical Review X, 12(1):010501, 2022. +Brandenbourger, M., Locsin, X., Lerner, E., and Coulais, C. +Non-reciprocal robotic metamaterials. Nature communications, 10(1):4608, 2019. +Cesa-Bianchi, N. and Lugosi, G. Prediction, learning, and +games. Cambridge university press, 2006. +Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, +D. K. Neural ordinary differential equations. Advances +in neural information processing systems, 31, 2018. +Cin, N. D., Marquardt, F., and Wanjura, C. C. Training +nonlinear optical neural networks with scattering backpropagation. arXiv preprint arXiv:2508.11750, 2025. +Costa, P. and Santos, P. A. Directed equilibrium propagation +revisited. Mathematics, 13(11), 2025. ISSN 2227-7390. +Crick, F. The recent excitement about neural networks. +Nature, 337, 1989. +Dillavou, S., Stern, M., Liu, A. J., and Durian, D. J. Demonstration of decentralized physics-driven learning. Physical Review Applied, 18(1):014040, 2022. +Dillavou, S., Beyer, B. D., Stern, M., Liu, A. J., Miskin, +M. Z., and Durian, D. J. Machine learning without a processor: Emergent learning in a nonlinear analog network. +Proceedings of the National Academy of Sciences, 121 +(28):e2319718121, 2024. +10 + +Equilibrium Propagation for Non-Conservative Systems + +Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scellier, B. Updates of equilibrium prop match gradients +of backprop through time in an rnn with static input. +Advances in neural information processing systems, 32, +2019. + +Indiveri, G. and Liu, S.-C. Memory and information processing in neuromorphic systems. Proceedings of the +IEEE, 103(8):1379–1397, 2015. + +Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scellier, B. Equilibrium propagation with continual weight +updates. arXiv preprint arXiv:2005.04168, 2020. + +Kalinin, K. P., Gladrow, J., Chu, J., Clegg, J. H., Cletheroe, +D., Kelly, D. J., Rahmani, B., Brennan, G., Canakci, B., +Falck, F., et al. Analog optical computer for ai inference +and combinatorial optimization. Nature, 645(8080):354– +361, 2025. + +Falk, M. J., Strupp, A. T., Scellier, B., and Murugan, +A. Temporal contrastive learning through implicit nonequilibrium memory. Nature Communications, (16), +2025. + +Kendall, J., Pantone, R., Manickavasagam, K., Bengio, +Y., and Scellier, B. Training end-to-end analog neural +networks with equilibrium propagation. arXiv preprint +arXiv:2006.01981, 2020. + +Farinha, M. T., Pequito, S., Santos, P. A., and Figueiredo, +M. A. T. Equilibrium propagation for complete directed +neural networks. In Proceedings of the 28th European +Symposium on Artificial Neural Networks, Computational +Intelligence and Machine Learning (ESANN 2020), 2020. + +Laborieux, A. and Zenke, F. Holomorphic equilibrium +propagation computes exact gradients through finite size +oscillations. Advances in Neural Information Processing +Systems, 35:12950–12963, 2022. +Laborieux, A. and Zenke, F. Improving equilibrium propagation without weight symmetry through jacobian homeostasis. In Proceedings of the International Conference on Learning Representations (ICLR) 2024, Virtual +(ICLR), May 2024. + +Galley, C. R. Classical mechanics of nonconservative systems. Physical review letters, 110(17):174301, 2013. +Guerguiev, J., Lillicrap, T. P., and Richards, B. A. Towards +deep learning with segregated dendrites. elife, 6:e22901, +2017. + +Laborieux, A., Ernoult, M., Scellier, B., Bengio, Y., Grollier, +J., and Querlioz, D. Scaling equilibrium propagation to +deep convnets by drastically reducing its gradient estimator bias. Frontiers in neuroscience, 15:633674, 2021. + +Høier, R. and Zach, C. A lagrangian perspective on dual +propagation. In Proceedings of the First Workshop on Machine Learning with New Compute Paradigms at NeurIPS +2023, New Orleans, LA, USA, Dec 2023. + +Laydevant, J., Marković, D., and Grollier, J. Training an +ising machine with equilibrium propagation. Nature Communications, 15(1):3671, 2024. + +Høier, R. and Zach, C. Two tales of single-phase contrastive +hebbian learning. In Salakhutdinov, R., Kolter, Z., Heller, +K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. +(eds.), Proceedings of the 41st International Conference +on Machine Learning, volume 235 of Proceedings of +Machine Learning Research, pp. 18470–18488. PMLR, +21–27 Jul 2024. + +LeCun, Y. The mnist database of handwritten digits. +http://yann. lecun. com/exdb/mnist/, 1998. +Martin, E., Ernoult, M., Laydevant, J., Li, S., Querlioz, D., +Petrisor, T., and Grollier, J. Eqspike: spike-driven equilibrium propagation for neuromorphic implementations. +Iscience, 24(3), 2021. + +Høier, R., Staudt, D., and Zach, C. Dual propagation: accelerating contrastive hebbian learning with dyadic neurons. +In Proceedings of the 40th International Conference on +Machine Learning, ICML’23. JMLR.org, 2023. + +Massar, S. Equilibrium propagation for learning in lagrangian dynamical systems. Physical Review E, 112 +(3):035304, 2025. + +Høier, R., Kalinin, K., Ernoult, M., and Zach, C. Dyadic +learning in recurrent and feedforward models. In NeurIPS +2024 Workshop Machine Learning with new Compute +Paradigms, 2024. + +Massar, S. and Mognetti, B. M. Equilibrium propagation: +the quantum and the thermal cases. Quantum Studies: +Mathematics and Foundations, 12(1):6, 2025. + +Hopfield, J. J. Neural networks and physical systems with +emergent collective computational abilities. Proceedings +of the national academy of sciences, 79(8):2554–2558, +1982. + +Millidge, B., Song, Y., Salvatori, T., Lukasiewicz, T., and +Bogacz, R. Backpropagation at the infinitesimal inference limit of energy-based models: Unifying predictive +coding, equilibrium propagation, and contrastive hebbian +learning. In International Conference on Learning Representations (ICLR), 2023. + +Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. Extreme learning +machine: theory and applications. Neurocomputing, 70 +(1-3):489–501, 2006. +11 + +Equilibrium Propagation for Non-Conservative Systems + +Nest, T. and Høier, R. Dyadic learning in asymmetric +convnets. In New Frontiers in Associative MemoriesWorkshop at ICLR 2026. + +Wang, Q., Wanjura, C. C., and Marquardt, F. Training +coupled phase oscillators as a neuromorphic platform +using equilibrium propagation. Neuromorphic Computing +and Engineering, 4(3):034014, 2024. + +Osat, S. and Golestanian, R. Non-reciprocal multifarious +self-organization. Nature Nanotechnology, 18(1):79–85, +2023. + +Wanjura, C. C. and Marquardt, F. Quantum equilibrium +propagation for efficient training of quantum systems +based on onsager reciprocity. Nature Communications, +16(1):6595, 2025. + +O’Connor, P., Gavves, E., and Welling, M. Training a spiking neural network with equilibrium propagation. In The +22nd international conference on artificial intelligence +and statistics, pp. 1516–1523. PMLR, 2019. + +Werbos, P. J. Backpropagation through time: what it does +and how to do it. Proceedings of the IEEE, 78(10):1550– +1560, 1990. + +Pineda, F. Generalization of back propagation to recurrent +and higher order neural networks. In Neural information +processing systems, 1987. + +Yi, S.-i., Kendall, J. D., Williams, R. S., and Kumar, S. +Activity-difference training of deep neural networks using +memristor crossbars. Nature Electronics, 6(1):45–51, +2023. + +Pourcel, G., Basu, D., Ernoult, M., and Gilra, A. Lagrangianbased equilibrium propagation: generalisation to arbitrary boundary conditions & equivalence with hamiltonian echo learning. arXiv preprint arXiv:2506.06248, +2025. +Rageau, T. and Grollier, J. Training and synchronizing +oscillator networks with equilibrium propagation. Neuromorphic Computing and Engineering, 2025. +Sajnok, K. and Matuszewski, M. Near-equilibrium propagation training in nonlinear wave systems. arXiv preprint +arXiv:2510.16084, 2025. +Scellier, B. Quantum equilibrium propagation: Gradientdescent training of quantum systems. arXiv preprint +arXiv:2406.00879, 2024. +Scellier, B. and Bengio, Y. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in computational neuroscience, 11:24, +2017. +Scellier, B., Goyal, A., Binas, J., Mesnard, T., and Bengio, +Y. Generalization of equilibrium propagation to vector +field dynamics. arXiv preprint arXiv:1808.04873, 2018. +Scellier, B., Mishra, S., Bengio, Y., and Ollivier, Y. Agnostic +physics-driven deep learning. arXiv:2205.15021v1, 2022. +Scurria, A. E. A physical theory of backpropagation: Exact +gradients from the least-action principle. 2026. +Stern, M., Hexner, D., Rocks, J. W., and Liu, A. J. Supervised learning in physical networks: From machine +learning to learning machines. Physical Review X, 11(2): +021045, 2021. +Wang, J., Lu, S., Wang, S.-H., and Zhang, Y.-D. A review +on extreme learning machine. Multimedia Tools and +Applications, 81(29):41611–41660, 2022. +12 + +Equilibrium Propagation for Non-Conservative Systems + +A. Gradient Estimation Error in VF + +where s denotes the dynamical state of the system. This +symmetry is the linchpin of the equivalence proof, as the +gradient expressions derived for BPTT and standard EP +differ precisely by a transpose operation applied to ∂F +∂s . + +In this appendix, we quantify the gradient estimation error +introduced by VF in the limit where the Jacobian asymmetry +is small. + +This observation aligns with our analysis in the main text: +VF fails in non-conservative systems due to the missing +transpose in the post-synaptic term (see Eq. (16)). Following +the derivation in Ernoult et al. (2019) (viz., Appendix A, Eqs. +(31–33)), the recursive relations for the gradients in BPTT +are given by: + +Comparing the post-synaptic update terms in Eqs. (12) and +(14) gives the following error in the gradient of the cost: +⊤ +∂F 0 +(x , θ) +∂θ + +−1 +−1 ∂C 0 +(x , y), (43) +× JF (x0 , θ) +− JF⊤ (x0 , θ) +∂x + + +Error = − + +∇BPTT +(0) = +s + +To quantify this error, we decompose the Jacobian JF (x, θ) +into its symmetric part SJ (x, θ) and antisymmetric part + +SJ (x, θ) = 12 JF (x, θ) + JF⊤ (x, θ) , +(44) + +AJ (x, θ) = 12 JF (x, θ) − JF⊤ (x, θ) . + +(JF ) + += + +(JF⊤ )−1 = + +∞ +X + +∇BPTT +(t) = +s +∇BPTT +(t) = +θ + +(−1) + +n=0 +∞ +X + +(SJ−1 AJ )n + +SJ−1 , + + + + +∂F +(x, s⋆ , θ) +∂s + +⊤ + +∂F +(x, s⋆ , θ) +∂θ + +⊤ + +∇BPTT +(t − 1), +s + +(50) + +∇BPTT +(t − 1), +s + +(51) + +where θ represents the optimization parameters, ℓ is the +cost function, s⋆ is the free equilibrium state (satisfying +F (s⋆ ) = 0), y is the target, and x is the input. The index t +denotes the unrolled time steps, initialized at s(0) = s⋆ . + +! +n + +(49) + +and for all t = 1, . . . , K, + +Assuming the asymmetry AJ (x, θ) is small, we can make +a series expansion in SJ−1 AJ (omitting the dependencies +for clarity). Applying the Neumann expansion for small +∥SJ−1 AJ ∥ gives +−1 + +∂ℓ +(s⋆ , y), +∂s + +(45) + +In contrast, the gradients computed by VF follow the recursion (viz., Ernoult et al. (2019), Appendix A, Eqs. (24–26)): + +! +(SJ−1 AJ )n + +SJ−1 . + +(46) + +∆EP +s (0) = − + +n=0 + +∂ℓ +(s⋆ , y), +∂s + +(52) + +and for all t ≥ 0, + +Subtracting the two series and assuming convergence, we +finally obtain +! +∞ +X +2n+1 +−1 +−1 +⊤ −1 +(JF ) − (JF ) = −2 +SJ AJ +SJ−1 . + +∂F +(x, s⋆ , θ) ∆EP +s (t), +∂s +⊤ + +∂F +(x, +s +, +θ) +∆EP +∆EP +(t ++ +1) += +⋆ +s (t). +θ +∂θ + +∆EP +s (t + 1) = + +n=0 + +(47) + +(53) +(54) + +Comparing these two sets of equations confirms that the only +difference are Eqs. (50) and (53), specifically the transpose +of the Jacobian ∂F +∂s (ignoring the global sign difference in +Eqs. (49) and (52)). + +B. Equivalence between AsymEP and BPTT +In this appendix, we sketch the equivalence between the +gradient estimate computed by AsymEP and Backpropagation Through Time (BPTT) (Werbos, 1990) for a Recurrent +Neural Network with fixed inputs. Our derivation relies on +the proof provided by Ernoult et al. (2019), which established that standard (conservative) EP computes gradients +identical to those of BPTT. To facilitate direct comparison, +we adopt their notation for this section. + +In AsymEP, we modify the dynamics by adding a correction +term dependent on the antisymmetric part of the Jacobian. +Denoting the force of this augmented system by F A , the +Jacobian at the free equilibrium satisfies: + +The proof provided by Ernoult et al. (2019) relies on the +assumption that the vector field F (i.e., transition function) +is derived from a scalar potential function, which implies +that + +⊤ +∂F +∂F += +, +(48) +∂s +∂s + +By substituting this corrected Jacobian into the recursive +relations, AsymEP recovers the exact transpose required +by BPTT. Consequently, our method extends the equivalence between EP and BPTT to the general case of nonconservative force. + +∂F A +(x, s⋆ , θ) = +∂s + +13 + + + +∂F +(x, s⋆ , θ) +∂s + +⊤ +. + +(55) + +Equilibrium Propagation for Non-Conservative Systems + +C. Out-of-Equilibrium Mechanics + +C.3. Symmetry Breaking as Credit Assignment + +Here we sketch the physical picture behind the doubledenergy construction of Eq. (26). The full derivation from +Hamilton’s least-action principle, together with its connection to the Bateman–Galley formalism for non-conservative +classical mechanics (Bateman, 1931; Galley, 2013; Aykroyd +et al., 2025), can be found in (Scurria, 2026). + +On the diagonal manifold z = z ′ the doubled system enjoys +a gauge symmetry: the auxiliary variable z ′ is redundant and +the difference d is identically zero. Credit assignment is implemented by deliberately breaking this symmetry through +the task cost. Adding βD(z, z ′ ) = β C(m) to H exerts +opposite forces on z and z ′ and drives them apart, so that +the difference d ceases to be redundant and begins to carry +information about the loss landscape. + +C.1. The Helmholtz Obstruction +The natural physical route to a variational principle for a +dynamical system ẋ = F (x, θ) is to seek a scalar potential +E such that F = −∂x E. The classical Helmholtz integrability condition states that such an E exists if and only if the +Jacobian JF is symmetric everywhere. Whenever the interactions are non-reciprocal — as in feedforward networks, +active matter, or driven optical systems — JF acquires +a non-zero antisymmetric part and the Helmholtz condition fails identically. No scalar potential on the original +n-dimensional state space can then generate the dynamics, +and the “energy minimisation” route at the heart of standard +EP is blocked at the structural level. The obstruction is not +a matter of computational convenience: it reflects the fact +that the rotational component of F carries information that +no scalar function of x alone can record. + +D. Proofs for Dyadic EP +We now demonstrate that Dyadic EP correctly trains the +parameters θ of the original force field F (x, θ), giving the +0 +) +in the limit of infinitesimal nudging. +exact gradient dC(x̄ +dθ +D.1. Proof of EP +First, recall that standard EP does not strictly require the +system to settle at an energy minimum; it requires only that +the system reaches a stationary state (a fixed point of the +dynamics). Indeed, using the notation of Section 2.1, EP +relies on the key identity: +d2 +d2 +ET (xβ , θ) = +ET (xβ , θ). +dθdβ +dβdθ + +C.2. Variational Reconstruction on a Doubled Space + +Expanding the total derivative with respect to β gives: + +Applying the Bateman–Galley formalism circumvents this +obstruction by enlarging the configuration space. The single +state x ∈ Rn is replaced by a conjugate pair (z, z ′ ) ∈ R2n , +and the rotational component of F — which has no scalar +generator on the original n-dimensional space — is absorbed into a bilinear coupling between z and z ′ on the +doubled space, where it does admit a variational description. The physical motion is recovered on the diagonal +submanifold z = z ′ (the so called ’physical limit’), while +the off-diagonal direction d = z − z ′ supplies the additional +degree of freedom needed to encode non-reciprocity. + +d +ET (xβ , θ) = +dβ + + + + +z + z′ +,θ , +2 + + + +∂ET (xβ , θ) +∂x + +⊤ + +dxβ +∂ET (xβ , θ) ++ +dβ +∂β + += C(xβ ). + +(58) + +Where the first term vanishes because the system is at a +∂ +ET (xβ , θ) = 0; this holds even if +stationary state, i.e., ∂x +the system is not at a minimum of ET . Similarly, for the +derivative with respect to θ: +d +∂ET (xβ , θ) +ET (xβ , θ) = +, +dθ +∂θ + +Specializing this reconstruction to the overdamped (firstorder) regime relevant to relaxational neural dynamics yields +the bilinear energy +H(z, z ′ , θ) = −(z − z ′ )⊤ F + +(57) + +(59) + +where we additionally assume that the cost function does +not depend explicitly on the parameters θ. Substituting these +results into Eq. (57) in the limit of infinitesimal nudging +(β → 0) recovers the fundamental relation given by Eq. (9). + +(56) +D.2. Proof of Dyadic EP + +which is precisely Eq. (26). The symmetric midpoint m = +(z + z ′ )/2 plays the role of the physical coordinate of the +doubled system, while d is the auxiliary direction along +which non-reciprocity is stored. On the submanifold z = z ′ +the coupling proportional to (z − z ′ ) vanishes identically +and both states evolve under the original field F , so the +doubling leaves the on-shell physics unchanged. We refer +the reader to (Scurria, 2026) for the full construction. + +We analyze now the stationary states of Dyadic EP by introducing the change of variables: +m= + +z + z′ +, +2 + +d = z − z′. + +(60) + +In these coordinates, the augmented energy HT becomes +HT (m, d, θ, β) = −d⊤ F (m, θ) + βC(m) +14 + +(61) + +Equilibrium Propagation for Non-Conservative Systems + +and the dynamics in Eq. (28) can be rewritten as: +∂HT +dm +=− += F (m, θ), +dt +∂d +dd +∂HT +∂ +=− += dT JF (m, θ) − β +C(m). +dt +∂m +∂m + +In Dyadic EP, we instead employ the single-phase update: + + +1 ∂H(z β , z ′β , θ) +∆θ ∝ − +(70) +β +∂θ + +(62) +(63) + +This choice avoids the overhead of evolving two coupled +equations in the extended space, which would be computationally equivalent to evolving four equations in the original +space (two for +β and two for −β). Using Eq. (70), we +evolve only one coupled equation for +β in the extended +space; this corresponds to two equations in the original +space, thereby achieving the same computational complexity as AsymEP. Furthermore, this single-phase formulation +suggests a pathway toward making the update local in time, +provided appropriate hardware is used to implement the +augmented phase. + +β + +The stationary states (mβ , d ) are the solutions to: +F (mβ , θ) = 0, +∂ +βT +d JF (mβ , θ) − β +C(mβ ) = 0. +∂m + +(64) +(65) + +This leads to the following observations: +1) The stationary state of m is independent of β and coincides with the stationary state of the original system: +z β + z ′β += mβ = m0 = x0 . +2 + +Mathematically, these two approaches yield the same gradient estimate because the equations for dβ are linear. Explicitly we have : + + +∂H(z β , z ′β , θ) +∂F z β + z ′β += −(z β − z ′β )⊤ +,θ +∂θ +∂θ +2 + +⊤ +−1 +∂F 0 += −β +z ,θ +JF⊤ (z 0 , θ) +∂θ + + +∂C 0 +× +(z ) , +(71) +∂x + +(66) + +2) The Jacobian of the extended system defined in Eq. (26) +is invertible, provided JF is invertible. This is most evident +from Eq. (63). +3) The stationary state value of d is given by: + + +−1 ∂C 0 +β +d = β JF⊤ (m0 , θ) +(x ) +∂x + +(67) + +where we have used Eqs. (66) and (67). Inspection of +Eq. (71) confirms that, up to corrections of order β 2 , we +obtain exactly the same gradient as in AsymEP. + +0 + +In particular, when β = 0, we have d = 0, which implies +that the free stationary states coincide: z 0 = z ′0 . + +E. AsymEP versus Dyadic EP + +4) The cost at the stationary state of the extended system +is equal to the cost at the stationary state of the original +system: +D(m0 ) = C(x0 ). +(68) + +In this appendix, we demonstrate that Asymmetric Equilibrium Propagation (AsymEP) emerges naturally as the firstorder projection of the 2N -dimensional Dyadic Equilibrium +Propagation onto a single N -dimensional state space. We +then formalize the physical trade-offs between the two architectures. + +Consequently, the gradients of the cost with respect to the +parameters are identical. +Since both the original and extended systems, given respectively in Eq. (28) and Eq. (1-2), share the same cost at their +respective stationary states, and because the Jacobians of +both models are invertible, applying EP update rule to the +extended system give the correct gradient estimate for the +parameters θ of the original system. + +E.1. AsymEP as the Linear Projection of Dyadic EP +As established in Appendix D.2, transforming the 2N dimensional extended space (z, z ′ ) into the mean state +′ +m = z+z +and the difference state d = z − z ′ exactly +2 +decouples the stationary dynamics. Because the stationary +state of m is the free state of the original system (mβ = x0 ), +the cost function drives the difference variable to a stationary +β +state d satisfying: + +The final step of the proof is to establish the equivalence +between the standard parameter update rule in Eq. (8) and +the modified rule used by Dyadic EP in Eq. (34). Indeed, if +we were to apply the standard update rule in the extended +space, the update would be: +1 +∆θ ∝ − +2β + + + +β + +′β + +−β + +∂H(z , z , θ) ∂H(z , z +− +∂θ +∂θ + +′−β + +, θ) + +β + +JF⊤ (x0 , θ)d = β + + +. + +∂C 0 +(x ) +∂x + +(72) + +To recover this exact error signal in an N -dimensional space, +we postulate a modified dynamical system FA (x) compris- + +(69) +15 + +Equilibrium Propagation for Non-Conservative Systems + +F. Derivation of the Hopfield-like Energy + +ing the standard EP dynamics and a spatial correction Γ(x): +FA (x) = F (x) − β + +∂C +(x) + Γ(x) +∂x + +In this section, we derive the explicit energy functional for +the Continuous Asymmetric Hopfield dynamics defined in +Eq. (35). The force field is given by: + +(73) + +Let ∆x = xβA − x0 denote the displacement from the +free equilibrium. Expanding the stationarity condition +FA (xβA ) = 0 to first order around x0 yields: +JF (x0 , θ)∆x − β + +∂C 0 +(x ) + Γ(xβA ) ≈ 0 +∂x + +F (x) = ρ′ (x) ⊙ (Jρ(x)) − x. + +We omit external inputs J in for brevity, as they appear symmetrically in the Jacobian. The variational Hamiltonian is +defined as: + + + + +z + z′ +z + z′ +H(z, z ′ ) = −(z − z ′ )⊤ F ++ βC +. +2 +2 +(79) + +(74) + +To ensure the first-order displacement matches the Dyadic +β +EP error signal (i.e., ∆x ≈ d ), we substitute Eq. (72) into +the expansion: + +Γ(xβA ) = JF⊤ (x0 , θ) − JF (x0 , θ) ∆x + +(75) + += −2AJ (x0 , θ)(xβA − x0 ) + +(76) + +To analyze this expression, we introduce the midpoint m = +z+z ′ +and the difference d = z − z ′ . Since the separation +2 +between z and z ′ is induced solely by the nudging parameter +β, the difference scales as ∥d∥ ∼ O(β). We therefore +neglect terms of order O(∥d∥3 ) (i.e., or equivalently O(β 3 )) +as they do not contribute to the gradient of the cost. + +This uniquely recovers the AsymEP augmented dynamics. +Finally, to eliminate the O(β 2 ) error, AsymEP evaluates the +centered difference of two opposite nudges: +0 +x±β +A =x ±β + +dxA ++ O(β 2 ) +dβ β=0 + +(78) + +The activation at the midpoint can be approximated as: +ρ(m) = + +(77) + +ρ(z) + ρ(z ′ ) ++ O(∥d∥2 ). +2 + +(80) + +Similarly, the difference in activations is: + +Subtracting these states cancels the O(β 2 ) error, yielding +β ++β +−β +1 +3 +2 (xA − xA ) = d + O(β ), successfully recovering the +exact post-synaptic update term. + +ρ(z) − ρ(z ′ ) = ρ′ (m) ⊙ d + O(∥d∥3 ). + +(81) + +Inverting this relation, we express the state difference as: +E.2. Physical Trade-offs and the Extended Space + +z − z ′ = (ρ(z) − ρ(z ′ )) ⊙ ρ′ (m) + O(∥d∥3 ). + +We can view AsymEP and Dyadic EP as a space-time tradeoff of the same underlying physical optimization problem. + +(82) + +We substitute these expansions into the interaction term +of the Hamiltonian, Hint = −(z − z ′ )⊤ (ρ′ (m) ⊙ Jρ(m)). +Applying the identity a⊤ (b ⊙ c) = (a ⊙ b)⊤ c, we obtain: + +AsymEP preserves the original N -dimensional state space +of the network at the cost of temporal non-locality. The system must evolve sequentially, requiring physical memory +not only to store the free equilibrium x0 for the asymmetric correction, but also to store the successive stationary +states required to evaluate the contrastive gradient update. +AsymEP thus serves as the direct, spatially minimal extension of EP. + +⊤ + +Hint = − ((z − z ′ ) ⊙ ρ′ (m)) Jρ(m) + + +ρ(z) + ρ(z ′ ) +≈ −(ρ(z) − ρ(z ′ ))⊤ J +. +2 + +(83) + +Expanding the product gives: + +Dyadic EP provide a learning signal that is local in both +space (where z − z ′ encodes the gradient) and time (allowing the nudged phases to execute in parallel) at the cost +of doubling the state space. In particular, capturing nonconservative forces in this extended space requires a specific bilinear coupling, rather than a trivial superposition +of uncoupled subsystems. It can be seen as a blueprint for +future neuromorphic hardware. + +Hint = − + +1h +ρ(z)⊤ Jρ(z) + ρ(z)⊤ Jρ(z ′ ) +2 +i + +− ρ(z ′ )⊤ Jρ(z) − ρ(z ′ )⊤ Jρ(z ′ ) . + +(84) + +We decompose the connectivity matrix J into its symmetric +part S and antisymmetric part A. The first and last terms +simplify to ρ(z)⊤ Sρ(z). The cross terms satisfy: + +Ultimately, the reduction of Dyadic EP to AsymEP via the +variables m and d proves the universality of EP’s variational +principle. + +ρ(z)⊤ Jρ(z ′ ) − ρ(z ′ )⊤ Jρ(z) = ρ(z)⊤ (J − J ⊤ )ρ(z ′ ) += ρ(z)⊤ (2A)ρ(z ′ ). +16 + +(85) + +Equilibrium Propagation for Non-Conservative Systems + +Thus, the interaction term reduces to: +1 +1 +Hint = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) +2 +2 +− ρ(z)⊤ Aρ(z ′ ) + O(∥d∥3 ). + +The input parameters are then updated using the standard +learning rule (21). In particular, the presynaptic term associated with the input weights is given by, +∂Fi += δik ρ′ (xi )ul . +in +∂Jkl + +(86) + +Finally, for the nudging term, we expand the cost function +around the midpoint: +C(m) = + +1 +(C(z) + C(z ′ )) + O(∥d∥2 ). +2 + +The presynaptic terms associated with the dynamical paramdyn +eters Jij +depend on the experiment. +G.1. Symmetric Initialization + +(87) + +G.1.1. L EARNING RULES + +When multiplying by β, the remainder term becomes β · +O(∥d∥2 ). Since ∥d∥ ∼ O(β), this remainder is of order +O(β 3 ) and can be consistently discarded alongside the thirdorder terms from the interaction expansion. + +For clarity, we write the learning rules for VF and AsymEP. +For the input weights, using (93), we have: +i +1 h +β +in +′ 0 +∆Jik +∝ +(xi − x−β +(94) +i )ρ (xi )uk , +2β + +Combining all these components, the final Hamiltonian is: + +while for the recurrent weight, we get: +i +1 h +β +dyn +0 +′ 0 +(xi − x−β +)ρ(x +) +. +∆Jij +)ρ +(x +∝ +i +j +i +2β + +1 +1 +H(z, z ′ ) = − ρ(z)⊤ Sρ(z) + ρ(z ′ )⊤ Sρ(z ′ ) +2 +2 +1 +⊤ +′ +− ρ(z) Aρ(z ) + (∥z∥2 − ∥z ′ ∥2 ) +2 +β +′ ++ (C(z) + C(z )). +(88) +2 + +in +∆Jik +∝ + +(89) +(90) + +To complement Fig. 2, we report the evolution of the accuracy of the three methods in Fig. 4. We consider a layered +network with 50 hidden neurons. While this capacity is +insufficient for state-of-the-art performance, it amplifies the +difference in accuracy between models to aid visualization. +Models are trained for 20 epochs starting from a symmetric +configuration, the natural setting for both VF and EP. With +this initialization, AsymEP consistently outperforms the +other methods and learns faster by exploiting the additional +degrees of freedom of the asymmetric network. + +As in the main text, the neuronal dynamics are governed by +the vector field: + + +X dyn +Fi = ρ′ (xi ) +Jij ρ(xj ) + bi (u) − xi , +(91) +j + +G.2. Fixed Asymmetry Ratio + +where the input-dependent bias bi (u) is precomputed for +each MNIST input u as: +Jilin ul . + +(96) + +G.1.2. S UPPLEMENTARY N UMERICAL R ESULTS + +G. Experimental Details + +bi (u) = + + i +1 h +β +ρ(xi ) − ρ(x−β +) +uk , +i +2β + +and for the recurrent weights: +i +1 h +β +dyn +−β +−β +∆Jij +∝ +ρ(xi )ρ(x+β +j ) − ρ(xi )ρ(xj ) . (97) +2β + +This system recovers the original continuous Hopfield dynamics when z = z ′ (assuming β = 0). + +X + +(95) + +For EP, we have: + +The saddle-point dynamics, given by Eq. 32, generated by +this Hamiltonian are: +β ∂C +dz += ρ′ (z) ⊙ (Sρ(z) + Aρ(z ′ )) − z − +, +dt +2 ∂z +′ +dz +β ∂C += ρ′ (z ′ ) ⊙ (Sρ(z ′ ) + Aρ(z)) − z ′ + +. +dt +2 ∂z ′ + +(93) + +This section details the implementation for the fixed asymmetry ratio experiments presented in Section 5.2, followed +by complementary numerical results regarding learning +speed and induced Jacobian asymmetry. + +(92) + +l∈in + +This term projects the input space into the recurrent subspace. The bias yields a diagonal contribution to the Jacobian JF = ∂F +∂x , and therefore does not contribute to the +antisymmetric correction used in the augmented dynamics +Eq. (20) of AsymEP. + +G.2.1. L EARNING RULES +Parametrization and notation. To enforce a fixed asymmetry ratio, we explicitly parameterize the independent elements of Eq. (38). We introduce two parameter vectors θS +17 + +Equilibrium Propagation for Non-Conservative Systems + +Parameter + +Sym. Init. / Feedforward +sec. 5.1 & 5.3 + +Fixed rstr +sec. 5.2 + +Fixed rstr & rin +app. G.3 + +0.05 +0.01 +0.5 +0.5 +20 +10 +40 / 20 +64 +n.a. +784 - n.a. -10 +tanh +s ∼ U (−1, 1) +θ ∼ N (0, N1 ) +10 + +0.05 +0.01 +0.3 +0.5 +30 +10 +30 +√64 +60 +784-50-10 +tanh +s ∼ U(−1, 1) +θ ∼ N (0, N1 ) +10 + +0.0125 +0.0025 +0.3 +0.5 +40 +10 +40 +√64 +60 +all-to-all, 500 hid +tanh +s ∼ U(−1, 1) +θ ∼ N (0, N1 ) +10 + +Learning Rate (Input-Hidden) +Learning Rate (Hidden-Output) +Time Step (Dynamics Integration) +Nudging Parameter (β) +Free-phase Steps (nfree ) +Nudged-phase Steps (nnudge ) +Number of Epochs +Batch Size +Scaling Parameter γ +Structure +Activation function ρ +Initial Recurrent State s +Initial Parameters θ +Number of Runs (training + inference) + +Table 3. Trained Model Hyperparameters on MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N (µ, σ 2 ) +is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference compared to +the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics. + +elements of S̃, the full matrices are constructed as: +S +S̃ij = δij ξi + (1 − δij )θk(max(i,j),min(i,j)) +, +A +Ãij = ϵij θk(max(i,j),min(i,j)) +, + +(99) +(100) + +where ϵij is the Levi-Civita symbol. The dynamical parameters are then given by: +dyn +Jij += γ(cS S̃ij + cA Ãij ), + +with normalization coefficients +p +2 +1 − rstr +cS = +, +FS + +cA = + +(101) + +rstr +, +FA + +defined in terms of the Frobenius norms: +v +uN +M +uX +X +2 +F =t +ξ2 + 2 +θS , + +Figure 4. Evolution of the mean accuracy and standard deviation +(over 10 runs) during training on MNIST for AsymEP, EP, and VF. +Models use 50 hidden neurons. + +S + +i + +i=1 + +k + +(102) + +(103) + +k=1 + +v +u M +u X +2 +θkA . +FA = t2 + +(104) + +k=1 + +and θA of size M = Ndyn (Ndyn − 1)/2, which encode the +off-diagonal elements of the symmetric and antisymmetric +components S̃ and Ã, respectively. The correspondence +between matrix and vector indices is given by: + +k(i, j) = + +(i − 1)(i − 2) ++ j, +2 + +Presynaptic computation. The dependence of the normalization coefficients on the parameters introduces additional regularization terms in the learning rule compared +to the parameterization of (Scellier &Bengio, 2017). The +gradients of the normalization coefficients are: + +(1 ≤ j < i ≤ Ndyn ) +(98) + +∂cS +θkS += +−2c +S +2, +∂θkS +(FS ) + +where the condition j < i selects the strictly lower triangular +elements. Introducing an additional vector ξ for the diagonal + +∂cA +θA += −2cA k 2 . +A +∂θk +(FA ) +18 + +∂cS +ξm += −cS +2, +∂ξm +(FS ) + +(105) +(106) + +Equilibrium Propagation for Non-Conservative Systems + +Parameter + +Comparison Dyn. +sec. 5.4 + +2 hidden layers +sec. 5.4 + +3 hidden layers +sec. 5.4 + +Learning Rate (Input-Hidden) +Learning Rate (Hidden-Hidden) +Learning Rate (Hidden-Output) +Time Step (Dynamics Integration) +Nudging Parameter (β) +Free-phase Steps (nfree ) +Nudged-phase Steps (nnudge ) +Number of Epochs +Batch Size +Layer Structure +Activation function ρ +Initial Recurrent State s +Initial Parameters θ +Number of Runs (training + inference) + +0.0016 +0.0016 +0.0016 +0.4 +0.3 +40 +20 +50 +64 +784-500-200-10 +tanh +s ∼ U(−1, 1) +θ ∼ N (0, N1 ) +10 + +0.0013 +0.0013 +0.0013 +0.3 +0.5 +40 +20 +40 +64 +784-500-500-10 +tanh +s ∼ U(−1, 1) +θ ∼ N (0, N1 ) +10 + +0.6 +0.6 +0.6 +0.0075 +0.20 +60 +30 +40 +64 +784-500-500-500-10 +tanh +s ∼ U (−1, 1) +θ ∼ N (0, N1 ) +10 + +Table 4. Trained Model Hyperparameters on Fashion-MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and +N (µ, σ 2 ) is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference +compared to the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics. + +Combining these with the derivatives of the matrices S̃ and +Ã, we have: +∂ S̃ij += δip δjq + δiq δjp , +∂θkS + +∂ S̃ij += δij δkj +∂ξk + +∂ Ãij += δip δjq − δiq δjp , +∂θkA + +(where p > q): + +N +θkA X +∂Fi +′ += +γc +ρ +(x +) +−2 +Ãij ρ(xj ) +A +i +(FA )2 j=1 +∂θkA + +(107) + + ++ δip ρ(xq ) − δiq ρ(xp ) . + +(108) + +(111) + +where k corresponds to the index pair (p, q) with p > q, as +defined in Eq. (98). The full presynaptic terms are then: + +Initialization. To ensure the stability of the system, we +initialize our parameters suchhthat the +i variance of dynamdyn +ical parameters scales as Var Jij ∝ 1/Ndyn . This is a +conservative choice for the layered architectures used in our +dyn +experiments, where many entries of Jij +are zero. + +• For the diagonal parameters ξm : + +N +∂Fi +ξm X +S̃ij ρ(xj ) += γcS ρ′ (xi ) − +∂ξm +(FS )2 j=1 + ++ δim ρ(xm ) . + +In practice, we initialize the parameter vectors θS , θA , and +ξ with identical variances σ 2 . For large Ndyn , the expected +Frobenius norms approximate to E[FS,A ] ≈ Ndyn σ. Consequently, the normalization coefficients become: + +(109) + +p + +• For the off-diagonal symmetric parameters θkS (where +p > q): + +cS ≈ + + +N +∂Fi +θkS X +′ += +γc +ρ +(x +) +−2 +S̃ij ρ(xj ) +S +i +(FS )2 j=1 +∂θkS + +2 +1 − rstr +, +Ndyn σ + +cA ≈ + +rstr +. +Ndyn σ + +(112) + +Since the symmetric and antisymmetric components are statistically independent, the variance of the weights is derived +as follows: + + ++ δip ρ(xq ) + δiq ρ(xp ) . + +• Diagonal elements (i = j): + +(110) +h +i +2 +1 − rstr +Var Jiidyn = γ 2 c2S σ 2 ≈ γ 2 +. +2 +Ndyn + +• For the off-diagonal antisymmetric parameters θkA +19 + +(113) + +Equilibrium Propagation for Non-Conservative Systems + +• Off-diagonal elements (i ̸= j): +i +h + +γ2 +dyn += γ 2 c2S + c2A σ 2 ≈ 2 , +Var Jij +Ndyn + +a zero-cost baseline (perfect prediction) during learning. +Specifically, for each method and value of rstr , we calculate the cumulative loss by summing the batch-averaged +costs of the first 5 epochs (out of 30, to avoid saturation +effects), and reporting the mean and standard deviation over +10 independent training runs. Mathematically, for each run: + +(114) + +i +h +dyn +∝ 1/Ndyn , we set: +To satisfy Var Jij +γ= + +p +Ndyn + +(115) +Cumul. Loss = + +Note that by random matrix theory, diagonal elements do +not affect stability in the large Ndyn limit. + +(116) + +Ãij = ϵij θk(max(i,j),min(i,j)) . + +(117) + +NX +batches + + +X + + +epoch=1 k=1 + +(x0 ,u)∈Bk + + +C(x0 , u) +, +|Bk | + +(120) +where Bk represents the k-th batch, and |Bk | denotes the +number of examples in the batch. The parameters are updated after each batch step; consequently, the free equilibrium x0 is inferred using the updated parameters and the +current example u. + +Potential Simplification. Although the parameterization +above is fully general, a simpler construction is possible +by removing self-connections (ξ = 0) and enforcing identical parameterization for the symmetric and antisymmetric +components, i.e., θS = θA = θ. The matrix elements then +become: +S̃ij = (1 − δij )θk(max(i,j),min(i,j)) , + +5 +X + +In this case, the Frobenius norms are equal (FS = FA ), and +we can omit the explicit normalization: +q +dyn +2 S̃ + r à . +Jij += 1 − rstr +(118) +ij +str ij +For a parameter θk corresponding to indices (p, q) with +p > q, the presynaptic term is given by: +q + +∂Fi +2 +r +1 − rstr += ρ′ (xi ) +str δip ρ(xq ) +∂θk +q + + (119) +2 −r +1 − rstr ++ +δ +ρ(x +) +str +iq +p . +While this parameterization works in simulations and keeps +the number of parameters constant for all rstr , it constrains +the asymmetry to be “homogeneous”, by which we mean +that the asymmetry ratio is identical for every pair of neurons; hence, the network cannot learn to be symmetric in one +region and antisymmetric in another. Therefore, we choose +to explore the more general case of (38) in our experiments. + +Figure 5. Cumulative loss as defined by (120) over the first 5 +epochs of training, for different asymmetry ratios rstr . We compare +VF (orange) and AsymEP (blue), under two training regimes: +training only J in (dashed) or all parameters (solid). + +In Fig 5, we observe that learning slows down for both algorithms when rstr ≳ 0.6. This behavior likely results from +the increased difficulty of reaching a stationary state as the +dynamics become strongly asymmetric. With a fixed number of inference steps, incomplete convergence degrades the +accuracy of the gradient estimates, thereby slowing down +the learning. Fig 5 shows that while VF can eventually +achieve competitive accuracy, it is consistently slower than +AsymEP as soon as asymmetry is introduced. + +G.2.2. S UPPLEMENTARY N UMERICAL R ESULTS +To complement the results of Fig 3, we analyze the training +efficiency as a function of the asymmetry ratio rstr and investigate the robustness of VF by monitoring the Jacobian +asymmetry. +Training efficiency. We first study the training efficiency +of the two algorithms as a function of the asymmetry ratio rstr . Inspired by the related concept in (Cesa-Bianchi +&Lugosi, 2006), we define the cumulative loss as the accumulated difference between the free equilibrium cost and + +Jacobian asymmetry. We next examine how the structural asymmetry rstr is reflected in the Jacobian of the dy20 + +Equilibrium Propagation for Non-Conservative Systems + +namics (35), given by: +∂Fi +dyn ′ += (1 − δij )ρ′ (xi )Jij +ρ (xj ) +∂sj +h +i ++ δij ρ′ (xi )(Jiidyn ρ′ (xi )) + ρ′′ (xi )bi − 1 . +(121) +In our layered architecture, the self-connections are zero +(Jiidyn = 0). For the following analysis, we neglect all diagonal terms in the Jacobian (including external inputs and +potential), since they do not contribute to the antisymmetric +correction (20) and thus to the discrepancy between the performance of VF and AsymEP. Consequently, we define the +following asymmetry ratio based solely on the off-diagonal +Jacobian JF,off : +(122) + +Figure 6. Asymmetry ratio of the Jacobian rjac defined in equation +(122) after training for different asymmetry ratios rstr . We compare +VF (orange) and AsymEP (blue), under two training regimes: +training only J in (dashed) or all parameters (solid). + +The results are presented in Fig 6. For each trained model +and ratio rstr , we compute rjac averaged over the stationary +states of the first batch (64 images) across 10 independent +runs. We observe that when structural asymmetry is strong +and all parameters are trained, VF partially compensates for +the asymmetry by adjusting the neuronal states. This can be +understood by rewriting the ratio as: + + +dyn +dyn ⊤ +ρ′ (xi ) Jij +− (Jji +) ρ′ (xj ) +F +rjac = +. +(123) +dyn ′ +ρ′ (xi )Jij +ρ (xj ) + +Consequently, local stability requires the largest real eigenvalue of the effective weight matrix to be strictly less than 1. +Assuming weights are initialized independently with variance σ 2 , Girko’s circular law dictates that the eigenvalues +of√an asymmetric matrix uniformly populate a disk of radius +σ n in the complex plane. In contrast, imposing symmetry +forces the eigenvalues +√ onto the real line, broadening the +spectral radius to 2σ n according to Wigner’s semicircle +law. As a result, asymmetric networks can stably accommodate larger variance in the weight initializations than their +symmetric counterparts. + +Compared to the structural asymmetry ratio in Eq. (37), +a value of rjac < rstr indicates that the neuronal states effectively dampen the structural asymmetry, rendering the +dynamics more symmetric. This symmetrization of the Jacobian appears without imposing an additional symmetrization penalty and could be enhanced using the method of +(Laborieux &Zenke, 2022). This mechanism likely explains +the superior performance of ‘All (VF)’ compared to ‘J in +(VF)’ in Fig 3, as the former is able to use the additional +degrees of freedom to reduce the effective asymmetry at +high rstr . + +Asymmetry nevertheless introduces imaginary eigenvalues +and, consequently, damped oscillations. To study this effect +experimentally in a controlled setting, we constrain the input +projections J in . In the experiments of the main text, fixing +the structural asymmetry ratio rstr still allowed AsymEP +to reduce oscillations by aligning and increasing the input +projections J in , thereby adding stabilizing diagonal contributions to the Jacobian. To isolate the network’s ability to +suppress oscillations independently of the magnitude of the +input drive, we further constrain the relative scale of J in and +J dyn by imposing + +rjac = + +⊤ +∥JF,off − JF,off +∥F +, +∥JF,off ∥F + +F + +rin = + +G.3. Stability analysis with Fixed Asymmetry Ratio & +Constrained Inputs Projection + +∥J in ∥F +∥J in ∥F += +, +∥J dyn ∥F +γ + +(124) + +where ∥J dyn ∥F = γ following Eq. (101). Defining unscaled +input projections J˜in , we set + +A complete stability analysis of the non-conservative dynamics trainable with AsymEP is beyond the scope of this +work. Nevertheless, for the class of continuous Hopfield +networks considered here, standard arguments from random +matrix theory suggest that asymmetry inherently improves +asymptotic stability. + +J in = rin γ + +J˜in +∥J˜in ∥F + +(125) + +G.3.1. L EARNING RULES + +In the dynamics defined by Eq. (91), the linear leak term +−xi shifts the spectrum of the system’s Jacobian by −1. + +Reusing the notations of the previous section, we write +Jilin = γcin J˜ilin with the normalization cin = rin /Fin , where +21 + +Equilibrium Propagation for Non-Conservative Systems + +Fin = ∥J˜in ∥F . Applying the chain rule yields: +" +# +in X +∂Fi +J˜kl +′ +in +˜ += γcin ρ (xi ) δik ul − 2 +J um . +in +Fin m im +∂ J˜kl + +(126) + +And for γ we have: +1 +∂Fi += (Fi + xi ). +∂γ +γ + +(127) + +G.3.2. S UPPLEMENTARY N UMERICAL R ESULTS +Figure 7. Comparison of AsymEP and VF on a feedforward network. Test accuracy on MNIST is shown as a function of training +epochs for a single-hidden-layer network with 20 neurons. Curves +report the mean and standard deviation over 10 runs. Best accuracies are 92.7% ± 0.5% (AsymEP) and 64.3% ± 2.0% (VF). + +Table 5 reports a worst-case control experiment where the +structural asymmetry is fixed at rstr = 0.7 while the input +scale ratio rin is varied. The experiment uses an all-to-all +architecture on MNIST (excluding direct input-to-output +connections). The output variance during extended inference (steps 30-50) confirms that the system successfully +learns to suppress oscillations even when rin is severely restricted. Any small residual oscillations can be mitigated by +time-averaging over the inference steps. + +G.5. Advantages of Non-Conservatives Dynamics +In Section 5.4, we compare three (non-)conservative dynamics under varying constraints. To further evaluate learning +speed, Table 6 reports network performance after a single epoch. These results confirm our earlier observation: +AsymEP learns faster than VF. + +Finally, rin can be interpreted as a measure of the external +signal magnitude relative to the internal recurrent dynamics. +These results indicate that the system remains capable of +learning and stabilizing even under a low external input +drive. Even when the input projection ∥J in ∥F is 100 times +smaller than the recurrent connections ∥J dyn ∥F , the network +still achieves 36.34 ± 6.25% accuracy, which is well above +chance level (∼ 10%). + +G.6. Feedforward CIFAR-10 Experiments +This appendix details the architecture and hyperparameters +of the deep feedforward experiments comparing backpropagation (BP), VF, AsymEP and Dyadic EP on CIFAR-10 +(see subsection 5.5) + +G.4. Feedforward Network +Architecture. We use a nine-layer convolutional network +(denoted CNN9). The first eight layers are convolutional +with 3 × 3 kernels and zero-padding; spatial downsampling +is performed by strided convolutions (stride 2 on layers 2, 4, +6, 8 and stride 1 otherwise), so no pooling is used. The channel widths follow the sequence 3 → 64 → 64 → 128 → +128 → 256 → 256 → 512 → 512, reducing the spatial +resolution from 32 × 32 to 2 × 2. The last layer is a fully +connected readout mapping the 512 × 2 × 2 feature map +to the 10 class logits. All hidden units use a ReLU nonlinearity. +p Weights are initialized with the Kaiming scheme +(σ = 2/fan-in) and biases at zero. + +G.4.1. L EARNING RULES +For clarity, we write the learning rules for VF and AsymEP +in a feedforward architecture with one hidden layer using +the notation of Section 5.3. For the input weights connecting +to the hidden layer, we get the usual formula: +in +∆Jik +∝ + +i +1 h +β +−β +0 +(hi − hi )ρ′ (hi )uk , +2β + +(128) + +while for the feedforward weights connecting the hidden to +the output layer, we get: +∆(Wh→o )ji ∝ + +i +1 h +β +0 +′ 0 +(oj − o−β +j )ρ (oj )ρ(hi ) . +2β + +Training setup. All methods are trained for 40 epochs +with batch size 64 and repeated over 5 seeds. Inputs are +normalized per channel and augmented with random 32 × +32 crops (padding 4), random horizontal flips and Cutout +(one 8 × 8 patch). Parameters are updated with SGD with +momentum 0.9, weight decay 5 × 10−4 and gradient-norm +clipping at 1, under a cosine learning-rate schedule decaying +from 3.5 × 10−2 to 2 × 10−4 . Test accuracy is reported +on an exponential moving average of the weights (decay + +(129) + +Note that EP is not applicable in this case. +G.4.2. S UPPLEMENTARY N UMERICAL R ESULTS +In addition to the final accuracy reported in Sec. 5.3, we +show in Fig. 7 the evolution of the accuracy over 20 epochs +for AsymEP and VF. +22 + +Equilibrium Propagation for Non-Conservative Systems +Table 5. Output variance and final test accuracy on MNIST (%) across different values of rin with rstr = 0.7. (mean ± std over 10 runs) +(500 hiddens, all-to-all, no input-output). + +rin +0.01 +0.10 +0.50 +1.00 + +Output variance +Untrained +Epoch 80 +(3.38 ± 0.90) × 10−4 +(2.33 ± 0.48) × 10−4 +(1.34 ± 0.32) × 10−5 +(6.27 ± 1.24) × 10−7 + +(5.22 ± 2.34) × 10−5 +(1.39 ± 0.17) × 10−4 +(1.06 ± 0.25) × 10−6 +(1.75 ± 0.50) × 10−8 + +Table 6. Test accuracy on Fashion-MNIST (%) at Epoch 1 (mean +± std 10 runs). The table compares three classes of network +dynamics: Continuous Hopfield (CH), Predictive Coding (PC), +and Standard dynamics. Each is evaluated under three connec⊤ +tivity structures: Asymmetric (Asym, Bk ̸= Wk+1 +), Symmet⊤ +ric/conservative (Sym, Bk = Wk+1 ), and Feedforward (Feedfor, +Bk = 0). +EP + +AsymEP + +Asym +74.91 ± 0.45 +Feedfor +74.36 ± 0.29 +Sym +74.57 ± 0.43 +Asym +77.83 ± 0.47 +PC +Sym +76.23 ± 0.39 +Asym +76.87 ± 0.51 +Standard +Feedfor +77.92 ± 0.51 +CH + +VF +48.98 ± 4.09 +48.84 ± 3.46 +57.75 ± 3.37 +61.50 ± 4.06 +63.98 ± 0.73 + +0.9995). The targets are smoothed (ε = 0.1), which for +the EP methods amounts to nudging toward the smoothed +one-hot vector. +Relaxation hyperparameters. The four methods differ +only in the gradient estimator: BP uses automatic differentiation, while the EP-based methods contrast stationary +states of the corresponding relaxation dynamics. VF uses +an integration step η = 1.0, nudging β = 0.1, and at most +K = 1000 relaxation steps with an early-stopping threshold +of 9 × 10−6 on the mean state update. Dyadic EP uses +the same settings except for a nudging strength β = 0.1. +AsymEP uses a smaller step η = 0.5, nudging β = 0.1, +and up to K = 250 relaxation steps with a threshold of +1 × 10−4 . + +23 + +Test Acc. (%) +Epoch 80 +36.34 ± 6.25 +90.54 ± 0.19 +94.96 ± 0.10 +96.30 ± 0.09 + +
\ No newline at end of file |
