Equilibrium Propagation for Non-Conservative Systems

Antonino Emanuele Scurria (1), Dimitri Vanden Abeele (1), Bortolo Matteo Mognetti (2), Serge Massar (1)


(1) Laboratoire d’Information Quantique (LIQ) CP224, Université libre de Bruxelles (ULB), Av. F. D. Roosevelt 50, 1050 Bruxelles, Belgium
(2) Interdisciplinary Center for Nonlinear Phenomena and Complex Systems CP231, Université libre de Bruxelles (ULB), Av. F. D. Roosevelt 50, 1050 Bruxelles, Belgium
Correspondence to: Antonino Emanuele Scurria <antonino.scurria@ulb.be>.

Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026. Copyright 2026 by the author(s).

arXiv:2602.03670v2 [cs.LG] 1 Jun 2026

Abstract

Equilibrium Propagation (EP) is a physics-inspired learning algorithm that uses stationary states of a dynamical system both for inference and learning. In its original formulation it is limited to conservative systems, i.e. to dynamics which derive from an energy function. Given their applications, it is important to extend EP to non-conservative systems, i.e. systems with non-reciprocal interactions. Previous attempts to generalize EP to such systems failed to compute the exact gradient of the cost function. Here we propose a framework that extends EP to arbitrary non-conservative systems, including feedforward networks. We keep the key property of equilibrium propagation, namely the use of stationary states both for inference and learning. However, we modify the dynamics in the learning phase by a term proportional to the non-reciprocal part of the interaction so as to obtain the exact gradient of the cost function. This algorithm can also be derived using a variational formulation that generates the learning dynamics through an energy function defined over an augmented state space. Numerical experiments show that this algorithm achieves better performance and learns faster than previous proposals.

1. Introduction

Standard neural network optimization relies on error backpropagation, an algorithm whose computational mechanism is difficult to reconcile with biological (Crick, 1989) and physical implementations (Indiveri & Liu, 2015). Specifically, backpropagation requires a backward pass distinct from inference, the transmission of nonlocal error signals, and synchronous layer-wise computations with explicit gradient storage. These constraints have no clear analog in physical systems, making backpropagation challenging to implement in neuromorphic or analog hardware. Consequently, understanding how credit assignment can instead emerge from intrinsic system dynamics, through local interactions and continuous relaxation, is a central question in neuroscience and machine learning.

Equilibrium Propagation (EP) (Scellier & Bengio, 2017) represents one of the most promising advances in this direction. It formulates supervised learning as a contrast between two stationary states of a dynamical system: a ‘free’ phase where the system evolves autonomously, and a ‘nudged’ phase where outputs are weakly pushed toward their targets. The local change in neural states between these phases recovers the exact gradient of the cost function with respect to parameters. This enables spatially local learning exploiting the continuous relaxation of the system without a distinct backward circuit or explicit weight transport.

Since its introduction, several works have sought to improve the practicality and biological realism of EP. Algorithmic adaptations include enforcing temporal locality to avoid state storage (Ernoult et al., 2020; Falk et al., 2025), deriving agnostic updates for black-box energies (Scellier et al., 2022), and substituting nudging with clamping (Stern et al., 2021). Theoretically, the framework has been extended to stochastic systems (Scellier & Bengio, 2017; Massar & Mognetti, 2025) and Lagrangian dynamics for time-varying inputs (Massar, 2025; Pourcel et al., 2025; Berneman & Hexner, 2025). In parallel, simulations have explored suitable substrates, ranging from spiking (Martin et al., 2021; O’Connor et al., 2019) and resistive networks (Kendall et al., 2020) to coupled oscillators (Wang et al., 2024; Rageau & Grollier, 2025), as well as quantum systems (Wanjura & Marquardt, 2025; Massar & Mognetti, 2025; Scellier, 2024). Experimental realizations have been demonstrated in memristor crossbars (Yi et al., 2023), self-adjusting electrical circuits (Dillavou et al., 2022; 2024), elastic networks (Altman et al., 2024), and classical Ising models trained on quantum annealers (Laydevant et al., 2024).

Despite these recent developments and the theoretical elegance of EP, its standard formulation remains restricted to conservative systems. In these systems, dynamics are derived from an energy function, which inherently enforces symmetry (e.g., symmetric synaptic connections $J_{ij} = J_{ji}$) through the action-reaction principle. This constraint precludes the use of EP in a broad class of models characterized by non-conservative forces. This includes the feedforward architectures dominant in modern AI, biological circuits, as well as physical systems that reach stationary states far from thermodynamic equilibrium, such as nonlinear optical systems driven by external lasers (Cin et al., 2025), optoelectronic systems (Kalinin et al., 2025), exciton-polariton condensates (Sajnok & Matuszewski, 2025), active metamaterials (Brandenbourger et al., 2019) and active colloids (Bishop et al., 2023; Osat & Golestanian, 2023) (see (Bowick et al., 2022) for a review).

Formally, we consider a dynamical system governed by a non-reciprocal force field $F(x, \theta, u)$, which relaxes to a stationary configuration $x^0$ satisfying:

$$F(x^0, \theta, u) = 0, \tag{1}$$

where $x$ represents the state variables, $\theta$ the learnable parameters and $u$ the static input. Our goal, given a target $y(u)$, is to compute the gradient of the cost function $C(x^0, y)$ at this equilibrium,

$$\frac{dC}{d\theta}(x^0, y), \tag{2}$$

and update $\theta$ to minimize the cost.

Previous attempts to extend EP to non-conservative dynamics include the Vector Field (VF) algorithm (Scellier et al., 2018). However, as noted by the authors, this method provides an unbiased gradient of the cost Eq. (2) only in the conservative case. To mitigate this, (Laborieux & Zenke, 2024) proposed adding a penalty to keep the Jacobian close to symmetry, essentially forcing the system to be as conservative as possible. Alternative methods related to VF, which similarly do not compute the exact gradient, were proposed in (Farinha et al., 2020; Costa & Santos, 2025) and for specific systems in simulation (Cin et al., 2025; Sajnok & Matuszewski, 2025).

Conversely, generalizations of backpropagation can handle non-reciprocal forces and compute the exact gradient of the cost Eq. (2) but inherit the same challenges in physical implementations. For instance, Backpropagation Through Time (Werbos, 1990) unfolds the network in time to apply standard backpropagation, Recurrent Backpropagation (Almeida, 1990; Pineda, 1987) avoids this memory requirement but still requires a specific circuit to propagate errors, and the continuous Adjoint Method (Chen et al., 2018) additionally requires integrating the dynamics backward in time which is not physically possible for a dissipative system.

In this paper, we first propose Asymmetric EP (AsymEP), a framework where the original dynamics serve for inference, while a new augmented dynamic is used to compute gradients of the cost Eq. (2). In this augmented phase, the output neurons are nudged towards their targets (as in standard EP), while a local corrective term, proportional to the antisymmetric part of the Jacobian at the free equilibrium $J_F(x^0, \theta, u) = \frac{\partial}{\partial x} F(x^0, \theta, u)$, is added to the forces. The exact gradients of the cost with respect to parameters are then obtained by contrasting stationary states of the augmented system.

Second, we introduce Dyadic EP, a ‘variational’ approach to learning in non-conservative systems. This method involves doubling the number of variables in the system’s state space and subsequently introducing a new energy function in this extended space. This approach takes advantage of the extended space to execute the positive and negative nudging phases in parallel, recovering the same computational cost as AsymEP. Derived from first principles, this approach is inspired by established methods for mapping dissipative dynamical systems onto conservative ones by doubling the degrees of freedom (Bateman, 1931; Galley, 2013; Aykroyd et al., 2025). A more comprehensive study of the theoretical framework and its application to feedforward networks can be found in (Scurria, 2026). Our method is related to the Dual Propagation algorithm (Høier et al., 2023; Høier & Zach, 2023; 2024) and constitutes an independent, first-principles generalization of Dyadic Learning (Nest & Høier; Høier et al., 2024), previously limited to Hopfield networks, to arbitrary force fields.

Third, we validate our framework on MNIST (LeCun, 1998), Fashion-MNIST, and CIFAR-10. In continuous Hopfield networks initialized with symmetric connection matrices, AsymEP achieves better accuracy and learns faster than EP and VF. Additionally, when we constrain the network to have a strong degree of structural asymmetry, in which case EP is inapplicable, AsymEP outperforms VF. Finally, when we restrict connections to a feedforward structure, our algorithm effectively trains all parameters; in contrast, VF is limited to training the last layer, acting essentially as an Extreme Learning Machine (Huang et al., 2006; Wang et al., 2022) with poor performance.

In summary, this theoretical work proposes two generalizations of EP beyond conservative systems to arbitrary differentiable dynamics that compute in their stationary states.

2. Equilibrium Propagation Overview

2.1. Conservative Systems

We first review standard Equilibrium Propagation (EP) (Scellier & Bengio, 2017). We consider a network described by an energy function $E(x, \theta, u)$, such that the force field is


derived from the potential E:                                        stationary point, i.e., that Eq. (7) holds. Second, EP implic-
                                                                                                                     ∂
                                                                     itly assumes that the Jacobian JE (x0 , u) = ∂x   FE (x0 , u) is
                                 ∂
              FE (x, θ, u) = −      E(x, θ, u).            (3)       invertible. In this work, we assume this condition holds and
                                 ∂x                                  will not state it explicitly hereafter. Third, for simplicity,
The objective is to compute the total gradient dC   0                we omit the dependency on the input u and target y in the
                                               dθ (x , y) of a
(quadratic) cost function C(x, y) evaluated at the minimum           following equations.
energy configuration of the system. This free equilibrium
denoted x0 (which depend implicitly in θ and u), satisfies           2.2. Vector Field
the stationarity condition:
                                                                     The Vector Field (VF) algorithm, introduced in (Scellier
                    ∂                                                et al., 2018), is an early attempt to adapt EP to non-
                   − E(x0 , θ, u) = 0.                     (4)       reciprocal forces. This method relies on the observation
                    ∂x
                                                                     that, for conservative systems, linearizing the right-hand
To compute gradients, we introduce the augmented energy              side of Eq. (9) around the equilibrium point x0 yields
functional:
                                                                                 ∂E(xβ , θ) ∂E(x−β , θ)
                                                                                                         
        ET (x, θ, β, u, y) = E(x, θ, u) + βC(x, y),        (5)              1
                                                                       lim                  −
                                                                       β→0 2β        ∂θ           ∂θ
where β is a scalar nudging parameter. The stationary config-                                         ⊤  β       (10)
                                                                                                          x − x−β
                                                                                      
                                                                                          ∂FE 0
uration of this augmented system is obtained by integrating                     = lim −       (x , θ)              ,
                                                                                  β→0      ∂θ                2β
the dynamics
                 dx    ∂ET (x, θ, β, u)                              where FE = −∂x E(x, θ) is the conservative force. It is
                    =−                  ,                  (6)       therefore tempting to use the right-hand side of Eq. (10) for
                 dt         ∂x
                                                                     parameter updates of non-conservative systems, for which
until the energy minimum is reached. This new fixed point            no energy function E exists.
xβ , called nudged equilibrium, satisfies:
                                                                     The VF algorithm adopts precisely this approach. It uses
             ∂E(xβ , θ, u)    ∂C(xβ , y)                             the nudged counterpart of Eq. (7),
                           +β            = 0.              (7)
                ∂x              ∂x
                                                                                                        ∂C β
                                                                                       F (xβ , θ) − β      (x ) = 0,             (11)
The training procedure, as improved in (Laborieux et al.,                                               ∂x
2021), uses two nudged phases with factors ±β (with
β ̸= 0). Starting from x0 , the system relaxes to two                in conjunction with the learning rule Eq. (10):
nearby perturbed equilibria, x+β and x−β . The displace-                                                  ⊤  β
                                                                                                              x − x−β
                                                                                                                     
ment x+β − x−β is then used to compute the parameter                                         ∂F 0
                                                                               ∆θ = ϵ           (x , θ)                 .        (12)
update in the learning rule:                                                                 ∂θ                  2β

              1 ∂E(xβ , θ, u) ∂E(x−β , θ, u)
                                              
  ∆θ = −ϵ                      −                  , (8)              However, as noted in (Scellier et al., 2018), Eq. (12) does
             2β         ∂θ             ∂θ                            not align with the true gradient dC      0
                                                                                                         dθ (x ) and is exact only if
where ϵ > 0 is the learning rate. The theoretical foundation         the force is conservative. To see this, let JF (x, θ) denote
of EP is the result that, in the limβ→0 of Eq. (8), we get:          the Jacobian of the vector field F (x, θ) (in components
                                                                     (JF (x, θ))ij = ∂F∂xi (x,θ)
                                                                                            j
                                                                                                 ). Differentiating the equilibrium
              dC(x0 , y)    d ∂E(xβ , θ, u)                                         0
                         =                  ,              (9)
                dθ         dβ     ∂θ
see Appendix D.1. The error of the above method is $O(\beta^2)$. This error can be further reduced using holomorphic equilibrium propagation (Laborieux & Zenke, 2022).

Thus, EP recovers the exact gradient of the cost function using only local computations. In this manner, learning implements gradient descent without an explicit backward pass, and credit assignment is realized through the system's intrinsic relaxation dynamics.

Three remarks can be made at this point. First, EP does not require the system to be at an energy minimum, but only at a

condition $F(x^0, \theta) = 0$ with respect to $\theta$ gives
\[
J_F(x^0, \theta)\,\frac{dx^0}{d\theta} + \frac{\partial F}{\partial \theta}(x^0, \theta) = 0. \tag{13}
\]
Consequently, the exact gradient of the cost is
\[
\frac{dC}{d\theta}(x^0) = \left(\frac{dx^0}{d\theta}\right)^{\top} \frac{\partial C}{\partial x}(x^0)
= -\underbrace{\left(\frac{\partial F}{\partial \theta}(x^0, \theta)\right)^{\top}}_{\text{pre-synaptic}}\;
\underbrace{\left(J_F^{\top}(x^0, \theta)\right)^{-1} \frac{\partial C}{\partial x}(x^0)}_{\text{post-synaptic}}. \tag{14}
\]
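As a numerical sanity check of Eqs. (13) and (14), the sketch below evaluates the exact gradient for a small toy system and compares it with central finite differences. Everything specific in it is an assumption made for illustration and is not taken from the experiments: the force field F(x, W) = W tanh(x) − x + u with θ = W, the quadratic cost C(x) = ½‖x − y‖², the fixed-point solver, and all function names.

```python
import numpy as np

def equilibrium(W, u, n_iter=300):
    """Stationary state x0 of the toy force F(x, W) = W tanh(x) - x + u."""
    x = np.zeros_like(u)
    for _ in range(n_iter):          # fixed-point iteration x <- W tanh(x) + u
        x = W @ np.tanh(x) + u
    return x

def cost(W, u, y):
    """Assumed quadratic cost C(x0) = 0.5 * ||x0 - y||^2 at equilibrium."""
    x0 = equilibrium(W, u)
    return 0.5 * np.sum((x0 - y) ** 2)

def exact_gradient(W, u, y):
    """Eq. (14) with theta = W: dC/dW = -(dF/dW)^T (J_F^T)^{-1} dC/dx."""
    x0 = equilibrium(W, u)
    r = np.tanh(x0)
    J = W * (1.0 - r ** 2) - np.eye(len(u))   # J_F = W diag(rho'(x0)) - I
    post = np.linalg.solve(J.T, x0 - y)       # post-synaptic factor
    return -np.outer(post, r)                 # dF_i/dW_ij = rho(x0_j)

def finite_difference_gradient(W, u, y, h=1e-6):
    """Central differences on every entry of W, re-solving the equilibrium."""
    g = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = h
            g[i, j] = (cost(W + E, u, y) - cost(W - E, u, y)) / (2 * h)
    return g

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))               # non-symmetric: no energy function
W = 0.5 * M / np.linalg.norm(M, 2)            # spectral norm 0.5 -> contraction
u = rng.standard_normal(n)
y = rng.standard_normal(n)

g_exact = exact_gradient(W, u, y)
g_fd = finite_difference_gradient(W, u, y)
print("max abs difference:", np.max(np.abs(g_exact - g_fd)))
```

Because W is non-symmetric, the toy system is non-conservative, yet the two gradients should agree to finite-difference accuracy. Rescaling W to spectral norm 0.5 is only there to make the fixed-point iteration a contraction.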


The terms 'pre-synaptic' and 'post-synaptic' in Eq. (14) are used by analogy with neuronal transmission: the pre-synaptic factor captures the local influence of $\theta$ on the force $F$, while the post-synaptic factor is the sensitivity of the cost to state perturbations.

If instead we differentiate the nudged equilibrium condition in Eq. (11) with respect to $\beta$ and evaluate at $\beta = 0$, we obtain
\[
J_F(x^0, \theta)\,\left.\frac{dx^\beta}{d\beta}\right|_{\beta=0} - \frac{\partial C}{\partial x}(x^0) = 0, \tag{15}
\]
which gives
\[
\left.\frac{dx^\beta}{d\beta}\right|_{\beta=0} = J_F(x^0, \theta)^{-1}\,\frac{\partial C}{\partial x}(x^0). \tag{16}
\]
The right-hand side of Eq. (16) represents the effective post-synaptic term used by the VF algorithm (Eq. 12). Comparing this with the exact post-synaptic term derived in Eq. (14), we see that they coincide only if $J_F = J_F^{\top}$, i.e., only if the system is conservative.

Now, let $S_J(x^0, \theta)$ and $A_J(x^0, \theta)$ denote the symmetric and antisymmetric parts of the Jacobian at the free (un-nudged) equilibrium, respectively. Then, we show in Appendix A that the gradient error increases with the spectral radius of $S_J(x^0, \theta)^{-1} A_J(x^0, \theta)$. Consequently, large antisymmetric contributions degrade the gradient estimation, confirming empirical observations in the Appendix of (Ernoult et al., 2020). In fact, in the pathological limit where the Jacobian is purely antisymmetric, $S_J(x^0, \theta) = 0$, the update of VF gives the negative of the true gradient, maximizing the cost rather than minimizing it.

3. Asymmetric EP

Here, we introduce Asymmetric EP (AsymEP), see Algorithm 1, which removes the gradient estimate error inherent to VF by adding a local correction term to the augmented inference dynamics. The new nudged equilibrium $x_A^\beta$ satisfies:
\[
F(x_A^\beta, \theta) - \beta\,\frac{\partial C}{\partial x}(x_A^\beta) - 2A_J(x^0, \theta)\,(x_A^\beta - x^0) = 0. \tag{22}
\]
As in VF, we then obtain two perturbed states $x_A^{\pm\beta}$ for opposite nudging $\pm\beta$ and apply the contrastive learning rule of Eq. (12).

Algorithm 1 Asymmetric EP (AsymEP)
 1: Inputs: Force field $F(x, \theta)$, cost function $C(x)$, nudging parameter $\beta$, learning rate $\epsilon$.
 2: repeat
 3:   1. Free Phase: Evolve to stationary state
 4:      Evolve the system dynamics
 5:         $\frac{dx}{dt} = F(x, \theta)$,   (17)
 6:      until convergence to the stationary state $x^0$.
 7:   2. Jacobian Decomposition
 8:      Compute the Jacobian at equilibrium:
 9:         $J_F(x^0, \theta) = \frac{\partial F}{\partial x}(x^0, \theta)$,   (18)
10:      and extract its antisymmetric part:
11:         $A_J(x^0, \theta) = \frac{1}{2}\left(J_F(x^0, \theta) - J_F(x^0, \theta)^{\top}\right)$.   (19)
12:   3. Nudged Phase: Augmented Dynamics
13:      Integrate the dynamics twice starting from $x^0$ (once with $+\beta$ and once with $-\beta$)
14:         $\frac{dx}{dt} = F(x, \theta) - \beta\,\frac{\partial C}{\partial x}(x) - 2A_J(x^0, \theta)\,(x - x^0)$,   (20)
15:      until convergence to two new stationary states $x_A^{\pm\beta}$.
16:   4. Parameter Update
17:      Update the parameters according to:
18:         $\Delta\theta = \epsilon \left(\frac{\partial F}{\partial \theta}(x^0, \theta)\right)^{\top} \left(\frac{x_A^{\beta} - x_A^{-\beta}}{2\beta}\right)$.   (21)
19: until convergence of $\theta$
20: Output: Optimized parameters $\theta$.

We now show that AsymEP gives rise to the correct learning rule, i.e., that the right-hand side of Eq. (21) is proportional to the gradient of the cost function $\frac{dC}{d\theta}(x^0)$ at the equilibrium point $x^0$ (Eq. 14). To this end, note that the same reasoning leading to Eq. (16) leads to
\[
\left.\frac{dx_A^\beta}{d\beta}\right|_{\beta=0} = J_F^A(x^0, \theta)^{-1}\,\frac{\partial C}{\partial x}(x^0), \tag{23}
\]
where $J_F^A(x, \theta)$ is the Jacobian of the modified dynamical system Eq. (20). At the equilibrium point $x^0$, $J_F^A$ is equal to the transpose of the original Jacobian:
\[
\begin{aligned}
J_F^A(x^0, \theta) &= J_F(x^0, \theta) - 2A_J(x^0, \theta) \\
&= S_J(x^0, \theta) - A_J(x^0, \theta) \\
&= J_F^{\top}(x^0, \theta),
\end{aligned} \tag{24}
\]
where we have used the decomposition Eq. (44) of the original Jacobian $J$ into its symmetric and antisymmetric components. Therefore, the left-hand side of Eq. (23) is equal to the true post-synaptic term
\[
\left.\frac{dx_A^\beta}{d\beta}\right|_{\beta=0} = \left(J_F^{\top}(x^0, \theta)\right)^{-1} \frac{\partial C}{\partial x}(x^0), \tag{25}
\]


which, using Eq. (14), proves the result. Additionally, although implied by the equality with the true gradient, we explicitly demonstrate the equivalence of the gradient estimates obtained by AsymEP and Backpropagation Through Time in Appendix B following (Ernoult et al., 2019).

Note that the corrective term $-2A_J(x^0, \theta)(x - x^0)$ in Eq. (20) is spatially local: $A_J$ vanishes for unconnected neurons, and $(x - x^0)$ is available at the synapse given the memory mechanism already required by Eq. (12). This correction can create backward connections (Section 5.3). However, in physical realizations, both feedforward and feedback connections must be physically present, though feedback may be deactivated during inference.

4. Dyadic EP

We now introduce Dyadic EP (Algorithm 2), a variational algorithm that computes the exact cost gradient in the limit of infinitesimal nudging. It maps the original $n$-variable dynamics $F(x, \theta)$ onto a $2n$-variable system $(z, z')$ defined by an energy $H(z, z', \theta)$ and cost $D(z, z')$. We show in Appendix E that AsymEP can be seen as the first-order projection of Dyadic EP onto the original $n$-dimensional state space.

The new system is defined by the energy $H$ and cost function $D$, given in terms of $F$ and $C$ by:
\[
\begin{aligned}
H(z, z', \theta) &= -(z - z')^{\top} F\!\left(\frac{z + z'}{2}, \theta\right), \\
D(z, z') &= C\!\left(\frac{z + z'}{2}\right),
\end{aligned} \tag{26}
\]
where $z, z' \in \mathbb{R}^n$. In order to learn, we introduce the augmented energy
\[
H_T(z, z', \theta, \beta) = H(z, z', \theta) + \beta D(z, z'). \tag{27}
\]
The equilibrium configuration corresponds to a saddle point of $H_T$, where $z$ minimizes and $z'$ maximizes the energy. This poses no issue for EP, which requires only that the joint state $(z, z')$ reaches a stationary state. Although this min-maximization can be interpreted as $z$ evolving forward and $z'$ backward in time, in practice they evolve forward simultaneously, as we integrate the coupled equations:
\[
\begin{aligned}
\frac{dz}{dt} = -\frac{\partial H_T}{\partial z} &= F\!\left(\frac{z + z'}{2}, \theta\right) + \left(\frac{z - z'}{2}\right)^{\top} \left.\frac{\partial F}{\partial z}\right|_{\frac{z + z'}{2}} - \frac{\beta}{2}\,\frac{\partial C}{\partial z}\!\left(\frac{z + z'}{2}\right), \\
\frac{dz'}{dt} = +\frac{\partial H_T}{\partial z'} &= F\!\left(\frac{z + z'}{2}, \theta\right) - \left(\frac{z - z'}{2}\right)^{\top} \left.\frac{\partial F}{\partial z'}\right|_{\frac{z + z'}{2}} + \frac{\beta}{2}\,\frac{\partial C}{\partial z'}\!\left(\frac{z + z'}{2}\right),
\end{aligned} \tag{28}
\]
until a stationary point $(z^\beta, z'^\beta)$ is reached. Upon convergence, we follow the standard EP paradigm in using the difference $z^\beta - z'^\beta$ to compute the post-synaptic term. Under the change of variables $m = \frac{z + z'}{2}$ and $d = z - z'$, we prove in Appendix D that $m$ follows the original dynamics $F$ (ensuring valid inference), while $d$ relaxes to a "physical" error signal proportional to the cost gradient.

It is important to notice that while Dyadic EP introduces a distinct formulation, it remains consistent with the general theoretical setting of EP and matches the computational cost of AsymEP. Note also that we start the evolution of the free phase ($\beta = 0$) with the identical initial condition for $z$ and $z'$ (i.e., $d = 0$). This guarantees that integrating Eq. (32) leads to a symmetric stationary point where $z^0 = z'^0$. Finally, we underline that the modified variational update rule in Eq. (34) is equivalent to the standard symmetric EP update rule in Eq. (8) (see Appendix D).

Now, to make this concrete, consider a continuous Hopfield network (see also Eq. (35)) with an asymmetric connection matrix $J$. After some calculations (see Appendix F), the augmented energy of the system can be re-expressed as:
\[
\begin{aligned}
H_T = {} & -\frac{1}{2}\rho(z)^{\top} S \rho(z) + \frac{1}{2}\rho(z')^{\top} S \rho(z') - \rho(z)^{\top} A \rho(z') \\
& + \frac{1}{2}\left(\|z\|^2 - \|z'\|^2\right) + \frac{\beta}{2}\left(C(z, y) + C(z', y)\right),
\end{aligned} \tag{29}
\]
where $S$ and $A$ are the symmetric and antisymmetric parts of $J$, respectively, and $\rho$ is an element-wise non-linearity.

An interesting analogy can be drawn with standard learning rules in discrete Hopfield networks (Hopfield, 1982). For a sequence of binary memories $\{\xi^1, \ldots, \xi^m\}$ where $\xi^\mu \in \{-1, 1\}^n$, $S$ corresponds to the standard autoassociative Hebbian rule $\sum_\mu \xi^\mu (\xi^\mu)^{\top}$, creating stable attractors at each pattern. In contrast, $A$ corresponds to the heteroassociative rule (e.g., a cycle between $\xi^\mu$ and $\xi^\nu$ given by $\xi^\nu (\xi^\mu)^{\top} - \xi^\mu (\xi^\nu)^{\top}$), encoding transitions between patterns.

For this specific energy, the update rule given by Eq. (34) can be re-expressed as:
\[
\Delta J \propto -\frac{1}{2\beta}\left(\rho(z'^\beta) - \rho(z^\beta)\right)\left(\rho(z'^\beta) + \rho(z^\beta)\right)^{\top}. \tag{30}
\]
In the limit $\beta \to 0$, this gives:
\[
\Delta J \propto \left(\frac{d^\beta}{\beta} \odot \rho'(m)\right)\rho(m)^{\top}, \tag{31}
\]
matching the learning rule in (Pineda, 1987), with $\lim_{\beta \to 0} \frac{d^\beta}{\beta}$ being the error signal.

5. Numerical Experiments

In this section, we numerically validate AsymEP (Algorithm 1). The neuronal dynamics follows the one introduced


in (Scellier & Bengio, 2017), and is generalized to allow for non-reciprocal forces as in (Scellier et al., 2018). For clarity, we express the forces in a form that explicitly separates the contributions of the external input and the recurrent interactions:
\[
F(x) = \rho'(x) \odot \left(J^{\mathrm{in}} u + J^{\mathrm{dyn}} \rho(x)\right) - x, \tag{35}
\]
where $u \in \mathbb{R}^{N_{\mathrm{in}}}$ denotes the input and $x \in \mathbb{R}^{N_{\mathrm{dyn}}}$ the neuronal state, comprising both hidden and output units. The matrices $J^{\mathrm{in}} \in \mathbb{R}^{N_{\mathrm{dyn}} \times N_{\mathrm{in}}}$ and $J^{\mathrm{dyn}} \in \mathbb{R}^{N_{\mathrm{dyn}} \times N_{\mathrm{dyn}}}$ define the input and recurrent connectivity, respectively. The activation function $\rho(\cdot)$ is taken to be the hyperbolic tangent, applied element-wise.

If $J^{\mathrm{dyn}}$ is symmetric, we can define the energy:
\[
E(x) = \frac{1}{2}\|x\|^2 - \frac{1}{2}\rho(x)^{\top} J^{\mathrm{dyn}} \rho(x) - \rho(x)^{\top} J^{\mathrm{in}} u, \tag{36}
\]
which is identical to that of (Scellier & Bengio, 2017), provided that the input neurons are activated as $\rho(u)$.

Equation (35) naturally motivates a quantitative measure of structural asymmetry $r_{\mathrm{str}}$, defined as:
\[
r_{\mathrm{str}} = \frac{\left\|\left(J^{\mathrm{dyn}} - J^{\mathrm{dyn}\,\top}\right)/2\right\|_F}{\left\|J^{\mathrm{dyn}}\right\|_F}, \tag{37}
\]
where $\|\cdot\|_F$ denotes the Frobenius norm. Note that this metric does not capture the asymmetry of the Jacobian, which depends on the state $x$.

For numerical experiments, we restricted the network to a layered architecture with a single hidden layer to facilitate comparison with prior work. Accordingly, $J^{\mathrm{in}}$ contains only input-to-hidden connections, while $J^{\mathrm{dyn}}$ is block off-diagonal, encoding bidirectional interactions between the hidden and output layers. Both $J^{\mathrm{in}}$ and $J^{\mathrm{dyn}}$ are trained.

We first use MNIST (LeCun, 1998) (60k train, 10k test) followed by Fashion-MNIST to validate AsymEP, and then we further validate AsymEP and Dyadic EP by comparing them to Backpropagation on a convolutional feedforward network with CIFAR-10. Inputs are normalized using min-max to $[-1, 1]$ and targets are one-hot encoded in $\{-1, 1\}$. All hyperparameters are detailed in Appendix G, along with additional details and numerical results.

Algorithm 2 Dyadic EP
 1: Inputs: Force field $F(x, \theta)$, cost function $C(x, y)$, nudging parameter $\beta$, learning rate $\epsilon$
 2: repeat
 3:   1. Free Phase: Evolve to stationary state
 4:      Evolve the system dynamics, starting from identical initial conditions $z(0) = z'(0) = z_0$,
 5:         $\frac{dz}{dt} = -\frac{\partial H}{\partial z}, \qquad \frac{dz'}{dt} = +\frac{\partial H}{\partial z'}$,   (32)
 6:      until stationary states $z^0, z'^0$ are reached.
 7:   2. Nudged Equilibrium
 8:      Evolve the system dynamics, starting from the solution of the free phase $z^0 = z'^0$:
 9:         $\frac{dz}{dt} = -\frac{\partial H_T}{\partial z}, \qquad \frac{dz'}{dt} = +\frac{\partial H_T}{\partial z'}$,   (33)
10:      until two nudged stationary states $z^\beta, z'^\beta$ are reached.
11:   3. Parameter Update
12:      Update the parameters according to:
13:         $\Delta\theta = -\epsilon\,\frac{1}{\beta}\,\frac{\partial H(z^\beta, z'^\beta, \theta)}{\partial \theta}$   (34)
14: until convergence of $\theta$
15: Output: Optimized parameters $\theta$.

5.1. Symmetric Initialization

We start by comparing AsymEP with standard EP and VF. All algorithms are initialized with an identical symmetric matrix $J^{\mathrm{dyn}}$. EP maintains this symmetry throughout training, while VF and AsymEP induce asymmetry in $J^{\mathrm{dyn}}$. Since EP and VF already achieve strong performance on MNIST, the purpose of this experiment is to validate AsymEP and compare it against EP and VF rather than outperform the state of the art.

Figure 1 compares the three algorithms as a function of hidden-layer dimension after 1 and 20 epochs. AsymEP consistently outperforms the baselines, suggesting that it learns faster and reaches a higher test accuracy.

Figure 2 studies the evolution of the asymmetry ratio $r_{\mathrm{str}}$. The results are reported for 50 hidden neurons. As expected, EP preserves the initial weight symmetry. In contrast, VF and AsymEP induce non-trivial evolution of $r_{\mathrm{str}}$ following two distinct patterns, resulting in three distinct network configurations. A complementary figure is available in Appendix G.1.

5.2. Fixed Asymmetry Ratio

While the previous section focused on networks compatible with all three algorithms (EP, VF, AsymEP), we now turn to architectures with strong structural asymmetry. In this regime, EP is inapplicable by construction, and, as we show, VF performs poorly, contrary to AsymEP, which remains effective.

To this end, we consider a class of networks where the asymmetry ratio $r_{\mathrm{str}}$ defined in Eq. (37) is kept fixed. Let $\tilde{S}$ and $\tilde{A}$ be arbitrary symmetric and antisymmetric matrices in $\mathbb{R}^{N_{\mathrm{dyn}} \times N_{\mathrm{dyn}}}$, respectively. We enforce a fixed $r_{\mathrm{str}}$ via the


following parameterization of the recurrent parameters:
\[
J^{\mathrm{dyn}} = \gamma \left[\sqrt{1 - r_{\mathrm{str}}^2}\;\frac{\tilde{S}}{\|\tilde{S}\|_F} + r_{\mathrm{str}}\,\frac{\tilde{A}}{\|\tilde{A}\|_F}\right], \tag{38}
\]
where $\gamma \in \mathbb{R}$ is a learnable global scale.

[Figure 1: panel (a) Results after one epoch; panel (b) Results after 20 epochs.]
Figure 1. Comparison of algorithm performance on MNIST using a layered architecture with one hidden layer and symmetric initialization. Squares denote AsymEP, circles EP, and triangles VF. Test accuracy (averaged over 10 runs) is shown after one epoch (Fig. 1a) and 20 epochs (Fig. 1b).

Figure 2. Evolution of the asymmetry ratio $r_{\mathrm{str}}$ (defined in Eq. (37)) during training on MNIST for AsymEP, EP and VF, initialized from a symmetric configuration. The models use 50 hidden neurons.

Using VF and AsymEP, we train a layered network with one hidden layer of 50 neurons (in which case $\tilde{S}$ and $\tilde{A}$ are block off-diagonal) for different values of $r_{\mathrm{str}}$ to investigate the impact of structural asymmetry. We compare two training regimes: training only the input weights $J^{\mathrm{in}}$ (and the scale $\gamma$), versus training all parameters including $J^{\mathrm{dyn}}$. The first regime trains only the external forces from the input $\rho'(x) \odot J^{\mathrm{in}} u$ (which correspond to a symmetric contribution in the Jacobian) applied to our non-conservative system, while the second additionally trains $J^{\mathrm{dyn}}$ and therefore the non-symmetric part of the Jacobian directly.

Figure 3 summarizes the results. We find that AsymEP maintains robust performance across all asymmetry levels (e.g., achieving an accuracy of 93.8 ± 0.4% at $r_{\mathrm{str}} = 0$ and 94.9 ± 0.2% at $r_{\mathrm{str}} = 0.875$ when training all parameters) and can even learn when the recurrent connection matrix $J^{\mathrm{dyn}}$ is completely antisymmetric ($r_{\mathrm{str}} = 1$). Additionally, training all parameters shows significant improvement over training only $J^{\mathrm{in}}$.

In contrast, VF performs well at low asymmetry ratios but degrades as asymmetry increases, eventually dropping to chance levels (e.g., accuracies of 5 ± 3% and 8 ± 4% at $r_{\mathrm{str}} = 1$ for input-only and all-parameter training, respectively). When only $J^{\mathrm{in}}$ is trained, VF accuracy collapses around $r_{\mathrm{str}} \approx 0.5$, whereas training all parameters delays this collapse until $r_{\mathrm{str}} \approx 0.8$. Our analysis in Appendix G.2.1 reveals that VF adjusts the dynamics such that the asymmetry of the Jacobian's off-diagonal terms remains strictly lower than the structural asymmetry ratio. The training appears to adjust the neuronal state such that neurons connected by strongly asymmetric weights have low activation. As shown in Appendix G.2.1, AsymEP learns faster than VF across all levels of asymmetry.

Finally, Appendix G.3 opens with a brief theoretical discussion of the stability of these non-conservative dynamics, followed by simulations on all-to-all topologies with constrained $r_{\mathrm{str}}$ and input projections $J^{\mathrm{in}}$. Even in this worst-case setting, AsymEP reduces oscillations and improves stability.

5.3. Feedforward Architectures

We now consider a purely feedforward architecture. Here VF trains only the last layer: with no backward connections, the output nudging signal cannot reach earlier layers, so for every layer but the last the nudged stationary states coincide with the free states, giving zero weight updates. As only the output layer is trained, the system essentially becomes an Extreme Learning Machine (Huang et al., 2006; Wang et al., 2022). In contrast, AsymEP introduces a correction that generates effective backward connections, allowing the


nudging signal to influence all layers. We make this explicit for a network with one hidden layer.

Figure 3. Impact of the structural asymmetry ratio $r_{str}$ on accuracy (top) and standard deviation over 10 runs (bottom) on MNIST. We compare VF (orange) and AsymEP (blue) under two training regimes: training only $J^{in}$ (dashed) or all parameters (solid).

Let the state $x$ be partitioned into hidden ($h$) and output ($o$) layers. The recurrent connection matrix is then

$$ J^{dyn} = \begin{pmatrix} 0 & 0 \\ W_{h\to o} & 0 \end{pmatrix}. $$

The forces of the system are:

$$ F_h^\beta = \rho'(h) \odot \left[ J^{in} u + \lambda (W_{h\to o})^\top (o - o^0) \right] - h, $$
$$ F_o^\beta = \rho'(o) \odot \left[ W_{h\to o}\, \rho(h) - \lambda W_{h\to o} (h - h^0) \right] + \lambda \beta \frac{\partial C}{\partial o} - o, \tag{39} $$

where $\lambda$ is 0 during the free inference and 1 during the nudged phase (Eq. 20). The force on the hidden layer $F_h^\beta$ now depends on the output layer through the term $\rho'(h) \odot (W_{h\to o})^\top (o - o^0)$, enabling the nudge (the term $\beta \frac{\partial C}{\partial o}$) to influence the hidden layer. This implicitly assumes that the hardware implementation supports the physical activation of these backward connections.

We validate this using a single hidden layer of only 20 neurons on MNIST. After training, VF saturates at 64.3 ± 2.0% accuracy, whereas AsymEP reaches 92.7 ± 0.5% accuracy. We expect this discrepancy to increase with network depth, since this increases the number of layers unable to learn under VF. A figure with the accuracy during training can be found in Appendix G.4.2.

5.4. Advantages of Non-Conservative Dynamics

AsymEP is not tied to a specific neural dynamics. To further assess the benefits of training non-conservative dynamics using AsymEP, we compare several dynamics and connectivity structures inspired by Millidge et al. (2023), while keeping the number of trainable parameters fixed.

Experiments are conducted on Fashion-MNIST using a two-hidden-layer network with hidden dimensions 500 and 200. Network states are denoted $(x_0, x_1, x_2, x_3)$, where $x_0$ is the input and $x_3 = x_L$ the output. Forward and backward connections are denoted by $W_k$ and $B_k$, respectively, with $W_1 = J^{in}$.

We consider three classes of dynamics. First, the Continuous Hopfield (CH) dynamics introduced previously:

$$ \frac{dx_k}{dt} = -x_k + \rho'(x_k) \odot \left[ W_k \rho(x_{k-1}) + (1 - \delta_{k,L}) B_k \rho(x_{k+1}) \right]. \tag{40} $$

Second, Predictive Coding (PC) dynamics, defined through the prediction errors $e_k = x_k - W_k \rho(x_{k-1})$, whose fixed point $e_k = 0$ corresponds to a standard feedforward network:

$$ \frac{dx_k}{dt} = -e_k + (1 - \delta_{k,L}) \left( \rho'(x_k) \odot (B_k e_{k+1}) \right). \tag{41} $$

Third, a standard dynamics chosen for direct comparison with backpropagation:

$$ \frac{dx_k}{dt} = -x_k + W_k \rho(x_{k-1}) + (1 - \delta_{k,L}) B_k \rho(x_{k+1}). \tag{42} $$

For each dynamics, we examine three connectivity scenarios.

  • In the asymmetric case ($B_k \neq W_{k+1}^\top$), the backward weights $B_k$ are randomly initialized and kept fixed while only the forward weights are trained, ensuring a fair comparison (i.e., identical number of parameters); in PC, the learning rule for $B_k$ is zero when only inputs are clamped.

  • In the symmetric / conservative case ($B_k = W_{k+1}^\top$), the CH and PC dynamics derive from an energy functional, while the standard dynamics remains non-conservative due to its non-symmetric Jacobian.

  • In the feedforward case ($B_k = 0$), the PC and standard dynamics coincide; for the standard dynamics, the AsymEP learning rule mirrors backpropagation, with $\Delta x_k^\beta = \frac{1}{2\beta}(x^\beta - x^{-\beta})$ acting as the propagated error signal.

Table 1 shows that AsymEP consistently outperforms VF in both asymmetric and feedforward settings, in final accuracy, learning speed, and stability. After a single epoch it already provides on average a 15% accuracy gain with an order-of-magnitude reduction in variance. Remarkably, AsymEP with asymmetric connectivity also surpasses EP on symmetric networks despite training only the forward
weights, suggesting that relaxing symmetry constraints may improve expressivity. Supplementary results are provided in Appendix G.5.

Table 1. Test accuracy on Fashion-MNIST (%) at epoch 50 (mean ± std over 10 runs). BP on a standard feedforward architecture using MSE and SGD achieves 87.37 ± 0.29%.

  Dynamics   Connectivity   EP              AsymEP          VF
  CH         Asym           -               86.78 ± 0.14    85.20 ± 0.12
  CH         Feedfor        -               86.05 ± 0.12    77.76 ± 0.37
  CH         Sym            84.30 ± 0.13    -               -
  PC         Asym           -               86.20 ± 0.17    80.71 ± 6.17
  PC         Sym            84.78 ± 0.14    -               -
  Standard   Asym           -               82.91 ± 0.48    75.52 ± 1.69
  Standard   Feedfor        -               86.25 ± 0.16    78.58 ± 0.28

Finally, to investigate how AsymEP scales with depth, we trained deeper fully connected networks with two and three hidden layers of 500 neurons on Fashion-MNIST, reaching 86.41 ± 0.22% and 87.8 ± 0.15% test accuracy respectively.

5.5. Feedforward Training on CIFAR-10: BP vs. Dyadic EP vs. AsymEP

To test whether our framework scales beyond shallow networks, we conclude with a deep, purely feedforward CNN architecture trained on CIFAR-10. We compare backpropagation (BP), VF, AsymEP and Dyadic EP in a controlled setting where the gradient estimator is the only difference between runs: all methods share the same configuration, with the BP gradient replaced by the contrast of stationary states for the EP-based methods (see App. G.6 for details). Each configuration is trained for 40 epochs over 5 seeds.

Table 2 reports the final test accuracy. Both of our algorithms scale to this regime, closely tracking the BP baseline throughout training and matching its final accuracy: a paired t-test finds no significant difference between Dyadic EP and BP (p = 0.75), and only a sub-percent gap for AsymEP. In contrast, VF makes slight initial progress (peaking near 30%) before collapsing to chance level (10%). Additional details can be found in Appendix G.6.

Table 2. Test accuracy on CIFAR-10 (%) at epoch 40 (mean ± std over 5 seeds).

  Method              Test Acc. (%)
  Backpropagation     90.66 ± 0.25
  Dyadic EP           90.69 ± 0.14
  AsymEP              89.74 ± 0.14
  VF                  10.00 ± 0.00

6. Discussion and Conclusion

In this work, we extended Equilibrium Propagation (EP) to non-conservative systems that reach stationary states by deriving two mathematically equivalent algorithms that recover the exact gradient of the cost function in the limit of infinitesimal nudging.

The first approach, Asymmetric EP, preserves the original inference dynamics. It introduces a corrective force during the nudged phase that remains spatially local, as the antisymmetric Jacobian is null for unconnected neurons and the perturbation from equilibrium is available at the synapse level. Unlike standard methods like Recurrent Backpropagation (Almeida, 1990; Pineda, 1987), this avoids explicit digital weight transposition. However, a physical mechanism to obtain the local corrective force at the synapse level remains a subject for future work. We also note that AsymEP shares the temporal non-locality of standard EP.

The second approach, Dyadic EP, doubles the state space to map non-reciprocal dynamics onto an energy landscape. This is conceptually reminiscent of multi-compartment cortical neurons, where apical dendrites integrate feedback (analogous to $z - z'$) separately from basal feedforward input (analogous to $z + z'$) (Guerguiev et al., 2017). Additionally, this expanded space enables the positive and negative nudging phases to run in parallel. This offers a pathway to implement a version of EP that is local in time, but would require a doubling of the degrees of freedom on the physical hardware. More fundamentally, the energy defined on the extended state shows that the tools and theoretical guarantees obtained for EP should also apply to the case of non-reciprocal forces, and that the variational principle behind EP is universal in the sense that it can be applied to all networks which operate in a stationary state.

Furthermore, Dyadic EP is not restricted to the EP community and could suggest a more physically plausible alternative to the stationary-state Adjoint Method (for fixed inputs) (Chen et al., 2018): by solving the forward and adjoint equations simultaneously via relaxation, it circumvents a separate backward-in-time pass.

Finally, our experiments on MNIST, Fashion-MNIST, and CIFAR-10 confirm that AsymEP and Dyadic EP consistently outperform EP and VF, and notably enable effective training of feedforward networks.

Our work thus opens new avenues for learning in neuromorphic hardware, dissipative physical systems, and neural architectures where asymmetry is intrinsic rather than incidental.


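
The three dynamics compared in Section 5.4 (Eqs. 40 to 42) can be sketched in the same way. This is a minimal illustration, not the implementation used for the experiments: the tanh activation, the small layer sizes, and the list-based indexing convention are assumptions. It checks two statements from the text for the feedforward case $B_k = 0$: the PC and standard dynamics coincide, and the feedforward pass $x_k = W_k \rho(x_{k-1})$ is a stationary state.

```python
import numpy as np

def rho(x):
    return np.tanh(x)  # assumed activation

def rho_prime(x):
    return 1.0 - np.tanh(x) ** 2

# Conventions (assumed): x = [x_0, ..., x_L]; W[k] maps layer k-1 to layer k
# (k = 1..L); B[k] maps layer k+1 to layer k (k = 1..L-1). Index 0 is unused.

def ch_rhs(x, W, B):
    """Continuous Hopfield dynamics, Eq. (40)."""
    L = len(x) - 1
    out = []
    for k in range(1, L + 1):
        drive = W[k] @ rho(x[k - 1])
        if k < L:  # the factor (1 - delta_{k,L})
            drive = drive + B[k] @ rho(x[k + 1])
        out.append(-x[k] + rho_prime(x[k]) * drive)
    return out

def pc_rhs(x, W, B):
    """Predictive Coding dynamics, Eq. (41)."""
    L = len(x) - 1
    e = [None] + [x[k] - W[k] @ rho(x[k - 1]) for k in range(1, L + 1)]
    out = []
    for k in range(1, L + 1):
        dx = -e[k]
        if k < L:
            dx = dx + rho_prime(x[k]) * (B[k] @ e[k + 1])
        out.append(dx)
    return out

def std_rhs(x, W, B):
    """Standard dynamics, Eq. (42)."""
    L = len(x) - 1
    out = []
    for k in range(1, L + 1):
        dx = -x[k] + W[k] @ rho(x[k - 1])
        if k < L:
            dx = dx + B[k] @ rho(x[k + 1])
        out.append(dx)
    return out

def feedforward(x0, W):
    """Layer-by-layer pass x_k = W_k rho(x_{k-1})."""
    x = [x0]
    for k in range(1, len(W)):
        x.append(W[k] @ rho(x[-1]))
    return x

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 2]  # hypothetical small network (the paper uses 500 and 200 hidden units)
L = len(sizes) - 1
W = [None] + [rng.normal(scale=0.5, size=(sizes[k], sizes[k - 1])) for k in range(1, L + 1)]
B_zero = [None] + [np.zeros((sizes[k], sizes[k + 1])) for k in range(1, L)]
x_rand = [rng.normal(size=n) for n in sizes]

# Feedforward case: PC and standard dynamics give the same right-hand side.
pc_equals_std = all(np.allclose(a, b)
                    for a, b in zip(pc_rhs(x_rand, W, B_zero), std_rhs(x_rand, W, B_zero)))
# The feedforward pass is a stationary state of the standard dynamics.
x_ff = feedforward(x_rand[0], W)
ff_is_fixed_point = all(np.allclose(d, 0.0) for d in std_rhs(x_ff, W, B_zero))
ch_shapes = [d.shape for d in ch_rhs(x_rand, W, B_zero)]
print(pc_equals_std, ff_is_fixed_point, ch_shapes)
```

Passing the symmetric choice $B_k = W_{k+1}^\top$ or a fixed random $B_k$ to the same functions gives the other two connectivity scenarios.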

Impact Statement

This paper presents results that advance the field of machine learning. There are many potential societal consequences of our work, none of which we feel must be specifically highlighted here.

Acknowledgments

AES is fully funded by the Horizon Europe Marie Skłodowska-Curie Doctoral Network 'Postdigital Plus' (Grant 101169118). DVA acknowledges the support of the French Community of Belgium through a FRIA fellowship. SM acknowledges financial support by the Fonds de la Recherche Scientifique–FNRS, Belgium under EOS Project No. 40007536. Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region.

“ἁρμονίη ἀφανὴς φανερῆς κρείττων”

References

Almeida, L. B. A learning rule for asynchronous perceptrons with feedback in a combinatorial environment. In Artificial neural networks: concept learning, pp. 102–111. 1990.

Altman, L. E., Stern, M., Liu, A. J., and Durian, D. J. Experimental demonstration of coupled learning in elastic networks. Physical Review Applied, 22(2):024053, 2024.

Aykroyd, C., Bourgoin, A., and Poncin-Lafitte, C. L. Hamiltonian treatment of non-conservative systems. arXiv preprint arXiv:2507.18658, 2025.

Bateman, H. On dissipative systems and related variational principles. Physical Review, 38(4):815, 1931.

Berneman, M. and Hexner, D. Equilibrium propagation for dissipative dynamics. Advanced Intelligent Systems, pp. e202501310, 2025.

Bishop, K. J., Biswal, S. L., and Bharti, B. Active colloids as models, materials, and machines. Annual Review of Chemical and Biomolecular Engineering, 14(1):1–30, 2023.

Bowick, M. J., Fakhri, N., Marchetti, M. C., and Ramaswamy, S. Symmetry, thermodynamics, and topology in active matter. Physical Review X, 12(1):010501, 2022.

Brandenbourger, M., Locsin, X., Lerner, E., and Coulais, C. Non-reciprocal robotic metamaterials. Nature communications, 10(1):4608, 2019.

Cesa-Bianchi, N. and Lugosi, G. Prediction, learning, and games. Cambridge university press, 2006.

Chen, R. T., Rubanova, Y., Bettencourt, J., and Duvenaud, D. K. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.

Cin, N. D., Marquardt, F., and Wanjura, C. C. Training nonlinear optical neural networks with scattering backpropagation. arXiv preprint arXiv:2508.11750, 2025.

Costa, P. and Santos, P. A. Directed equilibrium propagation revisited. Mathematics, 13(11), 2025. ISSN 2227-7390.

Crick, F. The recent excitement about neural networks. Nature, 337, 1989.

Dillavou, S., Stern, M., Liu, A. J., and Durian, D. J. Demonstration of decentralized physics-driven learning. Physical Review Applied, 18(1):014040, 2022.

Dillavou, S., Beyer, B. D., Stern, M., Liu, A. J., Miskin, M. Z., and Durian, D. J. Machine learning without a processor: Emergent learning in a nonlinear analog network. Proceedings of the National Academy of Sciences, 121(28):e2319718121, 2024.


Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scellier, B. Updates of equilibrium prop match gradients of backprop through time in an rnn with static input. Advances in neural information processing systems, 32, 2019.

Ernoult, M., Grollier, J., Querlioz, D., Bengio, Y., and Scellier, B. Equilibrium propagation with continual weight updates. arXiv preprint arXiv:2005.04168, 2020.

Falk, M. J., Strupp, A. T., Scellier, B., and Murugan, A. Temporal contrastive learning through implicit non-equilibrium memory. Nature Communications, (16), 2025.

Farinha, M. T., Pequito, S., Santos, P. A., and Figueiredo, M. A. T. Equilibrium propagation for complete directed neural networks. In Proceedings of the 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2020), 2020.

Galley, C. R. Classical mechanics of nonconservative systems. Physical review letters, 110(17):174301, 2013.

Guerguiev, J., Lillicrap, T. P., and Richards, B. A. Towards deep learning with segregated dendrites. elife, 6:e22901, 2017.

Høier, R. and Zach, C. A lagrangian perspective on dual propagation. In Proceedings of the First Workshop on Machine Learning with New Compute Paradigms at NeurIPS 2023, New Orleans, LA, USA, Dec 2023.

Høier, R. and Zach, C. Two tales of single-phase contrastive hebbian learning. In Salakhutdinov, R., Kolter, Z., Heller, K., Weller, A., Oliver, N., Scarlett, J., and Berkenkamp, F. (eds.), Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pp. 18470–18488. PMLR, 21–27 Jul 2024.

Høier, R., Staudt, D., and Zach, C. Dual propagation: accelerating contrastive hebbian learning with dyadic neurons. In Proceedings of the 40th International Conference on Machine Learning, ICML'23. JMLR.org, 2023.

Høier, R., Kalinin, K., Ernoult, M., and Zach, C. Dyadic learning in recurrent and feedforward models. In NeurIPS 2024 Workshop Machine Learning with new Compute Paradigms, 2024.

Hopfield, J. J. Neural networks and physical systems with emergent collective computational abilities. Proceedings of the national academy of sciences, 79(8):2554–2558, 1982.

Huang, G.-B., Zhu, Q.-Y., and Siew, C.-K. Extreme learning machine: theory and applications. Neurocomputing, 70(1-3):489–501, 2006.

Indiveri, G. and Liu, S.-C. Memory and information processing in neuromorphic systems. Proceedings of the IEEE, 103(8):1379–1397, 2015.

Kalinin, K. P., Gladrow, J., Chu, J., Clegg, J. H., Cletheroe, D., Kelly, D. J., Rahmani, B., Brennan, G., Canakci, B., Falck, F., et al. Analog optical computer for ai inference and combinatorial optimization. Nature, 645(8080):354–361, 2025.

Kendall, J., Pantone, R., Manickavasagam, K., Bengio, Y., and Scellier, B. Training end-to-end analog neural networks with equilibrium propagation. arXiv preprint arXiv:2006.01981, 2020.

Laborieux, A. and Zenke, F. Holomorphic equilibrium propagation computes exact gradients through finite size oscillations. Advances in Neural Information Processing Systems, 35:12950–12963, 2022.

Laborieux, A. and Zenke, F. Improving equilibrium propagation without weight symmetry through jacobian homeostasis. In Proceedings of the International Conference on Learning Representations (ICLR) 2024, Virtual (ICLR), May 2024.

Laborieux, A., Ernoult, M., Scellier, B., Bengio, Y., Grollier, J., and Querlioz, D. Scaling equilibrium propagation to deep convnets by drastically reducing its gradient estimator bias. Frontiers in neuroscience, 15:633674, 2021.

Laydevant, J., Marković, D., and Grollier, J. Training an ising machine with equilibrium propagation. Nature Communications, 15(1):3671, 2024.

LeCun, Y. The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/, 1998.

Martin, E., Ernoult, M., Laydevant, J., Li, S., Querlioz, D., Petrisor, T., and Grollier, J. Eqspike: spike-driven equilibrium propagation for neuromorphic implementations. Iscience, 24(3), 2021.

Massar, S. Equilibrium propagation for learning in lagrangian dynamical systems. Physical Review E, 112(3):035304, 2025.

Massar, S. and Mognetti, B. M. Equilibrium propagation: the quantum and the thermal cases. Quantum Studies: Mathematics and Foundations, 12(1):6, 2025.

Millidge, B., Song, Y., Salvatori, T., Lukasiewicz, T., and Bogacz, R. Backpropagation at the infinitesimal inference limit of energy-based models: Unifying predictive coding, equilibrium propagation, and contrastive hebbian learning. In International Conference on Learning Representations (ICLR), 2023.


Nest, T. and Høier, R. Dyadic learning in asymmetric convnets. In New Frontiers in Associative Memories-Workshop at ICLR 2026.

Osat, S. and Golestanian, R. Non-reciprocal multifarious self-organization. Nature Nanotechnology, 18(1):79–85, 2023.

O'Connor, P., Gavves, E., and Welling, M. Training a spiking neural network with equilibrium propagation. In The 22nd international conference on artificial intelligence and statistics, pp. 1516–1523. PMLR, 2019.

Pineda, F. Generalization of back propagation to recurrent and higher order neural networks. In Neural information processing systems, 1987.

Pourcel, G., Basu, D., Ernoult, M., and Gilra, A. Lagrangian-based equilibrium propagation: generalisation to arbitrary boundary conditions & equivalence with hamiltonian echo learning. arXiv preprint arXiv:2506.06248, 2025.

Rageau, T. and Grollier, J. Training and synchronizing oscillator networks with equilibrium propagation. Neuromorphic Computing and Engineering, 2025.

Sajnok, K. and Matuszewski, M. Near-equilibrium propagation training in nonlinear wave systems. arXiv preprint arXiv:2510.16084, 2025.

Scellier, B. Quantum equilibrium propagation: Gradient-descent training of quantum systems. arXiv preprint arXiv:2406.00879, 2024.

Scellier, B. and Bengio, Y. Equilibrium propagation: Bridging the gap between energy-based models and backpropagation. Frontiers in computational neuroscience, 11:24, 2017.

Scellier, B., Goyal, A., Binas, J., Mesnard, T., and Bengio, Y. Generalization of equilibrium propagation to vector field dynamics. arXiv preprint arXiv:1808.04873, 2018.

Scellier, B., Mishra, S., Bengio, Y., and Ollivier, Y. Agnostic physics-driven deep learning. arXiv:2205.15021v1, 2022.

Scurria, A. E. A physical theory of backpropagation: Exact gradients from the least-action principle. 2026.

Stern, M., Hexner, D., Rocks, J. W., and Liu, A. J. Supervised learning in physical networks: From machine learning to learning machines. Physical Review X, 11(2):021045, 2021.

Wang, J., Lu, S., Wang, S.-H., and Zhang, Y.-D. A review on extreme learning machine. Multimedia Tools and Applications, 81(29):41611–41660, 2022.

Wang, Q., Wanjura, C. C., and Marquardt, F. Training coupled phase oscillators as a neuromorphic platform using equilibrium propagation. Neuromorphic Computing and Engineering, 4(3):034014, 2024.

Wanjura, C. C. and Marquardt, F. Quantum equilibrium propagation for efficient training of quantum systems based on onsager reciprocity. Nature Communications, 16(1):6595, 2025.

Werbos, P. J. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.

Yi, S.-i., Kendall, J. D., Williams, R. S., and Kumar, S. Activity-difference training of deep neural networks using memristor crossbars. Nature Electronics, 6(1):45–51, 2023.


A. Gradient Estimation Error in VF

In this appendix, we quantify the gradient estimation error introduced by VF in the limit where the Jacobian asymmetry is small.

Comparing the post-synaptic update terms in Eqs. (12) and (14) gives the following error in the gradient of the cost:

$$ \text{Error} = -\left[ \frac{\partial F}{\partial \theta}(x^0, \theta) \right]^\top \left[ J_F(x^0, \theta)^{-1} - J_F^\top(x^0, \theta)^{-1} \right] \frac{\partial C}{\partial x}(x^0, y). \tag{43} $$

To quantify this error, we decompose the Jacobian $J_F(x, \theta)$ into its symmetric part $S_J(x, \theta)$ and antisymmetric part $A_J(x, \theta)$:

$$ S_J(x, \theta) = \tfrac{1}{2}\left( J_F(x, \theta) + J_F^\top(x, \theta) \right), \qquad A_J(x, \theta) = \tfrac{1}{2}\left( J_F(x, \theta) - J_F^\top(x, \theta) \right). \tag{44} $$

Assuming the asymmetry $A_J(x, \theta)$ is small, we can make a series expansion in $S_J^{-1} A_J$ (omitting the dependencies for clarity). Applying the Neumann expansion for small $\|S_J^{-1} A_J\|$ gives

$$ (J_F)^{-1} = \left( \sum_{n=0}^{\infty} (-1)^n (S_J^{-1} A_J)^n \right) S_J^{-1}, \tag{45} $$
$$ (J_F^\top)^{-1} = \left( \sum_{n=0}^{\infty} (S_J^{-1} A_J)^n \right) S_J^{-1}. \tag{46} $$

Subtracting the two series and assuming convergence, we finally obtain

where $s$ denotes the dynamical state of the system. This symmetry is the linchpin of the equivalence proof, as the gradient expressions derived for BPTT and standard EP differ precisely by a transpose operation applied to $\frac{\partial F}{\partial s}$.

This observation aligns with our analysis in the main text: VF fails in non-conservative systems due to the missing transpose in the post-synaptic term (see Eq. (16)). Following the derivation in Ernoult et al. (2019) (see Appendix A, Eqs. (31–33)), the recursive relations for the gradients in BPTT are given by:

$$ \nabla_s^{BPTT}(0) = \frac{\partial \ell}{\partial s}(s_\star, y), \tag{49} $$

and for all $t = 1, \dots, K$,

$$ \nabla_s^{BPTT}(t) = \left[ \frac{\partial F}{\partial s}(x, s_\star, \theta) \right]^\top \nabla_s^{BPTT}(t - 1), \tag{50} $$
$$ \nabla_\theta^{BPTT}(t) = \left[ \frac{\partial F}{\partial \theta}(x, s_\star, \theta) \right]^\top \nabla_s^{BPTT}(t - 1), \tag{51} $$

where $\theta$ represents the optimization parameters, $\ell$ is the cost function, $s_\star$ is the free equilibrium state (satisfying $F(s_\star) = 0$), $y$ is the target, and $x$ is the input. The index $t$ denotes the unrolled time steps, initialized at $s(0) = s_\star$.

In contrast, the gradients computed by VF follow the recursion (see Ernoult et al. (2019), Appendix A, Eqs. (24–26)):

$$ \Delta_s^{EP}(0) = -\frac{\partial \ell}{\partial s}(s_\star, y), \tag{52} $$

and for all $t \geq 0$,
                                                                                                     ∂F
                                                                                       ∆EP
                                                                                        s (t + 1) =     (x, s⋆ , θ) ∆EPs (t),               (53)
                              ∞                                                                      ∂s
                                                !
                              X
                                    −1
                                          2n+1
        −1      ⊤ −1
  (JF ) − (JF ) = −2               SJ AJ          SJ−1 .                                             
                                                                                                       ∂F
                                                                                                                       ⊤
                                 n=0                                                   ∆EP
                                                                                        θ  (t + 1) =      (x, s ⋆ , θ)    ∆EP
                                                                                                                            s (t).          (54)
                                                                 (47)                                  ∂θ
                                                                             Comparing these two sets of equations confirms that the only
B. Equivalence between AsymEP and BPTT                                       difference are Eqs. (50) and (53), specifically the transpose
                                                                             of the Jacobian ∂F
                                                                                              ∂s (ignoring the global sign difference in
In this appendix, we sketch the equivalence between the                      Eqs. (49) and (52)).
gradient estimate computed by AsymEP and Backpropaga-
tion Through Time (BPTT) (Werbos, 1990) for a Recurrent                      In AsymEP, we modify the dynamics by adding a correction
Neural Network with fixed inputs. Our derivation relies on                   term dependent on the antisymmetric part of the Jacobian.
the proof provided by Ernoult et al. (2019), which estab-                    Denoting the force of this augmented system by F A , the
lished that standard (conservative) EP computes gradients                    Jacobian at the free equilibrium satisfies:
identical to those of BPTT. To facilitate direct comparison,                                                                       ⊤
                                                                                         ∂F A
                                                                                                              
we adopt their notation for this section.                                                                         ∂F
                                                                                              (x, s⋆ , θ) =          (x, s⋆ , θ)        .   (55)
                                                                                          ∂s                      ∂s
The proof provided by Ernoult et al. (2019) relies on the
assumption that the vector field F (i.e., transition function)               By substituting this corrected Jacobian into the recursive
is derived from a scalar potential function, which implies                   relations, AsymEP recovers the exact transpose required
that                                                                         by BPTT. Consequently, our method extends the equiva-
                                 ⊤
                     ∂F        ∂F                                            lence between EP and BPTT to the general case of non-
                         =             ,                  (48)               conservative force.
                     ∂s        ∂s
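The role of the transposed Jacobian in Eq. (55) can be illustrated numerically. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: `Js` and `Jth` are random stand-ins for $\partial F/\partial s$ and $\partial F/\partial \theta$ evaluated at the free equilibrium, `g` stands in for $\partial \ell/\partial s$, and the function names `bptt_grads` and `ep_updates` are hypothetical. It iterates the recursions of Eqs. (49)–(51) and Eqs. (52)–(54), once with the raw Jacobian and once with its transpose.

```python
import numpy as np

def bptt_grads(Js, Jth, g, K):
    """Iterate Eqs. (49)-(51); return the theta-gradients for t = 1..K."""
    grad_s = g.copy()                 # Eq. (49)
    out = []
    for _ in range(K):
        out.append(Jth.T @ grad_s)    # Eq. (51), uses grad_s(t-1)
        grad_s = Js.T @ grad_s        # Eq. (50)
    return out

def ep_updates(Js_used, Jth, g, K):
    """Iterate Eqs. (52)-(54) with the state Jacobian Js_used."""
    delta_s = -g                      # Eq. (52)
    out = []
    for _ in range(K):
        out.append(Jth.T @ delta_s)   # Eq. (54), uses delta_s(t)
        delta_s = Js_used @ delta_s   # Eq. (53)
    return out

# Random stand-ins (assumptions for illustration only).
rng = np.random.default_rng(0)
n, p, K = 5, 3, 6
Js = 0.3 * rng.standard_normal((n, n))   # generic, non-symmetric Jacobian
Jth = rng.standard_normal((n, p))
g = rng.standard_normal(n)

bptt = bptt_grads(Js, Jth, g, K)
ep_raw = ep_updates(Js, Jth, g, K)       # uncorrected Jacobian
ep_asym = ep_updates(Js.T, Jth, g, K)    # transposed Jacobian, as in Eq. (55)

# Up to the global sign, only the transposed recursion reproduces BPTT.
err_raw = max(np.abs(e + b).max() for e, b in zip(ep_raw, bptt))
err_asym = max(np.abs(e + b).max() for e, b in zip(ep_asym, bptt))
print(f"mismatch with raw Jacobian:        {err_raw:.3e}")
print(f"mismatch with transposed Jacobian: {err_asym:.3e}")
```

With a symmetric `Js` both mismatches would vanish, which corresponds to the conservative case covered by Ernoult et al. (2019).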


C. Out-of-Equilibrium Mechanics

Here we sketch the physical picture behind the doubled-energy construction of Eq. (26). The full derivation from Hamilton's least-action principle, together with its connection to the Bateman–Galley formalism for non-conservative classical mechanics (Bateman, 1931; Galley, 2013; Aykroyd et al., 2025), can be found in (Scurria, 2026).

C.1. The Helmholtz Obstruction

The natural physical route to a variational principle for a dynamical system $\dot{x} = F(x, \theta)$ is to seek a scalar potential $E$ such that $F = -\partial_x E$. The classical Helmholtz integrability condition states that such an $E$ exists if and only if the Jacobian $J_F$ is symmetric everywhere. Whenever the interactions are non-reciprocal (as in feedforward networks, active matter, or driven optical systems), $J_F$ acquires a non-zero antisymmetric part and the Helmholtz condition fails. No scalar potential on the original $n$-dimensional state space can then generate the dynamics, and the "energy minimisation" route at the heart of standard EP is blocked at the structural level. The obstruction is not a matter of computational convenience: it reflects the fact that the rotational component of $F$ carries information that no scalar function of $x$ alone can record.

C.2. Variational Reconstruction on a Doubled Space

Applying the Bateman–Galley formalism circumvents this obstruction by enlarging the configuration space. The single state $x \in \mathbb{R}^n$ is replaced by a conjugate pair $(z, z') \in \mathbb{R}^{2n}$, and the rotational component of $F$, which has no scalar generator on the original $n$-dimensional space, is absorbed into a bilinear coupling between $z$ and $z'$ on the doubled space, where it does admit a variational description. The physical motion is recovered on the diagonal submanifold $z = z'$ (the so-called "physical limit"), while the off-diagonal direction $d = z - z'$ supplies the additional degree of freedom needed to encode non-reciprocity.

Specializing this reconstruction to the overdamped (first-order) regime relevant to relaxational neural dynamics yields the bilinear energy

\[
  H(z, z', \theta) = -(z - z')^{\top} F\!\left( \frac{z + z'}{2}, \theta \right), \tag{56}
\]

which is precisely Eq. (26). The symmetric midpoint $m = (z + z')/2$ plays the role of the physical coordinate of the doubled system, while $d$ is the auxiliary direction along which non-reciprocity is stored. On the submanifold $z = z'$ the coupling proportional to $(z - z')$ vanishes identically and both states evolve under the original field $F$, so the doubling leaves the on-shell physics unchanged. We refer the reader to (Scurria, 2026) for the full construction.

C.3. Symmetry Breaking as Credit Assignment

On the diagonal manifold $z = z'$ the doubled system enjoys a gauge symmetry: the auxiliary variable $z'$ is redundant and the difference $d$ is identically zero. Credit assignment is implemented by deliberately breaking this symmetry through the task cost. Adding $\beta D(z, z') = \beta\, C(m)$ to $H$ exerts opposite forces on $z$ and $z'$ and drives them apart, so that the difference $d$ ceases to be redundant and begins to carry information about the loss landscape.

D. Proofs for Dyadic EP

We now demonstrate that Dyadic EP correctly trains the parameters $\theta$ of the original force field $F(x, \theta)$, giving the exact gradient $\frac{dC(\bar{x}^0)}{d\theta}$ in the limit of infinitesimal nudging.

D.1. Proof of EP

First, recall that standard EP does not strictly require the system to settle at an energy minimum; it requires only that the system reaches a stationary state (a fixed point of the dynamics). Indeed, using the notation of Section 2.1, EP relies on the key identity:

\[
  \frac{d^2}{d\theta\, d\beta} E_T(x^{\beta}, \theta) = \frac{d^2}{d\beta\, d\theta} E_T(x^{\beta}, \theta). \tag{57}
\]

Expanding the total derivative with respect to $\beta$ gives:

\[
  \frac{d}{d\beta} E_T(x^{\beta}, \theta)
  = \left( \frac{\partial E_T(x^{\beta}, \theta)}{\partial x} \right)^{\top} \frac{dx^{\beta}}{d\beta}
  + \frac{\partial E_T(x^{\beta}, \theta)}{\partial \beta}
  = C(x^{\beta}), \tag{58}
\]

where the first term vanishes because the system is at a stationary state, i.e., $\frac{\partial}{\partial x} E_T(x^{\beta}, \theta) = 0$; this holds even if the system is not at a minimum of $E_T$. Similarly, for the derivative with respect to $\theta$:

\[
  \frac{d}{d\theta} E_T(x^{\beta}, \theta) = \frac{\partial E_T(x^{\beta}, \theta)}{\partial \theta}, \tag{59}
\]

where we additionally assume that the cost function does not depend explicitly on the parameters $\theta$. Substituting these results into Eq. (57) in the limit of infinitesimal nudging ($\beta \to 0$) recovers the fundamental relation given by Eq. (9).

D.2. Proof of Dyadic EP

We now analyze the stationary states of Dyadic EP by introducing the change of variables:

\[
  m = \frac{z + z'}{2}, \qquad d = z - z'. \tag{60}
\]

In these coordinates, the augmented energy $H_T$ becomes

\[
  H_T(m, d, \theta, \beta) = -d^{\top} F(m, \theta) + \beta\, C(m) \tag{61}
\]


and the dynamics in Eq. (28) can be rewritten as:

\[
  \frac{dm}{dt} = -\frac{\partial H_T}{\partial d} = F(m, \theta), \tag{62}
\]

\[
  \frac{dd}{dt} = -\frac{\partial H_T}{\partial m} = d^{\top} J_F(m, \theta) - \beta \frac{\partial}{\partial m} C(m). \tag{63}
\]

The stationary states $(m^{\beta}, d^{\beta})$ are the solutions to:

\[
  F(m^{\beta}, \theta) = 0, \tag{64}
\]

\[
  d^{\beta\top} J_F(m^{\beta}, \theta) - \beta \frac{\partial}{\partial m} C(m^{\beta}) = 0. \tag{65}
\]

This leads to the following observations:

1) The stationary state of $m$ is independent of $\beta$ and coincides with the stationary state of the original system:

\[
  \frac{z^{\beta} + z'^{\beta}}{2} = m^{\beta} = m^0 = x^0. \tag{66}
\]

2) The Jacobian of the extended system defined in Eq. (26) is invertible, provided $J_F$ is invertible. This is most evident from Eq. (63).

3) The stationary state value of $d$ is given by:

\[
  d^{\beta} = \beta \left[ J_F^{\top}(m^0, \theta) \right]^{-1} \frac{\partial C}{\partial x}(x^0). \tag{67}
\]

In particular, when $\beta = 0$, we have $d^0 = 0$, which implies that the free stationary states coincide: $z^0 = z'^0$.

4) The cost at the stationary state of the extended system is equal to the cost at the stationary state of the original system:

\[
  D(m^0) = C(x^0). \tag{68}
\]

Consequently, the gradients of the cost with respect to the parameters are identical.

Since both the original and extended systems, given respectively in Eqs. (1–2) and Eq. (28), share the same cost at their respective stationary states, and because the Jacobians of both models are invertible, applying the EP update rule to the extended system gives the correct gradient estimate for the parameters $\theta$ of the original system.

The final step of the proof is to establish the equivalence between the standard parameter update rule in Eq. (8) and the modified rule used by Dyadic EP in Eq. (34). Indeed, if we were to apply the standard update rule in the extended space, the update would be:

\[
  \Delta\theta \propto -\frac{1}{2\beta} \left[ \frac{\partial H(z^{\beta}, z'^{\beta}, \theta)}{\partial \theta} - \frac{\partial H(z^{-\beta}, z'^{-\beta}, \theta)}{\partial \theta} \right]. \tag{69}
\]

In Dyadic EP, we instead employ the single-phase update:

\[
  \Delta\theta \propto -\frac{1}{\beta} \frac{\partial H(z^{\beta}, z'^{\beta}, \theta)}{\partial \theta}. \tag{70}
\]

This choice avoids the overhead of evolving two coupled equations in the extended space, which would be computationally equivalent to evolving four equations in the original space (two for $+\beta$ and two for $-\beta$). Using Eq. (70), we evolve only one coupled equation for $+\beta$ in the extended space; this corresponds to two equations in the original space, thereby achieving the same computational complexity as AsymEP. Furthermore, this single-phase formulation suggests a pathway toward making the update local in time, provided appropriate hardware is used to implement the augmented phase.

Mathematically, these two approaches yield the same gradient estimate because the equations for $d^{\beta}$ are linear. Explicitly, we have:

\[
\begin{aligned}
  \frac{\partial H(z^{\beta}, z'^{\beta}, \theta)}{\partial \theta}
  &= -(z^{\beta} - z'^{\beta})^{\top} \frac{\partial F}{\partial \theta}\!\left( \frac{z^{\beta} + z'^{\beta}}{2}, \theta \right) \\
  &= -\beta \left[ \frac{\partial F}{\partial \theta}(z^0, \theta) \right]^{\top} \left[ J_F^{\top}(z^0, \theta) \right]^{-1} \frac{\partial C}{\partial x}(z^0),
\end{aligned} \tag{71}
\]

where we have used Eqs. (66) and (67). Inspection of Eq. (71) confirms that, up to corrections of order $\beta^2$, we obtain exactly the same gradient as in AsymEP.

E. AsymEP versus Dyadic EP

In this appendix, we demonstrate that Asymmetric Equilibrium Propagation (AsymEP) emerges naturally as the first-order projection of the $2N$-dimensional Dyadic Equilibrium Propagation onto a single $N$-dimensional state space. We then formalize the physical trade-offs between the two architectures.

E.1. AsymEP as the Linear Projection of Dyadic EP

As established in Appendix D.2, transforming the $2N$-dimensional extended space $(z, z')$ into the mean state $m = \frac{z + z'}{2}$ and the difference state $d = z - z'$ exactly decouples the stationary dynamics. Because the stationary state of $m$ is the free state of the original system ($m^{\beta} = x^0$), the cost function drives the difference variable to a stationary state $d^{\beta}$ satisfying:

\[
  J_F^{\top}(x^0, \theta)\, d^{\beta} = \beta \frac{\partial C}{\partial x}(x^0). \tag{72}
\]

To recover this exact error signal in an $N$-dimensional space, we postulate a modified dynamical system $F_A(x)$ comprising the standard EP dynamics and a spatial correction $\Gamma(x)$:

\[
  F_A(x) = F(x) - \beta \frac{\partial C}{\partial x}(x) + \Gamma(x). \tag{73}
\]

Let $\Delta x = x_A^{\beta} - x^0$ denote the displacement from the free equilibrium. Expanding the stationarity condition $F_A(x_A^{\beta}) = 0$ to first order around $x^0$ yields:

\[
  J_F(x^0, \theta)\, \Delta x - \beta \frac{\partial C}{\partial x}(x^0) + \Gamma(x_A^{\beta}) \approx 0. \tag{74}
\]

To ensure the first-order displacement matches the Dyadic EP error signal (i.e., $\Delta x \approx d^{\beta}$), we substitute Eq. (72) into the expansion:

\[
  \Gamma(x_A^{\beta}) = \left[ J_F^{\top}(x^0, \theta) - J_F(x^0, \theta) \right] \Delta x \tag{75}
\]

\[
  \phantom{\Gamma(x_A^{\beta})} = -2 A_J(x^0, \theta)\, (x_A^{\beta} - x^0). \tag{76}
\]

This uniquely recovers the AsymEP augmented dynamics. Finally, to eliminate the $O(\beta^2)$ error, AsymEP evaluates the centered difference of two opposite nudges:

\[
  x_A^{\pm\beta} = x^0 \pm \beta \left. \frac{dx_A}{d\beta} \right|_{\beta=0} + O(\beta^2). \tag{77}
\]

Subtracting these states cancels the $O(\beta^2)$ error, yielding $\frac{1}{2}(x_A^{+\beta} - x_A^{-\beta}) = d^{\beta} + O(\beta^3)$, successfully recovering the exact post-synaptic update term.

E.2. Physical Trade-offs and the Extended Space

We can view AsymEP and Dyadic EP as a space-time trade-off of the same underlying physical optimization problem.

AsymEP preserves the original $N$-dimensional state space of the network at the cost of temporal non-locality. The system must evolve sequentially, requiring physical memory not only to store the free equilibrium $x^0$ for the asymmetric correction, but also to store the successive stationary states required to evaluate the contrastive gradient update. AsymEP thus serves as the direct, spatially minimal extension of EP.

Dyadic EP provides a learning signal that is local in both space (where $z - z'$ encodes the gradient) and time (allowing the nudged phases to execute in parallel) at the cost of doubling the state space. In particular, capturing non-conservative forces in this extended space requires a specific bilinear coupling, rather than a trivial superposition of uncoupled subsystems. It can be seen as a blueprint for future neuromorphic hardware.

Ultimately, the reduction of Dyadic EP to AsymEP via the variables $m$ and $d$ proves the universality of EP's variational principle.

F. Derivation of the Hopfield-like Energy

In this section, we derive the explicit energy functional for the Continuous Asymmetric Hopfield dynamics defined in Eq. (35). The force field is given by:

\[
  F(x) = \rho'(x) \odot (J \rho(x)) - x. \tag{78}
\]

We omit external inputs $J^{\mathrm{in}}$ for brevity, as they appear symmetrically in the Jacobian. The variational Hamiltonian is defined as:

\[
  H(z, z') = -(z - z')^{\top} F\!\left( \frac{z + z'}{2} \right) + \beta\, C\!\left( \frac{z + z'}{2} \right). \tag{79}
\]

To analyze this expression, we introduce the midpoint $m = \frac{z + z'}{2}$ and the difference $d = z - z'$. Since the separation between $z$ and $z'$ is induced solely by the nudging parameter $\beta$, the difference scales as $\|d\| \sim O(\beta)$. We therefore neglect terms of order $O(\|d\|^3)$ (equivalently, $O(\beta^3)$), as they do not contribute to the gradient of the cost.

The activation at the midpoint can be approximated as:

\[
  \rho(m) = \frac{\rho(z) + \rho(z')}{2} + O(\|d\|^2). \tag{80}
\]

Similarly, the difference in activations is:

\[
  \rho(z) - \rho(z') = \rho'(m) \odot d + O(\|d\|^3). \tag{81}
\]

Inverting this relation, we express the state difference as:

\[
  z - z' = (\rho(z) - \rho(z')) \oslash \rho'(m) + O(\|d\|^3), \tag{82}
\]

where $\oslash$ denotes elementwise division, the inverse of the elementwise product in Eq. (81).

We substitute these expansions into the interaction term of the Hamiltonian, $H_{\mathrm{int}} = -(z - z')^{\top} (\rho'(m) \odot J \rho(m))$. Applying the identity $a^{\top}(b \odot c) = (a \odot b)^{\top} c$, we obtain:

\[
\begin{aligned}
  H_{\mathrm{int}} &= -\left( (z - z') \odot \rho'(m) \right)^{\top} J \rho(m) \\
  &\approx -(\rho(z) - \rho(z'))^{\top} J \left( \frac{\rho(z) + \rho(z')}{2} \right).
\end{aligned} \tag{83}
\]

Expanding the product gives:

\[
  H_{\mathrm{int}} = -\frac{1}{2} \left[ \rho(z)^{\top} J \rho(z) + \rho(z)^{\top} J \rho(z') - \rho(z')^{\top} J \rho(z) - \rho(z')^{\top} J \rho(z') \right]. \tag{84}
\]

We decompose the connectivity matrix $J$ into its symmetric part $S$ and antisymmetric part $A$. The first and last terms involve only the symmetric part, since $u^{\top} J u = u^{\top} S u$ for any vector $u$; they reduce to $\rho(z)^{\top} S \rho(z)$ and $\rho(z')^{\top} S \rho(z')$, respectively. The cross terms satisfy:

\[
\begin{aligned}
  \rho(z)^{\top} J \rho(z') - \rho(z')^{\top} J \rho(z) &= \rho(z)^{\top} (J - J^{\top}) \rho(z') \\
  &= \rho(z)^{\top} (2A) \rho(z').
\end{aligned} \tag{85}
\]
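The symmetric/antisymmetric split of the interaction term can be checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: `J` is a random stand-in for the connectivity matrix, and `rz`, `rzp` are random stand-ins for the activation vectors $\rho(z)$ and $\rho(z')$ (all three are assumptions chosen for illustration). It compares the product form of Eq. (83) with its expansion into $S$ and $A$ terms, and verifies the cross-term identity of Eq. (85).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Random stand-ins (assumptions): a generic non-symmetric connectivity
# matrix and two activation vectors playing the roles of rho(z), rho(z').
J = rng.standard_normal((n, n))
rz = np.tanh(rng.standard_normal(n))
rzp = np.tanh(rng.standard_normal(n))

S = 0.5 * (J + J.T)   # symmetric part
A = 0.5 * (J - J.T)   # antisymmetric part

# Product form: -(rho(z) - rho(z'))^T J (rho(z) + rho(z')) / 2, cf. Eq. (83).
lhs = -(rz - rzp) @ J @ (rz + rzp) / 2

# Expanded form: symmetric self-terms plus an antisymmetric cross-term.
rhs = -0.5 * rz @ S @ rz + 0.5 * rzp @ S @ rzp - rz @ A @ rzp

# Cross-term identity of Eq. (85).
cross = rz @ J @ rzp - rzp @ J @ rz
cross_expected = 2.0 * rz @ A @ rzp

print("product form :", lhs)
print("expanded form:", rhs)
print("cross terms  :", cross, cross_expected)
```

The identity is exact for any matrix and any pair of vectors, so the two forms agree to floating-point precision; the $O(\|d\|^3)$ remainder in the derivation comes only from the Taylor expansions of Eqs. (80)–(81), not from this algebraic step.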


Thus, the interaction term reduces to:

\[
  H_{\mathrm{int}} = -\frac{1}{2} \rho(z)^{\top} S \rho(z) + \frac{1}{2} \rho(z')^{\top} S \rho(z') - \rho(z)^{\top} A \rho(z') + O(\|d\|^3). \tag{86}
\]

Finally, for the nudging term, we expand the cost function around the midpoint:

\[
  C(m) = \frac{1}{2} \left( C(z) + C(z') \right) + O(\|d\|^2). \tag{87}
\]

When multiplying by $\beta$, the remainder term becomes $\beta \cdot O(\|d\|^2)$. Since $\|d\| \sim O(\beta)$, this remainder is of order $O(\beta^3)$ and can be consistently discarded alongside the third-order terms from the interaction expansion.

Combining all these components, the final Hamiltonian is:

\[
\begin{aligned}
  H(z, z') = {} & -\frac{1}{2} \rho(z)^{\top} S \rho(z) + \frac{1}{2} \rho(z')^{\top} S \rho(z') \\
  & - \rho(z)^{\top} A \rho(z') + \frac{1}{2} \left( \|z\|^2 - \|z'\|^2 \right) \\
  & + \frac{\beta}{2} \left( C(z) + C(z') \right).
\end{aligned} \tag{88}
\]

The saddle-point dynamics of Eq. (32) generated by this Hamiltonian are:

\[
  \frac{dz}{dt} = \rho'(z) \odot \left( S \rho(z) + A \rho(z') \right) - z - \frac{\beta}{2} \frac{\partial C}{\partial z}, \tag{89}
\]

\[
  \frac{dz'}{dt} = \rho'(z') \odot \left( S \rho(z') + A \rho(z) \right) - z' + \frac{\beta}{2} \frac{\partial C}{\partial z'}. \tag{90}
\]

This system recovers the original continuous Hopfield dynamics when $z = z'$ (assuming $\beta = 0$).

The input parameters are then updated using the standard learning rule (21). In particular, the presynaptic term associated with the input weights is given by

\[
  \frac{\partial F_i}{\partial J^{\mathrm{in}}_{kl}} = \delta_{ik}\, \rho'(x_i)\, u_l. \tag{93}
\]

The presynaptic terms associated with the dynamical parameters $J^{\mathrm{dyn}}_{ij}$ depend on the experiment.

G.1. Symmetric Initialization

G.1.1. Learning Rules

For clarity, we write the learning rules for VF and AsymEP. For the input weights, using (93), we have:

\[
  \Delta J^{\mathrm{in}}_{ik} \propto \frac{1}{2\beta} \left[ (x_i^{+\beta} - x_i^{-\beta})\, \rho'(x_i^0)\, u_k \right], \tag{94}
\]

while for the recurrent weights, we get:

\[
  \Delta J^{\mathrm{dyn}}_{ij} \propto \frac{1}{2\beta} \left[ (x_i^{+\beta} - x_i^{-\beta})\, \rho'(x_i^0)\, \rho(x_j^0) \right]. \tag{95}
\]

For EP, we have:

\[
  \Delta J^{\mathrm{in}}_{ik} \propto \frac{1}{2\beta} \left[ \rho(x_i^{+\beta}) - \rho(x_i^{-\beta}) \right] u_k, \tag{96}
\]

and for the recurrent weights:

\[
  \Delta J^{\mathrm{dyn}}_{ij} \propto \frac{1}{2\beta} \left[ \rho(x_i^{+\beta}) \rho(x_j^{+\beta}) - \rho(x_i^{-\beta}) \rho(x_j^{-\beta}) \right]. \tag{97}
\]

G.1.2. Supplementary Numerical Results

To complement Fig. 2, we report the evolution of the accuracy of the three methods in Fig. 4. We consider a layered network with 50 hidden neurons. While this capacity is
G. Experimental Details                                                 insufficient for state-of-the-art performance, it amplifies the
                                                                        difference in accuracy between models to aid visualization.
As in the main text, the neuronal dynamics are governed by              Models are trained for 20 epochs starting from a symmetric
the vector field:                                                       configuration, the natural setting for both VF and EP. With
                                                                      this initialization, AsymEP consistently outperforms the
                    X dyn                                               other methods and learns faster by exploiting the additional
     Fi = ρ′ (xi )      Jij ρ(xj ) + bi (u) − xi ,  (91)
                                                                        degrees of freedom of the asymmetric network.
                      j


where the input-dependent bias bi (u) is precomputed for                G.2. Fixed Asymmetry Ratio
each MNIST input u as:                                                  This section details the implementation for the fixed asym-
                                X                                       metry ratio experiments presented in Section 5.2, followed
                     bi (u) =          Jilin ul .           (92)        by complementary numerical results regarding learning
                                l∈in                                    speed and induced Jacobian asymmetry.
This term projects the input space into the recurrent sub-
                                                                        G.2.1. L EARNING RULES
space. The bias yields a diagonal contribution to the Jaco-
bian JF = ∂F ∂x , and therefore does not contribute to the              Parametrization and notation. To enforce a fixed asym-
antisymmetric correction used in the augmented dynamics                 metry ratio, we explicitly parameterize the independent ele-
Eq. (20) of AsymEP.                                                     ments of Eq. (38). We introduce two parameter vectors θS

                                                                   17
                                      Equilibrium Propagation for Non-Conservative Systems

         Parameter                                     Sym. Init. / Feedforward             Fixed rstr          Fixed rstr & rin
                                                            sec. 5.1 & 5.3                   sec. 5.2              app. G.3
         Learning Rate (Input-Hidden)                             0.05                       0.05                    0.0125
         Learning Rate (Hidden-Output)                            0.01                       0.01                    0.0025
         Time Step (Dynamics Integration)                          0.5                        0.3                      0.3
         Nudging Parameter (β)                                     0.5                        0.5                      0.5
         Free-phase Steps (nfree )                                  20                        30                       40
         Nudged-phase Steps (nnudge )                               10                        10                       10
         Number of Epochs                                        40 / 20                      30                       40
         Batch Size                                                 64                       √64                      √64
         Scaling Parameter γ                                       n.a.                        60                        60
         Structure                                            784 - n.a. -10               784-50-10           all-to-all, 500 hid
         Activation function ρ                                    tanh                       tanh                     tanh
         Initial Recurrent State s                            s ∼ U (−1, 1)              s ∼ U(−1, 1)            s ∼ U(−1, 1)
         Initial Parameters θ                                 θ ∼ N (0, N1 )             θ ∼ N (0, N1 )          θ ∼ N (0, N1 )
         Number of Runs (training + inference)                      10                        10                       10
Table 3. Trained Model Hyperparameters on MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N (µ, σ 2 )
is a Gaussian distribution. For the rstr parametrization, we choose more cautious hyperparameters for training and inference compared to
the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics.


and $\theta^A$ of size $M = N_{\mathrm{dyn}}(N_{\mathrm{dyn}} - 1)/2$, which encode the off-diagonal elements of the symmetric and antisymmetric components $\tilde{S}$ and $\tilde{A}$, respectively. The correspondence between matrix and vector indices is given by:

$$k(i, j) = \frac{(i - 1)(i - 2)}{2} + j, \qquad (1 \le j < i \le N_{\mathrm{dyn}}) \tag{98}$$

where the condition $j < i$ selects the strictly lower triangular elements. Introducing an additional vector $\xi$ for the diagonal elements of $\tilde{S}$, the full matrices are constructed as:

$$\tilde{S}_{ij} = \delta_{ij}\,\xi_i + (1 - \delta_{ij})\,\theta^S_{k(\max(i,j),\,\min(i,j))}, \tag{99}$$

$$\tilde{A}_{ij} = \epsilon_{ij}\,\theta^A_{k(\max(i,j),\,\min(i,j))}, \tag{100}$$

where $\epsilon_{ij}$ is the Levi-Civita symbol. The dynamical parameters are then given by:

$$J^{\mathrm{dyn}}_{ij} = \gamma\,(c_S \tilde{S}_{ij} + c_A \tilde{A}_{ij}), \tag{101}$$

with normalization coefficients

$$c_S = \frac{\sqrt{1 - r_{\mathrm{str}}^2}}{F_S}, \qquad c_A = \frac{r_{\mathrm{str}}}{F_A}, \tag{102}$$

defined in terms of the Frobenius norms:

$$F_S = \sqrt{\sum_{i=1}^{N} \xi_i^2 + 2\sum_{k=1}^{M} (\theta^S_k)^2}, \tag{103}$$

$$F_A = \sqrt{2\sum_{k=1}^{M} (\theta^A_k)^2}. \tag{104}$$

Figure 4. Evolution of the mean accuracy and standard deviation (over 10 runs) during training on MNIST for AsymEP, EP, and VF. Models use 50 hidden neurons.

Presynaptic computation. The dependence of the normalization coefficients on the parameters introduces additional regularization terms in the learning rule compared to the parameterization of (Scellier & Bengio, 2017). The gradients of the normalization coefficients are:

$$\frac{\partial c_S}{\partial \theta^S_k} = -2 c_S \frac{\theta^S_k}{(F_S)^2}, \qquad \frac{\partial c_S}{\partial \xi_m} = -c_S \frac{\xi_m}{(F_S)^2}, \tag{105}$$

$$\frac{\partial c_A}{\partial \theta^A_k} = -2 c_A \frac{\theta^A_k}{(F_A)^2}. \tag{106}$$
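As a sanity check on the construction of Eqs. (98)-(104), the following minimal NumPy sketch assembles $J^{\mathrm{dyn}}$ from the parameter vectors and verifies two consequences of the normalization: $\|J^{\mathrm{dyn}}\|_F = \gamma$, and the antisymmetric part carries a fraction $r_{\mathrm{str}}$ of that norm. The function names `k_index` and `build_J_dyn`, the network size, and the random initialization are illustrative assumptions, not the authors' implementation. The sign of $\epsilon_{ij}$ is assumed to be $+1$ below the diagonal, which is the convention implied by Eq. (108).

```python
import numpy as np


def k_index(i, j):
    """Vector index of Eq. (98) for 1-based matrix indices with 1 <= j < i."""
    return (i - 1) * (i - 2) // 2 + j


def build_J_dyn(theta_S, theta_A, xi, r_str, gamma):
    """Assemble J_dyn = gamma * (c_S * S + c_A * A), following Eqs. (99)-(104)."""
    n = xi.shape[0]
    S = np.diag(xi).astype(float)
    A = np.zeros((n, n))
    for i in range(2, n + 1):
        for j in range(1, i):
            k = k_index(i, j) - 1  # shift to 0-based storage
            S[i - 1, j - 1] = S[j - 1, i - 1] = theta_S[k]
            # Assumed sign convention (read off Eq. (108)): +theta below the diagonal.
            A[i - 1, j - 1] = theta_A[k]
            A[j - 1, i - 1] = -theta_A[k]
    F_S = np.sqrt(np.sum(xi**2) + 2.0 * np.sum(theta_S**2))  # Eq. (103)
    F_A = np.sqrt(2.0 * np.sum(theta_A**2))                  # Eq. (104)
    c_S = np.sqrt(1.0 - r_str**2) / F_S                      # Eq. (102)
    c_A = r_str / F_A
    return gamma * (c_S * S + c_A * A)


# Illustrative sizes and values (assumptions, not taken from the experiments).
rng = np.random.default_rng(0)
n = 6
M = n * (n - 1) // 2
theta_S, theta_A, xi = rng.normal(size=M), rng.normal(size=M), rng.normal(size=n)
r_str, gamma = 0.7, np.sqrt(n)

J = build_J_dyn(theta_S, theta_A, xi, r_str, gamma)
J_sym, J_anti = 0.5 * (J + J.T), 0.5 * (J - J.T)
frob = np.linalg.norm(J)                       # Frobenius norm of J_dyn
anti_fraction = np.linalg.norm(J_anti) / frob  # share carried by the antisymmetric part
print(frob, gamma, anti_fraction, r_str)
```

Both identities hold because symmetric and antisymmetric matrices are orthogonal under the Frobenius inner product, so the squared norms $c_S^2 F_S^2 = 1 - r_{\mathrm{str}}^2$ and $c_A^2 F_A^2 = r_{\mathrm{str}}^2$ simply add.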


Parameter                                Comparison Dyn.      2 hidden layers      3 hidden layers
                                         (Sec. 5.4)           (Sec. 5.4)           (Sec. 5.4)
Learning Rate (Input-Hidden)             0.0016               0.0013               0.6
Learning Rate (Hidden-Hidden)            0.0016               0.0013               0.6
Learning Rate (Hidden-Output)            0.0016               0.0013               0.6
Time Step (Dynamics Integration)         0.4                  0.3                  0.0075
Nudging Parameter (β)                    0.3                  0.5                  0.20
Free-phase Steps ($n_{\mathrm{free}}$)   40                   40                   60
Nudged-phase Steps ($n_{\mathrm{nudge}}$) 20                  20                   30
Number of Epochs                         50                   40                   40
Batch Size                               64                   64                   64
Layer Structure                          784-500-200-10       784-500-500-10       784-500-500-500-10
Activation function ρ                    tanh                 tanh                 tanh
Initial Recurrent State s                s ∼ U(−1, 1)         s ∼ U(−1, 1)         s ∼ U(−1, 1)
Initial Parameters θ                     θ ∼ N(0, 1/N)        θ ∼ N(0, 1/N)        θ ∼ N(0, 1/N)
Number of Runs (training + inference)    10                   10                   10

Table 4. Trained Model Hyperparameters on Fashion-MNIST. N is the total number of neurons, U(−1, 1) is a uniform distribution, and N(µ, σ²) is a Gaussian distribution. For the $r_{\mathrm{str}}$ parametrization, we choose more cautious hyperparameters for training and inference compared to the symmetric initialization, due to increasingly non-conservative and potentially oscillatory dynamics.


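Combining these with the derivatives of the matrices $\tilde{S}$ and $\tilde{A}$, we have:

$$\frac{\partial \tilde{S}_{ij}}{\partial \theta^S_k} = \delta_{ip}\delta_{jq} + \delta_{iq}\delta_{jp}, \qquad \frac{\partial \tilde{S}_{ij}}{\partial \xi_k} = \delta_{ij}\delta_{kj}, \tag{107}$$

$$\frac{\partial \tilde{A}_{ij}}{\partial \theta^A_k} = \delta_{ip}\delta_{jq} - \delta_{iq}\delta_{jp}, \tag{108}$$

where $k$ corresponds to the index pair $(p, q)$ with $p > q$, as defined in Eq. (98). The full presynaptic terms are then:

• For the diagonal parameters $\xi_m$:

$$\frac{\partial F_i}{\partial \xi_m} = \gamma c_S\, \rho'(x_i)\left(-\frac{\xi_m}{(F_S)^2}\sum_{j=1}^{N} \tilde{S}_{ij}\,\rho(x_j) + \delta_{im}\,\rho(x_m)\right). \tag{109}$$

• For the off-diagonal symmetric parameters $\theta^S_k$ (where $p > q$):

$$\frac{\partial F_i}{\partial \theta^S_k} = \gamma c_S\, \rho'(x_i)\left(-2\frac{\theta^S_k}{(F_S)^2}\sum_{j=1}^{N} \tilde{S}_{ij}\,\rho(x_j) + \delta_{ip}\,\rho(x_q) + \delta_{iq}\,\rho(x_p)\right). \tag{110}$$

• For the off-diagonal antisymmetric parameters $\theta^A_k$ (where $p > q$):

$$\frac{\partial F_i}{\partial \theta^A_k} = \gamma c_A\, \rho'(x_i)\left(-2\frac{\theta^A_k}{(F_A)^2}\sum_{j=1}^{N} \tilde{A}_{ij}\,\rho(x_j) + \delta_{ip}\,\rho(x_q) - \delta_{iq}\,\rho(x_p)\right). \tag{111}$$

Initialization. To ensure the stability of the system, we initialize our parameters such that the variance of dynamical parameters scales as $\mathrm{Var}\big[J^{\mathrm{dyn}}_{ij}\big] \propto 1/N_{\mathrm{dyn}}$. This is a conservative choice for the layered architectures used in our experiments, where many entries of $J^{\mathrm{dyn}}_{ij}$ are zero.

In practice, we initialize the parameter vectors $\theta^S$, $\theta^A$, and $\xi$ with identical variances $\sigma^2$. For large $N_{\mathrm{dyn}}$, the expected Frobenius norms approximate to $\mathbb{E}[F_{S,A}] \approx N_{\mathrm{dyn}}\,\sigma$. Consequently, the normalization coefficients become:

$$c_S \approx \frac{\sqrt{1 - r_{\mathrm{str}}^2}}{N_{\mathrm{dyn}}\,\sigma}, \qquad c_A \approx \frac{r_{\mathrm{str}}}{N_{\mathrm{dyn}}\,\sigma}. \tag{112}$$

Since the symmetric and antisymmetric components are statistically independent, the variance of the weights is derived as follows:

• Diagonal elements ($i = j$):

$$\mathrm{Var}\big[J^{\mathrm{dyn}}_{ii}\big] = \gamma^2 c_S^2 \sigma^2 \approx \gamma^2\, \frac{1 - r_{\mathrm{str}}^2}{N_{\mathrm{dyn}}^2}. \tag{113}$$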

• Off-diagonal elements ($i \neq j$):

$$\mathrm{Var}\big[J^{\mathrm{dyn}}_{ij}\big] = \gamma^2 \big(c_S^2 + c_A^2\big)\sigma^2 \approx \frac{\gamma^2}{N_{\mathrm{dyn}}^2}, \tag{114}$$

To satisfy $\mathrm{Var}\big[J^{\mathrm{dyn}}_{ij}\big] \propto 1/N_{\mathrm{dyn}}$, we set:

$$\gamma = \sqrt{N_{\mathrm{dyn}}} \tag{115}$$

Note that by random matrix theory, diagonal elements do not affect stability in the large $N_{\mathrm{dyn}}$ limit.

Potential Simplification. Although the parameterization above is fully general, a simpler construction is possible by removing self-connections ($\xi = 0$) and enforcing identical parameterization for the symmetric and antisymmetric components, i.e., $\theta^S = \theta^A = \theta$. The matrix elements then become:

$$\tilde{S}_{ij} = (1 - \delta_{ij})\,\theta_{k(\max(i,j),\,\min(i,j))}, \tag{116}$$

$$\tilde{A}_{ij} = \epsilon_{ij}\,\theta_{k(\max(i,j),\,\min(i,j))}. \tag{117}$$

In this case, the Frobenius norms are equal ($F_S = F_A$), and we can omit the explicit normalization:

$$J^{\mathrm{dyn}}_{ij} = \sqrt{1 - r_{\mathrm{str}}^2}\;\tilde{S}_{ij} + r_{\mathrm{str}}\,\tilde{A}_{ij}. \tag{118}$$

For a parameter $\theta_k$ corresponding to indices $(p, q)$ with $p > q$, the presynaptic term is given by:

$$\frac{\partial F_i}{\partial \theta_k} = \rho'(x_i)\left[\Big(\sqrt{1 - r_{\mathrm{str}}^2} + r_{\mathrm{str}}\Big)\delta_{ip}\,\rho(x_q) + \Big(\sqrt{1 - r_{\mathrm{str}}^2} - r_{\mathrm{str}}\Big)\delta_{iq}\,\rho(x_p)\right]. \tag{119}$$

While this parameterization works in simulations and keeps the number of parameters constant for all $r_{\mathrm{str}}$, it constrains the asymmetry to be "homogeneous", by which we mean that the asymmetry ratio is identical for every pair of neurons; hence, the network cannot learn to be symmetric in one region and antisymmetric in another. Therefore, we choose to explore the more general case of (38) in our experiments.

G.2.2. Supplementary Numerical Results

To complement the results of Fig 3, we analyze the training efficiency as a function of the asymmetry ratio $r_{\mathrm{str}}$ and investigate the robustness of VF by monitoring the Jacobian asymmetry.

Training efficiency. We first study the training efficiency of the two algorithms as a function of the asymmetry ratio $r_{\mathrm{str}}$. Inspired by the related concept in (Cesa-Bianchi & Lugosi, 2006), we define the cumulative loss as the accumulated difference between the free equilibrium cost and a zero-cost baseline (perfect prediction) during learning. Specifically, for each method and value of $r_{\mathrm{str}}$, we calculate the cumulative loss by summing the batch-averaged costs of the first 5 epochs (out of 30, to avoid saturation effects), and reporting the mean and standard deviation over 10 independent training runs. Mathematically, for each run:

$$\text{Cumul. Loss} = \sum_{\text{epoch}=1}^{5}\; \sum_{k=1}^{N_{\text{batches}}} \left(\sum_{(x^0, u) \in B_k} \frac{C(x^0, u)}{|B_k|}\right), \tag{120}$$

where $B_k$ represents the $k$-th batch, and $|B_k|$ denotes the number of examples in the batch. The parameters are updated after each batch step; consequently, the free equilibrium $x^0$ is inferred using the updated parameters and the current example $u$.

Figure 5. Cumulative loss as defined by (120) over the first 5 epochs of training, for different asymmetry ratios $r_{\mathrm{str}}$. We compare VF (orange) and AsymEP (blue), under two training regimes: training only $J^{\mathrm{in}}$ (dashed) or all parameters (solid).

In Fig 5, we observe that learning slows down for both algorithms when $r_{\mathrm{str}} \gtrsim 0.6$. This behavior likely results from the increased difficulty of reaching a stationary state as the dynamics become strongly asymmetric. With a fixed number of inference steps, incomplete convergence degrades the accuracy of the gradient estimates, thereby slowing down the learning. Fig 5 shows that while VF can eventually achieve competitive accuracy, it is consistently slower than AsymEP as soon as asymmetry is introduced.

Jacobian asymmetry. We next examine how the structural asymmetry $r_{\mathrm{str}}$ is reflected in the Jacobian of the dynamics (35), given by:
$$\frac{\partial F_i}{\partial s_j} = (1 - \delta_{ij})\,\rho'(x_i)\,J^{\mathrm{dyn}}_{ij}\,\rho'(x_j) + \delta_{ij}\Big[\rho'(x_i)\big(J^{\mathrm{dyn}}_{ii}\,\rho'(x_i)\big) + \rho''(x_i)\,b_i - 1\Big]. \tag{121}$$

In our layered architecture, the self-connections are zero ($J^{\mathrm{dyn}}_{ii} = 0$). For the following analysis, we neglect all diagonal terms in the Jacobian (including external inputs and potential), since they do not contribute to the antisymmetric correction (20) and thus to the discrepancy between the performance of VF and AsymEP. Consequently, we define the following asymmetry ratio based solely on the off-diagonal Jacobian $J_{F,\mathrm{off}}$:

$$r_{\mathrm{jac}} = \frac{\|J_{F,\mathrm{off}} - J_{F,\mathrm{off}}^\top\|_F}{\|J_{F,\mathrm{off}}\|_F}, \tag{122}$$

The results are presented in Fig 6. For each trained model and ratio $r_{\mathrm{str}}$, we compute $r_{\mathrm{jac}}$ averaged over the stationary states of the first batch (64 images) across 10 independent runs. We observe that when structural asymmetry is strong and all parameters are trained, VF partially compensates for the asymmetry by adjusting the neuronal states. This can be understood by rewriting the ratio as:

$$r_{\mathrm{jac}} = \frac{\Big\|\rho'(x_i)\big[J^{\mathrm{dyn}}_{ij} - (J^{\mathrm{dyn}}_{ji})^\top\big]\rho'(x_j)\Big\|_F}{\Big\|\rho'(x_i)\,J^{\mathrm{dyn}}_{ij}\,\rho'(x_j)\Big\|_F}. \tag{123}$$

Compared to the structural asymmetry ratio in Eq. (37), a value of $r_{\mathrm{jac}} < r_{\mathrm{str}}$ indicates that the neuronal states effectively dampen the structural asymmetry, rendering the dynamics more symmetric. This symmetrization of the Jacobian appears without imposing an additional symmetrization penalty and could be enhanced using the method of (Laborieux & Zenke, 2022). This mechanism likely explains the superior performance of 'All (VF)' compared to '$J^{\mathrm{in}}$ (VF)' in Fig 3, as the former is able to use the additional degrees of freedom to reduce the effective asymmetry at high $r_{\mathrm{str}}$.

Figure 6. Asymmetry ratio of the Jacobian $r_{\mathrm{jac}}$ defined in equation (122) after training for different asymmetry ratios $r_{\mathrm{str}}$. We compare VF (orange) and AsymEP (blue), under two training regimes: training only $J^{\mathrm{in}}$ (dashed) or all parameters (solid).

G.3. Stability analysis with Fixed Asymmetry Ratio & Constrained Inputs Projection

A complete stability analysis of the non-conservative dynamics trainable with AsymEP is beyond the scope of this work. Nevertheless, for the class of continuous Hopfield networks considered here, standard arguments from random matrix theory suggest that asymmetry inherently improves asymptotic stability.

In the dynamics defined by Eq. (91), the linear leak term $-x_i$ shifts the spectrum of the system's Jacobian by $-1$. Consequently, local stability requires the largest real part among the eigenvalues of the effective weight matrix to be strictly less than 1. Assuming weights are initialized independently with variance $\sigma^2$, Girko's circular law dictates that the eigenvalues of an asymmetric matrix uniformly populate a disk of radius $\sigma\sqrt{n}$ in the complex plane. In contrast, imposing symmetry forces the eigenvalues onto the real line, broadening the spectral radius to $2\sigma\sqrt{n}$ according to Wigner's semicircle law. As a result, asymmetric networks can stably accommodate larger variance in the weight initializations than their symmetric counterparts.

Asymmetry nevertheless introduces imaginary eigenvalues and, consequently, damped oscillations. To study this effect experimentally in a controlled setting, we constrain the input projections $J^{\mathrm{in}}$. In the experiments of the main text, fixing the structural asymmetry ratio $r_{\mathrm{str}}$ still allowed AsymEP to reduce oscillations by aligning and increasing the input projections $J^{\mathrm{in}}$, thereby adding stabilizing diagonal contributions to the Jacobian. To isolate the network's ability to suppress oscillations independently of the magnitude of the input drive, we further constrain the relative scale of $J^{\mathrm{in}}$ and $J^{\mathrm{dyn}}$ by imposing

$$r_{\mathrm{in}} = \frac{\|J^{\mathrm{in}}\|_F}{\|J^{\mathrm{dyn}}\|_F} = \frac{\|J^{\mathrm{in}}\|_F}{\gamma}, \tag{124}$$

where $\|J^{\mathrm{dyn}}\|_F = \gamma$ following Eq. (101). Defining unscaled input projections $\tilde{J}^{\mathrm{in}}$, we set

$$J^{\mathrm{in}} = r_{\mathrm{in}}\,\gamma\,\frac{\tilde{J}^{\mathrm{in}}}{\|\tilde{J}^{\mathrm{in}}\|_F} \tag{125}$$

G.3.1. Learning Rules

Reusing the notations of the previous section, we write $J^{\mathrm{in}}_{il} = \gamma\, c_{\mathrm{in}}\, \tilde{J}^{\mathrm{in}}_{il}$ with the normalization $c_{\mathrm{in}} = r_{\mathrm{in}}/F_{\mathrm{in}}$, where $F_{\mathrm{in}} = \|\tilde{J}^{\mathrm{in}}\|_F$. Applying the chain rule yields:
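$$\frac{\partial F_i}{\partial \tilde{J}^{\mathrm{in}}_{kl}} = \gamma\, c_{\mathrm{in}}\, \rho'(x_i)\left[\delta_{ik}\,u_l - \frac{\tilde{J}^{\mathrm{in}}_{kl}}{F_{\mathrm{in}}^2}\sum_m \tilde{J}^{\mathrm{in}}_{im}\,u_m\right]. \tag{126}$$

And for $\gamma$ we have:

$$\frac{\partial F_i}{\partial \gamma} = \frac{1}{\gamma}\,(F_i + x_i). \tag{127}$$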
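Eqs. (126) and (127) can be checked against central finite differences, as in the minimal NumPy sketch below. It assumes the vector field of Eq. (91) with the bias inside the $\rho'(x_i)$ factor, $\rho = \tanh$ as listed in the hyperparameter tables, and $J^{\mathrm{dyn}} = \gamma\,(c_S\tilde{S} + c_A\tilde{A})$ represented by a fixed unit-norm matrix. The names `vector_field`, `grad_J_in`, `J_dyn_unit`, and all sizes and values are hypothetical choices made for illustration.

```python
import numpy as np

rho = np.tanh


def rho_prime(x):
    return 1.0 - np.tanh(x) ** 2


def vector_field(x, u, J_dyn_unit, J_in_tilde, gamma, r_in):
    """F_i of Eq. (91), with J_dyn = gamma * J_dyn_unit and J_in = gamma * c_in * J_in_tilde."""
    c_in = r_in / np.linalg.norm(J_in_tilde)
    drive = gamma * (J_dyn_unit @ rho(x) + c_in * (J_in_tilde @ u))
    return rho_prime(x) * drive - x


def grad_J_in(x, u, J_in_tilde, gamma, r_in):
    """Eq. (126) as an array G[i, k, l] = dF_i / dJ_in_tilde[k, l]."""
    F_in = np.linalg.norm(J_in_tilde)
    c_in = r_in / F_in
    proj = J_in_tilde @ u                                      # sum_m J_im u_m
    term1 = np.eye(x.shape[0])[:, :, None] * u[None, None, :]  # delta_ik u_l
    term2 = proj[:, None, None] * J_in_tilde[None, :, :] / F_in**2
    return gamma * c_in * rho_prime(x)[:, None, None] * (term1 - term2)


# Illustrative sizes and values (assumptions, not taken from the experiments).
rng = np.random.default_rng(1)
n, d = 5, 4
x, u = rng.uniform(-1, 1, n), rng.uniform(0, 1, d)
J_dyn_unit = rng.normal(size=(n, n))
J_dyn_unit /= np.linalg.norm(J_dyn_unit)
J_in_tilde = rng.normal(size=(n, d))
gamma, r_in, eps = 2.0, 0.1, 1e-6

# Eq. (127): derivative with respect to gamma.
F = vector_field(x, u, J_dyn_unit, J_in_tilde, gamma, r_in)
fd_gamma = (vector_field(x, u, J_dyn_unit, J_in_tilde, gamma + eps, r_in)
            - vector_field(x, u, J_dyn_unit, J_in_tilde, gamma - eps, r_in)) / (2 * eps)
err_gamma = np.max(np.abs(fd_gamma - (F + x) / gamma))

# Eq. (126): derivative with respect to one entry (k, l) of the unscaled input weights.
k, l = 2, 1
dJ = np.zeros_like(J_in_tilde)
dJ[k, l] = eps
fd_J = (vector_field(x, u, J_dyn_unit, J_in_tilde + dJ, gamma, r_in)
        - vector_field(x, u, J_dyn_unit, J_in_tilde - dJ, gamma, r_in)) / (2 * eps)
err_J = np.max(np.abs(fd_J - grad_J_in(x, u, J_in_tilde, gamma, r_in)[:, k, l]))
print(err_gamma, err_J)
```

The second term in `grad_J_in` comes entirely from differentiating the normalization $c_{\mathrm{in}} = r_{\mathrm{in}}/F_{\mathrm{in}}$, which is why every neuron $i$ receives a contribution from every entry $(k, l)$, not only from its own row.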

G.3.2. Supplementary Numerical Results

Table 5 reports a worst-case control experiment where the structural asymmetry is fixed at $r_{\mathrm{str}} = 0.7$ while the input scale ratio $r_{\mathrm{in}}$ is varied. The experiment uses an all-to-all architecture on MNIST (excluding direct input-to-output connections). The output variance during extended inference (steps 30-50) confirms that the system successfully learns to suppress oscillations even when $r_{\mathrm{in}}$ is severely restricted. Any small residual oscillations can be mitigated by time-averaging over the inference steps.

Finally, $r_{\mathrm{in}}$ can be interpreted as a measure of the external signal magnitude relative to the internal recurrent dynamics. These results indicate that the system remains capable of learning and stabilizing even under a low external input drive. Even when the input projection $\|J^{\mathrm{in}}\|_F$ is 100 times smaller than the recurrent connections $\|J^{\mathrm{dyn}}\|_F$, the network still achieves 36.34 ± 6.25% accuracy, which is well above chance level (∼ 10%).

Figure 7. Comparison of AsymEP and VF on a feedforward network. Test accuracy on MNIST is shown as a function of training epochs for a single-hidden-layer network with 20 neurons. Curves report the mean and standard deviation over 10 runs. Best accuracies are 92.7% ± 0.5% (AsymEP) and 64.3% ± 2.0% (VF).

G.5. Advantages of Non-Conservative Dynamics

In Section 5.4, we compare three (non-)conservative dynamics under varying constraints. To further evaluate learning speed, Table 6 reports network performance after a single epoch. These results confirm our earlier observation: AsymEP learns faster than VF.

G.6. Feedforward CIFAR-10 Experiments

This appendix details the architecture and hyperparameters of the deep feedforward experiments comparing backpropagation (BP), VF, AsymEP and Dyadic EP on CIFAR-10 (see subsection 5.5).
G.4. Feedforward Network
G.4.1. L EARNING RULES                                                Architecture. We use a nine-layer convolutional network
                                                                      (denoted CNN9). The first eight layers are convolutional
For clarity, we write the learning rules for VF and AsymEP            with 3 × 3 kernels and zero-padding; spatial downsampling
in a feedforward architecture with one hidden layer using             is performed by strided convolutions (stride 2 on layers 2, 4,
the notation of Section 5.3. For the input weights connecting         6, 8 and stride 1 otherwise), so no pooling is used. The chan-
to the hidden layer, we get the usual formula:                        nel widths follow the sequence 3 → 64 → 64 → 128 →
                                                                      128 → 256 → 256 → 512 → 512, reducing the spatial
                  1 h +β    −β      0
                                         i
                                                                      resolution from 32 × 32 to 2 × 2. The last layer is a fully
         in
       ∆Jik ∝        (hi − hi )ρ′ (hi )uk ,             (128)
                 2β                                                   connected readout mapping the 512 × 2 × 2 feature map
                                                                      to the 10 class logits. All hidden units use a ReLU non-
while for the feedforward weights connecting the hidden to            linearity.
the output layer, we get:                                                   p Weights are initialized with the Kaiming scheme
                                                                      (σ = 2/fan-in) and biases at zero.
                    1 h +β                 0
                                              i
  ∆(Wh→o )ji ∝         (oj − o−β  ′ 0
                              j )ρ (oj )ρ(hi ) .        (129)         Training setup. All methods are trained for 40 epochs
                   2β                                                 with batch size 64 and repeated over 5 seeds. Inputs are
                                                                      normalized per channel and augmented with random 32 ×
Note that EP is not applicable in this case.
                                                                      32 crops (padding 4), random horizontal flips and Cutout
                                                                      (one 8 × 8 patch). Parameters are updated with SGD with
G.4.2. S UPPLEMENTARY N UMERICAL R ESULTS
                                                                      momentum 0.9, weight decay 5 × 10−4 and gradient-norm
In addition to the final accuracy reported in Sec. 5.3, we            clipping at 1, under a cosine learning-rate schedule decaying
show in Fig. 7 the evolution of the accuracy over 20 epochs           from 3.5 × 10−2 to 2 × 10−4 . Test accuracy is reported
for AsymEP and VF.                                                    on an exponential moving average of the weights (decay


Table 5. Output variance and final test accuracy on MNIST (%) across different values of
r_in with r_str = 0.7 (mean ± std over 10 runs; 500 hidden units, all-to-all, no
input-output connections).

  r_in    Output variance (untrained)    Output variance (epoch 80)    Test acc. (%) (epoch 80)
  0.01    (3.38 ± 0.90) × 10^-4          (5.22 ± 2.34) × 10^-5         36.34 ± 6.25
  0.10    (2.33 ± 0.48) × 10^-4          (1.39 ± 0.17) × 10^-4         90.54 ± 0.19
  0.50    (1.34 ± 0.32) × 10^-5          (1.06 ± 0.25) × 10^-6         94.96 ± 0.10
  1.00    (6.27 ± 1.24) × 10^-7          (1.75 ± 0.50) × 10^-8         96.30 ± 0.09
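
For readers who want to reproduce a quantity like the output variance in Table 5, the following is a minimal sketch. It assumes the variance is taken over time within the window of inference steps 30-50, per output unit, and then averaged across units; the exact definition used for the table is not spelled out here. The function name `window_stats` and the toy trajectories are hypothetical.

```python
from statistics import fmean, pvariance

def window_stats(outputs, start=30, stop=50):
    """Return (mean temporal variance, time-averaged output) over steps start..stop.

    outputs[t][j] is the value of output unit j at inference step t.
    """
    window = outputs[start:stop + 1]          # inclusive window of steps
    n_units = len(window[0])
    traces = [[step[j] for step in window] for j in range(n_units)]
    variance = fmean([pvariance(trace) for trace in traces])
    time_avg = [fmean(trace) for trace in traces]
    return variance, time_avg

# Toy trajectory: a fixed output plus a small period-2 oscillation of amplitude amp.
base, amp = [0.3, -0.1, 0.7], 1e-2
oscillating = [[b + amp * (-1) ** t for b in base] for t in range(51)]
steady = [list(base) for _ in range(51)]

var_osc, avg_osc = window_stats(oscillating)
var_flat, avg_flat = window_stats(steady)
```

In this toy case the time average over the 21-step window lies within amp/21 of the underlying fixed output, which illustrates why averaging over the inference steps reduces the effect of a small residual oscillation.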


Table 6. Test accuracy on Fashion-MNIST (%) at Epoch 1 (mean ± std over 10 runs). The table
compares three classes of network dynamics: Continuous Hopfield (CH), Predictive Coding
(PC), and Standard dynamics. Each is evaluated under three connectivity structures:
Asymmetric (Asym, B_k ≠ W_{k+1}^⊤), Symmetric/conservative (Sym, B_k = W_{k+1}^⊤), and
Feedforward (Feedfor, B_k = 0).

  Dynamics    Connectivity    EP              AsymEP          VF
  CH          Asym            -               74.91 ± 0.45    48.98 ± 4.09
  CH          Feedfor         -               74.36 ± 0.29    48.84 ± 3.46
  CH          Sym             74.57 ± 0.43    -               -
  PC          Asym            -               77.83 ± 0.47    57.75 ± 3.37
  PC          Sym             76.23 ± 0.39    -               -
  Standard    Asym            -               76.87 ± 0.51    61.50 ± 4.06
  Standard    Feedfor         -               77.92 ± 0.51    63.98 ± 0.73


0.9995). The targets are smoothed (ε = 0.1), which for
the EP methods amounts to nudging toward the smoothed
one-hot vector.
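
The training-setup ingredients described in this appendix (cosine schedule, weight averaging, smoothed targets) can be sketched in a few lines. This is a sketch under stated assumptions, not the authors' code: it assumes the standard half-cosine form for the schedule, indexed per epoch, and the common label-smoothing convention in which ε is spread uniformly over all classes; the paper does not state which convention is used. All function names are hypothetical.

```python
import math

def cosine_lr(t, total, lr_max=3.5e-2, lr_min=2e-4):
    # Half-cosine decay from lr_max at t = 0 to lr_min at t = total (assumed form).
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / total))

def ema_update(ema, weights, decay=0.9995):
    # Exponential moving average of the weights, applied elementwise.
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]

def smoothed_one_hot(label, num_classes=10, eps=0.1):
    # Assumed convention: (1 - eps) * one_hot + eps / num_classes.
    off = eps / num_classes
    return [off + (1.0 - eps if c == label else 0.0) for c in range(num_classes)]

lr_start = cosine_lr(0, 40)       # learning rate at the first epoch
lr_end = cosine_lr(40, 40)        # learning rate at the end of training
target = smoothed_one_hot(3)      # nudging target for class 3
ema = ema_update([0.0], [1.0])    # one averaging step from 0 toward 1
```

Under this convention the nudging target for the true class is 0.91 and every other class receives 0.01, so the target remains a probability vector.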

Relaxation hyperparameters. The four methods differ only in the gradient estimator: BP uses
automatic differentiation, while the EP-based methods contrast stationary states of the
corresponding relaxation dynamics. VF uses an integration step η = 1.0, nudging β = 0.1,
and at most K = 1000 relaxation steps with an early-stopping threshold of 9 × 10^-6 on the
mean state update. Dyadic EP uses the same settings except for a nudging strength β = 0.1.
AsymEP uses a smaller step η = 0.5, nudging β = 0.1, and up to K = 250 relaxation steps
with a threshold of 1 × 10^-4.
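
The relaxation procedure with early stopping can be illustrated as follows. This is a minimal sketch that assumes an explicit Euler update x ← x + η F(x) and interprets the "mean state update" as the mean absolute change per unit; both are assumptions, as is the toy two-unit asymmetric field. The names `relax` and `toy_field` are hypothetical, and the defaults mirror the AsymEP settings (η = 0.5, K = 250, threshold 1 × 10^-4).

```python
import math

def relax(field, x0, eta=0.5, max_steps=250, tol=1e-4):
    """Iterate x <- x + eta * F(x) until the mean |update| drops below tol.

    Returns (state, number of steps taken, whether the threshold was reached).
    """
    x = list(x0)
    for step in range(1, max_steps + 1):
        dx = [eta * f for f in field(x)]
        x = [xi + di for xi, di in zip(x, dx)]
        if sum(abs(d) for d in dx) / len(dx) < tol:
            return x, step, True
    return x, max_steps, False

# Toy non-conservative field with leak: F(x) = -x + tanh(W x + b), W asymmetric.
W = [[0.0, 0.4], [-0.3, 0.0]]
b = [0.5, -0.2]

def toy_field(x):
    return [
        -x[i] + math.tanh(sum(W[i][j] * x[j] for j in range(len(x))) + b[i])
        for i in range(len(x))
    ]

state, steps, converged = relax(toy_field, [0.0, 0.0])
residual = max(abs(f) for f in toy_field(state))
```

Stopping on the size of the update rather than after a fixed number of steps lets well-behaved inputs exit early, while the cap K bounds the cost when the dynamics settle slowly.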

