1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
|
"""#2 — leading-abscissa control for the ept non-conservative operator (ports the aep-dynamics
'control the LEADING spectral signal, surgically' finding to C512).
Why not jacreg: jacreg penalizes ||J_nc||_F^2 (Hutchinson) — the WHOLE Jacobian norm. That is blunt:
it over-constrains directions that never threaten stability, and when the controller ramps it high it
HIJACKS the task gradient (the known jr-hijack failure). The aep leading-vs-lagging result says the
right knob is the leading SPECTRAL ABSCISSA, not the norm.
What we control: the numerical abscissa omega(J_nc) = lambda_max( (J_nc + J_nc^T)/2 ) = the 2-norm
log-norm mu_2(J_nc) = one-sided Lipschitz constant. It upper-bounds the spectral abscissa, governs the
transient growth ||e^{Jt}||, and its crossing past (1+c) IS the free-phase Hopf (J_F = J_nc - (1+c)I).
Power iteration on the SYMMETRIC PART -> matvec-only (jvp+vjp of nc_force, the same primitives jacreg
and the AEP correction already call), so it scales and is analog-compatible (no eigendecomposition).
One-sided ReLU penalty = a LEADING signal: acts only as the abscissa nears the margin, so unlike jacreg
it does not over-contract / hijack when the operator is already safe.
=== VERDICT (eig_recheck.py, 2026-07-03): the omega premise above is REFUTED for this operator. ===
Gold-standard Lanczos on Sym(J_nc) at z*: omega = +5..+13 on EVERY operator — including the stablest
warm source s2000 (omega=+12.8 while its TRUE spectral abscissa, eig_probe FD-JVP, is ~-0.02). The
non-normality gap omega - alpha ~ 10: the numerical abscissa is hopelessly conservative here and even
ANTI-correlates with actual stability (rand-init +5.0 -> s2000 +12.8). An omega-margin penalty would
fire permanently and over-constrain worse than jacreg. KEEP num_abscissa/eig_penalty as DIAGNOSTICS
ONLY. (The pre-fix plain power iteration also returned garbage mixtures on the indefinite Sym — the old
'warm -10.14 vs scratch +1.11' fingerprint story is retracted.)
The CONTROL is lead_rho/spec_penalty below: the TRUE leading eigenvalue of the forward map
M = I + eps*J_F (a real 2-D Rayleigh-Ritz block captures the complex Hopf pair that symmetric power
iteration cannot see). rho = |lam_lead(M)| is the exact discrete stability criterion (rho > 1 = the
relaxation spirals out — covers both the continuous Hopf and the Euler/Neimark-Sacker route), matches
eig_probe's measurement, and is the aep-dynamics 'spectral' steering ported to C512 — applied SOFTLY
(small coefficient, one-sided), per the aep finding that hard projection over-constrains.
"""
import torch
from torch.autograd.functional import jvp, vjp
def num_abscissa(blk, zs, cache, iters=3):
"""SHIFTED power iteration for the TOP (algebraic-max) eigenpair of Sym(J_nc)=(J_nc+J_nc^T)/2 at zs.
Returns (v_detached, lambda_max_float) — lambda_max(Sym) = the numerical abscissa.
Why shifted: plain power iteration converges to the largest-|lambda| eigenpair. Sym(J_nc) is
indefinite and its NEGATIVE end can dominate in magnitude — plain PI then silently returns
lambda_min, which reads 'deeply stable' exactly when lambda_max is over the margin, disarming the
penalty. Phase 1 estimates the dominant magnitude s; phase 2 iterates (Sym + s·I), whose top
eigenvalue lambda_max + s is non-negative and dominant, so the Rayleigh quotient converges to
lambda_max. Both phases warm-start from the cache, so accuracy accrues across training steps."""
z = zs.detach()
def smv(u): # Sym(J_nc) u
return 0.5 * (jvp(blk.nc_force, z, u)[1] + vjp(blk.nc_force, z, u)[1])
def unit(key):
u = cache.get(key)
if u is None or u.shape != z.shape or u.dtype != z.dtype or u.device != z.device:
u = torch.randn_like(z)
return u / (u.norm() + 1e-12)
with torch.no_grad():
u, s = unit('u'), 0.0
for _ in range(iters): # phase 1: dominant |lambda| -> the shift
Su = smv(u); s = float(Su.norm()); u = Su / (s + 1e-12)
s = 1.1 * s + 1e-3 # safe overestimate (need s >= |lambda_min|)
v = unit('v')
for _ in range(iters): # phase 2: top eigenpair of Sym + s·I
w = smv(v) + s * v
v = w / (w.norm() + 1e-12)
lam = float((v * jvp(blk.nc_force, z, v)[1]).sum() / (v * v).sum()) # v^T J v = v^T Sym(J) v
cache['u'] = u; cache['v'] = v
return v, lam
def lead_rho(blk, zs, eps, c, cache, iters=6):
"""TRUE leading eigenvalue of the forward map M = I + eps*J_F, J_F = J_nc - (1+c)I.
Warm-started 2-D subspace (Rayleigh-Ritz) iteration: a real 2-D block captures the complex Hopf
pair that plain power iteration cannot represent. Matvec-only (jvp of nc_force).
Returns (V=[v0,v1] detached orthonormal, rho, mu_re): rho = |lam_lead(M)| (discrete stability:
rho > 1 = relaxation spirals out), mu = (lam-1)/eps the continuous eigenvalue (Hopf: Re mu > 0)."""
z = zs.detach()
k = 1.0 - eps * (1.0 + c)
def mv(u): # M u = u + eps*(J_nc - (1+c)I) u
return k * u + eps * jvp(blk.nc_force, z, u)[1]
V = cache.get('V')
if V is None or V[0].shape != z.shape or V[0].dtype != z.dtype or V[0].device != z.device:
V = [torch.randn_like(z), torch.randn_like(z)]
with torch.no_grad():
for _ in range(iters):
W0, W1 = mv(V[0]), mv(V[1])
W0 = W0 / (W0.norm() + 1e-12) # Gram-Schmidt: orthonormal 2-D block
W1 = W1 - (W1 * W0).sum() * W0
W1 = W1 / (W1.norm() + 1e-12)
V = [W0, W1]
M0, M1 = mv(V[0]), mv(V[1])
h00, h01 = float((V[0] * M0).sum()), float((V[0] * M1).sum()) # H = V^T M V (Rayleigh-Ritz)
h10, h11 = float((V[1] * M0).sum()), float((V[1] * M1).sum())
tr, det = h00 + h11, h00 * h11 - h01 * h10
disc = tr * tr / 4.0 - det
if disc >= 0: # two real Ritz values: take the larger |.|
l1, l2 = tr / 2.0 + disc ** 0.5, tr / 2.0 - disc ** 0.5
lam_re, rho = (l1, abs(l1)) if abs(l1) >= abs(l2) else (l2, abs(l2))
else: # complex pair a+-bi: |lam|=sqrt(det), Re=tr/2
rho, lam_re = det ** 0.5, tr / 2.0
cache['V'] = V
return V, rho, (lam_re - 1.0) / eps
def spec_penalty(blk, zs, eps, c, specreg, rho_target, cache, iters=6):
"""Grads of the one-sided TRUE-leading-eigenvalue penalty R = specreg * relu(rho - rho_target)^2
with rho = |lam_lead(I + eps*J_F)| via Rayleigh-Ritz on the converged (detached) 2-D subspace.
This is the aep-dynamics 'spectral' steering at C512, applied softly (one-sided, small specreg).
Returns ({id(p): grad}, rho, mu_re) — empty grads when rho below target."""
V, rho0, mu_re = lead_rho(blk, zs, eps, c, cache, iters)
if rho0 <= rho_target: # contraction margin held: signal off
return {}, rho0, mu_re
z = zs.detach()
k = 1.0 - eps * (1.0 + c)
with torch.enable_grad(): # H(theta), V fixed (envelope/Rayleigh-Ritz)
M0 = k * V[0] + eps * jvp(blk.nc_force, z, V[0], create_graph=True)[1]
M1 = k * V[1] + eps * jvp(blk.nc_force, z, V[1], create_graph=True)[1]
h00, h01 = (V[0] * M0).sum(), (V[0] * M1).sum()
h10, h11 = (V[1] * M0).sum(), (V[1] * M1).sum()
tr, det = h00 + h11, h00 * h11 - h01 * h10
disc = tr * tr / 4.0 - det
rho = torch.where(disc >= 0,
torch.maximum((tr / 2 + disc.clamp(min=0).sqrt()).abs(),
(tr / 2 - disc.clamp(min=0).sqrt()).abs()),
det.clamp(min=1e-12).sqrt())
R = specreg * torch.relu(rho - rho_target) ** 2
gr = torch.autograd.grad(R, blk.block, allow_unused=True)
return {id(p): g for p, g in zip(blk.block, gr) if g is not None}, rho0, mu_re
def eig_penalty(blk, zs, eigreg, margin, cache, iters=3):
"""[DIAGNOSTIC-ONLY — omega control refuted, see module docstring; use spec_penalty instead.]
Grads of the one-sided leading-abscissa penalty R = eigreg * relu(omega(J_nc) - margin)^2.
Returns ({id(p): grad}, omega) — omega logged as the controller signal. Empty grads when below margin."""
v, lam0 = num_abscissa(blk, zs, cache, iters)
if lam0 <= margin: # below the stability margin: leading signal off
return {}, lam0
z = zs.detach()
with torch.enable_grad():
Jv = jvp(blk.nc_force, z, v, create_graph=True)[1] # differentiable in theta (nc_force params)
lam = (v * Jv).sum() / (v * v).sum() # numerical abscissa, v fixed
R = eigreg * torch.relu(lam - margin) ** 2
gr = torch.autograd.grad(R, blk.block, allow_unused=True)
return {id(p): g for p, g in zip(blk.block, gr) if g is not None}, lam0
|