# Session 2026-06-22→24: below-2.1 wall; DIAGNOSIS FLIPPED to genuine Hopf, FIX = resreg/jacreg (LE control), 2.09 recipe recovered

THE definitive write-up of this multi-day session. Supersedes the "Euler-artifact" framing in EP_BELOW210 cont.7.

## 1. HEADLINE: the diagnosis flipped twice, landed on GENUINE Hopf instability

The C512 EP below-CE-2.1 divergence is a **genuine continuous-time Hopf instability** of the learned **non-conservative attention** operator: a complex eigenvalue pair crosses into **Re μ > 0** as EP training makes the attention more expressive. **NOT a discrete-Euler artifact** (that was an intermediate wrong turn). Analog/continuous hardware is NOT automatically immune.

### Evidence chain (several independent methods, all in ep_run/)

1. **eval_relax_s3200.py**: relax the marginal ckpt redx s3200 (val 2.74) for 6000 steps → res floors at ~2.3e-2, OSCILLATES, no fixed point = limit cycle.
2. **knockout_s3200.py**: scale attention output WO×α: cycle scales with α, CONVERGES at α=0.2 → the **non-conservative attention drives the cycle**.
3. **eps_sweep_s3200.py**: cycle amplitude shrinks monotonically as ε↓ (0.1→2.3e-2, 0.01→8.9e-4). *Intermediate misread*: thought "Euler artifact, analog immune" (cont.7). **fugu caught the bug**: that 8.9e-4 at ε=0.01 is the STEP residual `r=ε·g`, i.e. the SAME force-floor g≈0.09, so the sweep proves the oscillation is discrete-amplified but does NOT prove a clean continuous fixed point exists.
4. **ε-TRAINING-monotonicity (3 runs)**: ε=0.1→blew@**2.74**; ε=0.1,t2sel=160 (BETTER gradient)→blew@**3.02** EARLIER (⇒ gradient quality is NOT the lever); ε=0.05→blew@**2.41**. Smaller ε → strictly lower wall.
5. **eig_probe.py** (matrix-free FINITE-DIFFERENCE-JVP Arnoldi on M=I+εJ; DECISIVE): leading continuous μ=(λ−1)/ε at the operating point:
   - s2000 (3.13): all |λ|<1, **Re μ=−0.024 STABLE**
   - s3200 (2.74): top |λ|=1.044, **Re μ=+0.44, COMPLEX** (μ=+0.26±1.37j rotating)
   - ep_eps05 (2.41): top |λ|=1.14, **Re μ=+1.35**, complex (±2.08j)
   - **Re μ grows −0.02→+0.44→+1.35 as CE drops, |Im μ| grows ~0→1.37→2.08.** GENUINE instability, growing.
   - NOTE: autograd JVP gave 0 (blk.force detaches internally) → use **FD-JVP** `(F(z+h·u)−F(z))/h·‖v‖`, h=1e-3·‖z‖.
6. **anderson_control.py**: s3200: plain relax floors (res 2.7e-2), Anderson CANNOT reach a root (best 1.4e-3), near-root has **Re μ=+0.24, |Im μ|≈2.0 UNSTABLE rotating**. Solver can't manufacture an absent/unstable equilibrium.

### Reconciles with everything

- ε-monotonicity: with Re μ>0, `|1+εμ|>1` for ANY ε; smaller ε just makes the blow-up SLOWER (less discrete over-amplification, esp. the rotating (εb)² term) → delays → wall RELOCATES (2.74→2.41), never closes.
- ep_t2fix earlier-blow: cleaner gradient drives into the stiff/unstable regime faster.
- It's a COMPOUND: continuous instability (root, Re μ>0) + explicit-Euler over-amplification (2nd layer). ε attacks layer 2 (delay); jacreg/resreg attack the root.

## 2. THE FIX: resreg AND jacreg both work, by controlling the finite-T1 LE

Both add back the **finite-time contraction defense** that equilibrium-EP's L(z*) structurally lacks (and that BPTT has implicitly: BPTT differentiates the T1 unroll, so a non-converged z_T1 → bad output → penalized).

- **jacreg** = penalize ‖J_nc·v‖ (the non-conservative/rotating Jacobian = attention+FFN; lt_ep_train.py:211-219). **Cause-side**: shrinks |Im μ| → pushes the complex pair back to Re μ<0.
- **resreg** = penalize the T1-residual ‖εF(z_T1)‖ (lt_ep_train.py:220-231). **Symptom-side**: residual ~ρ^T1=exp(T1·LE), so resreg ≈ a DIRECT finite-T1 Lyapunov-exponent penalty (catches non-normal transients the eigenvalues miss). ~orthogonal to BPTT-grad (cos −0.047): a constraint that keeps res low so the EP estimate doesn't collapse.
- **Geometry**: relaxation map M=I+εJ; stable ⟺ ρ(M)<1 ⟺ finite-T1 LE<0. Hopf = non-conservative part pushes complex μ past Re μ=0. resreg controls LE directly (output), jacreg controls the structural source (‖J_nc‖). **They stack** (orthogonal handles; cause+symptom).
- **eig_jacreg.py CONFIRMED jacreg at the mechanism level** (cont.9): at the SAME loss ~2.74, FROZEN jacreg (redx) = Re μ=+0.45 rotating UNSTABLE g_floor 0.26; ADAPTIVE jacreg (ep_jacreg @2.75) = **Re μ=−0.23 STABLE real, g_floor 0.0001 (true fixed point restored)**. jacreg killed the Hopf + restored AsymEP validity.

## 3. ★★ THE 2.09 CONFIG (recovered from EP_BELOW210:97-101): the key stabilizer is RESREG, NOT jacreg ★★

The session spent days on adaptive-jacreg; the USER pushed to find the actual 2.09 recipe. It is **FROM SCRATCH + resreg=0.2 + FROZEN jr=0.1** (the original ep_resreg2 reached **2.0573**, lowest EP ever; lost to /tmp wipe; rebuilt ep_resreg_scratch reached 2.22):

```
python3 lt_ep_train.py --mode ep --attn_mode thick --B 24 --C 512 --H 16 --T 256 --c 1.0 \
  --jacreg 0.1 --jr_floor 0.1 --jr_max 0.1 --holo 2 --hr 0.02 --t2sel 40 --track --pema 0.999 --t1max 300 \
  --res_est 1e-4 --res_gate 0 --resreg 0.2 --qknorm --resinit 0.1 --warmup 800 --T1 150 --T2 20 --lr 6e-4 \
  --wsd 0.25 --steps 32000 ...   # NO --init_ckpt = from scratch
```

- jr_max=jr_floor=0.1 = jacreg FROZEN (controller off). Adaptive jacreg = release jr_max (e.g. 16); controller :520-529 ramps jr by `(res/res_target)^0.3`.
- res_gate MUST be 0 (the gate early-returns before the resreg penalty → bypasses it; res_gate≠0 blew @200 historically).

## 4. RUNS this session (all C512, warm from s2000 unless noted)

- **s2000** = `runs/redx_traj/s2000.pt` (redx step 2000, val 3.13) = the STABLE pre-bifurcation start (Re μ=−0.024). Use THIS to warm-start; ep_eps05.pt (2.41, Re μ=+1.35) is UNSTABLE, and resreg/jacreg can't RESCUE an already-unstable operator (they PREVENT, they don't reverse).
- ep_jacreg (adaptive jacreg, warm s2000): twitchy (jr-spikes to 15 → CE spikes), crawled to ~2.32 then ~stuck/slow. Broke past 2.74/2.41 cleanly though.
- **ep_resreg_warm** (resreg=0.2 eager, t2sel=160, warm s2000): SMOOTH (peak res 1.6e-2, no spikes), LEADING at **2.2985**; the clean 2.09 test, still alive.
- ep_resreg_fast (resreg t2sel40, warm from UNSTABLE ep_eps05@2.41): BLEW; start-point was unstable. (Confirmed by direct eval: fp32 & TF32 both 2.55 → load fine, TF32 didn't change relaxation; the operator just destabilizes under training.)
- ep_resreg_c (resreg t2sel40 +compile, warm s2000): BLEW @2.31. Cause = **t2sel40 (lean gradient)**, NOT compile, NOT warm-start (resreg_warm same warm-start is fine).
- ep_rr_scratch (FROM SCRATCH original recipe +compile): launched 2026-06-24 05:32 on GPU3; tests the user's "from-scratch is robust" hypothesis + the proven 2.09 path.
- ep_eps05 (ε=0.05): blew @2.41 (the ε-monotonicity run).

## 5. INFRA / #14 speedup findings

- **compile: EXONERATED + SAFE.** lt_ep_train has `--compile` (compiles the FREE-phase via `tforce`, the no_grad fast path; gradient stays eager). Verified numerically identical: tforce vs force rel-diff 9e-7; compile-z150 vs eager-z150 **1.6e-7** (just fp32 op-reorder rounding). Speedup ~1.43x (free phase) / ~3.3x with t2sel40. reduce-overhead/CUDA-graphs BROKEN (0.07x, graph breaks; needs fullgraph/static-shape fix).
- **TF32: DROPPED (user decision).** `--tf32` exists (lt_ep_train:368, sets allow_tf32). 10-bit mantissa ≈ 1e-3 precision loss. The relaxation is HYPER-precision-sensitive (ε 0.1→0.05 moved the wall 0.33!), so TF32's 1e-3 perturbation is too coarse → risky. compile(fp32, 1.6e-7) is ~10⁶× below the sensitivity scale → safe; TF32 is not. **DO NOT use --tf32.**
- **EP parallelism advantages for #14 (esp. for the scaling/deep phase):**
  1. NO sequential backward (vs BPTT's N reverse layers) + NO activation graph (memory-light).
  2. COUPLED equilibrium stack (#13, like the Hopfield-ResNet) → all layers relax CONCURRENTLY each step → depth parallelizes (vs BPTT's 2N sequential). (DEQ-style z*=f(z*) with deep f does NOT parallelize.)
  3. **adaptive-T1** (relax until residual