# Session 2026-06-22→24: below-2.1 wall — DIAGNOSIS FLIPPED to genuine Hopf, FIX = resreg/jacreg (LE control), 2.09 recipe recovered

THE definitive write-up of this multi-day session. Supersedes the "Euler-artifact" framing in EP_BELOW210 cont.7.

## 1. HEADLINE — the diagnosis flipped twice, landed on GENUINE Hopf instability
The C512 EP below-CE-2.1 divergence is a **genuine continuous-time Hopf instability** of the learned **non-conservative attention** operator: a complex eigenvalue pair crosses into **Re μ > 0** as EP trains the attention to be expressive. **NOT a discrete-Euler artifact** (that was an intermediate wrong turn). Analog/continuous hardware is NOT automatically immune.

### Evidence chain (6 lines of evidence; probe scripts in ep_run/)
1. **eval_relax_s3200.py** — relax the marginal ckpt redx s3200 (val 2.74) 6000 steps → res floors ~2.3e-2, OSCILLATES, no fixed point = limit cycle.
2. **knockout_s3200.py** — scale attention output WO×α: cycle scales with α, CONVERGES at α=0.2 → the **non-conservative attention drives the cycle**.
3. **eps_sweep_s3200.py** — cycle amplitude shrinks monotonically as ε↓ (0.1→2.3e-2, 0.01→8.9e-4). *Intermediate misread*: thought "Euler artifact, analog immune" (cont.7). **fugu caught the bug**: that 8.9e-4 at ε=0.01 is the STEP residual `r=ε·g`, i.e. the SAME force-floor g≈0.09 — so the sweep proves the oscillation is discrete-amplified but does NOT prove a clean continuous fixed point exists.
4. **ε-TRAINING-monotonicity (3 runs)**: ε=0.1→blew@**2.74**; ε=0.1,t2sel=160 (BETTER gradient)→blew@**3.02** EARLIER (⇒ gradient quality is NOT the lever); ε=0.05→blew@**2.41**. Smaller ε → strictly lower wall.
5. **eig_probe.py** (matrix-free FINITE-DIFFERENCE-JVP Arnoldi on M=I+εJ — DECISIVE): leading continuous μ=(λ−1)/ε at the operating point:
   - s2000 (3.13): all |λ|<1, **Re μ=−0.024 STABLE**
   - s3200 (2.74): top |λ|=1.044, **Re μ=+0.44, COMPLEX** (μ=+0.26±1.37j rotating)
   - ep_eps05 (2.41): top |λ|=1.14, **Re μ=+1.35**, complex (±2.08j)
   - **Re μ grows −0.02→+0.44→+1.35 as CE drops, |Im μ| grows ~0→1.37→2.08.** GENUINE instability, growing.
   - NOTE: autograd JVP gave 0 (blk.force detaches internally) → use **FD-JVP** `J·v ≈ ((F(z+h·u)−F(z))/h)·‖v‖` with u=v/‖v‖, h=1e-3·‖z‖.
6. **anderson_control.py** — s3200: plain relax floors (res 2.7e-2), Anderson CANNOT reach a root (best 1.4e-3), near-root has **Re μ=+0.24, |Im μ|≈2.0 UNSTABLE rotating**. Solver can't manufacture an absent/unstable equilibrium.
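
The FD-JVP probe and the λ→μ conversion from item 5 can be sketched on a toy problem. This is a minimal illustration, not the contents of eig_probe.py: `toy_force` is a hypothetical 2-D linear field standing in for the real F(z), and because the toy is only 2-D the Arnoldi step is replaced by probing the two basis vectors directly.

```python
import cmath
import math

EPS = 0.1  # Euler step size (the ε=0.1 runs)

# Hypothetical stand-in for the real force F(z): a linear rotating field whose
# Jacobian [[a, -b], [b, a]] has eigenvalues a ± b·j by construction.
A_TOY, B_TOY = 0.26, 1.37

def toy_force(z):
    x, y = z
    return [A_TOY * x - B_TOY * y, B_TOY * x + A_TOY * y]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def fd_jvp(force, z, v, h_rel=1e-3):
    """Forward-difference J·v ≈ ((F(z + h·u) − F(z)) / h)·‖v‖, u = v/‖v‖, h = h_rel·‖z‖."""
    nv = norm(v)
    u = [c / nv for c in v]
    h = h_rel * norm(z)
    f0 = force(z)
    f1 = force([zi + h * ui for zi, ui in zip(z, u)])
    return [(p - q) / h * nv for p, q in zip(f1, f0)]

def relaxation_map_columns(force, z, eps):
    """Columns of M = I + eps·J, probed one basis vector at a time (no autograd)."""
    n = len(z)
    cols = []
    for j in range(n):
        e = [1.0 if i == j else 0.0 for i in range(n)]
        jv = fd_jvp(force, z, e)
        cols.append([ei + eps * ji for ei, ji in zip(e, jv)])
    return cols

def eig_2x2(cols):
    """Eigenvalues of a 2x2 matrix given as a list of its columns."""
    m00, m10 = cols[0]
    m01, m11 = cols[1]
    half_tr = (m00 + m11) / 2
    det = m00 * m11 - m01 * m10
    root = cmath.sqrt(half_tr * half_tr - det)
    return half_tr + root, half_tr - root

z_op = [1.0, 0.5]  # arbitrary operating point for the toy
lams = eig_2x2(relaxation_map_columns(toy_force, z_op, EPS))
mus = [(lam - 1) / EPS for lam in lams]  # continuous-time μ = (λ − 1)/ε
for lam, mu in zip(lams, mus):
    print(f"|lambda|={abs(lam):.4f}  mu={mu.real:+.3f}{mu.imag:+.3f}j")
```

On the toy, the recovered μ matches the Jacobian eigenvalues a ± b·j, which is the sanity check to run before trusting the probe on a real checkpoint.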

### Reconciles with everything
- ε-monotonicity: with μ=a+bj and a=Re μ>0, `|1+εμ|²=1+2εa+ε²(a²+b²)>1` for ANY ε>0; smaller ε just makes the blow-up SLOWER (less discrete over-amplification, esp. the rotating (εb)² term) → delays → wall RELOCATES (2.74→2.41), never closes.
- ep_t2fix earlier-blow (the ε=0.1, t2sel=160 run in item 4): cleaner gradient drives into the stiff/unstable regime faster.
- It's a COMPOUND: continuous instability (root, Re μ>0) + explicit-Euler over-amplification (2nd layer). ε attacks layer 2 (delay); jacreg/resreg attack the root.
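
The ε-monotonicity argument can be checked numerically. The sketch below takes the rotating pair quoted for s3200 (μ=+0.26±1.37j) as an illustrative input and computes the explicit-Euler growth rate per unit relaxation time, ln|1+εμ|/ε, for the three step sizes discussed here; the function name is an assumption for illustration.

```python
import math

def euler_growth_rate(mu, eps):
    """Growth rate per unit relaxation time of the Euler mode: ln|1 + eps·mu| / eps."""
    return math.log(abs(1 + eps * mu)) / eps

mu = complex(0.26, 1.37)  # rotating pair quoted for s3200 in the eig_probe item
rates = {eps: euler_growth_rate(mu, eps) for eps in (0.1, 0.05, 0.01)}
for eps, rate in rates.items():
    factor = abs(1 + eps * mu)
    print(f"eps={eps:<5} |1+eps*mu|={factor:.5f}  rate={rate:.4f}  (continuous Re mu={mu.real})")
```

To first order the rate is a + ε(b²−a²)/2, so shrinking ε removes only the discrete excess ε(b²−a²)/2. The rate tends to a=Re μ>0 and never changes sign, consistent with the wall relocating rather than closing.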

## 2. THE FIX — resreg AND jacreg both work, by controlling the finite-T1 LE
Both add back the **finite-time contraction defense** that equilibrium-EP's L(z*) structurally lacks (and that BPTT has implicitly — BPTT differentiates the T1 unroll, so a non-converged z_T1 → bad output → penalized).

- **jacreg** = penalize ‖J_nc·v‖ (the non-conservative/rotating Jacobian = attention+FFN; lt_ep_train.py:211-219). **Cause-side**: shrinks |Im μ| → pushes the complex pair back to Re μ<0.
- **resreg** = penalize the T1-residual ‖εF(z_T1)‖ (lt_ep_train.py:220-231). **Symptom-side**: residual ~ρ^T1=exp(T1·LE), so resreg ≈ a DIRECT finite-T1 Lyapunov-exponent penalty (catches non-normal transients the eigenvalues miss). ~orthogonal to BPTT-grad (cos −0.047) — a constraint that keeps res low so the EP estimate doesn't collapse.
- **Geometry**: relaxation map M=I+εJ; stable ⟺ ρ(M)<1 ⟺ finite-T1 LE<0. Hopf = non-conservative part pushes complex μ past Re μ=0. resreg controls LE directly (output), jacreg controls the structural source (‖J_nc‖). **They stack** (orthogonal handles; cause+symptom).
- **eig_jacreg.py CONFIRMED jacreg at the mechanism level** (cont.9): at the SAME loss ~2.74, FROZEN jacreg (redx) = Re μ=+0.45 rotating UNSTABLE g_floor 0.26; ADAPTIVE jacreg (ep_jacreg @2.75) = **Re μ=−0.23 STABLE real, g_floor 0.0001 (true fixed point restored)**. jacreg killed the Hopf + restored AsymEP validity.
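
A toy version of the two penalties makes the cause/symptom split concrete. This is a sketch under stated assumptions, not the code at lt_ep_train.py:211-231: the force is a hypothetical 2-D linear field F(z)=A·z with A = a·I + b·R (R the 90° rotation), so the conservative part is a·I and the non-conservative part J_nc is b·R.

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def relax(A, z, eps, steps):
    """Explicit-Euler relaxation z <- z + eps·A·z for the toy linear force."""
    for _ in range(steps):
        f = matvec(A, z)
        z = [zi + eps * fi for zi, fi in zip(z, f)]
    return z

def resreg_penalty(A, z_T1, eps):
    """Symptom-side: the T1-residual ‖eps·F(z_T1)‖."""
    return eps * norm(matvec(A, z_T1))

def jacreg_penalty(J_nc, v):
    """Cause-side: ‖J_nc·v‖ for a probe direction v."""
    return norm(matvec(J_nc, v))

def finite_time_le(A, z0, eps, T1):
    """Per-step Lyapunov-exponent estimate from the residual decay over T1 steps."""
    r0 = resreg_penalty(A, z0, eps)
    rT = resreg_penalty(A, relax(A, z0, eps, T1), eps)
    return math.log(rT / r0) / T1

def toy_operator(a, b):
    return [[a, -b], [b, a]]

EPS, T1 = 0.1, 150          # T1 matches --T1 150 in the recipe
z0 = [1.0, 0.0]
stable = toy_operator(-0.3, 1.0)     # Re μ < 0
unstable = toy_operator(0.26, 1.37)  # Re μ > 0
J_nc = [[0.0, -1.37], [1.37, 0.0]]   # rotating part of the unstable toy

print("LE stable  :", finite_time_le(stable, z0, EPS, T1))
print("LE unstable:", finite_time_le(unstable, z0, EPS, T1))
print("jacreg     :", jacreg_penalty(J_nc, [0.6, 0.8]))
```

In this toy the residual ratio is exactly ρ^T1, so the estimate equals ln|1+εμ| and its sign is the stability verdict; the jacreg probe returns |Im μ|, the quantity the cause-side penalty shrinks. In a non-normal operator the two would no longer coincide, which is where resreg sees transients that the eigenvalues miss.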

## 3. ★★ THE 2.09 CONFIG (recovered from EP_BELOW210:97-101) — the key stabilizer is RESREG, NOT jacreg ★★
The session spent days on adaptive-jacreg; the USER pushed to find the actual 2.09 recipe. It is **FROM SCRATCH + resreg=0.2 + FROZEN jr=0.1** (the original ep_resreg2 reached **2.0573**, lowest EP ever; lost to /tmp wipe; rebuilt ep_resreg_scratch reached 2.22):
```
python3 lt_ep_train.py --mode ep --attn_mode thick --B 24 --C 512 --H 16 --T 256 --c 1.0 \
  --jacreg 0.1 --jr_floor 0.1 --jr_max 0.1 --holo 2 --hr 0.02 --t2sel 40 --track --pema 0.999 --t1max 300 \
  --res_est 1e-4 --res_gate 0 --resreg 0.2 --qknorm --resinit 0.1 --warmup 800 --T1 150 --T2 20 --lr 6e-4 \
  --wsd 0.25 --steps 32000 ...   # NO --init_ckpt = from scratch
```
- jr_max=jr_floor=0.1 = jacreg FROZEN (controller off). Adaptive jacreg = release jr_max (e.g. 16); the controller (lt_ep_train.py:520-529) ramps jr by `(res/res_target)^0.3`.
- res_gate MUST be 0 (the gate early-returns before the resreg penalty → bypasses it; res_gate≠0 blew @200 historically).
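
Why `jr_floor = jr_max` freezes the controller can be seen in a short sketch. The real controller lives at lt_ep_train.py:520-529 and is not reproduced here; the multiplicative form, the clamp order, and the `res_target` values below are assumptions for illustration.

```python
def update_jr(jr, res, res_target, jr_floor, jr_max, power=0.3):
    """Assumed update: scale jr by (res/res_target)**power, then clamp to [jr_floor, jr_max]."""
    proposed = jr * (res / res_target) ** power
    return min(jr_max, max(jr_floor, proposed))

# Frozen: floor == max, so the clamp returns 0.1 whatever the residual does.
frozen = update_jr(0.1, res=1e-2, res_target=1e-3, jr_floor=0.1, jr_max=0.1)

# Adaptive: jr_max released to 16, residual 10x over target raises jr by 10**0.3.
adaptive = update_jr(1.0, res=1e-2, res_target=1e-3, jr_floor=0.1, jr_max=16.0)

print("frozen jr  :", frozen)
print("adaptive jr:", adaptive)
```

The 0.3 exponent damps the response (a 10× residual overshoot roughly doubles jr), but with a high jr_max a sustained overshoot still compounds step after step, which fits the jr-spikes seen in the adaptive run.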

## 4. RUNS this session (all C512, warm from s2000 unless noted)
- **s2000** = `runs/redx_traj/s2000.pt` (redx step 2000, val 3.13) = the STABLE pre-bifurcation start (Re μ=−0.024). Use THIS to warm-start; ep_eps05.pt (2.41, Re μ=+1.35) is UNSTABLE — resreg/jacreg can't RESCUE an already-unstable operator (they PREVENT, they don't reverse).
- ep_jacreg (adaptive jacreg, warm s2000): twitchy (jr-spikes to 15 → CE spikes), crawled to ~2.32 then ~stuck/slow. Broke past 2.74/2.41 cleanly though.
- **ep_resreg_warm** (resreg=0.2 eager, t2sel=160, warm s2000): SMOOTH (peak res 1.6e-2, no spikes), LEADING at **2.2985** — the clean 2.09 test, still alive.
- ep_resreg_fast (resreg t2sel40, warm from UNSTABLE ep_eps05@2.41): BLEW — start-point was unstable. (Confirmed by direct eval: fp32 & TF32 both 2.55 → load fine, TF32 didn't change relaxation; the operator just destabilizes under training.)
- ep_resreg_c (resreg t2sel40 +compile, warm s2000): BLEW @2.31. Cause = **t2sel40 (lean gradient)**, NOT compile, NOT warm-start (resreg_warm same warm-start is fine).
- ep_rr_scratch (FROM SCRATCH original recipe +compile): launched 2026-06-24 05:32 on GPU3 — tests the user's "from-scratch is robust" hypothesis + the proven 2.09 path.
- ep_eps05 (ε=0.05): blew @2.41 (the ε-monotonicity run).

## 5. INFRA / #14 speedup findings
- **compile: EXONERATED + SAFE.** lt_ep_train has `--compile` (compiles the FREE-phase via `tforce`, the no_grad fast path; gradient stays eager). Verified numerically identical: tforce vs force rel-diff 9e-7; compile-z150 vs eager-z150 **1.6e-7** (just fp32 op-reorder rounding). Speedup ~1.43x (free phase) / ~3.3x with t2sel40. reduce-overhead/CUDA-graphs BROKEN (0.07x, graph breaks — needs fullgraph/static-shape fix).
- **TF32: DROPPED (user decision).** `--tf32` exists (lt_ep_train:368, sets allow_tf32). 10-bit mantissa ≈ 1e-3 precision loss. The relaxation is HYPER-precision-sensitive (ε 0.1→0.05 moved the wall 0.33!), so TF32's 1e-3 perturbation is too coarse → risky. compile(fp32, 1.6e-7) is ~10⁶× below the sensitivity scale → safe; TF32 is not. **DO NOT use --tf32.**
- **EP parallelism advantages for #14 (esp. for the scaling/deep phase):**
  1. NO sequential backward (vs BPTT's N reverse layers) + NO activation graph (memory-light).
  2. COUPLED equilibrium stack (#13, like the Hopfield-ResNet) → all layers relax CONCURRENTLY each step → depth parallelizes (vs BPTT's 2N sequential). (DEQ-style z*=f(z*) with deep f does NOT parallelize.)
  3. **adaptive-T1** (relax until residual<tol, the t1max machinery already does this for z*) — easier/cleaner than adaptive-ε (convergence signal is cleaner than overshoot). Speed + auto-converged readout + cap-hit=instability-flag.
  4. adaptive-ε (#30) as 2nd-order.
- GPU: 4× RTX A6000 (49GB). GPU0/1/3 = ours; **GPU2 = others' NV-Embed-v2 server (port 8555/8556) — DON'T TOUCH.** For RENTING (user found cheap provider, RTX models + H800, no A100/H100): **EP is memory-light + FLOP-bound → consumer 4090/5090 = best $/FLOP** for the bulk; 48-80GB (A6000/H800) only for BPTT-twins (memory-heavy) + 0.6B. EP per-step FLOPs ~1.5-3x BPTT (2 relaxations vs fwd+bwd) but that cost is SIM-only (analog relaxation is free physics).
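
The adaptive-T1 idea from the parallelism list (relax until residual<tol, with a cap hit as the instability flag) can be sketched as follows. This is an illustrative loop, not the existing t1max machinery; the function name, the toy forces, and the tolerance value are assumptions.

```python
import math

def adaptive_relax(force, z, eps, tol, t1max):
    """Relax z <- z + eps·F(z) until ‖eps·F(z)‖ < tol or t1max steps are used.

    Returns (z, steps_used, hit_cap); hit_cap=True is the instability flag.
    """
    for step in range(t1max):
        f = force(z)
        res = eps * math.sqrt(sum(c * c for c in f))
        if res < tol:
            return z, step, False
        z = [zi + eps * fi for zi, fi in zip(z, f)]
    return z, t1max, True

def make_linear_force(a, b):
    """Hypothetical toy force with Jacobian eigenvalues a ± b·j."""
    return lambda z: [a * z[0] - b * z[1], b * z[0] + a * z[1]]

stable_force = make_linear_force(-1.0, 1.0)     # Re μ < 0: converges
unstable_force = make_linear_force(0.26, 1.37)  # Re μ > 0: never converges

_, steps_ok, cap_ok = adaptive_relax(stable_force, [1.0, 0.0], eps=0.1, tol=1e-4, t1max=300)
_, steps_bad, cap_bad = adaptive_relax(unstable_force, [1.0, 0.0], eps=0.1, tol=1e-4, t1max=300)

print("stable  :", steps_ok, "steps, hit_cap =", cap_ok)
print("unstable:", steps_bad, "steps, hit_cap =", cap_bad)
```

The stable case exits early, which is the speed win; the unstable case burns the full budget and raises the flag, which is the diagnostic win.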

## 6. Hopfield-ResNet paper (arxiv 2509.26003) — confirms our diagnosis
"Scaling EP to Deeper Architectures" trained 12-conv Hopfield-ResNet with EP. It is **CONSERVATIVE** (energy function Φ, SYMMETRIC weights, monotone energy descent — "no oscillation or limit cycles"). No attention, no non-symmetric ops. **Confirms: non-conservativity is OUR culprit; conservative systems get depth free (no Hopf). We are the first to EP-train NON-conservative attention (which has the Hopf), solved via jacreg/resreg.** Good for the dossier: prior EP-deep = conservative/no-attention; ours = the harder non-conservative case.

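The link between symmetric weights and the absence of oscillation can be illustrated with 2×2 Jacobians. This is a generic linear-algebra sketch with made-up matrices, not anything taken from the paper: a symmetric Jacobian always has real eigenvalues (no rotation, hence no Hopf pair), and adding an antisymmetric part is what produces a complex pair.

```python
import cmath

def eig2(m):
    """Eigenvalues of a 2x2 matrix [[a, b], [c, d]] via the quadratic formula."""
    (a, b), (c, d) = m
    half_tr = (a + d) / 2
    det = a * d - b * c
    root = cmath.sqrt(half_tr * half_tr - det)
    return half_tr + root, half_tr - root

# Conservative toy: symmetric Jacobian (e.g. minus the Hessian of an energy).
sym = [[-1.0, 0.8], [0.8, -1.0]]

# Non-conservative toy: same symmetric part plus an antisymmetric term w.
w = 1.5
nonsym = [[-1.0, 0.8 - w], [0.8 + w, -1.0]]

sym_eigs = eig2(sym)
nonsym_eigs = eig2(nonsym)
print("symmetric     :", sym_eigs)
print("non-symmetric :", nonsym_eigs)
```

For the symmetric case the discriminant is ((a−d)/2)² + b² ≥ 0, so the eigenvalues stay real for any weights; only the antisymmetric term can make it negative.
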
## 7. fugu-ultra consultations (all in ep_run/)
- FUGU_VERDICT_FULL.md (Q1-Q4): confirmed attention-driven oscillation, FLAGGED the Euler-artifact (we then measured it's a true Hopf), said the eigenpair is the decisive measurement. Fix=adaptive jacreg homeostat+res_gate; sub-threshold attention IS expressive (BPTT 1.83 proves it); keep below the instability.
- FUGU_OPTIONS_VERDICT.md (Q1-Q3): the step-vs-force-residual CORRECTION; adaptive ε eliminates only if ε_min<ε_crit; jacreg raises ε_crit (model-side, same wall) + a real analog settling benefit + true-Hopf insurance; Anderson/implicit are emulator-fidelity (analog-faithful), jacreg edits the model. Recommended: adaptive-ε+Anderson(both phases)+bounded jacreg, report FORCE-residual + Re μ.

## 8. KEY FILES
- Probes: ep_run/{eig_probe.py, anderson_control.py, eig_jacreg.py, adaptive_eps_calib.py, adaptive_eps_calib2.py, eps_sweep_s3200.py, knockout_s3200.py, eval_relax_s3200.py, compile_bench.py}
- Dossiers: ep_run/{EP_DIAGNOSIS_DOSSIER.md, FUGU_VERDICT_FULL.md, FUGU_Q_OPTIONS.md, FUGU_OPTIONS_VERDICT.md, FUGU_Q1_VERDICT.md}
- EP_BELOW210_DIAGNOSIS_FIX.md: cont.6 (structural/forward-mode — SUPERSEDED), cont.7 (ε-artifact — SUPERSEDED), **cont.8 (Hopf correction — CURRENT)**, **cont.9 (jacreg confirmed at mechanism level)** + OBS (oscillation = benign weight transient, not Hopf), **2026-06-23 ε-monotonicity RESULT**, the 2.09 recipe at :97-101.
- Code: lt_ep_train.py — force/tforce:81-106, relax:123, ep_step:140, jacreg:211-219, resreg:220-231, jr controller:520-529, --compile (works), --tf32 (DON'T use).

## 9. CURRENT STATE + NEXT
- Running: ep_jacreg (~2.32), ep_resreg_warm (LEADING 2.2985, clean 2.09 test), ep_rr_scratch (from-scratch, just launched). Watchers: ep_resreg_check.py→2.20, ep_jacreg_binary.py→2.30 (detached→/tmp), ep_rr_scratch needs one.
- **THE open question: does resreg break 2.09?** ep_resreg_warm (warm+t2sel160) at 2.2985 smooth → likely; ep_rr_scratch (from-scratch, proven recipe) = the robust confirmation.
- Lessons: (a) the 2.09 stabilizer is RESREG (from scratch), not adaptive jacreg; (b) warm-start ONLY from a STABLE operator (s2000), never an already-blown one (ep_eps05); (c) t2sel40 (lean grad) is fragile deep, t2sel160 safer; (d) compile safe (fp32), TF32 unsafe (precision); (e) the run is precision-hyper-sensitive.
- Recurring bug to avoid: `pkill -f "ckpt runs/X.pt"` SELF-MATCHES the invoking bash (`-f` matches against full command lines, and the shell's own command line contains the pattern) → exit 144; kill by explicit PID instead. And `nohup python … &` inside a run_in_background bash DETACHES it (no notify); run `python3 watcher.py` directly as the tracked task.