omega/norm-family refuted as stability signal; fingerprint story retracted; eigreg v2 = true map-eigenvalue (spec_penalty)

- eig_control: fix plain-PI bug (shifted PI for lambda_max of indefinite Sym); add lead_rho + spec_penalty (soft one-sided cap on |lam|(I+eps*J_F), 2-D Rayleigh-Ritz, matvec-only) — aep 'spectral' ported. eig_penalty demoted to diagnostic.
- eig_recheck.py (Lanczos audit): omega=+5..+13 on ALL operators incl the stablest (s2000 +12.8 while true alpha=-0.02); gap omega-alpha~10; old 'warm -10.14 vs scratch +1.11' numbers were PI-mixture artifacts. RETRACTED.
- eig_v2_smoke/depth: v2 mechanics validated vs ARPACK; z_T1 readings >1 are unconverged-state contamination (150: 1.009 -> 400/800: 0.997-0.999, mu=-0.02..-0.006 matching eig_probe); fixed-point top = BAND of slow modes.
- lt_ep_train: --eigreg now spec_penalty (--eig_margin 0.995 = rho target); --fingerprint reports rho/Re_mu instead of num_abscissa.
- ONBOARDING §4-7 + FINDINGS 2026-07-03: retraction + verdict (fundamental quantity = finite-horizon path LE / resreg axis; de-cliff via floss-ept; spec_penalty = measure-mode scalpel for a detaching Hopf pair).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_014FAPDWQ49M5Ye3NpTndTpn
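The first bullet's fix can be seen on a toy matrix. Plain power iteration (PI) converges to the eigenvalue of largest *magnitude*. On an indefinite symmetric operator, such as the symmetric part (J + Jᵀ)/2 whose λ_max is the numerical abscissa ω, it can therefore return the most negative eigenvalue instead of λ_max. Shifting by an estimate r of max|λ| makes the spectrum nonnegative, so PI then converges to λ_max + r. The NumPy sketch below works under those assumptions. The names `power_iteration` and `lambda_max_shifted` are hypothetical, and this is not the code in `eig_control.py`:

```python
import numpy as np

def power_iteration(matvec, n, iters=500, seed=0):
    """Plain power iteration: converges to the eigenvalue of largest MAGNITUDE,
    which for an indefinite symmetric operator may be the most negative one."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = matvec(v)
        v = w / np.linalg.norm(w)
    return float(v @ matvec(v))  # Rayleigh quotient of the unit vector v

def lambda_max_shifted(matvec, n, iters=500, seed=0):
    """lambda_max of a symmetric, possibly indefinite operator via shifted PI.
    Step 1: plain PI estimates the spectral radius r = max|lambda|.
    Step 2: PI on S + r*I, whose eigenvalues are all >= 0 (up to the accuracy of r),
    so its dominant eigenvalue is lambda_max + r."""
    r = abs(power_iteration(matvec, n, iters, seed))
    return power_iteration(lambda v: matvec(v) + r * v, n, iters, seed) - r

# Toy indefinite symmetric matrix with eigenvalues {-10, 3, 1} in a random basis.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
S = Q @ np.diag([-10.0, 3.0, 1.0]) @ Q.T

plain = power_iteration(lambda v: S @ v, 3)     # about -10: largest |lambda|, not lambda_max
lmax = lambda_max_shifted(lambda v: S @ v, 3)   # about 3: the true lambda_max
```

Shifting by the full radius estimate is the simplest safe choice. A smaller shift converges faster, but it only works if λ_max + shift remains the dominant magnitude.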
author: Yuren Hao <yurenh2@illinois.edu> 2026-07-03 07:57:22 -0500
committer: Yuren Hao <yurenh2@illinois.edu> 2026-07-03 07:57:22 -0500
commit: bcec9560cf5c9b113e9381a52d1a941daa8865f2
tree: bae3baf6d742b816d90e642d70b9744a86a4d189 /ONBOARDING.md
parent: c0b507fb1760be291e1e1ed33f33fb18f16d8c2d
1 files changed, 16 insertions, 10 deletions
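The retraction depends on the gap between two quantities. The numerical abscissa ω(J) = λ_max((J + Jᵀ)/2) bounds short-time transient growth (‖e^{tJ}‖ ≤ e^{ωt}). The spectral abscissa α(J) = max Re λ(J) decides asymptotic stability of dx/dt = Jx. For a non-normal J, ω can sit far above α. The toy 2×2 matrix below is made up; it was chosen only so that ω − α = 10, echoing the reported gap, and is not one of the C512 operators:

```python
import numpy as np

def numerical_abscissa(A):
    """omega(A) = lambda_max of the symmetric part (the 2-norm log-norm)."""
    return float(np.linalg.eigvalsh((A + A.T) / 2).max())

def spectral_abscissa(A):
    """alpha(A) = max real part over the eigenvalues of A."""
    return float(np.linalg.eigvals(A).real.max())

# Toy non-normal Jacobian: both eigenvalues are -0.02 (asymptotically stable),
# but the large off-diagonal coupling makes the symmetric part strongly indefinite.
A = np.array([[-0.02, 20.0],
              [ 0.0, -0.02]])

omega = numerical_abscissa(A)      # -0.02 + 10 = 9.98
alpha = spectral_abscissa(A)       # -0.02
jac_norm = np.linalg.norm(A, 2)    # largest singular value, at least 20
```

Both eigenvalues are −0.02, so the linear flow decays. Even so, ω ≈ +9.98 and ‖A‖₂ ≥ 20, so a sign test on ω or a threshold on the Jacobian norm would flag this stable operator as unstable.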
diff --git a/ONBOARDING.md b/ONBOARDING.md
index 1b36a72..25f1286 100644
--- a/ONBOARDING.md
+++ b/ONBOARDING.md
@@ -39,7 +39,7 @@ The binding constraint is **NOT the gradient** — it's **forward fixed-point ST
 optimization makes attention more expressive/non-conservative, the operator loses contraction, a complex-eigenvalue
 pair of its Jacobian crosses the imaginary axis (**a supercritical Hopf bifurcation**), the relaxation stops
 converging (residual → 0.1+), and training breaks. Controls that hold it: **`resreg`** (penalize the T1 residual),
-**`jacreg`** (penalize the Jacobian norm), and the new **`eigreg`** (leading-abscissa / log-norm control, §5).
+**`jacreg`** (penalize the Jacobian norm), and the new **`eigreg`** (v2: TRUE leading map-eigenvalue / spectral control, §5).
 > This stability question generalized into a **standalone paper** — *"Dynamics and Convergence of Equilibrium
 > Learning"* (the report we shared with Ben Scellier is that spin-off, in `/home/yurenh2/aep-dynamics/`): the Hopf +
 > a leading-spectral-signal cure, shown across MLP/CNN/RNN and across learning rules (EP and DEQ/RBP). ept is the
@@ -47,12 +47,16 @@ converging (residual → 0.1+), and training breaks. Controls that hold it: **`r
 
 ## 5. Open problems — where you can plug in (ranked)
 1. **★ Crack from-scratch below 2.0 (the crux).** We *ultimately need* from-scratch (no magic warm checkpoint) for a
-   real / hardware result. Diagnosis (via the new `--fingerprint`): the warm source `s2000` is a **deeply contractive**
-   operator (numerical abscissa −10) with a well-aligned EP gradient; a from-scratch plateau operator sits **near the
-   Hopf boundary** (abscissa +1.11) with a modestly worse gradient — and *training drifts the operator toward the
-   boundary as it learns* (val 3.16→2.24 tracks abscissa −10→+1.11). **Hypothesis to test:** hold the operator
-   deeply-contractive from scratch with `--eigreg` (leading-abscissa control) → crack the plateau without a warm start.
-   Tools are built and default-off: `diag_cos.py` (`--diag_cos N`, `--fingerprint`), `eig_control.py` (`--eigreg`).
+   real / hardware result. The stability signal that matters is the **TRUE leading eigenvalue of the forward map**
+   (`eig_probe.py` FD-JVP — it resolves the complex Hopf pair): the healthy warm source `s2000` sits at Re μ ≈ −0.02,
+   and un-stabilized runs cross to +0.44 / +1.35 as CE drops. ⚠️ *Retraction (2026-07-03):* an earlier fingerprint
+   story ("warm = deeply contractive ω=−10 vs scratch = near-boundary ω=+1.11") was a power-iteration artifact — the
+   gold-standard Lanczos audit (`ep_run/eig_recheck.py`) shows the **numerical abscissa ω is +5..+13 on ALL operators**
+   (stable and unstable alike; non-normality gap ω−α ≈ 10), so ω / log-norm / Jacobian-norm quantities are **not usable
+   stability signals** for this operator family. **Hypothesis to test:** control the true map eigenvalue from scratch
+   with `--eigreg` (v2 = soft one-sided penalty on |λ|_lead(I+εJ_F), the aep-dynamics 'spectral' steering ported to
+   C512; `eig_control.py::spec_penalty`) → crack the plateau without a warm start. Tools built and default-off:
+   `diag_cos.py` (`--diag_cos N`, `--fingerprint` — reports ρ / Re μ), `eig_control.py` (`--eigreg`, `--eig_margin 0.995`).
 2. **Scaling** to hundreds-of-M / small-LLM (gated on cloud compute — a Scellier/AWS path is in progress).
 3. **Speed** (`ep_run/profile_ep.py`, `cos_sweep.py`): the holo a-select is ~56% of the step; `t2sel` is a
    cosine-preserving speed lever (160→80 ≈ 1.8× free); multi-GPU data-parallel EP is untried.
@@ -63,8 +67,10 @@ converging (residual → 0.1+), and training breaks. Controls that hold it: **`r
 - **`lt_ep_train.py`** — everything: the block, `ep_step` (EP training), `bptt_step` (exact-gradient control),
   `relax`, `evaluate`, the residual/jacreg controllers, the training loop. The one file to read first.
 - **`holo_ep.py`** — the adaptive-T2 nudged-phase estimator (`holo_a_select`, `holo_a_track`).
-- **`diag_cos.py`** (new) — `cos(EP, BPTT)` trajectory + operator `fingerprint` (res / cos / numerical-abscissa / val).
-- **`eig_control.py`** (new) — the `--eigreg` leading-abscissa control (power-iteration, scalable, analog-compatible).
+- **`diag_cos.py`** (new) — `cos(EP, BPTT)` trajectory + operator `fingerprint` (res / cos / ρ / Re μ / val).
+- **`eig_control.py`** (new) — `--eigreg` v2: TRUE leading map-eigenvalue control (`spec_penalty`, 2-D subspace
+  iteration, matvec-only, analog-compatible; the ω/numerical-abscissa version is kept as diagnostic only — refuted by
+  `eig_recheck.py`, the Lanczos audit).
 - `eig_probe.py`, `cos_sweep.py`, `profile_ep.py`, `bp_transformer.py` (BP baseline) — probes / baselines.
 - `data/` (TinyStories-BPE, ~712M) and `runs/` (~8G checkpoints) — **git-ignored; get these separately.**
 
@@ -77,7 +83,7 @@ python3 lt_ep_train.py --mode ep --attn_mode thick --B 24 --C 512 --H 16 --T 256
   --steps 32000 --data data/tinystories_bpe --ckpt runs/myrun.pt --state runs/myrun.state
 ```
 Diagnostics: add `--diag_cos 500` (log cos-to-BPTT over training) · `--init_ckpt <ckpt> --fingerprint` (print an
-operator's 4-D fingerprint) · `--eigreg 0.1 --eig_margin 1.0` (leading-abscissa control, alt to `--jacreg`).
+operator's fingerprint: res/cos/ρ/Reμ/val) · `--eigreg 0.1 --eig_margin 0.995` (true map-eigenvalue control, alt to `--jacreg`).
 BP baseline (fair control): `--mode bptt`. **All experiment processes must use `nohup`.**
 
 **Getting the data & checkpoints (git-ignored — not in this repo):** one command.
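The v2 control can be sketched using matvecs alone. Two real basis vectors suffice to span the invariant subspace of a complex-conjugate (Hopf) pair. A 2-D subspace iteration followed by a 2×2 Rayleigh-Ritz projection therefore recovers the pair, which 1-D power iteration cannot. The sketch assumes the penalty is a squared hinge on ρ = |λ|_lead(I + εJ) above `margin`, set to 0.995 to mirror `--eig_margin`. Several pieces are hypothetical: `eps = 0.1`, the squared hinge, the toy Jacobians, and every function name. The sketch computes only the penalty value (no gradient) and is not the `spec_penalty` in `eig_control.py`:

```python
import numpy as np

def leading_ritz_values(matvec, n, iters=300, seed=0):
    """Leading eigenvalue(s) of a linear map from matvecs only: 2-D real subspace
    iteration, then a 2x2 Rayleigh-Ritz projection. Two real columns can span the
    invariant subspace of a complex-conjugate pair, which 1-D power iteration cannot."""
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((n, 2)))   # orthonormal n x 2 basis
    for _ in range(iters):
        W = np.column_stack([matvec(V[:, 0]), matvec(V[:, 1])])
        V, _ = np.linalg.qr(W)                          # re-orthonormalize each step
    MV = np.column_stack([matvec(V[:, 0]), matvec(V[:, 1])])
    H = V.T @ MV                                        # 2x2 projected operator
    return np.linalg.eigvals(H)                         # Ritz values (real or a complex pair)

def spec_penalty_sketch(jvp, n, eps=0.1, margin=0.995, iters=300):
    """Soft one-sided cap on rho = |lambda|_lead(I + eps*J): zero while rho <= margin,
    squared hinge above it (the squared hinge is an assumption)."""
    ritz = leading_ritz_values(lambda v: v + eps * jvp(v), n, iters)
    rho = float(np.abs(ritz).max())
    return rho, max(0.0, rho - margin) ** 2, ritz

# Toy Jacobians: a rotation block carrying the pair a +/- 2i plus damped modes -1, -2, -3,
# hidden behind a random orthogonal change of basis.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))

def make_J(a, b=2.0):
    D = np.zeros((5, 5))
    D[:2, :2] = [[a, -b], [b, a]]
    D[2:, 2:] = np.diag([-1.0, -2.0, -3.0])
    return Q @ D @ Q.T

J_unstable, J_stable = make_J(0.5), make_J(-0.5)
rho_u, pen_u, ritz_u = spec_penalty_sketch(lambda v: J_unstable @ v, 5)
rho_s, pen_s, _ = spec_penalty_sketch(lambda v: J_stable @ v, 5)
# rho_u = |1.05 +/- 0.2i| ~ 1.069 > 0.995, so pen_u > 0
# rho_s = |0.95 +/- 0.2i| ~ 0.971 < 0.995, so pen_s == 0
```

The hinge is one-sided by design. It is exactly zero for operators already inside the target radius, so it only acts once the leading pair starts to detach past `margin`.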
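`eig_probe.py` is described as an FD-JVP probe. The Jacobian of the forward map is never formed; only its action on a vector is estimated, from map evaluations. The sketch below shows such a matvec under two assumptions: a central difference (error O(h²)) and a toy `tanh` map. `fd_jvp`, `F`, `W` and `b` are hypothetical names, and this is not the probe's code:

```python
import numpy as np

def fd_jvp(F, z, v, h=1e-5):
    """Central-difference Jacobian-vector product:
    J_F(z) v ~ (F(z + h v) - F(z - h v)) / (2h).
    Uses only two evaluations of the forward map F; no Jacobian is materialized."""
    return (F(z + h * v) - F(z - h * v)) / (2.0 * h)

# Toy forward map F(z) = tanh(W z + b), a hypothetical stand-in for the relaxation map.
rng = np.random.default_rng(0)
n = 6
W = rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)
F = lambda z: np.tanh(W @ z + b)

z = rng.standard_normal(n)
v = rng.standard_normal(n)
approx = fd_jvp(F, z, v)

# Analytic check: J_F(z) = diag(1 - tanh^2(W z + b)) @ W
exact = (1.0 - np.tanh(W @ z + b) ** 2) * (W @ v)
```

Any matvec-only eigen-solver can consume `fd_jvp` as its operator. A forward difference would halve the map evaluations at the cost of O(h) error; which variant `eig_probe.py` uses is not stated here.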