Initial commit: ept — backprop-free equilibrium transformer (EP)

Code (ep_run/), organized docs (docs/{method,campaign,hardware,outreach,paper}), analysis scripts (scripts/), ONBOARDING.md entry point. Large data/checkpoints git-ignored (share separately). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_014FAPDWQ49M5Ye3NpTndTpn
author: Yuren Hao <yurenh2@illinois.edu> 2026-07-03 05:56:50 -0500
committer: Yuren Hao <yurenh2@illinois.edu> 2026-07-03 05:56:50 -0500
commit: b83947778e2c776f757a07d4719b7ce961d7ed55 (patch)
tree: b9cc01d7adda691d9156d9d04f4fb2f644674e96 /ep_run/FUGU_Q1_VERDICT.md
1 files changed, 123 insertions, 0 deletions
diff --git a/ep_run/FUGU_Q1_VERDICT.md b/ep_run/FUGU_Q1_VERDICT.md
new file mode 100644
index 0000000..03c3c8d
--- /dev/null
+++ b/ep_run/FUGU_Q1_VERDICT.md
@@ -0,0 +1,123 @@
+# Q1 verdict — below-CE-2.1 divergence mechanism
+
+## Bottom line
+
+**Refute “conclusive” as stated.** The dossier evidence is **strong and code-consistent** evidence that the s3200 free relaxation has lost the usable attracting fixed point and is in an **attention-gain-driven non-conservative oscillatory regime**. It is **not yet conclusive evidence for a Hopf / Neimark-Sacker bifurcation** specifically.
+
+The missing piece is **local spectral evidence at an actual fixed point / continued fixed-point branch**: a leading **complex conjugate eigenvalue pair** of the relaxation map
+
+\[
+G(z)=z+\varepsilon F(z), \qquad M = DG(z^*) = I + \varepsilon J, \qquad J=\partial F/\partial z|_{z^*}
+\]
+
+crossing the unit circle, with real eigenvalue instabilities absent. The current data show the attractor and the causal knob, but not the local bifurcation class.
+
+## 1. Is Fact 1 + Fact 2 conclusive for Hopf?
+
+**No.** It is conclusive for a narrower statement:
+
+> At the redx s3200 checkpoint, with `eps=0.1`, the implemented forward relaxation does not converge from the evaluated embedding state to a fixed point within 6000 steps; instead the one-step residual floors around `~2.3e-2` and oscillates. Reducing the attention output gain by scaling `WO` monotonically shrinks this oscillation and restores convergence by about `alpha=0.2`.
+
+That is highly consistent with a non-conservative attention-driven oscillatory instability, but it does **not** uniquely identify a Hopf bifurcation.
+
+### What the evidence establishes
+
+- **Forward non-convergence / cycle-like attractor:** `eval_relax_s3200.py` applies the same explicit relaxation update as training/eval, records the normalized one-step displacement, and the dossier logs a persistent non-monotone residual floor: about `2.3e-2` after thousands of steps, with tail min/max `2.08e-2 / 2.73e-2`. That is incompatible with ordinary monotone convergence to the free fixed point on that trajectory.
+- **Attention gain is the main causal knob:** `knockout_s3200.py` scales `blk.WO` by `alpha`, i.e. scales the attention output contribution, and the dossier logs monotonic shrinkage of the residual floor/oscillation from `alpha=1.0` to `0.2`, where convergence is restored.
+- **The code really allows non-conservative oscillatory dynamics:** the thick force contains independent attention projections plus an untied FFN inside an explicit Euler relaxation map. There is no energy/gradient-flow guarantee in the active `attn_mode='thick'` path.
+
+### The gap
+
+A Hopf/Neimark-Sacker claim is a **local spectral claim** about the derivative of the map at a fixed point branch. The current facts are **trajectory/knockout facts**:
+
+- Fact 1 shows a sustained oscillatory forward trajectory at `eps=0.1`; it does not show which eigenvalue crossed first.
+- Fact 2 shows that reducing total attention output gain removes the oscillation; it does not isolate the **antisymmetric Jacobian part** `A=(J-J^T)/2`, nor does it rule out other nonlinear or discrete-time routes to an oscillatory attractor.
+- Because the reported s3200 `alpha=1` trajectory does not converge, an eigenvalue computed at an arbitrary T1 or cycle point would be only an **instantaneous Jacobian**, not a formal Hopf test unless the underlying fixed point/branch is also identified or continued.
+
+### Alternatives not yet excluded
+
+1. **Real-eigenvalue fixed-point loss / saddle-node-like route.**  
+   A real leading eigenvalue of `M` crossing `+1` would indicate loss of contraction along a non-oscillatory mode. The observed limit cycle could then be a secondary nonlinear attractor reached after the fixed point destabilizes or disappears, not the primary Hopf mechanism.
+
+2. **Discrete Euler artifact.**  
+   The actual implemented dynamics are not continuous-time integration; they are the map `z <- z + eps*F(z)`. If `J` has eigenvalue `nu=a+ib`, the Euler-map eigenvalue is `mu=1+eps*nu`. It is possible to have `a<0` — stable continuous-time linear dynamics — but `|1+eps*nu|>1` at `eps=0.1` because the step is too large. That would be a numerical/discrete relaxation instability, not a true continuous-time Hopf. A real `mu<-1` would be the clean period-2 / flip case.
+
+3. **FFN contribution.**  
+   The thick `nc_force` treats `attention + FFN` as the non-conservative part, and the FFN is untied. The knockout log itself shows `alpha=0.0` still has a tiny residual/oscillation (`res-floor ~1.3e-3`, `osc ~1.2e-3`), so an FFN-only contribution is not zero. The data do support attention as the dominant driver, but not attention as the exclusive source.
+
+4. **qknorm / attention nonlinearity contribution.**  
+   The evaluated block has `blk.qknorm=True`, and q/k RMSNorm is inside attention. Scaling `WO` suppresses the whole attention output path, including effects mediated by qknorm. Therefore the knockout does not separate “antisymmetric attention matrix/gain” from the nonlinear qknorm-shaped attention Jacobian.
+
+So the rigorous conclusion is:
+
+> **Plausible and likely:** attention-dominated non-conservative complex-mode instability.  
+> **Not yet proven:** Hopf/Neimark-Sacker crossing of a complex conjugate pair as the bifurcation mechanism.
+
+## 2. Single cleanest measurement
+
+**Do the local Jacobian spectrum measurement.** More precisely: compute the leading eigenvalues of both `J = dF/dz` and `M = I + eps*J` on the s3200 checkpoint along a fixed-point branch restored by attention scaling. This is more decisive than a pure epsilon sweep or Floquet analysis, because it directly distinguishes complex-pair Hopf from real-eigenvalue loss and also predicts whether `eps=0.1` is a discrete-Euler artifact.
+
+### Exact measurement to run later, not now
+
+Freeze the redx s3200 checkpoint, same batch/sequence and `qknorm=True`. Define
+
+\[
+F_\alpha(z,x)=-(z-x)+\alpha\,Attn(LN_1 z)+FFN(LN_2 z)-c z
+\]
+
+where `alpha` is implemented exactly as in `knockout_s3200.py` by scaling `WO`. For `alpha` values bracketing the observed transition — at minimum around `0.2`, `0.4`, `0.7`, `1.0`, then refined near the first loss of convergence — do:
+
+1. Find a true fixed point `z*_alpha` satisfying `||F_alpha(z*)||/||z*||` very small, using long relaxation where it converges and preferably continuation/Newton from the previous `alpha` so the branch can be followed up to the marginal point.
+2. At each `z*_alpha`, compute the leading eigenvalues `nu_i` of
+   `J_alpha = dF_alpha/dz | z*_alpha`
+   using JVP/Arnoldi or another matrix-free eigensolver.
+3. Convert them to relaxation-map eigenvalues
+   `mu_i = 1 + 0.1 * nu_i`.
+4. Record the leading `|mu_i|`, whether the leading pair is complex or real, and for complex `nu=a+ib` also record `a=Re(nu)` and the Euler stability threshold
+   \[
+   eps_crit = -2a/(a^2+b^2) \quad \text{when } a<0.
+   \]
+
+### Outcome table
+
+- **Confirms Hopf / Neimark-Sacker of the implemented relaxation map:**  
+  A complex conjugate pair is the leading spectrum and crosses `|mu|=1` as `alpha` or training step increases; real eigenvalues stay inside the unit circle. The observed oscillation frequency should be compatible with `arg(mu)` per relaxation step. This confirms the map-level Hopf mechanism.
+
+- **Confirms true continuous-time Hopf rather than Euler artifact:**  
+  The same complex pair has `Re(nu)` crossing through `0` to positive values. Then shrinking `eps` changes the discretization but does not restore continuous-time stability once `Re(nu)>0`.
+
+- **Shows Euler-step artifact instead:**  
+  The leading pair is complex and `|1+0.1*nu| >= 1`, but `Re(nu) < 0`. Then the continuous-time linearization is damped, while the explicit Euler step is unstable. The predicted stabilizing step is `eps < eps_crit`; an epsilon sweep would be confirmatory, but the spectrum already gives the answer.
+
+- **Shows real saddle-node / steady instability instead:**  
+  The leading eigenvalue crossing is real near `mu=+1` / `nu=0`. Then the Hopf claim is wrong; the limit cycle is downstream nonlinear behavior after a real fixed-point loss.
+
+- **Shows flip / two-cycle artifact:**  
+  A real map eigenvalue crosses `mu=-1` or is `< -1`. Then the oscillation is a discrete period-doubling / 2-cycle-type instability, not Hopf.
+
+- **Shows FFN is materially involved:**  
+  If the unstable/near-unstable pair remains when `alpha=0`, or if the leading antisymmetric contribution is dominated by the FFN block, then “attention antisymmetric part drives it” is overstated. If the pair moves safely inside the unit circle as `alpha` is reduced and disappears with attention removed, then the attention-dominant mechanism is supported.
+
+Why not make the epsilon sweep the primary measurement? It is useful, but indirect. If smaller `eps` converges, that could indicate an Euler artifact, but it would not by itself distinguish complex Euler instability from real flip or other nonlinear step-size effects. The Jacobian spectrum gives the bifurcation class and the epsilon prediction in one measurement.
+
+Why not Floquet/period first? Floquet multipliers of the observed cycle would quantify stability of the cycle, and period/frequency could corroborate `arg(mu)`, but they do not identify which fixed-point eigenvalue caused the attractor to appear. Use Floquet/period only as a follow-up.
+
+## 3. Consistency with the actual code
+
+The proposed mechanism is **consistent with the implemented force and relaxation map**, with the caveat that the code implicates `attention + FFN` as the active non-conservative block, not mathematically pure attention alone.
+
+- **The thick force is exactly the stated form.** In `lt_ep_train.py`, `tforce` computes layer-normed attention and FFN and returns `-(z - xin) + self.attn(h1) + ff - self.c * z` (`lt_ep_train.py:81-85`). With `c=1`, this is `xin - 2z + Attn(LN z) + FFN(LN z)`. The autograd-enabled `force` path for `attn_mode == 'thick'` computes the same structure and returns it at `lt_ep_train.py:99-106`.
+
+- **The relaxation is explicit Euler.** `relax` updates `z` by `z = z + eps * blk.force(z, xin).detach()` (`lt_ep_train.py:123-133`). Therefore the linearized relaxation map is exactly `M = I + eps*J`.
+
+- **The free phase used by EP is this relaxation state.** `ep_step` embeds the input, computes `zs = relax(..., T1, eps)`, then measures a one-step residual from that state (`lt_ep_train.py:140-145`). The code explicitly records this as the T1 free-phase state before any optional refinement (`lt_ep_train.py:146`).
+
+- **The attention path is non-conservative in the active model.** Attention uses independent `WQ`, `WK`, `WV`, `WO` projections (`lt_ep_train.py:58-68`), and optional q/k RMSNorm when `blk.qknorm` is set (`lt_ep_train.py:63-65`). The eval scripts do set `blk.qknorm=True` (`eval_relax_s3200.py:8`, `knockout_s3200.py:10`). There is no tied-energy construction in the thick path.
+
+- **The knockout really scales attention output.** `knockout_s3200.py` loads the same checkpoint and performs `blk.WO.mul_(alpha)` before relaxation (`knockout_s3200.py:9-17`). Thus the logged alpha trend is a legitimate intervention on total attention output gain.
+
+- **The code itself treats FFN as part of the non-conservative component.** In thick mode, `nc_force` returns `attention + FFN`, not attention alone (`lt_ep_train.py:92-97`). The AEP nudged correction also applies `Jv - JTv` of this `nc_force` in the real/thick modes (`lt_ep_train.py:171-179`). In `holo_ep.py`, the holomorphic and real-axis thick forces match the same `-(z-xin)+att+ff-c*z` structure (`holo_ep.py:36-51`, `holo_ep.py:134-152`), and their AEP correction again uses `Jv-JTv` of `blk.nc_force` (`holo_ep.py:76-84`, `holo_ep.py:176-185`).
+
+## Final verdict
+
+**The Hopf story is code-consistent and likely, but not proven.** The current evidence nails an attention-dominated non-conservative forward oscillation at the implemented `eps=0.1`; it does **not** yet nail the bifurcation class. The decisive next measurement is the **leading spectrum of `J` and `M=I+eps*J` on the s3200 fixed-point branch under attention-gain continuation**. A complex conjugate pair crossing `|mu|=1`, with real modes stable and with `Re(nu)` interpreted to rule in/out Euler-step instability, would settle the question.
author	Yuren Hao <yurenh2@illinois.edu>	2026-07-03 05:56:50 -0500
committer	Yuren Hao <yurenh2@illinois.edu>	2026-07-03 05:56:50 -0500
commit	b83947778e2c776f757a07d4719b7ce961d7ed55 (patch)
tree	b9cc01d7adda691d9156d9d04f4fb2f644674e96 /ep_run/FUGU_Q1_VERDICT.md