# Q1 verdict — below-CE-2.1 divergence mechanism

## Bottom line

**Refute “conclusive” as stated.** The dossier evidence is **strong and code-consistent** evidence that the s3200 free relaxation has lost the usable attracting fixed point and is in an **attention-gain-driven non-conservative oscillatory regime**. It is **not yet conclusive evidence for a Hopf / Neimark-Sacker bifurcation** specifically.

The missing piece is **local spectral evidence at an actual fixed point / continued fixed-point branch**: a leading **complex conjugate eigenvalue pair** of the relaxation map

\[
G(z)=z+\varepsilon F(z), \qquad M = DG(z^*) = I + \varepsilon J, \qquad J=\partial F/\partial z|_{z^*}
\]

crossing the unit circle, with real eigenvalue instabilities absent. The current data show the attractor and the causal knob, but not the local bifurcation class.

## 1. Is Fact 1 + Fact 2 conclusive for Hopf?

**No.** It is conclusive for a narrower statement:

> At the redx s3200 checkpoint, with `eps=0.1`, the implemented forward relaxation does not converge from the evaluated embedding state to a fixed point within 6000 steps; instead the one-step residual floors near `2.3e-2` and oscillates. Reducing the attention output gain by scaling `WO` monotonically shrinks this oscillation and restores convergence by about `alpha=0.2`.

That is highly consistent with a non-conservative attention-driven oscillatory instability, but it does **not** uniquely identify a Hopf bifurcation.

### What the evidence establishes

- **Forward non-convergence / cycle-like attractor:** `eval_relax_s3200.py` applies the same explicit relaxation update as training/eval, records the normalized one-step displacement, and the dossier logs a persistent non-monotone residual floor: about `2.3e-2` after thousands of steps, with tail min/max `2.08e-2 / 2.73e-2`. That is incompatible with ordinary monotone convergence to the free fixed point on that trajectory.
- **Attention gain is the main causal knob:** `knockout_s3200.py` scales `blk.WO` by `alpha`, i.e. scales the attention output contribution, and the dossier logs monotonic shrinkage of the residual floor/oscillation from `alpha=1.0` to `0.2`, where convergence is restored.
- **The code really allows non-conservative oscillatory dynamics:** the thick force contains independent attention projections plus an untied FFN inside an explicit Euler relaxation map. There is no energy/gradient-flow guarantee in the active `attn_mode='thick'` path.
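The residual measurement described in the first bullet can be sketched as follows. This is a minimal illustration, not the code in `eval_relax_s3200.py`: the helper names `relax_with_residuals` and `tail_stats`, the normalization `||z_new - z|| / ||z||`, and the toy contracting force are all assumptions made for the example.

```python
import math

def relax_with_residuals(force, z0, eps, steps):
    """Explicit relaxation z <- z + eps*force(z), logging ||dz|| / ||z|| at every step."""
    z = list(z0)
    trace = []
    for _ in range(steps):
        f = force(z)
        z_new = [zi + eps * fi for zi, fi in zip(z, f)]
        dz = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_new, z)))
        trace.append(dz / math.sqrt(sum(a * a for a in z)))
        z = z_new
    return z, trace

def tail_stats(trace, tail):
    """Min, max, and spread (max - min) of the last `tail` residuals."""
    window = trace[-tail:]
    return min(window), max(window), max(window) - min(window)

# Toy contracting force x - 2z with fixed point z* = x / 2 (hypothetical, illustration only).
x = [1.0, -0.5]
z_final, trace = relax_with_residuals(
    lambda z: [xi - 2.0 * zi for xi, zi in zip(x, z)],
    z0=[0.3, 0.3], eps=0.1, steps=400)
lo, hi, spread = tail_stats(trace, tail=50)
```

A converging run drives `hi` toward zero, as this toy does. A floor like the logged tail min/max of `2.08e-2 / 2.73e-2` would instead show up as `lo` and `hi` both staying bounded away from zero.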

### The gap

A Hopf/Neimark-Sacker claim is a **local spectral claim** about the derivative of the map at a fixed point branch. The current facts are **trajectory/knockout facts**:

- Fact 1 shows a sustained oscillatory forward trajectory at `eps=0.1`; it does not show which eigenvalue crossed first.
- Fact 2 shows that reducing total attention output gain removes the oscillation; it does not isolate the **antisymmetric Jacobian part** `A=(J-J^T)/2`, nor does it rule out other nonlinear or discrete-time routes to an oscillatory attractor.
- Because the reported s3200 `alpha=1` trajectory does not converge, eigenvalues computed at an arbitrary T1 or cycle point would describe only an **instantaneous Jacobian**. They are not a formal Hopf test unless the underlying fixed point/branch is also identified or continued.
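A 2x2 illustration of why the antisymmetric part `A=(J-J^T)/2` matters, and why its mere presence is not decisive. A real symmetric Jacobian has only real eigenvalues, so a complex pair requires `A != 0`; the converse does not hold. The matrices below are made-up examples, and `split_sym_antisym` and `eig2` are hypothetical helpers, not functions from the repository.

```python
import cmath

def split_sym_antisym(J):
    """Return S = (J + J^T)/2 and A = (J - J^T)/2 for a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = J
    S = ((a, 0.5 * (b + c)), (0.5 * (b + c), d))
    A = ((0.0, 0.5 * (b - c)), (0.5 * (c - b), 0.0))
    return S, A

def eig2(J):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = J
    half_tr = 0.5 * (a + d)
    disc = cmath.sqrt(half_tr * half_tr - (a * d - b * c))
    return half_tr + disc, half_tr - disc

# Strong antisymmetric part: complex pair -2 +/- 3i.
J_rot = ((-2.0, -3.0), (3.0, -2.0))
# Nonzero antisymmetric part, yet both eigenvalues are real.
J_mild = ((-2.0, 1.0), (0.5, -2.0))

S_rot, A_rot = split_sym_antisym(J_rot)
S_mild, A_mild = split_sym_antisym(J_mild)
eigs_rot = eig2(J_rot)
eigs_mild = eig2(J_mild)
eigs_sym_only = eig2(S_rot)   # symmetric part alone: real spectrum
```

This is why a gain knockout alone cannot certify the mechanism: removing attention removes both its symmetric and antisymmetric Jacobian contributions at once.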

### Alternatives not yet excluded

1. **Real-eigenvalue fixed-point loss / saddle-node-like route.**  
   A real leading eigenvalue of `M` crossing `+1` would indicate loss of contraction along a non-oscillatory mode. The observed limit cycle could then be a secondary nonlinear attractor reached after the fixed point destabilizes or disappears, not the primary Hopf mechanism.

2. **Discrete Euler artifact.**  
   The actual implemented dynamics are not continuous-time integration; they are the map `z <- z + eps*F(z)`. If `J` has eigenvalue `nu=a+ib`, the Euler-map eigenvalue is `mu=1+eps*nu`. It is possible to have `a<0` — stable continuous-time linear dynamics — but `|1+eps*nu|>1` at `eps=0.1` because the step is too large. That would be a numerical/discrete relaxation instability, not a true continuous-time Hopf. A real `mu<-1` would be the clean period-2 / flip case.

3. **FFN contribution.**  
   The thick `nc_force` treats `attention + FFN` as the non-conservative part, and the FFN is untied. The knockout log itself shows `alpha=0.0` still has a tiny residual/oscillation (`res-floor ~1.3e-3`, `osc ~1.2e-3`), so an FFN-only contribution is not zero. The data do support attention as the dominant driver, but not attention as the exclusive source.

4. **qknorm / attention nonlinearity contribution.**  
   The evaluated block has `blk.qknorm=True`, and q/k RMSNorm is inside attention. Scaling `WO` suppresses the whole attention output path, including effects mediated by qknorm. Therefore the knockout does not separate “antisymmetric attention matrix/gain” from the nonlinear qknorm-shaped attention Jacobian.
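Alternative 2 is easy to reproduce on a linear toy. The sketch below iterates the Euler map for a damped rotation whose continuous-time eigenvalue is `nu = a + ib` with `a < 0`. The values `a = -1`, `b = 30` and the helper `euler_norm_after` are hypothetical, chosen only to make the effect visible; they are not measurements from the s3200 checkpoint.

```python
import math

def euler_norm_after(a, b, eps, steps):
    """Iterate z <- z + eps*A z for A = [[a, -b], [b, a]] from z = (1, 0); return |z|."""
    z1, z2 = 1.0, 0.0
    for _ in range(steps):
        f1 = a * z1 - b * z2
        f2 = b * z1 + a * z2
        z1, z2 = z1 + eps * f1, z2 + eps * f2
    return math.hypot(z1, z2)

a, b = -1.0, 30.0  # hypothetical nu = a + ib, damped in continuous time

# Same linear system, two step sizes.
grow = euler_norm_after(a, b, eps=0.1, steps=20)       # step too large: norm grows
decay = euler_norm_after(a, b, eps=0.001, steps=5000)  # small step: norm decays

# For this A the norm is multiplied by |mu| = |1 + eps*nu| at every step.
mu_large = abs(1 + 0.1 * complex(a, b))
mu_small = abs(1 + 0.001 * complex(a, b))
```

The same force field is stable or unstable depending only on `eps`, which is exactly the ambiguity the trajectory data cannot resolve on their own.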

So the rigorous conclusion is:

> **Plausible and likely:** attention-dominated non-conservative complex-mode instability.  
> **Not yet proven:** Hopf/Neimark-Sacker crossing of a complex conjugate pair as the bifurcation mechanism.

## 2. Single cleanest measurement

**Do the local Jacobian spectrum measurement.** More precisely: compute the leading eigenvalues of both `J = dF/dz` and `M = I + eps*J` on the s3200 checkpoint along a fixed-point branch restored by attention scaling. This is more decisive than a pure epsilon sweep or Floquet analysis, because it directly distinguishes complex-pair Hopf from real-eigenvalue loss and also predicts whether `eps=0.1` is a discrete-Euler artifact.

### Exact measurement to run later, not now

Freeze the redx s3200 checkpoint, same batch/sequence and `qknorm=True`. Define

\[
F_\alpha(z,x)=-(z-x)+\alpha\,\mathrm{Attn}(\mathrm{LN}_1 z)+\mathrm{FFN}(\mathrm{LN}_2 z)-c z
\]

where `alpha` is implemented exactly as in `knockout_s3200.py` by scaling `WO`. For `alpha` values bracketing the observed transition — at minimum around `0.2`, `0.4`, `0.7`, `1.0`, then refined near the first loss of convergence — do:

1. Find a true fixed point `z*_alpha` whose relative residual `||F_alpha(z*_alpha)||/||z*_alpha||` is very small. Use long relaxation where it converges, and preferably continuation/Newton from the previous `alpha`, so the branch can be followed up to the marginal point.
2. At each `z*_alpha`, compute the leading eigenvalues `nu_i` of
   `J_alpha = dF_alpha/dz | z*_alpha`
   using JVP/Arnoldi or another matrix-free eigensolver.
3. Convert them to relaxation-map eigenvalues
   `mu_i = 1 + 0.1 * nu_i`.
4. Record the leading `|mu_i|`, whether the leading pair is complex or real, and for complex `nu=a+ib` also record `a=Re(nu)` and the Euler stability threshold
   \[
   \varepsilon_{\mathrm{crit}} = -\frac{2a}{a^2+b^2} \quad \text{when } a<0.
   \]
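Steps 1 to 4 can be sketched end to end on a 2-D toy. Everything below is an assumption made for illustration: `toy_force` is a hypothetical stand-in for `F_alpha` with `c = 1`, the fixed point is found by Newton with a dense finite-difference Jacobian rather than the JVP/Arnoldi solver the real model would need, and the eigenvalues come from the closed-form 2x2 formula. The threshold in step 4 follows from `|1+eps*nu|^2 = 1 + 2*eps*a + eps^2*(a^2+b^2)`.

```python
import cmath
import math

def toy_force(z, x, alpha):
    """Hypothetical 2-D stand-in for F_alpha(z, x) with c = 1 (not the real model)."""
    z1, z2 = z
    attn = (-20.0 * z2, 20.0 * z1)                     # rotation-like "attention", scaled by alpha
    ffn = (0.1 * math.tanh(z1), 0.1 * math.tanh(z2))   # small "FFN" term
    return (x[0] - 2.0 * z1 + alpha * attn[0] + ffn[0],
            x[1] - 2.0 * z2 + alpha * attn[1] + ffn[1])

def fd_jacobian(f, z, h=1e-6):
    """Central-difference Jacobian of a 2-D map f at z, as rows ((J11, J12), (J21, J22))."""
    cols = []
    for i in range(2):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        fp, fm = f(tuple(zp)), f(tuple(zm))
        cols.append(((fp[0] - fm[0]) / (2 * h), (fp[1] - fm[1]) / (2 * h)))
    return ((cols[0][0], cols[1][0]), (cols[0][1], cols[1][1]))

def newton_fixed_point(f, z0, iters=20):
    """Step 1: solve f(z) = 0 by Newton, using Cramer's rule for the 2x2 linear solve."""
    z = tuple(z0)
    for _ in range(iters):
        (a, b), (c, d) = fd_jacobian(f, z)
        r1, r2 = f(z)
        det = a * d - b * c
        z = (z[0] + (-r1 * d + b * r2) / det, z[1] + (-a * r2 + c * r1) / det)
    return z

def eig2(J):
    """Step 2: eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = J
    half_tr = 0.5 * (a + d)
    disc = cmath.sqrt(half_tr * half_tr - (a * d - b * c))
    return half_tr + disc, half_tr - disc

eps, alpha, x = 0.1, 0.2, (1.0, 0.5)
f = lambda z: toy_force(z, x, alpha)

z_star = newton_fixed_point(f, (0.0, 0.0))
rel_res = math.hypot(*f(z_star)) / math.hypot(*z_star)   # ||F(z*)|| / ||z*||
nu = eig2(fd_jacobian(f, z_star))[0]                     # leading eigenvalue of J_alpha
mu = 1 + eps * nu                                        # step 3: relaxation-map eigenvalue
a, b = nu.real, nu.imag
eps_crit = -2 * a / (a * a + b * b) if a < 0 else None   # step 4: Euler stability threshold
```

Repeating the last block over a grid of `alpha` values, warm-starting Newton from the previous `z_star`, gives a crude continuation of the branch and shows where `abs(mu)` first reaches 1.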

### Outcomes and interpretation

- **Confirms Hopf / Neimark-Sacker of the implemented relaxation map:**  
  A complex conjugate pair is the leading spectrum and crosses `|mu|=1` as `alpha` or training step increases; real eigenvalues stay inside the unit circle. The observed oscillation frequency should be compatible with `arg(mu)` per relaxation step. This confirms the map-level Hopf mechanism.

- **Confirms true continuous-time Hopf rather than Euler artifact:**  
  The same complex pair has `Re(nu)` crossing through `0` to positive values. Shrinking `eps` then cannot restore stability, because `|1+eps*nu| > 1` for every `eps > 0` once `Re(nu) > 0`.

- **Shows Euler-step artifact instead:**  
  The leading pair is complex and `|1+0.1*nu| >= 1`, but `Re(nu) < 0`. Then the continuous-time linearization is damped, while the explicit Euler step is unstable. The predicted stabilizing step is `eps < eps_crit`; an epsilon sweep would be confirmatory, but the spectrum already gives the answer.

- **Shows real saddle-node / steady instability instead:**  
  The leading eigenvalue crossing is real near `mu=+1` / `nu=0`. Then the Hopf claim is wrong; the limit cycle is downstream nonlinear behavior after a real fixed-point loss.

- **Shows flip / two-cycle artifact:**  
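  A real map eigenvalue crosses `mu=-1` or already sits below `-1`. Since `mu = 1 + eps*nu`, this requires a real `nu < -2/eps` (`nu < -20` at `eps=0.1`), so it is itself a step-size effect. The oscillation is then a discrete period-doubling / 2-cycle-type instability, not Hopf.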

- **Shows FFN is materially involved:**  
  If the unstable/near-unstable pair remains when `alpha=0`, or if the leading antisymmetric contribution is dominated by the FFN block, then “attention antisymmetric part drives it” is overstated. If the pair moves safely inside the unit circle as `alpha` is reduced and disappears with attention removed, then the attention-dominant mechanism is supported.
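The spectral cases in the list above can be condensed into a small decision function. This is a sketch under simplifying assumptions: it labels a single leading eigenvalue `nu` at one `alpha`, so it reports which side of a boundary a snapshot is on, not the crossing itself. The function name, the label strings, and the tolerance `tol` are hypothetical. The FFN question is not covered, because it needs a comparison across `alpha` rather than one eigenvalue.

```python
def classify_leading_mode(nu, eps=0.1, tol=1e-9):
    """Label one leading eigenvalue nu of J by the instability class of mu = 1 + eps*nu."""
    nu = complex(nu)
    mu = 1 + eps * nu
    if abs(mu) < 1:
        return "stable"            # inside the unit circle at this eps
    if abs(nu.imag) > tol:         # complex pair
        if nu.real >= 0:
            return "hopf"          # no eps > 0 gives |mu| < 1
        return "euler_artifact"    # damped in continuous time, unstable at this eps
    if mu.real >= 1:
        return "steady"            # real mode leaving through mu = +1
    return "flip"                  # real mode leaving through mu = -1

labels = {
    "damped_fast_rotation": classify_leading_mode(-1 + 30j),
    "growing_rotation": classify_leading_mode(0.5 + 3j),
    "damped_slow_rotation": classify_leading_mode(-1 + 2j),
    "real_growth": classify_leading_mode(0.2),
    "stiff_real_decay": classify_leading_mode(-25.0),
}
```

Applying the function along the continued branch, and watching where the label first changes from `"stable"`, is what turns the snapshot labels into a statement about the bifurcation class.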

Why not make the epsilon sweep the primary measurement? It is useful, but indirect. If smaller `eps` converges, that could indicate an Euler artifact, but it would not by itself distinguish complex Euler instability from real flip or other nonlinear step-size effects. The Jacobian spectrum gives the bifurcation class and the epsilon prediction in one measurement.

Why not Floquet/period first? Floquet multipliers of the observed cycle would quantify stability of the cycle, and period/frequency could corroborate `arg(mu)`, but they do not identify which fixed-point eigenvalue caused the attractor to appear. Use Floquet/period only as a follow-up.
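The period cross-check mentioned above is cheap once a leading complex `mu` is available. The sketch below compares the period implied by `arg(mu)` with one measured from a scalar trace. Both helper names and the synthetic sinusoidal trace are assumptions. Which signal to measure is also an open choice: a norm-type residual may not oscillate at the same frequency as the state, so a fixed linear projection of `z` is probably the safer input.

```python
import cmath
import math

def predicted_period(mu):
    """Relaxation steps per oscillation implied by a complex map eigenvalue: 2*pi / |arg(mu)|."""
    return 2.0 * math.pi / abs(cmath.phase(mu))

def measured_period(trace):
    """Mean spacing between upward crossings of the trace's own mean, or None if fewer than two."""
    mean = sum(trace) / len(trace)
    ups = [i for i in range(1, len(trace))
           if trace[i - 1] - mean < 0.0 <= trace[i] - mean]
    if len(ups) < 2:
        return None
    return (ups[-1] - ups[0]) / (len(ups) - 1)

# Synthetic check: a unit-modulus eigenvalue rotating 2*pi/25 per step, and a trace
# that oscillates with the matching 25-step period.
mu = cmath.rect(1.0, 2.0 * math.pi / 25.0)
trace = [math.sin(2.0 * math.pi * n / 25.0 + 0.3) for n in range(200)]

period_from_mu = predicted_period(mu)
period_from_trace = measured_period(trace)
```

Agreement between the two numbers corroborates a map-level Hopf reading; a mismatch suggests the observed cycle is shaped by nonlinear effects away from the fixed point.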

## 3. Consistency with the actual code

The proposed mechanism is **consistent with the implemented force and relaxation map**, with the caveat that the code implicates `attention + FFN` as the active non-conservative block, not mathematically pure attention alone.

- **The thick force is exactly the stated form.** In `lt_ep_train.py`, `tforce` computes layer-normed attention and FFN and returns `-(z - xin) + self.attn(h1) + ff - self.c * z` (`lt_ep_train.py:81-85`). With `c=1`, this is `xin - 2z + Attn(LN z) + FFN(LN z)`. The autograd-enabled `force` path for `attn_mode == 'thick'` computes the same structure and returns it at `lt_ep_train.py:99-106`.

- **The relaxation is explicit Euler.** `relax` updates `z` by `z = z + eps * blk.force(z, xin).detach()` (`lt_ep_train.py:123-133`). Therefore the linearized relaxation map is exactly `M = I + eps*J`.

- **The free phase used by EP is this relaxation state.** `ep_step` embeds the input, computes `zs = relax(..., T1, eps)`, then measures a one-step residual from that state (`lt_ep_train.py:140-145`). The code explicitly records this as the T1 free-phase state before any optional refinement (`lt_ep_train.py:146`).

- **The attention path is non-conservative in the active model.** Attention uses independent `WQ`, `WK`, `WV`, `WO` projections (`lt_ep_train.py:58-68`), and optional q/k RMSNorm when `blk.qknorm` is set (`lt_ep_train.py:63-65`). The eval scripts do set `blk.qknorm=True` (`eval_relax_s3200.py:8`, `knockout_s3200.py:10`). There is no tied-energy construction in the thick path.

- **The knockout really scales attention output.** `knockout_s3200.py` loads the same checkpoint and performs `blk.WO.mul_(alpha)` before relaxation (`knockout_s3200.py:9-17`). Thus the logged alpha trend is a legitimate intervention on total attention output gain.

- **The code itself treats FFN as part of the non-conservative component.** In thick mode, `nc_force` returns `attention + FFN`, not attention alone (`lt_ep_train.py:92-97`). The AEP nudged correction also applies `Jv - JTv` of this `nc_force` in the real/thick modes (`lt_ep_train.py:171-179`). In `holo_ep.py`, the holomorphic and real-axis thick forces match the same `-(z-xin)+att+ff-c*z` structure (`holo_ep.py:36-51`, `holo_ep.py:134-152`), and their AEP correction again uses `Jv-JTv` of `blk.nc_force` (`holo_ep.py:76-84`, `holo_ep.py:176-185`).

## Final verdict

**The Hopf story is code-consistent and likely, but not proven.** The current evidence nails an attention-dominated non-conservative forward oscillation at the implemented `eps=0.1`; it does **not** yet nail the bifurcation class. The decisive next measurement is the **leading spectrum of `J` and `M=I+eps*J` on the s3200 fixed-point branch under attention-gain continuation**. A complex conjugate pair crossing `|mu|=1`, with real modes stable and with the sign of `Re(nu)` used to rule Euler-step instability in or out, would settle the question.