# FUGU_OPTIONS_VERDICT — Q1–Q3 (independently verified)

Scope: answers grounded in `lt_ep_train.py` (`force`/`tforce` :81-106, `relax` :123-133,
`ep_step` :140-232, `jacreg` :211-219, weight caps :52-53/398-399/563-567), `holo_ep.py`,
the calibration probes (`adaptive_eps_calib.py`, `adaptive_eps_calib2.py`, `eps_sweep_s3200.py`,
`jnc_scaling.py`, `lt_ep_anderson.py`), and the diagnosis dossiers. Each claim is flagged
**[SOLID]** (proved by code/data in repo) or **[UNCERTAIN]** (reasoned, not measured here).

---

## Shared mechanism (the object all three questions act on)

**[SOLID]** The active free relaxation is explicit (forward) Euler:
`z = z + eps * blk.force(z, xin).detach()` (`relax`, :123-133). In thick mode the force is
`F(z) = -(z - xin) + Attn(LN1 z) + FFN(LN2 z) - c*z` (`tforce`/`force` :81-85, :102-106), c=1.
So the per-step linear stability object is the **discrete map** `M = I + eps*J`, `J = dF/dz`.

**[SOLID]** For a continuous eigenvalue `mu = a + i b` of `J`, the Euler multiplier is
`lambda = 1 + eps*mu`, and the map is stable iff `|1+eps*mu| < 1`, i.e.
`eps < eps_crit = -2a/(a^2 + b^2)` for `a < 0`. A continuous-STABLE rotating mode (`a<0`, `b` large)
is destabilized purely by too-large `eps`.
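
As a quick numerical check of the bound above, the following minimal sketch evaluates `eps_crit` and the Euler multiplier for a single mode. The function names and the example eigenvalue `-1 + 30i` are illustrative assumptions, not values measured in the repo.

```python
def eps_crit(mu: complex) -> float:
    """Largest stable forward-Euler step for one eigenvalue mu = a + i*b of J."""
    a, b = mu.real, mu.imag
    if a >= 0.0:
        return 0.0  # mode does not decay in continuous time; no positive step helps
    return -2.0 * a / (a * a + b * b)


def euler_multiplier(mu: complex, eps: float) -> float:
    """Magnitude of the discrete multiplier lambda = 1 + eps*mu."""
    return abs(1.0 + eps * mu)


mu = complex(-1.0, 30.0)   # hypothetical stiff rotating mode (a < 0, large b)
ec = eps_crit(mu)          # equals 2/901 for this mode
for eps in (0.1, 0.05, 0.5 * ec):
    gain = euler_multiplier(mu, eps)
    status = "stable" if gain < 1.0 else "UNSTABLE"
    print(f"eps={eps:.5f}  |1+eps*mu|={gain:.4f}  {status}")
```

For this made-up mode the continuous dynamics decay (`a = -1`), yet both eps=0.1 and eps=0.05 sit above `eps_crit`, which is the situation the paragraph describes.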

**[SOLID]** The ε-monotonicity training data are decisive that this is an *integration* wall, not a
*gradient-quality* wall: eps=0.1 blew up at CE 2.74; eps=0.1 with a strictly better gradient (t2sel=160,
cos 0.998) blew up EARLIER, at CE 3.02; eps=0.05 reached CE 2.41 before blowing up. A better gradient
brought the blow-up earlier, not later; a smaller step moved the wall to a strictly lower CE. That is
exactly the `|1+eps*mu|>1` signature.

### One correction to the dossier's "continuous/analog is stable at s3200" framing
**[SOLID — verified, refines prior verdict]** The eps-sweep "CONVERGED at eps=0.01" is measured with a
*different residual* than the cycle floor. `eps_sweep_s3200.py:17` reports the **step** residual
`r = ‖z2-z‖/‖z‖ = eps·‖F‖/‖z‖`; `adaptive_eps_calib.py:15` reports the **force** residual
`g = ‖F‖/‖z‖`. At eps=0.01 the sweep's `r≈8.9e-4` is just `0.01 × 0.089` — i.e. the *same* force-floor
`g≈0.09` that is called a "cycle" at eps=0.1. `FUGU_Q_OPTIONS.md` itself flags this:
"s3200 g floors ~0.09 even at tiny ε (genuinely no fixed point at the marginal op, OR just slow
finite-step convergence — ambiguous)."
**Implication:** the eps-sweep robustly proves *the oscillation/blow-up is a discrete-Euler artifact*
(the cycle amplitude dies as eps→0). It does **not** by itself prove the s3200 operator has a true
attracting fixed point (g→0) in continuous time — the force floor g≈0.09 persists. The clean
continuous-stable case is s2000 (g→0). So "analog HW would have no problem" is **[SOLID]** for the
*oscillatory blow-up* but **[UNCERTAIN]** for "s3200 settles to a usable equilibrium." The decisive
missing measurement remains the leading eigenpair of `J`/`M` at a continued fixed-point branch
(sign of `Re mu`).
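
The factor-of-eps relation between the two residuals can be reproduced with a small sketch. The toy vectors and helper names below are made up for illustration; only the two residual definitions come from the probes cited above.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def step_residual(z, f, eps):
    """r = ||z2 - z|| / ||z|| with z2 = z + eps*f (what the eps-sweep reports)."""
    z2 = [zi + eps * fi for zi, fi in zip(z, f)]
    return norm([b - a for a, b in zip(z, z2)]) / norm(z)


def force_residual(z, f):
    """g = ||f|| / ||z|| (what the adaptive-eps calibration reports)."""
    return norm(f) / norm(z)


z = [3.0, 4.0]        # toy state, ||z|| = 5
f = [0.267, 0.356]    # toy force, ||f|| = 0.445, so g = 0.089
for eps in (0.1, 0.01):
    r, g = step_residual(z, f, eps), force_residual(z, f)
    print(f"eps={eps}: step r={r:.2e}, force g={g:.3f}, r/eps={r / eps:.3f}")
```

The step residual shrinks with eps while `r/eps` stays at the same force floor, so a small `r` at small eps is not by itself evidence of convergence.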

---

## Q1 — Evaluate (a) adaptive ε, (b) jacreg, (c) smaller fixed ε

**Bottom line:**
- **(c) smaller fixed ε — RELOCATES the wall. [SOLID]** Already shown empirically (2.74→2.41).
- **(b) jacreg — RAISES/RELOCATES the wall from the model side. [SOLID it raises eps_crit; UNCERTAIN whether it can eliminate]** It lifts `eps_crit` by cutting `|Im mu|`/gain, but at fixed ε it is still a wall in `eps_crit`-space; it also taxes the expressivity it suppresses.
- **(a) adaptive ε — ELIMINATES the fixed-ε wall *iff* its floor stays below the instantaneous `eps_crit`; otherwise it degenerates to (c). [SOLID for the mechanism; the guarantee is conditional]**

### Ranking
**To remove the measured software wall while preserving the model and the analog target:**
1. **Adaptive ε / robust solver** — only option that removes the *fixed-step* wall with **zero model/expressivity cost** and **zero change to the analog target**. It is pure integration-axis.
2. **jacreg** — effective secondary homeostat; raises `eps_crit`, but changes the learned operator and can cap the non-normality the good (BPTT-1.83) solution uses.
3. **smaller fixed ε** — diagnostic/fallback only; permanently pays the small-step cost on *every* example (including smooth ones) and still fails once stiffening passes the new floor.

**For the analog (continuous) target specifically:** adaptive ε and smaller fixed ε are *emulator*
choices that leave the model identical to what analog HW runs — they are the right kind of fix.
jacreg *changes the model that analog HW would run* (see Q2).

### (a) Adaptive ε — grounded in code
**[SOLID]** `adaptive_eps_calib2.py` uses the correct signal: shrink only on **overshoot**
(`g_t > prev*tol` → `eps*=down`), grow otherwise. The naive `adaptive_eps_calib.py` controller
(shrink on slow contraction) is shown to mis-park ε at the floor on all ops — it conflates small-ε's
slow contraction with instability. The corrected controller behaves as a continuous-relaxation
emulator: stiff s3200 → ε to 0.003-0.008; smooth s2000 → ε grows toward 0.1 and reaches g=0.

### Is adaptive ε *guaranteed* to eliminate the wall? — the eps_min question
**[SOLID, decisive]** No, not unconditionally. With a hard floor `eps_min`, adaptive ε eliminates the
wall only while `eps_min < eps_crit = -2a/(a^2+b^2)`. If training keeps stiffening the rotating mode so
`eps_crit` falls below `eps_min`, adaptive ε becomes a fixed small step at the floor — i.e. it
**degenerates into option (c) and merely relocates the wall.** So the guarantee is conditional on the
floor, and equivalently on whether `eps_crit` (hence `|Im mu|`) is bounded away from where the floor
sits.

### Does |Im μ| (the `b` in `mu = a + i b`) saturate or grow unboundedly as CE drops?
This is the crux, and the honest answer is split:

- **[SOLID] There IS structural stiffness-bounding machinery in the code that argues for saturation.**
  (i) `qknorm` RMSNorms q,k → softmax logits are bounded regardless of weight growth (`attn` :63-67);
  (ii) **weight-norm caps**: `capw = {WQ,WK,WV,WO,Wm,Wh,fc,pj}` are each projected back to
  `capx × initial-norm` every optimizer step (`:52-53`, `:398-399`, `:563-567`); (iii) damping `c=1`
  makes the passive part of the force `xin - (1+c)z = xin - 2z`, a contraction floor; (iv) LayerNorm
  bounds input scale into attn/FFN; (v) weight decay. With qknorm + capped projections + LN, the
  per-matrix gains feeding `J_nc` cannot grow without bound, which bounds `|Im mu|` and therefore keeps
  `eps_crit` bounded **below** (away from zero, provided `Re mu` does not itself drift toward 0). This is
  a genuine reason to expect `|Im mu|` to **saturate** (or at least be bounded) rather than diverge.

- **[SOLID, opposing data point] But within the *observed* range stiffness was still rising:** fixed
  ε=0.1→0.05 moved the wall 2.74→2.41 rather than removing it, i.e. `eps_crit` was still falling across
  that CE interval. So saturation, if it exists, had not yet bitten in the measured window.

- **[UNCERTAIN] No direct eigenvalue/`|Im mu|`-vs-CE trace exists in the repo.** `jnc_scaling.py`
  measures `‖J_nc‖` growth-per-step vs width but is not a CE-resolved `|Im mu|` curve. So whether `b`
  truly plateaus before `eps_crit` reaches a practical `eps_min` is **not measured**.

**Synthesis (decisive, hedged correctly):** adaptive ε is the best wall-eliminator and the only
zero-tax, analog-faithful one — **and** the code's caps/qknorm/damping make it *likely* that `|Im mu|`
is bounded, so a sufficiently small `eps_min` should eliminate (not merely relocate) the wall in
practice. But this is a *bounded-floor* guarantee, not an unconditional one: if `|Im mu|` were to grow
without bound, any finite `eps_min` would eventually become a wall. **Recommended:** have the controller
log an `eps_crit` proxy whenever it sits at the floor (overshoot persisting at `eps_min`) and then either
drop the floor, reject the step, or hand off to Anderson, i.e. fail-open rather than fail-into-(c).
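
One way to detect "overshoot persisting at the floor" is a small streak counter, sketched below. The function name `floor_guard`, the `patience` threshold, and the `"handoff"`/`"continue"` action labels are hypothetical; nothing like this is claimed to exist in the repo.

```python
def floor_guard(eps, eps_min, overshoot, streak, patience=3):
    """Track overshoot that persists at the floor; return (action, new_streak)."""
    if overshoot and eps <= eps_min:
        # eps cannot shrink further: evidence that eps_crit has dropped below eps_min.
        streak += 1
    else:
        streak = 0
    action = "handoff" if streak >= patience else "continue"
    return action, streak


# Usage: three consecutive overshoots at the floor trigger the hand-off.
streak, actions = 0, []
for overshoot in (True, True, True):
    action, streak = floor_guard(eps=1e-3, eps_min=1e-3,
                                 overshoot=overshoot, streak=streak)
    actions.append(action)
print(actions)
```

The caller decides what `"handoff"` means (lower the floor, reject the step, or switch solver); the guard only makes the floor condition observable instead of silently running a fixed small step.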

---

## Q2 — The jacreg paradox

**Verdict: no paradox. jacreg works by RAISING `eps_crit` from the model side — it fixes the SAME
discretization wall, not a demonstrated continuous-time instability. Relative to adaptive ε it is a
sim-crutch for the measured failure, but it carries a *separate, real* analog benefit (settling
quality), and it would become a genuine fix if a true continuous instability (Re μ≥0) ever emerges.**

### Why a model-side stiffness penalty fixes a simulation artifact — mechanism
**[SOLID]** `jacreg` is a Hutchinson JVP penalty `R = jacreg·‖J_nc·er‖²/‖er‖²` (`:211-219`), and in thick
mode `nc_force = Attn + FFN` (`:92-97`). Minimizing `‖J_nc‖` reduces the learned non-conservative
gain, which reduces the rotating component `|b|=|Im mu|` (and non-normal amplification). Since
`eps_crit = -2a/(a^2+b^2)`, smaller `|b|` → **larger** `eps_crit` → fixed ε=0.1 stays under the
Euler-stability boundary longer. So jacreg moves the *same* `|1+eps*mu|=1` wall by shrinking `b`, while
adaptive ε moves the *same* wall by shrinking `eps`. Two knobs on one inequality.
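
The shape of such a penalty can be sketched without autodiff. In the block below the JVP is a central finite difference, the probes are Rademacher vectors, and `nc_force` is a toy rotation generator; all three are assumptions standing in for the repo's JVP, its probe `er`, and the real `Attn + FFN`.

```python
import random


def nc_force(z, b=3.0):
    """Hypothetical linear stand-in for Attn + FFN: a rotation generator with rate b."""
    return [-b * z[1], b * z[0]]


def jvp_fd(fn, z, v, h=1e-4):
    """Central finite-difference JVP; a stand-in for an autodiff jvp call."""
    zp = [zi + h * vi for zi, vi in zip(z, v)]
    zm = [zi - h * vi for zi, vi in zip(z, v)]
    return [(p - m) / (2.0 * h) for p, m in zip(fn(zp), fn(zm))]


def jac_penalty(fn, z, weight=1.0, n_probes=4, seed=0):
    """Average of weight * ||J v||^2 / ||v||^2 over random probes v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_probes):
        v = [rng.choice((-1.0, 1.0)) for _ in z]   # Rademacher probe (assumed)
        jv = jvp_fd(fn, z, v)
        total += sum(x * x for x in jv) / sum(x * x for x in v)
    return weight * total / n_probes


z = [0.3, -0.7]
strong = jac_penalty(nc_force, z)                        # rotation rate b = 3
weak = jac_penalty(lambda u: nc_force(u, b=1.0), z)      # rotation rate b = 1
print(strong, weak)
```

For this toy field the penalty equals `b²` for every probe, so driving the penalty down is the same as driving the rotation rate down, which is the knob that enters `eps_crit`.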

### Raising eps_crit vs fixing a continuous-time problem
**[SOLID for measured regime]** For s3200-type failures the relevant mode has `Re mu < 0` (the cycle
dies as eps→0). There is no *established* continuous instability to fix, so jacreg's contribution there
is purely "raise eps_crit" — discretization-wall relief from the model side.
**[UNCERTAIN beyond it]** If training ever pushes `Re mu` up through zero (`Re mu ≥ 0`, a true Hopf),
then no integrator (adaptive ε, implicit, Anderson) can stabilize the original continuous equilibrium;
only a model-side change (jacreg, stronger damping/c, structural monotonicity, gain/asymmetry bounds) is
a real fix. jacreg is the insurance policy for that case.

### Does the benefit transfer to analog hardware? — two benefits, separated
**[SOLID] (i) The "prevents eps=0.1 Euler blow-up" benefit does NOT transfer.** Analog HW has no `eps`
and does not iterate `z←z+εF`; it performs continuous relaxation. If `Re mu<0`, the analog ODE is
stable and never had this wall. To the extent jacreg only buys eps_crit headroom, it is papering over a
sim artifact analog wouldn't have — a crutch.

**[SOLID/UNCERTAIN-magnitude] (ii) The "less stiff/less ringy continuous dynamics" benefit DOES
transfer.** Even with `Re mu<0`, a large `|Im mu|` mode has a poor damping ratio: it rings, settles
slowly, demands more bandwidth, longer observation/integration windows, and is more noise/delay
sensitive — all of which degrade the *physical* free-phase settling and the readout of nudged
equilibria on analog HW. Reducing `‖J_nc‖` improves the continuous damping ratio. So jacreg is *also* a
legitimate analog settling/robustness regularizer. **[UNCERTAIN]** the size of this analog benefit is
not measured here.
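
To make "poor damping ratio" concrete, the sketch below uses the standard second-order relations for a decaying mode `mu = a + i b`: damping ratio `-a/|mu|` and a 2% envelope settling time of `ln(50)/|a|`. The two example eigenvalues are hypothetical, not measured.

```python
import math


def ringing(mu: complex):
    """Damping ratio, 2% settling time, and ring cycles for a decaying mode mu = a + i*b."""
    a, b = mu.real, mu.imag
    if a >= 0.0:
        raise ValueError("mode does not decay in continuous time")
    zeta = -a / abs(mu)                          # damping ratio
    t_settle = math.log(50.0) / (-a)             # envelope exp(a*t) falls to 2%
    cycles = t_settle * abs(b) / (2.0 * math.pi)  # oscillations completed while settling
    return zeta, t_settle, cycles


for mu in (complex(-1.0, 30.0), complex(-1.0, 5.0)):   # hypothetical stiff vs mild mode
    zeta, t_settle, cycles = ringing(mu)
    print(f"mu={mu}: zeta={zeta:.3f}, t_settle={t_settle:.2f}, ring cycles={cycles:.1f}")
```

Both modes settle in the same time because they share `a`, but the larger `|Im mu|` mode rings through many more cycles on the way, which is the bandwidth and readout burden described above.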

### Real fix or sim-crutch, relative to adaptive ε?
**[SOLID]** For the *confirmed explicit-Euler artifact*:
- **adaptive ε / Anderson / implicit = the real fix of the emulator** — they preserve the learned
  vector field and make the digital sim stop inventing a cycle the analog system wouldn't have.
- **jacreg = a model-changing crutch for that artifact**, but simultaneously a *real* (if secondary)
  analog settling regularizer and the *only* lever if a genuine continuous instability appears.

**Recommended composition (not "either/or"):** (1) use adaptive ε / a real solver as the primary
emulator fix so the sim is faithful; (2) keep jacreg as a **bounded, adaptive** homeostat
(the controller already exists, `:520-529`) sized for analog settling-time/robustness or true
marginality — NOT as a strong fixed penalty that taxes the non-normality the BPTT-1.83 solution needs.
The historical evidence fits this: the validated ~2.40 runs used *adaptive* jacreg; the diverging runs
*froze it weak* — i.e. they removed the homeostat, not the integrator.

---

## Q3 — Anderson acceleration / implicit (IMEX) integrators

**Verdict: Yes — they can replace explicit Euler as the *solver* and kill the discretization
instability, and they are compatible with AsymEP *provided they converge to the same equilibria of the
same vector fields*. They change nothing about the analog model; they are emulator-fidelity choices.
Implicit Euler is unconditionally stable but per-step expensive (the solve is itself a relaxation).
Anderson is the more practical lever: it both accelerates and can suppress the Euler cycle when a true
fixed point exists, but it is not guaranteed and needs damping/restarts/residual gating.**

### (i) Compatibility with AsymEP
**[SOLID]** The EP estimator depends on the *states*, not on how they were reached. `ep_step` computes
`zs = relax(...)` and treats it as the free equilibrium (`:142-144`); the AsymEP correction uses local
`Jv = jvp(nc_force, zs, v)`, `JTv = vjp(nc_force, zs, v)`, `corr = Jv - JTv` at `zs` (`:172-178`); the
parameter gradient is `(a * f).sum()` with `f = force(zs.detach(), xin, cg=True)` (`:202-205`). None of
this requires explicit Euler — it requires that `zs` is a genuine root `F(zs)≈0` and that the nudged
states are equilibria of the nudged/corrected force. A better solver that returns the *same roots* is
fully compatible, and the `-2A` correction is computed *at* `z*` regardless of the solver that found it.
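
The solver-independence of the correction is easy to see with explicit matrices. The sketch below assumes the `A` in `-2A` denotes the antisymmetric part `(J - Jᵀ)/2` of the Jacobian at `zs`, which the source does not spell out; the matrices and helper names are hypothetical.

```python
def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def asym_corr(J, v):
    """corr = J v - J^T v, written with explicit matrices instead of jvp/vjp."""
    jv = matvec(J, v)
    jtv = matvec(transpose(J), v)
    return [p - q for p, q in zip(jv, jtv)]


J = [[-1.0, -4.0], [2.0, -1.0]]      # hypothetical non-symmetric Jacobian at zs
v = [1.0, 0.5]
A = [[0.5 * (J[i][j] - J[j][i]) for j in range(2)] for i in range(2)]
print(asym_corr(J, v), [2.0 * x for x in matvec(A, v)])   # the two vectors agree

J_sym = [[-1.0, 2.0], [2.0, -3.0]]   # symmetric (conservative) case
print(asym_corr(J_sym, v))           # correction vanishes
```

Only `J` at `zs` and the probe `v` enter the computation; how `zs` was found does not.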

**[SOLID — important, refines prior framing] The nudged phase must also be re-solved.** The free phase
is not the only explicit-Euler loop: the nudge (`nudge()` :163-180) and every holomorphic estimator
(`holo_a`, `holo_a_select2`, `holo_a_track`, `holo_a_lockin` in `holo_ep.py`) advance with
`z = z + eps*(f - corr)`. The `-2A` correction lives *inside* these loops. So "swap the integrator"
means swap it in **both** phases; a solver that converges the free `z*` but leaves the nudged phase on
coarse Euler will still corrupt `a = -dz*/dβ`.

**[SOLID] Hard limit:** if the continuous field has no attracting root in the operating regime, no
solver can manufacture the stationary state AsymEP needs — it will fail, find a spurious root, or
return a numerical artifact. A solver fixes *integration*, not *non-existence of equilibrium*. (This is
why the s3200 force-floor ambiguity from the Shared-mechanism section matters: confirm a true root
exists before trusting AsymEP there.)

### (ii) Implicit / IMEX — tractable or self-defeating?
**[SOLID, theory]** With step size `h` (the role `eps` plays for forward Euler), the backward Euler
multiplier is `1/(1-h·mu)`. The method is A-stable: for any `Re mu<0` it is stable at *every* step size,
so it would kill the stiff-rotation Euler cycle outright.
**[SOLID, cost]** Each backward step solves `y - h·F(y) - z_n = 0`, where `F` contains LN, causal
softmax attention, and FFN. A Newton/Krylov/Picard solve needs several force evals and matrix-free
JVP/VJP linear solves over the full `B·T·C` state per step — i.e. **the per-step solve is itself a
relaxation/root-find**, which is the self-defeating risk for a default inner loop.
**[UNCERTAIN/qualitative] IMEX nuance:** making only the cheap leak `-(1+c)z` implicit is trivial but
does **not** tame the dangerous learned rotating attention mode (the danger is in `J_nc`, not the leak);
treating `J_nc` implicitly reintroduces the big linear solve. So implicit/IMEX is best as a **robust
fallback / macro-step / offline reference**, not the default per-step integrator.
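
The stability contrast between the two multipliers can be checked directly. The sketch below compares them on one hypothetical stiff mode; it illustrates only the linear stability claim, not the cost of the nonlinear solve.

```python
def forward_euler_gain(mu: complex, h: float) -> float:
    """|1 + h*mu|: per-step gain of explicit Euler on a mode mu."""
    return abs(1.0 + h * mu)


def backward_euler_gain(mu: complex, h: float) -> float:
    """|1 / (1 - h*mu)|: per-step gain of backward Euler on a mode mu."""
    return abs(1.0 / (1.0 - h * mu))


mu = complex(-1.0, 30.0)   # hypothetical stiff rotating mode with Re mu < 0
for h in (0.001, 0.1, 10.0):
    print(f"h={h}: forward {forward_euler_gain(mu, h):.3f}, "
          f"backward {backward_euler_gain(mu, h):.3f}")
```

The backward gain stays below 1 at every step size tried, while the forward gain exceeds 1 as soon as `h` passes `eps_crit` for this mode.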

### (iii) Anderson — speed only, or stabilization too?
**[SOLID, conceptual]** Anderson (DEQ-style; `lt_ep_anderson.py` stores recent `X`, `G(X)=z+εF`, solves
a small regularized least-squares for the mixing coefficients, extrapolates) acts as a quasi-Newton /
GMRES-on-the-residual method. For a Picard/Euler map whose oscillatory multiplier sits just outside the
unit circle, the residual-minimizing extrapolation can **suppress the limit cycle**, not merely speed up
a contracting one, so it is more than acceleration. `lt_ep_anderson.py` is explicitly framed as exactly
this test ("can a fixed-point solver converge the free phase where plain relaxation cannot?").
**[SOLID, caveats]** Not guaranteed: it cannot create a root that doesn't exist; aggressive mixing can
diverge; it needs damping (β-mixing), restarts, and residual-monotonicity gating; and (per (i)) it must
wrap the nudged phase too. Net: **strongest practical candidate** — cheaper than full implicit Newton,
able to stabilize when a root exists, but must be safeguarded.
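
The stabilization claim can be illustrated on a toy problem where plain Euler diverges. The sketch below is not the implementation in `lt_ep_anderson.py`: it is an undamped, depth-2 Anderson iteration with a small ridge term, no restarts and no residual gating, applied to a hypothetical linear field with eigenvalues `-1 ± 30i` at eps=0.1.

```python
import math

A_RE, B_IM, EPS = -1.0, 30.0, 0.1   # hypothetical mode a +/- i*b and fixed step


def force(z):
    # Linear stand-in for F; its root is z* = (1/901, 30/901).
    return [A_RE * z[0] - B_IM * z[1] + 1.0, B_IM * z[0] + A_RE * z[1]]


def euler_map(z):
    return [zi + EPS * fi for zi, fi in zip(z, force(z))]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def solve_small(mat, rhs):
    """Gaussian elimination with partial pivoting for a tiny dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(mat, rhs)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        for r in range(i + 1, n):
            factor = aug[r][i] / aug[i][i]
            for c in range(i, n + 1):
                aug[r][c] -= factor * aug[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][c] * x[c] for c in range(i + 1, n))) / aug[i][i]
    return x


def anderson(z0, m=2, iters=50, tol=1e-10, ridge=1e-12):
    xs, gs = [list(z0)], [euler_map(z0)]
    z = gs[0]
    for _ in range(iters):
        xs.append(z)
        gs.append(euler_map(z))
        xs, gs = xs[-(m + 1):], gs[-(m + 1):]          # keep the last m+1 iterates
        fs = [[g - x for g, x in zip(gv, xv)] for gv, xv in zip(gs, xs)]
        if norm(fs[-1]) < tol:
            return z
        dF = [[p - q for p, q in zip(fs[i + 1], fs[i])] for i in range(len(fs) - 1)]
        dG = [[p - q for p, q in zip(gs[i + 1], gs[i])] for i in range(len(gs) - 1)]
        k = len(dF)
        # Regularized normal equations for min || f_k - sum_i gamma_i * dF_i ||.
        gram = [[sum(p * q for p, q in zip(dF[i], dF[j])) for j in range(k)] for i in range(k)]
        trace = sum(gram[i][i] for i in range(k))
        for i in range(k):
            gram[i][i] += ridge * trace
        rhs = [sum(p * q for p, q in zip(dF[i], fs[-1])) for i in range(k)]
        gamma = solve_small(gram, rhs)
        z = [gs[-1][d] - sum(gamma[i] * dG[i][d] for i in range(k)) for d in range(len(z))]
    return z


z_plain = [0.0, 0.0]
for _ in range(50):
    z_plain = euler_map(z_plain)       # plain fixed-step Euler on the same map
z_and = anderson([0.0, 0.0])
print("plain Euler |z|:", norm(z_plain), " Anderson z:", z_and)
```

On this linear toy the extrapolation lands on the root within a few iterations even though the underlying map is expanding. That is a best case: a nonlinear field would need the damping, restarts, and gating listed in the caveats.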

### (iv) Does integrator choice matter for the ANALOG target?
**[SOLID] For the analog model itself: no.** Analog HW performs the true continuous relaxation of `F`;
it runs no explicit Euler, no Anderson, no backward Euler. The integrator is not part of the deployed
computation.
**[SOLID] For digital training/eval of that target: yes, decisively.** Coarse explicit Euler can invent
a limit cycle the analog system would never exhibit, corrupting both the loss and the equilibrium the
EP gradient is taken at. The correct framing — and the right way to state it in the thesis — is exactly:

> Analog HW does the true continuous relaxation; the simulator only needs a **faithful + cheap emulator**
> of that relaxation. Adaptive ε, Anderson, and implicit/IMEX are all just *better emulators* — they
> change the simulation's fidelity/cost, not the EP objective or the analog primitive.

The one asymmetry to keep in mind: **jacreg is NOT in this "just a better emulator" bucket** (it edits
the model the analog HW would run), whereas adaptive ε / Anderson / implicit ARE. That is the precise
sense in which the integrator family is the analog-faithful fix and jacreg is the model-side one.

### Recommended solver strategy
1. Replace fixed ε=0.1 explicit Euler in the **free** phase with an overshoot/step-rejection adaptive
   solver (the corrected `adaptive_eps_calib2.py` logic), with a fail-open floor (Q1).
2. Add **damped Anderson with restarts + residual gating** for both free and nudged phases once the
   residual stalls/cycles; solve `F=0` rather than running a fixed Euler count and hoping.
3. Keep **implicit/backward Euler as a reference/fallback**, not the default inner loop (per-step cost).
4. Leave **AsymEP unchanged in principle**: find `z*`, find nudged equilibria, apply `Jv-JTv` at `z*`,
   and **gate the update** (`res_gate`, `:153-162`) when residual says no stationary state was found.
5. Retain **jacreg as a bounded adaptive homeostat** (analog settling / true-Hopf insurance), not as the
   primary fix.
6. For analog claims, report **solver-independent diagnostics**: force residual `‖F(z*)‖/‖z*‖` (NOT just
   the eps-scaled step residual — they differ by a factor of eps, which confounded the eps-sweep), and,
   when feasible, the leading continuous `mu` (sign of `Re mu`) and settling/ringing time.

---

## Summary table

| Option | Eliminates or relocates wall | Changes model? | Analog-faithful? | Verdict |
|---|---|---|---|---|
| (a) adaptive ε | Eliminates if eps_min < eps_crit; else relocates | No | Yes (emulator) | **Primary fix** [SOLID mechanism; bounded-floor guarantee] |
| (b) jacreg | Raises eps_crit (relocates in eps_crit-space) | Yes | No for the wall; yes for settling | **Secondary homeostat / crutch + true-Hopf insurance** |
| (c) smaller fixed ε | Relocates only | No | Yes but inefficient | **Diagnostic / fallback** [SOLID] |
| Anderson | Can eliminate cycle if a root exists | No | Yes (emulator) | **Best practical solver, needs safeguards** |
| Implicit/IMEX | Eliminates (A-stable) | No | Yes (emulator) | **Correct but per-step costly; fallback/reference** |

Key uncertainties flagged: (1) whether `|Im mu|` saturates or keeps growing as CE drops is **not directly
measured**: code caps/qknorm/damping argue for bounded, but the ε=0.1→0.05 data show it was still rising
in-window; (2) whether s3200 has a true continuous fixed point (g→0), or only an oscillation that dies as
eps→0 while the force floor g≈0.09 persists, is **ambiguous** because the eps-sweep's step residual is
not the force residual; the clean continuous-stable evidence is s2000, not s3200.