# FUGU_OPTIONS_VERDICT: Q1–Q3 (independently verified)

Scope: answers grounded in `lt_ep_train.py` (`force`/`tforce` :81-106, `relax` :123-133, `ep_step` :140-232, `jacreg` :211-219, weight caps :52-53/398-399/563-567), `holo_ep.py`, the calibration probes (`adaptive_eps_calib.py`, `adaptive_eps_calib2.py`, `eps_sweep_s3200.py`, `jnc_scaling.py`, `lt_ep_anderson.py`), and the diagnosis dossiers. Each claim is flagged **[SOLID]** (proved by code/data in repo) or **[UNCERTAIN]** (reasoned, not measured here).

---

## Shared mechanism (the object all three questions act on)

**[SOLID]** The active free relaxation is explicit (forward) Euler: `z = z + eps * blk.force(z, xin).detach()` (`relax`, :123-133). In thick mode the force is `F(z) = -(z - xin) + Attn(LN1 z) + FFN(LN2 z) - c*z` (`tforce`/`force` :81-85, :102-106), with `c=1`. So the per-step linear stability object is the **discrete map** `M = I + eps*J`, `J = dF/dz`.

**[SOLID]** For a continuous eigenvalue `mu = a + i b` of `J`, the Euler multiplier is `lambda = 1 + eps*mu`, and the map is stable iff `|1+eps*mu| < 1`, i.e. `eps < eps_crit = -2a/(a^2 + b^2)` for `a < 0`. A continuous-STABLE rotating mode (`a<0`, `b` large) is destabilized purely by too-large `eps`.

**[SOLID]** The ε-monotonicity training data are decisive that this is an *integration* wall, not a *gradient-quality* wall: eps=0.1 blew @ CE 2.74; eps=0.1 with a strictly better gradient (t2sel=160, cos 0.998) blew EARLIER @ 3.02; eps=0.05 reached 2.41 before blowing. Better gradient → not later but earlier; smaller step → strictly lower wall. That is exactly the `|1+eps*mu|>1` signature.

### One correction to the dossier's "continuous/analog is stable at s3200" framing

**[SOLID, verified; refines prior verdict]** The eps-sweep "CONVERGED at eps=0.01" is measured with a *different residual* than the cycle floor.
`eps_sweep_s3200.py:17` reports the **step** residual `r = ‖z2-z‖/‖z‖ = eps·‖F‖/‖z‖`; `adaptive_eps_calib.py:15` reports the **force** residual `g = ‖F‖/‖z‖`. At eps=0.01 the sweep's `r≈8.9e-4` is just `0.01 × 0.089`, i.e. the *same* force-floor `g≈0.09` that is called a "cycle" at eps=0.1. `FUGU_Q_OPTIONS.md` itself flags this: "s3200 g floors ~0.09 even at tiny ε (genuinely no fixed point at the marginal op, OR just slow finite-step convergence — ambiguous)."

**Implication:** the eps-sweep robustly proves *the oscillation/blow-up is a discrete-Euler artifact* (the cycle amplitude dies as eps→0). It does **not** by itself prove the s3200 operator has a true attracting fixed point (g→0) in continuous time: the force floor g≈0.09 persists. The clean continuous-stable case is s2000 (g→0). So "analog HW would have no problem" is **[SOLID]** for the *oscillatory blow-up* but **[UNCERTAIN]** for "s3200 settles to a usable equilibrium." The decisive missing measurement remains the leading eigenpair of `J`/`M` at a continued fixed-point branch (sign of `Re mu`).

---

## Q1: Evaluate (a) adaptive ε, (b) jacreg, (c) smaller fixed ε

**Bottom line:**

- **(c) smaller fixed ε: RELOCATES the wall. [SOLID]** Already shown empirically (2.74→2.41).
- **(b) jacreg: RAISES/RELOCATES the wall from the model side. [SOLID it raises eps_crit; UNCERTAIN whether it can eliminate]** It lifts `eps_crit` by cutting `|Im mu|`/gain, but at fixed ε it is still a wall in `eps_crit`-space; it also taxes the expressivity it suppresses.
- **(a) adaptive ε: ELIMINATES the fixed-ε wall *iff* its floor stays below the instantaneous `eps_crit`; otherwise it degenerates to (c). [SOLID for the mechanism; the guarantee is conditional]**

### Ranking

**To remove the measured software wall while preserving the model and the analog target:**

1. **Adaptive ε / robust solver**: the only option that removes the *fixed-step* wall with **zero model/expressivity cost** and **zero change to the analog target**.
It is pure integration-axis.
2. **jacreg**: effective secondary homeostat; raises `eps_crit`, but changes the learned operator and can cap the non-normality the good (BPTT-1.83) solution uses.
3. **smaller fixed ε**: diagnostic/fallback only; permanently pays the small-step cost on *every* example (including smooth ones) and still fails once stiffening passes the new floor.

**For the analog (continuous) target specifically:** adaptive ε and smaller fixed ε are *emulator* choices that leave the model identical to what analog HW runs, so they are the right kind of fix. jacreg *changes the model that analog HW would run* (see Q2).

### (a) Adaptive ε: grounded in code

**[SOLID]** `adaptive_eps_calib2.py` uses the correct signal: shrink only on **overshoot** (`g_t > prev*tol` → `eps*=down`), grow otherwise. The naive `adaptive_eps_calib.py` controller (shrink on slow contraction) is shown to mis-park ε at the floor on all ops: it conflates small-ε's slow contraction with instability. The corrected controller behaves as a continuous-relaxation emulator: stiff s3200 → ε to 0.003-0.008; smooth s2000 → ε grows toward 0.1 and reaches g=0.

### Is adaptive ε *guaranteed* to eliminate the wall? The eps_min question

**[SOLID, decisive]** No, not unconditionally. With a hard floor `eps_min`, adaptive ε eliminates the wall only while `eps_min < eps_crit = -2a/(a^2+b^2)`. If training keeps stiffening the rotating mode so `eps_crit` falls below `eps_min`, adaptive ε becomes a fixed small step at the floor, i.e. it **degenerates into option (c) and merely relocates the wall.** So the guarantee is conditional on the floor, and equivalently on whether `eps_crit` (hence `|Im mu|`) is bounded away from where the floor sits.

### Does |Im μ| (`b`) saturate or grow unboundedly as CE drops?
This is the crux, and the honest answer is split:

- **[SOLID] There IS structural stiffness-bounding machinery in the code that argues for saturation.** (i) `qknorm` RMSNorms q,k → softmax logits are bounded regardless of weight growth (`attn` :63-67); (ii) **weight-norm caps**: `capw = {WQ,WK,WV,WO,Wm,Wh,fc,pj}` are each projected back to `capx × initial-norm` every optimizer step (`:52-53`, `:398-399`, `:563-567`); (iii) damping `c=1` gives a passive `xin - (1+c)z = xin - 2z` contraction floor; (iv) LayerNorm bounds input scale into attn/FFN; (v) weight decay. With qknorm + capped projections + LN, the per-matrix gains feeding `J_nc` cannot grow without bound, which bounds `|Im mu|` and therefore keeps `eps_crit` bounded **below** (away from zero). This is a genuine reason to expect `|Im mu|` to **saturate** (or at least be bounded) rather than diverge.
- **[SOLID, opposing data point] But within the *observed* range stiffness was still rising:** fixed ε=0.1→0.05 moved the wall 2.74→2.41 rather than removing it, i.e. `eps_crit` was still falling across that CE interval. So saturation, if it exists, had not yet bitten in the measured window.
- **[UNCERTAIN] No direct eigenvalue/`|Im mu|`-vs-CE trace exists in the repo.** `jnc_scaling.py` measures `‖J_nc‖` growth-per-step vs width but is not a CE-resolved `|Im mu|` curve. So whether `b` truly plateaus before `eps_crit` reaches a practical `eps_min` is **not measured**.

**Synthesis (decisive, hedged correctly):** adaptive ε is the best wall-eliminator and the only zero-tax, analog-faithful one, **and** the code's caps/qknorm/damping make it *likely* that `|Im mu|` is bounded, so a sufficiently small `eps_min` should eliminate (not merely relocate) the wall in practice. But this is a *bounded-floor* guarantee, not an unconditional one: if `|Im mu|` were to grow without bound, any finite `eps_min` would eventually become a wall.
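The bounded-floor condition can be illustrated on a toy problem. The sketch below is not the controller in `adaptive_eps_calib2.py`; it is a minimal stand-in that applies the same overshoot rule (shrink when the force residual grows, grow otherwise) to a hypothetical scalar complex field `F(z) = mu*(z - 1)`. The function name `relax_adaptive`, the status strings, the parameter values, and the choice to reject an overshooting step are all assumptions made for illustration.

```python
def relax_adaptive(mu, eps_min, eps0=0.1, eps_max=0.1, up=1.1, down=0.5,
                   tol=1.0, steps=20000, g_stop=1e-6):
    """Overshoot-controlled forward Euler on the toy field F(z) = mu*(z - 1).

    Returns (status, eps, g). status is "converged", "floor_overshoot"
    (the floor itself overshoots, so the controller has become a fixed
    small step), or "max_steps".
    """
    z_star = 1.0 + 0.0j            # known root of the toy field (assumption)
    z = 1.01 + 0.01j               # start close to the root

    def force(w):
        return mu * (w - z_star)

    eps = eps0
    g = abs(force(z)) / abs(z)     # force residual ||F|| / ||z||
    for _ in range(steps):
        z_try = z + eps * force(z)              # trial explicit-Euler step
        g_try = abs(force(z_try)) / abs(z_try)
        if g_try > g * tol:                     # overshoot: residual grew
            if eps <= eps_min:
                return "floor_overshoot", eps, g
            eps = max(eps_min, eps * down)      # reject the step, shrink eps
            continue
        z, g = z_try, g_try                     # accept the step
        if g < g_stop:
            return "converged", eps, g
        eps = min(eps_max, eps * up)            # no overshoot: grow eps
    return "max_steps", eps, g


# Hypothetical stiff rotating mode: eps_crit = -2a/(a^2+b^2) = 2/37, about 0.054.
mu = complex(-1.0, 6.0)
status_ok, eps_ok, g_ok = relax_adaptive(mu, eps_min=1e-3)    # floor below eps_crit
status_bad, eps_bad, g_bad = relax_adaptive(mu, eps_min=0.08)  # floor above eps_crit
```

With the floor below `eps_crit` the toy run converges; with the floor above it the controller reports overshoot at the floor, which is the point at which it has degenerated into a fixed small step.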
**Recommended:** make the floor itself log an `eps_crit` proxy (overshoot persisting at the floor) and either drop the floor, reject the step, or hand off to Anderson, i.e. fail-open rather than fail-into-(c).

---

## Q2: The jacreg paradox

**Verdict: no paradox. jacreg works by RAISING `eps_crit` from the model side: it fixes the SAME discretization wall, not a demonstrated continuous-time instability. Relative to adaptive ε it is a sim-crutch for the measured failure, but it carries a *separate, real* analog benefit (settling quality), and it would become a genuine fix if a true continuous instability (Re μ≥0) ever emerges.**

### Why a model-side stiffness penalty fixes a simulation artifact: mechanism

**[SOLID]** `jacreg` is a Hutchinson JVP penalty `R = jacreg·‖J_nc·er‖²/‖er‖²` (`:211-219`), and in thick mode `nc_force = Attn + FFN` (`:92-97`). Minimizing `‖J_nc‖` reduces the learned non-conservative gain, which reduces the rotating component `|b|=|Im mu|` (and non-normal amplification). Since `eps_crit = -2a/(a^2+b^2)`, smaller `|b|` → **larger** `eps_crit` → fixed ε=0.1 stays under the Euler-stability boundary longer. So jacreg moves the *same* `|1+eps*mu|=1` wall by shrinking `b`, while adaptive ε moves the *same* wall by shrinking `eps`. Two knobs on one inequality.

### Raising eps_crit vs fixing a continuous-time problem

**[SOLID for measured regime]** For s3200-type failures the relevant mode has `Re mu < 0` (the cycle dies as eps→0). There is no *established* continuous instability to fix, so jacreg's contribution there is purely "raise eps_crit": discretization-wall relief from the model side.

**[UNCERTAIN beyond it]** If training ever drives `Re mu` up to or past zero (`Re mu ≥ 0`, a true Hopf), then no integrator (adaptive ε, implicit, Anderson) can stabilize the original continuous equilibrium; only a model-side change (jacreg, stronger damping/c, structural monotonicity, gain/asymmetry bounds) is a real fix. jacreg is the insurance policy for that case.
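The "two knobs on one inequality" picture, and the Hopf-type limit where neither integrator knob helps, can be checked numerically. This is a minimal sketch using only the formulas stated in this document (`lambda = 1 + eps*mu`, `eps_crit = -2a/(a^2+b^2)`); the eigenvalues below are hypothetical illustrative values, not measurements from the repo, and the helper names `eps_crit` and `euler_multiplier` are introduced here.

```python
def eps_crit(mu: complex) -> float:
    """Largest explicit-Euler step for which |1 + eps*mu| < 1.

    Returns 0.0 when Re(mu) >= 0: no positive step stabilizes the mode.
    """
    a, b = mu.real, mu.imag
    if a >= 0.0:
        return 0.0
    return -2.0 * a / (a * a + b * b)


def euler_multiplier(eps: float, mu: complex) -> float:
    """Magnitude of the forward-Euler multiplier lambda = 1 + eps*mu."""
    return abs(1.0 + eps * mu)


# Hypothetical stiff rotating mode: continuous-stable (a < 0), large |b|.
mu_stiff = complex(-1.0, 6.0)
crit = eps_crit(mu_stiff)                      # 2/37, below the fixed eps = 0.1

# Fixed eps = 0.1 sits above eps_crit, so Euler amplifies this mode.
unstable_at_fixed_eps = euler_multiplier(0.1, mu_stiff) > 1.0

# Knob 1 (integrator side): shrink eps below eps_crit; the model is untouched.
stable_small_eps = euler_multiplier(0.5 * crit, mu_stiff) < 1.0

# Knob 2 (model side, jacreg-like): shrink |b| so eps_crit rises above 0.1.
mu_damped = complex(-1.0, 3.0)                 # eps_crit = 2/10 = 0.2
stable_small_b = euler_multiplier(0.1, mu_damped) < 1.0

# Hopf-type case: Re(mu) >= 0, so no step size restores stability.
mu_hopf = complex(0.05, 6.0)
no_safe_step = eps_crit(mu_hopf) == 0.0
```

Both knobs bring the same multiplier back inside the unit circle; only the second one changes `mu`, which is why it is the only one that can matter when `Re mu ≥ 0`.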
### Does the benefit transfer to analog hardware? Two benefits, separated

**[SOLID] (i) The "prevents eps=0.1 Euler blow-up" benefit does NOT transfer.** Analog HW has no `eps` and does not iterate `z←z+εF`; it performs continuous relaxation. If `Re mu<0`, the analog ODE is stable and never had this wall. To the extent jacreg only buys eps_crit headroom, it is papering over a sim artifact analog wouldn't have: a crutch.

**[SOLID/UNCERTAIN-magnitude] (ii) The "less stiff/less ringy continuous dynamics" benefit DOES transfer.** Even with `Re mu<0`, a large `|Im mu|` mode has a poor damping ratio: it rings, settles slowly, demands more bandwidth and longer observation/integration windows, and is more noise/delay sensitive, all of which degrade the *physical* free-phase settling and the readout of nudged equilibria on analog HW. Reducing `‖J_nc‖` improves the continuous damping ratio. So jacreg is *also* a legitimate analog settling/robustness regularizer. **[UNCERTAIN]** The size of this analog benefit is not measured here.

### Real fix or sim-crutch, relative to adaptive ε?

**[SOLID]** For the *confirmed explicit-Euler artifact*:

- **adaptive ε / Anderson / implicit = the real fix of the emulator**: they preserve the learned vector field and make the digital sim stop inventing a cycle the analog system wouldn't have.
- **jacreg = a model-changing crutch for that artifact**, but simultaneously a *real* (if secondary) analog settling regularizer and the *only* lever if a genuine continuous instability appears.

**Recommended composition (not "either/or"):** (1) use adaptive ε / a real solver as the primary emulator fix so the sim is faithful; (2) keep jacreg as a **bounded, adaptive** homeostat (the controller already exists, `:520-529`) sized for analog settling-time/robustness or true marginality, NOT as a strong fixed penalty that taxes the non-normality the BPTT-1.83 solution needs.
The historical evidence fits this: the validated ~2.40 runs used *adaptive* jacreg; the diverging runs *froze it weak*, i.e. they removed the homeostat, not the integrator.

---

## Q3: Anderson acceleration / implicit (IMEX) integrators

**Verdict: Yes. They can replace explicit Euler as the *solver* and kill the discretization instability, and they are compatible with AsymEP *provided they converge to the same equilibria of the same vector fields*. They change nothing about the analog model; they are emulator-fidelity choices. Implicit Euler is unconditionally stable but per-step expensive (the solve is itself a relaxation). Anderson is the more practical lever: it both accelerates and can suppress the Euler cycle when a true fixed point exists, but it is not guaranteed and needs damping/restarts/residual gating.**

### (i) Compatibility with AsymEP

**[SOLID]** The EP estimator depends on the *states*, not on how they were reached. `ep_step` computes `zs = relax(...)` and treats it as the free equilibrium (`:142-144`); the AsymEP correction uses local `Jv = jvp(nc_force, zs, v)`, `JTv = vjp(nc_force, zs, v)`, `corr = Jv - JTv` at `zs` (`:172-178`); the parameter gradient is `(a * f).sum()` with `f = force(zs.detach(), xin, cg=True)` (`:202-205`). None of this requires explicit Euler: it requires that `zs` is a genuine root `F(zs)≈0` and that the nudged states are equilibria of the nudged/corrected force. A better solver that returns the *same roots* is fully compatible, and the `-2A` correction is computed *at* `z*` regardless of the solver that found it.

**[SOLID, important; refines prior framing] The nudged phase must also be re-solved.** The free phase is not the only explicit-Euler loop: the nudge (`nudge()` :163-180) and every holomorphic estimator (`holo_a`, `holo_a_select2`, `holo_a_track`, `holo_a_lockin` in `holo_ep.py`) advance with `z = z + eps*(f - corr)`. The `-2A` correction lives *inside* these loops.
So "swap the integrator" means swap it in **both** phases; a solver that converges the free `z*` but leaves the nudged phase on coarse Euler will still corrupt `a = -dz*/dβ`.

**[SOLID] Hard limit:** if the continuous field has no attracting root in the operating regime, no solver can manufacture the stationary state AsymEP needs: it will fail, find a spurious root, or return a numerical artifact. A solver fixes *integration*, not *non-existence of equilibrium*. (This is why the s3200 force-floor ambiguity from the Shared-mechanism section matters: confirm a true root exists before trusting AsymEP there.)

### (ii) Implicit / IMEX: tractable or self-defeating?

**[SOLID, theory]** Backward Euler multiplier is `1/(1-h·mu)`, A-stable: for any `Re mu<0` it is stable at *every* step size, so it would kill the stiff-rotation Euler cycle outright.

**[SOLID, cost]** Each backward step solves `y - h·F(y) - z_n = 0`, where `F` contains LN, causal softmax attention, and FFN. A Newton/Krylov/Picard solve needs several force evals and matrix-free JVP/VJP linear solves over the full `B·T·C` state per step, i.e. **the per-step solve is itself a relaxation/root-find**, which is the self-defeating risk for a default inner loop.

**[UNCERTAIN/qualitative] IMEX nuance:** making only the cheap leak `-(1+c)z` implicit is trivial but does **not** tame the dangerous learned rotating attention mode (the danger is in `J_nc`, not the leak); treating `J_nc` implicitly reintroduces the big linear solve. So implicit/IMEX is best as a **robust fallback / macro-step / offline reference**, not the default per-step integrator.

### (iii) Anderson: speed only, or stabilization too?

**[SOLID, conceptual]** Anderson (DEQ-style; `lt_ep_anderson.py` stores recent `X`, `G(X)=z+εF`, solves a small regularized least-squares for the mixing coefficients, extrapolates) is a quasi-Newton/GMRES-on-the-residual.
For a Picard/Euler map whose oscillatory multiplier sits just outside the unit circle, the residual-minimizing extrapolation can **suppress the limit cycle**, not merely speed a contracting one, so it is more than acceleration. `lt_ep_anderson.py` is explicitly framed as exactly this test ("can a fixed-point solver converge the free phase where plain relaxation cannot?").

**[SOLID, caveats]** Not guaranteed: it cannot create a root that doesn't exist; aggressive mixing can diverge; it needs damping (β-mixing), restarts, and residual-monotonicity gating; and (per (i)) it must wrap the nudged phase too. Net: **strongest practical candidate**, cheaper than full implicit Newton, able to stabilize when a root exists, but must be safeguarded.

### (iv) Does integrator choice matter for the ANALOG target?

**[SOLID] For the analog model itself: no.** Analog HW performs the true continuous relaxation of `F`; it runs no explicit Euler, no Anderson, no backward Euler. The integrator is not part of the deployed computation.

**[SOLID] For digital training/eval of that target: yes, decisively.** Coarse explicit Euler can invent a limit cycle the analog system would never exhibit, corrupting both the loss and the equilibrium the EP gradient is taken at. The correct framing (and the right way to state it in the thesis) is exactly:

> Analog HW does the true continuous relaxation; the simulator only needs a **faithful + cheap emulator**
> of that relaxation. Adaptive ε, Anderson, and implicit/IMEX are all just *better emulators*: they
> change the simulation's fidelity/cost, not the EP objective or the analog primitive.

The one asymmetry to keep in mind: **jacreg is NOT in this "just a better emulator" bucket** (it edits the model the analog HW would run), whereas adaptive ε / Anderson / implicit ARE. That is the precise sense in which the integrator family is the analog-faithful fix and jacreg is the model-side one.

### Recommended solver strategy

1. Replace fixed ε=0.1 explicit Euler in the **free** phase with an overshoot/step-rejection adaptive solver (the corrected `adaptive_eps_calib2.py` logic), with a fail-open floor (Q1).
2. Add **damped Anderson with restarts + residual gating** for both free and nudged phases once the residual stalls/cycles; solve `F=0` rather than running a fixed Euler count and hoping.
3. Keep **implicit/backward Euler as a reference/fallback**, not the default inner loop (per-step cost).
4. Leave **AsymEP unchanged in principle**: find `z*`, find nudged equilibria, apply `Jv-JTv` at `z*`, and **gate the update** (`res_gate`, `:153-162`) when residual says no stationary state was found.
5. Retain **jacreg as a bounded adaptive homeostat** (analog settling / true-Hopf insurance), not as the primary fix.
6. For analog claims, report **solver-independent diagnostics**: force residual `‖F(z*)‖/‖z*‖` (NOT just the eps-scaled step residual; they differ by a factor of eps, which confounded the eps-sweep), and, when feasible, the leading continuous `mu` (sign of `Re mu`) and settling/ringing time.

---

## Summary table

| Option | Eliminates or relocates wall | Changes model? | Analog-faithful? | Verdict |
|---|---|---|---|---|
| (a) adaptive ε | Eliminates if eps_min < eps_crit; else relocates | No | Yes (emulator) | **Primary fix** [SOLID mechanism; bounded-floor guarantee] |
| (b) jacreg | Raises eps_crit (relocates in eps_crit-space) | Yes | No for the wall; yes for settling | **Secondary homeostat / crutch + true-Hopf insurance** |
| (c) smaller fixed ε | Relocates only | No | Yes but inefficient | **Diagnostic / fallback** [SOLID] |
| Anderson | Can eliminate cycle if a root exists | No | Yes (emulator) | **Best practical solver, needs safeguards** |
| Implicit/IMEX | Eliminates (A-stable) | No | Yes (emulator) | **Correct but per-step costly; fallback/reference** |

Key uncertainties flagged: (1) whether `|Im mu|` saturates vs grows as CE drops is **not directly measured**: code caps/qknorm/damping argue for bounded, but ε=0.1→0.05 data show it was still rising in-window; (2) whether s3200 has a true continuous fixed point (g→0) vs only a dead oscillation is **ambiguous** because the eps-sweep's step-residual ≠ force-residual; the clean continuous-stable evidence is s2000, not s3200.
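The step-residual versus force-residual distinction behind uncertainty (2) is a one-line identity, `r = eps * g`, and a small check makes the confound concrete. This is a minimal sketch: `toy_force`, the state `z`, and the helper names are hypothetical and stand in for the real `force` in `lt_ep_train.py`; the identity itself holds for any vector field under a single explicit-Euler step.

```python
import math


def norm(v):
    """Euclidean norm of a plain Python sequence."""
    return math.sqrt(sum(x * x for x in v))


def force_residual(force, z):
    """g = ||F(z)|| / ||z||: does not depend on the step size."""
    return norm(force(z)) / norm(z)


def step_residual(force, z, eps):
    """r = ||z2 - z|| / ||z|| for one explicit-Euler step z2 = z + eps*F(z)."""
    f = force(z)
    z2 = [zi + eps * fi for zi, fi in zip(z, f)]
    return norm([b - a for a, b in zip(z, z2)]) / norm(z)


def toy_force(z):
    """Hypothetical 2-D rotating field, used only to exercise the identity."""
    x, y = z
    return [-x - 6.0 * y + 0.5, 6.0 * x - y]


z = [1.0, 2.0]
g = force_residual(toy_force, z)

# The same state looks 10x "more converged" at eps=0.01 than at eps=0.1
# if only the step residual is reported, although g has not changed.
ratios = {eps: step_residual(toy_force, z, eps) / g for eps in (0.1, 0.01)}
```

Each entry of `ratios` equals its `eps` up to rounding, so a small step residual at small `eps` says nothing by itself about whether `g` is approaching zero; reporting `g` directly removes the confound.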