# Path to a citable, build-on-able preprint — status

Framing (locked 2026-06-19, per user correction): the axis is **expansive vs more-expansive** (graded; for TRM both classes have λ₁>0), NOT settled-vs-chaotic. The phenomenon is the **cleanness of a graded separation**; the **mechanism is explicitly OPEN** and is the natural seed for follow-on projects. This preprint = rigorous phenomenology + precise characterization + honest open-mechanism. Do NOT force a fixed-point / suppression-of-chaos framing (rejected).

## Tier 0 — measurement bulletproofing (others build on it)

- [x] **T0.1 estimator validation** — `paper/validation/`: QR/Benettin core recovers known spectra to <1e-3 (diagonal, symmetric, non-normal asymptotic) and Hénon λ₁ to 8e-5. PASS. Confirms the numerical core (orthonormalization cadence, log|diagR| bookkeeping, ordering, averaging).
- [ ] **T0.2 robustness reruns (GPU)** — λ stability vs t_ons, tangent-basis seed, k>8. Window dependence already covered offline (Char 3). Small queue; spec below.
- [ ] **T0.3 language/scope pass** — finite-time vs asymptotic, "expansive not chaotic" for HRM (negative λ), metric/coordinate-dependence caveat (Lohmiller–Slotine). Prose task.

## Characterization (replaces the rejected Tier 2; describes WHAT, not WHY)

- [x] **Char 1 whole-spectrum** — separation is a ~rigid shift of the ENTIRE k=8 spectrum, not a single mode (per-exponent AUC uniformly 0.98–0.99; HRM gap ≈constant −0.16/exponent). Spectral MEAN separates ≥ λ₁ alone (AUC 0.991–0.995). CAVEAT: KS-proxy Σλ⁺ is the wrong aggregate for HRM (all-negative spectra → 0); use spectral mean for HRM.
- [x] **Char 2 shape** — two overlapping UNIMODAL classes with well-separated means, NOT two discrete clusters (within-class BC 0.26–0.40). Outcome is a moderately sharp threshold on the λ₁ continuum (25→75% transition spans 12–30% of the λ₁ spread).
- [x] **Char 3 integration-time scaling (the key descriptor)** — separation BUILDS monotonically with window H: Cohen's d 1.06→4.84 (TRM, H=2→16), 0.03→3.45 (HRM). Near-zero at H=2, near-perfect at the full 16-segment budget. The cleanness is an integration-time phenomenon. COHERENCE with E5: this accumulation tracks the unfolding of outcomes (more trajectories revealed by larger H), NOT anticipation — among undecided@H examples λ₁ still doesn't predict (E5). State both together.
- [x] **Char 4 effect size** — "clean" quantified: Cohen's d 3.4–4.8, distributional overlap <10% (TRM hist-overlap 0.049). Beyond AUC.

## Tier 1 — causal content (the level-up from correlation)

- [ ] **T1 inference-side causal probe** — nudge a failing trajectory toward lower expansion (or toward the success-mean manifold) mid-rollout and measure outcome recovery; conversely inject expansion into a settling-correct trajectory. Tests settling⟹correct as causal, not correlational. Spec next. GPU.

## Open-mechanism (NOT this paper; the hook for follow-ons)

Why a graded (both-expansive) difference separates so cleanly. Char 1–4 bound the description; the why is deferred. Candidate angles are the user's to pursue, not asserted here.

## Maze cross-task result + checkpoint evolution (2026-06-20)

**Deflationary finding stands and is now grounded:** the FTLE/CLV separation reduces to convergence+confidence (λ1, full k=8 spectrum, AND leading-CLV geometry all reduce; partial-corr →0 once drift+q_halt controlled). The dynamical signal is a (redundant) convergence readout.

**Maze (TRM att, friend's run, all 10 ckpts, k=1):** separation WEAK (λ1 Cohen's d 0.2–0.5 vs Sudoku 3–5). Failures SETTLE (B/fail 0.81–0.98, D/fail 0.02–0.19) at ALL ckpts and are NEAR-MISSES (token_acc ~0.97). Opposite of Sudoku (failures wander, far-from-correct token ~0.63).

**Checkpoint evolution (the key new result, offline):** wandering is a LATE-TRAINING property.
Sudoku HRM failures SETTLE early (B/fail ~0.9 at acc 2–15%) then flip to WANDER late (D/fail ~1.0 at acc 50%), transition ~step 13–18k. So "failures wander" is learned, not intrinsic. BUT matched-accuracy contrast cuts the other way: at acc≈0.76, Sudoku-TRM D/fail=1.00 vs Maze-TRM D/fail=0.19 — same skill, opposite dynamics → TASK STRUCTURE also matters, not just maturity. And early-Sudoku settling (token 0.63, confidently-wrong) ≠ Maze settling (token 0.97, near-miss): not the same phenomenon. Fig: analysis_2x2/checkpoint_evolution_wander.png.

**Task structure (offline):** Maze solution path (median 113 cells) passes through ~76 branch points (67% of path cells at deg≥3 junctions; 48% of open cells are junctions) → abundant locally-coherent alternative paths = many STABLE WRONG ANSWERS available. Sudoku: unique globally-coupled solution, a wrong cell violates constraints globally → no local near-miss equilibrium. This structurally explains settle-to-near-miss (Maze) vs wander (Sudoku).

**Unresolved confound (queued):** TRM-Maze never develops wandering, but can't tell task-structure from TRM-Maze SATURATION (Maze too easy for TRM). Queued before HRM-Maze: (1) continue-train TRM-Maze from step_130200 (does acc climb toward ~1.0 = saturation, or plateau?); (2) per-cell failure structure (are failure errors a connected detour = coherent stable wrong path, or scattered?). Then HRM-Maze (harder model-task fit, more likely to be stressed into wandering).

## Solution-space test (2026-06-20) — refutes the measurement-artifact concern, strengthens task-structure

User asked: is weak Maze separation an artifact of analyzing the FULL latent (88% trivial copy) instead of the SOLUTION space? Tested directly: per-step decoded-ANSWER Hamming drift over solution cells (label!=input), Maze vs Sudoku control.

- MAZE: failures SETTLE in solution space too (late answer-drift median 0.00, 98.4% settled; AUC 0.30). Same conclusion as full-latent. NOT an artifact.
- SUDOKU control: failures DON'T settle in solution space (late drift median 8.5/step, 0% settled; AUC 0.99). Same as full-latent. Both spaces agree.
- Per-cell failure STRUCTURE (direct task-structure evidence): MAZE failures = CONNECTED DETOUR (97% have ≤2 error components, median 22 cells one blob) = a coherent stable wrong PATH. SUDOKU failures = SCATTERED (100% have ≥5 components, median 13) = no coherent wrong answer. Fig: analysis_2x2/maze_failure_detour.png. This is the mechanism-grounding for why Maze settles (stable wrong answers exist as detours) and Sudoku wanders (no stable wrong answer).

## CORRECTION (2026-06-20) — Maze exact-match labeling was the artifact; failure=more-chaotic HOLDS

The earlier "Maze dissociates / completeness≠correctness" reading was largely a LABELING ARTIFACT, not a real dynamical dissociation. Maze exact-match marks VALID alternative solutions (incl. equal-length valid shortest paths) as "failures"; 100% of exact-match "failures" are valid connected paths (complete answers) → they settle, trivially. That is a benchmark-design flaw, not a result.

**Under the correct criterion (CONNECTIVITY = is it a valid complete path = is it actually solved):** genuine failures (broken/disconnected) ARE more chaotic — AUC(-late_drift→connected) = 0.864 @step_13020 (15 broken), 0.895 pooled (18 broken); bootstrap 95% CI [0.80, 0.96], excludes 0.5. So "failure = more chaotic" is TASK-GENERAL (Sudoku + Maze) once failure is defined by validity.

LIMITATION (now RESOLVED): trained Maze SATURATES before the first saved ckpt (step_13020 already 97% complete) → only n=18 broken from existing ckpts. FIX DONE: fresh early-save TRM-Maze run (maze_earlysave_freshTRM, saved every 250 epochs) captured the broken-rich pre-saturation phase; cheap forward dumps (drift_zH + connectivity, no JVP) on 8 early ckpts give **n=4096, 1835 broken**.
**Pooled: AUC(-latent drift_zH -> connected/complete) = 0.834, bootstrap 95% CI [0.822, 0.846]** (broken late-drift median 1.06 vs connected 0.56). Per-ckpt AUC rises with training 0.66->0.88 (mirrors Sudoku's separation-grows-with-training). So 'genuine failure (incomplete) = more chaotic' is now LARGE-N BULLETPROOF on Maze under validity labeling. Fig: maze_broken_morechaotic.png.

Honest detail: 'more chaotic' is a LATENT-dynamics property (drift_zH AUC 0.834, λ1 AUC 0.86); the DECODED-answer drift does NOT separate (ans_drift AUC 0.38) — broken paths commit an incomplete decoded answer while churning internally. Consistent with the FTLE/drift (latent) story.

## Synthesis for the paper (current honest thesis, corrected)

Genuine failures (incomplete/invalid answers) are MORE CHAOTIC — measurable, task-general (Sudoku; Maze under validity labeling). Mechanism: the dynamical signal detects answer completeness/convergence (FTLE reducible to drift+q_halt). On unique-solution tasks completeness=correctness, so it predicts correctness directly. On multi-solution tasks exact-match mislabels valid alternatives as failures; use validity labeling. The phenomenon stands; the convergence-detection mechanism is the honest interpretation, not a refutation.

## Status: offline T0.1 + Char 1–4 + Maze evolution + task structure DONE. Running: TRM CLV (done),
## HRM CLV (queued on card1), maze-followup queue (continue-train + per-cell, waiting for GPU).
## Remaining: T0.2/T0.3, T1, HRM-Maze (after saturation test).
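
## Sketch: QR/Benettin check on Hénon (cf. T0.1)

A minimal, hedged sketch of the kind of check T0.1 describes: re-orthonormalize a tangent basis by QR at every step and accumulate log|diag R|, here on the Hénon map at the standard parameters a=1.4, b=0.3. This is NOT the code in `paper/validation/`. The function name `henon_lyapunov`, the step counts, the burn-in, and the initial condition are assumptions chosen for illustration, and it is pure Python so it runs without the project environment. One built-in sanity property: |det J| = b at every step, so the two exponents must sum to ln b whatever the trajectory does.

```python
import math


def henon_lyapunov(n_steps=100_000, burn_in=1_000, a=1.4, b=0.3):
    """Estimate both Lyapunov exponents of the Henon map (illustrative sketch).

    Uses per-step QR (Gram-Schmidt on a 2x2 tangent basis) and averages the
    logs of the diagonal of R. All defaults are assumptions, not the
    settings used in paper/validation/.
    """
    x, y = 0.0, 0.0
    # Burn-in: let the orbit relax onto the attractor before measuring.
    for _ in range(burn_in):
        x, y = 1.0 - a * x * x + y, b * x

    q1, q2 = (1.0, 0.0), (0.0, 1.0)  # orthonormal tangent basis (columns of Q)
    s1 = s2 = 0.0                    # running sums of log R11 and log R22

    for _ in range(n_steps):
        # Jacobian of (x, y) -> (1 - a x^2 + y, b x) at the current point.
        j11, j12, j21, j22 = -2.0 * a * x, 1.0, b, 0.0

        # Push the basis forward: V = J Q.
        v1 = (j11 * q1[0] + j12 * q1[1], j21 * q1[0] + j22 * q1[1])
        v2 = (j11 * q2[0] + j12 * q2[1], j21 * q2[0] + j22 * q2[1])

        # QR by Gram-Schmidt; R has a positive diagonal (r11, r22).
        r11 = math.hypot(v1[0], v1[1])
        q1 = (v1[0] / r11, v1[1] / r11)
        r12 = q1[0] * v2[0] + q1[1] * v2[1]
        w = (v2[0] - r12 * q1[0], v2[1] - r12 * q1[1])
        r22 = math.hypot(w[0], w[1])
        q2 = (w[0] / r22, w[1] / r22)

        s1 += math.log(r11)
        s2 += math.log(r22)

        # Advance the base trajectory.
        x, y = 1.0 - a * x * x + y, b * x

    return s1 / n_steps, s2 / n_steps


if __name__ == "__main__":
    lam1, lam2 = henon_lyapunov()
    print("lambda_1 =", lam1)
    print("lambda_2 =", lam2)
    print("sum - ln(b) =", (lam1 + lam2) - math.log(0.3))
```

With these defaults λ₁ should land near the commonly quoted value of about 0.42. Matching the reference to the 8e-5 level reported under T0.1 depends on the run length and averaging choices of the actual validation, which this sketch does not reproduce.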
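
## Sketch: AUC, bootstrap CI, Cohen's d (cf. Char 4, Maze correction)

The separation statistics quoted throughout these notes (AUC, bootstrap 95% CI, Cohen's d) can be computed roughly as below. The notes do not say which variants were used, so the following are assumptions: AUC in the Mann-Whitney form with ties counted as one half, a percentile bootstrap resampled within each class, and Cohen's d with a pooled standard deviation. The function names are hypothetical and the demo values are SYNTHETIC toy numbers, not data from any run.

```python
import math
import random
from bisect import bisect_left, bisect_right


def auc(pos, neg):
    """P(pos score > neg score) + 0.5 * P(tie), via sorted negatives."""
    neg_sorted = sorted(neg)
    total = 0.0
    for p in pos:
        below = bisect_left(neg_sorted, p)            # negatives strictly below p
        ties = bisect_right(neg_sorted, p) - below    # negatives equal to p
        total += below + 0.5 * ties
    return total / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling within each class."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        rp = rng.choices(pos, k=len(pos))
        rn = rng.choices(neg, k=len(neg))
        stats.append(auc(rp, rn))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return stats[lo_idx], stats[hi_idx]


def cohens_d(a, b):
    """Standardized mean difference (mean(a) - mean(b)) / pooled SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled


if __name__ == "__main__":
    rng = random.Random(1)
    # SYNTHETIC late-drift values for illustration only.
    drift_connected = [rng.gauss(0.5, 0.3) for _ in range(300)]
    drift_broken = [rng.gauss(1.0, 0.3) for _ in range(300)]

    # Score = negated drift, so a higher score means "more settled";
    # this follows the AUC(-late_drift -> connected) sign convention.
    pos = [-d for d in drift_connected]
    neg = [-d for d in drift_broken]

    print("AUC:", auc(pos, neg))
    print("95% CI:", bootstrap_auc_ci(pos, neg))
    print("Cohen's d (broken - connected drift):",
          cohens_d(drift_broken, drift_connected))
```

Resampling within each class keeps the class sizes fixed in every replicate; if the reported CIs instead resampled examples jointly, the interval would also reflect variation in the broken/connected ratio.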