Diffstat (limited to 'paper/readiness.md')
| -rw-r--r-- | paper/readiness.md | 119 |
1 files changed, 119 insertions, 0 deletions
diff --git a/paper/readiness.md b/paper/readiness.md
new file mode 100644
index 0000000..f07f78e
--- /dev/null
+++ b/paper/readiness.md
@@ -0,0 +1,119 @@
+# Path to a citable, build-on-able preprint — status
+
+Framing (locked 2026-06-19, per user correction): the axis is **expansive vs more-expansive**
+(graded; for TRM both classes have λ₁>0), NOT settled-vs-chaotic. The phenomenon is the
+**cleanness of a graded separation**; the **mechanism is explicitly OPEN** and is the natural seed
+for follow-on projects. This preprint = rigorous phenomenology + precise characterization +
+honest open-mechanism. Do NOT force a fixed-point / suppression-of-chaos framing (rejected).
+
+## Tier 0 — measurement bulletproofing (others build on it)
+- [x] **T0.1 estimator validation** — `paper/validation/`: QR/Benettin core recovers known spectra
+  to <1e-3 (diagonal, symmetric, non-normal asymptotic) and Hénon λ₁ to 8e-5. PASS. Confirms the
+  numerical core (orthonormalization cadence, log|diagR| bookkeeping, ordering, averaging).
+- [ ] **T0.2 robustness reruns (GPU)** — λ stability vs t_ons, tangent-basis seed, k>8. Window
+  dependence already covered offline (Char 3). Small queue; spec below.
+- [ ] **T0.3 language/scope pass** — finite-time vs asymptotic, "expansive not chaotic" for HRM
+  (negative λ), metric/coordinate-dependence caveat (Lohmiller–Slotine). Prose task.
+
+## Characterization (replaces the rejected Tier 2; describes WHAT, not WHY)
+- [x] **Char 1 whole-spectrum** — separation is a ~rigid shift of the ENTIRE k=8 spectrum, not a
+  single mode (per-exponent AUC uniformly 0.98–0.99; HRM gap ≈constant −0.16/exponent). Spectral
+  MEAN separates ≥ λ₁ alone (AUC 0.991–0.995). CAVEAT: KS-proxy Σλ⁺ is the wrong aggregate for HRM
+  (all-negative spectra → 0); use spectral mean for HRM.
+- [x] **Char 2 shape** — two overlapping UNIMODAL classes with well-separated means, NOT two
+  discrete clusters (within-class BC 0.26–0.40). Outcome is a moderately sharp threshold on the
+  λ₁ continuum (25→75% transition spans 12–30% of the λ₁ spread).
+- [x] **Char 3 integration-time scaling (the key descriptor)** — separation BUILDS monotonically
+  with window H: Cohen's d 1.06→4.84 (TRM, H=2→16), 0.03→3.45 (HRM). Near-zero at H=2, near-perfect
+  at the full 16-segment budget. The cleanness is an integration-time phenomenon. COHERENCE with
+  E5: this accumulation tracks the unfolding of outcomes (more trajectories revealed by larger H),
+  NOT anticipation — among undecided@H examples λ₁ still doesn't predict (E5). State both together.
+- [x] **Char 4 effect size** — "clean" quantified: Cohen's d 3.4–4.8, distributional overlap
+  <10% (TRM hist-overlap 0.049). Beyond AUC.
+
+## Tier 1 — causal content (the level-up from correlation)
+- [ ] **T1 inference-side causal probe** — nudge a failing trajectory toward lower expansion (or
+  toward the success-mean manifold) mid-rollout and measure outcome recovery; conversely inject
+  expansion into a settling-correct trajectory. Tests settling⟹correct as causal, not correlational.
+  Spec next. GPU.
+
+## Open-mechanism (NOT this paper; the hook for follow-ons)
+Why a graded (both-expansive) difference separates so cleanly. Char 1–4 bound the description;
+the why is deferred. Candidate angles are the user's to pursue, not asserted here.
+
+## Maze cross-task result + checkpoint evolution (2026-06-20)
+
+**Deflationary finding stands and is now grounded:** the FTLE/CLV separation reduces to
+convergence+confidence (λ1, full k=8 spectrum, AND leading-CLV geometry all reduce; partial-corr
+→0 once drift+q_halt controlled). The dynamical signal is a (redundant) convergence readout.
+
+**Maze (TRM att, friend's run, all 10 ckpts, k=1):** separation WEAK (λ1 Cohen's d 0.2–0.5 vs
+Sudoku 3–5). Failures SETTLE (B/fail 0.81–0.98, D/fail 0.02–0.19) at ALL ckpts and are NEAR-MISSES
+(token_acc ~0.97). Opposite of Sudoku (failures wander, far-from-correct token ~0.63).
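The effect sizes quoted in these notes (Cohen's d, AUC) can be computed from two per-class lists of scores. The following is a minimal sketch and not the analysis code behind the numbers above: the function names and the toy λ₁ values are hypothetical, Cohen's d is assumed to use the pooled standard deviation, and AUC is taken as the Mann-Whitney probability that a score from the first class exceeds a score from the second.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for mean(a) - mean(b), using the pooled sample std (assumed convention)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled


def auc(pos, neg):
    """P(score from pos > score from neg); ties count as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-example lambda_1 values, for illustration only.
fail_lam1 = [0.42, 0.51, 0.47, 0.55, 0.49]
success_lam1 = [0.21, 0.30, 0.26, 0.33, 0.24]

print("Cohen's d:", cohens_d(fail_lam1, success_lam1))
print("AUC:", auc(fail_lam1, success_lam1))
```

The pairwise AUC loop is quadratic in sample size, which is fine for a few thousand examples; a rank-based version would be the usual choice for larger dumps.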
+
+**Checkpoint evolution (the key new result, offline):** wandering is a LATE-TRAINING property.
+Sudoku HRM failures SETTLE early (B/fail ~0.9 at acc 2–15%) then flip to WANDER late
+(D/fail ~1.0 at acc 50%), transition ~step 13–18k. So "failures wander" is learned, not intrinsic.
+BUT matched-accuracy contrast cuts the other way: at acc≈0.76, Sudoku-TRM D/fail=1.00 vs
+Maze-TRM D/fail=0.19 — same skill, opposite dynamics → TASK STRUCTURE also matters, not just maturity.
+And early-Sudoku settling (token 0.63, confidently-wrong) ≠ Maze settling (token 0.97, near-miss):
+not the same phenomenon. Fig: analysis_2x2/checkpoint_evolution_wander.png.
+
+**Task structure (offline):** Maze solution path (median 113 cells) passes through ~76 branch
+points (67% of path cells at deg≥3 junctions; 48% of open cells are junctions) → abundant
+locally-coherent alternative paths = many STABLE WRONG ANSWERS available. Sudoku: unique
+globally-coupled solution, a wrong cell violates constraints globally → no local near-miss
+equilibrium. This structurally explains settle-to-near-miss (Maze) vs wander (Sudoku).
+
+**Unresolved confound (queued):** TRM-Maze never develops wandering, but can't tell task-structure
+from TRM-Maze SATURATION (Maze too easy for TRM). Queued before HRM-Maze:
+(1) continue-train TRM-Maze from step_130200 (does acc climb toward ~1.0 = saturation, or plateau?);
+(2) per-cell failure structure (are failure errors a connected detour = coherent stable wrong path,
+or scattered?). Then HRM-Maze (harder model-task fit, more likely to be stressed into wandering).
+
+## Solution-space test (2026-06-20) — refutes the measurement-artifact concern, strengthens task-structure
+User asked: is weak Maze separation an artifact of analyzing the FULL latent (88% trivial copy)
+instead of the SOLUTION space? Tested directly: per-step decoded-ANSWER Hamming drift over
+solution cells (label!=input), Maze vs Sudoku control.
+- MAZE: failures SETTLE in solution space too (late answer-drift median 0.00, 98.4% settled;
+  AUC 0.30). Same conclusion as full-latent. NOT an artifact.
+- SUDOKU control: failures DON'T settle in solution space (late drift median 8.5/step, 0% settled;
+  AUC 0.99). Same as full-latent. Both spaces agree.
+- Per-cell failure STRUCTURE (direct task-structure evidence): MAZE failures = CONNECTED DETOUR
+  (97% have ≤2 error components, median 22 cells one blob) = a coherent stable wrong PATH.
+  SUDOKU failures = SCATTERED (100% have ≥5 components, median 13) = no coherent wrong answer.
+  Fig: analysis_2x2/maze_failure_detour.png. This is the mechanism-grounding for why Maze settles
+  (stable wrong answers exist as detours) and Sudoku wanders (no stable wrong answer).
+
+## CORRECTION (2026-06-20) — Maze exact-match labeling was the artifact; failure=more-chaotic HOLDS
+The earlier "Maze dissociates / completeness≠correctness" reading was largely a LABELING ARTIFACT,
+not a real dynamical dissociation. Maze exact-match marks VALID alternative solutions (incl.
+equal-length valid shortest paths) as "failures"; 100% of exact-match "failures" are valid connected
+paths (complete answers) → they settle, trivially. That is a benchmark-design flaw, not a result.
+**Under the correct criterion (CONNECTIVITY = is it a valid complete path = is it actually solved):**
+genuine failures (broken/disconnected) ARE more chaotic — AUC(-late_drift→connected) = 0.864 @step_13020
+(15 broken), 0.895 pooled (18 broken); bootstrap 95% CI [0.80, 0.96], excludes 0.5. So
+"failure = more chaotic" is TASK-GENERAL (Sudoku + Maze) once failure is defined by validity.
+LIMITATION (now RESOLVED): trained Maze SATURATES before the first saved ckpt (step_13020 already
+97% complete) → only n=18 broken from existing ckpts. FIX DONE: fresh early-save TRM-Maze run
+(maze_earlysave_freshTRM, saved every 250 epochs) captured the broken-rich pre-saturation phase;
+cheap forward dumps (drift_zH + connectivity, no JVP) on 8 early ckpts give **n=4096, 1835 broken**.
+**Pooled: AUC(-latent drift_zH -> connected/complete) = 0.834, bootstrap 95% CI [0.822, 0.846]**
+(broken late-drift median 1.06 vs connected 0.56). Per-ckpt AUC rises with training 0.66->0.88
+(mirrors Sudoku's separation-grows-with-training). So 'genuine failure (incomplete) = more chaotic'
+is now LARGE-N BULLETPROOF on Maze under validity labeling. Fig: maze_broken_morechaotic.png.
+Honest detail: 'more chaotic' is a LATENT-dynamics property (drift_zH AUC 0.834, λ1 AUC 0.86);
+the DECODED-answer drift does NOT separate (ans_drift AUC 0.38) — broken paths commit an incomplete
+decoded answer while churning internally. Consistent with the FTLE/drift (latent) story.
+
+## Synthesis for the paper (current honest thesis, corrected)
+Genuine failures (incomplete/invalid answers) are MORE CHAOTIC — measurable, task-general (Sudoku;
+Maze under validity labeling). Mechanism: the dynamical signal detects answer completeness/convergence
+(FTLE reducible to drift+q_halt). On unique-solution tasks completeness=correctness, so it predicts
+correctness directly. On multi-solution tasks exact-match mislabels valid alternatives as failures;
+use validity labeling. The phenomenon stands; the convergence-detection mechanism is the honest
+interpretation, not a refutation.
+
+## Status: offline T0.1 + Char 1–4 + Maze evolution + task structure DONE. Running: TRM CLV (done),
+## HRM CLV (queued on card1), maze-followup queue (continue-train + per-cell, waiting for GPU).
+## Remaining: T0.2/T0.3, T1, HRM-Maze (after saturation test).
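The bootstrap confidence intervals on AUC reported in the CORRECTION section can be illustrated with a percentile bootstrap. This is a hedged sketch under stated assumptions, not the procedure used for the numbers above: the notes do not say how the resampling was done, so the class-stratified resampling, the `n_boot` default, and every name and data value here are hypothetical. Scores are the negated late drift, so that connected (complete) examples are the positive class, matching the `AUC(-late_drift→connected)` convention.

```python
import random


def auc(pos, neg):
    """P(score from pos > score from neg); ties count as 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(pos, neg, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC, resampling each class with replacement.

    Stratified resampling is an assumption of this sketch, not a documented
    detail of the original analysis.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        p = rng.choices(pos, k=len(pos))
        n = rng.choices(neg, k=len(neg))
        stats.append(auc(p, n))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Hypothetical late-drift values, for illustration only.
connected_drift = [0.50, 0.61, 0.55, 0.48, 0.70, 0.58]
broken_drift = [1.10, 0.95, 1.20, 0.66, 1.02]

# Negate drift so that lower drift gives a higher score for "connected".
pos = [-d for d in connected_drift]
neg = [-d for d in broken_drift]

print("AUC:", auc(pos, neg))
print("95% CI:", bootstrap_auc_ci(pos, neg))
```

With only 18 broken examples the interval is dominated by the small class, which is consistent with the much tighter interval the notes report once n rises to 4096.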
