Diffstat (limited to 'paper/readiness.md')
-rw-r--r-- paper/readiness.md | 119
1 file changed, 119 insertions, 0 deletions
diff --git a/paper/readiness.md b/paper/readiness.md
new file mode 100644
index 0000000..f07f78e
--- /dev/null
+++ b/paper/readiness.md
@@ -0,0 +1,119 @@
+# Path to a citable, build-on-able preprint — status
+
+Framing (locked 2026-06-19, per user correction): the axis is **expansive vs more-expansive**
+(graded; for TRM both classes have λ₁>0), NOT settled-vs-chaotic. The phenomenon is the
+**cleanness of a graded separation**; the **mechanism is explicitly OPEN** and is the natural seed
+for follow-on projects. This preprint = rigorous phenomenology + precise characterization +
+honest open-mechanism. Do NOT force a fixed-point / suppression-of-chaos framing (rejected).
+
+## Tier 0 — measurement bulletproofing (others build on it)
+- [x] **T0.1 estimator validation** — `paper/validation/`: QR/Benettin core recovers known spectra
+ to <1e-3 (diagonal, symmetric, non-normal asymptotic) and Hénon λ₁ to 8e-5. PASS. Confirms the
+ numerical core (orthonormalization cadence, log|diag(R)| bookkeeping, ordering, averaging).
+- [ ] **T0.2 robustness reruns (GPU)** — λ stability vs t_ons, tangent-basis seed, k>8. Window
+ dependence already covered offline (Char 3). Small queue; spec below.
+- [ ] **T0.3 language/scope pass** — finite-time vs asymptotic, "expansive not chaotic" for HRM
+ (negative λ), metric/coordinate-dependence caveat (Lohmiller–Slotine). Prose task.
+
+## Characterization (replaces the rejected Tier 2; describes WHAT, not WHY)
+- [x] **Char 1 whole-spectrum** — separation is a ~rigid shift of the ENTIRE k=8 spectrum, not a
+ single mode (per-exponent AUC uniformly 0.98–0.99; HRM gap ≈constant −0.16/exponent). Spectral
+ MEAN separates at least as well as λ₁ alone (AUC 0.991–0.995). CAVEAT: the KS-proxy Σλ⁺ is the wrong
+ aggregate for HRM (all-negative spectra give Σλ⁺ = 0); use the spectral mean for HRM.
+- [x] **Char 2 shape** — two overlapping UNIMODAL classes with well-separated means, NOT two
+ discrete clusters (within-class BC 0.26–0.40). Outcome is a moderately sharp threshold on the
+ λ₁ continuum (25→75% transition spans 12–30% of the λ₁ spread).
+- [x] **Char 3 integration-time scaling (the key descriptor)** — separation BUILDS monotonically
+ with window H: Cohen's d 1.06→4.84 (TRM, H=2→16), 0.03→3.45 (HRM). Near-zero at H=2, near-perfect
+ at the full 16-segment budget. The cleanness is an integration-time phenomenon. COHERENCE with
+ E5: this accumulation tracks the unfolding of outcomes (more trajectories revealed by larger H),
+ NOT anticipation — among undecided@H examples λ₁ still doesn't predict (E5). State both together.
+- [x] **Char 4 effect size** — "clean" quantified: Cohen's d 3.4–4.8, distributional overlap
+ <10% (TRM hist-overlap 0.049). Beyond AUC.
+
+## Tier 1 — causal content (the level-up from correlation)
+- [ ] **T1 inference-side causal probe** — nudge a failing trajectory toward lower expansion (or
+ toward the success-mean manifold) mid-rollout and measure outcome recovery; conversely inject
+ expansion into a settling-correct trajectory. Tests settling⟹correct as causal, not correlational.
+ Spec next. GPU.
+
+## Open-mechanism (NOT this paper; the hook for follow-ons)
+Why a graded (both-expansive) difference separates so cleanly. Char 1–4 bound the description;
+the why is deferred. Candidate angles are the user's to pursue, not asserted here.
+
+## Maze cross-task result + checkpoint evolution (2026-06-20)
+
+**Deflationary finding stands and is now grounded:** the FTLE/CLV separation reduces to
+convergence+confidence (λ₁, the full k=8 spectrum, AND leading-CLV geometry all reduce; partial correlation
+→ 0 once drift and q_halt are controlled for). The dynamical signal is a (redundant) convergence readout.
+
+**Maze (TRM att, friend's run, all 10 ckpts, k=1):** separation WEAK (λ1 Cohen's d 0.2–0.5 vs
+Sudoku 3–5). Failures SETTLE (B/fail 0.81–0.98, D/fail 0.02–0.19) at ALL ckpts and are NEAR-MISSES
+(token_acc ~0.97). Opposite of Sudoku, where failures wander far from correct (token_acc ~0.63).
+
+**Checkpoint evolution (the key new result, offline):** wandering is a LATE-TRAINING property.
+Sudoku HRM failures SETTLE early (B/fail ~0.9 at acc 2–15%) then flip to WANDER late
+(D/fail ~1.0 at acc 50%), transition ~step 13–18k. So "failures wander" is learned, not intrinsic.
+BUT matched-accuracy contrast cuts the other way: at acc≈0.76, Sudoku-TRM D/fail=1.00 vs
+Maze-TRM D/fail=0.19 — same skill, opposite dynamics → TASK STRUCTURE also matters, not just maturity.
+And early-Sudoku settling (token_acc 0.63, confidently wrong) ≠ Maze settling (token_acc 0.97, near-miss):
+not the same phenomenon. Fig: analysis_2x2/checkpoint_evolution_wander.png.
+
+**Task structure (offline):** Maze solution path (median 113 cells) passes through ~76 branch
+points (67% of path cells at deg≥3 junctions; 48% of open cells are junctions) → abundant
+locally-coherent alternative paths = many STABLE WRONG ANSWERS available. Sudoku: unique
+globally-coupled solution, a wrong cell violates constraints globally → no local near-miss
+equilibrium. This structurally explains settle-to-near-miss (Maze) vs wander (Sudoku).
+
+**Unresolved confound (queued):** TRM-Maze never develops wandering, but the current data cannot separate a
+task-structure effect from TRM-Maze SATURATION (Maze too easy for TRM). Queued before HRM-Maze:
+(1) continue-train TRM-Maze from step_130200 (does acc climb toward ~1.0 = saturation, or plateau?);
+(2) per-cell failure structure (are failure errors a connected detour = coherent stable wrong path,
+or scattered?). Then HRM-Maze (harder model-task fit, more likely to be stressed into wandering).
+
+## Solution-space test (2026-06-20) — refutes the measurement-artifact concern, strengthens task-structure
+User asked: is weak Maze separation an artifact of analyzing the FULL latent (88% trivial copy)
+instead of the SOLUTION space? Tested directly: per-step decoded-ANSWER Hamming drift over
+solution cells (label!=input), Maze vs Sudoku control.
+- MAZE: failures SETTLE in solution space too (late answer-drift median 0.00, 98.4% settled;
+ AUC 0.30). Same conclusion as full-latent. NOT an artifact.
+- SUDOKU control: failures DON'T settle in solution space (late drift median 8.5/step, 0% settled;
+ AUC 0.99). Same as full-latent. Both spaces agree.
+- Per-cell failure STRUCTURE (direct task-structure evidence): MAZE failures = CONNECTED DETOUR
+ (97% have ≤2 error components, median 22 cells in one blob) = a coherent stable wrong PATH.
+ SUDOKU failures = SCATTERED (100% have ≥5 components, median 13) = no coherent wrong answer.
+ Fig: analysis_2x2/maze_failure_detour.png. This is the mechanism-grounding for why Maze settles
+ (stable wrong answers exist as detours) and Sudoku wanders (no stable wrong answer).
+
+## CORRECTION (2026-06-20) — Maze exact-match labeling was the artifact; failure=more-chaotic HOLDS
+The earlier "Maze dissociates / completeness≠correctness" reading was largely a LABELING ARTIFACT,
+not a real dynamical dissociation. Maze exact-match marks VALID alternative solutions (incl.
+equal-length valid shortest paths) as "failures"; 100% of exact-match "failures" are valid connected
+paths (complete answers) → they settle, trivially. That is a benchmark-design flaw, not a result.
+**Under the correct criterion (CONNECTIVITY = is it a valid complete path = is it actually solved):**
+genuine failures (broken/disconnected) ARE more chaotic — AUC(-late_drift→connected) = 0.864 @step_13020
+(15 broken), 0.895 pooled (18 broken); bootstrap 95% CI [0.80, 0.96], excludes 0.5. So
+"failure = more chaotic" is TASK-GENERAL (Sudoku + Maze) once failure is defined by validity.
+LIMITATION (now RESOLVED): trained Maze SATURATES before the first saved ckpt (step_13020 already
+97% complete) → only n=18 broken from existing ckpts. FIX DONE: fresh early-save TRM-Maze run
+(maze_earlysave_freshTRM, saved every 250 epochs) captured the broken-rich pre-saturation phase;
+cheap forward dumps (drift_zH + connectivity, no JVP) on 8 early ckpts give **n=4096, 1835 broken**.
+**Pooled: AUC(-latent drift_zH → connected/complete) = 0.834, bootstrap 95% CI [0.822, 0.846]**
+(broken late-drift median 1.06 vs connected 0.56). Per-ckpt AUC rises with training 0.66→0.88
+(mirrors Sudoku's separation-grows-with-training). So "genuine failure (incomplete) = more chaotic"
+is now supported at LARGE N (n=4096) on Maze under validity labeling. Fig: maze_broken_morechaotic.png.
+Honest detail: "more chaotic" is a LATENT-dynamics property (drift_zH AUC 0.834, λ₁ AUC 0.86);
+the DECODED-answer drift does NOT separate (ans_drift AUC 0.38) — broken paths commit an incomplete
+decoded answer while churning internally. Consistent with the FTLE/drift (latent) story.
+
+## Synthesis for the paper (current honest thesis, corrected)
+Genuine failures (incomplete/invalid answers) are MORE CHAOTIC — measurable, task-general (Sudoku;
+Maze under validity labeling). Mechanism: the dynamical signal detects answer completeness/convergence
+(FTLE reducible to drift+q_halt). On unique-solution tasks completeness=correctness, so it predicts
+correctness directly. On multi-solution tasks exact-match mislabels valid alternatives as failures;
+use validity labeling. The phenomenon stands; the convergence-detection mechanism is the honest
+interpretation, not a refutation.
+
+## Status
+Offline T0.1 + Char 1–4 + Maze evolution + task structure DONE. TRM CLV done; HRM CLV queued on card1;
+maze-followup queue (continue-train + per-cell) waiting for GPU. Remaining: T0.2/T0.3, T1, HRM-Maze (after saturation test).
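
Illustrative sketch for item T0.1 (not part of the committed file): the notes describe validating a QR/Benettin estimator against the Hénon map. The code in `paper/validation/` is not shown here, so the following is only a minimal sketch of that kind of check. The function names, the step counts, and the standard Hénon parameters a=1.4, b=0.3 are assumptions. It re-orthonormalizes the tangent basis with QR at every step and averages log|diag(R)|.

```python
import numpy as np

# Hypothetical helper names; a=1.4, b=0.3 are the standard Henon parameters (assumed).
def henon_step(state, a=1.4, b=0.3):
    x, y = state
    return np.array([1.0 - a * x * x + y, b * x])

def henon_jacobian(state, a=1.4, b=0.3):
    x, _ = state
    return np.array([[-2.0 * a * x, 1.0],
                     [b, 0.0]])

def lyapunov_spectrum_qr(step, jacobian, x0, n_steps=50_000, n_burn=1_000):
    """Benettin-style estimate: push an orthonormal basis through the
    Jacobian, re-orthonormalize with QR every step, average log|diag(R)|."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_burn):          # discard the transient before measuring
        x = step(x)
    k = x.size
    q = np.eye(k)                    # initial tangent basis
    log_sums = np.zeros(k)
    for _ in range(n_steps):
        q, r = np.linalg.qr(jacobian(x) @ q)
        log_sums += np.log(np.abs(np.diag(r)))
        x = step(x)
    return log_sums / n_steps        # per-iteration exponents, leading one first

spectrum = lyapunov_spectrum_qr(henon_step, henon_jacobian, x0=[0.0, 0.0])
print("estimated Lyapunov spectrum:", spectrum)
print("sum of exponents:", spectrum.sum(), "vs ln(b):", np.log(0.3))
```

A useful built-in check is that the Hénon Jacobian has constant determinant −b, so the two exponents must sum to ln(b) regardless of how well the leading exponent has converged. The leading exponent itself should land near the commonly quoted value of about 0.42; reaching a tolerance like the 8e-5 reported in the notes would need a longer run than this sketch uses.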
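
Illustrative sketch for the effect-size and AUC figures (not part of the committed file): the notes quote Cohen's d, AUC values such as AUC(-late_drift→connected), and bootstrap 95% CIs. The analysis code is not shown, so this is only one plausible way to compute such quantities. All names are hypothetical and the drift values are SYNTHETIC draws for demonstration, not the reported data. The bootstrap is a simple percentile bootstrap that resamples each class independently, which is an assumption about the method.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample variance."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_var = ((a.size - 1) * a.var(ddof=1) + (b.size - 1) * b.var(ddof=1)) / (
        a.size + b.size - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = np.asarray(pos_scores, float)[:, None]
    neg = np.asarray(neg_scores, float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each class with replacement."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos_scores, float)
    neg = np.asarray(neg_scores, float)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        stats[i] = auc(rng.choice(pos, pos.size), rng.choice(neg, neg.size))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# SYNTHETIC late-drift values (illustration only, not the paper's data).
rng = np.random.default_rng(42)
broken_drift = rng.normal(1.0, 0.5, size=300)     # incomplete / disconnected paths
connected_drift = rng.normal(0.5, 0.5, size=300)  # valid complete paths

# Score is -drift and the positive class is "connected", as in AUC(-late_drift -> connected).
point = auc(-connected_drift, -broken_drift)
ci_lo, ci_hi = bootstrap_auc_ci(-connected_drift, -broken_drift)
d = cohens_d(broken_drift, connected_drift)
print(f"AUC = {point:.3f}, 95% CI [{ci_lo:.3f}, {ci_hi:.3f}], Cohen's d = {d:.2f}")
```

The pairwise AUC is O(n·m) in memory and is fine at the sample sizes in the notes (a few thousand per class); for much larger sets a rank-based formulation would be preferable.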