rrm workspace: TRM/HRM/SRM code, Maze dataset, dynamical-analysis pipeline (HEAD, main)

Curated export for clone-and-run Maze training (2x A6000) + diagnostics. trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible). Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
author: YurenHao0426 <blackhao0426@gmail.com> 2026-06-13 12:35:36 -0500
committer: YurenHao0426 <blackhao0426@gmail.com> 2026-06-13 12:35:36 -0500
commit: 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a (patch)
tree: c29cba61124018755a19b02c9d33e3ad5f2e05cc /research/flossing/analysis_2x2/OBSERVATIONS.md
1 files changed, 155 insertions, 0 deletions
diff --git a/research/flossing/analysis_2x2/OBSERVATIONS.md b/research/flossing/analysis_2x2/OBSERVATIONS.md
new file mode 100644
index 0000000..d0b0670
--- /dev/null
+++ b/research/flossing/analysis_2x2/OBSERVATIONS.md
@@ -0,0 +1,155 @@
+# 2×2 analysis: (late-trajectory settling) × (answer correctness) — observations
+
+Date: 2026-06-11. Code: `analyze_2x2.py` (+ ad-hoc strict-threshold checks logged in session).
+All statements below are measurements on existing diagnostic npz files; no new GPU runs.
+
+## Data provenance (CRITICAL: λ scales are NOT comparable across estimator versions)
+
+| dataset | model / ckpt | n | estimator | window |
+|---|---|---|---|---|
+| `diag_8k.npz` | HRM righteous-python @ step_26040 (H=2,L=2) | 8192 | `diagnose_hrm.py` (May 22 version) | full 16 ACT steps |
+| `diag_trm_singleGPU_step*_512.npz` ×10 | TRM mlp_t singleGPU @ 26041…260410 | 512 each | `diagnose_trm_joint.py` (joint JVP+QR, per-sub-update norm) | full 16 ACT steps |
+| `diag_hrm_step_*_512.npz` ×10 | HRM @ 2604…26040 | 512 each | joint estimator (−0.15-scale, matches step7 reports) | full 16 ACT steps |
+
+Within-dataset comparisons only. "Converged"/"settled" here = late z_H drift (mean over ACT steps 13–16)
+falls in the low-drift band; threshold from Otsu on pooled log10 drift, robustness via percentile sweep.
+Settled ≠ literal fixed point: the settled band has a narrow characteristic residual velocity
+(HRM ≈0.96/step, IQR-width <0.03 in log10; TRM ≈18.5/step, q10–q90 = 15.8–21.4).
+
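A minimal sketch of the settled/unsettled split defined above, assuming the per-example late drift (mean over ACT steps 13–16) is already available as a list of positive floats. The helper `otsu_threshold`, the bin count, and the toy values are illustrative assumptions and are not taken from `analyze_2x2.py`.

```python
import math

def otsu_threshold(values, n_bins=64):
    """Return the histogram cut that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    total = len(values)
    total_sum = sum(c * x for c, x in zip(counts, centers))
    best_tau, best_var = lo, -1.0
    w0, s0 = 0, 0.0
    for i in range(n_bins - 1):
        w0 += counts[i]
        s0 += counts[i] * centers[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = s0 / w0, (total_sum - s0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:  # strict '>' keeps the lowest cut on ties
            best_var, best_tau = between, lo + (i + 1) * width
    return best_tau

# Toy late-drift values (hypothetical): a low band near 0.96 and a high band.
late_drift = [0.95, 0.96, 0.97, 0.96, 41.0, 48.0, 55.0, 44.0]
log_drift = [math.log10(d) for d in late_drift]
tau = otsu_threshold(log_drift)
settled = [x < tau for x in log_drift]
print(round(tau, 3), settled)
```

When the two bands are separated by an empty gap, every cut inside the gap ties on between-class variance. This sketch returns the lowest such cut, so the exact τ inside a gap is an implementation detail, which is one reason a percentile sweep is a useful robustness check.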
+## Headline numbers
+
+### HRM @26040, n=8192, exact_acc 0.525 (full-window λ, May-22 estimator scale)
+Otsu τ → cells: A(settled,correct)=4103, B(settled,wrong)=63, C(unsettled,correct)=195, D(unsettled,wrong)=3831.
+
+- Failure mass is overwhelmingly unsettled: wrong-that-settled = 63/3894 = 1.6% of failures at Otsu τ;
+  at a STRICT in-band threshold (45th pct, drift<0.97) it is 21/3894 = **0.54% of failures**.
+- The strict-band B examples (n=21) have λ₁ median **−0.842** vs A −0.867 (success-like contraction),
+  q_halt(final) median **+7.47 — identical to A** (ACT-confident), token_acc 0.41–0.88 (median 0.62,
+  substantially wrong, not near-misses). These are confidently-wrong settled answers — invisible to
+  both a stability-based and an ACT-confidence-based selector.
+- λ₁ as a predictor: AUC(−λ₁→correct) = 0.984 ≈ AUC(−λ₁→settled) = 0.986. At the Otsu split the
+  within-settled outcome gap (Δλ₁ −0.27, AUC 0.852) is a THRESHOLD-MIXTURE ARTIFACT: the Otsu τ sits
+  above the band gap (drift distribution has a near-empty region between pct≈45 and pct≈55,
+  τ jumps −0.013→1.39), so Otsu-"settled" includes mid-drift examples. At the strict in-band τ the
+  within-settled gap nearly closes (−0.867 vs −0.842).
+- Residual outcome signal in the unsettled stratum: C vs D Δλ₁(median) = −0.094, AUC 0.818
+  (C n=195). Note C drifts MORE than D (median 55.4 vs 40.8) while being more contracting in λ₁.
+- AUC(−log late-drift→correct) = 0.964.
+
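The cell bookkeeping behind the HRM headline numbers can be sketched as follows. The function names `cell_counts` and `summarize` are hypothetical; the counts passed to `summarize` are the Otsu-τ cells reported above.

```python
def cell_counts(settled, correct):
    """Tally the 2x2 cells: A settled+correct, B settled+wrong,
    C unsettled+correct, D unsettled+wrong."""
    cells = {"A": 0, "B": 0, "C": 0, "D": 0}
    for s, c in zip(settled, correct):
        key = ("A" if c else "B") if s else ("C" if c else "D")
        cells[key] += 1
    return cells

def summarize(cells):
    """Derive accuracy and the settled share of failures from cell counts."""
    n = sum(cells.values())
    failures = cells["B"] + cells["D"]
    return {
        "n": n,
        "exact_acc": (cells["A"] + cells["C"]) / n,
        "failures": failures,
        "settled_share_of_failures": cells["B"] / failures if failures else 0.0,
    }

# Reported HRM @26040 cells at the Otsu threshold.
hrm = summarize({"A": 4103, "B": 63, "C": 195, "D": 3831})
print(hrm)

# Tiny example of tallying from per-example flags (toy data).
toy = cell_counts([True, True, False, False], [True, False, True, False])
print(toy)
```

The same two functions apply unchanged to any other threshold, since only the `settled` flags depend on τ.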
+### TRM singleGPU @260410 (final), n=512, exact_acc 0.770 (joint estimator)
+Cells: A=383, B=**0**, C=11, D=118.
+
+- **B = 0 at every threshold**: drift distributions of settled-correct (q90 = 21.4) and wrong
+  (q10 = 47.8) are completely separated — no wrong example reaches the settled band. Robust across
+  the entire percentile sweep (nB=0, nD=118 constant). Same at step 130205. Across the 10-ckpt series,
+  fB>0 only at step 26041 (1.2%).
+- λ₁ medians: A +0.0047 (≈0, consistent with "TRM success short-window λ₁>0 is acceptable"),
+  C +0.0998, D +0.1023. Within-unsettled outcome gap ≈ 0 (Δ −0.0025; AUC 0.619, C n=11).
+  AUC(−λ₁→correct) = 0.989 ≈ AUC(−λ₁→settled) = 0.996: at the final checkpoint, λ₁'s outcome
+  signal in TRM is (almost) entirely settling-regime detection.
+- q_halt(final) median: A +7.44, C +7.41, D −11.1.
+
+## Checkpoint evolution (512/ckpt; within-series comparisons)
+
+- TRM (26041→260410): fB≈0 from 52082 on; fD shrinks 0.41→0.23; λ₁(D) rises monotonically
+  +0.036→+0.102 while λ₁(A) stays ≈0 — the success/failure λ₁ gap widens over training via the
+  failure cell becoming more expansive.
+- HRM (2604→26040, joint-estimator series): mass migration over training:
+  fB 0.89→0.008 (early ckpts: nearly all examples low-drift & wrong), fD rises to ~0.52 mid-training,
+  fA grows with accuracy; λ₁(D) rises from −0.087 to +0.023 (sign flip ~step 15–18k), λ₁(A) stays
+  −0.10…−0.20.
+
+## Direct answers to the motivating questions
+
+1. **Is the failure FTLE cluster a mixture of wandering + wrong-fixed-point modes?**
+   Measured: yes, but extremely lopsided. Wandering dominates (HRM ≥98.4% of failures at Otsu τ,
+   99.46% at strict τ; TRM final 100%). The wrong-fixed-point mode exists in HRM (21 examples) with
+   exactly the predicted signature (success-like λ₁), and is absent in TRM at late checkpoints.
+2. **Does conditioning on settling absorb the success/failure FTLE gap?**
+   TRM final: essentially yes (within-stratum AUC 0.619 with n=11; regime AUC 0.996).
+   HRM: mostly yes after threshold correction (strict-band within-settled gap ≈0.025), with one
+   genuine residual: the unsettled stratum retains an outcome gap (AUC 0.818, C n=195).
+3. **Relation to published taxonomies** (factual cross-references):
+   Ren & Liu's four HRM modes map onto our cells; their "non-trivial failure (converged to wrong
+   fixed point)" = our strict-band B (rare: 0.54% of failures); their "trivial failure (wanders or
+   oscillates)" = our D (dominant). Efstathiou & Balwani's "failed runs plateau at stable high-loss
+   attractors" (TRM): by our state-drift criterion, TRM failures are NOT settled (B=0, drift ≥~48/step,
+   ~0.77× the early-trajectory velocity); their "stable" refers to loss plateaus/bounded regions,
+   not state convergence. Our drift+λ₁ measurement distinguishes these.
+
+## Addendum (same day, offline follow-ups — see offline_followups/followups.md)
+
+**REVISION of headline point 2.** "λ₁'s outcome signal is (almost) entirely settling-regime detection"
+was based on comparing raw AUCs and is too strong as a mediation claim. The proper control —
+AUC(−λ₁→correct) **within matched late-drift deciles** of the unsettled stratum — shows substantial
+independent signal (HRM: per-decile 0.97→0.69 from low to high drift, weighted mean 0.879;
+unconditioned within-unsettled 0.933). TRM official @58590 with a strict band τ shows the same
+qualitatively: unsettled-correct λ₁ ≈ +0.017 (n=141) vs unsettled-wrong +0.103 (n=64) at overlapping
+drift levels. Corrected statement: **λ₁ correlates strongly with settledness, but at matched drift
+level it still separates outcome** — drift and λ₁ are not redundant observables.
+
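A minimal sketch of the drift-matched control described above: rank examples by late drift, cut into equal-count bins, and compute AUC(−λ₁→correct) inside each bin. The names `auc` and `drift_matched_auc` are hypothetical, and weighting bins by their size is an assumption (the notes say "weighted mean" without stating the weights). The data are toy values.

```python
def auc(scores, labels):
    """P(score_pos > score_neg) + 0.5 * P(tie); None if a class is missing."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def drift_matched_auc(lam1, drift, correct, n_bins=10):
    """AUC(-lam1 -> correct) within equal-count drift bins, plus a
    size-weighted mean over the bins where both outcomes occur."""
    order = sorted(range(len(drift)), key=lambda i: drift[i])
    per_bin = []
    for b in range(n_bins):
        idx = order[b * len(order) // n_bins:(b + 1) * len(order) // n_bins]
        a = auc([-lam1[i] for i in idx], [correct[i] for i in idx])
        if a is not None:
            per_bin.append((len(idx), a))
    total = sum(n for n, _ in per_bin)
    weighted = sum(n * a for n, a in per_bin) / total if total else None
    return per_bin, weighted

# Toy data: outcome alternates along the drift axis; lam1 tracks outcome only.
drift = list(range(20))
correct = [i % 2 == 0 for i in range(20)]
lam1 = [-0.1 if c else 0.1 for c in correct]
per_bin, weighted = drift_matched_auc(lam1, drift, correct, n_bins=2)
print(per_bin, weighted)
```

If λ₁ carried no information beyond drift, the per-bin AUCs would fall toward 0.5 as bins get narrower; values well above 0.5 inside matched bins are what supports the "not redundant observables" statement.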
+**Difficulty control (#givens, crude proxy).** HRM n=8192: Spearman(correct, givens)=+0.28;
+Spearman(λ₁, givens)=−0.35 overall but −0.16/−0.18 within outcome. Within-givens-bin
+AUC(−λ₁→correct) = 0.976–0.987 (weighted 0.982, vs overall 0.984): at the #givens level, the
+FTLE-outcome separation is NOT a difficulty artifact. (Solver-backtrack difficulty not available
+offline; #givens is a weak proxy — flag for the writeup.)
+
+**Strict-B per-example (n=21).** All 21 have halted_at ∈ [4,9] (median 6) — under real ACT inference
+every one would have halted early, confidently (q_halt +7.4–7.5), and wrong (token_acc 0.41–0.88).
+The three lowest-token-acc cases are all 17-givens (minimum-clue) puzzles. Per-example table in
+followups.md; drift profiles indistinguishable from the A band (fig_hrm_strictB_profiles.png).
+
+**TRM official @58590 note.** 90% of its settled-correct examples are still descending at window end
+(slope median −0.147) — unlike singleGPU @260410 whose A-band is flat (~18.5/step). The "settled band"
+criterion is checkpoint-specific; cross-checkpoint comparisons must re-derive τ per dataset.
+
+## Addendum 2 (2026-06-12, n=2048 retest + early-window pairing; npz in retest/)
+
+Retest ran on GPU 0 (shared, 12h-fallback claim). Four diagnostics, seed 0, idx-paired.
+
+**1. TRM official @58590 (87.6%), full window, n=2048: B = 0 confirmed.**
+254 failures, none settled; the MINIMUM late-drift among wrong examples (log10 1.664 ≈ 46/step)
+exceeds the late-drift of 96.5% of correct examples — near-complete distribution separation,
+threshold-free. λ₁: wrong +0.103 / correct +0.012, AUC 0.993. Within-unsettled outcome AUC 0.848
+(C n=70) — residual signal confirmed at usable n.
+
+**2. HRM @26040, full window, joint estimator, n=2048: replicates diag_8k on a second estimator.**
+acc 0.526; A/B/C/D = 1020/14/57/957; λ₁(A) −0.152 vs λ₁(D) +0.032 (the email's −0.15/+0.04 scale);
+strict-band B n=5 (0.5% of failures) with λ₁ −0.141 ≈ A and q_halt +7.47 (selector-blind, replicated);
+unsettled within-decile AUC weighted 0.900 (was 0.879 on diag_8k).
+
+**3. Early-window (first 4 ACT steps) does NOT forecast eventual success among still-unsolved
+examples — and on HRM the dynamical signals point the OTHER way.**
+Unconditioned early AUCs are inflated by already-solved-at-4 examples (TRM 69.4% solved@4,
+HRM 34.5%). Restricted to not-yet-correct@4:
+
+| signal @ step 4 | TRM (n=626, 59.4% eventually correct) | HRM (n=1342, 27.6% eventually correct) |
+|---|---|---|
+| AUC(−λ₁_early → eventual correct) | 0.543 | **0.448** (reversed) |
+| AUC(−drift@4 → eventual correct) | 0.492 | **0.312** (reversed: MORE early movement ↔ eventual success, +dir AUC 0.688) |
+| AUC(q_halt@4 → eventual correct) | 0.521 | **0.734** |
+
+Observations: (i) the λ₁/outcome association is concurrent with the trajectory's fate (final window),
+not antecedent — early-window λ₁ has no forward predictive power at this granularity; (ii) on HRM the
+sign of the early association is inverted: among undecided examples, higher early drift (and
+marginally higher early λ₁) accompany eventual success; (iii) the one early signal with real forecast
+power is HRM's learned q_halt (0.734) — absent in TRM (0.521); factual architecture note: TRM removed
+HRM's Q-learning continue-head (BCE halt only). Window length 4 was chosen to match train-time; other
+horizons untested.
+
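The restriction to not-yet-correct@4 examples and the "reversed" reading of an AUC below 0.5 can be illustrated with a small sketch. All arrays are toy values and the variable names are hypothetical; the only fact relied on is the identity AUC(−x) = 1 − AUC(x), which is why AUC(−drift@4) = 0.312 corresponds to a +direction AUC of 0.688.

```python
def auc(scores, labels):
    """P(score_pos > score_neg) + 0.5 * P(tie) over all pos/neg pairs."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy per-example arrays (hypothetical values).
correct_at_4 = [1, 1, 0, 0, 0, 0, 0, 0]
correct_final = [1, 1, 1, 1, 0, 0, 1, 0]
drift_at_4 = [1, 2, 9, 8, 3, 4, 7, 5]

# Unconditioned: already-solved examples (low drift, correct) are included.
unconditioned = auc([-d for d in drift_at_4], correct_final)

# Restricted: keep only examples that are not yet correct at step 4.
keep = [i for i, c in enumerate(correct_at_4) if not c]
d_sub = [drift_at_4[i] for i in keep]
y_sub = [correct_final[i] for i in keep]
restricted_neg = auc([-d for d in d_sub], y_sub)  # "low drift predicts success"
restricted_pos = auc(d_sub, y_sub)                # same signal, sign flipped

print(unconditioned, restricted_neg, restricted_pos)
```

In this toy case the already-solved examples pull the unconditioned AUC upward relative to the restricted one, which is the inflation effect the restriction is meant to remove.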
+**Consequences for writeup/claims:** "failure ↔ chaos" should be stated as an outcome-concurrent
+dynamical signature, not an early predictor; early-exit/reallocation applications are unsupported at
+this granularity (and sign-reversed on HRM); the within-stratum independence result (drift-matched
+AUC 0.88-0.90) plus the difficulty control (Addendum 1) remain the strongest positive claims.
+
+## Caveats
+
+- n(B)=21 and n(C)=11/195 are small; per-cell statements about those cells are low-precision.
+- "Settled" is a relative (band) criterion; both A-bands have nonzero characteristic residual velocity.
+- exact_correct is evaluated at ACT step 16 under fixed unroll; post-window corruption or recovery
+  (Ren & Liu's fixed-point violation) is not observable in these arrays.
+- λ window = full trajectory (336 sub-updates TRM / 16 segments HRM); the early-window
+  (length-matched) version of this analysis is NOT covered by existing npz files — the `_short`
+  diagnose scripts exist (4 ACT steps) but produced no saved npz we could find. Open follow-up.
+- diag_8k (HRM) and the HRM evolution series use different estimator normalizations; only
+  within-file comparisons are reported above.