| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-06-13 12:35:36 -0500 |
|---|---|---|
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-06-13 12:35:36 -0500 |
| commit | 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a (patch) | |
| tree | c29cba61124018755a19b02c9d33e3ad5f2e05cc /research/flossing/paper/sample_intro.md | |
Curated export for clone-and-run Maze training (2x A6000) + diagnostics.
trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible).
Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Diffstat (limited to 'research/flossing/paper/sample_intro.md')
| -rw-r--r-- | research/flossing/paper/sample_intro.md | 49 |
1 files changed, 49 insertions, 0 deletions
diff --git a/research/flossing/paper/sample_intro.md b/research/flossing/paper/sample_intro.md
new file mode 100644
index 0000000..183faa4
--- /dev/null
+++ b/research/flossing/paper/sample_intro.md
@@ -0,0 +1,49 @@
+# Sample section: Introduction (taste-calibration draft)
+
+Recursive reasoning models solve constraint-satisfaction problems that defeat much larger
+language models by iterating a small network on a latent state — up to several hundred state
+updates per puzzle in the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model
+(TRM). When such a model fails, what is dynamically different about the trajectory it
+produced? Recent mechanistic studies have answered with attractor language: failed runs
+"plateau at stable high-loss attractors" (Efstathiou & Balwani, 2026), or converge to spurious
+fixed points that rival the correct one (Ren & Liu, 2026). These accounts rest on indirect
+evidence — loss plateaus, two-dimensional projections of 512-dimensional trajectories — and
+the two papers do not agree: one describes failure as premature stability, the other partly as
+wandering. Neither measures stability itself.
+
+We measure it directly. For every test puzzle we record two per-example quantities along the
+full 16-segment inference trajectory: the finite-time Lyapunov spectrum of the joint latent
+dynamics, and the per-segment state displacement. Conditioning these on outcome over 2,048 to
+8,192 puzzles per model yields a complete decomposition of failure for HRM (52.6% accuracy)
+and an official-recipe TRM (87.6%), and the decomposition contradicts the settled-attractor
+picture. Correct trajectories enter a narrow low-velocity band and stay in it; failed
+trajectories never do. In TRM, not one of 254 failures settles — the least mobile failure still
+moves faster at the end of inference than 96.5% of successes — while remaining locally
+expansive (median λ₁ = +0.103 versus +0.012 for successes; AUC 0.993). In HRM, settled-but-wrong
+trajectories exist but account for 0.55% of failures; the other 99.45% wander. Failure in these
+models is not a wrong attractor. It is the sustained absence of settling.
+
+Two controls sharpen what the Lyapunov signature adds. Matched for displacement level within
+the unsettled population, λ₁ still separates eventual successes from failures (decile-matched
+AUC 0.88–0.90), so the exponent is not merely re-measuring non-convergence; and binning by
+puzzle givens leaves the separation intact (within-bin AUC 0.982 versus 0.984 overall), so it
+is not a difficulty artifact. The signature is, however, strictly retrospective. Restricted to
+puzzles still unsolved after four segments, nothing dynamical about those first four segments
+predicts which will eventually be solved: AUC ≈ 0.5 in TRM for exponent, displacement, and
+halting confidence alike — and in HRM the association inverts, with eventual successes moving
+*more* in the early trajectory than eventual failures (AUC 0.69 in the positive direction).
+The chaos of failure is concurrent with the outcome, not an omen visible at the start.
+
+These measurements reframe both the diagnosis and the levers. Because failure is almost never
+a stable wrong answer, selection-based inference strategies have a high ceiling — final-step
+halting confidence tracks correctness on all but the ~0.5% of failures that settle confidently
+— and because the early trajectory carries no dynamical death sentence, compute is better
+spent on restarts than on early pruning. We quantify both points, correct the published
+attractor labels they depend on, and release the per-example measurement tooling.
+
+---
+*[Style notes for review, not part of the draft: (1) every paragraph opens with a finding or a
+question, none with "In recent years"; (2) the two prior papers are quoted precisely and
+credited for what their data shows before the correction is made; (3) hedges appear only where
+the claim table concedes (e.g., "almost never", "~0.5%"); (4) the one rhetorical flourish —
+"not an omen" — is load-bearing; cut it if it reads as flavor.]*
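The added draft names two per-example quantities (a finite-time Lyapunov exponent and per-segment state displacement) and scores their separation of outcomes with AUC. The following is a minimal, illustrative sketch of how such quantities could be computed, not the tooling from this commit. It assumes a hypothetical `step` callable standing in for one latent-state update, estimates only the largest exponent (not the full spectrum), and substitutes finite differences for an exact Jacobian.

```python
import numpy as np


def finite_time_lambda1(step, x0, n_steps, eps=1e-6, seed=0):
    """Benettin-style estimate of the largest finite-time Lyapunov exponent.

    `step` is a hypothetical stand-in for one latent-state update x -> f(x).
    A unit tangent vector is pushed forward by finite differences and
    renormalised after every update; the mean log growth is returned.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    log_growth = 0.0
    for _ in range(n_steps):
        x_next = step(x)
        # Finite-difference approximation of the Jacobian-vector product J(x) v.
        dv = (step(x + eps * v) - x_next) / eps
        norm = np.linalg.norm(dv)
        log_growth += np.log(norm)
        v = dv / norm
        x = x_next
    return log_growth / n_steps


def segment_displacement(states):
    """Euclidean distance between consecutive states; `states` has shape (T + 1, d)."""
    states = np.asarray(states, dtype=float)
    return np.linalg.norm(np.diff(states, axis=0), axis=1)


def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


if __name__ == "__main__":
    # Toy linear map with a known largest exponent of ln 2.
    A = np.diag([2.0, 0.5])
    lam = finite_time_lambda1(lambda x: A @ x, np.zeros(2), n_steps=200)
    print(f"estimated lambda_1 = {lam:.3f}, ln 2 = {np.log(2.0):.3f}")
```

The toy linear map is used only because its largest exponent is known in closed form, which makes the estimator easy to sanity-check. For a real network, an exact Jacobian-vector product from an autodiff library would likely be preferable to finite differences.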
