# Sample section: Introduction (taste-calibration draft)

Recursive reasoning models solve constraint-satisfaction problems that defeat much larger
language models by iterating a small network on a latent state — up to several hundred state
updates per puzzle in the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model
(TRM). When such a model fails, what is dynamically different about the trajectory it
produced? Recent mechanistic studies have answered with attractor language: failed runs
"plateau at stable high-loss attractors" (Efstathiou & Balwani, 2026), or converge to spurious
fixed points that rival the correct one (Ren & Liu, 2026). These accounts rest on indirect
evidence — loss plateaus, two-dimensional projections of 512-dimensional trajectories — and
the two papers do not agree: one describes failure as premature stability, the other partly as
wandering. Neither measures stability itself.

We measure it directly. For every test puzzle we record two per-example quantities along the
full 16-segment inference trajectory: the finite-time Lyapunov spectrum of the joint latent
dynamics, and the per-segment state displacement. Conditioning these on outcome over 2,048 to
8,192 puzzles per model yields a complete decomposition of failure for HRM (52.6% accuracy)
and an official-recipe TRM (87.6%), and the decomposition contradicts the settled-attractor
picture. Correct trajectories enter a narrow low-velocity band and stay in it; failed
trajectories almost never do. In TRM, not one of 254 failures settles: the least mobile failure
still moves faster at the end of inference than 96.5% of successes, and failures remain locally
expansive (median λ₁ = +0.103 versus +0.012 for successes; AUC 0.993 for λ₁ separating
failures from successes). In HRM, settled-but-wrong
trajectories exist but account for 0.55% of failures; the other 99.45% wander. Failure in these
models is not a wrong attractor. It is the sustained absence of settling.
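
For readers who want the two per-example quantities in concrete form, the following is a minimal sketch, not the released tooling. It assumes the latent state is available as a plain vector at each segment boundary and that a Jacobian of the update map can be obtained at each step. The function names `segment_displacements` and `largest_ftle` are hypothetical, and the estimator shown (a single tangent vector renormalized after every step) recovers only λ₁ rather than the full spectrum.

```python
import math

def segment_displacements(states):
    """L2 distance between consecutive latent states, one state per segment boundary."""
    return [math.dist(a, b) for a, b in zip(states, states[1:])]

def largest_ftle(jacobians, v0):
    """Finite-time estimate of lambda_1 from a list of square Jacobians (nested lists).

    Assumption: one Jacobian per update step; v0 is any nonzero initial tangent vector.
    """
    scale = math.hypot(*v0)
    v = [x / scale for x in v0]
    log_growth = 0.0
    for J in jacobians:
        w = [sum(a * x for a, x in zip(row, v)) for row in J]  # w = J v
        norm = math.hypot(*w)
        log_growth += math.log(norm)  # log stretch factor at this step
        v = [x / norm for x in w]     # renormalize to avoid overflow or underflow
    return log_growth / len(jacobians)

# Toy check: a uniformly contracting map has a negative exponent and shrinking steps.
contracting = [[[0.5, 0.0], [0.0, 0.5]]] * 16
lam1 = largest_ftle(contracting, [1.0, 1.0])
steps = segment_displacements([[1.0, 1.0], [0.5, 0.5], [0.25, 0.25]])
```

In this toy case `lam1` is negative and `steps` decreases, which is the "settling" signature; a locally expansive trajectory would give a positive `lam1`. Recovering the full spectrum would require propagating an orthonormal frame with repeated re-orthonormalization instead of a single vector.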

Two controls sharpen what the Lyapunov signature adds. Matched for displacement level within
the unsettled population, λ₁ still separates eventual successes from failures (decile-matched
AUC 0.88–0.90), so the exponent is not merely re-measuring non-convergence; and binning by
puzzle givens leaves the separation intact (within-bin AUC 0.982 versus 0.984 overall), so it
is not a difficulty artifact. The signature is, however, strictly retrospective. Restricted to
puzzles still unsolved after four segments, nothing dynamical about those first four segments
predicts which will eventually be solved: AUC ≈ 0.5 in TRM for exponent, displacement, and
halting confidence alike. In HRM the late-trajectory association inverts, with eventual
successes moving *more* in the early trajectory than eventual failures (AUC 0.69 in the
positive direction).
The chaos of failure is concurrent with the outcome, not an omen visible at the start.
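
The matched controls above can be illustrated with a small sketch. It assumes each puzzle carries a scalar score (for example λ₁), a failure flag, and a bin label (a displacement decile or a givens count). The helper names `auc` and `within_bin_auc` are hypothetical, and the pair-weighted aggregation across bins is an assumption, since the draft does not state how per-bin values are combined.

```python
def auc(pos, neg):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def within_bin_auc(scores, failed, bins):
    """Pair-weighted mean of per-bin AUCs; bins missing either class are skipped."""
    groups = {}
    for s, f, b in zip(scores, failed, bins):
        # index 0 collects failures (treated as positives), index 1 collects successes
        groups.setdefault(b, ([], []))[0 if f else 1].append(s)
    num = den = 0.0
    for pos, neg in groups.values():
        if pos and neg:
            pairs = len(pos) * len(neg)
            num += auc(pos, neg) * pairs
            den += pairs
    return num / den if den else float("nan")

# Toy data: the score separates outcomes in bin 0 but is reversed in bin 1.
scores = [1.0, 2.0, 5.0, 4.0]
failed = [False, True, False, True]
bins = [0, 0, 1, 1]
matched = within_bin_auc(scores, failed, bins)
```

Comparing only failures and successes that share a bin is what removes the confound: if the score were merely tracking the binned variable, the within-bin value would fall toward 0.5 even when the pooled AUC is high.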

These measurements reframe both the diagnosis and the levers. Because failure is almost never
a stable wrong answer, selection-based inference strategies have a high ceiling — final-step
halting confidence tracks correctness on all but the ~0.5% of failures that settle confidently
— and because the early trajectory carries no dynamical death sentence, compute is better
spent on restarts than on early pruning. We quantify both points, correct the published
attractor labels they depend on, and release the per-example measurement tooling.

---
*[Style notes for review, not part of the draft: (1) every paragraph opens with a finding or a
question, none with "In recent years"; (2) the two prior papers are quoted precisely and
credited for what their data shows before the correction is made; (3) hedges appear only where
the claim table concedes (e.g., "almost never", "~0.5%"); (4) the one rhetorical flourish —
"not an omen" — is load-bearing; cut it if it reads as flavor.]*