Diffstat (limited to 'research/flossing/paper/intro.md')
| -rw-r--r-- | research/flossing/paper/intro.md | 57 |
1 files changed, 57 insertions, 0 deletions
diff --git a/research/flossing/paper/intro.md b/research/flossing/paper/intro.md
new file mode 100644
index 0000000..85f06e7
--- /dev/null
+++ b/research/flossing/paper/intro.md
@@ -0,0 +1,57 @@
+# Recursive Reasoning Models Fail by Wandering, Not by Settling
+
+## 1 Introduction
+
+Recursive reasoning models such as the Hierarchical Reasoning Model (HRM; Wang et al., 2025)
+and the Tiny Recursive Model (TRM; Jolicoeur-Martineau, 2025) solve constraint-satisfaction
+puzzles that defeat far larger language models, by iterating a small network on a latent state
+for hundreds of updates per puzzle. When such a model fails, what is dynamically different
+about the trajectory it produced? Two recent mechanistic studies answer in attractor language.
+Failed TRM runs "plateau at stable high-loss attractors" (Efstathiou & Balwani, 2026); failed
+HRM runs converge to spurious fixed points that rival the correct one (Ren & Liu, 2026). The
+evidence behind both labels is indirect, resting on loss plateaus and two-dimensional
+projections of 512-dimensional trajectories, and the labels disagree about the basic character
+of failure: premature stability in one account, partly aimless drift in the other. Neither study
+measures the trajectory's stability directly. We do, per example, and the measurements support
+a third description: recursive reasoning models fail by wandering, not by settling.
+
+Across 2,048 to 8,192 held-out Sudoku-Extreme puzzles, correct trajectories end inside a
+narrow low-velocity band of the latent dynamics, and failures essentially never do. In an
+official-recipe TRM at 87.6% test accuracy, none of 254 failures settles: the least mobile
+failure still moves faster at the end of inference than 96.5% of successes, a separation of
+distributions that no threshold choice can undo, and failed trajectories remain locally
+expansive throughout (median leading finite-time Lyapunov exponent λ₁ = +0.103, against +0.012
+for successes; AUC 0.993). HRM shows the same structure with one addition. Settled-but-wrong
+trajectories exist, but they account for 0.55% of failures, carry success-like contraction
+(λ₁ = −0.84, against −0.87 for settled successes) and success-like halting confidence, and
+every one of them would have halted early under adaptive computation. The wrong-attractor
+failure mode is real, rare, and the only failure a confidence-based selector cannot catch.
+
+Two controls locate what the Lyapunov signature adds, and a third experiment locates when it
+exists. Matched for displacement level within the unsettled population, λ₁ still separates
+eventual successes from failures (decile-matched AUC 0.88–0.90), so the exponent does more
+than restate non-convergence. Binned by the number of givens, the separation is unchanged
+(within-bin AUC 0.982, against 0.984 unconditioned), so it is not an artifact of problem
+difficulty. It is, however, strictly retrospective. Restricted to puzzles still unsolved after
+four of sixteen segments, neither early-window exponents nor early state velocity predicts
+which trajectories will eventually succeed (AUC ≈ 0.5 in TRM), and in HRM the association
+inverts: among the undecided, the trajectories that move more in the early segments are the
+ones that go on to solve the puzzle (positive-direction AUC 0.69). The chaos of failure
+arrives with the failure; nothing dynamical in the early trajectory anticipates it.
+
+These measurements redraw the intervention map for this model class. Because failure is almost
+never a stable wrong answer, restart-and-select inference strategies have a high ceiling and a
+quantifiable blind spot of roughly half a percent of failures. Because the early trajectory
+carries no dynamical death sentence, compute spent on early failure prediction is compute
+wasted, and restart diversity is the better buy. Our contributions: (i) per-example,
+outcome-conditioned measurement of settling and finite-time Lyapunov spectra in HRM and TRM,
+at sample sizes up to 8,192 and replicated across two estimator implementations; (ii) a
+decomposition of failure that corrects the settled-attractor reading and bounds the
+wrong-attractor mode at ~0.5% of failures; (iii) controls showing the signature is not
+reducible to non-convergence or difficulty; (iv) evidence that the signature is concurrent
+with the outcome and carries no early-warning content at the granularity tested.
+
+---
+*[em-dash count: 0. Contrast-template count: title + one echo (end of ¶1). Flourish count:
+1 ("death sentence", ¶4); cuttable. "essentially never" is the one hedge in ¶2, scoped by
+the 0.55% in the next sentence.]*
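
The introduction reports the leading finite-time Lyapunov exponent λ₁ per trajectory but does not show how it is estimated. The block below is a minimal sketch of one standard estimator, not necessarily either of the two estimator implementations the introduction mentions. It assumes, hypothetically, that one recursion step can be called as a black box `step(h)` on a latent state `h`, and it measures time in latent updates. A unit tangent vector is pushed through the linearised dynamics with a central-difference Jacobian-vector product and renormalised after every update. The average log growth factor is λ₁: positive when nearby trajectories separate (locally expansive), negative when they contract. `leading_ftle` and `diag_map` are illustrative names.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def leading_ftle(
    step: Callable[[Vector], Vector],
    h0: Sequence[float],
    num_steps: int,
    eps: float = 1e-6,
    v0: Optional[Sequence[float]] = None,
) -> float:
    """Estimate the leading finite-time Lyapunov exponent of h_{t+1} = step(h_t).

    A unit tangent vector is pushed through the linearised dynamics with a
    central-difference Jacobian-vector product, renormalised after every
    update, and the log growth factors are averaged (nats per update).
    Assumes the latent state stays O(1), so a fixed `eps` is meaningful.
    """
    if num_steps <= 0:
        raise ValueError("num_steps must be positive")
    h = list(h0)
    v = list(v0) if v0 is not None else [1.0] * len(h)
    n = _norm(v)
    v = [x / n for x in v]
    total_log_growth = 0.0
    for _ in range(num_steps):
        # J(h_t) v is approximated by (step(h + eps v) - step(h - eps v)) / (2 eps)
        plus = step([hi + eps * vi for hi, vi in zip(h, v)])
        minus = step([hi - eps * vi for hi, vi in zip(h, v)])
        jv = [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]
        growth = _norm(jv)
        if growth == 0.0:
            return float("-inf")  # the tangent direction collapsed completely
        total_log_growth += math.log(growth)
        v = [x / growth for x in jv]  # renormalise to avoid overflow or underflow
        h = step(h)  # advance the reference trajectory
    return total_log_growth / num_steps


# Toy check on linear maps whose exponent is known: log of the largest |scale|.
def diag_map(scales: Sequence[float]) -> Callable[[Vector], Vector]:
    return lambda h: [s * x for s, x in zip(scales, h)]


expanding = leading_ftle(diag_map([0.5, 2.0]), h0=[0.0, 0.0], num_steps=200)
contracting = leading_ftle(diag_map([0.5, 0.25]), h0=[0.0, 0.0], num_steps=200)
print(f"{expanding:.3f} {contracting:.3f}")  # prints 0.691 -0.695
```

On the toy diagonal maps, 200 updates from the all-ones tangent land within about 0.002 of the exact exponents, log 2 and log 0.5. For a 512-dimensional model state, the Jacobian-vector product would more likely come from autodiff (for example `jax.jvp` or `torch.func.jvp`) than from finite differences, and re-orthonormalising several tangent vectors with a QR step at each update would extend the sketch from λ₁ to a spectrum.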

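The introduction reports its separations as AUCs, and its two controls as displacement-decile-matched and givens-binned AUCs. The sketch below shows one plausible way to compute such numbers. The matching rule in particular is an assumption, since the introduction does not define it. AUC is taken in its Mann-Whitney form and oriented so that failures are expected to score higher, as they do for λ₁ (swapping the two arguments gives the opposite orientation). The stratified variant pools only failure/success pairs that share a stratum, such as a decile of displacement or a number of givens. `frac_successes_below_all_failures` mirrors the settling comparison between the least mobile failure and the successes, with "speed" standing in for whatever end-of-inference velocity is measured. All names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def auc(fail_scores: Sequence[float], success_scores: Sequence[float]) -> float:
    """P(random failure scores higher than random success); ties count 1/2.

    This is the Mann-Whitney form of ROC AUC. It is O(n * m), which is
    acceptable for a few thousand trajectories.
    """
    wins = 0.0
    for f in fail_scores:
        for s in success_scores:
            if f > s:
                wins += 1.0
            elif f == s:
                wins += 0.5
    return wins / (len(fail_scores) * len(success_scores))


def quantile_bins(values: Sequence[float], n_bins: int = 10) -> List[int]:
    """Assign each value to one of n_bins equal-count bins by rank (deciles by default)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def stratified_auc(
    scores: Sequence[float], failed: Sequence[bool], strata: Sequence[Hashable]
) -> float:
    """AUC computed only over failure/success pairs that share a stratum.

    Because cross-stratum pairs are never compared, the variable that defines
    the strata cannot by itself create separation.
    """
    groups: Dict[Hashable, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for score, is_fail, key in zip(scores, failed, strata):
        groups[key][0 if is_fail else 1].append(score)  # slot 0: failures, 1: successes
    wins = pairs = 0.0
    for fails, succs in groups.values():
        if fails and succs:
            n = len(fails) * len(succs)
            wins += auc(fails, succs) * n  # weight each stratum by its pair count
            pairs += n
    if pairs == 0:
        raise ValueError("no stratum contains both a failure and a success")
    return wins / pairs


def frac_successes_below_all_failures(
    fail_speed: Sequence[float], success_speed: Sequence[float]
) -> float:
    """Fraction of successes whose final speed is below that of the slowest failure."""
    floor = min(fail_speed)
    return sum(s < floor for s in success_speed) / len(success_speed)


# Toy data: stratum "A" is mostly failures with high scores and "B" mostly
# successes with low scores, so stratum membership alone drives the pooled AUC.
scores = [3.0, 4.0, 3.5, 1.5, 3.5, 1.0, 2.0, 1.5]
failed = [True, True, True, True, False, False, False, False]
strata = ["A", "A", "A", "B", "A", "B", "B", "B"]

pooled = auc(
    [s for s, f in zip(scores, failed) if f],
    [s for s, f in zip(scores, failed) if not f],
)
matched = stratified_auc(scores, failed, strata)
print(pooled, matched)  # prints 0.75 0.5
```

In the toy data, all of the pooled separation (AUC 0.75) comes from stratum membership, so matching drops it to 0.5. The introduction reports the informative opposite case: λ₁ still separates outcomes after decile matching (AUC 0.88–0.90). With real data, `strata` would be something like `quantile_bins(displacement, 10)` computed over the unsettled trajectories.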