Diffstat (limited to 'research/flossing/paper/intro.md')
-rw-r--r-- research/flossing/paper/intro.md | 57
1 files changed, 57 insertions, 0 deletions
diff --git a/research/flossing/paper/intro.md b/research/flossing/paper/intro.md
new file mode 100644
index 0000000..85f06e7
--- /dev/null
+++ b/research/flossing/paper/intro.md
@@ -0,0 +1,57 @@
+# Recursive Reasoning Models Fail by Wandering, Not by Settling
+
+## 1 Introduction
+
+Recursive reasoning models such as the Hierarchical Reasoning Model (HRM; Wang et al., 2025)
+and the Tiny Recursive Model (TRM; Jolicoeur-Martineau, 2025) solve constraint-satisfaction
+puzzles that defeat far larger language models, by iterating a small network on a latent state
+for hundreds of updates per puzzle. When such a model fails, what is dynamically different
+about the trajectory it produced? Two recent mechanistic studies answer in attractor language.
+Failed TRM runs "plateau at stable high-loss attractors" (Efstathiou & Balwani, 2026); failed
+HRM runs converge to spurious fixed points that rival the correct one (Ren & Liu, 2026). The
+evidence behind both labels is indirect, resting on loss plateaus and two-dimensional
+projections of 512-dimensional trajectories, and the labels disagree about the basic character
+of failure: premature stability in one account, partly aimless drift in the other. Neither
+measures the trajectory's stability directly. We do, per example, and the measurements support
+a third description: recursive reasoning models fail by wandering, not by settling.
+
+Across evaluation sets of 2,048 to 8,192 held-out Sudoku-Extreme puzzles, correct trajectories end inside a
+narrow low-velocity band of the latent dynamics, and failures essentially never do. In an
+official-recipe TRM at 87.6% test accuracy, none of 254 failures settles: the least mobile
+failure still moves faster at the end of inference than 96.5% of successes, a separation of
+distributions that no threshold choice can undo, and failed trajectories remain locally
+expansive throughout (median leading finite-time Lyapunov exponent λ₁ = +0.103, against +0.012
+for successes; AUC 0.993). HRM shows the same structure with one addition. Settled-but-wrong
+trajectories exist, but they account for 0.55% of failures, carry success-like contraction
+(λ₁ = −0.84, against −0.87 for settled successes) and success-like halting confidence, and
+every one of them would have halted early under adaptive computation. The wrong-attractor
+failure mode is real, rare, and the only failure a confidence-based selector cannot catch.
+
+Two controls locate what the Lyapunov signature adds, and a third experiment locates when it
+exists. Matched for displacement level within the unsettled population, λ₁ still separates
+eventual successes from failures (decile-matched AUC 0.88–0.90), so the exponent does more
+than restate non-convergence. Binned by the number of givens, the separation is unchanged
+(within-bin AUC 0.982, against 0.984 unconditioned), so it is not an artifact of problem
+difficulty. It is, however, strictly retrospective. Restricted to puzzles still unsolved after
+four of sixteen segments, neither early-window exponents nor early state velocity predicts
+which trajectories will eventually succeed (AUC ≈ 0.5 in TRM), and in HRM the association
+inverts — among the undecided, the trajectories that move more in the early segments are the
+ones that go on to solve the puzzle (positive-direction AUC 0.69). The chaos of failure
+arrives with the failure; early instability in the trajectory does not anticipate it.
+
+These measurements redraw the intervention map for this model class. Because failure is almost
+never a stable wrong answer, restart-and-select inference strategies have a high ceiling and a
+quantifiable blind spot of roughly half a percent. Because the early trajectory carries no
+dynamical death sentence, compute spent on early failure prediction is compute wasted, and
+restart diversity is the better buy. Our contributions: (i) per-example, outcome-conditioned
+measurement of settling and finite-time Lyapunov spectra in HRM and TRM, at sample sizes up to
+8,192 and replicated across two estimator implementations; (ii) a decomposition of failure
+that corrects the settled-attractor reading and bounds the wrong-attractor mode at ~0.5% of
+failures; (iii) controls showing the signature is not reducible to non-convergence or
+difficulty; (iv) evidence that the signature is concurrent with the outcome and carries no
+early-warning content at the granularity tested.
+
+---
+*[em-dash count: 1. Contrast-template count: title + one echo (end of ¶1). Flourish count:
+1 ("death sentence", ¶4) — cuttable. "essentially never" is the one hedge in ¶2, scoped by
+the 0.55% in the next sentence.]*