# Sample section: Introduction (taste-calibration draft)

Recursive reasoning models solve constraint-satisfaction problems that defeat much larger language models by iterating a small network on a latent state, up to several hundred state updates per puzzle in the Hierarchical Reasoning Model (HRM) and the Tiny Recursive Model (TRM). When such a model fails, what is dynamically different about the trajectory it produced? Recent mechanistic studies have answered with attractor language: failed runs "plateau at stable high-loss attractors" (Efstathiou & Balwani, 2026), or converge to spurious fixed points that rival the correct one (Ren & Liu, 2026). These accounts rest on indirect evidence (loss plateaus, two-dimensional projections of 512-dimensional trajectories), and the two papers do not agree: one describes failure as premature stability, the other partly as wandering. Neither measures stability itself.

We measure it directly. For every test puzzle we record two per-example quantities along the full 16-segment inference trajectory: the finite-time Lyapunov spectrum of the joint latent dynamics, and the per-segment state displacement. Conditioning these on outcome over 2,048 to 8,192 puzzles per model yields a complete decomposition of failure for HRM (52.6% accuracy) and an official-recipe TRM (87.6%), and the decomposition contradicts the settled-attractor picture. Correct trajectories enter a narrow low-velocity band and stay in it; failed trajectories never do. In TRM, not one of 254 failures settles (the least mobile failure still moves faster at the end of inference than 96.5% of successes), and failed trajectories remain locally expansive (median λ₁ = +0.103 versus +0.012 for successes; AUC 0.993). In HRM, settled-but-wrong trajectories exist but account for 0.55% of failures; the other 99.45% wander. Failure in these models is not a wrong attractor. It is the sustained absence of settling.

Two controls sharpen what the Lyapunov signature adds.
Matched for displacement level within the unsettled population, λ₁ still separates eventual successes from failures (decile-matched AUC 0.88–0.90), so the exponent is not merely re-measuring non-convergence; and binning by puzzle givens leaves the separation intact (within-bin AUC 0.982 versus 0.984 overall), so it is not a difficulty artifact. The signature is, however, strictly retrospective. Restricted to puzzles still unsolved after four segments, nothing dynamical about those first four segments predicts which will eventually be solved: AUC ≈ 0.5 in TRM for exponent, displacement, and halting confidence alike, and in HRM the association inverts, with eventual successes moving *more* in the early trajectory than eventual failures (AUC 0.69 in the positive direction). The chaos of failure is concurrent with the outcome, not an omen visible at the start.

These measurements reframe both the diagnosis and the levers. Because failure is almost never a stable wrong answer, selection-based inference strategies have a high ceiling (final-step halting confidence tracks correctness on all but the ~0.5% of failures that settle confidently), and because the early trajectory carries no dynamical death sentence, compute is better spent on restarts than on early pruning. We quantify both points, correct the published attractor labels they depend on, and release the per-example measurement tooling.

---

*[Style notes for review, not part of the draft: (1) every paragraph opens with a finding or a question, none with "In recent years"; (2) the two prior papers are quoted precisely and credited for what their data shows before the correction is made; (3) hedges appear only where the claim table concedes (e.g., "almost never", "~0.5%"); (4) the one rhetorical flourish, "not an omen", is load-bearing; cut it if it reads as flavor.]*