# 2 Setup

**Models and task.** We study two trained recursive reasoners on Sudoku-Extreme with the 1k×1000-augmentation training set: HRM (27M parameters; checkpoint at step 26,040; 52.6% exact accuracy on our evaluation samples) and TRM-MLP trained with the official recipe (5M; global batch size 768; checkpoint at step 58,590, the best of its run; 86.9% on the full test set, 87.6% on our n=2,048 sample). Inference runs a fixed 16-segment unroll; the adaptive-computation halting signal (q_halt) is recorded at every segment but not applied, so every trajectory is observed for the full budget. Answers are decoded at segment 16.

**Per-example measurements.** Along each trajectory we record three families of quantities. First, the leading k=8 finite-time Lyapunov exponents of the joint latent dynamics: tangent vectors in the concatenated (z_H, z_L) space are propagated with Jacobian-vector products through every state update and re-orthonormalized by QR at each sub-step; λ_i is the time-average of the log diagonal of R over the full trajectory (336 sub-updates for TRM, 64 for HRM). Second, the per-segment state displacement ‖z^{(t)} − z^{(t−1)}‖ for z_H and z_L separately ("drift"). Third, q_halt, exact correctness, and token accuracy. Exponent values are comparable only within an estimator implementation; we replicate the HRM analysis under a second, earlier implementation (n=8,192) and report its scale separately.

**The settling criterion.** Late drift, the mean z_H displacement over the final four segments, is bimodal in log scale for every checkpoint we examine: a narrow low-velocity band (characteristic residual velocity 0.96 per segment for HRM, 18.5 for TRM, interquartile width under 10%) separated from a high-velocity mode by one to two orders of magnitude. We call a trajectory *settled* if its late drift falls in the low band.
Thresholds are set by Otsu's method on the pooled log distribution; every result below is reported with a full percentile sweep, and the headline TRM result is threshold-free. Settled is a band property, not a fixed point: both bands have nonzero characteristic velocity.

**Design.** Crossing the settling criterion with answer correctness yields four cells: settled-correct (A), settled-wrong (B), unsettled-correct (C), unsettled-wrong (D). The analysis asks three questions. How is failure distributed over B versus D? What does λ₁ add beyond the settling split? And when along the trajectory does the discriminative signal exist?

# 3 Results

## 3.1 Failure is wandering: the 2×2 decomposition

At the end of inference, success and failure occupy different dynamical regimes almost without exception (Table 1). In TRM, 254 of 2,048 puzzles are answered incorrectly and none of them is settled: the minimum late drift among failures (log₁₀ = 1.66, ≈46 per segment) exceeds the late drift of 96.5% of successes, so no threshold assignment can place a failure in the settled band. Failed trajectories also remain locally expansive over the full window (median λ₁ = +0.103, IQR +0.094 to +0.111) while settled successes sit at the edge of contraction (+0.011). The same decomposition at a mid-training checkpoint, and across a ten-checkpoint series, shows the settled-wrong cell empty from 20% of training onward.

HRM adds the one exception, and it is small. At the strict band threshold, 21 of 3,894 failures (0.55%; n=8,192) end settled; the replication under the second estimator gives 5 of 971 (0.5%; n=2,048). These settled-wrong trajectories are dynamically indistinguishable from successes: λ₁ median −0.842 against −0.867 for settled-correct, drift profiles inside the A band from segment ~4 onward (Figure 2), and final halting confidence identical to successes (median q_halt +7.47 in both cells, against −9.6 for wandering failures).
All 21 crossed the halting threshold between segments 4 and 9; under adaptive computation each would have stopped early, confident and wrong. Their token accuracy spans 0.41–0.88, and the three least accurate are all 17-givens (minimum-clue) puzzles. This cell is the wrong-attractor mode of Ren & Liu (2026), measured: it exists, it carries exactly the contraction signature their account predicts, and it is two orders of magnitude less common than wandering.

The unsettled-correct cell (C) is the mirror curiosity: 3–7% of successes are still moving at segment 16 (70 of 1,794 in TRM; 57 of 1,077 in HRM), with halting confidence as high as settled successes. Their existence shows the decode head can read a correct answer off a moving state; we do not observe what happens to them past the window.

## 3.2 What the exponent is not measuring

The λ₁ separation is not a restatement of the settling split. Within the unsettled population, where every trajectory is still moving, λ₁ ranks eventual successes above failures inside narrow displacement bands: splitting unsettled HRM trajectories into late-drift deciles (decile width ≤0.04 log units over most of the range) gives within-decile AUC from 0.97 at low drift to 0.69 at the highest decile, weighted mean 0.879 (n=8,192); the second estimator gives 0.900 (n=2,048). A trajectory's expansion rate carries outcome information beyond how fast it is moving.

The separation is also not a difficulty artifact, at least not at the resolution of clue count. Accuracy varies strongly with the number of givens (Spearman +0.28), and λ₁ is itself difficulty-correlated (−0.35 overall, −0.16/−0.18 within outcome classes), yet conditioning removes nothing: within givens bins, AUC(−λ₁ → correct) is 0.976–0.987 (weighted 0.982) against 0.984 unconditioned. Givens count is a coarse proxy (solver backtrack counts would be the sharper control), but at this resolution the dynamical signature is orthogonal to how hard the puzzle is.
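
The stratified comparisons in this subsection can be sketched as follows. This is a minimal illustration, not the analysis code behind the reported numbers: the function names `rank_auc` and `stratified_auc`, the quantile-based bin edges, and the weighting of each stratum by its trajectory count are all assumptions, since the text specifies only a weighted mean of within-stratum AUCs. The score passed in is −λ₁, following the AUC(−λ₁ → correct) convention used for the givens control.

```python
import numpy as np


def rank_auc(scores, labels):
    """AUC from the Mann-Whitney U statistic, with average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # AUC is undefined when only one class is present
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # mean 1-based rank of the tie group
        i = j + 1
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def stratified_auc(scores, labels, strata_var, n_bins=10):
    """Per-bin AUC within quantile bins of strata_var, plus a size-weighted mean.

    Assumption: bins are weighted by their number of trajectories, and
    single-class bins are skipped.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    strata_var = np.asarray(strata_var, dtype=float)
    edges = np.quantile(strata_var, np.linspace(0.0, 1.0, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, strata_var, side="right") - 1, 0, n_bins - 1)
    per_bin = []
    for b in range(n_bins):
        mask = bins == b
        auc = rank_auc(scores[mask], labels[mask])
        if not np.isnan(auc):
            per_bin.append((b, int(mask.sum()), auc))
    total = sum(n for _, n, _ in per_bin)
    weighted = sum(n * a for _, n, a in per_bin) / total
    return per_bin, weighted


if __name__ == "__main__":
    # Synthetic stand-in data; these values are hypothetical, not from the paper.
    rng = np.random.default_rng(0)
    log_drift = rng.normal(2.0, 0.3, size=2000)
    correct = rng.random(2000) < 0.3
    lam1 = np.where(correct, -0.2, 0.1) + rng.normal(0.0, 0.15, size=2000)
    per_bin, weighted = stratified_auc(-lam1, correct, log_drift, n_bins=10)
    print(f"weighted within-decile AUC: {weighted:.3f}")
```

Passing log late drift as `strata_var` gives the within-decile check; passing givens count (with `n_bins` set to the number of distinct clue counts, or with explicit edges) gives the difficulty control. A weighted value close to the unconditioned `rank_auc` indicates that the stratifying variable explains little of the separation.
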
## 3.3 When the signal exists: concurrent, with no early warning

The discriminative power of the dynamics is a property of the realized trajectory, and it is absent at the start. We re-measured both models over only the first four segments (index-paired with the full-window runs, same sampling) and asked whether anything visible by segment 4 forecasts the final outcome. Unconditioned, early-window λ₁ appears predictive (AUC 0.89 TRM, 0.73 HRM), but the appearance is inherited from puzzles already solved by segment 4 (69% of TRM successes, 34% of HRM's).

Restricted to the decision-relevant population (puzzles not yet correct at segment 4), prediction collapses (Table 3). In TRM (n=626, of which 59% eventually succeed), AUC is 0.543 for early λ₁, 0.492 for early drift, 0.521 for early halting confidence. In HRM (n=1,342, 28% eventually succeed) the dynamical associations invert: eventual successes have marginally higher early λ₁ (reverse AUC 0.448) and substantially higher early displacement (AUC 0.688 in the positive direction). Among undecided HRM trajectories, the ones still moving vigorously are the ones that go on to solve the puzzle.

One early signal does carry information, and it is learned, not dynamical: HRM's q_halt at segment 4 predicts eventual success at AUC 0.734. TRM's does not (0.521); TRM's training removes HRM's Q-learning continue head in favor of a binary halt loss, a difference we note without interpreting. Window length is the untested variable here: four segments matches the deep-supervision horizon, and we have not yet swept longer prefixes.

## 3.4 Training widens the gap from the failure side

Over the TRM training series (ten checkpoints, 512 puzzles each), λ₁ of wandering failures rises monotonically from +0.036 to +0.102 while λ₁ of settled successes stays within ±0.03 of zero; the settled-wrong cell empties by step 52k and stays empty.
The outcome separation grows over training because the failures become more expansive, while the success regime barely moves. HRM's series shows a mass migration instead: at early checkpoints nearly all trajectories are low-drift and wrong (the model barely updates state), this cell drains through mid-training into high-drift wandering, and accuracy growth then tracks transfer from wandering into the settled-correct band.

A preliminary intervention probe is consistent with the decomposition. HRM checkpoints trained with multi-rollout initial-state perturbation (K=4, log-uniform noise) shrink the wandering-failure cell at matched steps relative to an ordinary baseline (D: 274→175 at step 20,832 and 247→176 at 23,436, accuracy +0.20 and +0.15), with surviving failures more expansive, and the known late-run collapse of this variant coincides with the settled band itself destabilizing (λ₁ of settled successes flipping to +0.04). The comparison baseline differs in training objective (ACT-streaming versus fixed unroll), so we report this as directional evidence pending a matched-objective control.

---

*[Section pass notes: em-dash count 2 (§3.2, §3.3 one each). Contrast-template count: 0 (budget spent in title/intro). Flourish count: 1 ("mirror curiosity", §3.1); cuttable. Tables referenced: T1 = 2×2 cells both models; T2 = decile + givens AUCs; T3 = early-window restricted AUCs. All numbers traceable to analysis_2x2/OBSERVATIONS.md (+ addenda) and offline_followups/followups.md.]*