# Recursive Reasoning Dynamics

Analysis of the inference-trajectory dynamics of recursive reasoning models (HRM, TRM) on
Sudoku-Extreme and Maze-Hard. The central finding, and its deflation:

- Along the recurrent inference trajectory, **genuine failures are more chaotic** than successes
  (higher finite-time Lyapunov exponents, FTLE, and higher latent drift) in the *same* trained network.
  The effect is task-general: it holds on Sudoku, and on Maze when outcomes are labeled by a
  validity/connectivity criterion rather than exact match.
- This signal **reduces to convergence / answer-completeness detection**: the FTLE separation is
  recoverable from late drift plus the model's own halting confidence, and neither the leading nor the
  full Ginelli covariant Lyapunov vector (CLV) geometry adds anything beyond it. On unique-solution
  tasks, completeness aliases onto correctness.
- Open: the *mechanism* of the non-convergence (multistability vs. transient chaos vs. input-induced).
  This is the current line of work (see `paper/readiness.md`, `extend_rollout.py`, `toy/`).

This repo is the **analysis half** (diagnostics, experiments, paper drafts, toy models). The model
code (TRM/HRM forks with trajectory-augmentation training) lives in the companion `rrm` repo;
heavy artifacts (checkpoints, npz, wandb, figures) are not committed.

## Entry points
- `paper/` — `readiness.md` (live status + all results), `claims.md` (claim/evidence/counter table),
  `intro.md`, `setup_results.md`, `style_contract.md`, `experiment_framework.md`.
- `analysis_2x2/` — the 2×2 (settling × correctness) decomposition, characterization, CLV reducibility,
  checkpoint-evolution, connectivity-labeled Maze analysis. Result `.md`/`.csv` kept; npz/png ignored.
- `diagnose_{trm,hrm}_joint*.py` — per-example FTLE (JVP+QR), drift, CLV estimators. `_maze` variant
  forces the math SDP backend (FlashAttention lacks the JVP double-backward); `_clv` adds Ginelli CLVs.
- `extend_rollout.py` — long-rollout probe (transient vs multistable vs persistent non-convergence).
- `toy/` — minimal analytically-grounded toy models reproducing "failure = more chaotic".
- `maze_pred_dump.py` — cheap forward dump (connectivity + drift, no JVP) for Maze validity labeling.
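
The "JVP+QR" estimator named above can be illustrated with a minimal, dependency-free sketch. It is not the code in `diagnose_{trm,hrm}_joint*.py`: the real scripts presumably use an autodiff framework's JVP on the trained network, whereas this sketch swaps in a hand-rolled dual-number forward mode and a hypothetical two-unit `toy_cell`. All names here (`Dual`, `jvp`, `gram_schmidt`, `ftle_jvp_qr`, `toy_cell`, the weights and input) are assumptions made for illustration.

```python
import math


class Dual:
    """Forward-mode AD number: a value plus one directional derivative."""

    def __init__(self, val, tan=0.0):
        self.val, self.tan = val, tan

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Dual(self.val * other.val,
                    self.val * other.tan + self.tan * other.val)

    __rmul__ = __mul__


def _lift(x):
    return x if isinstance(x, Dual) else Dual(x)


def tanh(x):
    x = _lift(x)
    t = math.tanh(x.val)
    return Dual(t, (1.0 - t * t) * x.tan)


def jvp(f, z, v):
    """Return (f(z), J_f(z) @ v) without ever forming the Jacobian."""
    out = [_lift(o) for o in f([Dual(zi, vi) for zi, vi in zip(z, v)])]
    return [o.val for o in out], [o.tan for o in out]


def gram_schmidt(vectors):
    """Modified Gram-Schmidt; returns orthonormal vectors and diag(R)."""
    q, r_diag = [], []
    for v in vectors:
        w = list(v)
        for u in q:
            c = sum(a * b for a, b in zip(u, w))
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        r_diag.append(norm)
        q.append([wi / norm for wi in w])
    return q, r_diag


def ftle_jvp_qr(f, z0, n_steps, k):
    """Leading-k finite-time Lyapunov exponents of z -> f(z), for k >= 1."""
    z, n = list(z0), len(z0)
    q = [[1.0 if i == j else 0.0 for i in range(n)] for j in range(k)]
    log_sums = [0.0] * k
    for _ in range(n_steps):
        pushed = []
        for v in q:  # one JVP per tangent direction
            z_next, jv = jvp(f, z, v)
            pushed.append(jv)
        q, r_diag = gram_schmidt(pushed)  # re-orthonormalize every step
        for i, r in enumerate(r_diag):
            log_sums[i] += math.log(r)
        z = z_next
    return [s / n_steps for s in log_sums]


def toy_cell(z):
    """Hypothetical 2-unit recurrent cell with a fixed input (illustration only)."""
    return [tanh(0.5 * z[0] + -0.3 * z[1] + 0.3),
            tanh(0.2 * z[0] + 0.4 * z[1] + -0.2)]


toy_ftle = ftle_jvp_qr(toy_cell, [0.1, -0.1], n_steps=16, k=2)
print(toy_ftle)
```

Because the toy weight matrix has spectral norm below 1 and `tanh` has slope at most 1, every exponent of `toy_cell` is negative, i.e. the trajectory settles. Re-orthonormalizing after each step keeps the tangent vectors from collapsing onto the leading direction.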

## Estimator validation
`paper/validation/validate_le_estimator.py` checks that the QR/Benettin FTLE core recovers known
Lyapunov spectra (diagonal, symmetric, and non-normal linear maps; the Hénon map's λ₁) to within 1e-3.
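
As a rough picture of what such a check involves, here is a standalone sketch of a QR/Benettin estimator applied to the Hénon map and to a non-normal linear map. It is not the contents of `validate_le_estimator.py`; the function names, step counts, burn-in, and initial condition are all assumptions. For the classical parameters a = 1.4, b = 0.3 the leading exponent is commonly quoted as about 0.42, and the two exponents must sum to ln(b) because the Jacobian determinant has constant magnitude b.

```python
import math


def matmul_2x2(a, b):
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def qr_2x2(m):
    """Gram-Schmidt QR of a 2x2 matrix; returns Q and the diagonal of R."""
    c1, c2 = (m[0][0], m[1][0]), (m[0][1], m[1][1])
    r11 = math.hypot(*c1)
    q1 = (c1[0] / r11, c1[1] / r11)
    r12 = q1[0] * c2[0] + q1[1] * c2[1]
    w = (c2[0] - r12 * q1[0], c2[1] - r12 * q1[1])
    r22 = math.hypot(*w)
    q2 = (w[0] / r22, w[1] / r22)
    return [[q1[0], q2[0]], [q1[1], q2[1]]], (r11, r22)


def lyapunov_spectrum(step, jac, x0, n_steps, n_burn=0):
    """Benettin/QR estimate of both Lyapunov exponents of a 2D map."""
    x = x0
    for _ in range(n_burn):  # discard the transient before averaging
        x = step(x)
    q = [[1.0, 0.0], [0.0, 1.0]]
    sums = [0.0, 0.0]
    for _ in range(n_steps):
        q, r = qr_2x2(matmul_2x2(jac(x), q))
        sums[0] += math.log(r[0])
        sums[1] += math.log(r[1])
        x = step(x)
    return sums[0] / n_steps, sums[1] / n_steps


A, B = 1.4, 0.3  # classical Hénon parameters


def henon_step(x):
    return (1.0 - A * x[0] * x[0] + x[1], B * x[0])


def henon_jac(x):
    return [[-2.0 * A * x[0], 1.0], [B, 0.0]]


henon_exponents = lyapunov_spectrum(henon_step, henon_jac, (0.0, 0.0),
                                    n_steps=20000, n_burn=1000)

# Non-normal linear map with eigenvalues 2 and 0.5; the Jacobian is constant,
# so the state is held fixed to avoid overflow.
NON_NORMAL = [[2.0, 1.0], [0.0, 0.5]]
linear_exponents = lyapunov_spectrum(lambda x: x, lambda x: NON_NORMAL,
                                     (0.0, 0.0), n_steps=50)

print(henon_exponents, linear_exponents)
```

The sum rule gives a tight sanity check that does not depend on trajectory length, while the leading exponent only converges statistically, so a tolerance-based comparison is the appropriate test for it.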