| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-06-29 12:15:51 -0500 |
|---|---|---|
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-06-29 12:15:51 -0500 |
| commit | a6ec4288a2232988b130b2f00bb2565f81706966 (patch) | |
| tree | 1bb86e7f0b899b823b9e7fdf383e832d30a181e0 /README.md | |
Recursive reasoning dynamics: analysis pipeline, paper drafts, toy models
Failure=more-chaotic (task-general under validity labeling) reduces to convergence/completeness
detection; mechanism (transient chaos vs multistability vs input-induced) under investigation.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 32 |
1 files changed, 32 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..9b9d3cb
--- /dev/null
+++ b/README.md
@@ -0,0 +1,32 @@
+# Recursive Reasoning Dynamics
+
+Analysis of the inference-trajectory dynamics of recursive reasoning models (HRM, TRM) on
+Sudoku-Extreme and Maze-Hard. Central finding and its honest deflation:
+
+- Along the recurrent inference trajectory, **genuine failures are more chaotic** (higher
+  finite-time Lyapunov exponents / latent drift) than successes, in the *same* trained network —
+  task-general (Sudoku; Maze under a validity/connectivity criterion, not exact-match).
+- This signal **reduces to convergence / answer-completeness detection** (the FTLE separation is
+  recoverable from late drift + the model's own halting confidence; leading + full Ginelli CLV
+  geometry add nothing beyond it). On unique-solution tasks completeness aliases onto correctness.
+- Open: the *mechanism* of the non-convergence (multistability vs transient chaos vs input-induced) —
+  the current line of work (see `paper/readiness.md`, `extend_rollout.py`, `toy/`).
+
+This repo is the **analysis half** (diagnostics, experiments, paper drafts, toy models). The model
+code (TRM/HRM forks with trajectory-augmentation training) lives in the companion `rrm` repo;
+heavy artifacts (checkpoints, npz, wandb, figures) are not committed.
+
+## Entry points
+- `paper/` — `readiness.md` (live status + all results), `claims.md` (claim/evidence/counter table),
+  `intro.md`, `setup_results.md`, `style_contract.md`, `experiment_framework.md`.
+- `analysis_2x2/` — the 2×2 (settling × correctness) decomposition, characterization, CLV reducibility,
+  checkpoint-evolution, connectivity-labeled Maze analysis. Result `.md`/`.csv` kept; npz/png ignored.
+- `diagnose_{trm,hrm}_joint*.py` — per-example FTLE (JVP+QR), drift, CLV estimators. `_maze` variant
+  forces the math SDP backend (FlashAttention lacks the JVP double-backward); `_clv` adds Ginelli CLVs.
+- `extend_rollout.py` — long-rollout probe (transient vs multistable vs persistent non-convergence).
+- `toy/` — minimal analytically-grounded toy models reproducing "failure = more chaotic".
+- `maze_pred_dump.py` — cheap forward dump (connectivity + drift, no JVP) for Maze validity labeling.
+
+## Estimator validation
+`paper/validation/validate_le_estimator.py` — the QR/Benettin FTLE core recovers known spectra
+(diagonal/symmetric/non-normal linear maps, Hénon λ₁) to <1e-3.
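For readers unfamiliar with the QR/Benettin procedure named under "Estimator validation", here is a minimal sketch of the idea on the Hénon map. It is not the repository's `validate_le_estimator.py`, whose contents are not shown in this commit; every function name below is hypothetical, and the step counts are arbitrary choices. The sketch pushes an orthonormal frame through the Jacobian at each step, re-orthonormalizes it with a QR factorization, and averages the logs of the diagonal of R.

```python
import math


def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the Henon map with the classical parameters."""
    return 1.0 - a * x * x + y, b * x


def henon_jacobian(x, y, a=1.4, b=0.3):
    """Jacobian of the Henon map at (x, y), stored as a tuple of rows."""
    return ((-2.0 * a * x, 1.0), (b, 0.0))


def matmul_2x2(m, n):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return (
        (m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
        (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]),
    )


def qr_2x2(m):
    """Gram-Schmidt QR of a nonsingular 2x2 matrix; returns (Q, (r11, r22))."""
    c1 = (m[0][0], m[1][0])  # first column
    c2 = (m[0][1], m[1][1])  # second column
    r11 = math.hypot(c1[0], c1[1])
    q1 = (c1[0] / r11, c1[1] / r11)
    r12 = q1[0] * c2[0] + q1[1] * c2[1]
    u = (c2[0] - r12 * q1[0], c2[1] - r12 * q1[1])
    r22 = math.hypot(u[0], u[1])
    q2 = (u[0] / r22, u[1] / r22)
    return ((q1[0], q2[0]), (q1[1], q2[1])), (r11, r22)


def henon_lyapunov(n_steps=20000, n_burn=1000):
    """Estimate both Lyapunov exponents by averaging log diag(R) over the orbit."""
    x, y = 0.0, 0.0
    for _ in range(n_burn):  # discard the transient before averaging
        x, y = henon_step(x, y)
    q = ((1.0, 0.0), (0.0, 1.0))  # initial orthonormal frame
    log_sums = [0.0, 0.0]
    for _ in range(n_steps):
        # Push the frame through the local Jacobian, then re-orthonormalize.
        q, (r11, r22) = qr_2x2(matmul_2x2(henon_jacobian(x, y), q))
        log_sums[0] += math.log(r11)
        log_sums[1] += math.log(r22)
        x, y = henon_step(x, y)
    return log_sums[0] / n_steps, log_sums[1] / n_steps


if __name__ == "__main__":
    lam1, lam2 = henon_lyapunov()
    print(f"lambda_1 ~ {lam1:.4f}, lambda_2 ~ {lam2:.4f}")
```

A useful sanity check is that the two estimates sum to ln 0.3, because the Jacobian determinant of the Hénon map has constant magnitude b = 0.3. A finite-time estimate over this many steps should not be expected to reach the <1e-3 tolerance quoted in the README.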
