# SRM (Stable Recursion Model): Codex Design Synthesis

Returned 2026-05-22 by codex-rescue.

## Core insight: target mild contraction, not aggressive

- Empirical λ_1(success) ≈ -0.15 → effective gain ≈ exp(-0.15) ≈ **0.86**.
- Empirical λ_1(failure) ≈ +0.04 → effective gain ≈ exp(0.04) ≈ **1.04**.

**Target κ ∈ (0.85, 0.95)**, NOT 0.3-0.5. Over-contraction kills constraint propagation in Sudoku.

## Architectural sketch

State: `z = (h, ℓ) ∈ R^{d_H + d_L}`, weighted norm `‖z‖_P² = ‖h‖² + η·‖ℓ‖²`.

Joint feature map `ψ_θ(z, x)` via **Sandwich Layers** (Wang & Manchester 2023), constrained so that `Lip_P(ψ) ≤ 1`.

Block gain operator:

```
A = [[a_HH·I,    a_HL·U_HL],
     [a_LH·U_LH, a_LL·I   ]]
```

where `U_HL, U_LH` are orthogonal and the gains satisfy a block-row-sum bound under the weighted metric:

- `a_HH + √η · a_HL ≤ κ`
- `a_LL + η^{-1/2} · a_LH ≤ κ`

with `κ ∈ (0.85, 0.95)`.

Update rule:

```
z_{t+1} = (1-α) z_t + α · A · ψ_θ(z_t, x) + b(x)
```

⇒ `Lip_P(T) ≤ (1-α) + α·κ < 1` by construction, for `α ∈ (0, 1]`. With `α=1, κ=0.86`: `λ_1 ≤ log(0.86) ≈ -0.15`, which matches the empirical success regime.

## Key methodological corrections vs my initial sketch

1. **Constrain the JOINT operator, not individual blocks.** HRM got this wrong: a stable H and a stable L do not imply a stable joint system, because of the cross-coupling J_HL, J_LH. The block-row-sum bound under the weighted metric is the right translation of CF's empirical signal.
2. **Tie weights across time, but use a single joint operator.** TRM's weight-tying across iterations is good (it turns the model into an iterative solver). But fold H and L into one joint operator (unlike HRM's separate modules) to enforce a shared contraction metric.
3. **Damping alone isn't sufficient.** `z + β·f(z)` only contracts if f is already Lipschitz-bounded. Damping is for margin, not the main guarantee.

## Failure modes to watch

1. **Over-contraction**: κ too low → constraint propagation collapses → underperforms TRM
2. **Fake certification**: approximate spectral norm leaves hidden expansion directions; use exact Sandwich parameterization
3. **Cross-coupling starvation**: `a_HL, a_LH → 0` → decoupled two-state system loses reasoning capacity (need lower bounds on coupling gains too?)

## Literature to anchor

Primary:

- **Sandwich Layers** (Wang & Manchester 2023): exact Lipschitz parameterization
- **Deep Equilibrium Models** (Bai et al.): for the fixed-point formulation

Secondary (conceptual):

- Lipschitz RNN (Erichson 2021)
- AntisymmetricRNN (Chang 2019)
- (CoRNN less relevant; oscillatory not the right inductive bias for Sudoku)

## Implementation path

1. Replace HRM/TRM's L_level/H_level with a **single tied joint operator** on (z_H, z_L)
2. Implement Sandwich layer ψ with `Lip ≤ 1`
3. Parameterize block gain matrix A with constraint `a_HH + √η·a_HL ≤ κ`, `a_LL + η^{-1/2}·a_LH ≤ κ`
4. α as learnable sigmoid (margin), κ as hyperparameter or learnable bounded < 1
5. Sweep κ over {0.85, 0.90, 0.95} to find expressivity sweet spot
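
As a numerical sanity check on the update rule and the `Lip_P(T) ≤ (1-α) + α·κ` claim, the following is a minimal pure-Python sketch, not the intended implementation. Everything concrete in it is an assumption made for illustration: `d_H = d_L = 4`, cyclic shifts standing in for the orthogonal couplings `U_HL, U_LH`, elementwise `tanh` standing in for the Sandwich-layer `ψ_θ`, `b(x) = 0`, non-negative gains, and all function names. It also swaps in a different way of enforcing the gain bound: instead of the row-sum inequalities, it rescales the gains so that the spectral norm of the 2×2 gain matrix, expressed in the coordinates `(h, √η·ℓ)`, is at most κ. Row sums bound a max-type block norm, whereas `‖·‖_P` as defined is Euclidean, so checking the spectral norm directly seemed the safer choice for a sketch.

```python
import math

def p_dist(z1, z2, eta):
    """Distance in the weighted norm ‖z‖_P² = ‖h‖² + η·‖ℓ‖², with z = (h, ℓ)."""
    (h1, l1), (h2, l2) = z1, z2
    dh = sum((a - b) ** 2 for a, b in zip(h1, h2))
    dl = sum((a - b) ** 2 for a, b in zip(l1, l2))
    return math.sqrt(dh + eta * dl)

def gain_norm(gains, eta):
    """Spectral norm of the 2x2 gain matrix in coordinates (h, sqrt(eta)*l).
    For non-negative gains and orthogonal couplings this upper-bounds ‖A‖_P."""
    a_hh, a_hl, a_lh, a_ll = gains
    s = math.sqrt(eta)
    m00, m01, m10, m11 = a_hh, a_hl / s, s * a_lh, a_ll
    frob2 = m00**2 + m01**2 + m10**2 + m11**2   # sigma1^2 + sigma2^2
    det = m00 * m11 - m01 * m10                 # +/- sigma1 * sigma2
    disc = math.sqrt(max(frob2**2 - 4 * det**2, 0.0))
    return math.sqrt((frob2 + disc) / 2)        # largest singular value

def project_gains(gains, eta, kappa):
    """Uniformly shrink the gains so that gain_norm <= kappa (assumed scheme)."""
    n = gain_norm(gains, eta)
    scale = min(1.0, kappa / n) if n > 0 else 1.0
    return tuple(g * scale for g in gains)

def shift(v, k):
    """Cyclic shift: a permutation, so an orthogonal stand-in for U_HL / U_LH."""
    return v[-k:] + v[:-k]

def step(z, x, gains, alpha):
    """One update z <- (1-alpha)*z + alpha*A*psi(z, x), taking b(x) = 0."""
    (h, l), (xh, xl) = z, x
    a_hh, a_hl, a_lh, a_ll = gains
    # psi: elementwise tanh(state + input); 1-Lipschitz in z under the P-norm.
    ph = [math.tanh(a + b) for a, b in zip(h, xh)]
    pl = [math.tanh(a + b) for a, b in zip(l, xl)]
    # Apply the block gain operator A to psi's output.
    ah = [a_hh * u + a_hl * v for u, v in zip(ph, shift(pl, 1))]
    al = [a_lh * u + a_ll * v for u, v in zip(shift(ph, 2), pl)]
    # Damped mix with the previous state.
    h_new = [(1 - alpha) * u + alpha * v for u, v in zip(h, ah)]
    l_new = [(1 - alpha) * u + alpha * v for u, v in zip(l, al)]
    return h_new, l_new

def contraction_ratios(z_a, z_b, x, gains, eta, alpha, steps=20):
    """Per-step ratio of P-distances between two trajectories sharing input x."""
    ratios = []
    for _ in range(steps):
        d_old = p_dist(z_a, z_b, eta)
        if d_old == 0.0:
            break
        z_a, z_b = step(z_a, x, gains, alpha), step(z_b, x, gains, alpha)
        ratios.append(p_dist(z_a, z_b, eta) / d_old)
    return ratios

# Illustrative values only; none of these come from the experiments above.
ETA, KAPPA, ALPHA = 4.0, 0.9, 0.5
GAINS = project_gains((0.7, 0.3, 0.3, 0.7), ETA, KAPPA)
X = ([0.2, -0.1, 0.4, 0.0], [0.1, 0.3, -0.2, 0.5])
Z_A = ([1.0, -2.0, 0.5, 0.3], [0.0, 1.5, -1.0, 2.0])
Z_B = ([-1.0, 0.5, 2.0, -0.7], [1.0, -0.5, 0.2, -1.5])
RATIOS = contraction_ratios(Z_A, Z_B, X, GAINS, ETA, ALPHA)
BOUND = (1 - ALPHA) + ALPHA * KAPPA

if __name__ == "__main__":
    print(f"gain norm = {gain_norm(GAINS, ETA):.4f}, bound = {BOUND:.4f}")
    print(f"worst observed ratio = {max(RATIOS):.4f}")
```

Every per-step ratio should stay at or below `(1-α) + α·κ`. Comparing the observed ratios against that bound also gives a rough read on how much contraction margin a given κ leaves, which may be useful when sweeping κ.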