Diffstat (limited to 'research/flossing/srm_design_codex.md')
| -rw-r--r-- | research/flossing/srm_design_codex.md | 69 |
1 files changed, 69 insertions, 0 deletions
diff --git a/research/flossing/srm_design_codex.md b/research/flossing/srm_design_codex.md
new file mode 100644
index 0000000..6d5a167
--- /dev/null
+++ b/research/flossing/srm_design_codex.md
@@ -0,0 +1,69 @@
+# SRM (Stable Recursion Model) — Codex Design Synthesis
+
+Returned 2026-05-22 by codex-rescue.
+
+## Core insight: target mild contraction, not aggressive
+
+Empirical λ_1(success) ≈ -0.15 → effective gain ≈ exp(-0.15) ≈ **0.86**.
+Empirical λ_1(failure) ≈ +0.04 → gain ≈ **1.04**.
+
+**Target κ ∈ (0.85, 0.95)**, NOT 0.3-0.5. Over-contraction kills constraint propagation in Sudoku.
+
+## Architectural sketch
+
+State: `z = (h, ℓ) ∈ R^{d_H + d_L}`, weighted norm `‖z‖_P² = ‖h‖² + η·‖ℓ‖²`.
+
+Joint feature map `ψ_θ(z, x)` via **Sandwich Layers** (Wang & Manchester 2023) constrained `Lip_P(ψ) ≤ 1`.
+
+Block gain operator:
+```
+A = [[a_HH·I, a_HL·U_HL],
+     [a_LH·U_LH, a_LL·I]]
+```
+where `U_HL, U_LH` orthogonal, and gains satisfy block-row-sum under weighted metric:
+- `a_HH + √η · a_HL ≤ κ`
+- `a_LL + η^{-1/2} · a_LH ≤ κ`
+
+with `κ ∈ (0.85, 0.95)`.
+
+Update rule:
+```
+z_{t+1} = (1-α) z_t + α · A · ψ_θ(z_t, x) + b(x)
+```
+
+⇒ `Lip_P(T) ≤ (1-α) + α·κ < 1` by construction.
+
+With `α=1, κ=0.86`: λ_1 ≤ log(0.86) ≈ -0.15 — exactly matches empirical success regime.
+
+## Key methodological corrections vs my initial sketch
+
+1. **Constrain JOINT operator, not individual blocks**. HRM got this wrong: stable H and stable L don't imply stable joint due to cross-coupling J_HL, J_LH. Block-row-sum bound under weighted metric is the right translation of CF's empirical signal.
+
+2. **Use tied-time but single joint operator**: TRM's weight-tying across iterations is good (turns it into iterative solver). But fold H and L into one joint operator (unlike HRM's separate modules) to enforce shared contraction metric.
+
+3. **Damping alone isn't sufficient**: `z + β·f(z)` only contracts if f is already Lipschitz-bounded. Damping is for margin, not the main guarantee.
+
+## Failure modes to watch
+
+1. **Over-contraction**: κ too low → constraint propagation collapses → underperforms TRM
+2. **Fake certification**: approximate spectral norm leaves hidden expansion directions; use exact Sandwich parameterization
+3. **Cross-coupling starvation**: `a_HL, a_LH → 0` → decoupled two-state system loses reasoning capacity (need lower bounds on coupling gains too?)
+
+## Literature to anchor
+
+Primary:
+- **Sandwich Layers** (Wang & Manchester 2023) — exact Lipschitz parameterization
+- **Deep Equilibrium Models** (Bai et al.) — for the fixed-point formulation
+
+Secondary (conceptual):
+- Lipschitz RNN (Erichson 2021)
+- AntisymmetricRNN (Chang 2019)
+- (CoRNN less relevant; oscillatory not the right inductive bias for Sudoku)
+
+## Implementation path
+
+1. Replace HRM/TRM's L_level/H_level with a **single tied joint operator** on (z_H, z_L)
+2. Implement Sandwich layer ψ with `Lip ≤ 1`
+3. Parameterize block gain matrix A with constraint `a_HH + √η·a_HL ≤ κ`, `a_LL + η^{-1/2}·a_LH ≤ κ`
+4. α as learnable sigmoid (margin), κ as hyperparameter or learnable bounded < 1
+5. Sweep κ over {0.85, 0.90, 0.95} to find expressivity sweet spot
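
Review note: the arithmetic in the added file can be checked with a few lines of plain Python. The sketch below is not part of the commit. It recomputes the gains implied by the quoted exponents, evaluates the stated bound `(1-α) + α·κ`, and tests the two gain inequalities exactly as written in the note. The function names and the `split_h` / `split_l` parameterization are assumptions made for illustration; the note does not say how the gains are to be parameterized. Passing the inequality check only confirms the inequalities as stated and is not by itself a certificate of the operator norm of `A`.

```python
import math


def gain_from_exponent(lam: float) -> float:
    """Per-step gain implied by a leading Lyapunov exponent: exp(lambda_1)."""
    return math.exp(lam)


def lipschitz_bound(alpha: float, kappa: float) -> float:
    """The bound (1 - alpha) + alpha * kappa quoted for the damped update."""
    return (1.0 - alpha) + alpha * kappa


def block_gains(kappa: float, eta: float, split_h: float, split_l: float):
    """Hypothetical parameterization (an assumption, not from the note).

    Each block row gets the budget kappa; split_* in (0, 1) is the fraction
    spent on the diagonal gain, the rest goes to the cross-coupling gain.
    Both inequalities from the note then hold with equality.
    """
    if not (0.0 < split_h < 1.0 and 0.0 < split_l < 1.0):
        raise ValueError("split fractions must lie strictly between 0 and 1")
    root = math.sqrt(eta)
    a_hh = kappa * split_h
    a_hl = kappa * (1.0 - split_h) / root
    a_ll = kappa * split_l
    a_lh = kappa * (1.0 - split_l) * root
    return a_hh, a_hl, a_lh, a_ll


def satisfies_row_sums(a_hh, a_hl, a_lh, a_ll, eta, kappa, tol=1e-12):
    """Check a_HH + sqrt(eta)*a_HL <= kappa and a_LL + a_LH/sqrt(eta) <= kappa."""
    root = math.sqrt(eta)
    return (a_hh + root * a_hl <= kappa + tol) and (a_ll + a_lh / root <= kappa + tol)


if __name__ == "__main__":
    print("gain(success):", round(gain_from_exponent(-0.15), 4))
    print("gain(failure):", round(gain_from_exponent(0.04), 4))
    for kappa in (0.85, 0.90, 0.95):  # the sweep proposed in the note
        gains = block_gains(kappa, eta=4.0, split_h=0.7, split_l=0.7)
        ok = satisfies_row_sums(*gains, eta=4.0, kappa=kappa)
        print(kappa, "bound at alpha=0.5:", lipschitz_bound(0.5, kappa), "row sums ok:", ok)
```

Because the splits stay strictly inside (0, 1), both cross-coupling gains remain positive, which is one simple way to avoid the "cross-coupling starvation" case listed under failure modes. Whether a fixed split is too restrictive for training is an open question this sketch does not address.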
