research/flossing/srm_design_codex.md



# SRM (Stable Recursion Model) — Codex Design Synthesis

Returned 2026-05-22 by codex-rescue.

## Core insight: target mild contraction, not aggressive contraction

Empirical λ_1(success) ≈ -0.15 → effective gain ≈ exp(-0.15) ≈ **0.86**.
Empirical λ_1(failure) ≈ +0.04 → gain ≈ **1.04**.

**Target κ ∈ (0.85, 0.95)**, NOT 0.3-0.5. Over-contraction kills constraint propagation in Sudoku.
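
The conversion between exponents and gains above is a single `exp`/`log`. A minimal sketch for reproducing the numbers, assuming λ_1 is a per-iteration exponent so that gain = exp(λ_1); the function names are illustrative, not from any codebase:

```python
import math

def gain_from_exponent(lam: float) -> float:
    """Per-step gain implied by a per-step exponent: gain = exp(lam)."""
    return math.exp(lam)

def exponent_from_gain(kappa: float) -> float:
    """Inverse map: exponent bound implied by a contraction factor kappa."""
    return math.log(kappa)

success_gain = gain_from_exponent(-0.15)  # about 0.861
failure_gain = gain_from_exponent(0.04)   # about 1.041

# Exponent bounds implied by the two ends of the target kappa range.
lam_low = exponent_from_gain(0.85)   # about -0.163
lam_high = exponent_from_gain(0.95)  # about -0.051

print(round(success_gain, 3), round(failure_gain, 3))
print(round(lam_low, 3), round(lam_high, 3))
```

The empirical success value λ_1 ≈ -0.15 falls inside the exponent range implied by the target κ range, while κ = 0.3 would correspond to log(0.3) ≈ -1.2, far stronger contraction than observed.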

## Architectural sketch

State: `z = (h, ℓ) ∈ R^{d_H + d_L}`, weighted norm `‖z‖_P² = ‖h‖² + η·‖ℓ‖²`.

Joint feature map `ψ_θ(z, x)` via **Sandwich Layers** (Wang & Manchester 2023) constrained `Lip_P(ψ) ≤ 1`.

Block gain operator:
```
A = [[a_HH·I,    a_HL·U_HL],
     [a_LH·U_LH, a_LL·I]]
```
where `U_HL` and `U_LH` are orthogonal, and the gains satisfy block-row-sum constraints under the weighted metric:
- `a_HH + √η · a_HL ≤ κ`
- `a_LL + η^{-1/2} · a_LH ≤ κ`

with `κ ∈ (0.85, 0.95)`.
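
Candidate gains can be checked numerically before any training. The sketch below assumes nonnegative gains and the norm exactly as defined above (`h` unscaled, `ℓ` scaled by `√η`); in those coordinates the 2×2 matrix `G` of scaled block gains satisfies `‖A‖_P ≤ ‖G‖_2`. Two caveats seem worth checking against the intended convention. First, under this norm the off-diagonal entries of `G` come out as `a_HL/√η` and `√η·a_LH`, which is the reciprocal weighting of the inequalities as written, so one convention should be pinned down before implementing. Second, row sums alone bound a max-type norm; for the weighted 2-norm the column sums matter too, since `‖G‖_2 ≤ sqrt(‖G‖_1 · ‖G‖_∞)`. The helper names are hypothetical.

```python
import math

def scaled_gain_matrix(a_hh, a_hl, a_lh, a_ll, eta):
    """2x2 matrix of block gains in coordinates (||h||, sqrt(eta) * ||l||).

    Assumes nonnegative gains and ||z||_P^2 = ||h||^2 + eta * ||l||^2.
    """
    s = math.sqrt(eta)
    return [[a_hh, a_hl / s],
            [s * a_lh, a_ll]]

def spectral_norm_2x2(g):
    """Largest singular value of a 2x2 matrix, in closed form."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d   # trace of G^T G
    det = a * d - b * c                 # det(G^T G) = det^2
    disc = max(t * t - 4.0 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2.0)

def check_gains(a_hh, a_hl, a_lh, a_ll, eta, kappa):
    """Report row sums, column sums, and the 2-norm bound on ||A||_P."""
    g = scaled_gain_matrix(a_hh, a_hl, a_lh, a_ll, eta)
    norm = spectral_norm_2x2(g)
    return {
        "row_sums": [g[0][0] + g[0][1], g[1][0] + g[1][1]],
        "col_sums": [g[0][0] + g[1][0], g[0][1] + g[1][1]],
        "norm": norm,
        "ok": norm <= kappa,
    }

# Mild coupling: comfortably inside the kappa budget.
good = check_gains(0.5, 0.1, 0.1, 0.5, eta=4.0, kappa=0.86)

# Both row sums equal 0.85, yet the first column sums to 1.7
# and the weighted 2-norm exceeds 1.
bad = check_gains(0.85, 0.0, 0.425, 0.0, eta=4.0, kappa=0.85)

print(good["norm"], good["ok"])
print(bad["row_sums"], bad["norm"], bad["ok"])
```

The second case is the one to watch: it passes a row-sum test and still expands by a factor of `0.85·√2 ≈ 1.2` along `z = (h, 0)`.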

Update rule:
```
z_{t+1} = (1-α) z_t + α · A · ψ_θ(z_t, x) + b(x)
```

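⇒ `Lip_P(T) ≤ (1-α) + α·κ < 1` for `α ∈ (0, 1]`, provided `‖A‖_P ≤ κ`: `Lip_P(ψ) ≤ 1` holds by construction, and `b(x)` does not depend on `z`, so it cancels in the difference of two trajectories.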

With `α=1, κ=0.86`: `λ_1 ≤ log(0.86) ≈ -0.15`, which matches the empirical success regime.
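
The contraction bound can be sanity-checked on a toy instance. The sketch below is an illustration under strong simplifying assumptions: scalar blocks (`d_H = d_L = 1`, so `U_HL = U_LH = 1`), an elementwise `tanh` standing in for `ψ_θ` (it is 1-Lipschitz in any diagonally weighted 2-norm, but it is not a Sandwich layer), and made-up values for `η`, `α`, the gains, `x`, and `b`. It tracks the P-distance between two trajectories driven by the same input.

```python
import math

ETA, ALPHA, KAPPA = 4.0, 0.5, 0.86

# Scalar-block gain operator; in P-coordinates its norm is about 0.63 <= KAPPA.
A = [[0.5, 0.1],
     [0.1, 0.5]]

def psi(z, x):
    """Toy stand-in for psi_theta: elementwise tanh of (z + x)."""
    return [math.tanh(zi + xi) for zi, xi in zip(z, x)]

def step(z, x, b):
    """One update: z <- (1 - alpha) z + alpha * A psi(z, x) + b."""
    p = psi(z, x)
    return [(1 - ALPHA) * z[i]
            + ALPHA * (A[i][0] * p[0] + A[i][1] * p[1])
            + b[i]
            for i in range(2)]

def dist_p(u, v):
    """Weighted distance: sqrt(dh^2 + eta * dl^2)."""
    return math.sqrt((u[0] - v[0]) ** 2 + ETA * (u[1] - v[1]) ** 2)

def contraction_ratios(z0, z1, x, b, steps=20):
    """Per-step ratio of P-distances between two trajectories."""
    u, v, ratios = list(z0), list(z1), []
    for _ in range(steps):
        before = dist_p(u, v)
        u, v = step(u, x, b), step(v, x, b)
        ratios.append(dist_p(u, v) / before)
    return ratios

ratios = contraction_ratios([2.0, -1.0], [-1.5, 0.5],
                            x=[0.3, -0.2], b=[0.1, 0.0])
bound = (1 - ALPHA) + ALPHA * KAPPA  # 0.93
print(max(ratios), bound)
```

Every per-step ratio should stay below `(1-α) + α·κ`; a ratio above that bound would indicate that either `ψ` or `A` violates its assumed Lipschitz constant.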

## Key methodological corrections vs my initial sketch

1. **Constrain the joint operator, not individual blocks**. HRM got this wrong: a stable H and a stable L do not imply a stable joint system, because of the cross-coupling terms J_HL, J_LH. A block-row-sum bound under the weighted metric is the right translation of CF's empirical signal.

2. **Tie weights across time, but use a single joint operator**: TRM's weight-tying across iterations is good (it turns the model into an iterative solver). But fold H and L into one joint operator (unlike HRM's separate modules) to enforce a shared contraction metric.

3. **Damping alone isn't sufficient**: `z + β·f(z)` only contracts if f is already Lipschitz-bounded. Damping is for margin, not the main guarantee.
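
Points 1 and 3 can each be illustrated with a two-line calculation. The sketch below uses made-up numbers: a symmetric 2×2 Jacobian whose diagonal blocks are individually stable, and the convex-combination update from the architectural sketch (rather than the `z + β·f(z)` form) with the empirical success and failure gains as the inner map's gain.

```python
import math

def spectral_radius_sym2(a, b, d):
    """Spectral radius of the symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    half_gap = math.sqrt(((a - d) / 2.0) ** 2 + b * b)
    return max(abs(mean + half_gap), abs(mean - half_gap))

# Point 1: each diagonal block contracts (0.5 < 1), but the
# cross-coupling (0.8, hypothetical) makes the joint system expand.
j_hh, j_ll, j_cross = 0.5, 0.5, 0.8
rho_joint = spectral_radius_sym2(j_hh, j_cross, j_ll)  # 0.5 + 0.8 = 1.3

# Point 3: damping interpolates the gain toward 1 but never across it.
def damped_gain(alpha, inner_gain):
    """Gain bound of z -> (1 - alpha) z + alpha g(z), Lip(g) = inner_gain."""
    return (1.0 - alpha) + alpha * inner_gain

expanding = damped_gain(0.5, 1.04)    # 1.02, still expanding
contracting = damped_gain(0.5, 0.86)  # 0.93, still contracting

print(rho_joint, expanding, contracting)
```

For any `α ∈ (0, 1]` the damped gain sits on the same side of 1 as the inner gain, which is why damping only adds margin.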

## Failure modes to watch

1. **Over-contraction**: κ too low → constraint propagation collapses → underperforms TRM
2. **Fake certification**: approximate spectral norm leaves hidden expansion directions; use exact Sandwich parameterization
3. **Cross-coupling starvation**: `a_HL, a_LH → 0` → decoupled two-state system loses reasoning capacity (need lower bounds on coupling gains too?)
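
The fake-certification risk is easy to reproduce: any power-iteration estimate `‖Mv‖/‖v‖` is a lower bound on the true spectral norm, so a short run from an unlucky start can report a value below 1 for an expanding map. A minimal sketch with a hypothetical 2×2 matrix and start vector:

```python
import math

def power_iteration_estimate(m, v, iters):
    """Estimate sigma_max of a 2x2 matrix by power iteration on M^T M.

    The returned value ||M v|| / ||v|| never exceeds the true sigma_max.
    """
    for _ in range(iters):
        mv = [m[0][0] * v[0] + m[0][1] * v[1],
              m[1][0] * v[0] + m[1][1] * v[1]]
        # Apply M^T to M v.
        v = [m[0][0] * mv[0] + m[1][0] * mv[1],
             m[0][1] * mv[0] + m[1][1] * mv[1]]
        n = math.hypot(v[0], v[1])
        v = [v[0] / n, v[1] / n]
    mv = [m[0][0] * v[0] + m[0][1] * v[1],
          m[1][0] * v[0] + m[1][1] * v[1]]
    return math.hypot(mv[0], mv[1]) / math.hypot(v[0], v[1])

# True spectral norm is 1.05 (expanding along the first axis).
M = [[1.05, 0.0],
     [0.0, 0.90]]
start = [0.001, 1.0]  # nearly orthogonal to the expanding direction

one_step = power_iteration_estimate(M, start, iters=1)
converged = power_iteration_estimate(M, start, iters=100)

print(one_step, converged)
```

The one-step estimate sits near 0.90 and would wrongly certify contraction; only the converged run reveals the 1.05 direction. An exact parameterization avoids depending on how well such an estimate has converged.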

## Literature to anchor

Primary:
- **Sandwich Layers** (Wang & Manchester 2023) — exact Lipschitz parameterization
- **Deep Equilibrium Models** (Bai et al.) — for the fixed-point formulation

Secondary (conceptual):
- Lipschitz RNN (Erichson 2021)
- AntisymmetricRNN (Chang 2019)
- (CoRNN less relevant; oscillatory not the right inductive bias for Sudoku)

## Implementation path

1. Replace HRM/TRM's L_level/H_level with a **single tied joint operator** on (z_H, z_L)
2. Implement Sandwich layer ψ with `Lip ≤ 1`
3. Parameterize block gain matrix A with constraint `a_HH + √η·a_HL ≤ κ`, `a_LL + η^{-1/2}·a_LH ≤ κ`
4. Make α a learnable sigmoid output (for margin); treat κ as a hyperparameter, or learn it with an upper bound below 1
5. Sweep κ over {0.85, 0.90, 0.95} to find the expressivity sweet spot
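
Steps 3 to 5 can be sketched with unconstrained raw parameters squashed into the feasible set. This is one possible parameterization, not a prescribed one: each row gets a budget strictly below κ that a sigmoid splits between the diagonal and off-diagonal gain. The row weights are passed in explicitly and follow the constraints as written in step 3; they would need to be swapped if the metric convention differs. All names (`row_gains`, `build_gains`, the `raw` keys) are hypothetical.

```python
import math

def sigmoid(r):
    return 1.0 / (1.0 + math.exp(-r))

def bounded_kappa(raw, lo=0.85, hi=0.95):
    """Learnable kappa squashed into (lo, hi), hence strictly below 1."""
    return lo + (hi - lo) * sigmoid(raw)

def row_gains(raw_budget, raw_split, kappa, weight):
    """Return (a_diag, a_off) with a_diag + weight * a_off < kappa."""
    budget = kappa * sigmoid(raw_budget)  # total row budget in (0, kappa)
    share = sigmoid(raw_split)            # fraction given to the diagonal
    return budget * share, budget * (1.0 - share) / weight

def build_gains(raw, eta, kappa):
    """Map raw parameters to gains satisfying the step-3 constraints."""
    s = math.sqrt(eta)
    a_hh, a_hl = row_gains(raw["h_budget"], raw["h_split"], kappa, s)
    a_ll, a_lh = row_gains(raw["l_budget"], raw["l_split"], kappa, 1.0 / s)
    return {"a_HH": a_hh, "a_HL": a_hl, "a_LH": a_lh, "a_LL": a_ll,
            "alpha": sigmoid(raw["alpha"])}

raw = {"h_budget": 2.0, "h_split": 0.5,
       "l_budget": 2.0, "l_split": -0.5, "alpha": 0.0}

# Step 5: sweep kappa as a fixed hyperparameter.
sweep = {k: build_gains(raw, eta=4.0, kappa=k) for k in (0.85, 0.90, 0.95)}
for k, g in sweep.items():
    print(k, {name: round(val, 4) for name, val in g.items()})
```

The split keeps both gains strictly positive, but nothing stops training from driving the off-diagonal share toward zero, so the coupling-starvation question from the failure modes would still need its own lower bound.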