# Experiment Notes

## Experiment Phases
- **debug**: Initial implementation, rapid iteration (commit ce24e36)
- **pilot**: Controlled iteration (commits 0b9ebb2, 7baf7ae)
- **frozen**: Code at commit 0b9ebb2 for all results in the Final Results Summary (explore-phase results use 2403960+)

## Status: PHASE 2 EXPLORE IN PROGRESS

---

## Final Results Summary

### Toy LQ (3 seeds, 8000 steps)
| Method | Costate Cosine | ρ | Nudging |
|--------|---------------|---|---------|
| DFA | 0.001±0.003 | 0.001±0.007 | 0.000±0.001 |
| State Bridge | 0.945±0.002 | 0.931±0.003 | -0.344±0.019 |
| Credit Bridge | 0.944±0.001 | 0.930±0.002 | -0.342±0.019 |

### CIFAR-10 (3 seeds, 100 epochs)
| Method | Test Accuracy |
|--------|:------------:|
| BP | 59.2%±0.4% |
| DFA | 30.0%±0.3% |
| Credit Bridge | 29.6%±1.0% |
| State Bridge | 18.5%±1.8% |

### CIFAR-10 Diagnostics (seed 42)
| Method | BP Cosine | ρ | Nudging |
|--------|-----------|---|-------|
| BP | 0.940 | 0.990 | -0.027 |
| Credit Bridge | 0.056 | ~0 | ~0 |
| DFA | 0.030 | 0.005 | ~0 |
| State Bridge | 0.021 | 0.004 | ~0 |
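
The cosine columns above can be read with a small sketch in mind. The notes do not spell out how the BP cosine is computed, so this assumes the standard cosine similarity between a method's learning signal and the backprop gradient, both flattened to plain vectors. `cosine_similarity`, `bp_grad`, and the sample vectors are hypothetical names and values, not taken from the experiment code.

```python
import math

def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two flat vectors (sequences of floats).

    eps guards against division by zero when either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

# Hypothetical vectors, for illustration only.
bp_grad = [1.0, 2.0, -1.0]
aligned = [2.0, 4.0, -2.0]      # same direction as bp_grad, different scale
orthogonal = [2.0, -1.0, 0.0]   # zero dot product with bp_grad

print(cosine_similarity(bp_grad, aligned))     # close to 1
print(cosine_similarity(bp_grad, orthogonal))  # 0.0
```

Under this reading, values near 0.02 to 0.06 mean the learning signal is almost orthogonal to the backprop gradient, regardless of its magnitude.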

---

## Key Findings

1. **Terminal gradient matching is essential** for credit bridge.
   Without it, V learns correct values but uninformative gradients (cos → 0.03).
   With it, credit bridge matches state bridge on toy (~0.94 cosine).

2. **State bridge fails on nonlinear systems** despite near-perfect state prediction.
   On CIFAR-10, state prediction error → 0.0000 but test accuracy = 18.5% (worst of all methods).
   This supports the core hypothesis: bridging state ≠ bridging credit.

3. **Credit bridge modestly outperforms DFA in BP cosine** (0.056 vs 0.030 on seed 42, ~1.9x)
   but accuracy is comparable (29.6%±1.0% vs 30.0%±0.3%, within seed noise).

4. **All non-BP methods struggle** on the deep 12-block MLP architecture.
   The gap to BP (59.2%) is large for all methods.

---

## Changes Log
- `ce24e36`: Initial implementation
- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR
- `7baf7ae`: Add experiment notes and .gitignore

## Experiment IDs
- `toy_lq_frozen/`: Final toy results (3 seeds, synced state bridge)
- `cifar10/`, `cifar10_seed123/`, `cifar10_seed456/`: Final CIFAR results
- `toy_lq/`: Debug-phase toy results (raw state bridge, for ablation)
- `smoke_test/`, `smoke_test2/`: FashionMNIST debug runs

## Design Decisions
1. Terminal gradient matching (term_grad_weight=1.0): output-layer-local, not hidden BP
2. DFA warmup for credit bridge (20% of epochs): prevents value net bootstrap failure
3. Normalized MSE for state bridge: numerical stability
4. Credit normalization: a_norm = a / (RMS(a) + 1e-6)
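
A minimal sketch of decisions 2 and 4, assuming the credit signal `a` is a flat vector and that warmup is decided per epoch. The formula is the one given in item 4; the function names (`rms`, `normalize_credit`, `in_dfa_warmup`) and the 0-indexed epoch convention are assumptions for illustration, not the experiment code.

```python
import math

def rms(a):
    """Root mean square of a flat sequence of floats."""
    return math.sqrt(sum(x * x for x in a) / len(a))

def normalize_credit(a, eps=1e-6):
    """a_norm = a / (RMS(a) + eps); eps keeps an all-zero signal finite."""
    scale = rms(a) + eps
    return [x / scale for x in a]

def in_dfa_warmup(epoch, total_epochs, warmup_frac=0.2):
    """True while a 0-indexed epoch lies in the first warmup_frac of training."""
    return epoch < round(warmup_frac * total_epochs)

# Hypothetical credit vector, for illustration only.
credit = [3.0, -4.0]
print(rms(normalize_credit(credit)))  # close to 1
print(in_dfa_warmup(19, 100), in_dfa_warmup(20, 100))  # True False
```

After normalization the credit has roughly unit RMS whatever its original scale, so the update magnitude is set by the learning rate rather than by the value net's output scale.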

---

## Phase 2: Explore (commit 2403960+)

### Synthetic Nonlinearity Ladder (Phase 1 of explore)

**Setup**: Teacher-student with phi_alpha(z) = (1-alpha)*z + alpha*tanh(z)
- alpha in {0, 0.25, 0.5, 1.0}, L in {2, 4, 8, 12}
- d=128, C=10, 80 epochs, 3 seeds
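
The activation in the setup above interpolates between the identity and tanh. A small sketch, with the derivative included because the interpretation below turns on Jacobians; the function names `phi` and `phi_prime` are chosen here for illustration and are not from the experiment code.

```python
import math

def phi(z, alpha):
    """phi_alpha(z) = (1 - alpha) * z + alpha * tanh(z)."""
    return (1.0 - alpha) * z + alpha * math.tanh(z)

def phi_prime(z, alpha):
    """Derivative of phi_alpha: (1 - alpha) + alpha * (1 - tanh(z)^2)."""
    return (1.0 - alpha) + alpha * (1.0 - math.tanh(z) ** 2)

# Slope at a moderately large pre-activation, for each alpha in the ladder.
for alpha in (0.0, 0.25, 0.5, 1.0):
    print(alpha, phi(3.0, alpha), phi_prime(3.0, alpha))
```

At alpha=0 the slope is 1 everywhere, so the per-layer Jacobian does not depend on the input. At alpha=1 the slope depends on `z` and drops toward 0 for large `|z|`, which is what makes the Jacobian state-dependent.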

**Critical Finding**: Credit bridge advantage scales with nonlinearity.

At alpha=1.0 (full tanh), credit bridge is the BEST method on Gamma and rho at ALL depths:

| L | DFA Gamma | SB Gamma | CB Gamma | DFA rho | SB rho | CB rho |
|---|-----------|----------|----------|---------|--------|--------|
| 2 | 0.03 | 0.52 | **0.53** | 0.03 | 0.47 | **0.57** |
| 4 | 0.05 | 0.34 | **0.45** | 0.06 | 0.32 | **0.51** |
| 8 | 0.06 | 0.25 | **0.36** | 0.07 | 0.23 | **0.42** |
| 12 | 0.07 | 0.22 | **0.24** | 0.07 | 0.21 | **0.32** |

At alpha=0.5 (moderate nonlinearity), SB still wins on Gamma but CB wins on rho at L=4.
At alpha=0 (linear), SB dominates.

**Interpretation**: State bridge fails via Jacobian mismatch, not value prediction error.
Credit bridge avoids this by learning value field gradients directly.
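The crossover appears to happen around alpha=0.7-1.0; the grid only brackets it between alpha=0.5 and alpha=1.0, since no intermediate alpha was run.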

### CIFAR-10 Depth Scan (Phase 2 of explore, in progress)

Sweep L={2,4,6,8,12}, d=512, 100 epochs on CIFAR-10.
Preliminary results (L=2,4, seed=42):

| L | Method | Acc | Gamma | rho |
|---|--------|-----|-------|-----|
| 2 | DFA | 0.312 | 0.196 | 0.001 |
| 2 | CB | 0.311 | 0.175 | **0.031** |
| 4 | DFA | 0.314 | 0.100 | 0.003 |
| 4 | CB | 0.298 | 0.123 | -0.002 |

CIFAR is much harder: rho stays near zero (|rho| ≤ 0.031) for both DFA and CB at L=2 and L=4.

### Changes Log (explore phase)
- `2403960`: Add synthetic ladder and CIFAR depth scan experiments
- Student blocks now use pre-LayerNorm for stability (fixes L>=8 blowup)
- Added gradient clipping to block updates

### Experiment IDs (explore phase)
- `synth_ladder_smoke/`: Initial 3-alpha x 2-depth smoke test
- `synth_ladder_v2_lo/`: Full alpha=0,0.25 x L=2,4,8,12 x 3 seeds
- `synth_ladder_v2_hi/`: Full alpha=0.5,1.0 x L=2,4,8,12 x 3 seeds
- `cifar_depth_scan_s42/`: CIFAR L=2,4,6 x d=512 x seed=42 (in progress)