# Experiment Notes

## Experiment Phases
- **debug**: Initial implementation, rapid iteration. Code may change between runs.
- **pilot**: Controlled iteration. Each change requires commit + rationale.
- **frozen**: Code frozen at specific commit hash. Only formal multi-seed runs.

## Current Phase: PILOT
- Commit for toy frozen runs: `0b9ebb2` (state bridge synced to normalized MSE)
- CIFAR runs started from commit `ce24e36` (CIFAR code unchanged by sync commit)

---

## 2026-03-23: Implementation and Experiments

### Setup
- GPU: NVIDIA RTX A6000 x4 (GPU 0 occupied, using GPUs 1-3)
- PyTorch 2.10.0+cu128

### Key Findings

#### 1. Credit Bridge requires terminal gradient matching
- **Without** terminal gradient matching: credit bridge costate cosine collapses to ~0.03 (no signal)
- **With** terminal gradient matching: credit bridge achieves ~0.94 cosine (matches state bridge)
- Terminal gradient uses only output-layer local info (not hidden BP) → allowed
- This is the most important finding so far
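One plausible way to write the terminal gradient matching term is sketched below. The notation is an assumption and does not come from the experiment code: $h_T$ is the last hidden state, $V_\phi$ the value net, $f_{\text{out}}$ the output layer, $\ell$ the task loss, and $\lambda$ the `term_grad_weight` setting.

$$
\mathcal{L}_{\text{term}} = \lambda \, \bigl\lVert \nabla_{h_T} V_\phi(h_T) - \nabla_{h_T} \ell\bigl(f_{\text{out}}(h_T), y\bigr) \bigr\rVert^2, \qquad \lambda = 1.0
$$

Under this reading, the target gradient on the right only requires differentiating through the output layer, which is consistent with the "output-layer local info" constraint above.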

#### 2. Toy LQ Results (3 seeds, 8000 steps, commit 0b9ebb2)
| Method | Costate Cosine | Perturbation ρ | Nudging |
|--------|---------------|----------------|---------|
| DFA | 0.003±0.001 | 0.010±0.012 | -0.001±0.000 |
| State Bridge | 0.941±0.003 | 0.927±0.004 | -0.335±0.015 |
| Credit Bridge | 0.942±0.002 | 0.929±0.003 | -0.334±0.015 |

- State Bridge and Credit Bridge agree closely on the linear system (costate cosine 0.941 vs 0.942)
- DFA provides essentially no directional credit (costate cosine ~0.003, chance level)
- Bridge residual decreases steadily during training
- FM auxiliary provides marginal improvement (0.946 vs 0.940 cosine)
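
For reference, a costate cosine of the kind reported above can be computed as a plain cosine similarity between an estimated credit vector and the true costate, then summarized across seeds. This is a minimal sketch, assuming flat vectors and a sample standard deviation for the ± value (the notes do not say which estimator was used). The names `cosine` and `summarize` are illustrative and not taken from the experiment code.

```python
import math
import statistics


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two equal-length vectors.

    eps guards against division by zero when either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def summarize(per_seed_values):
    """Return (mean, sample std) over per-seed metric values."""
    mean = statistics.mean(per_seed_values)
    std = statistics.stdev(per_seed_values)  # assumption: sample std (n - 1)
    return mean, std
```

A cosine near 1 means the estimated credit points in the same direction as the true costate, while a value near 0 means it is orthogonal to it.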

#### 3. CIFAR-10 (in progress, 3 seeds on GPUs 1-3)
- BP baseline: ~59% test accuracy (expected for flat MLP on CIFAR-10)
- DFA: ~28% test accuracy at epoch 30 (struggling on deep network)
- State Bridge: running
- Credit Bridge: running with warmup (20% DFA warmup + linear blend)
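
The warmup schedule for Credit Bridge could look like the sketch below. It assumes "20% DFA warmup + linear blend" means pure DFA credit for the first 20% of training, followed by a linear ramp toward the bridge credit. The ramp length (`blend_frac`) and the function names are hypothetical, since the notes do not specify them.

```python
def bridge_weight(step, total_steps, warmup_frac=0.2, blend_frac=0.2):
    """Weight on the credit-bridge signal at a given training step.

    Returns 0.0 during the DFA warmup, ramps linearly to 1.0 over the
    blend window, and stays at 1.0 afterwards.
    blend_frac is an assumed value, not taken from the notes.
    """
    warmup_end = warmup_frac * total_steps
    blend_end = warmup_end + blend_frac * total_steps
    if step < warmup_end:
        return 0.0
    if step >= blend_end:
        return 1.0
    return (step - warmup_end) / (blend_end - warmup_end)


def blended_credit(dfa_credit, bridge_credit, weight):
    """Convex combination of DFA and bridge credit vectors."""
    return [(1.0 - weight) * d + weight * b
            for d, b in zip(dfa_credit, bridge_credit)]
```

With `total_steps=100`, the weight is 0 up to step 19, reaches 0.5 at step 30, and is 1 from step 40 onward.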

### Design Decisions
1. **Terminal gradient matching** (term_grad_weight=1.0): Essential for credit bridge. The bridge consistency loss alone constrains the values of V but not its gradients. Terminal gradient matching supplies the missing gradient information from output-layer-local computation.
2. **DFA warmup for credit bridge**: Without warmup, the credit bridge collapses because the value net cannot learn useful credits while the forward net is still being updated with random signals.
3. **Normalized MSE for state bridge**: `((pred - target) / max(||target||, 1.0))^2` for numerical stability on CIFAR, where hidden states can have large norms.
4. **Credit normalization**: All methods use `a_norm = a / (RMS(a) + 1e-6)` in local surrogate to control credit magnitude.
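
The two normalization formulas in items 3 and 4 can be sketched in plain Python as follows. This is an illustration on flat lists rather than the PyTorch code used in the runs. The mean reduction over elements and the function names are assumptions, since the notes give only the per-element expressions.

```python
import math


def normalized_mse(pred, target, floor=1.0):
    """Mean over elements of ((pred - target) / max(||target||, floor))^2.

    The floor of 1.0 keeps small-norm targets from inflating the loss.
    The mean reduction is an assumption.
    """
    target_norm = math.sqrt(sum(t * t for t in target))
    scale = max(target_norm, floor)
    return sum(((p - t) / scale) ** 2 for p, t in zip(pred, target)) / len(target)


def rms_normalize(a, eps=1e-6):
    """Credit normalization: a / (RMS(a) + eps)."""
    rms = math.sqrt(sum(x * x for x in a) / len(a))
    return [x / (rms + eps) for x in a]
```

For a target with norm 50, the error is divided by 50 before squaring, so the loss stays on the order of 1 even when hidden states are large. For a target with norm below 1, the expression reduces to an ordinary MSE.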

### Changes Log
- `ce24e36`: Initial implementation with all models, methods, toy and CIFAR experiments
- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR (consistency fix)

### Experiment IDs
- `toy_lq_v1`: Original toy, no terminal gradient matching (for ablation)
- `toy_lq_v2`: Toy with terminal gradient matching (primary)
- `toy_lq_frozen`: Re-run of v2 with synced state bridge (for final report)
- `cifar10_seed42/123/456`: Main CIFAR-10 experiments

### Known Issues
- DFA accuracy on CIFAR-10 is low (~28% at epoch 30). Expected for DFA on deep MLPs.
- State bridge prediction errors were extremely large before the normalized MSE was introduced (see Design Decisions, item 3).
- Credit bridge needs DFA warmup phase to bootstrap stable training.