# Experiment Notes

## Experiment Phases

- **debug**: Initial implementation, rapid iteration (commits ce24e36)
- **pilot**: Controlled iteration (commits 0b9ebb2, 7baf7ae)
- **frozen**: Code at commit 0b9ebb2 for all reported results

## Status: COMPLETE

---

## Final Results Summary

### Toy LQ (3 seeds, 8000 steps)

| Method | Costate Cosine | ρ | Nudging |
|--------|---------------|---|---------|
| DFA | 0.001±0.003 | 0.001±0.007 | 0.000±0.001 |
| State Bridge | 0.945±0.002 | 0.931±0.003 | -0.344±0.019 |
| Credit Bridge | 0.944±0.001 | 0.930±0.002 | -0.342±0.019 |

### CIFAR-10 (3 seeds, 100 epochs)

| Method | Test Accuracy |
|--------|:------------:|
| BP | 59.2%±0.4% |
| DFA | 30.0%±0.3% |
| Credit Bridge | 29.6%±1.0% |
| State Bridge | 18.5%±1.8% |

### CIFAR-10 Diagnostics (seed 42)

| Method | BP Cosine | ρ | Nudge |
|--------|-----------|---|-------|
| BP | 0.940 | 0.990 | -0.027 |
| Credit Bridge | 0.056 | ~0 | ~0 |
| DFA | 0.030 | 0.005 | ~0 |
| State Bridge | 0.021 | 0.004 | ~0 |

---

## Key Findings

1. **Terminal gradient matching is essential** for credit bridge. Without it, V learns correct values but uninformative gradients (cos → 0.03). With it, credit bridge matches state bridge on toy (~0.94 cosine).

2. **State bridge fails on nonlinear systems** despite near-perfect state prediction. State prediction error → 0.0000 but test accuracy = 18.5% (worst of all methods). This confirms the core hypothesis: bridging state ≠ bridging credit.

3. **Credit bridge modestly outperforms DFA in BP cosine** (0.056 vs 0.030, ~2x) but accuracy is comparable (29.6% vs 30.0%).

4. **All non-BP methods struggle** on the deep 12-block MLP architecture. The gap to BP (59.2%) is large for all methods.
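The "BP Cosine" diagnostic compares each method's credit signal with the true backprop gradient. Below is a minimal, dependency-free sketch of such a metric. The function names (`cosine`, `mean_cosine`) and the choice to average per-sample cosines over a batch are assumptions for illustration, not necessarily how the frozen code computes the reported numbers.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity of two flat vectors; 0.0 if either norm is below eps."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u < eps or norm_v < eps:
        return 0.0  # assumption: report a degenerate (zero) signal as "no alignment"
    return dot / (norm_u * norm_v)


def mean_cosine(est: Sequence[Sequence[float]], ref: Sequence[Sequence[float]]) -> float:
    """Average per-sample cosine between estimated credit and reference gradient.

    Assumption: est[i] and ref[i] are the flattened signals for sample i.
    """
    if len(est) != len(ref) or len(est) == 0:
        raise ValueError("need two non-empty batches of equal size")
    return sum(cosine(e, r) for e, r in zip(est, ref)) / len(est)
```

A value near 1 means the credit signal points along the backprop gradient, and a value near 0 means it carries almost no directional information. That is how the 0.056 and 0.030 entries for Credit Bridge and DFA read.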
---

## Changes Log

- `ce24e36`: Initial implementation
- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR
- `7baf7ae`: Add experiment notes and .gitignore

## Experiment IDs

- `toy_lq_frozen/`: Final toy results (3 seeds, synced state bridge)
- `cifar10/`, `cifar10_seed123/`, `cifar10_seed456/`: Final CIFAR results
- `toy_lq/`: Debug-phase toy results (raw state bridge, for ablation)
- `smoke_test/`, `smoke_test2/`: FashionMNIST debug runs

## Design Decisions

1. Terminal gradient matching (term_grad_weight=1.0): output-layer-local, not hidden BP
2. DFA warmup for credit bridge (20% of epochs): prevents value net bootstrap failure
3. Normalized MSE for state bridge: numerical stability
4. Credit normalization: a_norm = a / (RMS(a) + 1e-6)
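Decisions 2 and 4 can be sketched in a few lines of plain Python. This is illustrative only: the helper names (`rms_normalize`, `in_dfa_warmup`), the choice to take the RMS over a single flat credit vector, and the 0-indexed, floor-rounded warmup boundary are assumptions, not taken from the frozen code.

```python
import math
from typing import List, Sequence


def rms_normalize(a: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Credit normalization: a_norm = a / (RMS(a) + eps).

    Assumption: RMS is computed over one flat credit vector.
    """
    if len(a) == 0:
        raise ValueError("credit vector must be non-empty")
    rms = math.sqrt(sum(x * x for x in a) / len(a))
    return [x / (rms + eps) for x in a]


def in_dfa_warmup(epoch: int, total_epochs: int, frac: float = 0.2) -> bool:
    """True while the credit bridge should still use DFA feedback.

    Assumption: epochs are 0-indexed and the boundary is floor(frac * total_epochs).
    """
    return epoch < int(frac * total_epochs)
```

Under these assumptions, a 100-epoch run uses DFA for epochs 0 through 19 and switches to the learned credit signal at epoch 20. The `eps` term keeps the normalization finite when the credit vector is all zeros.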