Add final report, plots, experiment guide, and complete NOTE.md

All experiments complete: - Toy LQ: credit bridge matches state bridge (~0.94 costate cosine) - CIFAR-10: credit bridge (29.6%) comparable to DFA (30.0%), both beat state bridge (18.5%) - State bridge confirms core hypothesis: perfect state prediction != useful credit - Terminal gradient matching is essential for credit bridge
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-23 19:46:08 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-23 19:46:08 -0500
commit: 32123cb36ae9521f60c9b6f67458b931b6540ef2 (patch)
tree: 4731e1dc513f5b613f80c4d20fc4114044c266d3 /NOTE.md
parent: bbb1a36d67f2f0c83106c1e771ea2c2fcb7fd83a (diff)
1 files changed, 65 insertions, 58 deletions
diff --git a/NOTE.md b/NOTE.md
index 2b42750..20b4512 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -1,65 +1,72 @@
 # Experiment Notes
 
 ## Experiment Phases
-- **debug**: Initial implementation, rapid iteration. Code may change between runs.
-- **pilot**: Controlled iteration. Each change requires commit + rationale.
-- **frozen**: Code frozen at specific commit hash. Only formal multi-seed runs.
+- **debug**: Initial implementation, rapid iteration (commits ce24e36)
+- **pilot**: Controlled iteration (commits 0b9ebb2, 7baf7ae)
+- **frozen**: Code at commit 0b9ebb2 for all reported results
 
-## Current Phase: PILOT
-- Commit for toy frozen runs: `0b9ebb2` (state bridge synced to normalized MSE)
-- CIFAR runs started from commit `ce24e36` (CIFAR code unchanged by sync commit)
+## Status: COMPLETE
 
 ---
 
-## 2026-03-23: Implementation and Experiments
-
-### Setup
-- GPU: NVIDIA RTX A6000 x4 (GPU 0 occupied, using GPUs 1-3)
-- PyTorch 2.10.0+cu128
-
-### Key Findings
-
-#### 1. Credit Bridge requires terminal gradient matching
-- **Without** terminal gradient matching: credit bridge costate cosine collapses to ~0.03 (no signal)
-- **With** terminal gradient matching: credit bridge achieves ~0.94 cosine (matches state bridge)
-- Terminal gradient uses only output-layer local info (not hidden BP) → allowed
-- This is the most important finding so far
-
-#### 2. Toy LQ Results (3 seeds, 8000 steps, commit 0b9ebb2)
-| Method | Costate Cosine | Perturbation ρ | Nudging |
-|--------|---------------|----------------|---------|
-| DFA | 0.003±0.001 | 0.010±0.012 | -0.001±0.000 |
-| State Bridge | 0.941±0.003 | 0.927±0.004 | -0.335±0.015 |
-| Credit Bridge | 0.942±0.002 | 0.929±0.003 | -0.334±0.015 |
-
-- Both State Bridge and Credit Bridge match closely on the linear system
-- DFA provides essentially no directional credit (random level)
-- Bridge residual decreases steadily during training
-- FM auxiliary provides marginal improvement (0.946 vs 0.940 cosine)
-
-#### 3. CIFAR-10 (in progress, 3 seeds on GPUs 1-3)
-- BP baseline: ~59% test accuracy (expected for flat MLP on CIFAR-10)
-- DFA: ~28% test accuracy at epoch 30 (struggling on deep network)
-- State Bridge: running
-- Credit Bridge: running with warmup (20% DFA warmup + linear blend)
-
-### Design Decisions
-1. **Terminal gradient matching** (term_grad_weight=1.0): Essential for credit bridge. The bridge consistency loss alone constrains V values but not gradients. Terminal gradient matching provides curvature info from output-layer-local computation.
-2. **DFA warmup for credit bridge**: Without warmup, the credit bridge collapses because value net can't learn useful credits while forward net is being updated with random signals.
-3. **Normalized MSE for state bridge**: `((pred - target) / max(||target||, 1.0))^2` for numerical stability on CIFAR where hidden states can have large norms.
-4. **Credit normalization**: All methods use `a_norm = a / (RMS(a) + 1e-6)` in local surrogate to control credit magnitude.
-
-### Changes Log
-- `ce24e36`: Initial implementation with all models, methods, toy and CIFAR experiments
-- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR (consistency fix)
-
-### Experiment IDs
-- `toy_lq_v1`: Original toy, no terminal gradient matching (for ablation)
-- `toy_lq_v2`: Toy with terminal gradient matching (primary)
-- `toy_lq_frozen`: Re-run of v2 with synced state bridge (for final report)
-- `cifar10_seed42/123/456`: Main CIFAR-10 experiments
-
-### Known Issues
-- DFA accuracy on CIFAR-10 is low (~28% at epoch 30). Expected for DFA on deep MLPs.
-- State bridge had astronomical prediction errors before normalization fix.
-- Credit bridge needs DFA warmup phase to bootstrap stable training.
+## Final Results Summary
+
+### Toy LQ (3 seeds, 8000 steps)
+| Method | Costate Cosine | ρ | Nudging |
+|--------|---------------|---|---------|
+| DFA | 0.001±0.003 | 0.001±0.007 | 0.000±0.001 |
+| State Bridge | 0.945±0.002 | 0.931±0.003 | -0.344±0.019 |
+| Credit Bridge | 0.944±0.001 | 0.930±0.002 | -0.342±0.019 |
+
+### CIFAR-10 (3 seeds, 100 epochs)
+| Method | Test Accuracy |
+|--------|:------------:|
+| BP | 59.2%±0.4% |
+| DFA | 30.0%±0.3% |
+| Credit Bridge | 29.6%±1.0% |
+| State Bridge | 18.5%±1.8% |
+
+### CIFAR-10 Diagnostics (seed 42)
+| Method | BP Cosine | ρ | Nudge |
+|--------|-----------|---|-------|
+| BP | 0.940 | 0.990 | -0.027 |
+| Credit Bridge | 0.056 | ~0 | ~0 |
+| DFA | 0.030 | 0.005 | ~0 |
+| State Bridge | 0.021 | 0.004 | ~0 |
+
+---
+
+## Key Findings
+
+1. **Terminal gradient matching is essential** for credit bridge.
+   Without it, V learns correct values but uninformative gradients (cos → 0.03).
+   With it, credit bridge matches state bridge on toy (~0.94 cosine).
+
+2. **State bridge fails on nonlinear systems** despite near-perfect state prediction.
+   State prediction error → 0.0000 but test accuracy = 18.5% (worst of all methods).
+   This confirms the core hypothesis: bridging state ≠ bridging credit.
+
+3. **Credit bridge modestly outperforms DFA in BP cosine** (0.056 vs 0.030, ~2x)
+   but accuracy is comparable (29.6% vs 30.0%).
+
+4. **All non-BP methods struggle** on the deep 12-block MLP architecture.
+   The gap to BP (59.2%) is large for all methods.
+
+---
+
+## Changes Log
+- `ce24e36`: Initial implementation
+- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR
+- `7baf7ae`: Add experiment notes and .gitignore
+
+## Experiment IDs
+- `toy_lq_frozen/`: Final toy results (3 seeds, synced state bridge)
+- `cifar10/`, `cifar10_seed123/`, `cifar10_seed456/`: Final CIFAR results
+- `toy_lq/`: Debug-phase toy results (raw state bridge, for ablation)
+- `smoke_test/`, `smoke_test2/`: FashionMNIST debug runs
+
+## Design Decisions
+1. Terminal gradient matching (term_grad_weight=1.0): output-layer-local, not hidden BP
+2. DFA warmup for credit bridge (20% of epochs): prevents value net bootstrap failure
+3. Normalized MSE for state bridge: numerical stability
+4. Credit normalization: a_norm = a / (RMS(a) + 1e-6)
author	YurenHao0426 <Blackhao0426@gmail.com>	2026-03-23 19:46:08 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-03-23 19:46:08 -0500
commit	32123cb36ae9521f60c9b6f67458b931b6540ef2 (patch)
tree	4731e1dc513f5b613f80c4d20fc4114044c266d3 /NOTE.md
parent	bbb1a36d67f2f0c83106c1e771ea2c2fcb7fd83a (diff)