author    YurenHao0426 <Blackhao0426@gmail.com> 2026-03-23 19:46:08 -0500
committer YurenHao0426 <Blackhao0426@gmail.com> 2026-03-23 19:46:08 -0500
commit    32123cb36ae9521f60c9b6f67458b931b6540ef2
tree      4731e1dc513f5b613f80c4d20fc4114044c266d3 /report
parent    bbb1a36d67f2f0c83106c1e771ea2c2fcb7fd83a
Add final report, plots, experiment guide, and complete NOTE.md
All experiments complete:
- Toy LQ: credit bridge matches state bridge (~0.94 costate cosine)
- CIFAR-10: credit bridge (29.6%) comparable to DFA (30.0%), both beat state bridge (18.5%)
- State bridge confirms core hypothesis: perfect state prediction != useful credit
- Terminal gradient matching is essential for credit bridge
Diffstat (limited to 'report')
-rw-r--r-- report/REPORT.md                     140
-rw-r--r-- report/cifar_accuracy.png            bin 0 -> 210725 bytes
-rw-r--r-- report/cifar_diagnostics.png         bin 0 -> 131811 bytes
-rw-r--r-- report/cifar_feature_drift.png       bin 0 -> 126594 bytes
-rw-r--r-- report/cifar_state_vs_credit.png     bin 0 -> 135306 bytes
-rw-r--r-- report/cifar_summary.png             bin 0 -> 50530 bytes
-rw-r--r-- report/toy_per_layer_diagnostics.png bin 166670 -> 167302 bytes
-rw-r--r-- report/toy_term_grad_effect.png      bin 80994 -> 84936 bytes
-rw-r--r-- report/toy_training_curves.png       bin 179278 -> 169558 bytes
9 files changed, 140 insertions, 0 deletions
diff --git a/report/REPORT.md b/report/REPORT.md
new file mode 100644
index 0000000..2fa8e31
--- /dev/null
+++ b/report/REPORT.md
@@ -0,0 +1,140 @@
+# Credit Bridge: Terminal-Conditioned Value Field for Local Credit Assignment
+
+## Experiment Report
+
+### 1. Method Summary
+
+We compare four methods for training a deep residual MLP (d=512, L=12 blocks) without hidden-layer backpropagation (except for the BP baseline):
+
+| Method | Description | Uses Hidden BP? |
+|--------|-------------|-----------------|
+| **BP** | Standard end-to-end backpropagation | Yes (upper bound) |
+| **DFA** | Direct Feedback Alignment with fixed random B_l | No |
+| **State Bridge** | Predict h_L from (h_l, t_l, s), derive credit via grad through predictor | No |
+| **Credit Bridge** | Learn scalar V_phi(h_l, t_l, s), credit = grad_h V | No |
+
+Key constraint: **No hidden BP anchor.** Intermediate layers never receive exact backprop gradients during training; only the output layer uses the exact cross-entropy (CE) gradient.
+
+### 2. Phase A: Toy Linear-Quadratic Sanity Check
+
+**Setup**: d=64, m=10, L=12 layers, fixed linear dynamics h_{l+1} = M_l h_l + noise. Forward net is frozen; only feedback/bridge models train. Exact costate is analytically available.
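A minimal NumPy sketch of why the exact costate is available in this setup. It assumes a quadratic terminal cost J = 0.5 * ||W h_L - y||^2 with a hypothetical readout W (m x d), a random target y, near-identity random M_l, and small additive noise; none of these specifics are given in the report. Because the dynamics are linear and the noise is additive, the costate follows the backward recursion lambda_L = dJ/dh_L, lambda_l = M_l^T lambda_{l+1}, which a central finite difference confirms.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, L = 64, 10, 12  # sizes from the toy setup

# Hypothetical frozen dynamics, readout, target, and noise (not given in the report).
Ms = [np.eye(d) + 0.1 * rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]
W = rng.standard_normal((m, d)) / np.sqrt(d)
y = rng.standard_normal(m)
noises = [0.01 * rng.standard_normal(d) for _ in range(L)]


def cost_from(l, h):
    """Propagate h_{k+1} = M_k h_k + noise_k from layer l to L and return J(h_L)."""
    for M, eps in zip(Ms[l:], noises[l:]):
        h = M @ h + eps
    r = W @ h - y
    return 0.5 * float(r @ r)


# Forward rollout h_0, ..., h_L with the noise held fixed.
hs = [rng.standard_normal(d)]
for M, eps in zip(Ms, noises):
    hs.append(M @ hs[-1] + eps)

# Exact costate by the backward (adjoint) recursion.
lam = [None] * (L + 1)
lam[L] = W.T @ (W @ hs[L] - y)
for k in range(L - 1, -1, -1):
    lam[k] = Ms[k].T @ lam[k + 1]

# J is quadratic in h_l, so the central difference is exact up to rounding.
layer = 3
delta = 1e-4 * rng.standard_normal(d)
fd = (cost_from(layer, hs[layer] + delta) - cost_from(layer, hs[layer] - delta)) / 2.0
pred = float(lam[layer] @ delta)
print(f"finite difference {fd:.6e}, costate prediction {pred:.6e}")
```

In this frozen setting, any learned credit can be scored directly against `lam[l]`, which is what makes the toy problem a clean sanity check.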
+
+**Results (3 seeds, 8000 steps, commit 0b9ebb2):**
+
+| Method | Costate Cosine | Perturbation ρ | Nudging |
+|--------|:--------------:|:--------------:|:-------:|
+| DFA | 0.001 ± 0.003 | 0.001 ± 0.007 | 0.000 ± 0.001 |
+| State Bridge | 0.945 ± 0.002 | 0.931 ± 0.003 | -0.344 ± 0.019 |
+| Credit Bridge | 0.944 ± 0.001 | 0.930 ± 0.002 | -0.342 ± 0.019 |
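The report does not define its three diagnostics precisely. The sketch below assumes that: the costate/BP cosine is the cosine between a method's credit c and the exact gradient g; perturbation rho is the Pearson correlation between the predicted first-order loss change c . delta and the actual change under small random perturbations delta; and the nudge is the loss change after stepping h_l by eta against the normalized credit, so negative values are good. The quadratic test loss is purely illustrative.

```python
import numpy as np


def cosine(c, g):
    """Cosine similarity between a method's credit c and the reference gradient g."""
    return float(c @ g / (np.linalg.norm(c) * np.linalg.norm(g) + 1e-12))


def perturbation_rho(f, h, c, n=256, scale=1e-3, seed=0):
    """Pearson correlation between predicted (c . delta) and actual loss changes."""
    rng = np.random.default_rng(seed)
    base = f(h)
    deltas = scale * rng.standard_normal((n, h.size))
    actual = np.array([f(h + dlt) - base for dlt in deltas])
    predicted = deltas @ c
    return float(np.corrcoef(predicted, actual)[0, 1])


def nudge(f, h, c, eta=0.01):
    """Loss change after a step of length eta against the normalized credit."""
    step = eta * c / (np.linalg.norm(c) + 1e-12)
    return float(f(h - step) - f(h))


# Illustrative quadratic loss whose exact gradient is known.
rng = np.random.default_rng(1)
d = 256
A = rng.standard_normal((d, d)) / np.sqrt(d)
b = rng.standard_normal(d)
h = rng.standard_normal(d)


def f(x):
    r = A @ x - b
    return 0.5 * float(r @ r)


g = A.T @ (A @ h - b)            # exact gradient (the "BP" credit)
c_rand = rng.standard_normal(d)  # uninformative credit for comparison

for name, c in [("exact", g), ("random", c_rand)]:
    print(name, round(cosine(c, g), 3), round(perturbation_rho(f, h, c), 3), round(nudge(f, h, c), 4))
```

Under these definitions exact credit scores near 1 on cosine and rho with a negative nudge, while random credit scores near 0 on all three, the same pattern as the BP and DFA rows in the tables.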
+
+**Key findings:**
+- Credit bridge matches state bridge (~0.94 cosine) on the linear system.
+- Both far exceed DFA, which provides essentially zero directional credit.
+- Credit bridge requires **terminal gradient matching** to succeed. Without it, the value function learns correct values but has uninformative gradients (cosine collapses to ~0.03). Terminal gradient matching uses output-layer-local info only (not hidden BP).
+- FM auxiliary provides marginal additional improvement (0.946 vs 0.940).
+
+![Per-layer diagnostics](toy_per_layer_diagnostics.png)
+![Terminal gradient effect](toy_term_grad_effect.png)
+![Training curves](toy_training_curves.png)
+
+### 3. Phase B: CIFAR-10 Deep Residual MLP
+
+**Setup**: CIFAR-10, d=512, L=12 residual blocks, pre-LayerNorm + GELU, AdamW lr=1e-3, 100 epochs, batch size 128. Credit bridge uses 20-epoch DFA warmup + linear blend.
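A NumPy sketch of the forward pass this setup describes: flattened 3x32x32 images, width d=512, 12 pre-LayerNorm residual blocks with GELU, and a 10-way head. The input projection, the block's inner width (`HIDDEN`), the initialization, and the omission of LayerNorm gain/bias are assumptions. The report states only d, L, pre-LN, and GELU.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D, L, C = 3 * 32 * 32, 512, 12, 10  # flattened CIFAR-10 image, width, blocks, classes
HIDDEN = 512  # assumption: the report does not state the block's inner width


def layer_norm(x, eps=1e-5):
    # LayerNorm without learnable gain/bias, for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def init(fan_in, fan_out):
    return rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in), np.zeros(fan_out)


W_in, b_in = init(D_IN, D)  # hypothetical input projection
blocks = [(*init(D, HIDDEN), *init(HIDDEN, D)) for _ in range(L)]
W_out, b_out = init(D, C)


def block(h, W1, b1, W2, b2):
    """Pre-LN residual block: h + W2 GELU(W1 LN(h))."""
    return h + gelu(layer_norm(h) @ W1 + b1) @ W2 + b2


def forward(x):
    h = x @ W_in + b_in
    states = [h]  # h_0 ... h_L: the per-layer states that DFA or a bridge assigns credit to
    for params in blocks:
        h = block(h, *params)
        states.append(h)
    return h @ W_out + b_out, states


x = rng.standard_normal((4, D_IN))
logits, states = forward(x)
print(logits.shape, len(states))
```

The residual form means each block only adds an update to h_l, which is why per-layer credit for the state h_l is well defined for every method compared here.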
+
+**Accuracy Results (3 seeds):**
+
+| Method | Test Accuracy |
+|--------|:------------:|
+| BP | 59.2% ± 0.4% |
+| DFA | 30.0% ± 0.3% |
+| Credit Bridge | 29.6% ± 1.0% |
+| State Bridge | 18.5% ± 1.8% |
+
+![CIFAR summary](cifar_summary.png)
+
+**Diagnostic Results (seed 42, per-layer averages):**
+
+| Method | BP Cosine | Perturbation ρ | Nudge (η=0.01) |
+|--------|:---------:|:--------------:|:--------------:|
+| BP | 0.940 | 0.990 | -0.027 |
+| DFA | 0.030 | 0.005 | ~0.000 |
+| State Bridge | 0.021 | 0.004 | ~0.000 |
+| Credit Bridge | 0.056 | ~0.000 | ~0.000 |
+
+**State Bridge Prediction Quality:**
+- Final state prediction error: ~0.0000 (near-perfect h_L prediction)
+- Yet it has the worst test accuracy (18.5%)
+
+This is the project's most striking result: the state bridge achieves **near-zero state prediction error** but produces the **worst credit** of all methods. This directly validates the core hypothesis: **bridging state ≠ bridging credit**.
+
+![Accuracy curves](cifar_accuracy.png)
+![State vs Credit](cifar_state_vs_credit.png)
+![Diagnostics](cifar_diagnostics.png)
+![Feature drift](cifar_feature_drift.png)
+
+### 4. Key Observations
+
+**4.1 Credit Bridge vs DFA:**
+- Credit bridge accuracy is comparable to DFA (29.6% vs 30.0%), not clearly better.
+- Credit bridge has nearly 2x higher BP cosine than DFA (0.056 vs 0.030), suggesting a slightly better credit direction.
+- However, in absolute terms, both methods have very low credit quality on this deep nonlinear architecture.
+- The credit bridge still depends on the DFA warmup, indicating that it cannot bootstrap on its own.
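The setup specifies a 20-epoch DFA warmup followed by a linear blend, but not the blend's length or exact form. This sketch assumes the per-layer credit is a convex combination (1 - a) * DFA + a * bridge, with a held at 0 during warmup and then ramped linearly to 1 over a hypothetical `blend_epochs`.

```python
import numpy as np


def bridge_weight(epoch, warmup=20, blend_epochs=20):
    """Weight on bridge credit: 0 during the DFA warmup, then a linear ramp to 1.

    warmup=20 matches the report; blend_epochs is a hypothetical ramp length.
    """
    if epoch < warmup:
        return 0.0
    return min(1.0, (epoch - warmup) / blend_epochs)


def blended_credit(dfa_credit, bridge_credit, epoch):
    """Convex combination of DFA credit and bridge credit (assumed blend form)."""
    a = bridge_weight(epoch)
    return (1.0 - a) * dfa_credit + a * bridge_credit


for epoch in (0, 19, 20, 25, 30, 40, 99):
    print(epoch, bridge_weight(epoch))
```

Under this schedule the bridge only takes over once the network has already been shaped by DFA, which is the dependency the observation above points to.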
+
+**4.2 Why State Bridge Fails on Nonlinear Systems:**
+In the linear toy case, the state predictor's Jacobian matches the true forward Jacobian, so grad through the predictor equals the true costate. In the nonlinear CIFAR case, the predictor learns a separate function that maps h_l → h_L with correct values but incorrect Jacobian. The credit derived from this mismatched Jacobian is essentially random.
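Both halves of this argument can be checked numerically. The first part assumes linear, noise-free dynamics: a least-squares state predictor recovers the composed Jacobian, so the gradient taken through it equals the true costate. The second part is a deliberately simple 1-D caricature, not the report's setup: a predictor that is within 1e-3 of the true map everywhere, yet whose derivative is off by up to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) Linear case: a least-squares state predictor recovers the true Jacobian.
d, n = 8, 200
A_true = rng.standard_normal((d, d))       # stands in for the product M_{L-1} ... M_l
X = rng.standard_normal((n, d))            # samples of h_l
Y = X @ A_true.T                           # corresponding h_L (noise-free here)
B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ B = Y, so B = A_true.T
A_hat = B.T
g = rng.standard_normal(d)                 # dLoss/dh_L
credit_pred = A_hat.T @ g                  # gradient through the predictor
credit_true = A_true.T @ g                 # true costate
print(np.allclose(credit_pred, credit_true))

# (2) Nonlinear caricature: tiny value error, large derivative error.
x = np.linspace(0.0, 1.0, 10001)
f = x + 1e-3 * np.sin(1000.0 * x)  # "true" map
p = x                              # predictor that ignores the small wiggle
df = 1.0 + np.cos(1000.0 * x)      # exact derivative of f
dp = np.ones_like(x)               # derivative of the predictor
print(np.max(np.abs(f - p)), np.max(np.abs(df - dp)))
```

A value-fitting loss scores the second predictor as nearly perfect, while any credit derived from its derivative is wrong by O(1), which mirrors the near-zero state error yet weak credit of the state bridge.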
+
+**4.3 Terminal Gradient Matching is Essential:**
+Without terminal gradient matching (an output-layer-local computation), the credit bridge's value function predicts values correctly but has flat gradients. The bridge consistency loss constrains V's values, not its gradients. Terminal gradient matching supplies the gradient information needed to propagate useful credit.
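A NumPy sketch of the terminal gradient target, assuming a linear output layer logits = W_out h_L + b_out with hypothetical random weights. The per-example CE gradient W_out^T (softmax(z) - onehot(y)) uses only output-layer quantities, so matching grad_h V at h_L to it needs no hidden-layer BP. In an autograd framework, grad_h V would come from differentiating V_phi with respect to h_L; that the loss is scaled by something like the `--term_grad_weight` flag from the run commands is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, batch = 512, 10, 32

W_out = rng.standard_normal((C, d)) / np.sqrt(d)  # hypothetical output layer
b_out = np.zeros(C)
h_L = rng.standard_normal((batch, d))
labels = rng.integers(0, C, size=batch)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ce_per_example(h):
    p = softmax(h @ W_out.T + b_out)
    return -np.log(p[np.arange(len(h)), labels])


def terminal_grad_target(h):
    """dCE_i/dh_L per example, from the output layer only: W_out^T (p - onehot)."""
    p = softmax(h @ W_out.T + b_out)
    p[np.arange(len(h)), labels] -= 1.0
    return p @ W_out


def term_grad_loss(grad_V_at_hL, target):
    """Mean squared error between the value field's gradient at h_L and the CE gradient."""
    return float(np.mean(np.sum((grad_V_at_hL - target) ** 2, axis=1)))


target = terminal_grad_target(h_L)

# Check the analytic target against a central finite difference for example 0.
v = rng.standard_normal(d)
eps = 1e-5
h_plus, h_minus = h_L.copy(), h_L.copy()
h_plus[0] += eps * v
h_minus[0] -= eps * v
fd = (ce_per_example(h_plus)[0] - ce_per_example(h_minus)[0]) / (2 * eps)
print(fd, target[0] @ v)

# A value field with flat gradients is penalized here, however good its values are.
flat = np.zeros_like(target)
print(term_grad_loss(flat, target), term_grad_loss(target, target))
```

This is the term that rules out the "correct values, flat gradients" solution, since value consistency alone cannot distinguish it.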
+
+**4.4 Deep MLP Architecture Limitation:**
+All non-BP methods perform poorly on the 12-block deep MLP, and DFA is known to struggle on very deep architectures. With d=512 and C=10, DFA's random projection B_l maps the 10-dimensional output error into the 512-dimensional hidden space, so each layer's feedback spans at most 10 of 512 dimensions (~2%). This limitation carries over to all DFA-derived methods, including the credit bridge warmup.
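A NumPy sketch of the dimension argument, with a hypothetical Gaussian B_l and random logits: every DFA credit vector B_l e lies in the column space of B_l, so a whole batch of 512-dimensional credits has rank at most C. Because each row of the CE error softmax(z) - onehot(y) sums to zero, the rank is in fact at most C - 1 = 9.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C, batch = 512, 10, 256

B_l = rng.standard_normal((d, C)) / np.sqrt(C)  # fixed random feedback matrix for one layer
logits = rng.standard_normal((batch, C))
labels = rng.integers(0, C, size=batch)

# Output error e = softmax(z) - onehot(y): the only signal DFA broadcasts to hidden layers.
p = np.exp(logits - logits.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)
e = p.copy()
e[np.arange(batch), labels] -= 1.0

credit = e @ B_l.T  # DFA credit B_l e for every example in the batch
rank = np.linalg.matrix_rank(credit)  # rows of e sum to zero, so rank <= C - 1
print(f"credit rank {rank} of {d} dims; C/d = {C / d:.4f}")
```

However many examples are in the batch, the feedback reaching a layer never leaves this low-dimensional subspace.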
+
+### 5. Conclusions
+
+**Q1: Can credit bridge learn useful credit without hidden BP?**
+*Partially.* On the linear toy system, yes: its credit reaches ~0.94 cosine similarity with the exact costate. On CIFAR-10 with a deep MLP, it produces slightly better BP-aligned credit than DFA (0.056 vs 0.030 cosine) and achieves comparable accuracy. However, the absolute credit quality is still low on the nonlinear task.
+
+**Q2: Is credit bridge better than state bridge as a credit assignment object?**
+*Unambiguously yes.* State bridge achieves near-perfect terminal state prediction (error ≈ 0) but produces the worst credit of all methods (18.5% accuracy, lowest BP cosine). Credit bridge avoids this failure mode by directly learning the value field rather than the state mapping. This validates the core thesis: bridging credit/value is fundamentally different from bridging state.
+
+**Q3: Does credit bridge outperform DFA on diagnostic metrics?**
+*Modestly.* Credit bridge shows ~2x higher BP cosine than DFA (0.056 vs 0.030), but both are weak in absolute terms on CIFAR-10. The perturbation correlation and nudging metrics are near-zero for all non-BP methods on this deep architecture. The improvement is directionally correct but not yet large enough to translate into consistent accuracy gains.
+
+### 6. Failure Analysis and Next Steps
+
+**What worked:**
+- Terminal gradient matching is a clean, principled enhancement that stays within the "no hidden BP" constraint
+- The state bridge failure is clear and reproducible, providing strong evidence for the project's thesis
+- Credit bridge value loss converges well (0.48 → 0.0007)
+
+**What didn't work as hoped:**
+- Credit bridge accuracy didn't clearly beat DFA (29.6% vs 30.0%)
+- Per-layer credit quality metrics (rho, nudging) are weak for all non-BP methods
+- The DFA warmup dependency suggests credit bridge can't fully bootstrap
+
+**Engineering vs theoretical failure:**
+The weak CIFAR results are likely a mix of both:
+- *Theoretical*: The bridge consistency loss, estimated from a finite number of noise samples, may not provide enough gradient information in high-dimensional nonlinear settings
+- *Engineering*: The architecture (12-block deep MLP on flattened images) is inherently difficult for non-BP methods; DFA itself struggles
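On the theoretical point, assume the bridge target is a Monte Carlo average over K noise draws of scale sigma_bridge. The report names these knobs but not the exact estimator, and `tanh` below is a stand-in for the quantity being averaged. The sampling noise in such a target shrinks only as 1/sqrt(K), so a 16x increase in K buys a 4x reduction in target noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x, sigma_bridge, reps = 0.3, 0.5, 2000  # hypothetical point, noise scale, repetitions


def mc_estimate(K):
    """K-sample estimate of E[tanh(x + sigma_bridge * eps)], repeated `reps` times."""
    eps = rng.standard_normal((reps, K))
    return np.tanh(x + sigma_bridge * eps).mean(axis=1)


spread = {K: float(mc_estimate(K).std()) for K in (4, 16, 64)}
print(spread)
print("std ratio K=4 vs K=64:", spread[4] / spread[64])  # theory: sqrt(64 / 4) = 4
```

This slow 1/sqrt(K) improvement is one reason raising K (recommended step 2) may help only gradually.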
+
+**Recommended next steps:**
+1. Try shallower architectures (L=4-6) where DFA is known to work better
+2. Increase noise samples K and sigma_bridge for richer bridge targets
+3. Add FM auxiliary on the CIFAR task (second-order gradient smoothness)
+4. Try intermediate architectures between linear and deep nonlinear (e.g., shallow nonlinear)
+5. Investigate whether credit-bridge credit quality improves with longer training
+
+### 7. Reproducibility
+
+- Commit hash: `0b9ebb2` (state bridge synced)
+- All experiments use 3 random seeds: 42, 123, 456
+- PyTorch 2.10.0+cu128, NVIDIA RTX A6000
+- Full configs in `configs/` directory
+- Run commands:
+ ```bash
+ # Toy LQ
+ python experiments/toy_lq_v2.py --seed 42 --term_grad_weight 1.0 --output_dir results/toy_lq_frozen
+
+ # CIFAR-10
+ python experiments/cifar_resmlp.py --dataset cifar10 --seeds 42 --output_dir results/cifar10
+ ```
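A small Python driver that repeats the two run commands for all three reported seeds (42, 123, 456), reusing only the flags shown. Passing one seed per invocation to `--seeds` is an assumption, as is whether the scripts write seed-specific files under a shared `--output_dir`. If they do not, pass a distinct directory per seed. By default it only prints the commands.

```python
import shlex
import subprocess

SEEDS = [42, 123, 456]  # seeds listed in the report
DRY_RUN = True          # set to False to actually launch the runs


def toy_cmd(seed):
    return ["python", "experiments/toy_lq_v2.py", "--seed", str(seed),
            "--term_grad_weight", "1.0", "--output_dir", "results/toy_lq_frozen"]


def cifar_cmd(seed):
    return ["python", "experiments/cifar_resmlp.py", "--dataset", "cifar10",
            "--seeds", str(seed), "--output_dir", "results/cifar10"]


commands = [build(seed) for build in (toy_cmd, cifar_cmd) for seed in SEEDS]
for cmd in commands:
    print(shlex.join(cmd))
    if not DRY_RUN:
        subprocess.run(cmd, check=True)  # stop on the first failing run
```

Building argument lists rather than shell strings avoids quoting issues, and `check=True` makes a failed seed stop the sweep instead of silently producing partial results.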
diff --git a/report/cifar_accuracy.png b/report/cifar_accuracy.png
new file mode 100644
index 0000000..949d116
--- /dev/null
+++ b/report/cifar_accuracy.png
Binary files differ
diff --git a/report/cifar_diagnostics.png b/report/cifar_diagnostics.png
new file mode 100644
index 0000000..8476bd8
--- /dev/null
+++ b/report/cifar_diagnostics.png
Binary files differ
diff --git a/report/cifar_feature_drift.png b/report/cifar_feature_drift.png
new file mode 100644
index 0000000..a7c5d22
--- /dev/null
+++ b/report/cifar_feature_drift.png
Binary files differ
diff --git a/report/cifar_state_vs_credit.png b/report/cifar_state_vs_credit.png
new file mode 100644
index 0000000..e915023
--- /dev/null
+++ b/report/cifar_state_vs_credit.png
Binary files differ
diff --git a/report/cifar_summary.png b/report/cifar_summary.png
new file mode 100644
index 0000000..cd29f20
--- /dev/null
+++ b/report/cifar_summary.png
Binary files differ
diff --git a/report/toy_per_layer_diagnostics.png b/report/toy_per_layer_diagnostics.png
index d31b188..8af427c 100644
--- a/report/toy_per_layer_diagnostics.png
+++ b/report/toy_per_layer_diagnostics.png
Binary files differ
diff --git a/report/toy_term_grad_effect.png b/report/toy_term_grad_effect.png
index 13f0458..6574da3 100644
--- a/report/toy_term_grad_effect.png
+++ b/report/toy_term_grad_effect.png
Binary files differ
diff --git a/report/toy_training_curves.png b/report/toy_training_curves.png
index cc3532b..e7fedf9 100644
--- a/report/toy_training_curves.png
+++ b/report/toy_training_curves.png
Binary files differ