| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-23 19:46:08 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-23 19:46:08 -0500 |
| commit | 32123cb36ae9521f60c9b6f67458b931b6540ef2 | |
| tree | 4731e1dc513f5b613f80c4d20fc4114044c266d3 /README_experiments.md | |
| parent | bbb1a36d67f2f0c83106c1e771ea2c2fcb7fd83a | |
Add final report, plots, experiment guide, and complete NOTE.md
All experiments complete:
- Toy LQ: credit bridge matches state bridge (~0.94 costate cosine)
- CIFAR-10: credit bridge (29.6%) comparable to DFA (30.0%), both beat state bridge (18.5%)
- State bridge confirms core hypothesis: perfect state prediction != useful credit
- Terminal gradient matching is essential for credit bridge
Diffstat (limited to 'README_experiments.md')
| -rw-r--r-- | README_experiments.md | 89 |
1 files changed, 89 insertions, 0 deletions
```diff
diff --git a/README_experiments.md b/README_experiments.md
new file mode 100644
index 0000000..0fa60cd
--- /dev/null
+++ b/README_experiments.md
@@ -0,0 +1,89 @@
+# Experiment Guide
+
+## Requirements
+- Python 3.10+
+- PyTorch 2.x with CUDA
+- torchvision, numpy, scipy, matplotlib
+
+## Project Structure
+```
+models/
+  residual_mlp.py - Deep residual MLP (pre-LayerNorm + GELU blocks)
+  value_net.py - Scalar value network V_phi for credit bridge
+  state_bridge.py - State predictor G_psi for state bridge
+
+experiments/
+  toy_lq_v2.py - Phase A: Linear-quadratic sanity check
+  cifar_resmlp.py - Phase B: CIFAR-10 main experiment
+  plot_toy_final.py - Generate toy plots
+  plot_cifar_final.py - Generate CIFAR plots
+
+metrics/
+  credit_metrics.py - Diagnostic metrics (cosine, rho, nudging, etc.)
+
+configs/ - YAML configs
+report/ - Plots and final report
+results/ - Experiment outputs
+```
+
+## Running Experiments
+
+### Phase A: Toy LQ Sanity Check
+```bash
+# Single seed
+CUDA_VISIBLE_DEVICES=0 python experiments/toy_lq_v2.py \
+  --gpu 0 --seed 42 --num_steps 8000 \
+  --sigma_bridge 0.1 --lam 0.1 \
+  --term_grad_weight 1.0 --fm_weight 0.0 \
+  --output_dir results/toy_lq_frozen
+
+# All 3 seeds
+for seed in 42 123 456; do
+  CUDA_VISIBLE_DEVICES=0 python experiments/toy_lq_v2.py \
+    --gpu 0 --seed $seed --num_steps 8000 \
+    --sigma_bridge 0.1 --lam 0.1 \
+    --term_grad_weight 1.0 --fm_weight 0.0 \
+    --output_dir results/toy_lq_frozen
+done
+```
+
+### Phase B: CIFAR-10 Main Experiment
+```bash
+# Single seed (runs BP, DFA, State Bridge, Credit Bridge sequentially)
+CUDA_VISIBLE_DEVICES=0 python experiments/cifar_resmlp.py \
+  --dataset cifar10 --d_hidden 512 --num_blocks 12 \
+  --epochs 100 --seeds 42 --gpu 0 \
+  --output_dir results/cifar10
+
+# Parallel across GPUs
+CUDA_VISIBLE_DEVICES=0 python experiments/cifar_resmlp.py --seeds 42 --output_dir results/cifar10 --gpu 0 &
+CUDA_VISIBLE_DEVICES=1 python experiments/cifar_resmlp.py --seeds 123 --output_dir results/cifar10_seed123 --gpu 0 &
+CUDA_VISIBLE_DEVICES=2 python experiments/cifar_resmlp.py --seeds 456 --output_dir results/cifar10_seed456 --gpu 0 &
+wait
+```
+
+### Generate Plots
+```bash
+python experiments/plot_toy_final.py
+python experiments/plot_cifar_final.py
+```
+
+## Key Parameters
+| Parameter | Toy LQ | CIFAR-10 | Description |
+|-----------|--------|----------|-------------|
+| d_hidden | 64 | 512 | Hidden dimension |
+| num_layers/blocks | 12 | 12 | Depth |
+| sigma_bridge | 0.1 | 0.05 | Bridge noise std |
+| lam | 0.1 | 0.1 | Temperature |
+| K | 8 | 4 | MC samples for bridge target |
+| term_grad_weight | 1.0 | 1.0 | Terminal gradient matching weight |
+| ema_momentum | 0.995 | 0.995 | EMA for target network |
+| lr_fb | 1e-3 | 1e-3 | Feedback net learning rate |
+
+## Implementation Notes
+- **No hidden BP anchor**: Non-BP methods never use exact backprop through hidden layers.
+- **Detached hidden copies**: All feedback/value net inputs use `detach().requires_grad_(True)`.
+- **Block-local updates**: Each block's parameters updated only from its local forward + credit signal.
+- **Output head**: Uses exact CE gradient with detached h_L.
+- **Terminal gradient matching**: Matches grad_h V at terminal layer to grad_{h_L} CE. This is output-layer-local information, not hidden BP.
+- **Credit bridge warmup**: First 20% epochs use DFA credits, then linearly blend to credit bridge credits.
```
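The README's "No hidden BP anchor" and "Block-local updates" notes say each block learns only from its own forward pass plus an externally supplied credit signal. A minimal PyTorch sketch of one way to realize that, assuming the credit is treated as the upstream gradient dL/dh_out: detach the block input so nothing flows to earlier blocks, then call `backward(gradient=credit)`. The `Block` class is a hypothetical stand-in that only mimics the "pre-LayerNorm + GELU" description of `models/residual_mlp.py`, and `local_update` is an illustrative name, not a function from the repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Block(nn.Module):
    """Hypothetical stand-in: pre-LayerNorm + GELU residual MLP block."""

    def __init__(self, d: int):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.fc1 = nn.Linear(d, 4 * d)
        self.fc2 = nn.Linear(4 * d, d)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.fc2(F.gelu(self.fc1(self.norm(h))))


def local_update(block: nn.Module, h_in: torch.Tensor, credit: torch.Tensor,
                 opt: torch.optim.Optimizer) -> torch.Tensor:
    """One block-local step driven by an external credit signal.

    The input is detached, so no gradient reaches earlier blocks. `credit`
    is injected as the upstream gradient dL/dh_out via backward(gradient=...).
    """
    opt.zero_grad()
    h_out = block(h_in.detach())
    h_out.backward(gradient=credit)
    opt.step()
    return h_out.detach()  # detached input for the next block


if __name__ == "__main__":
    torch.manual_seed(0)
    d, batch = 32, 8
    blocks = nn.ModuleList([Block(d) for _ in range(3)])
    opts = [torch.optim.SGD(b.parameters(), lr=1e-2) for b in blocks]
    h = torch.randn(batch, d)
    for blk, opt in zip(blocks, opts):
        credit = torch.randn(batch, d)  # placeholder for DFA or bridge credit
        h = local_update(blk, h, credit, opt)
```

The random `credit` in the usage loop is only a placeholder; in the setting the README describes it would be a DFA or credit-bridge credit for that block.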
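The "Detached hidden copies" and "Terminal gradient matching" notes can be combined in a short PyTorch sketch. The value net receives `h_L.detach().requires_grad_(True)`; the target is the CE gradient with respect to that copy, computed through the output head only; the prediction is grad_h V obtained with `create_graph=True` so the mismatch can be backpropagated into V_phi. Assumptions not stated in the README: the two-layer `ValueNet`, the squared-error matching loss, summed CE (so each row of the target is a per-sample gradient), and all function names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ValueNet(nn.Module):
    """Hypothetical scalar value network V_phi(h) (stand-in for value_net.py)."""

    def __init__(self, d: int, width: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, width), nn.GELU(), nn.Linear(width, 1))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h).squeeze(-1)  # shape (batch,)


def terminal_grad_matching_loss(value_net: nn.Module, head: nn.Module,
                                h_L: torch.Tensor, labels: torch.Tensor,
                                term_grad_weight: float = 1.0) -> torch.Tensor:
    # Detached hidden copy: no gradient computed below flows back into the backbone.
    h = h_L.detach().requires_grad_(True)

    # Target: grad_{h_L} CE through the output head only (output-layer-local).
    # reduction="sum" makes each row the gradient of that sample's own loss.
    ce = F.cross_entropy(head(h), labels, reduction="sum")
    target = torch.autograd.grad(ce, h)[0].detach()

    # Prediction: grad_h V_phi(h); create_graph=True lets the loss train phi.
    v = value_net(h).sum()
    pred = torch.autograd.grad(v, h, create_graph=True)[0]

    # Assumed squared-error matching, summed over features, mean over batch.
    return term_grad_weight * (pred - target).pow(2).sum(dim=1).mean()
```

Calling `backward()` on this loss fills gradients for V_phi only: the head sees a detached target and the backbone sees a detached input. The final bias of `ValueNet` receives no gradient from this term, since a constant offset does not change grad_h V.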
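During the credit bridge warmup, hidden credits come from direct feedback alignment (DFA), which is also one of the baselines in the CIFAR-10 run. In standard DFA the output error e = softmax(logits) - onehot(y), i.e. the CE gradient with respect to the logits, is sent to each hidden layer through a fixed random matrix B_l that is never trained. A minimal NumPy sketch; the Gaussian initialization scale, the seeding and the function names are assumptions, not details taken from `cifar_resmlp.py`.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def make_feedback(num_blocks: int, d_hidden: int, n_classes: int, seed: int = 0):
    """Fixed random feedback matrices B_l, one per hidden block (never trained)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(n_classes)  # assumed initialization scale
    return [rng.normal(0.0, scale, size=(d_hidden, n_classes)) for _ in range(num_blocks)]


def dfa_credits(logits: np.ndarray, labels: np.ndarray, feedback):
    """Project the output error e = dCE/dlogits to every block through B_l.

    Returns one (batch, d_hidden) credit array per block.
    """
    n_classes = logits.shape[1]
    onehot = np.eye(n_classes)[labels]
    err = softmax(logits) - onehot        # (batch, n_classes)
    return [err @ B.T for B in feedback]  # each (batch, d_hidden)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    logits = rng.normal(size=(8, 10))
    labels = rng.integers(0, 10, size=8)
    fb = make_feedback(num_blocks=12, d_hidden=512, n_classes=10)
    credits = dfa_credits(logits, labels, fb)
    print(len(credits), credits[0].shape)  # 12 (8, 512)
```

The sizes in the usage example mirror the CIFAR-10 column of the parameter table (d_hidden 512, 12 blocks) with 10 classes.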
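The warmup note fixes the DFA phase at the first 20% of epochs but does not say how long the linear blend toward credit-bridge credits lasts. A plain-Python sketch, assuming the credit used at a given epoch is the convex combination (1 - alpha) * c_DFA + alpha * c_bridge, with alpha ramping from 0 to 1 over a hypothetical `blend_frac` of training (0.2 here, chosen only for illustration).

```python
def bridge_mix_weight(epoch: int, total_epochs: int,
                      warmup_frac: float = 0.2, blend_frac: float = 0.2) -> float:
    """Weight alpha on credit-bridge credits; 1 - alpha goes to DFA credits.

    warmup_frac follows the README (first 20% of epochs are pure DFA).
    blend_frac is HYPOTHETICAL: the README does not state the blend length.
    """
    warmup_end = warmup_frac * total_epochs
    blend_len = blend_frac * total_epochs
    if epoch < warmup_end:
        return 0.0  # pure DFA during warmup
    if blend_len <= 0:
        return 1.0  # no blend phase: switch directly
    # Linear ramp from 0 to 1 after warmup, clamped at 1.
    return min(1.0, (epoch - warmup_end) / blend_len)


def mix_credits(c_dfa, c_bridge, alpha: float):
    """Convex combination of two equally sized credit vectors."""
    return [(1.0 - alpha) * d + alpha * b for d, b in zip(c_dfa, c_bridge)]


if __name__ == "__main__":
    for ep in (0, 19, 20, 30, 40, 99):
        print(ep, bridge_mix_weight(ep, total_epochs=100))
```

With 100 epochs (as in the Phase B command), this schedule gives alpha = 0 up to epoch 20, 0.5 at epoch 30, and 1 from epoch 40 on.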
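The parameter table sets `ema_momentum` to 0.995 for a target network in both settings but does not say which network is mirrored. A minimal PyTorch sketch of the usual exponential-moving-average update, applicable to any `nn.Module` (for example V_phi); `make_target` and `ema_update` are hypothetical names. With m = 0.995 the target averages over roughly 1 / (1 - m) = 200 recent updates.

```python
import copy

import torch
import torch.nn as nn


def make_target(online: nn.Module) -> nn.Module:
    """Frozen deep copy of the online network; changed only by ema_update."""
    target = copy.deepcopy(online)
    for p in target.parameters():
        p.requires_grad_(False)
    return target


@torch.no_grad()
def ema_update(target: nn.Module, online: nn.Module, momentum: float = 0.995) -> None:
    """In place: theta_target <- m * theta_target + (1 - m) * theta_online."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(momentum).add_(p_o, alpha=1.0 - momentum)


if __name__ == "__main__":
    online = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.GELU(), nn.Linear(32, 1))
    target = make_target(online)
    # ... after each optimizer step on `online`:
    ema_update(target, online, momentum=0.995)
```

Only parameters are averaged. That suffices for networks built from `Linear`, `LayerNorm` and `GELU` layers, which register no running-statistics buffers.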
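The table lists three knobs for the credit bridge target, a noise std (`sigma_bridge`), a temperature (`lam`) and a Monte Carlo sample count (`K`), but the README does not say how they combine. Purely to illustrate how such knobs commonly interact, here is a NumPy sketch of a soft-min (log-mean-exp) Monte Carlo estimate over K Gaussian perturbations, a form used in path-integral control. This is an assumption, not the repository's bridge target: `soft_min_target`, `cost_fn` and the perturbation model are all hypothetical.

```python
import numpy as np


def soft_min_target(cost_fn, h: np.ndarray, sigma: float, lam: float, K: int,
                    rng: np.random.Generator) -> np.ndarray:
    """ASSUMED estimator: -lam * log( mean_k exp(-cost(h + sigma * eps_k) / lam) ).

    h has shape (batch, d); cost_fn maps (K, batch, d) -> (K, batch).
    Returns one target per sample, shape (batch,).
    """
    eps = rng.standard_normal((K,) + h.shape)   # K Gaussian perturbations
    costs = cost_fn(h[None] + sigma * eps)      # (K, batch)
    z = -costs / lam
    z_max = z.max(axis=0)                       # subtract max to avoid overflow
    log_mean_exp = z_max + np.log(np.mean(np.exp(z - z_max), axis=0))
    return -lam * log_mean_exp


def quadratic_cost(x: np.ndarray) -> np.ndarray:
    """Toy cost for illustration: 0.5 * ||x||^2 over the last axis."""
    return 0.5 * np.sum(x ** 2, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    h = rng.normal(size=(4, 64))
    print(soft_min_target(quadratic_cost, h, sigma=0.05, lam=0.1, K=4, rng=rng))
```

In this form, a smaller `lam` pushes the estimate toward the best of the K samples and a larger `lam` toward their average, while a larger `K` lowers the variance of the estimate.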
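The commit message summarizes the toy LQ result as "~0.94 costate cosine", and `metrics/credit_metrics.py` lists cosine among its diagnostics. A minimal NumPy sketch of one plausible form: the per-sample cosine similarity between a method's credit and a reference gradient at the same layer (for example exact BP gradients, which play the role of the costate when layers are viewed as time steps), averaged over the batch. Whether the repository averages per sample, flattens the batch, or averages across layers is not stated, so those choices and the function names are assumptions.

```python
import numpy as np


def credit_cosine(credit: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> float:
    """Mean per-sample cosine similarity between two (batch, d) credit arrays.

    `reference` is the trusted signal (e.g. exact BP gradient / costate);
    `credit` is the estimate being evaluated. eps guards zero-norm rows.
    """
    num = np.sum(credit * reference, axis=1)
    den = np.linalg.norm(credit, axis=1) * np.linalg.norm(reference, axis=1)
    return float(np.mean(num / np.maximum(den, eps)))


def layerwise_cosine(credits, references):
    """One cosine per layer, e.g. for plotting alignment against depth."""
    return [credit_cosine(c, r) for c, r in zip(credits, references)]
```

A value near 1 means the estimated credit points in nearly the same direction as the reference, 0 means orthogonal, and -1 means opposite.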
