Diffstat (limited to 'README_experiments.md')
| -rw-r--r-- | README_experiments.md | 89 |
1 files changed, 89 insertions, 0 deletions
diff --git a/README_experiments.md b/README_experiments.md
new file mode 100644
index 0000000..0fa60cd
--- /dev/null
+++ b/README_experiments.md
@@ -0,0 +1,89 @@
+# Experiment Guide
+
+## Requirements
+- Python 3.10+
+- PyTorch 2.x with CUDA
+- torchvision, numpy, scipy, matplotlib
+
+## Project Structure
+```
+models/
+  residual_mlp.py - Deep residual MLP (pre-LayerNorm + GELU blocks)
+  value_net.py - Scalar value network V_phi for credit bridge
+  state_bridge.py - State predictor G_psi for state bridge
+
+experiments/
+  toy_lq_v2.py - Phase A: Linear-quadratic sanity check
+  cifar_resmlp.py - Phase B: CIFAR-10 main experiment
+  plot_toy_final.py - Generate toy plots
+  plot_cifar_final.py - Generate CIFAR plots
+
+metrics/
+  credit_metrics.py - Diagnostic metrics (cosine, rho, nudging, etc.)
+
+configs/ - YAML configs
+report/ - Plots and final report
+results/ - Experiment outputs
+```
+
+## Running Experiments
+
+### Phase A: Toy LQ Sanity Check
+```bash
+# Single seed
+CUDA_VISIBLE_DEVICES=0 python experiments/toy_lq_v2.py \
+  --gpu 0 --seed 42 --num_steps 8000 \
+  --sigma_bridge 0.1 --lam 0.1 \
+  --term_grad_weight 1.0 --fm_weight 0.0 \
+  --output_dir results/toy_lq_frozen
+
+# All 3 seeds
+for seed in 42 123 456; do
+  CUDA_VISIBLE_DEVICES=0 python experiments/toy_lq_v2.py \
+    --gpu 0 --seed $seed --num_steps 8000 \
+    --sigma_bridge 0.1 --lam 0.1 \
+    --term_grad_weight 1.0 --fm_weight 0.0 \
+    --output_dir results/toy_lq_frozen
+done
+```
+
+### Phase B: CIFAR-10 Main Experiment
+```bash
+# Single seed (runs BP, DFA, State Bridge, Credit Bridge sequentially)
+CUDA_VISIBLE_DEVICES=0 python experiments/cifar_resmlp.py \
+  --dataset cifar10 --d_hidden 512 --num_blocks 12 \
+  --epochs 100 --seeds 42 --gpu 0 \
+  --output_dir results/cifar10
+
+# Parallel across GPUs
+CUDA_VISIBLE_DEVICES=0 python experiments/cifar_resmlp.py --seeds 42 --output_dir results/cifar10 --gpu 0 &
+CUDA_VISIBLE_DEVICES=1 python experiments/cifar_resmlp.py --seeds 123 --output_dir results/cifar10_seed123 --gpu 0 &
+CUDA_VISIBLE_DEVICES=2 python experiments/cifar_resmlp.py --seeds 456 --output_dir results/cifar10_seed456 --gpu 0 &
+wait
+```
+
+### Generate Plots
+```bash
+python experiments/plot_toy_final.py
+python experiments/plot_cifar_final.py
+```
+
+## Key Parameters
+| Parameter | Toy LQ | CIFAR-10 | Description |
+|-----------|--------|----------|-------------|
+| d_hidden | 64 | 512 | Hidden dimension |
+| num_layers/blocks | 12 | 12 | Depth |
+| sigma_bridge | 0.1 | 0.05 | Bridge noise std |
+| lam | 0.1 | 0.1 | Temperature |
+| K | 8 | 4 | MC samples for bridge target |
+| term_grad_weight | 1.0 | 1.0 | Terminal gradient matching weight |
+| ema_momentum | 0.995 | 0.995 | EMA for target network |
+| lr_fb | 1e-3 | 1e-3 | Feedback net learning rate |
+
+## Implementation Notes
+- **No hidden BP anchor**: Non-BP methods never use exact backprop through hidden layers.
+- **Detached hidden copies**: All feedback/value net inputs use `detach().requires_grad_(True)`.
+- **Block-local updates**: Each block's parameters updated only from its local forward + credit signal.
+- **Output head**: Uses exact CE gradient with detached h_L.
+- **Terminal gradient matching**: Matches grad_h V at terminal layer to grad_{h_L} CE. This is output-layer-local information, not hidden BP.
+- **Credit bridge warmup**: First 20% epochs use DFA credits, then linearly blend to credit bridge credits.
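Note on the "Credit bridge warmup" item in the README added by this diff: the schedule it describes can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code in `experiments/cifar_resmlp.py`. The function names `credit_bridge_weight` and `blend_credits` and the `blend_frac` parameter are hypothetical. The README gives the 20% warmup but not the length of the linear blend, so the 20% blend window used here is an assumption.

```python
def credit_bridge_weight(epoch, total_epochs, warmup_frac=0.2, blend_frac=0.2):
    """Weight in [0, 1] given to credit bridge credits at a 0-indexed epoch.

    warmup_frac=0.2 follows the README ("first 20% epochs use DFA credits").
    blend_frac is an ASSUMPTION: the README does not say how long the
    linear blend lasts.
    """
    warmup_end = warmup_frac * total_epochs
    blend_len = blend_frac * total_epochs
    if epoch < warmup_end:
        return 0.0  # pure DFA credits during warmup
    if blend_len <= 0:
        return 1.0  # no blend window: switch straight to credit bridge
    # Linear ramp from 0 to 1 over the blend window, then hold at 1.
    return min(1.0, (epoch - warmup_end) / blend_len)


def blend_credits(dfa_credit, bridge_credit, alpha):
    """Element-wise convex combination of two credit vectors (plain lists)."""
    return [(1.0 - alpha) * d + alpha * b
            for d, b in zip(dfa_credit, bridge_credit)]


if __name__ == "__main__":
    total = 100  # matches --epochs 100 in the Phase B command
    for epoch in (0, 19, 20, 30, 40, 99):
        alpha = credit_bridge_weight(epoch, total)
        print(epoch, alpha, blend_credits([1.0, 2.0], [3.0, 4.0], alpha))
```

With tensors instead of lists, the same `alpha` would scale the two credit signals directly. The ramp length is the knob most likely to differ from the actual implementation.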
