# Experiment Guide ## Requirements - Python 3.10+ - PyTorch 2.x with CUDA - torchvision, numpy, scipy, matplotlib ## Project Structure ``` models/ residual_mlp.py - Deep residual MLP (pre-LayerNorm + GELU blocks) value_net.py - Scalar value network V_phi for credit bridge state_bridge.py - State predictor G_psi for state bridge experiments/ toy_lq_v2.py - Phase A: Linear-quadratic sanity check cifar_resmlp.py - Phase B: CIFAR-10 main experiment plot_toy_final.py - Generate toy plots plot_cifar_final.py - Generate CIFAR plots metrics/ credit_metrics.py - Diagnostic metrics (cosine, rho, nudging, etc.) configs/ - YAML configs report/ - Plots and final report results/ - Experiment outputs ``` ## Running Experiments ### Phase A: Toy LQ Sanity Check ```bash # Single seed CUDA_VISIBLE_DEVICES=0 python experiments/toy_lq_v2.py \ --gpu 0 --seed 42 --num_steps 8000 \ --sigma_bridge 0.1 --lam 0.1 \ --term_grad_weight 1.0 --fm_weight 0.0 \ --output_dir results/toy_lq_frozen # All 3 seeds for seed in 42 123 456; do CUDA_VISIBLE_DEVICES=0 python experiments/toy_lq_v2.py \ --gpu 0 --seed $seed --num_steps 8000 \ --sigma_bridge 0.1 --lam 0.1 \ --term_grad_weight 1.0 --fm_weight 0.0 \ --output_dir results/toy_lq_frozen done ``` ### Phase B: CIFAR-10 Main Experiment ```bash # Single seed (runs BP, DFA, State Bridge, Credit Bridge sequentially) CUDA_VISIBLE_DEVICES=0 python experiments/cifar_resmlp.py \ --dataset cifar10 --d_hidden 512 --num_blocks 12 \ --epochs 100 --seeds 42 --gpu 0 \ --output_dir results/cifar10 # Parallel across GPUs CUDA_VISIBLE_DEVICES=0 python experiments/cifar_resmlp.py --seeds 42 --output_dir results/cifar10 --gpu 0 & CUDA_VISIBLE_DEVICES=1 python experiments/cifar_resmlp.py --seeds 123 --output_dir results/cifar10_seed123 --gpu 0 & CUDA_VISIBLE_DEVICES=2 python experiments/cifar_resmlp.py --seeds 456 --output_dir results/cifar10_seed456 --gpu 0 & wait ``` ### Generate Plots ```bash python experiments/plot_toy_final.py python experiments/plot_cifar_final.py ``` ## Key Parameters | Parameter | Toy LQ | CIFAR-10 | Description | |-----------|--------|----------|-------------| | d_hidden | 64 | 512 | Hidden dimension | | num_layers/blocks | 12 | 12 | Depth | | sigma_bridge | 0.1 | 0.05 | Bridge noise std | | lam | 0.1 | 0.1 | Temperature | | K | 8 | 4 | MC samples for bridge target | | term_grad_weight | 1.0 | 1.0 | Terminal gradient matching weight | | ema_momentum | 0.995 | 0.995 | EMA for target network | | lr_fb | 1e-3 | 1e-3 | Feedback net learning rate | ## Implementation Notes - **No hidden BP anchor**: Non-BP methods never use exact backprop through hidden layers. - **Detached hidden copies**: All feedback/value net inputs use `detach().requires_grad_(True)`. - **Block-local updates**: Each block's parameters updated only from its local forward + credit signal. - **Output head**: Uses exact CE gradient with detached h_L. - **Terminal gradient matching**: Matches grad_h V at terminal layer to grad_{h_L} CE. This is output-layer-local information, not hidden BP. - **Credit bridge warmup**: First 20% epochs use DFA credits, then linearly blend to credit bridge credits.