From bbb1a36d67f2f0c83106c1e771ea2c2fcb7fd83a Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Mon, 23 Mar 2026 18:23:29 -0500
Subject: Add experiment notes and .gitignore

Track experiment phases (debug/pilot/frozen), key findings, and design decisions.
---
 .gitignore |  4 ++++
 NOTE.md    | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 61 insertions(+), 14 deletions(-)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..00d9c79
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+__pycache__/
+*.pyc
+data/
+*.pt
diff --git a/NOTE.md b/NOTE.md
index 2e37841..2b42750 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -1,22 +1,65 @@
 # Experiment Notes
 
-## 2026-03-23: Initial Implementation and Experiments
+## Experiment Phases
+- **debug**: Initial implementation, rapid iteration. Code may change between runs.
+- **pilot**: Controlled iteration. Each change requires commit + rationale.
+- **frozen**: Code frozen at specific commit hash. Only formal multi-seed runs.
+
+## Current Phase: PILOT
+- Commit for toy frozen runs: `0b9ebb2` (state bridge synced to normalized MSE)
+- CIFAR runs started from commit `ce24e36` (CIFAR code unchanged by sync commit)
+
+---
+
+## 2026-03-23: Implementation and Experiments
 
 ### Setup
-- GPU: NVIDIA RTX A6000 x4 (using GPU 1)
+- GPU: NVIDIA RTX A6000 x4 (GPU 0 occupied, using GPUs 1-3)
 - PyTorch 2.10.0+cu128
-- All code written from scratch following CLAUDE.md specifications
 
-### Phase A: Toy LQ Sanity Check
-- Status: Running...
-- Config: d=64, m=10, L=12, sigma=0.03, 5000 steps, batch=256
-- Methods: DFA, State Bridge, Credit Bridge
+### Key Findings
+
+#### 1. Credit Bridge requires terminal gradient matching
+- **Without** terminal gradient matching: credit bridge costate cosine collapses to ~0.03 (no signal)
+- **With** terminal gradient matching: credit bridge achieves ~0.94 cosine (matches state bridge)
+- Terminal gradient uses only output-layer local info (not hidden BP) → allowed
+- This is the most important finding so far
+
+#### 2. Toy LQ Results (3 seeds, 8000 steps, commit 0b9ebb2)
+| Method | Costate Cosine | Perturbation ρ | Nudging |
+|--------|---------------|----------------|---------|
+| DFA | 0.003±0.001 | 0.010±0.012 | -0.001±0.000 |
+| State Bridge | 0.941±0.003 | 0.927±0.004 | -0.335±0.015 |
+| Credit Bridge | 0.942±0.002 | 0.929±0.003 | -0.334±0.015 |
+
+- Both State Bridge and Credit Bridge match closely on the linear system
+- DFA provides essentially no directional credit (random level)
+- Bridge residual decreases steadily during training
+- FM auxiliary provides marginal improvement (0.946 vs 0.940 cosine)
+
+#### 3. CIFAR-10 (in progress, 3 seeds on GPUs 1-3)
+- BP baseline: ~59% test accuracy (expected for flat MLP on CIFAR-10)
+- DFA: ~28% test accuracy at epoch 30 (struggling on deep network)
+- State Bridge: running
+- Credit Bridge: running with warmup (20% DFA warmup + linear blend)
+
+### Design Decisions
+1. **Terminal gradient matching** (term_grad_weight=1.0): Essential for credit bridge. The bridge consistency loss alone constrains V values but not gradients. Terminal gradient matching provides curvature info from output-layer-local computation.
+2. **DFA warmup for credit bridge**: Without warmup, the credit bridge collapses because value net can't learn useful credits while forward net is being updated with random signals.
+3. **Normalized MSE for state bridge**: `((pred - target) / max(||target||, 1.0))^2` for numerical stability on CIFAR where hidden states can have large norms.
+4. **Credit normalization**: All methods use `a_norm = a / (RMS(a) + 1e-6)` in local surrogate to control credit magnitude.
 
 ### Changes Log
-- Created full project structure: models/, methods/, experiments/, metrics/, configs/
-- models/residual_mlp.py: ResidualMLP with pre-LayerNorm residual blocks
-- models/value_net.py: ValueNet V_phi with sinusoidal time embedding
-- models/state_bridge.py: StateBridgeNet G_psi
-- experiments/toy_lq.py: Linear-quadratic sanity check
-- experiments/cifar_resmlp.py: CIFAR-10 main experiment
-- metrics/credit_metrics.py: All diagnostic metrics
+- `ce24e36`: Initial implementation with all models, methods, toy and CIFAR experiments
+- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR (consistency fix)
+
+### Experiment IDs
+- `toy_lq_v1`: Original toy, no terminal gradient matching (for ablation)
+- `toy_lq_v2`: Toy with terminal gradient matching (primary)
+- `toy_lq_frozen`: Re-run of v2 with synced state bridge (for final report)
+- `cifar10_seed42/123/456`: Main CIFAR-10 experiments
+
+### Known Issues
+- DFA accuracy on CIFAR-10 is low (~28% at epoch 30). Expected for DFA on deep MLPs.
+- State bridge had astronomical prediction errors before normalization fix.
+- Credit bridge needs DFA warmup phase to bootstrap stable training.
-- 
cgit v1.2.3
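The two normalizations listed under Design Decisions in NOTE.md (normalized MSE for the state bridge, RMS-based credit normalization), together with the cosine behind the "Costate Cosine" column, can be sketched in a few lines. This is a minimal plain-Python sketch, not the project's code: the function names are hypothetical, and it assumes that both normalizations act per sample over the feature dimension, that the normalized MSE averages over features (the note does not state the reduction), and that "costate cosine" means cosine similarity between a method's credit vector and the true backprop costate. The project itself presumably applies the same math batch-wise to PyTorch tensors.

```python
import math
from typing import List, Sequence

# Hypothetical helpers illustrating the formulas in NOTE.md's Design Decisions.
# Assumption: each function acts on ONE sample's feature vector.


def rms(a: Sequence[float]) -> float:
    """Root-mean-square: sqrt(mean(a_i^2))."""
    return math.sqrt(sum(x * x for x in a) / len(a))


def normalize_credit(a: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Credit normalization: a_norm = a / (RMS(a) + eps).

    The result has RMS just under 1 regardless of the input scale, so only the
    direction and relative per-unit sizes of the credit are kept.
    """
    scale = rms(a) + eps
    return [x / scale for x in a]


def normalized_mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean over features of ((pred - target) / max(||target||, 1.0))^2.

    Assumption: the squared scaled error is averaged over features.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    scale = max(math.sqrt(sum(t * t for t in target)), 1.0)
    return sum(((p - t) / scale) ** 2 for p, t in zip(pred, target)) / len(target)


def cosine(u: Sequence[float], v: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine similarity (assumed metric: method credit vs. true BP costate)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv + eps)


if __name__ == "__main__":
    print(normalize_credit([3.0, 4.0]))                # approx [0.849, 1.131]
    print(normalized_mse([31.0, 40.0], [30.0, 40.0]))  # ||target|| = 50; approx 0.0002
    print(normalized_mse([0.3, 0.5], [0.3, 0.4]))      # ||target|| = 0.5 < 1, scale 1.0; approx 0.005
    print(cosine([1.0, 2.0], [2.0, 4.0]))              # parallel vectors; approx 1.0
```

The `max(||target||, 1.0)` floor means targets with norm below 1 are scored with the ordinary MSE, while large-norm hidden states (the CIFAR case the note describes) are scored by error relative to their norm, so the loss no longer grows with the square of the state's magnitude.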