From bbb1a36d67f2f0c83106c1e771ea2c2fcb7fd83a Mon Sep 17 00:00:00 2001
From: YurenHao0426 <Blackhao0426@gmail.com>
Date: Mon, 23 Mar 2026 18:23:29 -0500
Subject: Add experiment notes and .gitignore

Track experiment phases (debug/pilot/frozen), key findings, and design decisions.
---
 .gitignore |  4 ++++
 NOTE.md    | 71 +++++++++++++++++++++++++++++++++++++++++++++++++-------------
 2 files changed, 61 insertions(+), 14 deletions(-)
 create mode 100644 .gitignore

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..00d9c79
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+__pycache__/
+*.pyc
+data/
+*.pt
diff --git a/NOTE.md b/NOTE.md
index 2e37841..2b42750 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -1,22 +1,65 @@
 # Experiment Notes
 
-## 2026-03-23: Initial Implementation and Experiments
+## Experiment Phases
+- **debug**: Initial implementation, rapid iteration. Code may change between runs.
+- **pilot**: Controlled iteration. Each change requires commit + rationale.
+- **frozen**: Code frozen at specific commit hash. Only formal multi-seed runs.
+
+## Current Phase: PILOT
+- Commit for toy frozen runs: `0b9ebb2` (state bridge synced to normalized MSE)
+- CIFAR runs started from commit `ce24e36` (CIFAR code unchanged by sync commit)
+
+---
+
+## 2026-03-23: Implementation and Experiments
 
 ### Setup
-- GPU: NVIDIA RTX A6000 x4 (using GPU 1)
+- GPU: NVIDIA RTX A6000 x4 (GPU 0 occupied, using GPUs 1-3)
 - PyTorch 2.10.0+cu128
-- All code written from scratch following CLAUDE.md specifications
 
-### Phase A: Toy LQ Sanity Check
-- Status: Running...
-- Config: d=64, m=10, L=12, sigma=0.03, 5000 steps, batch=256
-- Methods: DFA, State Bridge, Credit Bridge
+### Key Findings
+
+#### 1. Credit Bridge requires terminal gradient matching
+- **Without** terminal gradient matching: credit bridge costate cosine collapses to ~0.03 (no signal)
+- **With** terminal gradient matching: credit bridge achieves ~0.94 cosine (matches state bridge)
+- The terminal gradient uses only output-layer-local information (no backprop through hidden layers), so it is allowed
+- This is the most important finding so far
+
+#### 2. Toy LQ Results (3 seeds, 8000 steps, commit 0b9ebb2)
+| Method | Costate Cosine | Perturbation ρ | Nudging |
+|--------|---------------|----------------|---------|
+| DFA | 0.003±0.001 | 0.010±0.012 | -0.001±0.000 |
+| State Bridge | 0.941±0.003 | 0.927±0.004 | -0.335±0.015 |
+| Credit Bridge | 0.942±0.002 | 0.929±0.003 | -0.334±0.015 |
+
+- State Bridge and Credit Bridge match each other closely on the linear system (costate cosine 0.941 vs 0.942)
+- DFA provides essentially no directional credit (costate cosine 0.003, i.e. random level)
+- Bridge residual decreases steadily during training
+- FM auxiliary provides marginal improvement (0.946 vs 0.940 cosine)
+
+#### 3. CIFAR-10 (in progress, 3 seeds on GPUs 1-3)
+- BP baseline: ~59% test accuracy (expected for flat MLP on CIFAR-10)
+- DFA: ~28% test accuracy at epoch 30 (struggling on deep network)
+- State Bridge: running
+- Credit Bridge: running with warmup (20% DFA warmup + linear blend)
+
+### Design Decisions
+1. **Terminal gradient matching** (`term_grad_weight=1.0`): Essential for the credit bridge. The bridge consistency loss alone constrains the values of V but not its gradients; terminal gradient matching supplies that information from an output-layer-local computation.
+2. **DFA warmup for credit bridge**: Without warmup, the credit bridge collapses because the value net cannot learn useful credits while the forward net is still being updated with random signals.
+3. **Normalized MSE for state bridge**: `((pred - target) / max(||target||, 1.0))^2`, for numerical stability on CIFAR, where hidden states can have large norms.
+4. **Credit normalization**: All methods use `a_norm = a / (RMS(a) + 1e-6)` in the local surrogate to control credit magnitude.
 
 ### Changes Log
-- Created full project structure: models/, methods/, experiments/, metrics/, configs/
-- models/residual_mlp.py: ResidualMLP with pre-LayerNorm residual blocks
-- models/value_net.py: ValueNet V_phi with sinusoidal time embedding
-- models/state_bridge.py: StateBridgeNet G_psi
-- experiments/toy_lq.py: Linear-quadratic sanity check
-- experiments/cifar_resmlp.py: CIFAR-10 main experiment
-- metrics/credit_metrics.py: All diagnostic metrics
+- `ce24e36`: Initial implementation with all models, methods, toy and CIFAR experiments
+- `0b9ebb2`: Sync state bridge to use normalized MSE in both toy and CIFAR (consistency fix)
+
+### Experiment IDs
+- `toy_lq_v1`: Original toy, no terminal gradient matching (for ablation)
+- `toy_lq_v2`: Toy with terminal gradient matching (primary)
+- `toy_lq_frozen`: Re-run of v2 with synced state bridge (for final report)
+- `cifar10_seed42/123/456`: Main CIFAR-10 experiments
+
+### Known Issues
+- DFA accuracy on CIFAR-10 is low (~28% at epoch 30). Expected for DFA on deep MLPs.
+- State bridge had extremely large prediction errors before the normalized-MSE fix.
+- Credit bridge needs a DFA warmup phase to bootstrap stable training.
-- 
cgit v1.2.3