author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-24 01:20:21 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-24 01:20:21 -0500
commit: e0cbfefc64ac46b6b899ef95f3a90e52e5043390
tree: 4e668b71dc1ae6a845d9e82adb450d2630cc7d2b /NOTE.md
parent: 13668ac1050fee1fa84067fa07c5eaab1a1bc939
Add Phase 3 boundary-condition ablation results and combined memo
Key findings:
- deltaL (output-layer gradient) gives best Gamma (0.562 vs 0.452 for eT)
- Concatenating h_L to s destroys credit quality (value net cheats)
- Terminal gradient matching is monotonically beneficial
- Best config: deltaL + tgw=1.0 + wr=0.05 -> Gamma=0.768, rho=0.691
- CIFAR depth scan shows no Goldilocks regime (dimensionality bottleneck)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'NOTE.md')
-rw-r--r-- NOTE.md | 32
1 files changed, 31 insertions, 1 deletions
diff --git a/NOTE.md b/NOTE.md
index 53d9dc3..8d6091d 100644
--- a/NOTE.md
+++ b/NOTE.md
@@ -122,4 +122,34 @@ CIFAR is much harder -- rho signal is very weak for all non-BP methods.
- `synth_ladder_smoke/`: Initial 3-alpha x 2-depth smoke test
- `synth_ladder_v2_lo/`: Full alpha=0,0.25 x L=2,4,8,12 x 3 seeds
- `synth_ladder_v2_hi/`: Full alpha=0.5,1.0 x L=2,4,8,12 x 3 seeds
-- `cifar_depth_scan_s42/`: CIFAR L=2,4,6 x d=512 x seed=42 (in progress)
+- `cifar_depth_scan_s42/`: CIFAR L=2,4,6,8,12 x d=512 x seed=42 (COMPLETE)
+- `boundary_ablation_s_sweep/`: s_type in {eT, deltaL, eT_hL, deltaL_hL}
+- `boundary_ablation_tgw_sweep/`: tgw in {0, 0.25, 1.0, 4.0}
+- `boundary_ablation_wr_sweep/`: warmup ratio in {0, 0.05, 0.2, 0.5}
+- `boundary_ablation_s123/`, `boundary_ablation_s456/`: s_type sweep with seeds 123, 456
+- `boundary_ablation_deltaL_wr/`: deltaL with warmup ratio sweep
+
+### Phase 3 Results: Boundary-Condition Ablation
+
+At alpha=1.0, L=4 (best synthetic regime), 3 seeds:
+
+**s_type (conditioning code):**
+| Code | Gamma | rho | Acc |
+|------|-------|-----|-----|
+| eT (dim=10) | 0.452+/-0.042 | 0.509+/-0.033 | 0.523 |
+| deltaL (dim=d) | **0.562+/-0.007** | **0.510+/-0.014** | 0.448 |
+| eT+proj(h_L) | 0.002 | 0.016 | 0.559 |
+| deltaL+proj(h_L) | 0.018 | 0.026 | 0.564 |
+
+**deltaL gives best Gamma. Concatenating h_L destroys credit quality (value net cheats).**
+
+**Terminal gradient matching weight:**
+tgw=0 -> Gamma=0.12; tgw=1 -> Gamma=0.46; tgw=4 -> Gamma=0.57 (but acc drops).
+Terminal gradient matching is monotonically beneficial for credit quality.
+
+**Warmup ratio:**
+wr=0 -> best Gamma (0.68) but worst acc (0.46).
+wr=0.5 -> worst Gamma (0.23) but best acc (0.66).
+Clear tradeoff between credit quality and accuracy.
+
+Best single config: deltaL + tgw=1.0 + wr=0.05 -> **Gamma=0.768, rho=0.691**
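
The two trend claims in the notes above (Gamma rising with tgw, and Gamma trading off against accuracy as the warmup ratio grows) can be checked against the reported numbers with a minimal sketch like the one below. The helper names `is_monotone_increasing` and `is_tradeoff` are hypothetical and not part of the repository. Only values stated in the notes are used: tgw=0.25 was swept but no Gamma is reported for it, so it is left out, and the warmup check covers only the two reported endpoints.

```python
# Values copied from the Phase 3 notes; nothing here is re-measured.
# tgw=0.25 is omitted because the notes do not report its Gamma.
tgw_gamma = {0.0: 0.12, 1.0: 0.46, 4.0: 0.57}

# Warmup-ratio endpoints as reported: wr -> (Gamma, accuracy).
wr_results = {0.0: (0.68, 0.46), 0.5: (0.23, 0.66)}


def is_monotone_increasing(points):
    """Return True if the values strictly increase as the keys increase."""
    values = [points[k] for k in sorted(points)]
    return all(a < b for a, b in zip(values, values[1:]))


def is_tradeoff(results):
    """Return True if Gamma strictly falls while accuracy strictly rises
    as the key (warmup ratio) increases."""
    ordered = [results[k] for k in sorted(results)]
    gammas = [g for g, _ in ordered]
    accs = [a for _, a in ordered]
    gamma_falls = all(a > b for a, b in zip(gammas, gammas[1:]))
    acc_rises = all(a < b for a, b in zip(accs, accs[1:]))
    return gamma_falls and acc_rises


if __name__ == "__main__":
    print("tgw monotone in Gamma:", is_monotone_increasing(tgw_gamma))
    print("wr shows Gamma/acc tradeoff:", is_tradeoff(wr_results))
```

With only three tgw points and two wr points, this confirms the direction of each trend in the reported numbers, not its statistical significance.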