| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-24 01:20:21 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-24 01:20:21 -0500 |
| commit | e0cbfefc64ac46b6b899ef95f3a90e52e5043390 (patch) | |
| tree | 4e668b71dc1ae6a845d9e82adb450d2630cc7d2b /NOTE.md | |
| parent | 13668ac1050fee1fa84067fa07c5eaab1a1bc939 (diff) | |
Add Phase 3 boundary-condition ablation results and combined memo

Key findings:
- deltaL (output-layer gradient) gives best Gamma (0.562 vs 0.452 for eT)
- Concatenating h_L to s destroys credit quality (value net cheats)
- Terminal gradient matching is monotonically beneficial
- Best config: deltaL + tgw=1.0 + wr=0.05 -> Gamma=0.768, rho=0.691
- CIFAR depth scan shows no Goldilocks regime (dimensionality bottleneck)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'NOTE.md')
| -rw-r--r-- | NOTE.md | 32 |
1 files changed, 31 insertions, 1 deletions
@@ -122,4 +122,34 @@ CIFAR is much harder -- rho signal is very weak for all non-BP methods.
 - `synth_ladder_smoke/`: Initial 3-alpha x 2-depth smoke test
 - `synth_ladder_v2_lo/`: Full alpha=0,0.25 x L=2,4,8,12 x 3 seeds
 - `synth_ladder_v2_hi/`: Full alpha=0.5,1.0 x L=2,4,8,12 x 3 seeds
-- `cifar_depth_scan_s42/`: CIFAR L=2,4,6 x d=512 x seed=42 (in progress)
+- `cifar_depth_scan_s42/`: CIFAR L=2,4,6,8,12 x d=512 x seed=42 (COMPLETE)
+- `boundary_ablation_s_sweep/`: s_type in {eT, deltaL, eT_hL, deltaL_hL}
+- `boundary_ablation_tgw_sweep/`: tgw in {0, 0.25, 1.0, 4.0}
+- `boundary_ablation_wr_sweep/`: warmup ratio in {0, 0.05, 0.2, 0.5}
+- `boundary_ablation_s123/`, `boundary_ablation_s456/`: s_type sweep with seeds 123, 456
+- `boundary_ablation_deltaL_wr/`: deltaL with warmup ratio sweep
+
+### Phase 3 Results: Boundary-Condition Ablation
+
+At alpha=1.0, L=4 (best synthetic regime), 3 seeds:
+
+**s_type (conditioning code):**
+| Code | Gamma | rho | Acc |
+|------|-------|-----|-----|
+| eT (dim=10) | 0.452+/-0.042 | 0.509+/-0.033 | 0.523 |
+| deltaL (dim=d) | **0.562+/-0.007** | **0.510+/-0.014** | 0.448 |
+| eT+proj(h_L) | 0.002 | 0.016 | 0.559 |
+| deltaL+proj(h_L) | 0.018 | 0.026 | 0.564 |
+
+**deltaL gives best Gamma. Concatenating h_L destroys credit quality (value net cheats).**
+
+**Terminal gradient matching weight:**
+tgw=0 -> Gamma=0.12; tgw=1 -> Gamma=0.46; tgw=4 -> Gamma=0.57 (but acc drops).
+Terminal gradient matching is monotonically beneficial for credit quality.
+
+**Warmup ratio:**
+wr=0 -> best Gamma (0.68) but worst acc (0.46).
+wr=0.5 -> worst Gamma (0.23) but best acc (0.66).
+Clear tradeoff between credit quality and accuracy.
+
+Best single config: deltaL + tgw=1.0 + wr=0.05 -> **Gamma=0.768, rho=0.691**
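
Reviewer aid: the s_type comparison added in this hunk can be checked by ranking the four conditioning codes on each metric. The following is a minimal sketch, assuming the mean values are transcribed by hand from the table in NOTE.md. The names `S_TYPE_RESULTS` and `rank_by` are hypothetical and do not come from the repository.

```python
# Mean values transcribed from the s_type table in NOTE.md
# (alpha=1.0, L=4, 3 seeds). Standard deviations are omitted.
# `S_TYPE_RESULTS` and `rank_by` are illustrative names, not repo code.
S_TYPE_RESULTS = {
    "eT": {"gamma": 0.452, "rho": 0.509, "acc": 0.523},
    "deltaL": {"gamma": 0.562, "rho": 0.510, "acc": 0.448},
    "eT+proj(h_L)": {"gamma": 0.002, "rho": 0.016, "acc": 0.559},
    "deltaL+proj(h_L)": {"gamma": 0.018, "rho": 0.026, "acc": 0.564},
}


def rank_by(results, metric):
    """Return code names sorted from highest to lowest value of `metric`."""
    return sorted(results, key=lambda name: results[name][metric], reverse=True)


if __name__ == "__main__":
    for metric in ("gamma", "rho", "acc"):
        print(metric, rank_by(S_TYPE_RESULTS, metric))
```

Ranking by `gamma` puts `deltaL` first, while ranking by `acc` puts the two `proj(h_L)` variants first. That is the same credit-quality versus accuracy tension the memo reports for the warmup ratio.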
