| Age | Commit message | Author |
|
held-out transfer
At epoch 5 (acc=49%), Vec_M4 5-step: dL_held=-0.005 (PUR=0.70)
Oracle BP 5-step: dL_held=-0.009 (PUR=1.05)
DFA 5-step: dL_held=+0.003 (always hurts held-out)
By epoch 20, the generalization window closes. Held-out failure is a late-snapshot artifact.
Better credit → lower update variance (Vec=0.8 vs DFA=40), not higher.
Key implication: DFA warmup delays the credit bridge past its useful window.
Credit should be used from epoch 0, not after 20% warmup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
2. Held-out evaluation (measured generalization failure, not exploitability)
3. Gradient clamping
With strict same-batch evaluation:
- Oracle BP: dL_same = -0.406 (strongest descent)
- Vec_M4: dL_same = -0.135
- ScalarCB: dL_same = -0.025
- DFA: dL_same = -0.003
Same-batch loss decrease is MONOTONIC with credit quality.
But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
The bottleneck is batch-level generalization, not surrogate exploitability.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
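The same-batch vs held-out distinction above can be illustrated with a minimal sketch. This is not the repository's evaluation code: the 1-D linear model, the helper names (`mse`, `grad`, `loss_deltas`), the learning rate, and the toy batches are all assumptions chosen so that one update lowers the loss on the batch it was computed from while raising it on a different batch.

```python
# Toy illustration (hypothetical names and data, not the project's protocol):
# one gradient step on a 1-D linear model y = w * x with squared error.

def mse(w, batch):
    """Mean squared error of y = w * x over a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in batch) / len(batch)

def grad(w, batch):
    """Derivative of mse with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def loss_deltas(w, train_batch, held_batch, lr=0.05):
    """Apply one update computed on train_batch, then report the loss change
    on that same batch (dL_same) and on a held-out batch (dL_held)."""
    w_new = w - lr * grad(w, train_batch)
    dl_same = mse(w_new, train_batch) - mse(w, train_batch)
    dl_held = mse(w_new, held_batch) - mse(w, held_batch)
    return dl_same, dl_held

# Assumed toy data: the held-out batch deliberately disagrees with the
# training batch, so the update cannot generalize to it.
train_batch = [(1.0, 2.0), (2.0, 4.1)]
held_batch = [(1.0, -1.0), (2.0, -2.2)]

dl_same, dl_held = loss_deltas(0.0, train_batch, held_batch)
print(f"dL_same={dl_same:+.3f}  dL_held={dl_held:+.3f}")
```

A negative `dl_same` alongside a positive `dl_held` corresponds to the Case D pattern described in this commit: the update is exploitable on its own batch but does not transfer.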
|
|
Phase 6A: Better credit is ANTI-CORRELATED with loss decrease on fixed snapshot.
DFA (Gamma=0.01) → dL=-0.0001 (only method that decreases loss)
Vec_M4 (Gamma=0.38) → dL=+0.057 (increases loss most)
Oracle BP (Gamma=1.0) → dL=+0.011 (still increases loss)
Phase 6C: Target-shift rule reduces damage but cannot make non-DFA credits productive.
The inner-product surrogate <F_l(h), a_l> is fundamentally mismatched with directional credit.
Conclusion: Case B — the primary bottleneck is the local update paradigm itself,
not the credit estimator quality or tracking/co-adaptation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
Phase 5A: Audit passes — shuffle control collapses, gains are real
Phase 5B: Transfer SUCCESS — vec_M4 beats scalar CB by +0.25 Gamma, +0.31 rho on frozen CIFAR
Phase 5C: Online FAILURE — vec does worse than scalar CB online despite better frozen credit
Core finding: bottleneck is in local surrogate / co-adaptation, not estimator quality
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
scan, vector field pilot
Key findings:
- Frozen CIFAR: estimators CAN recover credit (SB best, CB 20x > DFA)
- Online shallow: cb_eT wr=0.2 tgw=1.0 achieves S1>0, S2 marginal
- Vector credit field: 0.91-0.96 Gamma/rho on synthetic (vs 0.34 scalar CB)
- Direct vector field avoids scalar V curvature problem entirely
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
- CIFAR deltaL: s=grad_hL CE (dim=512) -> acc=17.2%, Gamma≈0
Confirms scalar value field has dimensionality bottleneck on CIFAR
- Pivot memo: direct vector credit field a_phi(h,t,s) -> R^d
Trained with perturbation-based target, avoids curvature problem
Still satisfies no hidden BP anchor constraint
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
Key findings:
- deltaL (output-layer gradient) gives best Gamma (0.562 vs 0.452 for eT)
- Concatenating h_L to s destroys credit quality (value net cheats)
- Terminal gradient matching is monotonically beneficial
- Best config: deltaL + tgw=1.0 + wr=0.05 -> Gamma=0.768, rho=0.691
- CIFAR depth scan shows no Goldilocks regime (dimensionality bottleneck)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
Key finding: credit bridge advantage scales with nonlinearity.
At alpha=1.0 (full tanh), CB > SB > DFA on both Gamma and rho at all depths.
The crossover where CB surpasses SB happens around alpha=0.7-1.0.
Full 4x4x3 grid complete with 3 seeds each.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
All experiments complete:
- Toy LQ: credit bridge matches state bridge (~0.94 costate cosine)
- CIFAR-10: credit bridge (29.6%) comparable to DFA (30.0%), both beat state bridge (18.5%)
- State bridge confirms core hypothesis: perfect state prediction != useful credit
- Terminal gradient matching is essential for credit bridge
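For readers unfamiliar with the "costate cosine" figure quoted above, the following is a minimal sketch of a cosine similarity between an estimated credit vector and a reference vector. The function name `cosine` and the zero-vector convention are assumptions for illustration; the repository's actual metric code is not shown in this log.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors.
    Returns 0.0 if either vector has zero norm (an assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical vectors: an estimated credit and a reference costate.
estimated = [0.9, 0.1, -0.4]
reference = [1.0, 0.0, -0.5]
print(round(cosine(estimated, reference), 3))
```

A value near 1 means the estimate points in nearly the same direction as the reference, independent of magnitude.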
|
|
Track experiment phases (debug/pilot/frozen), key findings, and design decisions.
|
|
Debug phase. Toy LQ experiments (3 seeds) complete with terminal gradient matching.
Credit bridge matches state bridge on linear system (~0.94 cosine).
CIFAR experiments in progress.
|