|
held-out transfer
At epoch 5 (acc=49%), Vec_M4 5-step: dL_held=-0.005 (PUR=0.70)
Oracle BP 5-step: dL_held=-0.009 (PUR=1.05)
DFA 5-step: dL_held=+0.003 (always hurts held-out)
By epoch 20, generalization window closes. Held-out failure is late-snapshot artifact.
Better credit → lower update variance (Vec=0.8 vs DFA=40), not higher.
Key implication: DFA warmup delays credit bridge past its useful window.
Credit should be used from epoch 0, not after 20% warmup.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|