| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-26 16:27:53 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-26 16:27:53 -0500 |
| commit | 610e1169e19378cccd2d9b92a588c24dca7f3df7 (patch) | |
| tree | 532f8dc2fda6c68ab1409b20d7431b76d8d6f378 /report_explore | |
| parent | ef4aed70130e2212b4ed1cb7212e2ea6c7c7adb2 (diff) | |
Add Phase 10A.5: blend gain is implicit regularization, not learned credit
Dissection of 6 branches from the same DFA checkpoint (continue_DFA baseline: 31.1%):
- blend_random_frozen: 12.6% (CATASTROPHIC — frozen noise destroys training)
- blend_random_trainable: 32.2% (+1.2% — trainable network helps)
- blend_shuffled_trainable: 32.5% (+1.4% — even wrong targets work!)
- blend_gaussian_noise: 30.8% (neutral)
- scaled_DFA_norm_match: 31.0% (neutral)
The gain comes from implicit regularization through a co-optimized auxiliary
network, NOT from learned credit quality. Phase 9A's +1.5% was an optimization
dynamics effect, not evidence of useful credit assignment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
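
The deltas quoted in the message can be sanity-checked against the final accuracies in the memo's table. The sketch below is illustrative only: the dictionary and variable names are hypothetical, and the inputs are the rounded three-decimal accuracies from the table, so the last digit can differ from the reported figures (presumably computed from unrounded values). With these inputs, blend_random_trainable comes out near +1.1 and scaled_DFA_norm_match near -0.1 percentage points.

```python
# Final accuracies as listed in the memo's table (rounded to three decimals).
final_acc = {
    "continue_DFA": 0.311,  # baseline branch
    "blend_random_frozen": 0.126,
    "blend_random_trainable": 0.322,
    "blend_shuffled_trainable": 0.325,
    "blend_gaussian_noise": 0.308,
    "scaled_DFA_norm_match": 0.310,
}

baseline = final_acc["continue_DFA"]

# Difference to the baseline in percentage points (absolute, not relative).
delta_pp = {
    name: round((acc - baseline) * 100, 1)
    for name, acc in final_acc.items()
    if name != "continue_DFA"
}

for name, delta in delta_pp.items():
    print(f"{name:26s} {delta:+.1f} pp")
```

The differences are absolute percentage points rather than relative changes, which is how the "diff vs DFA" column reads.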
Diffstat (limited to 'report_explore')
| -rw-r--r-- | report_explore/MEMO_10A5_blend_dissection.md | 38 |
1 files changed, 38 insertions, 0 deletions
diff --git a/report_explore/MEMO_10A5_blend_dissection.md b/report_explore/MEMO_10A5_blend_dissection.md
new file mode 100644
index 0000000..3634cba
--- /dev/null
+++ b/report_explore/MEMO_10A5_blend_dissection.md
@@ -0,0 +1,38 @@
+# Phase 10A.5 Memo: Blend Mechanism Dissection
+
+**Date**: 2026-03-26
+
+## Question
+Phase 10A's gain from blend(random Vec, DFA) — is it learned correction or blend mechanism?
+
+## Answer: Neither. It's implicit regularization from a trainable auxiliary network.
+
+| Branch | final acc | diff vs DFA |
+|--------|-----------|-------------|
+| continue_DFA | 0.311 | baseline |
+| blend_random_**frozen** | **0.126** | **-18.5%** |
+| blend_random_**trainable** | 0.322 | +1.2% |
+| blend_shuffled_trainable | 0.325 | +1.4% |
+| blend_gaussian_noise | 0.308 | -0.3% |
+| scaled_DFA_norm_match | 0.310 | -0.0% |
+
+## Key findings
+
+1. **Frozen random Vec destroys training** (12.6%). A fixed random direction blended at 75% is catastrophic. This rules out "any signal diversification helps."
+
+2. **Trainable Vec helps** even from random init (+1.2%), even with shuffled targets (+1.4%). The Vec network doesn't need to learn correct credit — it just needs to be trainable.
+
+3. **Gaussian noise and norm scaling are neutral.** The mechanism is NOT noise injection or step-size calibration.
+
+4. **Gamma/rho stay near zero** for all trainable branches throughout training. The Vec never learns semantically correct credit.
+
+## Mechanism
+
+The gain comes from **implicit regularization through a co-optimized auxiliary network**. The Vec network, even with wrong training targets, adjusts its outputs during training in a way that smoothly regularizes the block-local updates. This is analogous to how auxiliary tasks in multi-task learning can improve main task performance even when the auxiliary task is unrelated — the shared optimization dynamics provide implicit regularization.
+
+## Implications
+
+1. **The Phase 9A narrative was wrong**: the +1.5% was NOT from Vec learning useful credit
+2. **The credit bridge hypothesis is not validated by online results**: the gain has nothing to do with credit quality
+3. **The gain is real but has a different cause**: it's an optimization dynamics phenomenon, not a credit assignment phenomenon
+4. **This does NOT invalidate the frozen CIFAR results**: Vec truly learns better credit on frozen features. But that credit quality doesn't transfer to online improvement — the online improvement comes from a different mechanism entirely.
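
The memo refers to a signal "blended at 75%" but does not give the blend formula. As a minimal sketch, assuming the blend is a convex combination with weight `alpha = 0.75` on the Vec output and `1 - alpha` on the DFA signal, the operation could look like the code below. The names `blend`, `cosine`, `frozen_vec`, and `dfa_signal` are hypothetical, and the random vectors are stand-ins rather than signals from the actual experiment.

```python
import math
import random


def blend(vec_signal, dfa_signal, alpha=0.75):
    """Assumed blend: alpha * Vec output + (1 - alpha) * DFA signal."""
    if len(vec_signal) != len(dfa_signal):
        raise ValueError("signals must have the same length")
    return [alpha * v + (1.0 - alpha) * d for v, d in zip(vec_signal, dfa_signal)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
dim = 512

# Stand-in signals: independent Gaussian vectors, not data from the runs.
dfa_signal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
frozen_vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # never updated

blended = blend(frozen_vec, dfa_signal, alpha=0.75)
alignment = cosine(blended, dfa_signal)
print(f"cosine(blended, DFA) = {alignment:.3f}")
```

Under this assumed form, a frozen random Vec output dominates the blended update and leaves it only weakly aligned with the DFA signal, which is consistent with the memo's observation that the frozen branch collapses. The sketch does not model the trainable branches, where the Vec output changes during training.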
