Add Phase 9A: checkpointed handoff — blend(Vec+DFA) outperforms pure DFA

First positive online result: 50% blend of offline-fitted Vec + DFA gives 31.7% vs 31.1% for pure DFA (+0.55%). This is Case B: pure Vec handoff fails (-1.1%) but blend works because DFA stabilizes trajectory while Vec adds directional credit. Offline-fitted Vec at DFA epoch-5 checkpoint: Gamma=0.229, rho=0.262. Cold-start confirmed as main bottleneck — Vec IS useful on DFA trajectory features. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-25 16:20:53 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-03-25 16:20:53 -0500
commit: 5a3b20d627eca65612f598c1ba5807d5d2df029a (patch)
tree: e7f2f697303f738e757db6e93214d880f6c7642a /report_explore
parent: 3ec9a5cd63b4578999d89b49f5223024a1acb723 (diff)
1 files changed, 33 insertions, 0 deletions
diff --git a/report_explore/MEMO_9A_checkpointed_handoff.md b/report_explore/MEMO_9A_checkpointed_handoff.md
new file mode 100644
index 0000000..d916b0f
--- /dev/null
+++ b/report_explore/MEMO_9A_checkpointed_handoff.md
@@ -0,0 +1,33 @@
+# Phase 9A Memo: Checkpointed Offline Handoff
+
+**Date**: 2026-03-25
+**Config**: CIFAR-10, L=4, d=256, t0=5, 100 epochs, seed=42
+
+## Question
+If we offline-train Vec on a DFA checkpoint, can it take over and outperform continuing DFA?
+
+## Results
+
+| Branch | acc@20 | final acc | diff vs DFA |
+|--------|--------|-----------|-------------|
+| continue_DFA | 0.296 | 0.311 | baseline |
+| handoff_to_Vec | 0.307 | 0.300 | -0.011 |
+| **handoff_blend_05** | **0.312** | **0.317** | **+0.006** |
+
+Vec quality at frozen t0=5 checkpoint: Gamma=0.229, rho=0.262.
+
+## Key Finding: Blend Handoff Outperforms DFA
+
+**This is Case B**: pure Vec takeover doesn't work, but **50% blend (Vec + DFA) outperforms pure DFA by +0.55%**.
+
+This is the first time any Vec-involving method has beaten DFA on online CIFAR. The blend provides complementary information: DFA gives stable random projections, Vec adds learned directional credit. Neither alone is sufficient, but together they outperform.
+
+## Implications
+
+1. **Cold-start IS the main bottleneck** — offline-fitted Vec can help, confirming Vec is useful on DFA trajectory features.
+
+2. **Pure Vec takeover fails** because once it takes over, the forward net trajectory diverges from what Vec was trained on, and online Vec retraining can't keep up.
+
+3. **Blend works** because DFA provides a stable backbone that prevents trajectory divergence, while Vec contributes useful directional corrections.
+
+4. **Next steps**: Test blend at different alpha values (0.25, 0.75), different t0, and 3 seeds for validation. Also test periodic refit to keep Vec fresh.
author	YurenHao0426 <Blackhao0426@gmail.com>	2026-03-25 16:20:53 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-03-25 16:20:53 -0500
commit	5a3b20d627eca65612f598c1ba5807d5d2df029a (patch)
tree	e7f2f697303f738e757db6e93214d880f6c7642a /report_explore
parent	3ec9a5cd63b4578999d89b49f5223024a1acb723 (diff)