# Phase 8 Memo: Schedule Hypothesis Test

**Date**: 2026-03-25

**Config**: CIFAR-10, L=4, d=256, 100 epochs, seed=42

## Question

Does high-quality credit need to be used from epoch 0 rather than after DFA warmup?

## Answer: NO. The timing hypothesis does not transfer from frozen to online.

### Results

| Schedule | acc@5 | acc@20 | final acc |
|----------|-------|--------|-----------|
| DFA_only | **0.297** | **0.308** | **0.312** |
| Vec_only_from_0 | 0.135 | 0.151 | 0.154 |
| Vec_T5_then_DFA | 0.135 | 0.213 | 0.266 |
| DFA_T20_then_Vec | 0.297 | 0.308 | 0.129 |

### Key findings

1. **Pure Vec from epoch 0 fails** (15.4%). The online vector field starts from random initialization alongside a random forward net. It cannot learn useful credit fast enough: Gamma drops from 0.047 (epoch 1) to 0.003 (epoch 10).
2. **DFA-then-Vec also fails** (12.9%). Switching to Vec at epoch 20 destroys the DFA-built features: accuracy falls from 0.308 at epoch 20 to 0.129 at the end of training.
3. **DFA alone remains best** (31.2%).
4. **Vec-then-DFA (Vec_T5_then_DFA) partially recovers** (26.6%) but is still worse than pure DFA.

### Why Phase 7A's result doesn't transfer

Phase 7A showed Vec credit works on early snapshots. But that was with a **converged offline-trained estimator** on frozen features. In online training:

- The Vec estimator starts from scratch alongside the forward net
- Both are random at epoch 0, so there is no useful credit to exploit
- The perturbation-based training target needs coherent forward dynamics, which don't exist at initialization
- By the time Vec learns anything, the features have already moved past the useful window

### Diagnosis

This is **Case D** from the decision tree: all early-Vec schedules are worse than DFA.

The fundamental problem is the **cold-start paradox**: Vec credit is most useful on early features, but the Vec estimator can only learn useful credit from features that already have some structure. DFA provides that structure (albeit slowly), but by the time Vec would be ready, the early window has closed.

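For reference, the four schedules compared in the results table can be written as a small lookup from epoch to credit source. This is only a sketch: the helper `credit_source`, the `SCHEDULES` table, and the 0-indexed epoch convention (switching at the start of epoch 5 or 20) are assumptions made for illustration, not the project's actual training code.

```python
# Hypothetical helper illustrating the four schedules from the results table.
# Assumption: epochs are 0-indexed and "T5"/"T20" mean the switch happens
# at the start of epoch 5 / epoch 20.
SCHEDULES = {
    # schedule name: (credit before the switch, switch epoch, credit after the switch)
    "DFA_only": ("dfa", None, "dfa"),
    "Vec_only_from_0": ("vec", None, "vec"),
    "Vec_T5_then_DFA": ("vec", 5, "dfa"),
    "DFA_T20_then_Vec": ("dfa", 20, "vec"),
}


def credit_source(schedule: str, epoch: int) -> str:
    """Return which credit signal ('dfa' or 'vec') drives the update at `epoch`."""
    before, switch_epoch, after = SCHEDULES[schedule]
    if switch_epoch is None or epoch < switch_epoch:
        return before
    return after


if __name__ == "__main__":
    # Print the credit source at a few epochs around the switch points.
    for name in SCHEDULES:
        timeline = [credit_source(name, e) for e in (0, 4, 5, 19, 20, 99)]
        print(f"{name:18s} {timeline}")
```

Under this reading, `Vec_T5_then_DFA` and `Vec_only_from_0` are identical through the first five epochs, and `DFA_T20_then_Vec` is identical to `DFA_only` through the first twenty, which is consistent with the matching acc@5 and acc@20 entries in the table.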
### Implications

The project has reached a principled boundary:

- The vector credit estimator works (synthetic + frozen CIFAR confirmed)
- The local surrogate can exploit it on same-batch (Phase 6.5A confirmed)
- Early snapshots show held-out generalization (Phase 7A confirmed)
- But online co-learning of estimator + forward net is the remaining unsolved problem

Potential directions:

1. **Pre-train Vec on DFA-warmed features, then deploy** (hybrid warmup)
2. **Meta-learning the Vec initialization** to reduce cold-start time
3. **Accept DFA as the practical method** and position Vec as a diagnostic tool
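
One possible reading of the hybrid warmup direction is a three-phase schedule: DFA trains the forward net alone, then the Vec estimator is trained on the DFA-warmed features, then Vec credit is deployed. The sketch below is hypothetical throughout. The names `Phase` and `hybrid_warmup_phase`, the default phase lengths, the choice to keep DFA running while Vec pre-trains (rather than freezing the forward net), and the choice to keep training Vec after deployment are all assumptions; none of this has been run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    forward_credit: str  # signal that updates the forward net: "dfa" or "vec"
    train_vec: bool      # whether the Vec estimator is being trained


def hybrid_warmup_phase(epoch: int,
                        warmup_epochs: int = 20,
                        pretrain_epochs: int = 10) -> Phase:
    """Hypothetical three-phase schedule (0-indexed epochs).

    The default lengths are placeholders, not values from the experiments.
    """
    if epoch < warmup_epochs:
        # Phase 1: DFA builds feature structure; Vec is not trained yet.
        return Phase(forward_credit="dfa", train_vec=False)
    if epoch < warmup_epochs + pretrain_epochs:
        # Phase 2: Vec estimator trains on DFA-warmed features
        # (assumption: the forward net keeps using DFA in the meantime).
        return Phase(forward_credit="dfa", train_vec=True)
    # Phase 3: deploy Vec credit for the forward net
    # (assumption: the estimator keeps training online).
    return Phase(forward_credit="vec", train_vec=True)
```

Whether the forward net should instead be frozen during the second phase, as in the Phase 7A setup, is an open design choice that this sketch does not settle.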