faeval.git

Age	Commit message	Author
2026-03-26	Add Phase 10A: no prefit threshold — even random Vec blend beats DFA by +1.3%	YurenHao0426
	E_prefit=0 (random Vec) + blend(0.75): 32.4% vs DFA 31.1% (+1.3%)
	E_prefit=15: 32.3% (+1.2%)
	E_prefit=60: 32.5% (+1.4%)
	Frozen Gamma/rho near zero at all prefit levels.
	The Phase 9A success was NOT from Vec learning useful credit; it was from the blend mechanism itself providing regularization/diversification over pure DFA.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26	Add Phase 9B+9C: periodic refit fails, top-down curriculum neutral	YurenHao0426
	Phase 9B (periodic refit K=5 R=1 alpha=0.75): 14.0%. Vec starts random, periodic refits insufficient without offline pretraining.
	Phase 9C (top-down curriculum): last1_vec=30.8%, last2_vec=31.1% vs DFA=31.2%. Near-neutral. Cold-start problem persists even for single-block Vec.
	Only Phase 9A's offline prefit + blend handoff (+1.5%) works. The key ingredient is offline Vec training on frozen checkpoint features.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25	Full Phase 9A: blend(0.75) outperforms DFA by +1.5% across multiple t0	YurenHao0426
	Best configs (seed=42):
	- t0=5, blend_075 (75%Vec+25%DFA): 32.6% vs DFA 31.0% (+1.5%)
	- t0=10, blend_075: 32.5% vs 31.0% (+1.4%)
	- t0=1, blend_05: 31.9% vs 31.0% (+0.9%)
	Higher Vec fraction (0.75) consistently outperforms lower (0.25, 0.5) at t0>=5.
	Pure Vec handoff still fails at all checkpoints.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
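One plausible reading of blend_075 (75%Vec+25%DFA) is a convex combination of the two credit signals. The sketch below assumes that reading; the function name blend_credit, the list-of-floats representation, and the sample vectors are hypothetical and are not taken from the repository.

```python
from typing import List, Sequence


def blend_credit(a_vec: Sequence[float], a_dfa: Sequence[float], vec_frac: float) -> List[float]:
    """Element-wise convex combination: vec_frac * a_vec + (1 - vec_frac) * a_dfa.

    Assumption: both credit signals have the same length and vec_frac lies in [0, 1].
    """
    if len(a_vec) != len(a_dfa):
        raise ValueError("credit vectors must have the same length")
    if not 0.0 <= vec_frac <= 1.0:
        raise ValueError("vec_frac must lie in [0, 1]")
    return [vec_frac * v + (1.0 - vec_frac) * d for v, d in zip(a_vec, a_dfa)]


# Illustrative values only; vec_frac=0.75 mirrors the blend_075 label.
blended = blend_credit([1.0, 0.0, -2.0], [0.0, 4.0, 2.0], vec_frac=0.75)
print(blended)
```

With vec_frac=1.0 this reduces to the pure Vec signal and with vec_frac=0.0 to pure DFA, the two endpoints the log compares against.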
2026-03-25	Add Phase 9A: checkpointed handoff — blend(Vec+DFA) outperforms pure DFA	YurenHao0426
	First positive online result: 50% blend of offline-fitted Vec + DFA gives 31.7% vs 31.1% for pure DFA (+0.55%).
	This is Case B: pure Vec handoff fails (-1.1%) but blend works because DFA stabilizes trajectory while Vec adds directional credit.
	Offline-fitted Vec at DFA epoch-5 checkpoint: Gamma=0.229, rho=0.262.
	Cold-start confirmed as main bottleneck: Vec IS useful on DFA trajectory features.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25	Add Phase 8: schedule timing test; online co-learning is the remaining bottleneck	YurenHao0426
	Vec_only_from_0: 15.4% (cold-start failure, can't learn credit on random features)
	DFA_only: 31.2% (remains best non-BP method)
	DFA_then_Vec_T20: 12.9% (switching to Vec destroys DFA-built features)
	Vec_T5_then_DFA: 26.6% (partial recovery but still worse than pure DFA)
	Phase 7A's early-window finding doesn't transfer: it required offline-trained Vec on frozen features.
	Online Vec estimator faces cold-start paradox: needs structured features to learn credit, but structured features need good credit to form.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25	Add Phase 7A: snapshot time sweep shows early snapshots have positive held-out transfer	YurenHao0426
	At epoch 5 (acc=49%), Vec_M4 5-step: dL_held=-0.005 (PUR=0.70)
	Oracle BP 5-step: dL_held=-0.009 (PUR=1.05)
	DFA 5-step: dL_held=+0.003 (always hurts held-out)
	By epoch 20, generalization window closes. Held-out failure is late-snapshot artifact.
	Better credit → lower update variance (Vec=0.8 vs DFA=40), not higher.
	Key implication: DFA warmup delays credit bridge past its useful window. Credit should be used from epoch 0, not after 20% warmup.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25	Add Phase 6.5A: same-batch linesearch REVISES Phase 6A conclusion	YurenHao0426
	Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
	1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
	2. Held-out evaluation (measured generalization failure, not exploitability)
	3. Gradient clamping
	With strict same-batch evaluation:
	- Oracle BP: dL_same = -0.406 (strongest descent)
	- Vec_M4: dL_same = -0.135
	- ScalarCB: dL_same = -0.025
	- DFA: dL_same = -0.003
	Same-batch loss decrease is MONOTONIC with credit quality.
	But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
	The bottleneck is batch-level generalization, not surrogate exploitability.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 6: snapshot exploitability reveals local update rule is the bottleneck	YurenHao0426
	Phase 6A: Better credit is ANTI-CORRELATED with loss decrease on fixed snapshot.
	DFA (Gamma=0.01) → dL=-0.0001 (only method that decreases loss)
	Vec_M4 (Gamma=0.38) → dL=+0.057 (increases loss most)
	Oracle BP (Gamma=1.0) → dL=+0.011 (still increases loss)
	Phase 6C: Target-shift rule reduces damage but cannot make non-DFA credits productive. The inner-product surrogate <F_l(h), a_l> is fundamentally mismatched with directional credit.
	Conclusion: Case B. The primary bottleneck is the local update paradigm itself, not the credit estimator quality or tracking/co-adaptation.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 5: vector field audit, frozen CIFAR transfer, online pilot	YurenHao0426
	Phase 5A: Audit passes; shuffle control collapses, gains are real
	Phase 5B: Transfer SUCCESS; vec_M4 beats scalar CB by +0.25 Gamma, +0.31 rho on frozen CIFAR
	Phase 5C: Online FAILURE; vec does worse than scalar CB online despite better frozen credit
	Core finding: bottleneck is in local surrogate / co-adaptation, not estimator quality
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 4 diagnostic dissection: frozen credit recovery, online shallow scan, vector field pilot	YurenHao0426
	Key findings:
	- Frozen CIFAR: estimators CAN recover credit (SB best, CB 20x > DFA)
	- Online shallow: cb_eT wr=0.2 tgw=1.0 achieves S1>0, S2 marginal
	- Vector credit field: 0.91-0.96 Gamma/rho on synthetic (vs 0.34 scalar CB)
	- Direct vector field avoids scalar V curvature problem entirely
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add CIFAR deltaL test (failed) and pivot design memo	YurenHao0426
	- CIFAR deltaL: s=grad_hL CE (dim=512) -> acc=17.2%, Gamma≈0
	  Confirms scalar value field has dimensionality bottleneck on CIFAR
	- Pivot memo: direct vector credit field a_phi(h,t,s) -> R^d
	  Trained with perturbation-based target, avoids curvature problem
	  Still satisfies no hidden BP anchor constraint
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add exploration visualization: CIFAR depth scan, boundary ablation, synth vs CIFAR gap	YurenHao0426
	Three new plots:
	- cifar_depth_scan.png: acc/Gamma/rho vs depth for all methods
	- boundary_ablation.png: s_type, tgw, warmup ratio sweeps
	- synth_vs_cifar.png: dimensionality gap comparison (d=128 vs d=512)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 3 boundary-condition ablation results and combined memo	YurenHao0426
	Key findings:
	- deltaL (output-layer gradient) gives best Gamma (0.562 vs 0.452 for eT)
	- Concatenating h_L to s destroys credit quality (value net cheats)
	- Terminal gradient matching is monotonically beneficial
	- Best config: deltaL + tgw=1.0 + wr=0.05 -> Gamma=0.768, rho=0.691
	- CIFAR depth scan shows no Goldilocks regime (dimensionality bottleneck)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23	Add Phase 1 synthetic ladder results and memo	YurenHao0426
	Key finding: credit bridge advantage scales with nonlinearity.
	At alpha=1.0 (full tanh), CB > SB > DFA on both Gamma and rho at all depths.
	The crossover where CB surpasses SB happens around alpha=0.7-1.0.
	Full 4x4x3 grid complete with 3 seeds each.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23	Add Phase 2 explore experiments: synthetic nonlinearity ladder + CIFAR depth scan	YurenHao0426
	- synth_nonlinearity_ladder.py: teacher-student with phi_alpha(z) = (1-a)z + a*tanh(z)
	  Sweeps alpha x depth to find where state bridge / credit bridge fail
	- cifar_depth_scan.py: CIFAR-10 with L={2,4,6,8,12}, d={256,512}
	  Finds Goldilocks regime for credit bridge vs DFA
	- plot_synth_ladder.py: phase diagram visualization
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
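The interpolated activation phi_alpha(z) = (1-a)z + a*tanh(z) named in this commit can be sketched as follows. This is a minimal illustration of the formula only; the Python function name phi_alpha, its signature, and the sample values are assumptions and are not taken from synth_nonlinearity_ladder.py.

```python
import math


def phi_alpha(z: float, alpha: float) -> float:
    """Blend the identity and tanh: (1 - alpha) * z + alpha * tanh(z).

    alpha=0 gives a purely linear unit; alpha=1 gives a full tanh unit.
    """
    return (1.0 - alpha) * z + alpha * math.tanh(z)


# Hypothetical sweep values, chosen only to show the two endpoints and a midpoint.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, phi_alpha(2.0, alpha))
```

Sweeping alpha from 0 to 1 moves the unit from linear to fully saturating, which is the "nonlinearity ladder" axis the commit describes.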
2026-03-23	Add sweep results confirming terminal gradient matching is essential	YurenHao0426
	12-config sweep: no hyperparameter combination recovers useful credit gradients without terminal gradient matching (best cos ~0.3 early, decays to ~0).
2026-03-23	Add final report, plots, experiment guide, and complete NOTE.md	YurenHao0426
	All experiments complete:
	- Toy LQ: credit bridge matches state bridge (~0.94 costate cosine)
	- CIFAR-10: credit bridge (29.6%) comparable to DFA (30.0%), both beat state bridge (18.5%)
	- State bridge confirms core hypothesis: perfect state prediction != useful credit
	- Terminal gradient matching is essential for credit bridge
2026-03-23	Add experiment notes and .gitignore	YurenHao0426
	Track experiment phases (debug/pilot/frozen), key findings, and design decisions.
2026-03-23	Sync state bridge: use normalized MSE target in both toy and CIFAR	YurenHao0426
	Reason: toy used raw MSE, CIFAR used normalized. They must be the same method for consistent reporting. Normalized MSE is more robust to varying h_L magnitudes.
2026-03-23	Initial implementation: all models, methods, toy and CIFAR experiments	YurenHao0426
	Debug phase. Toy LQ experiments (3 seeds) complete with terminal gradient matching. Credit bridge matches state bridge on linear system (~0.94 cosine). CIFAR experiments in progress.