faeval.git

Age	Commit message	Author
2026-03-25	Add Phase 8: schedule timing test; online co-learning is the remaining bottleneck	YurenHao0426
	Vec_only_from_0: 15.4% (cold-start failure, can't learn credit on random features)
	DFA_only: 31.2% (remains best non-BP method)
	DFA_then_Vec_T20: 12.9% (switching to Vec destroys DFA-built features)
	Vec_T5_then_DFA: 26.6% (partial recovery but still worse than pure DFA)
	Phase 7A's early-window finding doesn't transfer: it required offline-trained Vec on frozen features.
	Online Vec estimator faces cold-start paradox: needs structured features to learn credit, but structured features need good credit to form.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25	Add Phase 7A: snapshot time sweep shows early snapshots have positive held-out transfer	YurenHao0426
	At epoch 5 (acc=49%):
	- Vec_M4 5-step: dL_held=-0.005 (PUR=0.70)
	- Oracle BP 5-step: dL_held=-0.009 (PUR=1.05)
	- DFA 5-step: dL_held=+0.003 (always hurts held-out)
	By epoch 20, generalization window closes. Held-out failure is late-snapshot artifact.
	Better credit → lower update variance (Vec=0.8 vs DFA=40), not higher.
	Key implication: DFA warmup delays credit bridge past its useful window. Credit should be used from epoch 0, not after 20% warmup.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25	Add Phase 6.5A: same-batch linesearch REVISES Phase 6A conclusion	YurenHao0426
	Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
	1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
	2. Held-out evaluation (measured generalization failure, not exploitability)
	3. Gradient clamping
	With strict same-batch evaluation:
	- Oracle BP: dL_same = -0.406 (strongest descent)
	- Vec_M4: dL_same = -0.135
	- ScalarCB: dL_same = -0.025
	- DFA: dL_same = -0.003
	Same-batch loss decrease is MONOTONIC with credit quality. But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
	The bottleneck is batch-level generalization, not surrogate exploitability.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
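The distinction between dL_same and dL_held above can be illustrated with a toy sketch. This is not the repository's protocol: the one-parameter linear model, the two hand-made batches, the fixed step size (no actual linesearch), and every function name are assumptions chosen only to show how one update can lower the loss on the batch it was computed from while raising it on another batch.

```python
def mse_loss(w, xs, ys):
    """Mean squared error of the scalar model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Exact gradient of mse_loss with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def step_deltas(w, direction, lr, same_batch, held_batch):
    """Take one step w - lr * direction and report the loss change on both batches."""
    w_new = w - lr * direction
    dL_same = mse_loss(w_new, *same_batch) - mse_loss(w, *same_batch)
    dL_held = mse_loss(w_new, *held_batch) - mse_loss(w, *held_batch)
    return dL_same, dL_held


# Hypothetical data: the two batches deliberately disagree (y = 2x vs y = -x),
# so a step fitted to one batch hurts the other.
same_batch = ([1.0, 2.0], [2.0, 4.0])
held_batch = ([1.0, 2.0], [-1.0, -2.0])

w0 = 0.0
g = mse_grad(w0, *same_batch)  # update direction computed on same_batch only
dL_same, dL_held = step_deltas(w0, g, 0.05, same_batch, held_batch)
print(f"dL_same={dL_same:+.3f}  dL_held={dL_held:+.3f}")
```

A negative dL_same together with a positive dL_held is the pattern the commit labels Case D; the real experiment compares several credit estimators on network snapshots rather than a single exact gradient.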
2026-03-24	Add Phase 6: snapshot exploitability reveals local update rule is the bottleneck	YurenHao0426
	Phase 6A: Better credit is ANTI-CORRELATED with loss decrease on fixed snapshot.
	- DFA (Gamma=0.01) → dL=-0.0001 (only method that decreases loss)
	- Vec_M4 (Gamma=0.38) → dL=+0.057 (increases loss most)
	- Oracle BP (Gamma=1.0) → dL=+0.011 (still increases loss)
	Phase 6C: Target-shift rule reduces damage but cannot make non-DFA credits productive. The inner-product surrogate <F_l(h), a_l> is fundamentally mismatched with directional credit.
	Conclusion: Case B. The primary bottleneck is the local update paradigm itself, not the credit estimator quality or tracking/co-adaptation.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 5: vector field audit, frozen CIFAR transfer, online pilot	YurenHao0426
	Phase 5A: Audit passes; shuffle control collapses, gains are real
	Phase 5B: Transfer SUCCESS; vec_M4 beats scalar CB by +0.25 Gamma, +0.31 rho on frozen CIFAR
	Phase 5C: Online FAILURE; vec does worse than scalar CB online despite better frozen credit
	Core finding: bottleneck is in local surrogate / co-adaptation, not estimator quality
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 4 diagnostic dissection: frozen credit recovery, online shallow scan, vector field pilot	YurenHao0426
	Key findings:
	- Frozen CIFAR: estimators CAN recover credit (SB best, CB 20x > DFA)
	- Online shallow: cb_eT wr=0.2 tgw=1.0 achieves S1>0, S2 marginal
	- Vector credit field: 0.91-0.96 Gamma/rho on synthetic (vs 0.34 scalar CB)
	- Direct vector field avoids scalar V curvature problem entirely
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add CIFAR deltaL test (failed) and pivot design memo	YurenHao0426
	- CIFAR deltaL: s=grad_hL CE (dim=512) -> acc=17.2%, Gamma≈0
	  Confirms scalar value field has dimensionality bottleneck on CIFAR
	- Pivot memo: direct vector credit field a_phi(h,t,s) -> R^d
	  Trained with perturbation-based target, avoids curvature problem
	  Still satisfies no hidden BP anchor constraint
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add exploration visualization: CIFAR depth scan, boundary ablation, synth vs CIFAR gap	YurenHao0426
	Three new plots:
	- cifar_depth_scan.png: acc/Gamma/rho vs depth for all methods
	- boundary_ablation.png: s_type, tgw, warmup ratio sweeps
	- synth_vs_cifar.png: dimensionality gap comparison (d=128 vs d=512)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-24	Add Phase 3 boundary-condition ablation results and combined memo	YurenHao0426
	Key findings:
	- deltaL (output-layer gradient) gives best Gamma (0.562 vs 0.452 for eT)
	- Concatenating h_L to s destroys credit quality (value net cheats)
	- Terminal gradient matching is monotonically beneficial
	- Best config: deltaL + tgw=1.0 + wr=0.05 -> Gamma=0.768, rho=0.691
	- CIFAR depth scan shows no Goldilocks regime (dimensionality bottleneck)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-23	Add Phase 2 explore experiments: synthetic nonlinearity ladder + CIFAR depth scan	YurenHao0426
	- synth_nonlinearity_ladder.py: teacher-student with phi_alpha(z) = (1-a)z + a*tanh(z)
	  Sweeps alpha x depth to find where state bridge / credit bridge fail
	- cifar_depth_scan.py: CIFAR-10 with L={2,4,6,8,12}, d={256,512}
	  Finds Goldilocks regime for credit bridge vs DFA
	- plot_synth_ladder.py: phase diagram visualization
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
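The interpolated activation phi_alpha(z) = (1-a)z + a*tanh(z) named in this commit can be sketched as follows. Only the formula comes from the commit message; the sweep grid, the weight-free "teacher" that simply stacks the activation, and all names are illustrative assumptions, not the contents of synth_nonlinearity_ladder.py.

```python
import math


def phi_alpha(z: float, alpha: float) -> float:
    """Interpolate between identity (alpha=0) and tanh (alpha=1)."""
    return (1.0 - alpha) * z + alpha * math.tanh(z)


def stacked_activation(z: float, alpha: float, depth: int) -> float:
    """Apply phi_alpha `depth` times; a weight-free stand-in for a deep teacher."""
    for _ in range(depth):
        z = phi_alpha(z, alpha)
    return z


# Hypothetical sweep grid over nonlinearity strength and depth.
alphas = [0.0, 0.25, 0.5, 0.75, 1.0]
depths = [2, 4, 8]

grid = {(a, d): stacked_activation(2.0, a, d) for a in alphas for d in depths}

for (a, d), out in sorted(grid.items()):
    print(f"alpha={a:.2f} depth={d}: output={out:.4f}")
```

At alpha=0 the stack is exactly linear, so the input passes through unchanged; as alpha and depth grow, the output of a positive input is squashed further, which is the axis along which the sweep looks for failure points.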
2026-03-23	Add final report, plots, experiment guide, and complete NOTE.md	YurenHao0426
	All experiments complete:
	- Toy LQ: credit bridge matches state bridge (~0.94 costate cosine)
	- CIFAR-10: credit bridge (29.6%) comparable to DFA (30.0%), both beat state bridge (18.5%)
	- State bridge confirms core hypothesis: perfect state prediction != useful credit
	- Terminal gradient matching is essential for credit bridge
2026-03-23	Sync state bridge: use normalized MSE target in both toy and CIFAR	YurenHao0426
	Reason: toy used raw MSE, CIFAR used normalized. They must be the same method for consistent reporting. Normalized MSE is more robust to varying h_L magnitudes.
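The robustness claim can be illustrated with a small sketch. The commit does not say how the MSE is normalized, so dividing by the mean squared magnitude of the target is an assumption here, as are the function names and the sample vectors.

```python
def raw_mse(pred, target):
    """Plain mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def normalized_mse(pred, target, eps=1e-8):
    """Raw MSE divided by the mean squared target magnitude (assumed normalization)."""
    scale = sum(t * t for t in target) / len(target)
    return raw_mse(pred, target) / (scale + eps)


# Hypothetical terminal-state target and prediction.
target = [1.0, -2.0, 0.5]
pred = [1.1, -1.8, 0.4]

# The same vectors with 10x larger magnitude.
big_target = [10.0 * t for t in target]
big_pred = [10.0 * p for p in pred]

print("raw:       ", raw_mse(pred, target), raw_mse(big_pred, big_target))
print("normalized:", normalized_mse(pred, target), normalized_mse(big_pred, big_target))
```

Under this assumed definition the raw loss grows roughly 100x when the vectors are scaled by 10, while the normalized loss stays essentially unchanged, which is one way the metric could stay comparable between the toy and CIFAR settings.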
2026-03-23	Initial implementation: all models, methods, toy and CIFAR experiments	YurenHao0426
	Debug phase. Toy LQ experiments (3 seeds) complete with terminal gradient matching. Credit bridge matches state bridge on linear system (~0.94 cosine). CIFAR experiments in progress.
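The "~0.94 cosine" figure is a cosine similarity between vectors. A minimal sketch of that metric follows; the function name, the epsilon guard, and the example vectors are assumptions, and the sketch does not reproduce how the repository builds the vectors it compares.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between u and v; eps guards against zero-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


# Hypothetical estimated and reference credit vectors.
estimated = [0.9, 0.1, -0.4]
reference = [1.0, 0.0, -0.5]

print(f"cosine = {cosine_similarity(estimated, reference):.3f}")
```

Because the metric ignores vector length, it measures only whether the estimated direction agrees with the reference, not whether the magnitudes match.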