faeval.git/report_explore, branch master

Depth-utility ladder: trainable-block sweep (BP/FA/DFA) on ResMLP CIFAR-10

2026-06-15T01:32:31+00:00

Appendix experiment triangulating the depth-utility diagnostic (D3) by varying
the number of trainable residual blocks k (last-k trainable, first L-k frozen at
init; embed/LN/head always trained).
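A minimal sketch of the last-k freezing scheme. It assumes blocks are indexed 0..L-1 from input to output. The function names and module names (`embed`, `blocks.i`, `ln`, `head`) are hypothetical placeholders, not identifiers taken from the repo.

```python
def trainable_blocks(num_blocks: int, k: int) -> list[bool]:
    """Per-block trainable flags: the first L-k blocks stay frozen at init, the last k train."""
    if not 0 <= k <= num_blocks:
        raise ValueError(f"k must be in [0, {num_blocks}], got {k}")
    return [i >= num_blocks - k for i in range(num_blocks)]


def trainable_param_groups(num_blocks: int, k: int) -> list[str]:
    """Hypothetical module names handed to the optimizer for a given rung k.

    embed, LN and head are always trained; residual blocks follow trainable_blocks().
    """
    names = ["embed"]
    names += [f"blocks.{i}" for i, t in enumerate(trainable_blocks(num_blocks, k)) if t]
    names += ["ln", "head"]
    return names


# The full k-ladder for an L=4 stack (k=0 is the frozen baseline).
for k in range(5):
    print(k, trainable_param_groups(4, k))
```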

- d=256 L=4 and d=512 L=2, 3 seeds, recipe identical to the main audit.
- BP climbs monotonically (+22-23pp); DFA peaks at the frozen baseline (k=0) and
  declines once any deep block is trained; FA shows partial/no net depth utility.
- Cross-checks reproduce existing anchors (BP 0.617, DFA 0.301, FA 0.402, frozen 0.349).
- frozen_init_identity_check characterizes the frozen stack as a near-norm-preserving
  random feature map (per-block ||f||/||h|| ~ 0.10, stack cos 0.981), which explains the
  above-chance k=0 rung.
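A sketch of what a frozen-init identity check could compute: the per-block residual ratio ||f||/||h|| and the mean cosine between the stack's input and output. It assumes a pre-LN residual MLP block h + W2 relu(W1 LN(h)) with Gaussian init and a hypothetical `branch_scale` multiplier on W2; the actual ResMLP block, its init, and the numbers it produces may differ.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Parameter-free layer norm over the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def frozen_identity_check(h, num_blocks=4, hidden_mult=4, branch_scale=0.1, seed=0):
    """Return (per-block mean ||f||/||h||, mean cos(h_in, h_out)) for a frozen random stack."""
    rng = np.random.default_rng(seed)
    d = h.shape[-1]
    h_in = h
    ratios = []
    for _ in range(num_blocks):
        # Frozen random weights (assumed init; never updated).
        w1 = rng.standard_normal((d, hidden_mult * d)) / np.sqrt(d)
        w2 = rng.standard_normal((hidden_mult * d, d)) * branch_scale / np.sqrt(hidden_mult * d)
        f = np.maximum(layer_norm(h) @ w1, 0.0) @ w2  # residual branch output
        ratio = np.linalg.norm(f, axis=-1) / np.linalg.norm(h, axis=-1)
        ratios.append(float(np.mean(ratio)))
        h = h + f  # residual update; h_in keeps the original input
    cos = np.sum(h_in * h, axis=-1) / (np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h, axis=-1))
    return ratios, float(np.mean(cos))


x = np.random.default_rng(1).standard_normal((128, 256))
print(frozen_identity_check(x, num_blocks=4))
```

A small residual ratio with stack cosine near 1 is what "the frozen stack is close to the identity plus a small random perturbation" would look like under these assumptions.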

Co-Authored-By: Claude Opus 4.8 (1M context)

Add Phase 10A.6: gain requires trainable depth-aware aux, not semantic credit

2026-03-27T03:07:35+00:00

9-branch dissection results:
- zero_target crashes (-9.1%): aux must output non-zero
- constant_input neutral (+0.0%): needs at least depth info
- time_only works (+1.0%): h_l not needed, just depth index
- shuffled/fresh_random work (+1.3-1.4%): no semantic content needed
- prefit60_trainable ≈ random_trainable: prefit adds nothing
- All frozen branches crash: trainability is essential

Mechanism: a depth-aware, trainable auxiliary perturbation that diversifies
block-local updates. Neither semantic credit nor trainability alone explains the gain.
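One way to build the auxiliary input for the full, time_only and constant_input branches, assuming the aux network sees block activations h_l concatenated with a one-hot depth index. Keeping the input width fixed across variants lets the same aux architecture be reused in every branch. `aux_input` and the variant encoding are hypothetical, not the repo's code.

```python
import numpy as np


def aux_input(h_l: np.ndarray, depth: int, num_blocks: int, variant: str) -> np.ndarray:
    """Build a (batch, width + num_blocks) aux-network input for one dissection branch."""
    batch, width = h_l.shape
    depth_onehot = np.zeros((batch, num_blocks))
    depth_onehot[:, depth] = 1.0
    if variant == "full":            # block activations plus depth index
        return np.concatenate([h_l, depth_onehot], axis=1)
    if variant == "time_only":       # depth index only; h_l is zeroed out
        return np.concatenate([np.zeros_like(h_l), depth_onehot], axis=1)
    if variant == "constant_input":  # neither activations nor depth information
        return np.ones((batch, width + num_blocks))
    raise ValueError(f"unknown variant: {variant}")
```

Under this encoding, time_only inputs differ across blocks while constant_input inputs are identical for every block, which matches the distinction the results draw (depth index needed, h_l not needed).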

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 10A.5: blend gain is implicit regularization, not learned credit

2026-03-26T21:27:53+00:00

Dissection of 6 branches from the same DFA checkpoint:
- blend_random_frozen: 12.6% (CATASTROPHIC: frozen noise destroys training)
- blend_random_trainable: 32.2% (+1.2%; trainable network helps)
- blend_shuffled_trainable: 32.5% (+1.4%; even wrong targets work!)
- blend_gaussian_noise: 30.8% (neutral)
- scaled_DFA_norm_match: 31.0% (neutral)

The gain comes from implicit regularization through a co-optimized auxiliary
network, NOT from learned credit quality. Phase 9A's +1.5% was an optimization
dynamics effect, not evidence of useful credit assignment.
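A minimal sketch of the blend and of a scaled_DFA_norm_match-style control. It assumes the blend is a per-sample convex combination in which `alpha` weights the auxiliary (Vec) signal, and that the control rescales DFA to the blend's per-sample norm while keeping DFA's direction. Function names and the `alpha` convention are hypothetical. Passing Gaussian noise as `aux_credit` gives one plausible reading of the blend_gaussian_noise control.

```python
import numpy as np


def blend_credit(aux_credit: np.ndarray, dfa_credit: np.ndarray, alpha: float) -> np.ndarray:
    """Convex mix of aux and DFA credit; alpha=0 is pure DFA (assumed convention)."""
    return alpha * aux_credit + (1.0 - alpha) * dfa_credit


def norm_matched_dfa(dfa_credit: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Rescale each sample's DFA credit to the norm of `reference`, keeping DFA's direction."""
    ref_norm = np.linalg.norm(reference, axis=-1, keepdims=True)
    dfa_norm = np.linalg.norm(dfa_credit, axis=-1, keepdims=True)
    return dfa_credit * (ref_norm / (dfa_norm + eps))
```

The norm-matched control separates a pure change in update magnitude from a change in update direction; a neutral result for it suggests magnitude alone does not account for the blend gain.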

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 10A: no prefit threshold; even random Vec blend beats DFA by +1.3%

2026-03-26T13:37:39+00:00

E_prefit=0 (random Vec) + blend(0.75): 32.4% vs DFA 31.1% (+1.3%)
E_prefit=15: 32.3% (+1.2%)
E_prefit=60: 32.5% (+1.4%)

Frozen Gamma/rho stay near zero at all prefit levels. The Phase 9A success did NOT
come from Vec learning useful credit; it came from the blend mechanism itself, which
provides regularization/diversification over pure DFA.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 9A: checkpointed handoff shows blend(Vec+DFA) outperforms pure DFA

2026-03-25T21:20:53+00:00

First positive online result: a 50% blend of offline-fitted Vec + DFA gives 31.7%
vs 31.1% for pure DFA (+0.55%). This is Case B: pure Vec handoff fails (-1.1%), but
the blend works because DFA stabilizes the trajectory while Vec adds directional credit.

Offline-fitted Vec at DFA epoch-5 checkpoint: Gamma=0.229, rho=0.262.
Cold-start confirmed as main bottleneck — Vec IS useful on DFA trajectory features.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 8: schedule timing test shows online co-learning is the remaining bottleneck

2026-03-25T19:23:13+00:00

Vec_only_from_0: 15.4% (cold-start failure, can't learn credit on random features)
DFA_only: 31.2% (remains best non-BP method)
DFA_then_Vec_T20: 12.9% (switching to Vec destroys DFA-built features)
Vec_T5_then_DFA: 26.6% (partial recovery but still worse than pure DFA)
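The four schedules can be written as a lookup table, assuming the `_T<n>` suffix is the epoch at which the credit source switches. `SCHEDULES` and `credit_source` are hypothetical names for illustration.

```python
# (source before switch, switch epoch or None, source after switch)
SCHEDULES = {
    "Vec_only_from_0": ("vec", None, None),
    "DFA_only": ("dfa", None, None),
    "DFA_then_Vec_T20": ("dfa", 20, "vec"),
    "Vec_T5_then_DFA": ("vec", 5, "dfa"),
}


def credit_source(schedule: str, epoch: int) -> str:
    """Return which credit signal ("vec" or "dfa") drives block updates at this epoch."""
    first, switch_epoch, second = SCHEDULES[schedule]
    if switch_epoch is None or epoch < switch_epoch:
        return first
    return second
```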

Phase 7A's early-window finding doesn't transfer: it required an offline-trained Vec
on frozen features. The online Vec estimator faces a cold-start paradox: it needs structured
features to learn credit, but structured features need good credit to form.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 7A: snapshot time sweep shows early snapshots have positive held-out transfer

2026-03-25T15:23:19+00:00

At epoch 5 (acc=49%), Vec_M4 5-step: dL_held=-0.005 (PUR=0.70)
  Oracle BP 5-step: dL_held=-0.009 (PUR=1.05)
  DFA 5-step: dL_held=+0.003 (always hurts held-out)

By epoch 20, the generalization window closes; the held-out failure is a late-snapshot artifact.
Better credit → lower update variance (Vec=0.8 vs DFA=40), not higher.
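The log does not define update variance. A common choice, assumed here, is the trace of the covariance of the update vectors across minibatches taken from the same snapshot: larger values mean the update changes more from batch to batch. `update_variance` is a hypothetical helper.

```python
import numpy as np


def update_variance(updates) -> float:
    """Trace of the sample covariance of per-minibatch updates, shape (num_batches, num_params)."""
    u = np.asarray(updates, dtype=float)
    if u.shape[0] < 2:
        raise ValueError("need at least two minibatch updates")
    centered = u - u.mean(axis=0, keepdims=True)
    return float(np.sum(centered ** 2) / (u.shape[0] - 1))
```

This equals `np.trace(np.cov(u, rowvar=False))` but avoids forming the full num_params x num_params covariance matrix.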

Key implication: DFA warmup delays credit bridge past its useful window.
Credit should be used from epoch 0, not after 20% warmup.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 6.5A: same-batch linesearch REVISES Phase 6A conclusion

2026-03-25T13:22:04+00:00

Phase 6A's "better credit → worse loss" was a protocol artifact caused by:
1. Credit normalization (inflated DFA, suppressed Vec magnitude ordering)
2. Held-out evaluation (measured generalization failure, not exploitability)
3. Gradient clamping

With strict same-batch evaluation:
- Oracle BP: dL_same = -0.406 (strongest descent)
- Vec_M4: dL_same = -0.135
- ScalarCB: dL_same = -0.025
- DFA: dL_same = -0.003
Same-batch loss decrease is MONOTONIC with credit quality.

But held-out loss INCREASES for all non-DFA methods (Case D: overfitting).
The bottleneck is batch-level generalization, not surrogate exploitability.
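A toy version of a same-batch line search on a linear least-squares model. It assumes the step size is chosen to minimize loss on the batch that produced the update direction, with the held-out delta measured at that same step. Because the grid includes 0, dL_same can never be positive; this is why it measures exploitability rather than generalization. The model, grid, and function names are illustrative assumptions.

```python
import numpy as np


def mse(w: np.ndarray, x: np.ndarray, y: np.ndarray) -> float:
    r = x @ w - y
    return float(np.mean(r ** 2))


def same_batch_linesearch(w, direction, batch, held, step_grid):
    """Pick the step minimizing same-batch loss along -direction; return (eta, dL_same, dL_held)."""
    x, y = batch
    xh, yh = held
    base_same, base_held = mse(w, x, y), mse(w, xh, yh)
    best_eta = min(step_grid, key=lambda eta: mse(w - eta * direction, x, y))
    w_new = w - best_eta * direction
    return best_eta, mse(w_new, x, y) - base_same, mse(w_new, xh, yh) - base_held


rng = np.random.default_rng(0)
w_true = rng.standard_normal(5)
x, xh = rng.standard_normal((32, 5)), rng.standard_normal((32, 5))
y, yh = x @ w_true, xh @ w_true
w = np.zeros(5)
oracle = 2 * x.T @ (x @ w - y) / len(y)  # exact same-batch gradient ("oracle BP")
print(same_batch_linesearch(w, oracle, (x, y), (xh, yh), [0.0, 0.01, 0.03, 0.1, 0.3]))
```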

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 6: snapshot exploitability reveals local update rule is the bottleneck

2026-03-25T01:07:03+00:00

Phase 6A: Better credit is ANTI-CORRELATED with loss decrease on fixed snapshot.
  DFA (Gamma=0.01) → dL=-0.0001 (only method that decreases loss)
  Vec_M4 (Gamma=0.38) → dL=+0.057 (increases loss most)
  Oracle BP (Gamma=1.0) → dL=+0.011 (still increases loss)

Phase 6C: Target-shift rule reduces damage but cannot make non-DFA credits productive.
  The inner-product surrogate is fundamentally mismatched with directional credit.
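A sketch contrasting the two local rules on a single linear block h = W x. It assumes the inner-product surrogate is <c, h> with the credit c held fixed, and that the target-shift rule regresses h onto a fixed target h_0 - eta*c. With eta=1 the first steps coincide. After that, the inner-product rule keeps moving h along -c at a constant rate, while the target-shift displacement converges to -eta*c when lr*||x||^2 < 2. The repo's actual surrogate and `local_steps` are not shown in this log; names here are hypothetical.

```python
import numpy as np


def local_steps(W, x, c, rule, lr=0.1, steps=20, eta=1.0):
    """Run `steps` local updates on h = W @ x and return the displacement h_final - h_0."""
    W = W.copy()
    h0 = W @ x
    target = h0 - eta * c  # fixed target, used only by the target-shift rule
    for _ in range(steps):
        h = W @ x
        if rule == "inner_product":
            grad_h = c              # d<c, h>/dh: constant, never shrinks
        elif rule == "target_shift":
            grad_h = h - target     # d(0.5 * ||h - target||^2)/dh: shrinks as h nears target
        else:
            raise ValueError(f"unknown rule: {rule}")
        W -= lr * np.outer(grad_h, x)  # chain rule through h = W @ x
    return W @ x - h0
```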

Conclusion: Case B — the primary bottleneck is the local update paradigm itself,
not the credit estimator quality or tracking/co-adaptation.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add Phase 5: vector field audit, frozen CIFAR transfer, online pilot

2026-03-24T23:03:55+00:00

Phase 5A: Audit passes; shuffle control collapses, gains are real
Phase 5B: Transfer SUCCESS; vec_M4 beats scalar CB by +0.25 Gamma, +0.31 rho on frozen CIFAR
Phase 5C: Online FAILURE; vec does worse than scalar CB online despite better frozen credit
Core finding: bottleneck is in local surrogate / co-adaptation, not estimator quality
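A sketch of a shuffle control: permute the credit estimate across samples within a batch so each row no longer belongs to its own example, then recompute alignment. Mean per-sample cosine against a reference (e.g. the BP gradient) is used here as a stand-in metric; it is not the Gamma/rho definition, which this log does not give. Function names are hypothetical.

```python
import numpy as np


def mean_cosine(a: np.ndarray, b: np.ndarray, eps: float = 1e-12) -> float:
    """Average per-sample cosine similarity between rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float(np.mean(num / den))


def shuffle_control(credit: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Permute credit rows across the batch, breaking the sample-to-credit correspondence."""
    return credit[rng.permutation(credit.shape[0])]


rng = np.random.default_rng(0)
reference = rng.standard_normal((256, 64))
estimate = reference + 0.5 * rng.standard_normal((256, 64))
print(mean_cosine(estimate, reference), mean_cosine(shuffle_control(estimate, rng), reference))
```

If an estimator's apparent alignment survives shuffling, the gain cannot be per-sample credit; a collapse to near zero is what a passing audit would look like under this metric.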

Co-Authored-By: Claude Opus 4.6 (1M context)