faeval.git

Age	Commit message	Author
2026-04-08	Commit dfa_pen_short lam=1e-4 s123/s456 JSONs (auditable source for §5 ¶2)	YurenHao0426
The §5 ¶2 lambda-sweep claim "at λ=1e-4, three-seed mean ‖h_L‖≈2.2e4 and ‖g_L‖≈7.0e-7" depends on these three files:

- results/dfa_pen_short/dfa_pen_lam0.0001_s42.json (already committed)
- results/dfa_pen_short/dfa_pen_lam0.0001_s123.json (this commit)
- results/dfa_pen_short/dfa_pen_lam0.0001_s456.json (this commit)

The s123 and s456 files were untracked. This commit adds them to the auditable source set for the §5 ¶2 lambda-sweep claim.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
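A minimal sketch of how a three-seed mean could be recomputed from those files for auditing. The file-name pattern comes from the paths above; the JSON schema is not shown in this log, so the field name passed as `key` (for example `"h_L_norm"`) is purely hypothetical and must be replaced with whatever the result files actually contain.

```python
import json
from pathlib import Path
from statistics import mean

SEEDS = (42, 123, 456)  # seeds named in the commit message


def seed_mean(results_dir, key, lam="0.0001", seeds=SEEDS):
    """Average one scalar field over the per-seed result files.

    `key` is an assumed top-level JSON field; the real schema may differ.
    """
    values = []
    for seed in seeds:
        path = Path(results_dir) / f"dfa_pen_lam{lam}_s{seed}.json"
        with path.open() as fh:
            values.append(float(json.load(fh)[key]))
    return mean(values)
```

Under these assumptions, `seed_mean("results/dfa_pen_short", "h_L_norm")` would return the three-seed mean of that field, which could then be compared with the value quoted in §5 ¶2.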
2026-04-08	λ sweep on penalty strength: lam ∈ {1e-4, 1e-2, 1e-1} cos + rho results	YurenHao0426
Round 19's #5 recommendation. Major new finding for the paper:

| lam | acc | ‖h_L‖ | ‖g_2‖ | deep cos | deep rho |
|-------|------:|--------:|--------:|---------:|---------:|
| 0 | 0.308 | 4e8 | 5e-10 | -0.008 | -0.003 |
| 1e-4 | 0.359 | 2.4e4 | 6.3e-7 | -0.022 | -0.004 |
| 1e-2 | 0.363 | 4e4 | 1e-6 | +0.155 | +0.080 |
| 1e-1 | 0.349 | 1.2e4 | 1.6e-6 | +0.131 | +0.067 |

KEY: at lam=1e-4 the residual stream is contained AND ‖g‖ is healthy (mode 1 ALLEVIATED), but deep cos and rho are still essentially zero (mode 2 NOT alleviated). This is independent dissociation of the two modes via penalty strength: at weak penalty you get the mode 1 fix WITHOUT the mode 2 fix.

Both metrics (cos, rho) agree at every lambda.

Penalty strength has a non-monotonic effect on mode 2 alleviation:

- lam=1e-4: too weak, mode 2 not alleviated (cos ~0)
- lam=1e-2: sweet spot, cos +0.16, rho +0.08
- lam=1e-1: slightly over-constrained, cos +0.13, rho +0.07

This is the 7th independent validation of the two-mode separation, and the strongest one, because it shows mode 1 alleviation WITHOUT mode 2 alleviation: the modes do not even respond to the same intervention strength.
2026-04-08	3-seed multi-seed verification of penalized DFA deep cos = +0.17	YurenHao0426
| seed | l0 | l1 | l2 | l3 | l4 | layer-mean |
|---|---:|---:|---:|---:|---:|---:|
| 42 | +0.316 | +0.169 | +0.151 | +0.165 | +0.166 | +0.193 |
| 123 | +0.333 | +0.093 | +0.155 | +0.178 | +0.177 | +0.187 |
| 456 | +0.339 | +0.131 | +0.123 | +0.150 | +0.150 | +0.179 |

3-seed mean deep cos (l1-l4): ~0.155 ± 0.025
3-seed layer-mean: +0.186 ± 0.007

The +0.17 finding is rock-solid, combined with:

- null calibration: training-Bs +0.16 vs fresh-Bs +0.002
- hypothesis B confirmed: vanilla early ep deep cos ~0
- 3-seed reproducibility (this commit)

This is the §4 evidence for the paper's 'penalty creates partial deep alignment, partially alleviating mode 2'.
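The summary statistics can be rechecked directly from the table. The sketch below copies the per-layer values from this commit message and recomputes them with sample standard deviations. How the original ± values were computed is not stated here, so treating the layer-mean spread as across-seed and pooling the 12 deep-layer entries (l1-l4 over three seeds) are assumptions.

```python
import statistics

# Per-layer cosines (l0..l4) copied from the table in this commit message.
cos_by_seed = {
    42:  [0.316, 0.169, 0.151, 0.165, 0.166],
    123: [0.333, 0.093, 0.155, 0.178, 0.177],
    456: [0.339, 0.131, 0.123, 0.150, 0.150],
}

# Layer-mean per seed, then mean and sample std across the three seeds.
per_seed_layer_mean = [statistics.mean(v) for v in cos_by_seed.values()]
layer_mean = statistics.mean(per_seed_layer_mean)
layer_std = statistics.stdev(per_seed_layer_mean)

# Deep layers only (l1..l4), pooled over all seeds: 12 values in total.
deep_values = [c for v in cos_by_seed.values() for c in v[1:]]
deep_mean = statistics.mean(deep_values)
deep_std = statistics.stdev(deep_values)

print(f"layer-mean: {layer_mean:+.3f} ± {layer_std:.3f}")
print(f"deep cos (l1-l4, pooled): {deep_mean:+.3f} ± {deep_std:.3f}")
```

Under these assumptions the layer-mean reproduces +0.186 ± 0.007, and the pooled deep cosine comes out near +0.151 ± 0.025, consistent with the approximate "~0.155 ± 0.025" quoted above.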
2026-04-08	MAJOR: penalized DFA deep-layer cosine is +0.17, NOT zero	YurenHao0426
Direct deep-block credit measurement on penalized DFA s42 checkpoint (lam=1e-2, 30 epochs, just trained):

per-layer cos(e_T B^T, BP grad), TRAINING Bs, no eps clamp:

l0: +0.316 (±0.188)  ‖g‖=9.18e-7  ‖a‖=4.53
l1: +0.169 (±0.087)  ‖g‖=8.87e-7  ‖a‖=4.57
l2: +0.151 (±0.084)  ‖g‖=8.77e-7  ‖a‖=4.50
l3: +0.165 (±0.099)  ‖g‖=8.73e-7  ‖a‖=4.64
l4: +0.166 (±0.098)  ‖g‖=8.69e-7  ‖a‖=4.64
layer-mean: +0.193

Compare to vanilla DFA (existing measurement, scale-broken regime):

l0: +0.42
l1-4: ~0 (essentially zero)

CRITICAL INTERPRETATION: The penalty doesn't just fix scale, it ALSO restores deep-layer direction quality from ~0 to ~0.17. This contradicts the prior 'two failure modes' framing where I assumed direction would remain broken even after scale fix.

The honest story is:

- vanilla DFA: scale catastrophic, BP grad at floor, cosine measurement DEGENERATE (cos ~0 is noise dominance, not 'no alignment')
- penalized DFA: scale fixed, BP grad healthy, cosine measurement INTERPRETABLE, and the value is +0.17 on deep layers (partially aligned, much less than BP's self-cosine of 1.0)
- the +0.17 alignment explains why penalized DFA gets 0.36 (60% of BP's 0.61): partial credit gives partial training, not zero training

The 'second failure mode' claim is wrong. There's ONE unified failure mode (scale + measurement degeneracy), and the penalty rescues BOTH. The remaining gap to BP is 'partial credit quality', not a separate failure mode.
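For readers unfamiliar with the metric, here is a minimal NumPy sketch of a per-layer cosine between the DFA feedback signal `e_T @ B.T` and a backprop gradient, reported as mean and std over rows with no epsilon clamp. It is not the repository's measurement code: the array sizes, the synthetic "BP gradient", and the names `cosine_alignment`, `B_train`, and `B_fresh` are all illustrative assumptions. The fresh-matrix comparison mirrors the idea of a null calibration, where an independently redrawn feedback matrix should show near-zero alignment.

```python
import numpy as np


def cosine_alignment(feedback, bp_grad):
    """Row-wise cosine between two (n, d) arrays; returns (mean, std) over rows."""
    dot = np.sum(feedback * bp_grad, axis=-1)
    norms = np.linalg.norm(feedback, axis=-1) * np.linalg.norm(bp_grad, axis=-1)
    cos = dot / norms  # no epsilon clamp: a zero-norm row would yield nan
    return float(cos.mean()), float(cos.std())


rng = np.random.default_rng(0)
n, d_out, d_hidden = 256, 10, 512  # synthetic sizes, chosen only for illustration

e_T = rng.standard_normal((n, d_out))             # stand-in for the output error
B_train = rng.standard_normal((d_hidden, d_out))  # feedback matrix "used in training"
B_fresh = rng.standard_normal((d_hidden, d_out))  # independent redraw for the null

# Synthetic "BP gradient": partly aligned with the training feedback, plus noise.
bp_grad = e_T @ B_train.T + 10.0 * rng.standard_normal((n, d_hidden))

cos_train_mean, cos_train_std = cosine_alignment(e_T @ B_train.T, bp_grad)
cos_fresh_mean, cos_fresh_std = cosine_alignment(e_T @ B_fresh.T, bp_grad)

print(f"training B: {cos_train_mean:+.3f} (±{cos_train_std:.3f})")
print(f"fresh B:    {cos_fresh_mean:+.3f} (±{cos_fresh_std:.3f})")
```

In this toy setup the training matrix gives a clearly positive mean cosine while the fresh matrix stays close to zero. Skipping the clamp keeps the value undistorted when gradient norms are tiny (on the order of 1e-7 in the measurements above), at the cost of producing nan if a norm is exactly zero.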