faeval.git/results/dfa_pen_short, branch master

Commit dfa_pen_short lam=1e-4 s123/s456 JSONs (auditable source for §5 ¶2)

2026-04-08T23:57:48+00:00

The §5 ¶2 lambda sweep claim "at λ=1e-4, three-seed mean ‖h_L‖≈2.2e4
and ‖g_L‖≈7.0e-7" depends on these three files:
  results/dfa_pen_short/dfa_pen_lam0.0001_s42.json (already committed)
  results/dfa_pen_short/dfa_pen_lam0.0001_s123.json (this commit)
  results/dfa_pen_short/dfa_pen_lam0.0001_s456.json (this commit)

The s123 and s456 files were untracked. Committing them as part of
the auditable source set for the §5 ¶2 lambda-sweep claim.
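
A minimal sketch of how the three-seed mean could be recomputed from the
three files listed above. The file-name pattern is taken from this commit;
the JSON key passed to `three_seed_mean` is hypothetical, since the schema
of these files is not shown here.

```python
import json
from pathlib import Path
from statistics import mean

SEEDS = (42, 123, 456)


def seed_paths(root, lam="0.0001", seeds=SEEDS):
    """Build the per-seed result paths, following the naming used in this commit."""
    return [Path(root) / f"dfa_pen_lam{lam}_s{s}.json" for s in seeds]


def three_seed_mean(paths, key):
    """Average one scalar field across the per-seed JSON files.

    `key` is an assumed field name; substitute whatever the real files use.
    """
    values = []
    for p in paths:
        with open(p) as f:
            values.append(float(json.load(f)[key]))
    return mean(values)
```

Calling `three_seed_mean(seed_paths("results/dfa_pen_short"), key)` with the
real field name should give the figure quoted in §5 ¶2.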

Co-Authored-By: Claude Opus 4.6 (1M context)

λ sweep on penalty strength: lam ∈ {1e-4, 1e-2, 1e-1} cos + rho results

2026-04-08T08:22:36+00:00

Round 19's #5 recommendation. Major new finding for the paper:

| lam   | acc   | ‖h_L‖   | ‖g_2‖   | deep cos | deep rho |
|-------|------:|--------:|--------:|---------:|---------:|
| 0     | 0.308 | 4e8     | 5e-10   | -0.008   | -0.003   |
| 1e-4  | 0.359 | 2.4e4   | 6.3e-7  | -0.022   | -0.004   |
| 1e-2  | 0.363 | 4e4     | 1e-6    | +0.155   | +0.080   |
| 1e-1  | 0.349 | 1.2e4   | 1.6e-6  | +0.131   | +0.067   |

KEY: at lam=1e-4 the residual stream is contained AND ||g|| is healthy
(mode 1 ALLEVIATED), but deep cos and rho are still essentially zero
(mode 2 NOT alleviated). This is an independent dissociation of the two
modes via penalty strength: at weak penalty you get mode 1 fix WITHOUT
mode 2 fix.

Both metrics (cos, rho) agree at every lambda. Penalty strength has a
non-monotonic effect on mode 2 alleviation:
  - lam=1e-4: too weak, mode 2 not alleviated (cos ~0)
  - lam=1e-2: sweet spot, cos +0.16, rho +0.08
  - lam=1e-1: slightly over-constrained, cos +0.13, rho +0.07

This is the 7th independent validation of the two-mode separation, and
the strongest one because it shows mode 1 alleviation WITHOUT mode 2
alleviation — the modes do not even respond to the same intervention
strength.
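
A small sketch of the dissociation argument applied to the table above. The
numbers are copied from the table; the three thresholds (`H_MAX`, `G_MIN`,
`ALIGN_MIN`) are illustrative assumptions chosen only to separate the rows,
not values used anywhere in the experiments.

```python
# lam -> metrics, copied from the sweep table in this commit message.
SWEEP = {
    0.0:  {"acc": 0.308, "h_norm": 4e8,   "g_norm": 5e-10,  "cos": -0.008, "rho": -0.003},
    1e-4: {"acc": 0.359, "h_norm": 2.4e4, "g_norm": 6.3e-7, "cos": -0.022, "rho": -0.004},
    1e-2: {"acc": 0.363, "h_norm": 4e4,   "g_norm": 1e-6,   "cos": 0.155,  "rho": 0.080},
    1e-1: {"acc": 0.349, "h_norm": 1.2e4, "g_norm": 1.6e-6, "cos": 0.131,  "rho": 0.067},
}

# Assumed cut-offs, for illustration only.
H_MAX = 1e6       # residual stream counts as "contained" below this norm
G_MIN = 1e-8      # gradient norm counts as "healthy" above this
ALIGN_MIN = 0.05  # deep cos and rho both above this counts as "aligned"


def classify(row):
    """Return (mode1_alleviated, mode2_alleviated) for one sweep row."""
    mode1 = row["h_norm"] < H_MAX and row["g_norm"] > G_MIN
    mode2 = row["cos"] > ALIGN_MIN and row["rho"] > ALIGN_MIN
    return mode1, mode2


STATUS = {lam: classify(row) for lam, row in SWEEP.items()}
```

Under these assumed thresholds, lam=1e-4 is the only row where mode 1 is
alleviated and mode 2 is not, which is the dissociation described above.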

3-seed multi-seed verification of penalized DFA deep cos = +0.17

2026-04-08T06:31:16+00:00

| seed | l0 | l1 | l2 | l3 | l4 | layer-mean |
|---|---:|---:|---:|---:|---:|---:|
| 42 | +0.316 | +0.169 | +0.151 | +0.165 | +0.166 | +0.193 |
| 123 | +0.333 | +0.093 | +0.155 | +0.178 | +0.177 | +0.187 |
| 456 | +0.339 | +0.131 | +0.123 | +0.150 | +0.150 | +0.179 |

3-seed mean deep cos (l1-l4): ~0.151 ± 0.025
3-seed layer-mean: +0.186 ± 0.007
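
The aggregates can be recomputed from the table with the sketch below. The
per-layer values are copied from the table; using the sample standard
deviation across seeds for the spread is an assumption, since the commit
does not say which estimator produced the ± figures.

```python
from statistics import mean, stdev

# seed -> per-layer cosine [l0, l1, l2, l3, l4], copied from the table.
COS = {
    42:  [0.316, 0.169, 0.151, 0.165, 0.166],
    123: [0.333, 0.093, 0.155, 0.178, 0.177],
    456: [0.339, 0.131, 0.123, 0.150, 0.150],
}

# Mean over all five layers, per seed (the table's last column).
layer_means = {seed: mean(vals) for seed, vals in COS.items()}

# Mean over the deep layers only (l1-l4), per seed.
deep_means = {seed: mean(vals[1:]) for seed, vals in COS.items()}

# Across-seed aggregates.
layer_mean_3seed = mean(layer_means.values())
layer_std_3seed = stdev(layer_means.values())  # assumed: sample std over seeds
deep_mean_3seed = mean(deep_means.values())
```

The per-seed layer means reproduce the table's last column (+0.193, +0.187,
+0.179 after rounding).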

The +0.17 finding is rock-solid, combined with:
  - null calibration: training-Bs +0.16 vs fresh-Bs +0.002
  - hypothesis B confirmed: vanilla DFA early-epoch deep cos ~0
  - 3-seed reproducibility (this commit)

This is the §4 evidence for the paper's 'penalty creates partial deep
alignment, partially alleviating mode 2'.

MAJOR: penalized DFA deep-layer cosine is +0.17, NOT zero

2026-04-08T05:47:38+00:00

Direct deep-block credit measurement on penalized DFA s42 checkpoint
(lam=1e-2, 30 epochs, just trained):

  per-layer cos(e_T B^T, BP grad) — TRAINING Bs, no eps clamp:
    l0: +0.316  (±0.188)  ||g||=9.18e-7  ||a||=4.53
    l1: +0.169  (±0.087)  ||g||=8.87e-7  ||a||=4.57
    l2: +0.151  (±0.084)  ||g||=8.77e-7  ||a||=4.50
    l3: +0.165  (±0.099)  ||g||=8.73e-7  ||a||=4.64
    l4: +0.166  (±0.098)  ||g||=8.69e-7  ||a||=4.64
  layer-mean: +0.193

Compare to vanilla DFA (existing measurement, scale-broken regime):
    l0: +0.42  l1-4: ~0 (essentially zero)
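
For reference, a pure-Python sketch of the kind of per-layer measurement
reported above: the cosine between the DFA feedback signal e B^T and the BP
gradient, averaged over samples. The shapes (e of length C, B of shape
H x C), the function names, and the use of a population standard deviation
for the ± figure are assumptions, not the measurement script itself.

```python
import math
from statistics import mean, pstdev


def dfa_signal(e, B):
    """Project output error e (length C) through feedback matrix B (H x C).

    Returns e @ B^T, a vector of length H.
    """
    return [sum(b * x for b, x in zip(row, e)) for row in B]


def cosine(u, v):
    """Plain cosine similarity with no eps clamp on the denominator.

    Raises ZeroDivisionError if either vector has zero norm.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layer_cos(errors, B, bp_grads):
    """Mean and spread of per-sample cos(e B^T, BP grad) for one layer."""
    cosines = [cosine(dfa_signal(e, B), g) for e, g in zip(errors, bp_grads)]
    return mean(cosines), pstdev(cosines)
```

Leaving out the eps clamp matters when gradient norms are around 1e-7: a
clamp of comparable size in the denominator would shrink the reported cosine
toward zero.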

CRITICAL INTERPRETATION: The penalty doesn't just fix scale, it ALSO
restores deep-layer direction quality from ~0 to ~0.17. This contradicts
the prior 'two failure modes' framing where I assumed direction would
remain broken even after scale fix. The honest story is:

  - vanilla DFA: scale catastrophic, BP grad at floor, cosine measurement
    DEGENERATE (cos ~0 is noise dominance, not 'no alignment')
  - penalized DFA: scale fixed, BP grad healthy, cosine measurement
    INTERPRETABLE — and the value is +0.17 on deep layers (partially
    aligned, much less than BP's self-cosine of 1.0)
  - the +0.17 alignment explains why penalized DFA gets 0.36 (60% of
    BP's 0.61) — partial credit gives partial training, not zero training

The 'second failure mode' claim is wrong. There's ONE unified failure
mode (scale + measurement degeneracy), and the penalty rescues BOTH.
The remaining gap to BP is 'partial credit quality', not a separate
failure mode.