|
Round 19's #5 recommendation. Major new finding for the paper:
| lam | acc | ‖h_L‖ | ‖g_2‖ | deep cos | deep rho |
|-------|------:|--------:|--------:|---------:|---------:|
| 0 | 0.308 | 4e8 | 5e-10 | -0.008 | -0.003 |
| 1e-4 | 0.359 | 2.4e4 | 6.3e-7 | -0.022 | -0.004 |
| 1e-2 | 0.363 | 4e4 | 1e-6 | +0.155 | +0.080 |
| 1e-1 | 0.349 | 1.2e4 | 1.6e-6 | +0.131 | +0.067 |
KEY: at lam=1e-4 the residual stream is contained (||h_L|| 4e8 -> 2.4e4)
AND ||g_2|| is healthy (5e-10 -> 6.3e-7), so mode 1 is ALLEVIATED, but
deep cos and rho are still essentially zero (mode 2 NOT alleviated).
This is an independent dissociation of the two modes via penalty
strength: a weak penalty gives the mode 1 fix WITHOUT the mode 2 fix.
Both metrics (cos, rho) agree at every lambda. Penalty strength has a
non-monotonic effect on mode 2 alleviation:
- lam=1e-4: too weak, mode 2 not alleviated (cos ~0)
- lam=1e-2: sweet spot, cos +0.16, rho +0.08
- lam=1e-1: slightly over-constrained, cos +0.13, rho +0.07
This is the 7th independent validation of the two-mode separation, and
the strongest one, because it shows mode 1 alleviation WITHOUT mode 2
alleviation: the modes do not even respond to the same intervention
strength.
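The dissociation can be read off the sweep mechanically. The sketch below copies the table rows and applies three cut-offs; `H_MAX`, `G_MIN` and `COS_MIN` are hypothetical thresholds chosen only to separate these four rows, not values used in the experiments.

```python
# Sweep rows copied from the table above:
# (lam, acc, ||h_L||, ||g_2||, deep cos, deep rho)
SWEEP = [
    (0.0,  0.308, 4e8,   5e-10,  -0.008, -0.003),
    (1e-4, 0.359, 2.4e4, 6.3e-7, -0.022, -0.004),
    (1e-2, 0.363, 4e4,   1e-6,   +0.155, +0.080),
    (1e-1, 0.349, 1.2e4, 1.6e-6, +0.131, +0.067),
]

# Hypothetical cut-offs (assumptions, picked to separate the rows above).
H_MAX = 1e6      # residual stream counts as "contained" below this norm
G_MIN = 1e-8     # BP gradient counts as "healthy" above this norm
COS_MIN = 0.05   # deep cosine counts as "aligned" above this value


def classify(row):
    """Return (mode1_fixed, mode2_fixed) for one sweep row."""
    lam, acc, h_norm, g_norm, deep_cos, deep_rho = row
    mode1_fixed = h_norm < H_MAX and g_norm > G_MIN
    mode2_fixed = deep_cos > COS_MIN
    return mode1_fixed, mode2_fixed


status = {row[0]: classify(row) for row in SWEEP}
for lam, (m1, m2) in status.items():
    print(f"lam={lam:g}: mode1_fixed={m1} mode2_fixed={m2}")
```

Under these cut-offs only lam=1e-4 lands in the (mode 1 fixed, mode 2 not fixed) cell, which is the dissociation claimed above.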
|
|
Direct deep-block credit measurement on penalized DFA s42 checkpoint
(lam=1e-2, 30 epochs, just trained):
per-layer cos(e_T B^T, BP grad), using the TRAINING Bs, no eps clamp:
l0: +0.316 (±0.188) ||g||=9.18e-7 ||a||=4.53
l1: +0.169 (±0.087) ||g||=8.87e-7 ||a||=4.57
l2: +0.151 (±0.084) ||g||=8.77e-7 ||a||=4.50
l3: +0.165 (±0.099) ||g||=8.73e-7 ||a||=4.64
l4: +0.166 (±0.098) ||g||=8.69e-7 ||a||=4.64
layer-mean: +0.193
Compare to vanilla DFA (existing measurement, scale-broken regime):
l0: +0.42 l1-4: ~0 (essentially zero)
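A minimal sketch of how a per-layer number of this kind could be computed, in pure Python. It assumes the cosine is taken per sample and reported as mean (±std) over the batch, and that B has shape (hidden, n_out); both are assumptions, and the function names and toy data are hypothetical rather than the measurement script behind the numbers above.

```python
import math
import random
import statistics


def feedback_signal(e, B):
    """DFA teaching signal e B^T for one sample; B has shape (hidden, n_out)."""
    return [sum(b_ij * e_j for b_ij, e_j in zip(row, e)) for row in B]


def cosine(u, v):
    """Plain cosine with no eps clamp on the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def layer_cosine(errors, B, bp_grads):
    """Mean and std over samples of cos(e B^T, BP grad) for one layer."""
    cos = [cosine(feedback_signal(e, B), g) for e, g in zip(errors, bp_grads)]
    return statistics.mean(cos), statistics.stdev(cos)


# Toy check on synthetic data (not experiment data).
rng = random.Random(0)
n_out, hidden, batch = 10, 64, 32
B = [[rng.gauss(0, 1) for _ in range(n_out)] for _ in range(hidden)]
errors = [[rng.gauss(0, 1) for _ in range(n_out)] for _ in range(batch)]
# Synthetic "BP gradient": the feedback signal rescaled to a tiny norm,
# so the cosine is 1 by construction regardless of scale.
bp_grads = [[1e-7 * x for x in feedback_signal(e, B)] for e in errors]
mean_cos, std_cos = layer_cosine(errors, B, bp_grads)
```

Because no eps clamp is applied, the result depends only on direction: rescaling the gradient by 1e-7 leaves the cosine at 1.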
CRITICAL INTERPRETATION: The penalty doesn't just fix scale; it ALSO
restores deep-layer direction quality from ~0 to +0.15..+0.17 (l1-l4
above). This contradicts the prior 'two failure modes' framing, where I
assumed direction would remain broken even after the scale fix. The
honest story is:
- vanilla DFA: scale catastrophic, BP grad at floor, cosine measurement
DEGENERATE (cos ~0 is noise dominance, not 'no alignment')
- penalized DFA: scale fixed, BP grad healthy, cosine measurement
INTERPRETABLE, and the value is about +0.17 on deep layers (partially
aligned, far below BP's self-cosine of 1.0)
- the +0.17 alignment explains why penalized DFA gets 0.36 (~60% of
BP's 0.61): partial credit gives partial training, not zero training
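A toy illustration of the noise-dominance reading, under loudly stated assumptions: a fixed additive noise floor `NOISE_NORM` and dimension `d` are hypothetical, and the true alignment is set to 0.17 by construction. It only shows that the same underlying alignment can read as ~0 once the gradient norm drops below the noise floor; it does not establish that this is what happens in vanilla DFA.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(u):
    n = math.sqrt(dot(u, u))
    return [a / n for a in u]


def cosine(u, v):
    # Plain cosine, no eps clamp on the norms.
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))


rng = random.Random(0)
d = 4096            # hypothetical vector dimension
TRUE_COS = 0.17     # alignment built into the toy gradient direction
NOISE_NORM = 1e-8   # hypothetical measurement noise floor

# Feedback direction s, and a unit gradient direction at cosine TRUE_COS to s.
s = unit([rng.gauss(0, 1) for _ in range(d)])
r = [rng.gauss(0, 1) for _ in range(d)]
proj = dot(r, s)
r = unit([a - proj * b for a, b in zip(r, s)])  # part of r orthogonal to s
g_dir = [TRUE_COS * a + math.sqrt(1 - TRUE_COS ** 2) * b for a, b in zip(s, r)]


def measured_cos(grad_norm):
    """Cosine between s and a gradient of the given norm plus fixed-size noise."""
    noise = [rng.gauss(0, NOISE_NORM / math.sqrt(d)) for _ in range(d)]
    g = [grad_norm * a + n for a, n in zip(g_dir, noise)]
    return cosine(s, g)


cos_healthy = measured_cos(9e-7)   # roughly ||g|| of the penalized checkpoint
cos_floor = measured_cos(5e-10)    # ||g_2|| of the lam=0 row
```

With the gradient well above the assumed noise floor the measured cosine stays near 0.17; at the 5e-10 floor it collapses toward zero even though the built-in alignment is unchanged.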
The 'second failure mode' claim is wrong. There's ONE unified failure
mode (scale + measurement degeneracy), and the penalty rescues BOTH.
The remaining gap to BP is 'partial credit quality', not a separate
failure mode.
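The summary figures quoted in this entry can be re-derived from the per-layer values above. This is an arithmetic check only; the variable names are hypothetical.

```python
# Per-layer cosines copied from the measurement above.
layer_cos = {"l0": 0.316, "l1": 0.169, "l2": 0.151, "l3": 0.165, "l4": 0.166}

# Mean over all five layers (the reported layer-mean).
layer_mean = sum(layer_cos.values()) / len(layer_cos)   # ~0.1934

# Mean over the deep layers l1-l4 only.
deep = [v for k, v in layer_cos.items() if k != "l0"]
deep_mean = sum(deep) / len(deep)                       # ~0.1628

# Penalized DFA accuracy as a fraction of BP accuracy.
acc_ratio = 0.36 / 0.61                                 # ~0.590

print(f"layer-mean={layer_mean:.4f} deep-mean={deep_mean:.4f} acc ratio={acc_ratio:.3f}")
```

The layer-mean reproduces the reported +0.193; the deep-only mean is slightly lower because l0 pulls the five-layer average up.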
|