|
Cross-metric disambiguation confirmation. Vanilla DFA at ep 1
(meaningful regime, ||g||~6e-7) deep rho across 3 seeds:
s42: deep rho -0.008
s123: deep rho +0.000
s456: deep rho -0.000
mean: -0.003 ± 0.005
Compare to penalized DFA 3-seed: deep rho +0.080 ± 0.011.
The disambiguation (penalty CREATES alignment, not just reveals it) is
now confirmed by TWO independent metrics:
- cos: vanilla -0.008 ± 0.013, penalized +0.155 ± 0.025
- rho: vanilla -0.003 ± 0.005, penalized +0.080 ± 0.011
Both metrics agree on the vanilla→penalized transition. The l0 (embedding)
rho is high (~0.25-0.29) at every vanilla checkpoint, mirroring the l0 cos
of +0.42: by BOTH metrics the embedding layer is genuinely useful while the
deep blocks are not. The penalty lifts deep usefulness to ~+0.08 rho /
~+0.16 cos.
Cross-metric agreement rules out single-metric artifacts on either side.
|
|
Codex round 19 said: 'use nudging or perturbation correlation on the
penalized checkpoints. In the healthy-gradient regime, that is a more
direct is-the-local-signal-useful test than cosine alone'.
Results on existing checkpoints (eps=1e-3, M=32 random directions, n=1024):
vanilla DFA s42: deep rho +0.002
penalized DFA s42 lam=1e-2 30ep: deep rho +0.094
penalized DFA s123 lam=1e-2 30ep: deep rho +0.073
penalized DFA s456 lam=1e-2 30ep: deep rho +0.072
penalized 3-seed mean: deep rho +0.080 ± 0.011
This INDEPENDENTLY TRIANGULATES the cos +0.17 finding via a different
metric:
- vanilla deep cos ~0 matches vanilla deep rho ~0
- penalized deep cos +0.155 matches penalized deep rho +0.080
The two metrics measure different things:
- cos = directional alignment with BP grad
- rho = correlation between predicted and true loss change under
random perturbation
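The rho definition above can be sketched in pure Python. This is a
minimal sketch under stated assumptions, not the estimator that produced
the numbers in this log: it assumes rho is a Pearson correlation computed
across samples for each random unit direction, then averaged over the M
directions, and it swaps the real model for a hypothetical toy per-sample
linear-regression loss. `perturbation_rho`, `per_sample_loss`,
`per_sample_grad` and `local_signal` are illustrative names; in the real
setup `local_signal` would be the DFA pseudo-gradient of a deep block.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns nan if either input has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def per_sample_loss(w, x, y):
    # Hypothetical toy per-sample loss: 0.5 * (w.x - y)^2.
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * r * r


def per_sample_grad(w, x, y):
    # Exact per-sample gradient of the toy loss: (w.x - y) * x.
    r = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [r * xi for xi in x]


def perturbation_rho(w, data, local_signal, eps=1e-3, num_dirs=32, seed=0):
    """For each random unit direction d, correlate (across samples) the
    predicted loss change eps * <signal_i, d> with the true change
    L_i(w + eps*d) - L_i(w); return the mean correlation over directions."""
    rng = random.Random(seed)
    rhos = []
    for _ in range(num_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in w]
        norm = math.sqrt(sum(di * di for di in d))
        d = [di / norm for di in d]  # unit-norm random direction
        w_pert = [wi + eps * di for wi, di in zip(w, d)]
        predicted, true = [], []
        for x, y in data:
            g = local_signal(w, x, y)
            predicted.append(eps * sum(gi * di for gi, di in zip(g, d)))
            true.append(per_sample_loss(w_pert, x, y) - per_sample_loss(w, x, y))
        rhos.append(pearson(predicted, true))
    return sum(rhos) / len(rhos)


if __name__ == "__main__":
    rng = random.Random(1)
    w = [rng.gauss(0.0, 1.0) for _ in range(5)]
    data = [([rng.gauss(0.0, 1.0) for _ in range(5)], rng.gauss(0.0, 1.0))
            for _ in range(256)]
    print(perturbation_rho(w, data, per_sample_grad))  # near +1
```

The defaults mirror the logged eps=1e-3 and M=32. Plugging in the exact
per-sample gradient gives rho near +1 and its negation gives rho near -1;
a signal unrelated to the loss should land near 0, which is the vanilla
deep-block pattern.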
Both show the same pattern: the penalty creates partial usefulness where
there was essentially none. This is the 6th independent validation of the
mode 2 'penalty creates partial alignment' framing.
Crucially, rho does not go through F.cosine_similarity (so no eps clamp),
and it measures sample-level loss-change correlation rather than
directional match. That rules out the worry that 'cos is capturing some
directional artifact unrelated to local usefulness'.
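To show the mechanism behind the eps-clamp concern, here is a pure-Python
sketch contrasting plain cosine with a norm-clamped cosine. The clamped
form is an assumption modeled on recent PyTorch documentation for
F.cosine_similarity (each norm clamped below at eps, default eps=1e-8);
older releases clamped the product of the norms instead, so check the
version in use. `cos_unclamped` and `cos_norm_clamped` are hypothetical
helper names.

```python
import math


def cos_unclamped(a, b):
    """Plain cosine <a, b> / (||a|| * ||b||); raises ZeroDivisionError on a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cos_norm_clamped(a, b, eps=1e-8):
    """Cosine with each norm clamped below at eps (assumed PyTorch-style form)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = max(math.sqrt(sum(x * x for x in a)), eps)
    nb = max(math.sqrt(sum(y * y for y in b)), eps)
    return dot / (na * nb)


if __name__ == "__main__":
    tiny = [1e-9, 0.0]  # norm below eps: clamp is active
    print(cos_unclamped(tiny, tiny), cos_norm_clamped(tiny, tiny))  # ~1.0 vs ~0.01
    g = [6e-7, 0.0]     # norm above eps: clamp is inactive
    print(cos_unclamped(g, g), cos_norm_clamped(g, g))  # both ~1.0
```

With both vectors at norm 1e-9 the clamp shrinks a perfect alignment of
1.0 to about 0.01, while a vector at the ~6e-7 norm noted for ep 1 is
untouched by a per-norm clamp. A clamp-free metric such as rho is
therefore a useful cross-check on near-zero cos values.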
|