|
Codex round 19's #1 critical control. Result on penalized DFA s42 (lam=1e-2, 30 ep):
training-Bs deep-layer cos: +0.1627
fresh-Bs deep-layer cos: +0.0022 ± 0.0220 (n=20 draws)
The +0.17 measurement is REAL signal, not artifact. The network specifically
adapted to its training-time Bs during the penalized run. Fresh Bs give
essentially zero cosine (within noise).
This validates the walk-back interpretation: in the rescued regime where
||g_l|| is meaningful, DFA's local credit signal shows partial alignment
with BP grad — and this alignment is specifically the network learning to
align with its specific Bs.
Round 19 caveat preserved: cannot yet distinguish whether the alignment
was always present in vanilla but hidden by measurement degeneracy, OR
whether it was created by the penalty intervention. The early-epoch
vanilla checkpoint sweep (round 19's other proposed control) would
disambiguate.
|