faeval.git - Unnamed repository; edit this file 'description' to name the repository.

author	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-08 02:07:26 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-08 02:07:26 -0500
commit	a868b29e4c399a3a948e85737e7a632001481969 (patch)
tree	48b1e9d527462135aee3658b2603c0b547f7b160 /results/confirmatory/clean_sparsity/synth_dfa_s1024_a1.0_L8.json
parent	8bf53ab94ac31c7672d23e2edf0e40c787b157d4 (diff)

Add perturbation correlation audit (round 19's recommended alt metric)

Codex round 19 said: 'use nudging or perturbation correlation on the penalized checkpoints. In the healthy-gradient regime, that is a more direct is-the-local-signal-useful test than cosine alone'.

Result on existing checkpoints (eps=1e-3, M=32 random directions, n=1024):
  vanilla DFA s42:                     deep rho +0.002
  penalized DFA s42 lam=1e-2 30ep:     deep rho +0.094
  penalized DFA s123 lam=1e-2 30ep:    deep rho +0.073
  penalized DFA s456 lam=1e-2 30ep:    deep rho +0.072
  penalized 3-seed mean:               deep rho +0.080 ± 0.011

This INDEPENDENTLY TRIANGULATES the cos +0.17 finding via a different metric:
  - vanilla deep cos ~0 matches vanilla deep rho ~0
  - penalized deep cos +0.155 matches penalized deep rho +0.080

The two metrics measure different things:
  - cos = directional alignment with BP grad
  - rho = correlation between predicted and true loss change under random perturbation

Both show the same pattern: penalty creates partial usefulness from essentially zero. This is the 6th independent validation of the mode 2 'penalty creates partial alignment' framing.

Crucially, rho doesn't use F.cosine_similarity (no eps clamp), and it measures sample-level loss change correlation rather than direction match, so it rules out 'cos is capturing some directional artifact unrelated to local usefulness'.
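
For readers unfamiliar with the metric, here is a minimal sketch of a perturbation correlation. It is not the audit script from this repository. It assumes that rho is the Pearson correlation between the first-order predicted loss change, eps * <g_est, d>, and the measured loss change, L(w + eps*d) - L(w), over M random Gaussian directions d. The names perturbation_rho, pearson and toy_loss are hypothetical, and the quadratic loss stands in for a real network.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def perturbation_rho(loss_fn, w, g_est, eps=1e-3, num_dirs=32, seed=0):
    """Correlate predicted and measured loss changes under random perturbations.

    g_est is the gradient estimate being audited (for example a DFA signal).
    Assumption: directions are i.i.d. standard Gaussian; the real audit may differ.
    """
    rng = random.Random(seed)
    base = loss_fn(w)
    predicted, measured = [], []
    for _ in range(num_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in w]
        # First-order prediction of the loss change from the audited signal.
        predicted.append(eps * sum(gi * di for gi, di in zip(g_est, d)))
        # Loss change actually observed after the perturbation.
        w_pert = [wi + eps * di for wi, di in zip(w, d)]
        measured.append(loss_fn(w_pert) - base)
    return pearson(predicted, measured)


def toy_loss(w):
    """Toy quadratic loss whose exact gradient is w itself."""
    return 0.5 * sum(wi * wi for wi in w)


w = [1.0, -2.0, 0.5, 3.0]
true_grad = list(w)
rho_true = perturbation_rho(toy_loss, w, true_grad)
rho_flipped = perturbation_rho(toy_loss, w, [-gi for gi in true_grad])
print(f"rho with exact gradient:   {rho_true:+.3f}")
print(f"rho with flipped gradient: {rho_flipped:+.3f}")
```

In this toy setting an exact gradient gives rho close to +1 and a sign-flipped one gives rho close to -1, so a signal unrelated to the loss would be expected to sit near 0. This is the sense in which the deep rho values above are read.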

Diffstat (limited to 'results/confirmatory/clean_sparsity/synth_dfa_s1024_a1.0_L8.json')

0 files changed, 0 insertions, 0 deletions

