| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:21:32 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 23:21:32 -0500 |
| commit | 8f67bdeebac543961871b9896a62cd07b7a5be26 | |
| tree | 63fec268bf894b61875ccf90e173af4e4264cb81 /results/confirmatory/clean_sparsity/synth_bp_s42_a1.0_L4.json | |
| parent | 5771a122300f9d30a6290fcbfc9bffb5f380e648 | |
Add fast direction-quality measurement on existing DFA checkpoints
3-seed result on the existing dfa_s{42,123,456}.pt checkpoints from
results/confirmatory/checkpoints_A2/: the per-layer cosine similarity between
DFA's local credit signal e_T @ B_l^T and the true BP gradient at h_l.
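
A minimal sketch of the per-layer measurement described above, written in plain Python so it runs without the checkpoints. It is an illustration under stated assumptions, not the script from this commit: the helper names (`matmul_bt`, `cosine`, `per_layer_cosine`), the shapes (`e_T` as batch x d_out, each `B_l` as d_hidden x d_out), and the choice to flatten the whole batch into one vector before taking the cosine are all hypothetical.

```python
import math

def matmul_bt(e, B):
    """Return e @ B^T for e of shape (batch, d_out) and B of shape (d_hidden, d_out)."""
    return [[sum(x * y for x, y in zip(e_row, b_row)) for b_row in B] for e_row in e]

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two matrices, each flattened into a single vector."""
    fa = [x for row in a for x in row]
    fb = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(fa, fb))
    na = math.sqrt(sum(x * x for x in fa))
    nb = math.sqrt(sum(y * y for y in fb))
    return dot / (na * nb + eps)  # eps guards against an all-zero signal

def per_layer_cosine(e_T, Bs, bp_grads):
    """One cosine per layer: DFA signal e_T @ B_l^T vs. the BP gradient at h_l."""
    return [cosine(matmul_bt(e_T, B), g) for B, g in zip(Bs, bp_grads)]

# Toy data (made up for illustration): one sample, d_out = 2, d_hidden = 3.
e_T = [[1.0, 0.0]]
Bs = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # e_T @ B^T = [1, 0, 1]
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # e_T @ B^T = [1, 0, -1]
]
bp_grads = [
    [[2.0, 0.0, 2.0]],  # parallel to the first DFA signal
    [[1.0, 0.0, 1.0]],  # orthogonal to the second DFA signal
]
cos_per_layer = per_layer_cosine(e_T, Bs, bp_grads)
print(cos_per_layer)
```

In the toy data the first layer's feedback signal is parallel to its gradient and the second is orthogonal, which mirrors the aligned versus essentially-zero cases reported in this commit.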
Key findings:
  per-layer cos (3-seed mean):
    l0: +0.42 (high; embedding alignment)
    l1: +0.006 (essentially zero)
    l2: -0.015 (essentially zero)
    l3: -0.004 (essentially zero)
    l4: -0.004 (essentially zero)
  layer-mean across all 5: +0.07 to +0.10
The deep blocks (l1-l4) have essentially zero alignment with the BP gradient
in the vanilla scale-failure regime. Layer 0 dominates the headline layer-mean.
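
As a quick arithmetic check on the claim that layer 0 dominates, the rounded 3-seed means listed above can be averaged directly. The variable names below are illustrative. Because the inputs are the rounded figures from this message, the result only approximates whatever the script itself reports.

```python
# Rounded 3-seed per-layer means, copied from the commit message.
per_layer_cos = {"l0": 0.42, "l1": 0.006, "l2": -0.015, "l3": -0.004, "l4": -0.004}

n_layers = len(per_layer_cos)
layer_mean = sum(per_layer_cos.values()) / n_layers

# Mean over the deep blocks only (everything except l0).
deep = [v for k, v in per_layer_cos.items() if k != "l0"]
deep_mean = sum(deep) / len(deep)

# How much of the layer mean comes from l0 alone.
l0_contribution = per_layer_cos["l0"] / n_layers

print(f"layer mean:          {layer_mean:+.4f}")
print(f"deep-block mean:     {deep_mean:+.5f}")
print(f"l0 contribution:     {l0_contribution:+.4f}")
```

The layer mean comes out at roughly +0.08, inside the reported +0.07 to +0.10 range, and the l0 term alone contributes about +0.084. The four deep blocks together pull the mean down very slightly rather than adding to it.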
The script reconstructs the training-time random Bs by replaying the RNG
sequence (torch.manual_seed + ResidualMLP construction + randn draws),
since the existing checkpoints don't save Bs. For the still-running
direction-quality experiment, which DOES save Bs, the script auto-detects
the dict format and uses the saved Bs directly.
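
The replay idea can be sketched as follows. To stay self-contained this uses Python's `random.Random` as a stand-in for `torch.manual_seed` and `randn`, so it shows the principle and not the actual script. The function names, the layer sizes, and the `"Bs"` checkpoint key are all hypothetical. The point is that the feedback matrices are only reproduced if every draw made before them (here, the stand-in for model construction) is replayed in the same order.

```python
import random

def build_model_and_Bs(seed, n_layers=5, d_hidden=4, d_out=3):
    """Replay the training-time draw order: model init first, then the Bs."""
    rng = random.Random(seed)  # stand-in for torch.manual_seed(seed)
    # Stand-in for model construction: weight init consumes RNG draws.
    weights = [[rng.gauss(0.0, 1.0) for _ in range(d_hidden * d_hidden)]
               for _ in range(n_layers)]
    # Feedback matrices are drawn afterwards, so they depend on all earlier draws.
    Bs = [[[rng.gauss(0.0, 1.0) for _ in range(d_out)] for _ in range(d_hidden)]
          for _ in range(n_layers)]
    return weights, Bs

def load_Bs(ckpt, seed):
    """Use saved Bs when the checkpoint has them, otherwise rebuild by replay."""
    if isinstance(ckpt, dict) and "Bs" in ckpt:  # hypothetical key name
        return ckpt["Bs"]
    _, Bs = build_model_and_Bs(seed)
    return Bs

# Pretend training run with seed 42.
_, Bs_train = build_model_and_Bs(42)

# Older-style checkpoint without Bs: replay recovers them exactly.
legacy_ckpt = {"layer0.weight": [0.0]}
Bs_replayed = load_Bs(legacy_ckpt, seed=42)

# Newer-style checkpoint with Bs saved: the seed is not needed.
new_ckpt = {"model": {}, "Bs": Bs_train}
Bs_saved = load_Bs(new_ckpt, seed=0)

# Skipping the model-construction draws yields different matrices.
rng = random.Random(42)
Bs_wrong_order = [[[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]
                  for _ in range(5)]

print(Bs_replayed == Bs_train, Bs_saved is Bs_train, Bs_wrong_order == Bs_train)
```

A replay like this is fragile: any change to the model constructor that adds or removes a random draw shifts the sequence, which is one reason saving the Bs in the checkpoint is the more robust option.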
Diffstat (limited to 'results/confirmatory/clean_sparsity/synth_bp_s42_a1.0_L4.json')
0 files changed, 0 insertions, 0 deletions
