author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 12:22:58 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 12:22:58 -0500
commit: c201cb31018b35bf88482f7dc768b8f7a057703b
tree: d12e4640b4d0abef34c73f0f667f8a0eb026f794 /experiments/snapshot_exploitability.py
parent: 35be969067396306c19a3caac2d887bcde48c5d0
Round 41 (Appendix L): add per-block drift diagnostic reinforcing cos-vs-acc hypothesis
Extracted from existing round 38 JSON data without running new compute. The drift field (||W_final - W_init||_F / ||W_init||_F) is produced by cifar_resmlp.py's feature_drift() and was already saved but not used in the paper.

Key finding: CB+penalty has LARGER block updates than SB+penalty (per-block w2 drift 19.3x vs 14.3x; embed drift 44.6x vs 7.1x) yet 9.3 pp LOWER accuracy. This rules out 'CB just has smaller updates' as an alternative explanation for the cos-vs-acc dissociation.

Added 2 sentences to Appendix L paragraph 2 noting this supporting evidence for the 'angular agreement does not certify functional forward-state content' mechanism hypothesis in §4. Main content still 9 pages exactly within E&D budget.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
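For readers who want to check the drift definition by hand, here is a minimal sketch of the relative Frobenius drift quoted in the message, ||W_final - W_init||_F / ||W_init||_F. This is not the code of feature_drift() in cifar_resmlp.py, which is not shown in this commit; the names `frobenius_norm` and `relative_drift` and the list-of-rows weight format are assumptions made for illustration only.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a 2-D matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def relative_drift(w_init, w_final):
    """Return ||W_final - W_init||_F / ||W_init||_F.

    Hypothetical helper: mirrors the formula in the commit message,
    not the actual feature_drift() implementation.
    """
    # Element-wise difference between final and initial weights.
    diff = [
        [f - i for i, f in zip(row_init, row_final)]
        for row_init, row_final in zip(w_init, w_final)
    ]
    denom = frobenius_norm(w_init)
    if denom == 0.0:
        raise ValueError("initial weights have zero norm; relative drift is undefined")
    return frobenius_norm(diff) / denom


# Toy weights (made up for the example, not data from the experiments).
w_init = [[3.0, 4.0]]
w_final = [[9.0, 12.0]]
drift = relative_drift(w_init, w_final)
```

Under this definition a value such as 19.3x means the update's Frobenius norm is 19.3 times the norm of the initial weights, so the ratio compares update size across blocks whose initial scales differ.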
Diffstat (limited to 'experiments/snapshot_exploitability.py')
0 files changed, 0 insertions, 0 deletions