summaryrefslogtreecommitdiff
path: root/experiments/resmlp_frozen_blocks_baseline.py
AgeCommit message (Collapse)Author
2026-04-26Find setting where both FA and DFA fail: d=512 L=2 ResMLPYurenHao0426
TASK COMPLETE: Found 3/10 seeds where BOTH FA and DFA fall below the frozen-blocks baseline while reporting positive cosine and nontrivial accuracy — proving that the standard evaluation pair can simultaneously miss both FA and DFA on the same setting. Setting: d=512 L=2 pre-LayerNorm ResMLP, CIFAR-10, 100 epochs Frozen baseline (3-seed mean): 0.349 Qualifying seeds: seed 1: DFA=0.298 (cos +0.206), FA=0.347 (cos +0.484) seed 2: DFA=0.297 (cos +0.179), FA=0.346 (cos +0.472) seed 5: DFA=0.296 (cos +0.194), FA=0.341 (cos +0.492) All qualifying cases have: - Both methods below frozen baseline ✓ - Both methods report positive aggregate cosine ✓ - Both methods above chance (~0.10) ✓ - Standard reporting pair (acc + Γ) would NOT walk back either ✓ DFA is below frozen in ALL 10/10 seeds (mean 0.300 ± 0.009). FA is below frozen in 3/10 seeds (mean across all 10: 0.370 ± 0.026). Also includes: - Frozen baselines for d=512 at L=2,4,8,12 × 3 seeds (12 runs) - resmlp_frozen_blocks_baseline.py patched with --num_blocks arg Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08Sync experiment+protocol scripts with v2.32 corrected control valuesYurenHao0426
The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as "compare to" comments in 5 helper scripts: experiments/bp_with_penalty_control.py experiments/dfa_residual_penalty_test.py experiments/resmlp_frozen_blocks_baseline.py protocol/examples/threshold_d_sensitivity.py protocol/examples/plot_penalty_rescue.py These are non-paper-input scripts (their output goes to stdout, not to the paper), so the stale values didn't cause numerical errors in the paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug came from exactly this kind of hardcoded "for-comparison" comment that was never measured. Updating them now to remove the same trap from future runs. Each script now references the matched 30-ep 3-seed values from results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep, results/ dfa_pen_short, and results/bp_with_penalty. protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale numbers — these are project scratch documents and not user-facing. Deferred to a separate sweep if needed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>