Sync experiment+protocol scripts with v2.32 corrected control values

The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as "compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to the paper), so the stale values didn't cause numerical errors in the paper itself. But the original v2.31 BP+pen=0.609 unsourced number bug came from exactly this kind of hardcoded "for-comparison" comment that was never measured. Updating them now to remove the same trap from future runs.

Each script now references the matched 30-ep 3-seed values from results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep, results/dfa_pen_short, and results/bp_with_penalty.

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale numbers; these are project scratch documents and not user-facing. Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 19:24:06 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 19:24:06 -0500
commit: 2fa24acae8bb7f8c026db2f7fdade4a29b640d8d (patch)
tree: 98bf266ac07a1d6974769262dff916553223612f /experiments/bp_with_penalty_control.py
parent: cebc4c4a81809a982a16dd07da41487aa2f30322 (diff)
1 files changed, 4 insertions, 4 deletions
diff --git a/experiments/bp_with_penalty_control.py b/experiments/bp_with_penalty_control.py
index b986dee..07ee1f1 100644
--- a/experiments/bp_with_penalty_control.py
+++ b/experiments/bp_with_penalty_control.py
@@ -117,10 +117,10 @@ def main():
     log = train_bp_with_penalty(m, train_loader, test_loader, dev, args.epochs, args.lr, args.wd, args.lam)
     final_acc = evaluate(m, test_loader, dev)
     print(f"\nFINAL test acc: {final_acc:.4f}", flush=True)
-    print(f"Compare to:")
-    print(f"  BP-trainable (3-seed mean):   0.609")
-    print(f"  Penalized DFA lam=1e-2:       0.363")
-    print(f"  DFA-shallow:                  0.349")
+    print(f"Compare to (matched 30-epoch 3-seed values, see paper v2.32):")
+    print(f"  BP-trainable no-pen (3-seed): 0.585 ± 0.001")
+    print(f"  Penalized DFA lam=1e-2:       0.360 ± 0.001")
+    print(f"  DFA-shallow (frozen blocks):  0.349 ± 0.002")
     margin = (final_acc - 0.349) * 100
     print(f"\nMargin vs DFA-shallow baseline: {margin:+.2f} pp")
     if margin > 25:
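
One way to avoid hardcoded "compare to" numbers altogether would be to compute them from the per-seed results at print time. The following is a minimal sketch, not code from this repository: the helper names `load_seed_accuracies` and `format_reference`, the one-JSON-file-per-seed layout, and the `final_acc` key are all assumptions, since this commit does not show how the results/ directories are structured. It also assumes the reported spread is the sample standard deviation over seeds, which the commit does not state.

```python
import json
import statistics
from pathlib import Path


def load_seed_accuracies(result_dir, key="final_acc"):
    """Collect one accuracy per seed from the JSON files in result_dir.

    ASSUMPTION: each seed wrote one JSON file containing `key`. The real
    layout of the results/ directories is not shown in this commit.
    """
    accs = []
    for path in sorted(Path(result_dir).glob("*.json")):
        with open(path) as f:
            accs.append(float(json.load(f)[key]))
    return accs


def format_reference(label, accs):
    """Format one 'compare to' line as mean ± sample std over seeds."""
    if not accs:
        # Report missing data instead of falling back to a remembered number.
        return f"  {label:<30}(no measured results found)"
    mean = statistics.mean(accs)
    # A single seed has no sample std; report 0.0 in that case.
    std = statistics.stdev(accs) if len(accs) > 1 else 0.0
    return f"  {label:<30}{mean:.3f} ± {std:.3f} (n={len(accs)})"
```

A script could then call, for example, `print(format_reference("BP-trainable no-pen:", load_seed_accuracies("results/bp_no_penalty_30ep")))`. When the directory holds no matching files, the line reports that nothing was measured rather than showing a stale value.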