From 2fa24acae8bb7f8c026db2f7fdade4a29b640d8d Mon Sep 17 00:00:00 2001
From: YurenHao0426 <Blackhao0426@gmail.com>
Date: Wed, 8 Apr 2026 19:24:06 -0500
Subject: Sync experiment+protocol scripts with v2.32 corrected control values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are non-paper-input scripts (their output goes to stdout, not to
the paper), so the stale values did not cause numerical errors in the
paper itself. But the unsourced BP+pen=0.609 number that v2.31 corrected
came from exactly this kind of hardcoded "for-comparison" comment, one
that was never measured. Update them now to remove the same trap from
future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep,
results/dfa_pen_short, and results/bp_with_penalty.
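
A possible follow-up, not part of this change, would be to compute the
comparison values from the results directories at run time instead of
copying them into each script. The sketch below assumes a hypothetical
layout of one JSON file per seed (seed_*.json) with the accuracy stored
under a "test_acc" key. Neither the file names, the key, nor the
load_condition helper exist in the repository; adapt them to the actual
results format.

```python
import json
import statistics
from pathlib import Path


def load_condition(results_dir, pattern="seed_*.json", key="test_acc"):
    """Return (mean, std, n_seeds) over per-seed result files.

    Hypothetical layout: one JSON file per seed, each holding a float
    accuracy under `key`. Adjust `pattern` and `key` to the real format.
    """
    accs = []
    for path in sorted(Path(results_dir).glob(pattern)):
        with open(path) as f:
            accs.append(float(json.load(f)[key]))
    if not accs:
        raise FileNotFoundError(
            f"no files matching {pattern!r} in {results_dir}"
        )
    # Sample standard deviation (n - 1 denominator); undefined for a
    # single seed, so report None in that case. Whether the existing
    # numbers use the sample or population form is not stated here.
    std = statistics.stdev(accs) if len(accs) > 1 else None
    return statistics.mean(accs), std, len(accs)
```

With something like this, a "compare to" value always traces back to
measured files, and a missing results directory fails loudly instead of
silently falling back to a stale constant.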

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still have stale
numbers. These are project scratch documents, not user-facing, so
updating them is deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 protocol/examples/threshold_d_sensitivity.py | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

(limited to 'protocol/examples/threshold_d_sensitivity.py')

diff --git a/protocol/examples/threshold_d_sensitivity.py b/protocol/examples/threshold_d_sensitivity.py
index d3f2c58..065efc7 100644
--- a/protocol/examples/threshold_d_sensitivity.py
+++ b/protocol/examples/threshold_d_sensitivity.py
@@ -22,13 +22,15 @@ REPO_ROOT = os.path.dirname(
 
 def main():
     # 3-seed mean accuracies on 4-block d=256 ResMLP CIFAR-10
+    # Updated v2.32 with matched 30-epoch controls
     conditions = [
-        ("BP-trainable",     0.609,  0.004),
-        ("DFA-shallow",      0.349,  0.002),
-        ("DFA-vanilla",      0.308,  0.014),
-        ("DFA-pen lam=1e-3", 0.372,  None),  # 1 seed
-        ("DFA-pen lam=1e-2", 0.363,  0.0007),
-        ("DFA-frozen-rand",  0.349,  0.002),
+        ("BP-trainable 100ep",     0.6147, 0.004),  # protocol_audit
+        ("BP-trainable 30ep",      0.585,  0.001),  # results/bp_no_penalty_30ep
+        ("BP+pen 30ep lam=1e-2",   0.532,  0.006),  # results/bp_with_penalty
+        ("DFA-shallow",            0.349,  0.002),  # frozen baseline
+        ("DFA-vanilla 100ep",      0.306,  0.006),  # protocol_audit
+        ("DFA-vanilla 30ep",       0.301,  0.005),  # results/dfa_no_penalty_30ep
+        ("DFA+pen 30ep lam=1e-2",  0.360,  0.001),  # results/dfa_pen_short
     ]
     shallow_acc = 0.349
 
-- 
cgit v1.2.3