Sync experiment+protocol scripts with v2.32 corrected control values

2026-04-09T00:24:06+00:00

The pre-v2.31 unsourced values BP=0.609 and DFA=0.308 (which v2.31 fixed
to 0.585 and 0.301 via matched 30-ep controls) were also hardcoded as
"compare to" comments in 5 helper scripts:

  experiments/bp_with_penalty_control.py
  experiments/dfa_residual_penalty_test.py
  experiments/resmlp_frozen_blocks_baseline.py
  protocol/examples/threshold_d_sensitivity.py
  protocol/examples/plot_penalty_rescue.py

These are not paper-input scripts (their output goes to stdout, not into
the paper), so the stale values caused no numerical errors in the paper
itself. However, the unsourced BP+pen=0.609 number that v2.31 had to fix
came from exactly this kind of hardcoded "for-comparison" comment, which
was never backed by a measurement. Updating them now removes the same
trap from future runs.

Each script now references the matched 30-ep 3-seed values from
results/bp_no_penalty_30ep, results/dfa_no_penalty_30ep,
results/dfa_pen_short, and results/bp_with_penalty.
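A minimal sketch of reading a control value from a results directory
instead of hardcoding it. It assumes one JSON log per seed holding a
`final_test_acc` field; that layout, the key, and the `control_value`
helper are hypothetical and not taken from the scripts listed above.
The point of the pattern is to fail loudly on a missing directory rather
than fall back to a remembered number.

```python
import json
import statistics
import tempfile
from pathlib import Path


def control_value(result_dir, key="final_test_acc"):
    """Return (mean, sample std, n) of `key` over per-seed JSON logs.

    Hypothetical layout: one *.json file per seed in result_dir. Raises
    instead of silently falling back to a hardcoded comparison number.
    """
    values = []
    for path in sorted(Path(result_dir).glob("*.json")):
        values.append(float(json.loads(path.read_text())[key]))
    if not values:
        raise FileNotFoundError(f"no per-seed JSON logs found in {result_dir}")
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), std, len(values)


if __name__ == "__main__":
    # Demo on synthetic placeholder logs (not real results).
    demo_dir = Path(tempfile.mkdtemp())
    for seed, acc in [(0, 0.5), (1, 0.7)]:
        (demo_dir / f"seed{seed}.json").write_text(json.dumps({"final_test_acc": acc}))
    mean, std, n = control_value(demo_dir)
    print(f"control: {mean:.3f} +/- {std:.3f} (n={n})")
```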

protocol/EVIDENCE_SUMMARY.md and PAPER_OUTLINE.md still contain stale
numbers; these are project scratch documents, not user-facing ones.
Deferred to a separate sweep if needed.

Co-Authored-By: Claude Opus 4.6 (1M context)

Add penalty lambda 3-seed summary script + checkpoint save in penalty test

2026-04-08T05:07:39+00:00

- New script: protocol/examples/penalty_lam_3seed_summary.py
  Loads the existing penalty JSON files for lam=1e-3 and lam=1e-2 across
  seeds, computes the 3-seed mean margin over the DFA-shallow baseline,
  and explicitly checks the (d) verdict at the 2 pp threshold, both per
  seed and in aggregate. Reports MIXED if the seeds disagree.

  Current result: lam=1e-2 has 3 seeds (margin +1.38 ± 0.05 pp, all
  FIRE); lam=1e-3 has 1 seed (+2.31 pp, PASSES). Awaiting s123/s456
  for lam=1e-3.

- experiments/dfa_residual_penalty_test.py: now saves the model
  checkpoint and Bs alongside the JSON log, so the post-hoc protocol can
  be applied without re-running. Closes self-disclosed pitfall #6.5
  (auxiliary nets must be saved for the post-hoc Gamma to be
  reconstructible).
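The per-seed and aggregate check described in the first item can be
sketched as follows. This is a minimal illustration, not the contents of
penalty_lam_3seed_summary.py: the input format ({seed: accuracy} dicts),
pairing margins by seed, and the rule that a margin below the 2 pp
threshold means FIRE (inferred from the +1.38 pp FIRE / +2.31 pp PASS
results above) are assumptions, as are the `summarize` and `verdict`
names.

```python
import statistics

THRESHOLD_PP = 2.0  # the (d) threshold stated above, in percentage points


def verdict(margin_pp, threshold_pp=THRESHOLD_PP):
    """(d) verdict for one margin: PASS if it clears the threshold, else FIRE.

    Assumption: a margin exactly equal to the threshold counts as PASS.
    """
    return "PASS" if margin_pp >= threshold_pp else "FIRE"


def summarize(acc_penalty, acc_baseline, threshold_pp=THRESHOLD_PP):
    """Summarize penalty-vs-DFA-shallow margins across seeds.

    acc_penalty, acc_baseline: hypothetical {seed: test accuracy in [0, 1]}.
    Margins are paired by seed (an assumption) and reported in pp.
    """
    seeds = sorted(acc_penalty.keys() & acc_baseline.keys())
    if not seeds:
        raise ValueError("no seeds in common between penalty and baseline runs")
    margins = {s: 100.0 * (acc_penalty[s] - acc_baseline[s]) for s in seeds}
    values = list(margins.values())
    per_seed = {s: verdict(m, threshold_pp) for s, m in margins.items()}
    distinct = set(per_seed.values())
    mean = statistics.mean(values)
    return {
        "margins_pp": margins,
        "per_seed": per_seed,
        "mean_pp": mean,
        # Sample std needs at least two seeds.
        "std_pp": statistics.stdev(values) if len(values) > 1 else None,
        "mean_verdict": verdict(mean, threshold_pp),
        # Seeds that disagree are reported as MIXED rather than averaged away.
        "consensus": distinct.pop() if len(distinct) == 1 else "MIXED",
    }


if __name__ == "__main__":
    # Toy placeholder accuracies, not real results.
    report = summarize({0: 0.61, 1: 0.65}, {0: 0.60, 1: 0.60})
    print(report["per_seed"], report["consensus"], report["mean_verdict"])
```

Keeping the consensus separate from the verdict on the mean stops one
outlying seed from being hidden: seeds at about +1 and +5 pp average to
a PASS on the mean but are still flagged MIXED.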

faeval.git/experiments/dfa_residual_penalty_test.py, branch master
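A framework-agnostic sketch of saving a checkpoint and the Bs next to
the JSON log so a post-hoc analysis can reload them without retraining,
using only the standard library. The `save_run`/`load_run` helpers, the
`log.json`/`checkpoint.pkl` file names, and representing parameters and
Bs as plain Python lists are hypothetical; a real training script would
more likely use its framework's own serializer (e.g. `torch.save` in
PyTorch).

```python
import json
import pickle
import tempfile
from pathlib import Path


def save_run(out_dir, log, model_state, feedback_mats):
    """Write the JSON log plus a pickle holding the model state and the Bs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "log.json").write_text(json.dumps(log, indent=2))
    with open(out / "checkpoint.pkl", "wb") as f:
        pickle.dump({"model_state": model_state, "Bs": feedback_mats}, f)
    return out


def load_run(out_dir):
    """Reload the log, model state, and Bs needed for post-hoc analysis."""
    out = Path(out_dir)
    log = json.loads((out / "log.json").read_text())
    with open(out / "checkpoint.pkl", "rb") as f:
        ckpt = pickle.load(f)
    return log, ckpt["model_state"], ckpt["Bs"]


if __name__ == "__main__":
    # Round-trip demo with toy placeholder values (not real results).
    run_dir = save_run(Path(tempfile.mkdtemp()) / "run", {"seed": 0},
                       {"w": [[1.0, 2.0]]}, [[[0.5], [-0.5]]])
    print(load_run(run_dir))
```

pickle can execute arbitrary code while loading, so only reload
checkpoints the project itself wrote.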
