Save null_calibration_penalized_dfa.json for §6 ¶2 audit

2026-04-08T23:39:00+00:00

The §6 ¶2 fresh-B null control claim "deep cos +0.002 ± 0.022 (n=20
draws), per-layer stds 0.013-0.023" was verified against a fresh
re-run of experiments/null_calibration_penalized_cos.py:

  training-Bs deep cos:  +0.1627  (matches Appendix L row)
  fresh-Bs deep cos:     +0.0022 ± 0.0220 (per-layer std avg, n=20)
  per-layer stds:        [0.0125, 0.0221, 0.0162, 0.0229, 0.0228] (l0-l4)

The "0.013-0.023" range matches the per-layer std range (0.0125 to
0.0229) once rounded to three decimals.
The "± 0.022" is the average per-layer std across deep layers (l1-l4).
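
The range check can be reproduced by hand. A minimal sketch, assuming
half-up rounding to three decimals and using the per-layer stds printed
above (the helper names are illustrative and not part of the repo):

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-layer stds (l0-l4) copied from the re-run output above, kept as
# strings so the decimal rounding is exact.
PER_LAYER_STDS = ["0.0125", "0.0221", "0.0162", "0.0229", "0.0228"]


def round3(value: str) -> Decimal:
    """Round a decimal string to three places (assumption: half-up)."""
    return Decimal(value).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


def rounded_range(stds):
    """Return (min, max) of the stds after rounding to three decimals."""
    rounded = [round3(s) for s in stds]
    return min(rounded), max(rounded)


if __name__ == "__main__":
    lo, hi = rounded_range(PER_LAYER_STDS)
    print(f"{lo}-{hi}")  # prints 0.013-0.023
```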

The values are saved in results/null_calibration_penalized_dfa.json as
the auditable source. The script
experiments/null_calibration_penalized_cos.py can re-derive them from
the saved checkpoint in results/dfa_pen_short/dfa_pen_lam0.01_s42.pt.

Co-Authored-By: Claude Opus 4.6 (1M context)

faeval.git/results/null_calibration_penalized_dfa.json, branch master
