faeval.git/protocol/protocol.py, branch master

protocol/protocol.py: sync (c) range docstring with v2.31.13 paper update

2026-04-09T00:21:56+00:00

The cross_batch_direction_stability docstring still gave healthy values
as "~0.05-0.18" and drift-dominated values as "~0.5-0.99". These are the
same loose ranges that v2.31.13 corrected in the paper §6 ¶3.

Re-aggregated from results/protocol_audit/audit_table_s42_s123_s456.json
(K=10 batches of 128 samples):
  Healthy 6 BP+EP values: range [-0.036, 0.120], median 0.093
  Degen 9 DFA/SB/CB values: range [-0.005, 0.992], median 0.352
                           5/9 above 0.30 cutoff
                           3/9 above 0.50

Updated docstring to match the actual audit data and point at the
JSON source. Now the paper §6 ¶3 prose and the protocol.py docstring
agree exactly on the (c) calibration ranges.
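
A minimal sketch of how the summary figures above (range, median, count
above each cutoff) could be recomputed. The helper name `summarize` and
the `demo` values are hypothetical placeholders, not the audit data, and
loading from the JSON file is left out because its schema is not shown
here.

```python
from statistics import median


def summarize(values, cutoffs=(0.30, 0.50)):
    """Return count, (min, max) range, median, and count above each cutoff."""
    vals = sorted(float(v) for v in values)
    if not vals:
        raise ValueError("need at least one value to summarize")
    return {
        "n": len(vals),
        "range": (vals[0], vals[-1]),
        "median": median(vals),
        # Strictly-greater comparison; a value equal to the cutoff is not counted.
        "above": {c: sum(v > c for v in vals) for c in cutoffs},
    }


# Placeholder stability values for illustration only (NOT from the audit table).
demo = [-0.01, 0.10, 0.25, 0.31, 0.35, 0.40, 0.55, 0.70, 0.99]

if __name__ == "__main__":
    print(summarize(demo))
```

Whether the real aggregation uses a strict or non-strict comparison at
the cutoff is an assumption here; the docstring should state which.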

Co-Authored-By: Claude Opus 4.6 (1M context)

Add FA diagnostic protocol reference implementation

2026-04-08T03:20:48+00:00

Codex round 15 #1 priority for the E&D-track paper:
  - protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms,
    cross-batch direction stability, and a frozen-baseline comparator)
  - protocol/report.py: DiagnosticReport with per-diagnostic verdicts and
    pretty-printer
  - protocol/smoke_test.py: validates that BP/DFA/EP checkpoints produce
    the expected verdicts (BP/EP trustworthy; DFA walked back due to
    residual explosion and BP grad norm at floor)
  - protocol/README.md: usage, audit cases, threshold rationale
  - protocol/CHECKLIST.md: 6 evaluation pipeline pitfalls (norm(-1),
    cosine_similarity eps clamp, fp16 underflow, Bs reproducibility,
    aggregation, layer-0 dominance)
  - protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
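
One of the pitfalls named above, the cosine_similarity eps clamp, can be
illustrated with a short sketch. This assumes the pitfall refers to a
norm clamp of the kind controlled by the `eps` argument of
torch.nn.functional.cosine_similarity; the CHECKLIST.md wording is not
shown here, so that reading is a guess. The functions `cosine_clamped`
and `cosine_rescaled` are hypothetical pure-Python stand-ins, not code
from this repo or from PyTorch.

```python
import math


def cosine_clamped(x, y, eps=1e-8):
    """Cosine similarity with each norm clamped to at least eps."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (max(nx, eps) * max(ny, eps))


def cosine_rescaled(x, y):
    """Rescale each vector by its largest magnitude before taking the cosine."""
    sx = max(abs(a) for a in x)
    sy = max(abs(b) for b in y)
    if sx == 0.0 or sy == 0.0:
        return 0.0  # direction is undefined for a zero vector
    xs = [a / sx for a in x]
    ys = [b / sy for b in y]
    dot = sum(a * b for a, b in zip(xs, ys))
    nx = math.sqrt(sum(a * a for a in xs))
    ny = math.sqrt(sum(b * b for b in ys))
    return dot / (nx * ny)


# Two identical, very small vectors: the true cosine is 1.0.
tiny = [1e-10, 0.0]

if __name__ == "__main__":
    print(cosine_clamped(tiny, tiny))   # far below 1.0 because both norms hit the clamp
    print(cosine_rescaled(tiny, tiny))  # recovers the direction agreement
```

When both norms fall below eps, the clamped version reports near-zero
similarity for vectors that point the same way, which could be misread
as direction drift in a stability diagnostic.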