faeval.git/protocol/examples/audit_table.py, branch master

Audit table extension to 3 seeds (s42/s123/s456)

2026-04-08T03:45:41+00:00

3 seeds × 5 methods × 4 diagnostics = 60 measurements. Key reproducibility
findings:

  - BP: trustworthy on all 3 seeds (acc 0.61-0.62, h_L ~200, g_L ~3-4e-4)
  - EP: trustworthy on all 3 seeds (acc 0.29-0.36, h_L 3-8e3, g_L ~1e-4)
  - DFA, SB, CB: walked back on all 3 seeds by each of (a), (b), and (d)
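
The measurement count can be checked by enumerating the grid. This is only a sketch: the seed, method, and diagnostic labels come from this message, but the tuple layout and the variable names are assumptions, not the layout used in audit_table.py.

```python
from itertools import product

# Labels taken from the commit message; the grid layout is hypothetical.
SEEDS = ("s42", "s123", "s456")
METHODS = ("BP", "DFA", "SB", "CB", "EP")
DIAGNOSTICS = ("a", "b", "c", "d")

# One cell per (seed, method, diagnostic) triple.
grid = list(product(SEEDS, METHODS, DIAGNOSTICS))
print(len(grid))
```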

Diagnostic (c) is bimodal across seeds, confirming the prior memory finding:
  - DFA s42=0.047 (noise), s123=0.436 (drift), s456=-0.005 (noise)
  - SB  s42=0.992 (drift), s123=0.561 (drift), s456=0.035 (noise)
  - CB  s42=0.352 (drift), s123=0.250 (~edge), s456=0.518 (drift)

(c) catches different methods on different seeds, whereas (a)/(b)/(d) catch
all 3 failing methods on all 3 seeds, giving robust binary detection.
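
The seed dependence of (c) can be tabulated from the values listed above. This is a minimal sketch, assuming a method counts as caught by (c) only when its value is labelled drift; the ~edge case is treated as not caught, which is an assumption. STABILITY and caught_by_c are hypothetical names, not part of audit_table.py.

```python
# Diagnostic (c) values and labels, copied from the commit message.
# "edge" stands for the value marked ~edge in the message.
STABILITY = {
    "DFA": {"s42": (0.047, "noise"), "s123": (0.436, "drift"), "s456": (-0.005, "noise")},
    "SB":  {"s42": (0.992, "drift"), "s123": (0.561, "drift"), "s456": (0.035, "noise")},
    "CB":  {"s42": (0.352, "drift"), "s123": (0.250, "edge"),  "s456": (0.518, "drift")},
}


def caught_by_c(seed):
    """Return the methods whose (c) value is labelled drift on this seed.

    Assumption: only a clear drift label counts as caught; noise and
    edge do not.
    """
    return [m for m, per_seed in STABILITY.items() if per_seed[seed][1] == "drift"]


for seed in ("s42", "s123", "s456"):
    print(seed, caught_by_c(seed))
```

Under that assumption no seed has all three failing methods caught by (c) alone, which is why (a)/(b)/(d) carry the binary verdict.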

Add audit table example: protocol applied to BP/DFA/SB/CB/EP

2026-04-08T03:29:00+00:00

5-method audit table for a 4-block, d=256 ResMLP on CIFAR-10 (seed 42):
  - BP: trustworthy (acc 0.615, h_L=2e2, g_L=4e-4, stab 0.099)
  - DFA: walked back via (a)+(b)+(d) — h_L=4e8, g_L=4e-9, undercuts frozen
  - State Bridge: walked back via all 4 diagnostics — stability 0.992 is the
    cleanest possible drift-dominated case
  - Credit Bridge: walked back via all 4 — stability 0.352, also drift mode
  - EP: trustworthy (acc 0.359, h_L=3e3, g_L=2e-4, stab -0.036) — paper's
    internal control case
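
The seed-42 rows above can be held in a small record type and printed as a table. This is a sketch only: AuditRow, ROWS, and fmt are hypothetical names, "all 4 diagnostics" is assumed to mean (a) through (d), and any value this message does not state is left as None rather than filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditRow:
    method: str
    verdict: str                 # "trustworthy" or "walked back"
    flagged: str                 # diagnostics that fired, e.g. "abd"
    acc: Optional[float] = None  # None where the message gives no value
    h_L: Optional[float] = None
    g_L: Optional[float] = None
    stab: Optional[float] = None


# Values copied from the commit message (seed 42 only).
ROWS = [
    AuditRow("BP", "trustworthy", "", acc=0.615, h_L=2e2, g_L=4e-4, stab=0.099),
    AuditRow("DFA", "walked back", "abd", h_L=4e8, g_L=4e-9),
    AuditRow("SB", "walked back", "abcd", stab=0.992),
    AuditRow("CB", "walked back", "abcd", stab=0.352),
    AuditRow("EP", "trustworthy", "", acc=0.359, h_L=3e3, g_L=2e-4, stab=-0.036),
]


def fmt(x):
    """Format a number compactly; show '-' for values the message omits."""
    return "-" if x is None else f"{x:.3g}"


for r in ROWS:
    print(f"{r.method:<4}{r.verdict:<13}{r.flagged or '-':<6}"
          f"{fmt(r.acc):>7}{fmt(r.h_L):>8}{fmt(r.g_L):>8}{fmt(r.stab):>8}")

# Methods the protocol walks back on this seed.
walked_back = [r.method for r in ROWS if r.verdict == "walked back"]
print(len(walked_back), "of", len(ROWS), "methods walked back")
```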

This is the §2 audit evidence for the main-track paper. It confirms that the
standard headline acc + Γ report silently fails on 3 of 5 methods (DFA, SB,
CB) on this architecture, while the 4-diagnostic protocol catches all three.