<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/protocol/examples/audit_table.py, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Audit table extension to 3 seeds (s42/s123/s456)</title>
<updated>2026-04-08T03:45:41+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:45:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=3a520b203f4f0c75b37b2d5c34d461718729ea02'/>
<id>3a520b203f4f0c75b37b2d5c34d461718729ea02</id>
<content type='text'>
3 seeds × 5 methods × 4 diagnostics = 60 measurements. Key reproducibility
findings:

  - BP: trustworthy on all 3 seeds (acc 0.61-0.62, h_L ~200, g_L ~3-4e-4)
  - EP: trustworthy on all 3 seeds (acc 0.29-0.36, h_L 3-8e3, g_L ~1e-4)
  - DFA, SB, CB: walked back on all 3 seeds, flagged by each of (a), (b) and (d) on every seed

Diagnostic (c) is bimodal across seeds (scores split into a noise mode and a drift mode), confirming the prior finding recorded in memory:
  - DFA s42=0.047 (noise), s123=0.436 (drift), s456=-0.005 (noise)
  - SB  s42=0.992 (drift), s123=0.561 (drift), s456=0.035 (noise)
  - CB  s42=0.352 (drift), s123=0.250 (~edge), s456=0.518 (drift)
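A minimal sketch of how the (c) scores above map onto the noise/edge/drift labels. The scores are copied from this message; the two cutoffs (DRIFT_CUTOFF, EDGE_CUTOFF) are hypothetical values picked only so the labels come out as listed, not the protocol's actual thresholds, and the function names are illustrative.

```python
# Diagnostic (c) stability scores, copied from the commit message.
STABILITY = {
    "DFA": {"s42": 0.047, "s123": 0.436, "s456": -0.005},
    "SB": {"s42": 0.992, "s123": 0.561, "s456": 0.035},
    "CB": {"s42": 0.352, "s123": 0.250, "s456": 0.518},
}

# HYPOTHETICAL cutoffs: chosen only so the labels reproduce the ones in the
# commit message. The protocol's real thresholds are not stated there.
DRIFT_CUTOFF = 0.30
EDGE_CUTOFF = 0.20


def classify_stability(score: float) -> str:
    """Map a stability score to 'drift', 'edge', or 'noise' (assumed cutoffs)."""
    if score >= DRIFT_CUTOFF:
        return "drift"
    if score >= EDGE_CUTOFF:
        return "edge"
    return "noise"


def label_table(scores: dict) -> dict:
    """Apply classify_stability to every (method, seed) cell."""
    return {
        method: {seed: classify_stability(s) for seed, s in per_seed.items()}
        for method, per_seed in scores.items()
    }


if __name__ == "__main__":
    for method, per_seed in label_table(STABILITY).items():
        print(method, per_seed)
```

Under these assumed cutoffs CB s123 (0.250) is the only score in the edge band, which matches its "~edge" label.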

(c) catches different methods on different seeds, whereas (a), (b) and (d)
each catch all 3 failing methods on all 3 seeds: robust binary detection.
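The coverage claim can be checked by counting, per diagnostic, how many of the 9 failing (method, seed) cells it flags. This is a sketch, not the repository's code: the (c) labels are copied from this message, "(a)/(b)/(d) fire on every failing cell" restates the finding above, and treating "~edge" as not firing is an assumption.

```python
from itertools import product

SEEDS = ["s42", "s123", "s456"]
FAILING = ["DFA", "SB", "CB"]

# (c) labels per (method, seed), copied from the commit message.
C_LABELS = {
    ("DFA", "s42"): "noise", ("DFA", "s123"): "drift", ("DFA", "s456"): "noise",
    ("SB", "s42"): "drift", ("SB", "s123"): "drift", ("SB", "s456"): "noise",
    ("CB", "s42"): "drift", ("CB", "s123"): "edge", ("CB", "s456"): "drift",
}


def fires(diagnostic: str, method: str, seed: str) -> bool:
    """Whether a diagnostic flags a failing method on a given seed.

    (a), (b) and (d) flag every failing method on every seed, per the commit.
    (c) counts as firing only on 'drift'; treating '~edge' as a miss is an
    assumption made for this sketch.
    """
    if diagnostic == "c":
        return C_LABELS[(method, seed)] == "drift"
    return True


def coverage(diagnostic: str) -> int:
    """Count the failing (method, seed) cells a diagnostic catches."""
    return sum(fires(diagnostic, m, s) for m, s in product(FAILING, SEEDS))


if __name__ == "__main__":
    total = len(FAILING) * len(SEEDS)
    for d in "abcd":
        print(f"({d}) catches {coverage(d)}/{total}")
```

With these inputs (c) catches 5 of the 9 failing cells (6 if the edge case is counted), while each of (a), (b) and (d) catches all 9.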
</content>
</entry>
<entry>
<title>Add audit table example: protocol applied to BP/DFA/SB/CB/EP</title>
<updated>2026-04-08T03:29:00+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:29:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=111bab56e2d49c9fb1f3bfb9e55ea2028da4d008'/>
<id>111bab56e2d49c9fb1f3bfb9e55ea2028da4d008</id>
<content type='text'>
5-method audit table on a 4-block, d=256 ResMLP trained on CIFAR-10 (seed 42):
  - BP: trustworthy (acc 0.615, h_L=2e2, g_L=4e-4, stab 0.099)
  - DFA: walked back via (a)+(b)+(d): h_L=4e8, g_L=4e-9, undercuts the frozen baseline
  - State Bridge: walked back via all 4 diagnostics; its stability of 0.992 is
    the cleanest drift-dominated case in the table
  - Credit Bridge: walked back via all 4 diagnostics; stability 0.352, also in drift mode
  - EP: trustworthy (acc 0.359, h_L=3e3, g_L=2e-4, stab -0.036); the paper's
    internal control case

This is the §2 audit evidence for the main-track paper. It confirms that
standard reporting (headline accuracy + Γ) silently fails to flag 3 of the 5
methods on this architecture, while the 4-diagnostic protocol catches all three.
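The seed-42 table reduces to a per-method set of firing diagnostics. The sketch below assumes a simple decision rule (any firing diagnostic walks the method back); that rule, and the names FLAGS_S42, verdict and audit, are illustrative assumptions rather than the protocol's documented logic. The flag sets restate the list above, with "trustworthy" read as "no diagnostic fired".

```python
# Diagnostics that fire on seed 42, restated from the commit message.
# An empty set means no diagnostic fired (the method was judged trustworthy).
FLAGS_S42 = {
    "BP": set(),
    "DFA": {"a", "b", "d"},
    "SB": {"a", "b", "c", "d"},
    "CB": {"a", "b", "c", "d"},
    "EP": set(),
}


def verdict(flags: set) -> str:
    """Assumed rule: a single firing diagnostic is enough to walk a claim back."""
    return "walked back" if flags else "trustworthy"


def audit(table: dict) -> dict:
    """Apply the verdict rule to every method in the table."""
    return {method: verdict(flags) for method, flags in table.items()}


if __name__ == "__main__":
    verdicts = audit(FLAGS_S42)
    walked_back = sorted(m for m, v in verdicts.items() if v == "walked back")
    # Headline accuracy + Γ raises no flag on its own, so each walked-back
    # method here is one that standard reporting would pass silently.
    print(verdicts)
    print(f"{len(walked_back)} of {len(verdicts)} walked back: {walked_back}")
```

Because DFA is caught without (c), the sketch also shows why dropping (c) would not change any seed-42 verdict.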
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
5-method audit table on 4-block d=256 ResMLP CIFAR-10 seed 42:
  - BP: trustworthy (acc 0.615, h_L=2e2, g_L=4e-4, stab 0.099)
  - DFA: walked back via (a)+(b)+(d) — h_L=4e8, g_L=4e-9, undercuts frozen
  - State Bridge: walked back via all 4 diagnostics — stability 0.992 is the
    cleanest possible drift-dominated case
  - Credit Bridge: walked back via all 4 — stability 0.352, also drift mode
  - EP: trustworthy (acc 0.359, h_L=3e3, g_L=2e-4, stab -0.036) — paper's
    internal control case

This is the §2 audit evidence for the main-track paper. Confirms that
standard headline acc + Γ silently fails on 3 of 5 methods on this
architecture, while the 4-diagnostic protocol catches all three.
</pre>
</div>
</content>
</entry>
</feed>
