<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/results/protocol_audit/audit_table_s42.json, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Protocol diagnostic (a): use max per-block growth, not max/min ratio</title>
<updated>2026-04-08T04:00:54+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T04:00:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=31ddecc9eb646b15c4ac5960c7de9346c8f7be68'/>
<id>31ddecc9eb646b15c4ac5960c7de9346c8f7be68</id>
<content type='text'>
Old metric: max(||h||) / max(||h_0||, eps). It produces false positives
on ViT-style architectures because the cls token at layer 0 (right after
patch_embed) has an anomalously small magnitude (~0.3-1.5), which inflates
the ratio even on healthy BP-trained ViTs.

New metric: max_l(||h_{l+1}|| / ||h_l||), the largest single-block
residual amplification. It does not depend on the layer-0 norm, so it is
comparable across architectures.

Calibration:
  - BP-trained, late training: &lt;5x per block
  - BP ViT, early epochs (cls token resolving): 13-25x max
  - DFA-trained ResMLP/ViT: 100-4000x per block
Threshold raised from 10 to 50 to sit cleanly between healthy early
training (max 25x) and the failure regime (min 100x).
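
A minimal sketch of the old and new metrics, assuming the per-layer norms
||h_l|| have already been collected into a list. The function names, the
eps guard in the new metric, the strict comparison against the threshold,
and the example norms are illustrative assumptions, not code or data from
this repository.

```python
# Hypothetical helpers: names and example values are assumptions.
THRESHOLD = 50.0  # threshold stated in this commit message


def old_ratio(norms, eps=1e-12):
    """Old metric: max(||h||) / max(||h_0||, eps)."""
    return max(norms) / max(norms[0], eps)


def max_block_growth(norms, eps=1e-12):
    """New metric: largest ||h_{l+1}|| / ||h_l|| over consecutive layers."""
    if len(norms) in (0, 1):
        raise ValueError("need norms for at least two layers")
    return max(b / max(a, eps) for a, b in zip(norms, norms[1:]))


def fires(norms, threshold=THRESHOLD):
    """True when the largest per-block growth exceeds the threshold."""
    return max_block_growth(norms) > threshold


# Made-up ViT-like trace with a small layer-0 norm.
vit_like = [0.5, 8.0, 12.0, 15.0, 18.0]
# Made-up trace with one block that amplifies by 200x.
blown_up = [1.0, 2.0, 400.0, 800.0]

print(old_ratio(vit_like), max_block_growth(vit_like), fires(vit_like))
print(max_block_growth(blown_up), fires(blown_up))
```

On the made-up ViT-like trace the old ratio is 36, above the old threshold
of 10, while the largest per-block growth is 16, below the new threshold
of 50.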

Re-verifications:
  - smoke test (BP/DFA/EP): all 3 verdicts unchanged
  - random init (3 seeds): trustworthy on all 3
  - 5-method audit table single-seed: identical verdicts
  - decision-utility ablation: identical (still 0/5 by S1, 3/5 by S_full)
  - temporal evolution 3-seed: (b) now fires first at ep 3-4, (a) at ep
    8-11. Both well before training ends. The 'protocol fires ~92 epochs
    early' story still holds.
  - ViT temporal evolution: BP no longer false-fires; DFA fires (a) ep 1,
    (b) ep 3 — protocol works on the second architecture.
</content>
</entry>
<entry>
<title>Add audit table example: protocol applied to BP/DFA/SB/CB/EP</title>
<updated>2026-04-08T03:29:00+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:29:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=111bab56e2d49c9fb1f3bfb9e55ea2028da4d008'/>
<id>111bab56e2d49c9fb1f3bfb9e55ea2028da4d008</id>
<content type='text'>
5-method audit table on 4-block d=256 ResMLP CIFAR-10 seed 42:
  - BP: trustworthy (acc 0.615, h_L=2e2, g_L=4e-4, stab 0.099)
  - DFA: walked back via (a)+(b)+(d) — h_L=4e8, g_L=4e-9, undercuts frozen
  - State Bridge: walked back via all 4 diagnostics — stability 0.992 is the
    cleanest possible drift-dominated case
  - Credit Bridge: walked back via all 4 — stability 0.352, also drift mode
  - EP: trustworthy (acc 0.359, h_L=3e3, g_L=2e-4, stab -0.036) — paper's
    internal control case

This is the §2 audit evidence for the main-track paper. Confirms that
standard headline acc + Γ silently fails on 3 of 5 methods on this
architecture, while the 4-diagnostic protocol catches all three.
</content>
</entry>
</feed>
