<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/protocol/examples/random_init_sanity.py, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>Add random-init sanity check: protocol does not flag untrained networks</title>
<updated>2026-04-08T03:48:18+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:48:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=c2e145e162444b31ac5c66a90daa6bc0a1cda591'/>
<id>c2e145e162444b31ac5c66a90daa6bc0a1cda591</id>
<content type='text'>
3-seed random init ResMLP gives chance accuracy (~10%) but the protocol
verdict is 'trustworthy' on all 3 seeds:
  - residual norms ~8.7 across all layers (no growth, bounded)
  - BP gradient norms ~8e-3 (healthy, well above 1e-7 floor)
  - cross-batch stability 0.08-0.18 (in the BP/EP range)

This is the answer to the likely reviewer question: 'is your protocol just
flagging anything that doesn't perform well?' Answer: no. Random init is
at chance and the protocol passes it. The walked-back trained methods are
walked back because of the *measurements*, not because of the accuracy.

Notable: random-init gradient norms (8e-3) are actually HIGHER than
BP-trained ones (4e-4); BP training reduces the gradient magnitude as
loss decreases.
So the protocol distinguishes 3 distinct regimes: (1) untrained healthy,
(2) trained-and-still-healthy (BP/EP), (3) trained-into-pathology (DFA/SB/CB).
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
3-seed random init ResMLP gives chance accuracy (~10%) but the protocol
verdict is 'trustworthy' on all 3 seeds:
  - residual norms ~8.7 across all layers (no growth, bounded)
  - BP gradient norms ~8e-3 (healthy, well above 1e-7 floor)
  - cross-batch stability 0.08-0.18 (in the BP/EP range)

This is the answer to the likely reviewer question: 'is your protocol just
flagging anything that doesn't perform well?' Answer: no. Random init is
at chance and the protocol passes it. The walked-back trained methods are
walked back because of the *measurements*, not because of the accuracy.

Notable: random-init gradient norms (8e-3) are actually HIGHER than
BP-trained ones (4e-4); BP training reduces the gradient magnitude as
loss decreases.
So the protocol distinguishes 3 distinct regimes: (1) untrained healthy,
(2) trained-and-still-healthy (BP/EP), (3) trained-into-pathology (DFA/SB/CB).
</pre>
</div>
</content>
</entry>
</feed>
